TCP Congestion Control
TCP header Structure:
1. source port
2. destination port
3. sequence number
4. ack number
5. header length
6. flags (SYN, ACK, FIN, PUSH, RST, URG)
7. advertised window size
8. checksum
9. urgent pointer
10. options
11. padding
TCP segment size is limited by the MTU size.
TCP windows size is the total number of octets that can be transmitted before an ACK is received. If the TCP windows size is big enough, multiple segments can be transmitted.
TCP contains four congestion control algorithms:
1. Slow Start
Old TCPs would start a connection with the sender injecting multiple segments into the network, up to the window size advertised by the receiver. This is problematic If there are routers and slower links between the sender and the receiver, because intermediate routers may need to queue the packets, and may run out of space.
The algorithm to avoid this is called slow start. It adds another window to the sender's TCP: the congestion window, called "cwnd". When a new connection is established with a remote host, the congestion window is initialized to the size of one segment. Each time an ACK is received, the congestion window is doubled.
The sender can transmit up to the minimum of the congestion window and the advertised window. The congestion window is flow control imposed by the sender, while the advertised window is flow control imposed by the receiver. The former is based on the sender's assessment of perceived network congestion; the latter is related to the amount of available buffer space at the receiver for this connection.
2. Congestion Avoidance
The loss of a packet typically signals congestion on the network between the source and destination. There are two indications of packet loss: a timeout occurring and the receipt of duplicate ACKs.
Congestion avoidance and slow start are independent algorithms with different objectives. But when congestion occurs TCP must slow down its transmission rate of packets into the network, and then invoke slow start to get things going again. In practice they are implemented together. Slow increases the window size exponentially, and congestion avoidance increases the window size linearly.
Congestion avoidance and slow start require that two variables be maintained for each connection: a congestion window, cwnd, and a slow start threshold size, ssthresh. The combined algorithm operates as follows:
1. Initialization for a given connection sets cwnd to one segment and ssthresh to 65535 bytes.
2. Sender never sends out more than the minimum of cwnd and the receiver's advertised window.
3. When congestion occurs (indicated by a timeout or the reception of 3 or more duplicate ACKs), one-half of the current window size (the minimum of cwnd and the receiver's advertised window, but at least two segments) is saved in ssthresh. Additionally, if the congestion is indicated by a timeout, cwnd is set to one segment (i.e., slow start).
4. When new data is acknowledged by the other end, increase cwnd, but the way it increases depends on whether TCP is performing slow start or congestion avoidance. If cwnd is less than or equal to ssthresh, TCP is in slow start; otherwise TCP is performing congestion avoidance. Slow start continues until TCP is halfway to where it was when congestion occurred (since it recorded half of the window size that caused the problem in step 3), and then congestion avoidance takes over.
3. Fast Retransmit
A duplicate ACK can be caused by a lost segment or just a reordering of segments. TCP assumes that if there is just a reordering of the segments, there will be only one or two duplicate ACKs before the reordered segment is processed, which will then generate a new ACK. If three or more duplicate ACKs are received in a row, it is a strong indication that a segment has been lost. TCP then performs a retransmission of what appears to be the missing segment, without waiting for a retransmission timer to expire.
4. Fast Recovery
After fast retransmit sends what appears to be the missing segment, congestion avoidance, but not slow start is performed. This is the fast recovery algorithm. It is an improvement that allows high throughput under moderate congestion, especially for large windows.
The reason for not performing slow start is that the receiver can only generate the duplicate ACK when another segment is received, which means there is still data flowing between the two ends, and TCP does not want to reduce the flow abruptly by going into slow start.
More information can be found at:
RFC 2001
RFC 2581
RFC 1323