|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Network Working Group                                      J. Hadi Salim
Request for Comments: 2884                               Nortel Networks
Category: Informational                                         U. Ahmed
                                                     Carleton University
                                                               July 2000


    Performance Evaluation of Explicit Congestion Notification (ECN)
                             in IP Networks

Status of this Memo

   This memo provides information for the Internet community.  It does
   not specify an Internet standard of any kind.  Distribution of this
   memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2000).  All Rights Reserved.

Abstract

   This memo presents a performance study of the Explicit Congestion
   Notification (ECN) mechanism in the TCP/IP protocol using our
   implementation on the Linux Operating System.  ECN is an end-to-end
   congestion avoidance mechanism proposed by [6] and incorporated into
   RFC 2481 [7].  We study the behavior of ECN for both bulk and
   transactional transfers.  Our experiments show that there is
   improvement in throughput over NON ECN (TCP employing any of Reno,
   SACK/FACK or NewReno congestion control) in the case of bulk
   transfers and substantial improvement for transactional transfers.

   A more complete pdf version of this document is available at:
   http://www7.nortel.com:8080/CTL/ecnperf.pdf

   This memo in its current revision is missing a lot of the visual
   representations and experimental results found in the pdf version.

1. Introduction

   In current IP networks, congestion management is left to the
   protocols running on top of IP.  An IP router, when congested,
   simply drops packets.  TCP is the dominant transport protocol today
   [26].  TCP infers that there is congestion in the network by
   detecting packet drops (RFC 2581).  Congestion control algorithms
   [11] [15] [21] are then invoked to alleviate congestion.  TCP
   initially increases its sending rate (slow start) until it detects
   a packet loss.  A packet loss is inferred by the receipt of 3
   duplicate ACKs or detected by a timeout.

Salim & Ahmed                Informational                      [Page 1]

RFC 2884                  ECN in IP Networks                   July 2000

   The sending TCP then moves into a congestion avoidance state where
   it carefully probes the network by sending at a slower rate (which
   goes up until another packet loss is detected).  Traditionally a
   router reacts to congestion by dropping a packet in the absence of
   buffer space.  This is referred to as Tail Drop.  This method has a
   number of drawbacks (outlined in Section 2).  These drawbacks,
   coupled with the limitations of end-to-end congestion control, have
   led to interest in introducing smarter congestion control
   mechanisms in routers.  One such mechanism is Random Early
   Detection (RED) [9], which detects incipient congestion and
   implicitly signals the oversubscribing flow to slow down by
   dropping its packets.  A RED-enabled router detects congestion
   before the buffer overflows, based on a running average queue size,
   and drops packets probabilistically before the queue actually fills
   up.  The probability of dropping a newly arriving packet increases
   as the average queue size increases above a low water mark, minth,
   towards a higher water mark, maxth.  When the average queue size
   exceeds maxth, all arriving packets are dropped.

   An extension to RED is to mark the IP header instead of dropping
   packets (when the average queue size is between minth and maxth;
   above maxth, arriving packets are dropped as before).  Cooperating
   end systems would then use this as a signal that the network is
   congested and slow down.  This is known as Explicit Congestion
   Notification (ECN).  In this paper we study an ECN implementation
   on Linux for both the router and the end systems in a live network.
   The memo is organized as follows.  In Section 2 we give an overview
   of queue management in routers.  Section 3 gives an overview of ECN
   and the changes required at the router and the end hosts to support
   ECN.  Section 4 describes the experimental testbed and the
   terminology used throughout this memo.  Section 5 introduces the
   experiments that are carried out, outlines the results and presents
   an analysis of the results obtained.  Section 6 concludes the
   paper.

2. Queue Management in routers

   TCP's congestion control and avoidance algorithms are necessary and
   powerful but are not enough to provide good service in all
   circumstances, since they treat the network as a black box.  Some
   sort of control is required from the routers to complement the end
   system congestion control mechanisms.  A more detailed analysis is
   contained in [19].  Queue management algorithms traditionally
   manage the length of packet queues in the router by dropping
   packets only when the buffer overflows.  A maximum length for each
   queue is configured.  The router will accept packets until this
   maximum size is exceeded, at which point it will drop incoming
   packets.  New packets are accepted when buffer space allows.  This
   technique is known as Tail Drop.  This method has served the
   Internet well for years, but has several drawbacks.  Since all
   arriving packets (from all flows) are dropped when the buffer
   overflows, this interacts badly with the congestion control
   mechanism of TCP.  A cycle is formed with a burst of drops after
   the maximum queue size is exceeded, followed by a period of
   underutilization at the router as end systems back off.  End
   systems then increase their windows simultaneously up to a point
   where a burst of drops happens again.  This phenomenon is called
   Global Synchronization.  It leads to poor link utilization and
   lower overall throughput [19].  Another problem with Tail Drop is
   that a single connection or a few flows could monopolize the queue
   space, in some circumstances.  This results in a lock-out
   phenomenon leading to synchronization or other timing effects [19].
   Lastly, one of the major drawbacks of Tail Drop is that queues
   remain full for long periods of time.  One of the major goals of
   queue management is to reduce the steady state queue size [19].
   Other queue management techniques include random drop on full and
   drop front on full [13].

2.1. Active Queue Management

   Active queue management mechanisms detect congestion before the
   queue overflows and provide an indication of this congestion to the
   end nodes [7].  With this approach TCP does not have to rely only
   on buffer overflow as the indication of congestion, since
   notification happens before serious congestion occurs.  One such
   active queue management technique is RED.

2.1.1. Random Early Detection

   Random Early Detection (RED) [9] is a congestion avoidance
   mechanism implemented in routers which works on the basis of active
   queue management.  RED addresses the shortcomings of Tail Drop.  A
   RED router signals incipient congestion to TCP by dropping packets
   probabilistically before the queue runs out of buffer space.  This
   drop probability is dependent on a running average queue size to
   avoid any bias against bursty traffic.  A RED router randomly drops
   arriving packets, with the result that the probability of dropping
   a packet belonging to a particular flow is approximately
   proportional to the flow's share of bandwidth.  Thus, if the sender
   is using relatively more bandwidth it gets penalized by having more
   of its packets dropped.  RED operates by maintaining two threshold
   levels, minimum (minth) and maximum (maxth).  It drops a packet
   probabilistically if and only if the average queue size lies
   between the minth and maxth thresholds.  If the average queue size
   is above the maximum threshold, the arriving packet is always
   dropped.  When the average queue size is between the minimum and
   the maximum threshold, each arriving packet is dropped with
   probability pa, where pa is a function of the average queue size.
   As the average queue length varies between minth and maxth, pa
   increases linearly towards a configured maximum drop probability,
   maxp.  Beyond maxth, the drop probability is 100%.  Dropping
   packets in this way ensures that when some subset of the sources'
   packets get dropped, those sources invoke congestion avoidance
   algorithms that ease the congestion at the gateway.  Since the
   dropping is distributed across flows, the problem of global
   synchronization is avoided.
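
   The averaging and marking computation described above can be
   sketched as follows.  This is a minimal illustration under assumed
   parameter values, not the authors' router code; real RED
   additionally spreads drops by adjusting pa with a count of packets
   since the last drop, and with ECN "drop" becomes "mark".

```python
import random

def update_avg(avg_q, q, w=0.002):
    """RED's running average: an exponentially weighted moving average
    of the instantaneous queue size q, smoothing out short bursts."""
    return (1.0 - w) * avg_q + w * q

def red_decision(avg_q, minth=5.0, maxth=15.0, maxp=0.1):
    """Return True if an arriving packet should be dropped (or, with
    ECN, marked), given the average queue size avg_q in packets."""
    if avg_q < minth:
        return False              # below the low water mark: accept
    if avg_q >= maxth:
        return True               # above the high water mark: always drop
    # In the RED region, pa grows linearly from 0 at minth to maxp at maxth.
    pa = maxp * (avg_q - minth) / (maxth - minth)
    return random.random() < pa
```

   Note that the averaging weight w deliberately makes avg_q respond
   slowly, so a short burst does not trigger drops the way it would in
   a Tail Drop queue.
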
3. Explicit Congestion Notification

   Explicit Congestion Notification is an extension proposed to RED
   which marks a packet instead of dropping it when the average queue
   size is between minth and maxth [7].  Since ECN marks packets
   before congestion actually occurs, this is useful for protocols
   like TCP that are sensitive to even a single packet loss.  Upon
   receipt of a congestion-marked packet, the TCP receiver informs the
   sender (in the subsequent ACK) about incipient congestion, which
   will in turn trigger the congestion avoidance algorithm at the
   sender.  ECN requires support from both the router and the end
   hosts, i.e., the end hosts' TCP stacks need to be modified.
   Packets from flows that are not ECN capable will continue to be
   dropped by RED (as was the case before ECN).

3.1. Changes at the router

   Router side support for ECN can be added by modifying current RED
   implementations.  For packets from ECN capable hosts, the router
   marks the packets rather than dropping them (if the average queue
   size is between minth and maxth).  It is necessary that the router
   identify that a packet is ECN capable, and mark only packets that
   are from ECN capable hosts.  This uses two bits in the IP header.
   The ECN Capable Transport (ECT) bit is set by the sender end system
   (for a unicast transport, only if both end systems are
   ECN-capable).  In TCP this is confirmed in the pre-negotiation
   during the connection setup phase (explained in Section 3.2).
   Packets encountering congestion are marked by the router using the
   Congestion Experienced (CE) bit (if the average queue size is
   between minth and maxth) on their way from the sender to the
   receiver end system, with a probability proportional to the average
   queue size, following the procedure used in RED (RFC 2309) routers.
   Bits 10 and 11 in the IPv6 header are proposed respectively for the
   ECT and CE bits.  Bits 6 and 7 of the IPv4 header DSCP field are
   also specified for experimental purposes for the ECT and CE bits
   respectively.

3.2. Changes at the TCP Host side

   The proposal to add ECN to TCP specifies two new flags in the
   reserved field of the TCP header.  Bit 9 in the reserved field of
   the TCP header is designated as the ECN-Echo (ECE) flag and Bit 8
   is designated as the Congestion Window Reduced (CWR) flag.  These
   two bits are used both for the initializing phase, in which the
   sender and the receiver negotiate the capability and the desire to
   use ECN, and for the subsequent actions to be taken in case
   congestion is experienced in the network during the established
   state.

   There are two main changes that need to be made to add ECN to TCP
   at an end system, and one extension to a router running RED.

   1. In the connection setup phase, the source and destination TCPs
   have to exchange information about their desire and/or capability
   to use ECN.  This is done by the sender setting both the ECN-Echo
   flag and the CWR flag in the SYN packet of the initial connection
   phase; on receipt of this SYN packet, the receiver will set the
   ECN-Echo flag in the SYN-ACK response.  Once this agreement has
   been reached, the sender will thereafter set the ECT bit in the IP
   header of data packets for that flow, to indicate to the network
   that it is capable and willing to participate in ECN.  The ECT bit
   is set on all packets other than pure ACKs.

   2. When a router has decided, from its active queue management
   mechanism, to drop or mark a packet, it checks the IP-ECT bit in
   the packet header.  It sets the CE bit in the IP header if the
   IP-ECT bit is set.  When such a packet reaches the receiver, the
   receiver responds by setting the ECN-Echo flag (in the TCP header)
   in the next outgoing ACK for the flow.  The receiver will continue
   to do this in subsequent ACKs until it receives from the sender an
   indication that it (the sender) has responded to the congestion
   notification.

   3. Upon receipt of this ACK, the sender triggers its congestion
   avoidance algorithm by halving its congestion window, cwnd, and
   updating its congestion window threshold value, ssthresh.  Once it
   has taken these appropriate steps, the sender sets the CWR bit on
   the next outgoing data packet to tell the receiver that it has
   reacted to the (receiver's) notification of congestion.  The
   receiver reacts to the CWR by halting the sending of the congestion
   notifications (ECE) to the sender if there is no new congestion in
   the network.

   Note that the sender's reaction to the indication of congestion in
   the network (when it receives an ACK packet that has the ECN-Echo
   flag set) is equivalent to the Fast Retransmit/Recovery algorithm
   (when there is a congestion loss) in NON-ECN-capable TCP, i.e., the
   sender halves the congestion window cwnd and reduces the slow start
   threshold ssthresh.  Fast Retransmit/Recovery is still available
   for ECN capable stacks for responding to three duplicate
   acknowledgments.
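
   The steps above can be traced with a toy model of the ECE/CWR
   exchange.  This is a sketch for illustration only, with invented
   class and field names; it abstracts away sequence numbers, RTT
   timing, and the once-per-window details of a real TCP stack.

```python
class EcnSender:
    def __init__(self, cwnd=10, ssthresh=64):
        self.cwnd, self.ssthresh = cwnd, ssthresh
        self.cwr_pending = False

    def on_ack(self, ece):
        if ece and not self.cwr_pending:
            # Step 3: halve cwnd, update ssthresh, and schedule CWR
            # on the next outgoing data packet.
            self.cwnd = max(1, self.cwnd // 2)
            self.ssthresh = self.cwnd
            self.cwr_pending = True

    def next_data_packet(self):
        cwr, self.cwr_pending = self.cwr_pending, False
        return {"CWR": cwr}

class EcnReceiver:
    def __init__(self):
        self.echo = False

    def on_data(self, ce, cwr):
        if cwr:
            self.echo = False   # sender has reacted: stop echoing ECE
        if ce:
            self.echo = True    # CE seen: echo ECE until CWR arrives

    def next_ack(self):
        return {"ECE": self.echo}
```

   Running one congestion event through this model (CE-marked data,
   ECE echoed, cwnd halved, CWR sent, echoing stops) reproduces the
   sequence described in steps 2 and 3.
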
4. Experimental setup

   For testing purposes we have added ECN to the Linux TCP/IP stack,
   kernel versions 2.0.32, 2.2.5 and 2.3.43 (earlier revisions of 2.3
   were also tested).  The 2.0.32 implementation conforms to RFC 2481
   [7] for the end systems only.  We have also modified the code in
   the 2.1, 2.2 and 2.3 cases for the router portion as well as the
   end system to conform to the RFC.  An outdated version of the 2.0
   code is available at [18].  Note that Linux version 2.0.32
   implements TCP Reno congestion control, while kernels >= 2.2.0
   default to New Reno but will opt for a SACK/FACK combo when the
   remote end understands SACK.  Our initial tests were carried out
   with the 2.0 kernel at the end system and 2.1 (pre 2.2) for the
   router part.  The majority of the test results here apply to the
   2.0 tests.  We repeated these tests on a different testbed with
   faster machines (a move from Pentium to Pentium-II class machines)
   for the 2.2 and 2.3 kernels, so the 2.0 and 2.2/2.3 results are
   not directly comparable.

   We have updated this memo release to reflect the tests against
   SACK and New Reno.

4.1. Testbed setup

                                          -----    -----
                                         | ECN |  | ECN |
                                         | ON  |  | OFF |
        data direction ---->>             -----    -----
                                            |        |
   server                                   |        |
   ----     ------        ------            |        |
   |  |     | R1 |        | R2 |            |        |
   |  |-----|    |--------|    |----------------------
   ----     ------    ^   ------
                      ^
                      |              -----
   congestion point __|             |  C  |
                                    |     |
                                     -----

   The figure above shows our test setup.

   All the physical links are 10Mbps Ethernet.  Using Class Based
   Queuing (CBQ) [22], packets from the data server are constricted
   to a 1.5Mbps pipe at the router R1.  Data is always retrieved from
   the server towards the clients labelled "ECN ON", "ECN OFF", and
   "C".  Since the pipe from the server is 10Mbps, this creates
   congestion at the exit from the router towards the clients for
   competing flows.  The machines labeled "ECN ON" and "ECN OFF" are
   running the same version of Linux and have exactly the same
   hardware configuration.  The server is always ECN capable (and can
   handle NON ECN flows as well, using the standard congestion
   algorithms).  The machine labeled "C" is used to create congestion
   in the network.  Router R2 acts as a path-delay controller.  With
   it we adjust the RTT the clients see.  Router R1 has RED
   implemented in it and is capable of supporting ECN flows.  The
   path-delay router is a PC running the Nistnet [16] package on a
   Linux platform.  The latency of the link for the experiments was
   set to 20 milliseconds.

4.2. Validating the Implementation

   We spent time validating that the implementation was conformant to
   the specification in RFC 2481.  To do this, the popular tcpdump
   sniffer [24] was modified to show the packets being marked.  We
   visually inspected tcpdump traces to validate the conformance to
   the RFC under many different scenarios.  We also modified tcptrace
   [25] in order to plot the marked packets for visualization and
   analysis.

   Both tcpdump and tcptrace revealed that the implementation was
   conformant to the RFC.

4.3. Terminology used

   This section presents background terminology used in the next few
   sections.

   * Congesting flows: These are TCP flows that are started in the
   background so as to create congestion from R1 towards R2.  We use
   the laptop labeled "C" to introduce congesting flows.  Note that
   "C", as is the case with the other clients, retrieves data from
   the server.

   * Low, Moderate and High congestion: For the case of low congestion
   we start two congesting flows in the background, for moderate
   congestion we start five congesting flows, and for the case of
   high congestion we start ten congesting flows in the background.

   * Competing flows: These are the flows that we are interested in.
   They are either ECN TCP flows from/to "ECN ON" or NON ECN TCP
   flows from/to "ECN OFF".

   * Maximum drop rate: This is the RED parameter that sets the
   maximum probability of a packet being marked at the router.  This
   corresponds to maxp as explained in Section 2.1.
   Our tests were repeated for varying levels of congestion with
   varying maximum drop rates.  The results are presented in the
   subsequent sections.

   * Low, Medium and High drop probability: We use the term low
   probability to mean a drop probability maxp of 0.02, medium
   probability for 0.2 and high probability for 0.5.  We also
   experimented with drop probabilities of 0.05, 0.1 and 0.3.

   * Goodput: We define goodput as the effective data rate as
   observed by the user, i.e., if we transmitted 4 data packets of
   which two were retransmitted packets, the efficiency is 50% and
   the resulting goodput is 2 * packet size / time taken to transmit.

   * RED Region: When the router's average queue size is between
   minth and maxth we say that we are operating in the RED region.
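
   The goodput definition above can be written out as a small helper
   (an illustrative sketch; the function and parameter names are
   ours, and the example numbers are the ones from the text):

```python
def goodput(total_pkts, retransmits, pkt_size, elapsed):
    """Effective user-visible rate: only packets that were not
    retransmissions count as useful data."""
    return (total_pkts - retransmits) * pkt_size / elapsed

# 4 packets sent, 2 of them retransmits: 50% efficiency, and the
# goodput is 2 * packet size / time taken to transmit.
rate = goodput(4, 2, pkt_size=1500, elapsed=1.0)   # 3000.0 bytes/s
```
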
4.4. RED parameter selection

   In our initial testing we noticed that as we increase the number
   of congesting flows, the RED queue degenerates into a simple Tail
   Drop queue, i.e., the average queue exceeds the maximum threshold
   most of the time.  Note that this phenomenon has also been
   observed by [5], who proposes a dynamic solution to alleviate it
   by adjusting the packet dropping probability "maxp" based on the
   past history of the average queue size.  Hence, it is necessary
   that in the course of our experiments the router operate in the
   RED region, i.e., we have to make sure that the average queue is
   maintained between minth and maxth.  If this is not maintained,
   then the queue acts like a Tail Drop queue and the advantages of
   ECN diminish.  Our goal is to validate ECN's benefits when used
   with RED at the router.  To ensure that we were operating in the
   RED region, we monitored the average queue size and the actual
   queue size in times of low, moderate and high congestion, and
   fine-tuned the RED parameters so that the average queue stayed
   within the RED region before running the experiment proper.  Our
   results are, therefore, not influenced by operating in the wrong
   RED region.

5. The Experiments

   We start by making sure that the background flows do not bias our
   results, by computing the fairness index [12] in Section 5.1.  We
   proceed to carry out the experiments for bulk transfer, presenting
   the results and analysis in Section 5.2.  In Section 5.3 the
   results for transactional transfers are presented along with
   analysis.  More details on the experimental results can be found
   in [27].
5.1. Fairness

   In the course of the experiments we wanted to make sure that our
   choice of the type of background flows did not bias the results
   that we collected.  Hence we carried out some tests initially with
   both ECN and NON ECN flows as the background flows.  We repeated
   the experiments for different drop probabilities and calculated
   the fairness index [12].  We also noticed (when there were equal
   numbers of ECN and NON ECN flows) that the number of packets
   dropped for the NON ECN flows was equal to the number of packets
   marked for the ECN flows, showing thereby that the RED algorithm
   was fair to both kinds of flows.

   Fairness index: The fairness index is a performance metric
   described in [12].  Jain [12] postulates that the network is a
   multi-user system, and derives a metric to see how fairly each
   user is treated.  He defines fairness as a function of the
   variability of throughput across users.  For a given set of user
   throughputs (x1, x2, ..., xn), the fairness index of the set is
   defined as follows:

      f(x1,x2,...,xn) = square(sum[i=1..n] xi) /
                        (n * sum[i=1..n] square(xi))

   The fairness index always lies between 0 and 1.  A value of 1
   indicates that all flows got exactly the same throughput.  Each of
   the tests was carried out 10 times to gain confidence in our
   results.  To compute the fairness index we used FTP to generate
   traffic.
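
   The fairness index above is straightforward to compute; a minimal
   sketch (our own helper, not the authors' analysis script):

```python
def jain_fairness(throughputs):
    """Jain's fairness index: square(sum(xi)) / (n * sum(square(xi))).
    Equals 1.0 exactly when every user sees the same throughput."""
    n = len(throughputs)
    total = sum(throughputs)
    return (total * total) / (n * sum(x * x for x in throughputs))
```

   For four equal flows the index is 1.0; a completely starved flow
   pulls it down (e.g. throughputs of [10, 0] give 0.5).
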
   Experiment details: At time t = 0 we start 2 NON ECN FTP sessions
   in the background to create congestion.  At time t = 20 seconds we
   start two competing flows.  We note the throughput of all the
   flows in the network and calculate the fairness index.  The
   experiment was carried out for various maximum drop probabilities
   and various congestion levels.  The same procedure is repeated
   with ECN background flows.  The fairness index was fairly constant
   in both cases, whether the background flows were ECN or NON ECN,
   indicating that there was no bias either way.

      Max     Fairness     Fairness
      Drop    With BG      With BG
      Prob    flows ECN    flows NON ECN

      0.02    0.996888     0.991946
      0.05    0.995987     0.988286
      0.1     0.985403     0.989726
      0.2     0.979368     0.983342
   With the observation that the nature of the background flows does
   not alter the results, we proceed by using NON ECN background
   flows for the rest of the experiments.

5.2. Bulk transfers

   The metric we chose for bulk transfer is end user throughput.

   Experiment Details: All TCP flows used are Reno TCP.  For the case
   of low congestion we start 2 FTP flows in the background at time
   0.  Then, after about 20 seconds, we start the competing flows:
   one data transfer to the ECN machine and the second to the NON ECN
   machine.  The size of the file used is 20MB.  For the case of
   moderate congestion we start 5 FTP flows in the background, and
   for the case of high congestion we start 10 FTP flows in the
   background.  We repeat the experiments for various maximum drop
   rates, each repeated a number of times.

   Observation and Analysis:

   We make three key observations:

   1) As the congestion level increases, the relative advantage for
   ECN increases but the absolute advantage decreases (expected,
   since there are more flows competing for the same link resource).
   ECN still does better than NON ECN even under high congestion.
   Inferring a sample from the collected results: at a maximum drop
   probability of 0.1, for example, the relative advantage of ECN
   increases from 23% to 50% as the congestion level increases from
   low to high.

   2) Maintaining congestion levels and varying the maximum drop
   probability (MDP) reveals that the relative advantage of ECN
   increases with increasing MDP.  As an example, for the case of
   high congestion, as we vary the drop probability from 0.02 to 0.5
   the relative advantage of ECN increases from 10% to 60%.

   3) There were hardly any retransmissions for ECN flows (except the
   occasional packet drop in a minority of the tests for the case of
   high congestion and low maximum drop probability).
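
   The relative-advantage percentages quoted above can be read as the
   standard relative gain of ECN throughput over NON ECN throughput
   (our interpretation; the memo does not spell out the formula):

```python
def relative_advantage(ecn, non_ecn):
    """Relative gain of ECN over NON ECN, in percent."""
    return 100.0 * (ecn - non_ecn) / non_ecn

# e.g. an ECN flow at 123 KB/s vs. a NON ECN flow at 100 KB/s
gain = relative_advantage(123.0, 100.0)   # 23.0 (%)
```
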
   We analyzed tcpdump traces for NON ECN with the help of tcptrace
   and observed that there were hardly any retransmits due to
   timeouts.  (Retransmits due to timeouts are inferred by counting
   the number of 3-DUPACK retransmits and subtracting them from the
   total recorded number of retransmits.)  This means that over a
   long period of time (as is the case for long bulk transfers), the
   data-driven loss recovery mechanism of the Fast Retransmit/
   Recovery algorithm is very effective.  The algorithm for ECN on
   congestion notification from ECE is the same as that for a Fast
   Retransmit for NON ECN.  Since both are operating in the RED
   region, ECN barely gets any advantage over NON ECN from the
   signaling (packet drop vs. marking).

   It is clear, however, from the results that ECN flows benefit in
   bulk transfers.  We believe that the main advantage of ECN for
   bulk transfers is that less time is spent recovering (whereas NON
   ECN spends time retransmitting), and timeouts are avoided
   altogether.  [23] has shown that even with RED deployed, TCP Reno
   could suffer from multiple packet drops within the same window of
   data, likely to lead to multiple congestion reactions or timeouts
   (these problems are alleviated by ECN).  However, while TCP Reno
   has performance problems with multiple packets dropped in a window
   of data, New Reno and SACK have no such problems.

   Thus, for scenarios with very high levels of congestion, the
   advantages of ECN for TCP Reno flows could be more dramatic than
   the advantages of ECN for NewReno or SACK flows.  An important
   observation to make from our results is that we do not notice
   multiple drops within a single window of data.  Thus, we would
   expect that our results are not heavily influenced by Reno's
   performance problems with multiple packets dropped from a window
   of data.  We repeated these tests with ECN-patched newer Linux
   kernels.  As mentioned earlier, these kernels use a SACK/FACK
   combo with a fallback to New Reno.  SACK can be selectively turned
   off (defaulting to New Reno).  Our results indicate that ECN still
   improves performance for bulk transfers.  More results are
   available in the pdf version [27].  As in 1) above, maintaining a
   maximum drop probability of 0.1 and increasing the congestion
   level, we observe that ECN-SACK improves performance from about 5%
   at low congestion to about 15% at high congestion.  In the
   scenario where high congestion is maintained and the maximum drop
   probability is moved from 0.02 to 0.5, the relative advantage of
   ECN-SACK improves from 10% to 40%.  Although these numbers are
   lower than the ones exhibited by Reno, they do reflect the
   improvement that ECN offers even in the presence of robust
   recovery mechanisms such as SACK.

5.3. Transactional transfers

   We model transactional transfers by sending a small request and
   getting a response from a server before sending the next request.
   To generate transactional transfer traffic we use Netperf [17]
   with the CRR (Connect Request Response) option.  As an example,
   let us assume that we are retrieving a small file of, say, 5 - 20
   KB; then in effect we send a small request to the server and the
   server responds by sending us the file.  The transaction is
   complete when we receive the complete file.  To gain confidence in
   our results we run the tests for about one hour.  For each test
   there are a few thousand of these requests and responses taking
   place.  Although not exactly modeling HTTP 1.0 traffic, where
   several concurrent sessions are opened, Netperf-CRR is
   nevertheless a close approximation.  Since Netperf-CRR waits for
   one connection to complete before opening the next one (zero think
   time), that single connection could be viewed as the slowest
   response in the set of the opened concurrent sessions (in HTTP).
   The transactional data sizes were selected based on [2], which
   indicates that the average web transaction was around 8 - 10 KB;
   the smaller (5KB) size was selected to estimate the size of
   transactional processing that may become prevalent with policy
   management schemes in the diffserv [4] context.  Using Netperf we
   are able to initiate these kinds of transactional transfers for a
   variable length of time.  The main metric of interest in this case
   is the transaction rate, which is recorded by Netperf.

* We define the transaction rate as: the number of requests and
complete responses for a particular requested size that we are able
to complete per second. For example, if our request is 1KB and the
response is 5KB, then the transaction rate is the number of such
complete transactions that we can accomplish per second.
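The bookkeeping behind this metric can be sketched as follows (the
helper names are ours for illustration; they are not part of
Netperf):

```python
def transaction_rate(completed_transactions, elapsed_seconds):
    """Transactions per second for a fixed request/response size pair."""
    return completed_transactions / elapsed_seconds

def response_goodput_bps(rate_tps, response_bytes):
    """Approximate application goodput carried by the responses alone."""
    return rate_tps * response_bytes * 8

# e.g. 144 completed 1KB-request/5KB-response transactions in a
# 180-second run (hypothetical numbers):
rate = transaction_rate(144, 180)              # 0.8 transactions/s
goodput = response_goodput_bps(rate, 5 * 1024) # bits/s of response data
```

Note that the transaction rate also reflects connection setup and
loss-recovery time, which is why it is sensitive to timeouts.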
|
|
|
|
|
Experiment Details: As in the case of bulk transfers, we start the
background FTP flows that introduce congestion in the network at time
0. About 20 seconds later we start the transactional transfers and
run each test for three minutes. We record the completed transactions
per second. We repeat the test for about an hour and plot the
transactions per second, averaged over the runs. The experiment is
repeated for various maximum drop probabilities, file sizes and
levels of congestion.
|
|
|
|
|
Observation and Analysis

There are three key observations:
|
|
|
|
|
1) As congestion increases (with fixed drop probability) the relative
advantage for ECN increases (again, the absolute advantage does not
increase since more flows are sharing the same bandwidth). For
example, from the results, if we consider the 5KB transactional flow,
as we increase the congestion from medium (5 congesting flows) to
high (10 congesting flows) for a maximum drop probability of 0.1, the
relative gain for ECN increases from 42% to 62%.
|
|
|
|
|
2) Maintaining the congestion level while increasing the maximum drop
probability shows that the relative advantage for ECN flows
increases. In the case of high congestion for the 5KB flow, we
observe that the number of transactions per second increases from 0.8
to 2.2, which corresponds to an increase in relative gain for ECN
from 20% to 140%.
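The relative gain quoted throughout follows the usual definition; a
minimal sketch (the baseline rates below are hypothetical, chosen
only to illustrate the formula):

```python
def relative_gain_percent(ecn_tps, non_ecn_tps):
    """Relative ECN gain: how much higher the ECN transaction rate is,
    expressed as a percentage of the NON ECN rate."""
    return (ecn_tps - non_ecn_tps) / non_ecn_tps * 100.0

# Hypothetical illustration: 1.2 transactions/s with ECN against a
# NON ECN baseline of 1.0 transactions/s is a 20% relative gain;
# 2.4 against the same baseline would be a 140% gain.
gain_low = relative_gain_percent(1.2, 1.0)
gain_high = relative_gain_percent(2.4, 1.0)
```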
|
|
|
|
|
3) As the transactional data size increases, ECN's advantage
diminishes because the probability of recovering from a Fast
Retransmit increases for NON ECN. ECN therefore has a huge advantage
as the transactional data size gets smaller, as is observed in the
results. This can be explained by looking at TCP recovery mechanisms.
NON ECN in the short flows depends, for recovery, on congestion
signaling via receiving 3 duplicate ACKs or, worse, on a retransmit
timer expiration, whereas ECN depends mostly on the TCP-ECE flag.
This is by design in our experimental setup. [3] shows that for short
flows most TCP loss recovery in fact happens via timeouts. The
effectiveness of the Fast Retransmit/Recovery algorithm is limited by
the fact that there might not be enough data in the pipe to elicit 3
duplicate ACKs. TCP RENO needs at least 4 outstanding packets to
recover from losses without going into a timeout. For 5KB (4 packets
for an MTU of 1500 bytes) a NON ECN flow will always have to wait for
a retransmit timeout if any of its packets are lost. (This timeout
could only have been avoided if the flow had used an initial window
of four packets, and the first of the four packets was the packet
dropped.) We repeated these experiments with the kernels implementing
the SACK/FACK and NewReno algorithms. Our observation was that there
was hardly any difference from what we saw with Reno. For example, in
the case of SACK with ECN enabled: maintaining the maximum drop
probability at 0.1 and increasing the congestion level for the 5KB
transaction, we noticed that the relative gain for the ECN enabled
flows increases from 47% to 80%. If we maintain the congestion level
for the 5KB transactions and increase the maximum drop probabilities
instead, we notice that SACK's relative gain increases from 15% to
120%. It is fair to comment that the difference in the testbeds
(different machines, same topology) might have contributed to the
results; however, it is worth noting that the relative advantage of
SACK-ECN is obvious.
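The "4 outstanding packets" argument above can be checked with a
small sketch (the segment arithmetic and the 3-duplicate-ACK
threshold are as described in the text; the helper functions
themselves are ours):

```python
import math

DUP_ACK_THRESHOLD = 3  # duplicate ACKs needed to trigger Fast Retransmit

def segments_for(transfer_bytes, mss=1460):
    """Number of MSS-sized segments for a transfer (1500B MTU -> 1460B MSS)."""
    return math.ceil(transfer_bytes / mss)

def fast_retransmit_possible(total_segments, lost_segment_index):
    """A lost segment can be recovered via Fast Retransmit only if at
    least 3 later segments arrive to generate the duplicate ACKs
    (assuming the window covers all remaining segments)."""
    return total_segments - lost_segment_index - 1 >= DUP_ACK_THRESHOLD

segs = segments_for(5 * 1024)  # 4 segments for a 5KB response
# Only a loss of the very first segment leaves 3 followers; losing any
# other segment forces a NON ECN flow into a retransmit timeout.
recoverable = [fast_retransmit_possible(segs, i) for i in range(segs)]
```

This matches the observation that the 5KB NON ECN flows almost always
pay a timeout on loss, while ECN flows avoid the loss altogether.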
|
|
|
|
|
6. Conclusion |
|
|
|
|
|
ECN enhancements improve both bulk and transactional TCP traffic. The
improvement is more obvious in short transactional flows (popularly
referred to as mice).
|
|
|
|
|
* Because fewer retransmits happen with ECN, there is less traffic on
the network. Although the relative amount of data retransmitted in
our case is small, the effect could be higher when there are more
contributing end systems. The absence of retransmits also implies an
improvement in goodput. This becomes very important for scenarios
where bandwidth is expensive, such as on low bandwidth links. It also
implies that ECN lends itself well to applications that require
reliability but would prefer to avoid unnecessary retransmissions.
|
|
|
|
|
* The fact that ECN avoids timeouts by delivering faster congestion
notification (as opposed to the traditional inference from 3
duplicate ACKs or, even worse, timeouts) implies that less time is
spent in error recovery - this also improves goodput.
|
|
|
|
|
* ECN could be used to help in service differentiation, where the end
user is able to "probe" for their target rate faster. Assured
Forwarding [1] in the diffserv working group at the IETF proposes
using RED with varying drop probabilities as a service
differentiation mechanism. Multiple packets within a single window in
TCP RENO could be dropped even in the presence of RED, likely leading
to timeouts [23]. ECN end systems ignore multiple notifications
within a window, which helps counter this scenario and results in
improved goodput. The ECN end system also ends up probing the network
faster (to reach its optimal bandwidth). [23] also notes that RENO is
the most widely deployed TCP implementation today.
|
|
|
|
|
It is clear that the advent of policy management schemes introduces
new requirements for transactional applications, which constitute a
very short query and a response on the order of a few packets. As our
experiments have shown, ECN provides advantages to such transactional
traffic.
|
|
|
|
|
7. Acknowledgements

We would like to thank Alan Chapman, Ioannis Lambadaris, Thomas Kunz,
Biswajit Nandy, Nabil Seddigh, Sally Floyd, and Rupinder Makkar for
their helpful feedback and valuable suggestions.
|
|
|
|
|
8. Security Considerations

Security considerations are as discussed in section 9 of RFC 2481.
|
|
|
|
|
9. References

[1] Heinanen, J., Baker, F., Weiss, W. and J. Wroclawski, "Assured
    Forwarding PHB Group", RFC 2597, June 1999.

[2] B. Mah, "An Empirical Model of HTTP Network Traffic", in
    Proceedings of INFOCOM '97.

[3] Balakrishnan, H., Padmanabhan, V., Seshan, S., Stemm, M. and
    R. H. Katz, "TCP Behavior of a Busy Internet Server: Analysis
    and Improvements", Proceedings of IEEE Infocom, San Francisco,
    CA, USA, March 1998.
    http://nms.lcs.mit.edu/~hari/papers/infocom98.ps.gz

[4] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z. and W.
    Weiss, "An Architecture for Differentiated Services", RFC 2475,
    December 1998.

[5] W. Feng, D. Kandlur, D. Saha and K. Shin, "Techniques for
    Eliminating Packet Loss in Congested TCP/IP Networks", U.
    Michigan CSE-TR-349-97, November 1997.

[6] S. Floyd, "TCP and Explicit Congestion Notification", ACM
    Computer Communications Review, 24, October 1994.

[7] Ramakrishnan, K. and S. Floyd, "A Proposal to add Explicit
    Congestion Notification (ECN) to IP", RFC 2481, January 1999.

[8] K. Fall and S. Floyd, "Simulation-based Comparisons of Tahoe,
    Reno and SACK TCP", Computer Communications Review, V. 26 N. 3,
    July 1996, pp. 5-21.

[9] S. Floyd and V. Jacobson, "Random Early Detection Gateways for
    Congestion Avoidance", IEEE/ACM Transactions on Networking,
    1(4), August 1993.

[10] E. Hashem, "Analysis of random drop for gateway congestion
     control", Report LCS TR-465, Lab. for Computer Science, M.I.T.,
     1989.

[11] V. Jacobson, "Congestion Avoidance and Control", in Proceedings
     of SIGCOMM '88, Stanford, CA, August 1988.

[12] R. Jain, "The Art of Computer Systems Performance Analysis",
     John Wiley and Sons, QA76.9.E94J32, 1991.

[13] T. V. Lakshman, A. Neidhardt and T. Ott, "The Drop From Front
     Strategy in TCP Over ATM and Its Interworking with Other
     Control Features", Infocom '96, MA28.1.

[14] P. Mishra and H. Kanakia, "A hop by hop rate based congestion
     control scheme", Proc. SIGCOMM '92, pp. 112-123, August 1992.

[15] Floyd, S. and T. Henderson, "The NewReno Modification to TCP's
     Fast Recovery Algorithm", RFC 2582, April 1999.

[16] The NIST Network Emulation Tool,
     http://www.antd.nist.gov/itg/nistnet/

[17] Netperf, the network performance tool,
     http://www.netperf.org/netperf/NetperfPage.html

[18] ftp://ftp.ee.lbl.gov/ECN/ECN-package.tgz

[19] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, S.,
     Estrin, D., Floyd, S., Jacobson, V., Minshall, G., Partridge,
     C., Peterson, L., Ramakrishnan, K., Shenker, S., Wroclawski, J.
     and L. Zhang, "Recommendations on Queue Management and
     Congestion Avoidance in the Internet", RFC 2309, April 1998.

[20] K. K. Ramakrishnan and R. Jain, "A Binary Feedback Scheme for
     Congestion Avoidance in Computer Networks", ACM Trans. Comput.
     Syst., 8(2):158-181, 1990.

[21] Mathis, M., Mahdavi, J., Floyd, S. and A. Romanow, "TCP
     Selective Acknowledgement Options", RFC 2018, October 1996.

[22] S. Floyd and V. Jacobson, "Link-sharing and Resource Management
     Models for Packet Networks", IEEE/ACM Transactions on
     Networking, Vol. 3 No. 4, August 1995.

[23] P. Bagal, S. Kalyanaraman and B. Packer, "Comparative Study of
     RED, ECN and TCP Rate Control",
     http://www.packeteer.com/technology/Pdf/packeteer-final.pdf

[24] tcpdump, the protocol packet capture and dumper program,
     ftp://ftp.ee.lbl.gov/tcpdump.tar.Z

[25] tcptrace, the TCP dump file analysis tool,
     http://jarok.cs.ohiou.edu/software/tcptrace/tcptrace.html

[26] Thompson, K., Miller, G.J. and R. Wilder, "Wide-Area Internet
     Traffic Patterns and Characteristics", IEEE Network Magazine,
     November/December 1997.

[27] http://www7.nortel.com:8080/CTL/ecnperf.pdf
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
10. Authors' Addresses

Jamal Hadi Salim
Nortel Networks
3500 Carling Ave
Ottawa, ON, K2H 8E9
Canada

EMail: hadi@nortelnetworks.com

Uvaiz Ahmed
Dept. of Systems and Computer Engineering
Carleton University
Ottawa
Canada

EMail: ahmed@sce.carleton.ca
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
11. Full Copyright Statement

Copyright (C) The Internet Society (2000). All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

Funding for the RFC Editor function is currently provided by the
Internet Society.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|