What is Decongestion Control?

Decongestion control is a new approach to Internet congestion control, which, contrary to existing protocols such as TCP, involves hosts transmitting packets as fast as possible into the network. We leverage erasure coding to mitigate the impact of losses and thereby have the potential to deliver not only high throughput, but also to improve traffic stability, shrink router buffer requirements, and protect against selfish and malicious transmission behavior.

Papers

Efficient and Robust Decongestion Control
Barath Raghavan, John McCullough and Alex Snoeren
Draft, April 2007

Decongestion Control [PDF, talk PDF]
Barath Raghavan and Alex Snoeren
ACM SIGCOMM Workshop on Hot Topics in Networks (Hotnets-V), November 2006

People

John McCullough
Barath Raghavan
Alex Snoeren

FAQ

Since decongestion control significantly alters the way network congestion is handled, many questions arise. Below are several of the concerns we are aware of, and our initial answers.

Basics

How do hosts decide how fast to send?

In principle, hosts send as fast as possible whenever they have data to send. In practice, hosts need only send fast enough to overdrive their upstream bottleneck link(s).

Won't decongestion control cause congestion collapse?

Conventional wisdom dictates that blasting packets into the network is a bad idea, and will lead to congestion collapse for two reasons: hosts are unable to cope with packet losses and packets are dropped after having traversed multiple links, thereby wasting bandwidth.

We believe we can avoid these two problems by applying efficient erasure codes and by leveraging the network's topology such that drops happen early along a flow's path.


Engineering

How do hosts erasure code their data?

As an application writes data into its socket, we read the data and chop it into caravans. We code each caravan using online codes longer caravans and redundant coding for shorter ones. We find that we achieve erasure coding overheads of less than 3% using online codes in practice.

How does caravan size affect performance?

Erasure codes are typically more efficient over more data, so larger caravans yield lower erasure code overhead. We do not, however, adjust transmission rates based upon caravan size; the two issues are orthogonal.

How do flow send rates change?

Over the lifetime of a flow, the sender learns about the flow's goodput from the receiver. Using this information, the sender increases the send rate to find the flow's funnel rate: the rate at which further send rate increases yield no goodput increases. In the future, we plan to use inter-flow history to choose appropriate send rates for short-lived flows.

How does the ACK channel work?

Hosts send partial ACKs of data they've received within caravans. These partial ACKs can be lost without consequence. Once a full caravan has been received, the receiver sends a caravan ACK. Currently, we send caravan ACKs as short bursts of redundant ACKs with an inter-burst wait period; we have yet to explore different parameters and approaches for ACKs.

How does decongestion control cope with greedy end-host behavior?

Decongestion control is inherently robust to greedy sender behavior: each sender chooses a send rate that maximizes its goodput, irrespective of other senders. As a result, one sender cannot cause another sender to back off. Senders do however rely upon receivers for feedback about flow goodput, and, when overloaded, might be susceptible to greedy receivers. We have yet to directly address this problem, but expect that we can use a robust congestion signaling approach to detect greedy receivers.

What about malicious end-host behavior?

Malicious end hosts can attack another host at many layers of the protocol stack; we aim to protect against transport-layer attacks. In particular, protocols such as TCP are vulnerable to denial-of-service both in the network (via shrew attacks) and at the end host (via a spoofed RST attack). Decongestion control is robust to attacks in the network since the only option for an attacker is to send as fast as possible, which is the same behavior as that of a non-malicious host. To cope with RST attacks, we explicitly check for a RST's authenticity using a nonce and simple hash commitment. (More details are in the paper.)


Network Impacts

What is the impact of short flows that use decongestion control?

Short flows have the potential to overdrive links in such a way that they waste network capacity. Though the growing prevalence of HTTP keepalive increases the average size of flows, there will continue to be a non-negligible fraction of small flows in the network. Our current results indicate that the impact of short, web-like flows is modest, but further evaluation is needed.

How much does the bandwidth hierarchy affect performance?

We have found that for HOT-like (Internet-like) network topologies, decongestion control does not cause congestion collapse; on the contrary, it yields performance comparable or better than TCP. In the future, we plan to evaluate a more varied selection of topologies.

What is required of routers?

Decongestion control doesn't directly implement any notion of fairness, and is agnostic to the notion of fairness desired; to enforce some particular notion of fairness, routers can drop packets as needed. To implement a TCP-like flow fairness, routers can implement an algorithm such as Approximate Fair Dropping. In principle, however, the requirements for a dropping policy are simple. We say a policy is a brickwall policy if a there exists some maximum rate (at each point in time) for a flow; the sender cannot increase its throughput of this flow by sending faster. While FIFO and RED are not brickwall policies, we expect that all explicit notions of fairness are.

How does decongestion control affect router buffering? Does queuing delay increase significantly?

Decongestion control quite naturally fills bottleneck queues. Unlike for TCP (which is inherently bursty), queues in a decongestion-controlled network serve to ensure that outgoing links never go idle. As a result, we find that we can shrink queues significantly to roughly 16 packets for all link sizes encountered in today's networks. Consequently, queuing delay will be minimal with short queues, even when full.


Theory

What is the theoretical maximum goodput of a network using decongestion control?

Initial results for a HOT-like network topology, using decongestion control can yield within roughly 75% of the optimal. (We compute the optimal by computing the maximum multicommodity flow of the network given its topology and traffic matrix.) These results are quite preliminary, however; we plan to study this more extensively.


Protocol Interactions

How is BGP affected?

In principle, decongestion control should not affect BGP routing. In practice, however, there are likely a few engineering concerns: routers have limited spare CPU to erasure code BGP sessions and sometimes use simple ping-like tests to test link liveness. We expect that by moving BGP sessions to proxy hosts attached to routers, we can offload the work of using decongestion control. In addition, bursts of several packets rather than of a single packet should be used to determine liveness.

How are typical MAC layers affected?

We have yet to explore MAC-layer interactions.


Other

How stable is bandwidth allocation at senders?

We have yet to explore the stability of bandwidth allocation, but expect that decongestion control does not suffer from significant stability or convergence problems since senders do not employ a tightly-controlled feedback loop as in traditional congestion-control protocols.