TCP Bandwidth-Delay Product (BDP) Window Optimization Simulator

Parameters

Link bandwidth

Mbps

Physical link rate. 1 Gbps = 1000, 10 Gbps = 10000.

Round-trip time (RTT)

Ping RTT in ms: 10-30 same-region, 80-130 trans-Pacific, 500+ geostationary satellite.

TCP variant

Congestion control algorithm. Linux default is CUBIC.

MSS (max segment size)

In bytes. Standard Ethernet (MTU 1500) gives 1460; jumbo frames (MTU 9000) give 8960.

TCP window size

Sender socket buffer. Smaller than BDP -> idle pipe.

Packet loss rate

0.01-0.1% on backbones, 1-5% on Wi-Fi.

OS tuning

Effect of sysctl rmem/wmem and qdisc tuning.

Results

BDP (KB)

BDP (MB)

Window utilization (%)

Mathis limit (Mbps)

Effective throughput (Mbps)

Congestion window (packets)

TCP pipe visualization — bandwidth, window, ACK round-trip

Pipe thickness represents link bandwidth and pipe length represents RTT. Data packets flow from sender to receiver; ACKs return from the right; the next window-worth of data is released only after ACKs arrive. A window smaller than BDP leaves gaps in the pipe.

Throughput sensitivity — effective speed vs. window size

TCP variant comparison — CUBIC / Reno / BBR / Vegas

Theory & Key Formulas

$$\text{BDP} = \text{Bandwidth} \times \text{RTT}$$

Bandwidth-delay product: the maximum number of bits in flight at any moment. With bandwidth in bits per second and RTT in seconds, BDP comes out in bits; divide by 8 for bytes.
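As a minimal sketch of the formula above, the following Python helpers convert a link rate in Mbps and an RTT in milliseconds into a BDP. The function names and the choice of input units are illustrative assumptions, not taken from the simulator's code.

```python
def bdp_bits(bandwidth_mbps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product in bits: bandwidth (bit/s) times RTT (s)."""
    return bandwidth_mbps * 1e6 * (rtt_ms / 1000.0)


def bdp_bytes(bandwidth_mbps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product in bytes (8 bits per byte)."""
    return bdp_bits(bandwidth_mbps, rtt_ms) / 8.0


if __name__ == "__main__":
    # 10 Gbps link with an 80 ms round trip, the example used on this page.
    size = bdp_bytes(10_000, 80)
    print(f"BDP = {size / 1e6:.1f} MB (decimal megabytes)")
```

Keeping bits and bytes as separate functions makes it harder to mix the two units, which is the most common slip in hand calculations of BDP.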

$$\text{Throughput}_{\text{Mathis}} \approx \frac{\text{MSS}}{\text{RTT}} \cdot \frac{1}{\sqrt{p}}$$

Mathis upper bound on TCP throughput; p is packet loss probability. The inverse square root means even tiny loss kills throughput on long paths.
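The loss-limited rate can be evaluated directly. This is a sketch of the simplified form shown above (it omits the constant factor that appears in fuller statements of the Mathis model); the function name and units are assumptions made for illustration.

```python
import math


def mathis_limit_mbps(mss_bytes: int, rtt_ms: float, loss_prob: float) -> float:
    """Approximate loss-limited throughput in Mbps: MSS / (RTT * sqrt(p)).

    loss_prob is a probability (0.0001 means 0.01%), not a percentage.
    """
    if loss_prob <= 0:
        return math.inf  # with no loss this bound places no limit
    rtt_s = rtt_ms / 1000.0
    bits_per_second = (mss_bytes * 8) / (rtt_s * math.sqrt(loss_prob))
    return bits_per_second / 1e6


if __name__ == "__main__":
    for p in (0.0001, 0.001, 0.01):
        rate = mathis_limit_mbps(1460, 80, p)
        print(f"loss {p * 100:.2f}% -> {rate:.2f} Mbps")
```

With MSS 1460 and an 80 ms RTT, the bound is already far below a multi-gigabit line rate at 0.01% loss, which is why loss dominates on long paths.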

$$\text{Throughput} \le \min\!\bigl(\text{Bandwidth},\ \tfrac{W}{\text{RTT}},\ \tfrac{\text{MSS}}{\text{RTT}\sqrt{p}}\bigr)$$

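Effective throughput is the minimum of physical bandwidth, window-limited rate, and loss-limited rate. W is the TCP window size and MSS the segment size; both are given in bytes, so multiply them by 8 before comparing against a bandwidth in bits per second.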
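The three-way minimum can be sketched as follows. This covers only the inequality above and deliberately leaves out the simulator's OS-tuning and TCP-variant multipliers; the function and parameter names are hypothetical.

```python
import math


def effective_throughput_mbps(
    bandwidth_mbps: float,
    window_bytes: int,
    mss_bytes: int,
    rtt_ms: float,
    loss_prob: float,
) -> float:
    """Minimum of the link rate, the window-limited rate and the loss-limited rate."""
    rtt_s = rtt_ms / 1000.0
    # One window of data can be delivered per round trip.
    window_limit = window_bytes * 8 / rtt_s / 1e6
    # Simplified Mathis bound; treated as unlimited when there is no loss.
    if loss_prob > 0:
        loss_limit = mss_bytes * 8 / (rtt_s * math.sqrt(loss_prob)) / 1e6
    else:
        loss_limit = math.inf
    return min(bandwidth_mbps, window_limit, loss_limit)


if __name__ == "__main__":
    # 10 Gbps, 80 ms RTT, 4 MB window, MSS 1460, with and without 0.01% loss.
    print(effective_throughput_mbps(10_000, 4_000_000, 1460, 80, 0.0))
    print(effective_throughput_mbps(10_000, 4_000_000, 1460, 80, 0.0001))
```

Whichever term is smallest tells you what to fix first: a window-limited result points at buffers, a loss-limited one at the path.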

TCP BDP — Window Sizing for High-Speed Wide-Area Transfer

🙋

We just got a 10 Gbps leased line, but copying a file from Tokyo to San Jose runs at 100 Mbps. Is the link broken?

🎓

The link is fine. This is the classic BDP trap. Bandwidth-Delay Product (BDP) is bandwidth times RTT: the amount of data that must be in flight to keep the pipe full. TCP cannot send more than one window of data until ACKs come back, so a window smaller than the BDP caps throughput at window / RTT. Here 10 Gbps × 80 ms = 800 Mbit ≈ 100 MB. But the Linux default send buffer is only about 4 MB, so even ideally you cap out around 4% × 10 Gbps ≈ 400 Mbps. If you are only seeing 100 Mbps, something else is limiting you as well (probably packet loss), but raising the window is the first move.

🙋

Got it. So if I push the window slider all the way to 65535 KB, the utilization climbs to 64%. Great — am I done?

🎓

Not quite. Without window scaling, the 16-bit window field in the TCP header caps the window at 65,535 bytes (64 KB), so a 64 MB window only works with the window scale option (RFC 1323, since superseded by RFC 7323). Linux enables this by default (tcp_window_scaling=1), but you also have to raise net.core.rmem_max / wmem_max via sysctl. Values of 64–128 MB are typical, and only a window of at least 100 MB can fill a 100 MB BDP. The "OS tuning = Tuned" option in this simulator models that effect as a 1.3× boost, which is a rough first-order approximation, not a substitute for measurement.
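The window scale option carries a shift count of at most 14 that is applied to the 16-bit window field. The sketch below finds the smallest shift able to advertise a given window. The function name is illustrative, and it is only an arithmetic aid: real stacks typically choose the shift at connection setup from their buffer limits, not from a target like this.

```python
MAX_WINDOW_FIELD = 65535  # largest value of the 16-bit TCP window field
MAX_SHIFT = 14            # largest shift count allowed by the window scale option


def min_window_scale(target_window_bytes: int) -> int:
    """Smallest shift s such that 65535 << s covers the target window."""
    for shift in range(MAX_SHIFT + 1):
        if (MAX_WINDOW_FIELD << shift) >= target_window_bytes:
            return shift
    raise ValueError("target exceeds the largest window TCP can advertise")


if __name__ == "__main__":
    for target in (65535, 4_000_000, 100_000_000):
        print(f"{target} bytes needs shift {min_window_scale(target)}")
```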

🙋

Packet loss is scary. Going from 0.01% to 0.1% dropped the Mathis limit dramatically. Is 0.1% really that bad?

🎓

Yes. Mathis says throughput is inversely proportional to √p. Ten times more loss means roughly 1/√10 ≈ 1/3.16 the throughput. This is a major reason long-haul TCP feels slow even on well-provisioned circuits. On Wi-Fi with 1–5% loss and a long RTT, CUBIC may manage only a few Mbps. That is a large part of why Google built BBR: it does not treat loss as its primary congestion signal and instead estimates bottleneck bandwidth and RTT directly, so it tolerates lossy paths far better. This tool gives BBR a 1.5× multiplier as a rough model.

🙋

Which variant should I actually pick — CUBIC, Reno, BBR, or Vegas? The bar chart shows BBR usually winning.

🎓

Rule of thumb: BBR for wide-area / high-bandwidth, CUBIC for LANs, Vegas when latency is more sacred than throughput. BBR is fast but can starve coexisting CUBIC flows ("unfairness"), which is why BBRv2/v3 work continues. Reno survives in textbooks but is rarely deployed today. Vegas slows down when RTT swells, so it suits VoIP/gaming/HFT where jitter is the enemy. For raw throughput, BBR; for stability, CUBIC; for latency, Vegas.

🙋

One last thing — the "congestion window (packets)" stat. How is that different from the window size in KB?

🎓

Great question. TCP actually keeps two windows: the receive window (rwnd) advertised by the receiver, and the congestion window (cwnd) the sender estimates from network feedback. Effective in-flight bytes = min(rwnd, cwnd). The "cwnd in packets" stat is just window-size-in-bytes / MSS, i.e. how many segments can be on the wire simultaneously. A 64 KB window holds 65,535 B / 1460 B ≈ 44 full segments, but a 100 MB BDP needs roughly 68,500 segments in flight to saturate a 10 Gbps pipe. The gap is huge, and that is the BDP problem in one number.
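The bytes-to-segments conversion is a one-liner in each direction. A small sketch, assuming decimal units (100 MB = 10^8 bytes) and hypothetical function names:

```python
import math


def segments_in_window(window_bytes: int, mss_bytes: int = 1460) -> int:
    """Whole MSS-sized segments that fit in the window (rounded down)."""
    return window_bytes // mss_bytes


def segments_to_fill(bdp_bytes: int, mss_bytes: int = 1460) -> int:
    """Segments needed in flight to cover the BDP (rounded up)."""
    return math.ceil(bdp_bytes / mss_bytes)


if __name__ == "__main__":
    print(segments_in_window(65535))      # legacy 64 KB window
    print(segments_to_fill(100_000_000))  # 10 Gbps x 80 ms BDP
```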

Frequently Asked Questions

BDP is the product of link bandwidth and round-trip time (RTT). It represents the maximum amount of data that can be 'in flight' on a link at any moment. For example, a 10 Gbps link with 80 ms RTT has BDP = 10000 Mbps x 0.08 s = 800 Mbit = 100,000 KB = 100 MB (about 95.4 MiB). If the TCP send window is smaller than the BDP, the sender stalls waiting for ACKs and the link sits idle, capping throughput regardless of how fat the pipe is. On long fat networks, BDP-aware window sizing is the single most important tuning parameter.

The Mathis equation gives the classical TCP throughput upper bound as MSS x 8 / (RTT x sqrt(p)), where p is the packet loss probability. Throughput is inversely proportional to the square root of loss, so loss going from 0.01% to 0.1% (a 10x increase) cuts throughput by ~3.16x. This is why long-haul TCP rarely fills the pipe: even microscopic loss dominates. This simulator combines the Mathis limit with the BDP constraint, OS tuning factor, and TCP variant multiplier to estimate effective throughput.
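The 3.16x figure follows from the formula alone, because MSS and RTT cancel when two loss rates are compared on the same path. Writing T(p) for the Mathis throughput at loss probability p (a symbol introduced here only for this step), with p_1 = 0.01% and p_2 = 0.1%:

$$\frac{T(p_2)}{T(p_1)} = \frac{\text{MSS}/(\text{RTT}\sqrt{p_2})}{\text{MSS}/(\text{RTT}\sqrt{p_1})} = \sqrt{\frac{p_1}{p_2}} = \sqrt{\frac{10^{-4}}{10^{-3}}} = \frac{1}{\sqrt{10}} \approx 0.316$$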

Reno is the textbook AIMD algorithm (additive increase, multiplicative decrease) and is too loss-sensitive for long fat networks. CUBIC (Linux default) uses a cubic growth function for the congestion window, recovering faster from loss on high-BDP links. BBR (Google) bypasses loss as a signal and instead probes the bottleneck bandwidth and RTT directly, often delivering higher throughput than CUBIC on lossy networks (this tool models the gain as a flat 1.5x). Vegas detects congestion by RTT inflation rather than loss, making it suitable for ultra-low-latency use cases. This tool lets you compare all four side by side under the same conditions.

Key parameters are net.core.rmem_max / wmem_max (socket buffer ceilings), net.ipv4.tcp_rmem / tcp_wmem (autotuning range), net.ipv4.tcp_window_scaling (must be 1, the default), and net.ipv4.tcp_congestion_control (cubic or bbr). For a 10 Gbps x 80 ms link with BDP ~= 100 MB, the default 4 MB buffer is woefully inadequate; raise the maximum toward 128 MB so that it covers the BDP. BBR is commonly paired with the fq qdisc for pacing (net.core.default_qdisc=fq). Use this simulator to estimate the order-of-magnitude improvement before touching production sysctls.
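Before changing anything it helps to record the current values. The following read-only sketch assumes a Linux host, where each sysctl is exposed as a file under /proc/sys; the helper name `read_sysctl` and the `root` parameter (used here so the function can be pointed at another directory) are illustrative.

```python
from pathlib import Path
from typing import Optional

# The sysctls named in the paragraph above.
SYSCTLS = [
    "net.core.rmem_max",
    "net.core.wmem_max",
    "net.ipv4.tcp_rmem",
    "net.ipv4.tcp_wmem",
    "net.ipv4.tcp_window_scaling",
    "net.ipv4.tcp_congestion_control",
]


def read_sysctl(name: str, root: Path = Path("/proc/sys")) -> Optional[str]:
    """Return the current value of a sysctl, or None if it cannot be read."""
    path = root.joinpath(*name.split("."))
    try:
        return path.read_text().strip()
    except OSError:
        return None  # not Linux, missing key, or no permission


if __name__ == "__main__":
    for key in SYSCTLS:
        value = read_sysctl(key)
        print(f"{key} = {value if value is not None else 'unavailable'}")
```

The script only reads values, so it is safe to run on a production host; applying changes is left to your usual sysctl workflow.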

Real-world applications

Long-haul datacenter backup replication: On intercontinental links (Tokyo–Singapore RTT 70 ms, Tokyo–London 240 ms), a 10 Gbps circuit has a BDP of roughly 88 MB and 300 MB respectively. With a 4 MB default buffer, a single flow on the 240 ms path is limited to about 4 MB / 0.24 s ≈ 133 Mbps, even on a 1 Gbps circuit. Raising rmem_max/wmem_max to 2–3× BDP, using CUBIC or BBR rather than Reno, and adopting parallel-stream, UDP-based, or multipath transports (Aspera/FASP, UDT, Multipath TCP) can yield a 10–20× throughput improvement.

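Cloud uploads and CDN origin sync: Clients for S3, Cloud Storage, and Azure Blob commonly work around the per-connection window limit by opening many parallel TCP connections. Multipart upload tools typically run on the order of 8–16 parts concurrently because a single TCP session often cannot fill a high-BDP path. QUIC moves congestion and flow control into user space, where windows are easier to tune; HTTP/2 multiplexing, by contrast, shares one TCP connection, so it does not by itself lift the window limit.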
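The benefit of parallel connections can be estimated with the same window / RTT arithmetic. The sketch below is an idealized model under stated assumptions: every stream has the same window, there is no loss, and streams do not slow each other down until the link is full. Function names are hypothetical, and integer arithmetic is used so the rounding is exact.

```python
def per_stream_bps(window_bytes: int, rtt_ms: int) -> int:
    """Window-limited rate of one stream in bit/s (one window per round trip)."""
    return window_bytes * 8 * 1000 // rtt_ms


def aggregate_mbps(streams: int, window_bytes: int, rtt_ms: int, link_mbps: int) -> float:
    """Combined rate of identical streams, capped at the link rate."""
    link_bps = link_mbps * 1_000_000
    total_bps = min(link_bps, streams * per_stream_bps(window_bytes, rtt_ms))
    return total_bps / 1e6


def streams_needed(window_bytes: int, rtt_ms: int, link_mbps: int) -> int:
    """Smallest number of identical streams whose combined rate fills the link."""
    link_bps = link_mbps * 1_000_000
    rate = per_stream_bps(window_bytes, rtt_ms)
    return -(-link_bps // rate)  # ceiling division on integers


if __name__ == "__main__":
    # 4 MB window per stream, 80 ms RTT, 10 Gbps link.
    print(aggregate_mbps(8, 4_000_000, 80, 10_000))
    print(streams_needed(4_000_000, 80, 10_000))
```

Under these assumptions eight streams with 4 MB windows reach about a third of a 10 Gbps link at 80 ms, which illustrates why parallelism helps but does not replace proper buffer sizing.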

Satellite / wireless links (Starlink, LEO, 5G): Geostationary satellites add 500–800 ms RTT; even LEO Starlink runs at 30–50 ms. CUBIC struggles on these paths because many losses are not caused by congestion, yet it still backs off on each one. BBR is a popular choice here for exactly this reason: it preserves throughput despite jitter and loss. Shipboard, airborne, and disaster-recovery networks increasingly standardize on BBR + large buffers.

Financial trading / low-latency HPC: Conversely, exchange order entry and HPC interconnects optimize for RTT, not throughput. Here you want Vegas (delay-based) or DCTCP (ECN-based), both of which back off before queues build, with deliberately small buffers. Low-latency NICs from vendors such as Cisco and Mellanox combine TCP offload with kernel bypass (DPDK, Solarflare OpenOnload) to push latency into microseconds.

Common misconceptions and pitfalls

The biggest pitfall is assuming "bigger window = always faster." A window smaller than BDP is indeed a bottleneck, but going much beyond BDP creates "bufferbloat": the excess data sits in router queues, RTT inflates 2–10×, and interactive workloads (VoIP, gaming, ssh) suffer. Active Queue Management (CoDel, FQ-CoDel) is the proper countermeasure, often combined with sender-side pacing (the fq qdisc). BBR's reliance on measured bandwidth and RTT rather than loss as the congestion signal is partly motivated by exactly this issue.
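The size of the RTT inflation can be estimated by treating every byte beyond the BDP as sitting in the bottleneck queue. This is a simplified single-flow model, assuming the queue is deep enough to hold the excess and no AQM is dropping packets; the function name is hypothetical.

```python
def queueing_delay_ms(window_bytes: float, bandwidth_mbps: float, base_rtt_ms: float) -> float:
    """Extra delay from window bytes that exceed the BDP and wait in the queue."""
    bandwidth_bps = bandwidth_mbps * 1e6
    bdp_bytes = bandwidth_bps / 8 * base_rtt_ms / 1000
    excess_bytes = max(0.0, window_bytes - bdp_bytes)
    # Time needed for the bottleneck link to drain the queued excess.
    return excess_bytes * 8 / bandwidth_bps * 1000


if __name__ == "__main__":
    # 100 Mbps bottleneck, 20 ms base RTT (BDP = 250 KB), 1 MB window.
    extra = queueing_delay_ms(1_000_000, 100, 20)
    print(f"queueing delay about {extra:.0f} ms, total RTT about {20 + extra:.0f} ms")
```

In this example a window four times the BDP turns a 20 ms path into roughly an 80 ms one without adding any throughput.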

Second pitfall: treating the Mathis bound as gospel. Mathis derives from Reno; CUBIC and especially BBR routinely exceed it. CUBIC tolerates 2–3× the loss before throughput collapses; BBR ignores loss as a signal entirely. The variant multiplier in this simulator is a first-order correction — for production decisions, always confirm with iperf3 (-P 8) and TCP_INFO measurements on the actual path.

Third pitfall: tuning only one side. TCP throughput is gated by min(sender wmem, receiver rmem). If you raise rmem_max on your EC2 instance but the on-prem peer still has a tiny wmem, you get the smaller of the two, which is usually well below BDP. Both endpoints must be tuned, or rely on autotuning (tcp_rmem/tcp_wmem) on both sides with maxima ≥ 2× BDP. Finally, jumbo frames (MTU 9000, MSS about 8960) only help inside Layer-2 segments that fully support them; broken Path MTU Discovery (PMTUD) across an MTU mismatch silently drops large packets and kills throughput.
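Whether a buffer request actually took effect can be checked per socket with the standard SO_SNDBUF and SO_RCVBUF options. The sketch below requests a size and reads back what the kernel granted; `request_buffers` is an illustrative name. On Linux the request is capped by wmem_max / rmem_max and the value read back is double the stored request, and setting SO_RCVBUF explicitly turns off receive autotuning for that socket, so treat this as a diagnostic rather than a recommended default.

```python
import socket
from typing import Tuple


def request_buffers(sock: socket.socket, size_bytes: int) -> Tuple[int, int]:
    """Ask for send/receive buffers of size_bytes and return what was granted."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size_bytes)
    granted_snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    granted_rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return granted_snd, granted_rcv


if __name__ == "__main__":
    wanted = 1 << 20  # 1 MiB, small enough to be accepted on most systems
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        snd, rcv = request_buffers(s, wanted)
    print(f"requested {wanted}, granted send={snd} recv={rcv}")
```

Running the same check on both endpoints shows which side is the smaller of the two and therefore the one limiting the window.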
