
Bufferbloat report #3

Open · dtaht opened this issue May 3, 2021 · 4 comments
Labels: enhancement (New feature or request)

dtaht commented May 3, 2021

My early tests of the beta show enormous amounts of unneeded bufferbloat on the starlink uplink, downlink, and the wifi. To me, this is an easily fixable starlink problem, assuming they are using linux. Add sch_cake on the outbound, preferably with backpressure from "BQL" (https://blog.linuxplumbersconf.org/2011/ocw/sessions/171) or using cake's built-in shaper (https://lwn.net/Articles/758353/); add fq_codel (https://tools.ietf.org/html/rfc8290) or similar SQM at the head-end; and add fq_codel for the wifi (https://www.usenix.org/conference/atc17/technical-sessions/presentation/hoilan-jorgesen).

All these have standard APIs in the linux kernel, and would take, like, a week to implement on the dishy for someone with clue. Well, the bloat on the wifi side is harder to fix (only 4 chipsets support this so far), but the wifi AQL and fq_codel APIs have long been in linux (https://lwn.net/Articles/705884/).
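
For concreteness, a minimal sketch of that setup via the standard tc API (the interface name and rate here are placeholders, not starlink's actual values):

    # cake with its built-in shaper, set a bit below the measured link rate:
    tc qdisc replace dev eth0 root cake bandwidth 15Mbit

    # or, where the driver provides BQL backpressure, cake can run unshaped:
    tc qdisc replace dev eth0 root cake unlimited

    # plain fq_codel as a simpler alternative, e.g. at the head-end:
    tc qdisc replace dev eth0 root fq_codel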

The alternative... for consistently low latency under normal conditions is... sigh... for an end user to closely monitor the connection with a tool like yours and adjust their local openwrt router's "SQM" implementation dynamically to suit (a rough sketch of that loop follows below), with:

 ssh myrouter tc qdisc replace dev eth0 root cake bandwidth whateveritisnow... 

or get your measurement tool to run directly on openwrt.
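
A rough sketch of that adjustment loop (the measure_rate_kbit helper is hypothetical; in practice the rate would come from latency-under-load measurements made by a tool like yours):

    #!/bin/sh
    # Hypothetical loop: re-measure uplink capacity periodically and
    # retune cake's shaper to ~90% of it.
    while true; do
        rate=$(measure_rate_kbit)    # hypothetical helper, in kbit/s
        shaped=$((rate * 90 / 100))  # shape slightly below measured capacity
        ssh myrouter tc qdisc replace dev eth0 root cake bandwidth "${shaped}kbit"
        sleep 60
    done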

So, to make starlink bufferbloat more visible to users, I am curious if you would be interested in adding a far, far more robust test than speedtest to your suite? flent's rrul test is pretty good, and the tcp_nup and tcp_ndown tests are pretty useful.
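
For example, a baseline rrul run looks something like this (the server name is a placeholder; the flags are standard flent options):

    flent rrul -p all_scaled -l 60 -H flent-server.example.net -t "starlink baseline" -o rrul.png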

I've established a network of flent.org servers around the world just for starlink and a mailing list (starlink@lists.bufferbloat.net) to discuss this and other ongoing measurements (and one of the participants steered me to your github).

dtaht added the enhancement label on May 3, 2021

dtaht commented May 3, 2021

This is the typical behavior of the uplink with 400ms of bufferbloat.

[plot: tcp_nup - starlink, no SQM]

It's mind-boggling they still use a FIFO after 8 years of fq_codel in wide deployment as the default in linux, osx, ios, openwrt, etc... Sigh.

Anyway, with SQM in play, even without active monitoring to keep things under control, we can do much better.

[plot: tcp_nup - starlink, no SQM vs. cake, compared]

Give flent a try on your connection?


dtaht commented May 3, 2021

flent (from a linux box) can also measure uplink RTTs in the TCP stream itself, without needing a packet capture:

flent -H fremont.starlink.taht.net -t whateverthetestparamsare --socket-stats -x --step-size=.05 --te=upload_streams=16 tcp_nup

[plot: tcp RTTs compared]

At t+28 is where cake, without measurement & adjustment from a tool like yours, lost control of the queue.

And here is a plot of a suitable type (svgs are best) extracted from that same command line (-o plotname.svg), or via flent-gui...

There is a whole set of very interesting plots that can be extracted (I like CDFs, but given starlink going up and down, it pays to have a look at the more detailed plots like the first ones above).
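
Plots can also be regenerated later from the saved data file; for example (the data filename is a placeholder; flent --list-plots <testname> shows what is available per test):

    flent -i starlink-test.flent.gz -p ping_cdf -o ping_cdf.svg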

It's also possible to see interesting behavior by running the flent test for many minutes, but the resulting bufferbloat tends to make the link unusable (so in my case I'm running tests from 1AM to 6AM).

I tend to think the bloat on starlink is one of the sources of the many complaints about slowness, jitter, calls dropping, gaming performance suffering, etc. But more data, more widely collected, would be nice. You can clearly see some other details with a longer test (-l 1200):

[plot: tcp trends - starlink, no SQM]

Falling out of this data is what appears to be a bandwidth adjustment at the head-end with a roughly 15-second period, which changes the rate... without changing the buffer size to suit, or FQ'ing the results at all.

virtuallynathan commented
Thanks for this, Dave! Glad to see you are still working on this! I'll be testing this once I get my dish, for sure.


dtaht commented Oct 10, 2021

Did you ever get your dishy?

I had a fun encounter with starlink folk, described here: https://www.youtube.com/watch?v=c9gLo6Xrwgw

Still waiting for progress.
