7. Bandwidth authority and measurements

This is a slightly updated version of [BandwidthAuthorityMeasurements].

7.1. What does Bandwidth Measurement Optimise?

The goal of the bandwidth authorities is to balance load across the network such that a user can expect to have the same average stream capacity regardless of path. Any deviation from this ideal load balancing can be regarded as error.

https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/README.spec.txt#n353 (This spec has not been updated as the code has changed.)

So the Tor network should give each relay a share of traffic in proportion to its capacity. But this does not always happen in practice, due to geographical effects: https://atlas.torproject.org/#map (consensus weight vs advertised bandwidth).
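One way to look at this gap is to compare each relay's share of the total consensus weight with its share of the total advertised bandwidth. The following is a minimal sketch under that assumption; the function name, the relay names, and the numbers are hypothetical and are not taken from any Tor tool.

```python
def weight_to_bandwidth_ratio(relays):
    """Compare each relay's consensus weight share with its bandwidth share.

    relays maps a relay name to (consensus_weight, advertised_bandwidth).
    A ratio of 1.0 means the relay's weight is proportional to its
    advertised bandwidth; below 1.0 it is under-weighted, above 1.0 it is
    over-weighted.
    """
    total_weight = sum(weight for weight, _ in relays.values())
    total_bandwidth = sum(bandwidth for _, bandwidth in relays.values())
    return {
        name: (weight / total_weight) / (bandwidth / total_bandwidth)
        for name, (weight, bandwidth) in relays.items()
    }


# Hypothetical relays with the same advertised bandwidth but different weights.
example = {"relayA": (100, 1000), "relayB": (300, 1000)}
print(weight_to_bandwidth_ratio(example))  # {'relayA': 0.5, 'relayB': 1.5}
```

Grouping these ratios by relay location would be one possible input for the graph mentioned in the TODO.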

Todo

Create a graph of consensus weight vs advertised bandwidth?

TODO: work out how to measure how successful we are

At [PerformanceWork] we have metrics for latency, throughput, capacity, and reliability.

7.1.1. What other things might we want bandwidth measurement to optimise?

Using the Tor network needs to be a good experience for clients. This means:

  • maximising throughput, and

  • minimising latency.

Different clients have different needs: an SSH or IRC client wants low latency, but a big download wants throughput. Tor 0.2.9.8 and later try to prioritise interactive sessions over bulk transfers.

7.2. What do Bandwidth Authorities Measure?

Tor bandwidth scanners measure download speed by downloading or uploading files. The scanner is a Tor client that connects to a bandwidth server using a two-hop path. The path has one Guard/Middle node, and one Exit. These relays are selected from a group of relays with similar bandwidths. The scanner uses larger files for relays with larger bandwidths.

Scanners measure the time it takes to download or upload the file. If we want to measure what clients see, we have to measure both latency and throughput.
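The difference between the raw transfer rate and what a client experiences can be sketched as follows. This is an illustration only: the function names and the example numbers are hypothetical, and the model simply adds a fixed setup delay to the transfer time.

```python
def transfer_rate(num_bytes, transfer_seconds):
    """Average rate of the transfer itself, in bytes per second."""
    if transfer_seconds <= 0:
        raise ValueError("transfer_seconds must be positive")
    return num_bytes / transfer_seconds


def client_visible_rate(num_bytes, setup_seconds, transfer_seconds):
    """Rate seen by a client that also waits for the circuit and connection setup."""
    return transfer_rate(num_bytes, setup_seconds + transfer_seconds)


# Hypothetical measurement: 1 MiB transferred in 2 seconds after 2 seconds of setup.
size = 1024 * 1024
print(transfer_rate(size, 2.0))             # 524288.0
print(client_visible_rate(size, 2.0, 2.0))  # 262144.0
```

In this model the setup delay matters most for small files, which is one reason a scanner that only times the transfer does not capture everything a client sees.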

The throughput of the circuit depends on the spare throughput of both relays in the circuit. The relays may also limit the per-circuit throughput. (TODO: what is this limit?)

Circuit latency adds to the connection setup time. It also takes longer for the client to ask for more data. The time it takes to build the circuit depends on the latency between the scanner, the entry, the exit, and the bandwidth server. Busy relays can be slow sending data to the network, or drop packets. Dropped packets need to be re-sent from the end of the circuit. Even if there is no congestion, Tor clients still need to ask for more data. (TODO: what is the limit on SENDME cells?)

7.3. Overall Network Improvements

An extra bandwidth authority makes all the bandwidth measurements for the tor network more stable.

The relay measurement is the median of all the different authority measurements for that relay. If there are N bandwidth authorities, each authority supplies the median for about 1/N of the relays. A bandwidth authority also reduces the variance for about 3/N of the relays.
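The median rule can be illustrated with a short sketch. The authority names and values below are hypothetical. The use of the low median (so that the result is always one authority's actual value) is an assumption made for this example, not a statement about the directory authority code.

```python
from statistics import median_low


def relay_measurement(measurements):
    """Return (median value, name of an authority that supplied it).

    measurements maps an authority name to its measurement for one relay.
    """
    value = median_low(measurements.values())
    # If several authorities report the median value, pick the first by name.
    authority = min(name for name, v in measurements.items() if v == value)
    return value, authority


# Hypothetical measurements of one relay by five bandwidth authorities.
one_relay = {
    "bwauthA": 900,
    "bwauthB": 1000,
    "bwauthC": 1200,
    "bwauthD": 5000,
    "bwauthE": 950,
}
print(relay_measurement(one_relay))  # (1000, 'bwauthB')
```

Note that the outlier from bwauthD has no effect on the result. Counting how often each authority is returned across all relays gives the kind of per-authority share described above.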

A graph of the relays that are the median for each bandwidth authority is in [issues21882], [GraphsHistorical], [bwauthstatus].

7.4. Should I put a bandwidth server on IPv6?

We think so. Tor Browser sets PreferIPv6 on its SOCKSPorts, so Exits connect via IPv6 if it is available. We want to do this for bandwidth scanners, so they measure what clients see (#21995).

Here’s how to configure a dual-stack bandwidth authority:

  1. Set up a bandwidth scanner and one (or more) bandwidth servers

  2. Add the following bandwidth server addresses to the bandwidth scanner:

  • an IPv4-only address (either DNS or an IPv4 anycast literal)

  • a dual-stack DNS address

That way, IPv4-only exits get measured 100% via IPv4, and dual-stack exits get measured 50% via IPv4 and 50% via IPv6.

We don’t want to add IPv6 literals or IPv6-only addresses, because IPv4-only Exits will fail on those addresses.
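The expected shares can be checked with a small sketch. It assumes that the scanner picks each configured bandwidth server equally often and that a dual-stack Exit uses IPv6 whenever the server offers it. The host names and function names are hypothetical.

```python
from collections import Counter

# Hypothetical bandwidth server entries and the address families they offer.
SERVERS = {
    "ipv4-only.example": {"ipv4"},
    "dual-stack.example": {"ipv4", "ipv6"},
}


def family_used(exit_families, server_families):
    """Address family an Exit uses for one server, or None if it cannot connect."""
    usable = exit_families & server_families
    if not usable:
        return None
    # Assumption: the Exit prefers IPv6 when both families are usable.
    return "ipv6" if "ipv6" in usable else "ipv4"


def expected_shares(exit_families, servers):
    """Share of measurements per address family, with servers chosen uniformly."""
    counts = Counter(family_used(exit_families, f) for f in servers.values())
    return {family: n / len(servers) for family, n in counts.items()}


print(expected_shares({"ipv4"}, SERVERS))          # {'ipv4': 1.0}
print(expected_shares({"ipv4", "ipv6"}, SERVERS))  # {'ipv4': 0.5, 'ipv6': 0.5}
```

Adding an entry that offers only `{"ipv6"}` makes `family_used` return `None` for an IPv4-only Exit, which corresponds to the failed measurements described above.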

Since we started using the Simple Bandwidth Scanner (sbws), the DNS resolver at the Exit is the one that returns either an IPv4 or an IPv6 address. The Tor DNS resolver only resolves IPv4 addresses (issue number?, still true?)

7.5. Does Bandwidth Authority Location Matter?

Yes! Tor sends more Guard and Middle bandwidth to relays close to the bandwidth scanner, and more Exit bandwidth to relays close to the bandwidth server.

7.5.1. Current Bandwidth Authority Locations

All of the current bandwidth scanners are in North America or Europe, and most of the bandwidth servers are in North America or Europe, with one in Asia.

We’re working to change this, by putting bandwidth servers (and scanners) on other continents.

(TODO: add a table with actual locations, if the operators are ok with that)

From [bandwidth_authorities_timeline], there are currently:

  • 3 bandwidth authorities in North America

  • 2 bandwidth authorities in Europe

  • 2 Web servers in North America, used by 4 of the bandwidth authorities

  • 1 Web server in Europe

7.5.2. How does Location Impact Tor Clients?

The current bandwidth authority locations mean that relays in North America and Europe handle more traffic:

  • The Tor network is faster for all clients. Clients are more likely to choose a path with relays that are near each other. This affects hidden services the most, because they have 6-hop paths.

  • But Exits in North America and Europe can get overloaded, and slow down the network for all clients. This does not happen as much for Guards, because there are more Guards than Exits. (TODO: measure Exit congestion <- we do, where?)

  • Tor clients in North America and Europe are even faster, because their Guard is closer (on average).

  • Websites with servers in North America or Europe are faster through Tor Exits (on average).

  • Websites that use a CDN are faster if the CDN DNS sends the connection to a nearby data center, and if the CDN has many servers in North America or Europe.

7.5.3. Bandwidth Authorities Outside North America / Europe

Adding or moving bandwidth authorities changes relay measurements: relays closer to the new location will be measured higher. A bandwidth scanner affects Guard and Middle measurements. It also affects Exit measurements when the Exit can’t be measured as an Exit (due to its exit policy). A bandwidth server affects Exit measurements.

7.5.3.1. A Quick Example: CDN

Using a CDN as a bandwidth server spreads the load from exits around the world. It avoids concentrating exit bandwidth near existing bandwidth servers in the US and Western Europe. This should also move entry and middle bandwidth further away from the scanners in these areas.

However, it can result in very fast measurements if the scanner, both relays, and server are all in the same CDN data center.

Here’s how we can avoid this issue:

  • make sure that the scanner isn’t in one of the CDN data centers,

  • use different CDNs for different scanners,

  • configure the bandwidth scanner with a CDN, and with other bandwidth servers that aren’t in one of the CDN data centers.

7.5.3.2. A Detailed Example: South America

What happens if we add a bandwidth server in South America to a bandwidth authority?

A bandwidth scanner can get files from more than one bandwidth server. This is more reliable and provides better measurements. But most scanners don’t do it yet. Let’s assume this one does.

Adding a bandwidth server in South America will shift Exit bandwidth away from Europe, and maybe North America.

Websites in South America will become faster through Tor. (Tor is mainly used for web traffic.)

The average distance between middles and exits will increase, so Tor will become slightly slower. But there will be less load on European Exits, which will make them faster for all clients. (TODO: work out which effect wins?)

Guards and Middles that are close to Exits near South America get more bandwidth. But it matters much more if Guards and Middles are close to the scanner.

People will put more Exits (and relays) in South America, because they measure better.

There will be a small change to a small number of relays. We use the median measurement, so changing 1 of 5 bandwidth authorities will not change many relays. And changing 1 of 2 bandwidth servers on 1 authority makes the size of the impact small. We would need to change scanners and servers on 3 of 5 bandwidth authorities to change a lot of relay measurements.
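The robustness of the median can be seen in a short sketch. The five values are hypothetical measurements of one relay, and the helper names are made up for this example.

```python
from statistics import median


def bounds_after_one_change(values):
    """Range the median can reach if one value changes arbitrarily.

    Assumes an odd number of values, with at least three.
    """
    ordered = sorted(values)
    mid = len(ordered) // 2
    return ordered[mid - 1], ordered[mid + 1]


def with_changes(values, changes):
    """Copy of values with the {index: new_value} replacements applied."""
    return [changes.get(i, v) for i, v in enumerate(values)]


before = [900, 950, 1000, 1200, 5000]  # hypothetical, one value per authority
huge = 100000

print(median(before))                                             # 1000
print(bounds_after_one_change(before))                            # (950, 1200)
print(median(with_changes(before, {0: huge})))                    # 1200
print(median(with_changes(before, {0: huge, 1: huge, 2: huge})))  # 100000
```

With one authority changed, the median can only move to a neighbouring measurement. Once three of the five change, the median follows the new values.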