5. Torflow
measurements aggregation and scaling¶
This documentation is a more extended version than Torflow aggregation and scaling or [BandwidthFileSpec].
5.1. Scaling without PID¶
From Torflow’s README.spec.txt (section 2.2)
In this way, the resulting network status consensus bandwidth values
are effectively re-weighted proportional to how much faster the node
was as compared to the rest of the network.
The variables and steps used in Torflow:
strm_bw
The strm_bw field is the average (mean) of all the streams for the relay
identified by the fingerprint field.
bw_i = \mu(bw_j) = \frac{\sum_{j=1}^{n}bw_j}{n}
filt_bw
The filt_bw field is computed similarly, but only the streams equal to
or greater than the strm_bw are counted in order to filter very slow
streams due to slow node pairings.
bwfilt_j &= max(\mu(bw_j), bw_j) = max\left(\frac{\sum_{j=1}^{n}bw_j}{n}, bw_j\right) bwfilt_i &= max(\mu(bw_j), bw_j) = \frac{\sum_{j=1}^{n}max(\mu, bw_j)}{n_{bwfiltj}} = \frac{\sum_{j=1}^{n}max\left(\frac{\sum_{j=1}^{n}bw_j}{n}, bw_j\right)}{n_{bwfiltj}}
filt_sbw and strm_sbw
for rs in RouterStats.query.filter(stats_clause).\
options(eagerload_all('router.streams.circuit.routers')).all():
tot_sbw = 0
sbw_cnt = 0
for s in rs.router.streams:
if isinstance(s, ClosedStream):
skip = False
#for br in badrouters:
# if br != rs:
# if br.router in s.circuit.routers:
# skip = True
if not skip:
# Throw out outliers < mean
# (too much variance for stddev to filter much)
if rs.strm_closed == 1 or s.bandwidth() >= rs.sbw:
tot_sbw += s.bandwidth()
sbw_cnt += 1
if sbw_cnt: rs.filt_sbw = tot_sbw/sbw_cnt
else: rs.filt_sbw = None
filt_avg, and strm_avg
Once we have determined the most recent measurements for each node, we
compute an average of the filt_bw fields over all nodes we have measured.
filt_avg = sum(map(lambda n: n.filt_bw, nodes.itervalues()))/float(len(nodes))
strm_avg = sum(map(lambda n: n.strm_bw, nodes.itervalues()))/float(len(nodes))
bwfilt &= \mu(bwfilt_i) = \frac{\sum_{i=1}^{n}bwfilt_i}{n} = \frac{\sum_{i=1}^{n}\frac{\sum_{j=1}^{n}max(\mu, bw_j)}{n_{bwfiltj}}}{n} = \frac{\sum_{i=1}^{n}\frac{\sum_{j=1}^{n}max\left(\frac{\sum_{j=1}^{n}bw_j}{n}, bw_j\right)}{n_{bwfiltj}}}{n}
bwstrm &= \mu(bw_i) = \frac{\sum_{i=1}^{n}\mu_i}{n}
true_filt_avg and true_strm_avg
for cl in ["Guard+Exit", "Guard", "Exit", "Middle"]:
true_filt_avg[cl] = filt_avg
true_strm_avg[cl] = strm_avg
In the non-pid case, all types of nodes get the same avg
n.fbw_ratio and n.fsw_ratio
for n in nodes.itervalues():
n.fbw_ratio = n.filt_bw/true_filt_avg[n.node_class()]
n.sbw_ratio = n.strm_bw/true_strm_avg[n.node_class()]
rf_i = \frac{bwfilt_i}{bwfilt} rs_i = \frac{bw_i}{bwstrm}
n.ratio
These averages are used to produce ratios for each node by dividing the
measured value for that node by the network average.
# Choose the larger between sbw and fbw
if n.sbw_ratio > n.fbw_ratio:
n.ratio = n.sbw_ratio
else:
n.ratio = n.fbw_ratio
n.pid_error = 0
n.pid_error_sum = 0
n.new_bw = n.desc_bw*n.ratio
n.pid_bw = n.new_bw # for transition between pid/no-pid
n.change = n.new_bw - n.desc_bw
if n.idhex in prev_consensus:
if prev_consensus[n.idhex].bandwidth != None:
prev_consensus[n.idhex].measured = True
tot_net_bw += n.new_bw
if IGNORE_GUARDS \
and ("Guard" in prev_consensus[n.idhex].flags and not "Exit" in \
prev_consensus[n.idhex].flags):
plog("INFO", "Skipping voting for guard "+n.nick)
n.ignore = True
elif "Authority" in prev_consensus[n.idhex].flags:
r_i = max(rf_i, rs_i) = max\left(\frac{bwfilt_i}{bwfilt}, \frac{bw_i}{bwstrm}\right) = max\left(\frac{bwfilt_i}{\mu(bwfilt_i)}, \frac{bw_i}{\mu(bwfilt_i)}\right)
desc_bw
It is the observed bandwidth
in the descriptor, NOT the average
bandwidth
return Router(ns.idhex, ns.nickname, bw_observed, dead, exitpolicy,
ns.flags, ip, version, os, uptime, published, contact, rate_limited,
ns.orhash, ns.bandwidth, extra_info_digest, ns.unmeasured)
self.desc_bw = max(bw,1) # Avoid div by 0
new_bw
These ratios are then multiplied by the most recent observed descriptor
bandwidth we have available for each node, to produce a new value for
the network status consensus process.
n.new_bw = n.desc_bw*n.ratio
The descriptor observed bandwidth is multiplied by the ratio.
bwnew_i = bwobs_i \times r_i = bwobs_i \times max(rf_i, rs_i) = bwobs_i \times max\left(\frac{bwfilt_i}{bwfilt}, \frac{bw_i}{bwstrm}\right) = bwobs_i \times max\left(\frac{bwfilt_i}{\mu(bwfilt_i)}, \frac{bw_i}{\mu(bw_i)}\right)
Limit the bandwidth to a maximum
NODE_CAP = 0.05
if n.new_bw > tot_net_bw*NODE_CAP:
plog("INFO", "Clipping extremely fast "+n.node_class()+" node "+n.idhex+"="+n.nick+
" at "+str(100*NODE_CAP)+"% of network capacity ("+
str(n.new_bw)+"->"+str(int(tot_net_bw*NODE_CAP))+") "+
" pid_error="+str(n.pid_error)+
" pid_error_sum="+str(n.pid_error_sum))
n.new_bw = int(tot_net_bw*NODE_CAP)
However, tot_net_bw does not seems to be updated when not using pid. This clipping would make faster relays to all have the same value.
bwn_i =& min\left(bwnew_i, \sum_{i=1}^{n}bwnew_i \times 0.05\right) \ &= min\left( \left(bwobs_i \times r_i\right), \sum_{i=1}^{n}\left(bwobs_i \times r_i\right) \times 0.05\right)\ &= min\left( \left(bwobs_i \times max\left(rf_i, rs_i\right)\right), \sum_{i=1}^{n}\left(bwobs_i \times max\left(rf_i, rs_i\right)\right) \times 0.05\right)\ &= min\left( \left(bwobs_i \times max\left(\frac{bwfilt_i}{bwfilt}, \frac{bw_i}{bwstrm}\right)\right), \sum_{i=1}^{n}\left(bwobs_i \times max\left(\frac{bwfilt_i}{bwfilt}, \frac{bw_i}{bwstrm}\right)\right) \times 0.05\right)
5.2. Torflow PID Control feedback¶
The bandwidth authorities measure F_node: the filtered stream
capacity through a given node (filtering is described in Section 1.6).
(=`filt_bw`?)
The PID Setpoint, or target for each node is F_avg: the average F_node
value observed across the entire network.
(=`filt_avg`?)
do we really want SP to be F_avg?
The normalized PID error e(t) for each node is then:
pid_error = e(t) = (F_node - F_avg)/F_avg.
bwfilt = filt_bw
e(t) = \frac{bwfilt_i}{\mu_{bwfilt}} - 1
of the entire population or the sample of this measurement?
In the "Process" step, we take the output of the "controller" and multiply it by
the current consensus bandwidth for the node, and then add this new
proportion to the consensus bandwidth, thereby adjusting the
consensus bandwidth in proportion to the error:
new_consensus_bw = old_consensus_bw +
old_consensus_bw * K_p * e(t) +
old_consensus_bw * K_i * \integral{e(t)} +
old_consensus_bw * K_d * \derivative{e(t)}
new_consensus_bw = bwwn
old_consensus_bw = cbw_o
bwwn = bwwo + bwwo K_p e(t) + bwwo K_i \int _{0}^{t}{e(t)} + bwwo K_d \frac {d}{dt}{e(t)}
why the 1st term?
For the case where K_p = 1, K_i=0, and K_d=0, it can be seen that this
new_bw = old_bw*ratio
what is new_bw and old_bw here?
K_p = 1, K_i=K_d=0: \\ bwn =& bwo + bwo e(t) \\ bwn =& bwo + bwo \left(\frac{bwfilt_i}{\mu_{bwfilt}} - 1\right) \\ bwn =& bwo \frac{bwfilt_i}{\mu_{bwfilt}}
compute an average of the filt_bw fields over all nodes we have measured.
These averages are used to produce ratios for each node by dividing the
measured value for that node by the network average.
These ratios are then multiplied by the most recent observed descriptor
bandwidth we have available for each node, to produce a new value for
the network status consensus process.
bw_i = strm_bw
In the scaling without PID controller
bwn_i =& max\left( \frac{bw_i}{\mu}, \frac{bwfilt_i}{\mu_{bwfilt}} \right) \times bwobs_i
So the called ratio is the same, but not what is considered the new bw.
For purposes of calculating the integral and the derivative of the
error, we assume units of time that correspond to feedback intervals,
eliminating the need to track time for any other purpose other than
determining when to report a measurement.
The integral component, pid_error_sum is subjected to a decay factor
per each interval, to prevent unbounded growth in cases without
convergence.
The differential component is a simple delta, calculating by
subtracting the previous pid_error from the current pid_error.
For the integral component: how the decay factor works?
K_i \int _{0}^{t}{e(t)} = ?
For the differential component:
K_d \frac {d}{dt}{e(t)} = e(t_{i-1}) - e(t_{i})