.. onionperf documentation master file, created by
   sphinx-quickstart on Fri Mar 3 18:35:00 2023.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

OnionPerf
=========

* :ref:`search`

- `Overview <#overview>`__

  - `What does OnionPerf do? <#what-does-onionperf-do->`__
  - `What does OnionPerf not do? <#what-does-onionperf--not--do->`__

- `Installation <#installation>`__

  - `Tor <#tor>`__
  - `TGen <#tgen>`__
  - `OnionPerf <#onionperf-1>`__

- `Measurement <#measurement>`__

  - `Starting and stopping measurements <#starting-and-stopping-measurements>`__
  - `Output directories and files <#output-directories-and-files>`__
  - `Changing Tor configurations <#changing-tor-configurations>`__
  - `Changing the TGen traffic model <#changing-the-tgen-traffic-model>`__
  - `Sharing measurement results <#sharing-measurement-results>`__
  - `Troubleshooting <#troubleshooting>`__

- `Analysis <#analysis>`__

  - `Analyzing measurement results <#analyzing-measurement-results>`__
  - `Filtering measurement results <#filtering-measurement-results>`__
  - `Visualizing measurement results <#visualizing-measurement-results>`__
  - `Interpreting the PDF output format <#interpreting-the-pdf-output-format>`__
  - `Interpreting the CSV output format <#interpreting-the-csv-output-format>`__
  - `Visualizations on Tor Metrics <#visualizations-on-tor-metrics>`__

- `Contributing <#contributing>`__

Overview
--------

What does OnionPerf do?
~~~~~~~~~~~~~~~~~~~~~~~

OnionPerf measures performance of bulk file downloads over Tor. Together with
its predecessor, Torperf, OnionPerf has been used to measure long-term
performance trends in the Tor network since 2009. It is also being used to
perform short-term performance experiments to compare different Tor
configurations or implementations.

OnionPerf uses multiple processes and threads to download random data through
Tor while tracking the performance of those downloads.
The data is served and fetched on localhost using two TGen (traffic generator)
processes, and is transferred through Tor using Tor client processes and an
ephemeral Tor onion service. Tor control information and TGen performance
statistics are logged to disk and analyzed once per day to produce a JSON
analysis file that can later be used to visualize changes in Tor client
performance over time.

What does OnionPerf *not* do?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

OnionPerf does not attempt to simulate complex traffic patterns like a
web-browsing user or a voice-chatting user. It measures a very specific user
model: a bulk 5 MiB file download over Tor.

OnionPerf does not interfere with how Tor selects paths and builds circuits,
other than setting configuration values as specified by the user. As a result
it cannot be used to measure specific relays nor to scan the entire Tor
network.

Installation
------------

OnionPerf has several dependencies that it needs in order to perform
measurements or to analyze and visualize measurement results. These
dependencies include Tor, TGen (traffic generator), and a few Python packages.

The following description was written with a Debian system in mind but should
be transferable to other Linux distributions and possibly even other operating
systems.

Tor
~~~

OnionPerf relies on the ``tor`` binary to start a Tor process on the client
side to make client requests and another Tor process on the server side to
host onion services.

The easiest way to satisfy this dependency is to install the ``tor`` package,
which puts the ``tor`` binary into the ``PATH`` where OnionPerf will find it.
Optionally, systemd can be instructed to make sure that ``tor`` is never
started as a service:

.. code:: shell

   sudo apt install tor
   sudo systemctl stop tor.service
   sudo systemctl mask tor.service

Alternatively, Tor can be built from source:

.. code:: shell

   sudo apt install automake build-essential libevent-dev libssl-dev zlib1g-dev
   cd ~/
   git clone https://git.torproject.org/tor.git
   cd tor/
   ./autogen.sh
   ./configure --disable-asciidoc
   make

In this case the resulting ``tor`` binary can be found in
``~/tor/src/app/tor`` and needs to be passed to OnionPerf’s ``--tor``
parameter when doing measurements.

TGen
~~~~

OnionPerf uses TGen to generate traffic on the client and server side for its
measurements. Installing dependencies, cloning TGen to a subdirectory in the
user’s home directory, and building TGen is done as follows:

.. code:: shell

   sudo apt install cmake libglib2.0-dev libigraph0-dev make
   cd ~/
   git clone https://github.com/shadow/tgen.git
   cd tgen/
   mkdir build
   cd build/
   cmake ..
   make

The TGen binary will be contained in ``~/tgen/build/src/tgen``, which is also
the path that needs to be passed to OnionPerf’s ``--tgen`` parameter when
doing measurements.

.. _onionperf-1:

OnionPerf
~~~~~~~~~

OnionPerf is written in Python 3. The following instructions assume that a
Python virtual environment is being used, even though installation is also
possible without that.

The virtual environment is created, activated, and tested using:

.. code:: shell

   sudo apt install python3-venv
   cd ~/
   python3 -m venv venv
   source venv/bin/activate
   which python3

The last command should output something like ``~/venv/bin/python3`` as the
path to the ``python3`` binary used in the virtual environment.

The next step is to clone the OnionPerf repository and install its
requirements:

.. code:: shell

   git clone https://git.torproject.org/onionperf.git
   pip3 install --no-cache -r onionperf/requirements.txt

The final step is to install OnionPerf and print out the usage information to
see if the installation was successful:

.. code:: shell

   cd onionperf/
   python3 setup.py install
   cd ~/
   onionperf --help

The virtual environment is deactivated with the following command:

.. code:: shell

   deactivate

However, in order to perform measurements or analyses, the virtual environment
needs to be activated first. This will ensure all the paths are found.

If needed, unit tests are run with the following command:

.. code:: shell

   cd ~/onionperf/
   python3 -m nose --with-coverage --cover-package=onionperf

Measurement
-----------

Performing measurements with OnionPerf is done by starting an ``onionperf``
process that itself starts several other processes and keeps running until it
is interrupted by the user. During this time it performs new measurements
every 5 minutes and logs measurement results to files.

Ideally, OnionPerf is run detached from the terminal session using tmux,
systemd, or similar, except for the simplest test runs. The specifics for
using these tools are not covered in this document.

Starting and stopping measurements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The most trivial configuration is to measure onion services only. In that
case, OnionPerf runs without needing any additional configuration. For direct
measurements via exit nodes, firewall rules or port forwarding may be required
to allow inbound connections to the TGen server.

Starting these measurements is as simple as:

.. code:: shell

   cd ~/
   onionperf measure --onion-only --tgen ~/tgen/build/src/tgen --tor ~/tor/src/app/tor

OnionPerf logs its main output on the console and then waits indefinitely
until the user presses ``CTRL-C`` for graceful shutdown. It does not, however,
print out measurement results or progress on the console, just a heartbeat
message every hour.

OnionPerf’s ``measure`` mode has several command-line parameters for
customizing measurements. See the following command for usage information:

.. code:: shell

   onionperf measure --help

Output directories and files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

OnionPerf writes several files to two subdirectories in the current working
directory while doing measurements:

- ``onionperf-data/`` is the main directory containing measurement results.

  - ``htdocs/`` is created at the first UTC midnight after starting and
    contains measurement analysis result files that can be shared via a local
    web server.

    - ``$date.onionperf.analysis.json.xz`` contains extracted metrics in
      OnionPerf’s analysis JSON format.
    - ``index.xml`` contains a directory index with file names, sizes,
      last-modified times, and SHA-256 digests.

  - ``tgen-client/`` is the working directory of the client-side ``tgen``
    process.

    - ``log_archive/`` is created at the first UTC midnight after starting and
      contains compressed log files from previous UTC days.
    - ``onionperf.tgen.log`` is the current log file.
    - ``tgen.graphml.xml`` is the traffic model file generated by OnionPerf
      and used by TGen.

  - ``tgen-server/`` is the working directory of the server-side ``tgen``
    process with the same structure as ``tgen-client/``.
  - ``tor-client/`` is the working directory of the client-side ``tor``
    process.

    - ``log_archive/`` is created at the first UTC midnight after starting and
      contains compressed log files from previous UTC days.
    - ``onionperf.tor.log`` is the current log file containing log messages by
      the client-side ``tor`` process.
    - ``onionperf.torctl.log`` is the current log file containing controller
      events obtained by OnionPerf connecting to the control port of the
      client-side ``tor`` process.
    - ``[...]`` (several other files written by the client-side ``tor``
      process to its data directory)

  - ``tor-server/`` is the working directory of the server-side ``tor``
    process with the same structure as ``tor-client/``.


- ``onionperf-private/`` contains private keys of the onion services used for
  measurements and potentially other files that are not meant to be published
  together with measurement results.

Changing Tor configurations
~~~~~~~~~~~~~~~~~~~~~~~~~~~

OnionPerf generates Tor configurations for both client-side and server-side
``tor`` processes. There are a few ways to add Tor configuration lines:

- If the ``BASETORRC`` environment variable is set, OnionPerf appends its own
  configuration options to the contents of that variable. Example:

  .. code:: shell

     BASETORRC=$'Option1 Foo\nOption2 Bar\n' onionperf ...

- If the ``--torclient-conf-file`` and/or ``--torserver-conf-file``
  command-line arguments are given, the contents of those files are appended
  to the configurations of the client-side and/or server-side ``tor`` process.
- If the ``--additional-client-conf`` command-line argument is given, its
  content is appended to the configuration of the client-side ``tor`` process.

These options can be used, for example, to change the default measurement
setup to use bridges (or pluggable transports) by passing bridge addresses as
additional client configuration lines as follows:

.. code:: shell

   onionperf measure --additional-client-conf="UseBridges 1\nBridge 72.14.177.231:9001 AC0AD4107545D4AF2A595BC586255DEA70AF119D\nBridge 195.91.239.8:9001 BA83F62551545655BBEBBFF353A45438D73FD45A\nBridge 148.63.111.136:35577 768C8F8313FF9FF8BBC915898343BC8B238F3770"

Changing the TGen traffic model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

OnionPerf is a relatively simple tool that can be adapted to do more complex
measurements beyond what can be configured on the command line.

For example, the hard-coded traffic model generated by OnionPerf and executed
by the TGen processes is to send a small request from client to server and
receive a relatively large response of 5 MiB of random data back. This model
can be changed by editing ``~/onionperf/onionperf/model.py``, rebuilding, and
restarting measurements.
For specifics, see the
`TGen documentation <https://github.com/shadow/tgen/blob/master/doc/TGen-Overview.md>`__
and
`TGen traffic model examples <https://github.com/shadow/tgen/blob/master/tools/scripts/generate_tgen_config.py>`__.

Sharing measurement results
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Measurement results can be further analyzed and visualized on the measuring
host. In many cases, however, it’s more convenient to do analysis and
visualization on another host, not least to compare measurements from
different hosts to each other.

There are at least two common ways of sharing measurement results:

1. Creating a tarball of the ``onionperf-data/`` directory; and
2. Using a local web server to serve the contents of the
   ``onionperf-data/htdocs/`` directory.

The details of either method are not covered in this document.

Troubleshooting
~~~~~~~~~~~~~~~

If anything goes wrong while doing measurements, OnionPerf typically informs
the user in its console output. This is also the first place to look when
investigating any issues.

The second place to look is the log files in
``~/onionperf-data/tgen-client/`` or ``~/onionperf-data/tor-client/``.

The most common configuration problems are probably related to firewall and
port forwarding for doing direct (non onion-service) measurements. The
specifics for setting up the firewall are out of scope for this document.

Another common class of issues with long-running measurements is that one of
the ``tgen`` or ``tor`` processes dies. The reasons, or at least hints, can
(hopefully) be found in the respective log files.

In order to avoid extended downtimes it is recommended to deploy monitoring
tools that check whether measurement results produced by OnionPerf are fresh.
The specifics are, again, out of scope for this document.

Analysis
--------

The next steps after performing measurements are to analyze and optionally
visualize measurement results.
Analyzing measurement results
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

While performing measurements, OnionPerf writes quite verbose log files to
disk. The first step in the analysis is to parse these log files, extract key
metrics, and write smaller and more structured measurement results to disk.
This is done with OnionPerf’s ``analyze`` mode.

For example, the following command analyzes current log files of a running (or
stopped) OnionPerf instance (as opposed to log-rotated, compressed files from
previous days):

.. code:: shell

   onionperf analyze --tgen ~/onionperf-data/tgen-client/onionperf.tgen.log --torctl ~/onionperf-data/tor-client/onionperf.torctl.log

The output analysis file is written to ``onionperf.analysis.json.xz`` in the
current working directory. The file format is described in more detail in
``schema/onionperf-3.0.json``.

The same analysis files are written automatically as part of ongoing
measurements once per day at UTC midnight and can be found in
``onionperf-data/htdocs/``.

OnionPerf’s ``analyze`` mode has several command-line parameters for
customizing the analysis step:

.. code:: shell

   onionperf analyze --help

Filtering measurement results
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``filter`` subcommand can be used to filter out measurement results based
on given criteria. This subcommand is typically used in combination with the
``visualize`` subcommand. The workflow is to apply one or more filters and
then visualize only those measurements with an existing mapping between TGen
transfers/streams and Tor streams/circuits.

Currently, OnionPerf measurement results can be filtered based on Tor relay
fingerprints found in Tor circuits, although support for filtering based on
Tor streams and/or TGen transfers/streams may be added in the future.
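Because the analysis files consumed by ``filter`` and ``visualize`` are
xz-compressed JSON, they can be inspected with the Python standard library
alone before any filter is applied. The following is a minimal sketch, not
part of OnionPerf: the helper names ``load_analysis`` and
``summarize_top_level`` are made up for illustration, and the code assumes
nothing about the schema beyond the file being valid JSON.

```python
import json
import lzma


def load_analysis(path):
    """Load an xz-compressed JSON analysis file into Python objects."""
    with lzma.open(path, mode="rt", encoding="utf-8") as handle:
        return json.load(handle)


def summarize_top_level(analysis):
    """Map each top-level key to the type name of its value.

    No particular schema is assumed; a non-object root is reported as-is.
    """
    if not isinstance(analysis, dict):
        return {"<root>": type(analysis).__name__}
    return {key: type(analysis[key]).__name__ for key in sorted(analysis)}
```

Calling ``summarize_top_level(load_analysis("onionperf.analysis.json.xz"))``
gives a quick overview of what the file contains; the authoritative
description of the fields remains the schema file mentioned above.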
The ``filter`` mode takes a list of fingerprints and one or more existing
analysis files as inputs and outputs new analysis files with the same contents
as the input analysis files plus annotations on those Tor circuits that have
been filtered out. If a directory of analysis files is given to ``-i``, the
structure and filenames of that directory are preserved under the path
specified with ``-o``.

For example, the analysis file produced above can be filtered with the
following command, which retains only those Tor circuits with fingerprints
contained in the file ``fingerprints.txt``:

.. code:: shell

   onionperf filter -i onionperf.analysis.json.xz -o filtered.onionperf.analysis.json.xz --include-fingerprints fingerprints.txt

OnionPerf’s ``filter`` command usage can be inspected with:

.. code:: shell

   onionperf filter --help

Visualizing measurement results
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Step two in the analysis is to process analysis files with OnionPerf’s
``visualize`` mode, which produces CSV and PDF files as output.

For example, the analysis file produced above can be visualized with the
following command, using "Test Measurements" as the label for the data set:

.. code:: shell

   onionperf visualize --data onionperf.analysis.json.xz "Test Measurements"

As a result, three files are written to the current working directory:

- ``onionperf.viz.$datetime.csv`` contains visualized data in a CSV file
  format;
- ``onionperf.viz.$datetime.pdf`` contains visualizations in a PDF file
  format; and
- ``onionperf.outliers.$datetime.pdf`` contains measurement outlier
  visualizations in a PDF file format.

By default, both the base PDF and the outliers PDF are produced, but this can
be controlled using the ``-c`` switch on the command line.

For analysis files containing Tor circuit filters, only measurements with an
existing mapping between TGen transfers/streams and Tor streams/circuits that
have not been marked as ``filtered_out`` are visualized.
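The CSV file can also be post-processed outside OnionPerf. The following is a
minimal sketch, not part of OnionPerf, that computes the median time to last
byte per ``server`` value using only the Python standard library. It relies
on the column names listed under "Interpreting the CSV output format" and
makes two assumptions that should be checked against real output: failed
measurements carry a non-empty ``error_code``, and missing timings appear as
empty cells or ``NaN``. The function name ``median_ttlb_by_server`` is
illustrative.

```python
import csv
import math
import statistics
from collections import defaultdict


def median_ttlb_by_server(csv_path):
    """Return {server: median time_to_last_byte in seconds}."""
    samples = defaultdict(list)
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            value = row.get("time_to_last_byte") or ""
            # Assumption: failed measurements have a non-empty error_code.
            if row.get("error_code") or not value:
                continue
            seconds = float(value)
            # Assumption: a missing timing may also be written as NaN.
            if math.isnan(seconds):
                continue
            samples[row["server"]].append(seconds)
    return {server: statistics.median(times) for server, times in samples.items()}
```

The result is a plain dictionary keyed by ``onion`` and/or ``public``, which
makes it easy to compare CSV files produced on different hosts.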
Similar to the other modes, OnionPerf’s ``visualize`` mode has command-line
parameters for customizing the visualization step:

.. code:: shell

   onionperf visualize --help

Interpreting the PDF output format
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The base PDF output file contains visualizations of the following metrics:

- Time to download first (last) byte, which is defined as elapsed time between
  starting a measurement and receiving the first (last) byte of the HTTP
  response.
- Throughput, which is computed from the elapsed time between receiving 0.5
  and 1 MiB of the response for 1 MiB transfers, and from the elapsed time
  between receiving 4 and 5 MiB of the response for 5 MiB transfers.
- Number of downloads.
- Number and type of failures.

The measurement outliers PDF output file contains visualizations of the
following metrics:

- Outlier relays in the TTFB (time to first byte) dataset, for public service
  measurements
- Outlier relays in the TTFB and TTLB (time to last byte) datasets, for onion
  service measurements
- Common outliers in the TTFB and TTLB datasets across both public and onion
  service measurements
- Relays most seen in circuits that failed with errors for both public and
  onion measurements

By default, we consider measurement results in the 75th percentile and only
display the top 15 fingerprints that appear the most by count. This can be
changed with command line arguments:

.. code:: shell

   onionperf visualize -d <file(s)> label --percentile 90 --threshold 50

Interpreting the CSV output format
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The CSV output file contains the same data that is visualized in the PDF file.
It contains the following columns:

- ``id`` is the identifier used in the TGen client logs which may be useful to
  look up more details about a specific measurement.
- ``error_code`` is an optional error code if a measurement did not succeed.
- ``filesize_bytes`` is the requested file size in bytes.
- ``label`` is the data set label as given in the ``--data/-d`` parameter to
  the ``visualize`` mode.
- ``server`` is set to either ``onion`` for onion service measurements or
  ``public`` for direct measurements.
- ``start`` is the measurement start time.
- ``time_to_first_byte`` is the time in seconds (with microsecond precision)
  to download the first byte.
- ``time_to_last_byte`` is the time in seconds (with microsecond precision) to
  download the last byte.

Visualizations on Tor Metrics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The analysis and visualization steps above can all be done using the OnionPerf
tool. In addition, it’s possible to visualize OnionPerf analysis files using
other tools. For example, the
`Tor Metrics website <https://metrics.torproject.org/torperf.html>`__ contains
various graphs based on OnionPerf data.

Contributing
------------

The OnionPerf code is developed at
https://gitlab.torproject.org/tpo/network-health/metrics/onionperf.

Contributions to OnionPerf are welcome and encouraged!