A high-level overview

The very high level

Ultimately, Tor runs as an event-driven network daemon: it responds to network events, signals, and timers by sending and receiving things over the network. Clients, relays, and directory authorities all use the same codebase: the Tor process will run as a client, relay, or authority depending on its configuration.

Tor has a few major dependencies, including Libevent (used to tell which sockets are readable and writable), OpenSSL or NSS (used for many encryption functions, and to implement the TLS protocol), and zlib (used to compress and uncompress directory information).

Most of Tor's work today is done in a single event-driven main thread. Tor also spawns one or more worker threads to handle CPU-intensive tasks. (Right now, this only includes circuit encryption and the more expensive compression algorithms.)

On startup, Tor initializes its libraries, reads and responds to its configuration files, and launches a main event loop. At first, the only events that Tor listens for are a few signals (like TERM and HUP), and one or more listener sockets (for different kinds of incoming connections). Tor also configures several timers to handle periodic events. As Tor runs over time, other events will open, and new events will be scheduled.

The codebase is divided into a few top-level subdirectories, each of which contains several sub-modules.

  • ext – Code maintained elsewhere that we include in the Tor source distribution. You should not edit this code if you can avoid it: we try to keep it identical to the upstream versions.
  • lib – Lower-level utility code, not necessarily tor-specific.
  • trunnel – Automatically generated code (from the Trunnel tool): used to parse and encode binary formats.
  • core – Networking code that is implements the central parts of the Tor protocol and main loop.
  • feature – Aspects of Tor (like directory management, running a relay, running a directory authorities, managing a list of nodes, running and using onion services) that are built on top of the mainloop code.
  • app – Highest-level functionality; responsible for setting up and configuring the Tor daemon, making sure all the lower-level modules start up when required, and so on.
  • tools – Binaries other than Tor that we produce. Currently this is tor-resolve, tor-gencert, and the tor_runner.o helper module.
  • test – unit tests, regression tests, and a few integration tests.

In theory, the above parts of the codebase are sorted from highest-level to lowest-level, where high-level code is only allowed to invoke lower-level code, and lower-level code never includes or depends on code of a higher level. In practice, this refactoring is incomplete: The modules in lib are well-factored, but there are many layer violations ("upward dependencies") in core and feature. We aim to eliminate those over time.

Some key high-level abstractions

The most important abstractions at Tor's high-level are Connections, Channels, Circuits, and Nodes.

A 'Connection' (connection_t) represents a stream-based information flow. Most connections are TCP connections to remote Tor servers and clients. (But as a shortcut, a relay will sometimes make a connection to itself without actually using a TCP connection. More details later on.) Connections exist in different varieties, depending on what functionality they provide. The principle types of connection are edge_connection_t (eg a socks connection or a connection from an exit relay to a destination), or_connection_t (a TLS stream connecting to a relay), dir_connection_t (an HTTP connection to learn about the network), and control_connection_t (a connection from a controller).

A 'Circuit' (circuit_t) is persistent tunnel through the Tor network, established with public-key cryptography, and used to send cells one or more hops. Clients keep track of multi-hop circuits (origin_circuit_t), and the cryptography associated with each hop. Relays, on the other hand, keep track only of their hop of each circuit (or_circuit_t).

A 'Channel' (channel_t) is an abstract view of sending cells to and from a Tor relay. Currently, all channels are implemented using OR connections (channel_tls_t). If we switch to other strategies in the future, we'll have more connection types.

A 'Node' (node_t) is a view of a Tor instance's current knowledge and opinions about a Tor relay or bridge.