
Tor Hackweek Project: Visualize Tor Metrics data in ways that are useful for the community

Summary: The goal is to have a dashboard showing Tor usage per country in a way that makes big changes easy to spot. Right now we need to select each country individually to see its Tor usage. It would be good to have a way to see all the countries at once, and the ones where usage is increasing (via bridges and direct connections).

Skills Needed: data visualization

Team

  • gus (UTC-3)

  • gaba (UTC-3)

  • tara(?) (UTC +1)

  • joydeep (UTC +5:30)

  • djackson (UTC +1)

Merging with the other project that has a similar goal

PLAN THURSDAY MARCH 31ST

  • get a domain for mb.torproject.org
  • add shutdown data to metabase and add a dashboard - done
  • import user relays data to metabase - done
  • anomaly detection queries
  • alert system link here
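
One possible starting point for the anomaly detection item above is to rank countries by how much their estimated user count changed between two days. This is only a sketch: the function name `biggest_changes`, the `min_users` cutoff, and the sample numbers and country codes are assumptions made up for illustration, not part of the project.

```python
def biggest_changes(before, after, min_users=100):
    """Rank countries by relative change in users between two days.

    before and after map a country code to its estimated user count.
    """
    changes = []
    for country, old in before.items():
        new = after.get(country)
        if new is None or old < min_users:
            continue  # skip missing data and tiny baselines, where ratios are mostly noise
        changes.append((country, (new - old) / old))
    # Largest absolute change first, whether it is a drop or an increase.
    return sorted(changes, key=lambda item: abs(item[1]), reverse=True)


# Made-up numbers for three hypothetical country codes.
before = {"aa": 1000, "bb": 2000, "cc": 10}
after = {"aa": 500, "bb": 2200, "cc": 50}
ranked = biggest_changes(before, after)
for country, change in ranked:
    print(f"{country}: {change:+.0%}")
```

The same ranking could be written as a query in metabase; the Python version is just easier to try out in jupyter.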

RESOURCES

USE CASES (what do we want to do?)

DATA SETS

  • user stats per country: userstats relay country (the estimated number of directly-connecting clients)

  • date: UTC date (YYYY-MM-DD) for which user numbers are estimated.

  • country: Two-letter lower-case country code as found in a GeoIP database by resolving clients' IP addresses, or "??" if client IP addresses could not be resolved. If this column contains the empty string, all clients are included, regardless of their country code.

  • users: Estimated number of clients.

  • lower: Lower number of expected clients under the assumption that there has been no censorship event. If users < lower, a censorship-related event might have happened in this country on the given day. If this column contains the empty string, there are no expectations on the number of clients.

  • upper: Upper number of expected clients under the assumption that there has been no release of censorship. If users > upper, a censorship-related event might have happened in this country on the given day. If this column contains the empty string, there are no expectations on the number of clients.

  • frac: Fraction of relays in percent that the estimate is based on.

  • bridge user per country / transport: userstats-bridge-combined

  • date: UTC date (YYYY-MM-DD) for which user numbers are estimated.

  • country: Two-letter lower-case country code as found in a GeoIP database by resolving clients' IP addresses, or "??" if client IP addresses could not be resolved.

  • transport: Transport name used by clients to connect to the Tor network using bridges. Examples are "obfs4", "websocket" for Flash proxy/websocket, "fte" for FTE, "<??>" for unknown pluggable transport(s), or "" for the default OR protocol.

  • high: Upper bound of estimated users from the given country and transport.

  • low: Lower bound of estimated users from the given country and transport.

  • frac: Fraction of bridges in percent that the estimate is based on.

format for tor metrics data userstats-relay-country
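
The `lower` and `upper` columns described above can be used directly to flag possible censorship-related events. The following is a minimal sketch, assuming the CSV has already been stripped down to its header row and data rows; the function name `flag_events`, the country code `aa`, and all numbers are made up for illustration.

```python
import csv
import io

# Illustrative rows in the userstats-relay-country column order; the numbers are made up.
SAMPLE = """date,country,users,lower,upper,frac
2022-03-01,aa,1000,800,1200,90
2022-03-02,aa,500,800,1200,90
2022-03-03,aa,1500,800,1200,90
2022-03-04,aa,900,,,90
"""


def flag_events(rows):
    """Yield (date, country, kind) for rows whose users fall outside [lower, upper]."""
    for row in rows:
        # An empty lower or upper value means there is no expectation for that day.
        if row["lower"] == "" or row["upper"] == "":
            continue
        users = float(row["users"])
        if users < float(row["lower"]):
            yield row["date"], row["country"], "below lower"
        elif users > float(row["upper"]):
            yield row["date"], row["country"], "above upper"


events = list(flag_events(csv.DictReader(io.StringIO(SAMPLE))))
for event in events:
    print(*event)
```

A dashboard could show only the flagged rows for the most recent day, so that all countries are covered without selecting them one by one.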

  • prometheus metrics that tor exports issues-40063

  • Keepiton data on internet shutdowns

  • link to download data: keepiton-stop-data-2020

  • ID

  • start_date

  • end_date

  • duration

  • Info_source

  • news_link

  • continent

  • country

  • State/India

  • geo_scope

  • area_name

  • ordered_by

  • decision_maker

  • shutdown_type_new

  • affected_network

  • full or service-based

  • Facebook_affected

  • Twitter_affected

  • WhatsApp_affected

  • Instagram_affected

  • Telegram_affected

  • other_service_details (specify)

  • SMS_affected

  • phone_call_affected

  • telcos_involved

  • gov_ack

  • official_just

  • other_just_details

  • off_statement

  • actual_cause

  • other_cause_details

  • election

  • violence

  • hr_abuse_reported

  • users_notified

  • users_affected/targetted

  • legal_justif

  • legal_method

  • telco_resp

  • telco_ ack

  • econ_impact

  • event

  • an_link

  • notes

Todo:

  • csv needs some cleaning

  • country needs to be converted to country codes

  • OONI data on censorship events
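
For the country-code item in the Todo list, a lookup along the following lines might be enough to join the shutdown data with the Tor user stats. It is a sketch under stated assumptions: the table `NAME_TO_CODE` is deliberately tiny and hypothetical, the helper names are invented here, and unknown names fall back to "??" to mirror the convention of the Tor user stats.

```python
# Hypothetical, deliberately tiny lookup table; a real one would cover every country
# name that appears in the shutdown CSV.
NAME_TO_CODE = {
    "india": "in",
    "ethiopia": "et",
    "belarus": "by",
}


def to_country_code(name):
    """Return the lower-case two-letter code for a country name, or "??" if unknown."""
    return NAME_TO_CODE.get(name.strip().lower(), "??")


def add_country_codes(rows):
    """Return copies of the rows with an added 'country_code' field."""
    return [{**row, "country_code": to_country_code(row["country"])} for row in rows]


# Made-up rows carrying only the 'country' column of the shutdown data.
rows = add_country_codes([{"country": " India "}, {"country": "Atlantis"}])
print(rows)
```

Names that map to "??" are worth listing separately, since they usually point at spelling variants that still need cleaning.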

TOOLS

  • dash-plo
  • jupyter
  • grafana: for the dashboard
  • metabase: straightforward to use
  • apache superset: to explore data
  • csvkit
  • https://pgloader.readthedocs.io/en/latest/ref/csv.html
  • prometheus: for collecting data and alerts (alerts-manager)
  • https://www.bamsoftware.com/git/tor-metrics-country.git/
  • infolabe-anomalies: link

Tasks

  • share the db (gus):

    echo "Downloading userstats"

    curl -O userstats-relay-country.csv

    echo "Removing the first 5 lines"

    sed -i '1,5d' userstats-relay-country.csv

    echo "Import db userstats-relay-country"

    (echo .separator ,; echo .import userstats-relay-country.csv userstats) | sqlite3 userstats-relay-country.db

  • a downloader that regularly fetches data from metrics and loads it into the db for the tool(s) to use.

  • try grafana locally; gus will install grafana on his vps
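
The downloader task could reuse the same steps as the shell snippet above in a script that can run on a schedule. This is a rough sketch, not the team's implementation: it assumes the leading lines that the shell version deletes are `#` comment lines, takes the download URL as a parameter instead of hard-coding one, and uses invented function names (`fetch`, `load_userstats`) and made-up sample data.

```python
import csv
import sqlite3
import urllib.request


def fetch(url):
    """Download a CSV file and return it as text. The URL is supplied by the caller."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def load_userstats(csv_text, conn):
    """Replace the userstats table with the rows in csv_text; return the row count."""
    # Drop '#' comment lines rather than assuming exactly five leading lines.
    lines = [line for line in csv_text.splitlines() if line and not line.startswith("#")]
    rows = list(csv.DictReader(lines))
    for row in rows:
        for key in ("lower", "upper"):
            if row[key] == "":
                row[key] = None  # store "no expectation" as NULL
    conn.execute("DROP TABLE IF EXISTS userstats")
    conn.execute(
        "CREATE TABLE userstats "
        "(date TEXT, country TEXT, users INTEGER, lower INTEGER, upper INTEGER, frac INTEGER)"
    )
    conn.executemany(
        "INSERT INTO userstats VALUES (:date, :country, :users, :lower, :upper, :frac)",
        rows,
    )
    conn.commit()
    return len(rows)


# Made-up sample standing in for a downloaded file.
SAMPLE = """# comment line
# another comment line
date,country,users,lower,upper,frac
2022-03-01,aa,1000,800,1200,90
2022-03-01,,5000,,,90
"""
conn = sqlite3.connect(":memory:")
count = load_userstats(SAMPLE, conn)
print(count, "rows imported")
```

In regular use, `load_userstats(fetch(url), sqlite3.connect("userstats-relay-country.db"))` could be run from cron so the dashboard tools always read a fresh table.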

How prometheus works at TPI

prometheus-monitored-services link

monitoring

  • connectivity test (through blackbox exporter)

  • rdsys

  • bridgestrap

  • prometheus is configured by puppet prometheus-torproject

  • grafana is configured by hand

Tasks to work on:

  • metrics data into prometheus

  • install and configure the pushgateway component

  • getting csv into prometheus

  • snowflake metrics data into prometheus

  • grafana to visualize data

hackweek-metrics: to launch prometheus locally, parse a csv and send a silly metric to the pushgateway
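
The CSV-to-pushgateway step could look roughly like the sketch below. It is not the code in hackweek-metrics: the metric name `tor_users_estimated`, the job name, the `all` label for the aggregate row, and the gateway address (the Pushgateway default port 9091 on localhost) are assumptions. Because Prometheus assigns the timestamp when it scrapes the gateway, the sketch expects rows for a single date, one per country.

```python
import csv
import io
import urllib.request


def to_exposition(rows, metric="tor_users_estimated"):
    """Format rows for one date as Prometheus text exposition, one sample per country."""
    lines = [f"# TYPE {metric} gauge"]
    for row in rows:
        country = row["country"] or "all"  # assumption: label the aggregate row "all"
        lines.append(f'{metric}{{country="{country}"}} {float(row["users"])}')
    return "\n".join(lines) + "\n"


def push(body, gateway="http://localhost:9091", job="hackweek_metrics"):
    """PUT the payload to a Pushgateway, replacing the metrics pushed for this job."""
    request = urllib.request.Request(
        f"{gateway}/metrics/job/{job}", data=body.encode("utf-8"), method="PUT"
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Made-up sample row; 'aa' is a hypothetical country code.
SAMPLE = """date,country,users,lower,upper,frac
2022-03-01,aa,1000,800,1200,90
"""
payload = to_exposition(csv.DictReader(io.StringIO(SAMPLE)))
print(payload)
# push(payload)  # uncomment once a Pushgateway is running locally
```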