Data retention

We often need to know how far back measurements are kept and read, to check whether some calculations are correct. This is easy to get confused about, because there are at least two defaults: 5 days and 28 days.

The 28-day default is used:

  • to keep measurements for this data interval in the file system

  • to read measurements from this data interval to generate the BandwidthFile

    This means that all averages, such as bw_mean, are calculated from the measurements taken during the previous 28 days.
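To illustrate what "calculated from the measurements of the previous 28 days" means, the sketch below averages only the samples whose timestamp falls inside the window. The `Measurement` type and `mean_bandwidth_in_window()` are hypothetical and not part of sbws; the generator's real bw_mean computation may differ in its details.

```python
import time
from dataclasses import dataclass
from statistics import mean

GENERATE_PERIOD = 28 * 24 * 60 * 60  # 28 days, in seconds


@dataclass
class Measurement:
    # Hypothetical record: when the measurement was taken and its result.
    time: float       # Unix timestamp, in seconds
    bandwidth: float  # measured bandwidth, e.g. in bytes per second


def mean_bandwidth_in_window(measurements, period=GENERATE_PERIOD, now=None):
    """Average the bandwidth of measurements taken within the last `period` seconds."""
    if now is None:
        now = time.time()
    oldest_allowed = now - period
    recent = [m.bandwidth for m in measurements if m.time >= oldest_allowed]
    # Return None when nothing is recent enough, instead of raising.
    return mean(recent) if recent else None


if __name__ == "__main__":
    now = time.time()
    day = 24 * 60 * 60
    samples = [
        Measurement(now - 1 * day, 100.0),
        Measurement(now - 27 * day, 200.0),
        Measurement(now - 30 * day, 900.0),  # older than 28 days: ignored
    ]
    print(mean_bandwidth_in_window(samples, now=now))  # 150.0
```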

The value comes from globals:

GENERATE_PERIOD = 28 * 24 * 60 * 60  # 28 days, in seconds

Used in generator main():

elif scaling_method == TORFLOW_SCALING:
    fresh_days = ceil(GENERATE_PERIOD / 24 / 60 / 60)  # 28

results = load_recent_results_in_datadir(
    fresh_days,  # 28
    ...
)
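The conversion in main() turns the period in seconds back into whole days. The sketch below reproduces that arithmetic and, as an illustration, computes the earliest instant such a window covers; `oldest_considered()` and the example date are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from math import ceil

GENERATE_PERIOD = 28 * 24 * 60 * 60  # from globals, in seconds

# Same conversion as in the generator's main(): seconds -> whole days.
fresh_days = ceil(GENERATE_PERIOD / 24 / 60 / 60)


def oldest_considered(now, days=fresh_days):
    """Earliest measurement time inside a window of `days` days ending at `now`."""
    return now - timedelta(days=days)


if __name__ == "__main__":
    now = datetime(2024, 3, 1, tzinfo=timezone.utc)
    print(fresh_days)              # 28
    print(oldest_considered(now))  # 2024-02-02 00:00:00+00:00
```

Because ceil() rounds up, a period that was not a whole number of days would be widened to the next full day.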

The 5-day default is used:

  • to read measurements from this data interval during scanning

  • to keep in memory the measurements for this data interval during scanning

The value comes from config.default.ini:

[general]
data_period = 5

Used in scanner run_speedtest():

measurements_period = conf.getint("general", "data_period")
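The sketch below uses the standard library configparser to show how this call reads the value. It assumes sbws' `conf` object behaves like a `configparser.ConfigParser`; the inline INI strings stand in for config.default.ini and for a hypothetical user configuration.

```python
import configparser

# Stand-in for the [general] section of config.default.ini.
DEFAULT_INI = """
[general]
data_period = 5
"""

conf = configparser.ConfigParser()
conf.read_string(DEFAULT_INI)

# Same call as in run_speedtest(): getint() parses the string "5" into an int.
measurements_period = conf.getint("general", "data_period")
print(measurements_period)  # 5
```

A configuration read afterwards that sets data_period in [general] would replace this value, which is how the 5-day default can be changed.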

ResultDump.__init__:

self.fresh_days = conf.getint("general", "data_period")

store_result():

self.data = trim_results(self.fresh_days, self.data)

enter():

self.data = load_recent_results_in_datadir(
    self.fresh_days, self.datadir
)
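The following is a simplified, hypothetical stand-in for how ResultDump applies fresh_days to the results it holds in memory. `ResultDumpSketch` stores plain timestamps per relay fingerprint instead of result objects, and appending before trimming is an assumption; only the idea that every stored result re-applies the fresh_days window comes from the excerpts above.

```python
import time


class ResultDumpSketch:
    """Hypothetical, simplified model of ResultDump's in-memory retention."""

    def __init__(self, fresh_days):
        self.fresh_days = fresh_days  # e.g. data_period = 5
        self.data = {}  # fingerprint -> list of result timestamps (Unix seconds)

    def _trim(self, now):
        # Keep only timestamps within fresh_days; drop relays left with none.
        oldest_allowed = now - self.fresh_days * 24 * 60 * 60
        trimmed = {}
        for fp, times in self.data.items():
            kept = [t for t in times if t >= oldest_allowed]
            if kept:
                trimmed[fp] = kept
        self.data = trimmed

    def store_result(self, fp, result_time, now=None):
        # Add the new result, then re-apply the retention window to all data.
        self.data.setdefault(fp, []).append(result_time)
        self._trim(time.time() if now is None else now)
```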

It is also defined in globals:

MEASUREMENTS_PERIOD = 5 * 24 * 60 * 60  # 5 days, in seconds
PERIOD_DAYS = int(MEASUREMENTS_PERIOD / (24 * 60 * 60))  # 5
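PERIOD_DAYS is derived with int(), whereas the generator's fresh_days uses ceil(). For whole-day periods such as these the results match, but the two differ for partial days. The helper names below are hypothetical and only isolate the two formulas.

```python
from math import ceil

SECONDS_PER_DAY = 24 * 60 * 60
MEASUREMENTS_PERIOD = 5 * SECONDS_PER_DAY  # 432000 seconds


def days_truncated(period_seconds):
    # Same arithmetic as PERIOD_DAYS: int() drops any partial day.
    return int(period_seconds / SECONDS_PER_DAY)


def days_rounded_up(period_seconds):
    # Same arithmetic as fresh_days in the generator: ceil() counts a partial day as a full one.
    return ceil(period_seconds / SECONDS_PER_DAY)


PERIOD_DAYS = days_truncated(MEASUREMENTS_PERIOD)
print(PERIOD_DAYS)  # 5

# For a period that is not a whole number of days the two helpers disagree.
five_and_a_half_days = MEASUREMENTS_PERIOD + SECONDS_PER_DAY // 2
print(days_truncated(five_and_a_half_days), days_rounded_up(five_and_a_half_days))  # 5 6
```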

Used in RelayList.__init__:

measurements_period=MEASUREMENTS_PERIOD
self._measurements_period = measurements_period

_init_relays():

days = self._measurements_period

These defaults are overridden when scanner.py creates the RelayList:

measurements_period = conf.getint("general", "data_period")  # 5
rl = RelayList(args, conf, controller, measurements_period, state)
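Going by the excerpts above, the RelayList default MEASUREMENTS_PERIOD is expressed in seconds, while scanner.py passes data_period, which is in days, and _init_relays() uses the stored value directly as `days`. The hypothetical `RelayListSketch` below models only that flow, to show why the explicit argument matters.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MEASUREMENTS_PERIOD = 5 * SECONDS_PER_DAY  # default from globals, in seconds


class RelayListSketch:
    """Hypothetical stand-in showing only how measurements_period flows."""

    def __init__(self, measurements_period=MEASUREMENTS_PERIOD):
        self._measurements_period = measurements_period

    def _init_relays(self):
        # Mirrors the excerpt: the stored value is used directly as days.
        days = self._measurements_period
        return days


# scanner.py passes the value read from the config, which is in days.
from_scanner = RelayListSketch(measurements_period=5)
print(from_scanner._init_relays())  # 5

# Without an explicit argument, the seconds-based default would be used as-is.
with_default = RelayListSketch()
print(with_default._init_relays())  # 432000
```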

These functions are called with either 28 or 5 days:

trim_results():

data_period = fresh_days * 24 * 60 * 60
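trim_results() converts fresh_days into seconds and discards older results. The sketch below shows that logic under a simplifying assumption: the hypothetical `trim_results_sketch()` takes a dict mapping a relay fingerprint to a list of result timestamps, rather than sbws' result objects, and accepts an optional `now` so the cut-off is reproducible.

```python
import time


def trim_results_sketch(fresh_days, result_dict, now=None):
    """Return a copy of result_dict without results older than fresh_days."""
    if now is None:
        now = time.time()
    data_period = fresh_days * 24 * 60 * 60  # days -> seconds, as in trim_results()
    oldest_allowed = now - data_period
    out = {}
    for fp, times in result_dict.items():
        kept = [t for t in times if t >= oldest_allowed]
        if kept:  # relays with no fresh results are left out
            out[fp] = kept
    return out
```

Called with 5, this keeps the scanner's window; called with 28, the generator's.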

load_recent_results_in_datadir():

data_period = fresh_days + 2
oldest_day = today - timedelta(days=data_period)

results = trim_results(fresh_days, results)
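load_recent_results_in_datadir() therefore works at two granularities: it selects files by day, reaching two days further back than fresh_days, and then trim_results() cuts the loaded results down to exactly fresh_days in seconds. The sketch below lists the days whose files would be read, assuming the loop walks day by day from oldest_day to today inclusive; `days_to_read()` is hypothetical.

```python
from datetime import date, timedelta


def days_to_read(fresh_days, today=None):
    """List the days whose result files would be read, oldest first."""
    if today is None:
        today = date.today()
    data_period = fresh_days + 2  # same margin as in the excerpt
    oldest_day = today - timedelta(days=data_period)
    # Inclusive of both oldest_day and today (an assumption of this sketch).
    return [oldest_day + timedelta(days=i) for i in range(data_period + 1)]
```

With fresh_days = 5 and today = 2024-01-10, this covers 2024-01-03 through 2024-01-10, eight days in total.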