Data retention
We often need to know for how long measurements are stored and read, in order to check whether some calculations are correct. This is usually confusing because there are at least two defaults: 5 days and 28 days.
The 28 days default is used:

- to keep measurements for this data interval in the file system
- to read measurements from this data interval to generate the BandwidthFile
This means that all average values, such as `bw_mean`, are calculated from the measurements of the previous 28 days.
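As an illustration of what "averaged over the previous 28 days" means (this is a standalone sketch, not the actual sbws code; the function and data layout are hypothetical):

```python
import time

GENERATE_PERIOD = 28 * 24 * 60 * 60  # 28 days in seconds

def mean_within_period(measurements, now=None, period=GENERATE_PERIOD):
    """Average the values of measurements whose timestamp falls inside
    the last ``period`` seconds.  ``measurements`` is a list of
    (timestamp, value) tuples; the names are illustrative only."""
    now = time.time() if now is None else now
    oldest_allowed = now - period
    recent = [v for t, v in measurements if t >= oldest_allowed]
    return sum(recent) / len(recent) if recent else None

now = 100 * 24 * 60 * 60  # an arbitrary "current" time: day 100
data = [
    (now - 40 * 24 * 60 * 60, 10.0),  # 40 days old: outside the window
    (now - 10 * 24 * 60 * 60, 20.0),  # 10 days old: kept
    (now - 1 * 24 * 60 * 60, 30.0),   # 1 day old: kept
]
print(mean_within_period(data, now=now))  # 25.0
```

Only the two measurements inside the window contribute to the mean; the 40-day-old one is silently dropped.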
The value comes from the `globals` module:

```python
GENERATE_PERIOD = 28 * 24 * 60 * 60
```

and is used when generating the bandwidth file with the Torflow scaling method:

```python
elif scaling_method == TORFLOW_SCALING:
    fresh_days = ceil(GENERATE_PERIOD / 24 / 60 / 60)  # 28
    results = load_recent_results_in_datadir(
        fresh_days,  # 28
```
The 5 days default is used:

- to read the measurements from this data interval during scanning
- to keep in memory the measurements for this data interval during scanning
The value comes from `config.default.ini`:

```ini
data_period = 5
```
It is used in the scanner's `run_speedtest()`:

```python
measurements_period = conf.getint("general", "data_period")
```
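`conf.getint` here is the standard-library `configparser` API. A self-contained sketch of the same lookup (the real `conf` object is built from `config.default.ini`; this one constructs the equivalent section inline):

```python
import configparser

# Stand-in for sbws's configuration object, built inline instead of
# being read from config.default.ini.
conf = configparser.ConfigParser()
conf.read_string("[general]\ndata_period = 5\n")

# getint() parses the string value "5" into the integer 5.
measurements_period = conf.getint("general", "data_period")
print(measurements_period)  # 5
```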
It is also used in `ResultDump.__init__`:

```python
self.fresh_days = conf.getint("general", "data_period")
self.data = trim_results(self.fresh_days, self.data)
self.data = load_recent_results_in_datadir(
    self.fresh_days, self.datadir
)
```
The same period is also defined in the `globals` module:

```python
MEASUREMENTS_PERIOD = 5 * 24 * 60 * 60
PERIOD_DAYS = int(MEASUREMENTS_PERIOD / (24 * 60 * 60))  # 5
```
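Note that this conversion uses `int()`, which truncates, while the Torflow path above uses `ceil()`, which rounds up; the two only differ for a period that is not a whole number of days. A standalone check (the constants are copied from the snippet, the 5.5-day value is a made-up example):

```python
from math import ceil

MEASUREMENTS_PERIOD = 5 * 24 * 60 * 60  # 432000 seconds
PERIOD_DAYS = int(MEASUREMENTS_PERIOD / (24 * 60 * 60))
print(PERIOD_DAYS)  # 5

# For a hypothetical 5.5-day period, int() truncates down to 5 days
# while ceil() rounds up to 6, so the choice of conversion matters.
partial_period = int(5.5 * 24 * 60 * 60)
print(int(partial_period / (24 * 60 * 60)))   # 5
print(ceil(partial_period / (24 * 60 * 60)))  # 6
```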
These globals are used in `RelayList.__init__`:

```python
measurements_period=MEASUREMENTS_PERIOD
self._measurements_period = measurements_period
days = self._measurements_period
```
These defaults are overridden when the class is instantiated from `scanner.py`:

```python
measurements_period = conf.getint("general", "data_period")  # 5
rl = RelayList(args, conf, controller, measurements_period, state)
```
These functions are called with either 28 or 5 days. In `trim_results()`:

```python
data_period = fresh_days * 24 * 60 * 60
```

And in `load_recent_results_in_datadir()`:

```python
data_period = fresh_days + 2
oldest_day = today - timedelta(days=data_period)
results = trim_results(fresh_days, results)
```
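To make the interaction concrete, here is a simplified, hypothetical sketch of the load-then-trim flow (the helper names and the in-memory `files_by_day` mapping are illustrative; the real functions read one result file per day from the data directory):

```python
import time
from datetime import date, timedelta

def trim(fresh_days, results):
    """Drop individual results older than fresh_days."""
    oldest_allowed = time.time() - fresh_days * 24 * 60 * 60
    return [r for r in results if r["time"] >= oldest_allowed]

def load_recent(fresh_days, files_by_day):
    """Read per-day result batches no older than fresh_days + 2 days,
    then trim the individual results down to fresh_days."""
    oldest_day = date.today() - timedelta(days=fresh_days + 2)
    results = []
    for day, day_results in sorted(files_by_day.items()):
        if day >= oldest_day:  # skip files from days that are too old
            results.extend(day_results)
    return trim(fresh_days, results)

now = time.time()
files_by_day = {
    date.today(): [{"time": now - 3600}],  # 1 hour old: kept
    date.today() - timedelta(days=10): [{"time": now - 10 * 24 * 60 * 60}],  # too old
}
print(len(load_recent(5, files_by_day)))  # 1
```

Reading two extra days of files before trimming per-result means a measurement is not lost just because it sits near a day boundary; the final per-result trim still enforces the exact `fresh_days` window.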