Trac: Summary changed from "Check prioritisation, it should make 2 measurments that are 24 appart" to "Check prioritisation, it should make 2 measurements that are 24 hours apart"
There's no parameter that directly controls the frequency with which sbws will scan an arbitrary relay. If you want to increase (or decrease) how often sbws gets around to scanning relay Foo, then you can
increase (or decrease) the number of measurement threads (default: 3; I use 6 when I test sbws)
decrease (or increase) the number of downloads to make per measurement (default: 10; I leave this alone)
decrease (or increase) the minimum duration a download must last in order to be considered valid (default: documented here)
If you're against changing those parameters and still can't manage to accumulate the desired number of measurements for relay Foo within the validity period, then you can increase the validity period.
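To make the trade-off concrete, here's a rough back-of-the-envelope sketch of how those three knobs interact. It is not sbws code; the function name, the ~6 s average per valid download, and the ~7000-relay figure are assumptions for illustration only:
{{{#!python
# Very rough estimate of how long one pass over all relays takes, assuming
# the downloads are the bottleneck and parallelise evenly across threads.
# All names and numbers are illustrative, not sbws internals.
def hours_per_full_pass(num_relays, downloads_per_measurement,
                        seconds_per_download, measurement_threads):
    seconds_per_relay = downloads_per_measurement * seconds_per_download
    return num_relays * seconds_per_relay / measurement_threads / 3600

print(hours_per_full_pass(7000, 10, 6, 3))  # ~38.9 hours with 3 threads
print(hours_per_full_pass(7000, 10, 6, 6))  # ~19.4 hours with 6 threads
}}}
Under those assumptions, doubling the threads or halving the downloads per measurement each roughly halve how long a relay waits between measurements, at the cost of load or measurement quality.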
Because of the added code complexity, I'm against giving the prioritizer a notion that it should aim to measure relay Foo every X hours, or that measurements should be performed differently when we have a lot of "catching up" to do.
Thanks for the list of options. I'm not against any.
I'm going to use trial and error to check whether some parameter combination increases the number of measurements per relay to 2.
There's no concrete relay Foo to prioritize; they all need to get the 2 measurements.
I've been running sbws from home for just over 24 hours.
(venv) matt@spacecow:~/.sbws/datadir$ sbws --log-level info stats --error-types
2018-10-17 10:24:18,814 INFO MainThread sbws.py:73 - main - sbws 0.8.1-dev0 with python 3.5.3 on Linux-4.9.0-8-amd64-x86_64-with-debian-9.5, stem 1.7.0, and requests 2.19.1
7031 relays have recent results
Mean 0.80 successful measurements per relay
For Result: min=1 q1=1 med=2 q3=2 max=4
For ResultSuccess: min=0 q1=0 med=1 q3=1 max=3
For ResultErrorCircuit: min=0 q1=0 med=0 q3=1 max=4
For ResultErrorStream: min=0 q1=0 med=0 q3=0 max=3
12240 total results, and 45.7% are successes
5594 success results and 6646 error results
The fastest download was 1901.14 KiB/s
Results come from 2018-10-16 13:58:27 to 2018-10-17 14:24:15 over a period of 1 day, 0:25:48
1128/12240 (9.22%) results were error-stream
5518/12240 (45.08%) results were error-circ
Months ago when I ran sbws I would see ~80% success measurement rate. The < 50% success rate is very concerning to me. That was probably with different parameters, and was definitely on a different machine than my home desktop. So I will now begin to run it on a server instead.
Today, around 14:00, I commented out the RTT measurements. Judging by the number of results in the file today, it seems faster now...
I also noticed that today every prioritization loop (~300 relays) completes in about 45 minutes. In the days before, it was taking about 6 hours.
I still don't understand why removing that code makes such a difference.
With the data from today, I now get:
number of successfully measured relays with the restriction: 1319
number of measured relays: 5933
the exact code that you are using to do the "2 measurements at least 24 hours apart" test?
I've been running sbws from home for just over 24 hours.
[...]
Months ago when I ran sbws I would see ~80% success measurement rate. The < 50% success rate is very concerning to me. That was probably with different parameters, and was definitely on a different machine than my home desktop. So I will now begin to run it on a server instead.
I am very confused by these results. It seems strange to me that sbws can't make 2 measurements of a relay over a few days.
It also seems weird to me, so I've been debugging prioritization today; so far I haven't found the reason.
Is there some documentation for relay priority?
If we prioritise failed relays, then most of our circuits will fail. In some cases, we will overload relays that are already failing.
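As a purely conceptual sketch of that concern (not the actual RelayPrioritizer code; the discount factor and function names are assumptions), a prioritizer that treats error results as "less fresh" will keep pushing failing relays back to the front of the queue:
{{{#!python
import time

# Conceptual sketch only, not sbws's RelayPrioritizer.
ERROR_DISCOUNT = 0.5     # assumed: error results count as less "fresh"
MAX_AGE = 7 * 24 * 3600  # results older than 7 days are ignored

def freshness(result_time, is_error, now=None):
    now = now if now is not None else time.time()
    fresh = max(0, MAX_AGE - (now - result_time))
    return fresh * ERROR_DISCOUNT if is_error else fresh

def priority(results):
    # Lower total freshness = measured sooner. A relay whose recent
    # results are all errors stays near the front, so its (possibly
    # failing) circuits keep getting built.
    return sum(freshness(t, err) for t, err in results)
}}}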
Can you please tell us:
how long sbws has been running for?
Forever :) Results older than 7 days are not taken into account.
Those results were computed without the most recent day, but still.
the number of relays measured each day:
This is the number of results each day, not relays:
If any of these things are true, do not put the relay in the bandwidth file:
there are less than 2 sbws measured bandwidths
all the sbws measured bandwidths are within 24 hours of each other
But the code implements:
If there are less than 2 results, do not put the relay in the bandwidth file
If any result is within 24 hours of the previous result, remove it. Then, if there is only one result, do not put the relay in the bandwidth file.
Here are some results and the outputs they should produce:
0h produces None
0h, 24h produces 0h, 24h
0h, 1h, 24h produces 0h, 1h, 24h (but I think the current function produces 0h, 24h)
0h, 24h, 36h produces 0h, 24h, 36h (but I think the current function produces 0h, 24h)
Maybe these functions need some unit tests?
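As a starting point for those unit tests, here's a minimal sketch of the rule as specified above, with the listed cases as assertions. The helper name is hypothetical and timestamps are plain seconds, so this is illustrative rather than the current sbws function:
{{{#!python
HOUR = 60 * 60

def keep_results(timestamps, min_gap=24 * HOUR):
    """Spec as stated: return None (exclude the relay) if it has fewer
    than 2 results, or if all results are within min_gap of each other.
    Otherwise keep every result."""
    if len(timestamps) < 2:
        return None
    if max(timestamps) - min(timestamps) < min_gap:
        return None
    return timestamps

# The cases listed above:
assert keep_results([0]) is None
assert keep_results([0, 24 * HOUR]) == [0, 24 * HOUR]
assert keep_results([0, 1 * HOUR, 24 * HOUR]) == [0, 1 * HOUR, 24 * HOUR]
assert keep_results([0, 24 * HOUR, 36 * HOUR]) == [0, 24 * HOUR, 36 * HOUR]
}}}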
the exact measurement times for a relay that passes the test?
Attachment
I don't understand this attachment.
It looks like a list of all the times that pass the test, with no relay ids.
But I wanted to see a list of times for one relay that passes the test.
the exact measurement times for a relay with multiple measurements that fails the test?
If any of these things are true, do not put the relay in the bandwidth file:
there are less than 2 sbws measured bandwidths
all the sbws measured bandwidths are within 24 hours of each other
But the code implements:
If there are less than 2 results, do not put the relay in the bandwidth file
Is this not correct?
If any result is within 24 hours of the previous result, remove it. Then, if there is only one result, do not put the relay in the bandwidth file.
Here are some results and the outputs they should produce:
0h produces None
0h, 24h produces 0h, 24h
0h, 1h, 24h produces 0h, 1h, 24h (but I think the current function produces 0h, 24h)
0h, 24h, 36h produces 0h, 24h, 36h (but I think the current function produces 0h, 24h)
OK, I understood this wrong, though it does not change the number of relays.
Maybe these functions need some unit tests?
Yeah, I'm going to implement that.
the exact measurement times for a relay that passes the test?
Attachment
I don't understand this attachment.
It looks like a list of all the times that pass the test, with no relay ids.
But I wanted to see a list of times for one relay that passes the test.
the exact measurement times for a relay with multiple measurements that fails the test?
Attachment
All from today, hmm?
Let's just focus on a single relay.
Relay with 3 results that discards 1 result:
241C0134A6F62F40D378562B348531E51B946A07 Results NOT away from each other at least 86400s: 2018-10-17T15:51:23, 2018-10-17T15:51:04
Final results ts: ['2018-10-13T22:02:00', '2018-10-17T15:51:23']
Relay with 2 results that does not pass the test:
D1352AECB24F050C54CA13A8BC84F9613E4E61FD Results NOT away from each other at least 86400s: 2018-10-17T11:58:17, 2018-10-17T11:57:47
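For what it's worth, a quick check of that pair (plain Python, not sbws code) shows the two results are only 30 seconds apart, nowhere near the 86400 s the restriction requires:
{{{#!python
from datetime import datetime

fmt = "%Y-%m-%dT%H:%M:%S"
a = datetime.strptime("2018-10-17T11:58:17", fmt)
b = datetime.strptime("2018-10-17T11:57:47", fmt)
print((a - b).total_seconds())  # 30.0 seconds, far less than 86400
}}}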
If any of these things are true, do not put the relay in the bandwidth file:
there are less than 2 sbws measured bandwidths
all the sbws measured bandwidths are within 24 hours of each other
But the code implements:
If there are less than 2 results, do not put the relay in the bandwidth file
If any result is within 24 hours of the previous result, remove it. Then, if there is only one result, do not put the relay in the bandwidth file.
[...]
pastly, I suspect that the reason why the same relay gets measured twice within minutes is that the threads are not sharing the same view of the best_priority iterator. Would you mind having a look at it?
pastly, I suspect that the reason why the same relay gets measured twice within minutes is that the threads are not sharing the same view of the best_priority iterator. Would you mind having a look at it?
I don't think the measurement threads have different copies or ideas about the RelayPrioritizer because it is created in the main thread and never given to worker threads. See it defined here and never given to the workers.
My untested theory is:
A thread will grab one of the last relays from the RelayPrioritizer. While it is measuring that relay, the RelayPrioritizer recalculates priority, finishes, and gives the same relay to another thread to measure as well.
This might be a good time to switch to using Python's PriorityQueue while figuring out the problem. I'll give it some thought this evening.
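If it helps to picture that, here's a minimal sketch of the PriorityQueue idea, with made-up names rather than sbws's actual API: only the main thread recalculates priority and refills the queue, so a relay can be handed to at most one worker per refill:
{{{#!python
import queue

work = queue.PriorityQueue()

def refill(relays_in_priority_order):
    # Called only from the main thread after recalculating priority.
    for rank, relay in enumerate(relays_in_priority_order):
        work.put((rank, relay))  # rank is unique, so relays never compare

def worker(measure_fn):
    while True:
        rank, relay = work.get()  # blocks until a relay is available
        try:
            measure_fn(relay)
        finally:
            work.task_done()
}}}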
pastly, I suspect that the reason why the same relay gets measured twice within minutes is that the threads are not sharing the same view of the best_priority iterator. Would you mind having a look at it?
I don't think the measurement threads have different copies or ideas about the RelayPrioritizer because it is created in the main thread and never given to worker threads. See it defined here and never given to the workers.
I agree, I saw that; by "view" I didn't mean a copy.
My untested theory is:
A thread will grab one of the last relays from the RelayPrioritizer. While it is measuring that relay, the RelayPrioritizer recalculates priority, finishes, and gives the same relay to another thread to measure as well.
by "vision" i was thinking something like that.
This might be a good time to switch to using Python's PriorityQueue while figuring out the problem. I'll give it some thought this evening.
multiprocessing also has a https://docs.python.org/2/library/multiprocessing.html#multiprocessing.Queue class. However, I thought that if threads have to wait for 1 measurement to finish, that will slow down the process.
What about just getting a list of relays to measure (instead of an iterator), and not calling best_priority again until the list is almost finished?
If you're busy, I can try that.
Thanks for looking at it.
What about just getting a list of relays to measure (instead of an iterator), and not calling best_priority again until the list is almost finished?
If you're busy, I can try that.
Thanks for looking at it.
I think we'll have the same problem unless, when we're measuring the last relay the prioritizer returned (either via an iterator or a list), we don't give the other measurement threads anything until it's done. When it is done, we recalculate priority and start handing out relays to the worker threads again.
I don't think having all the measurement threads stop and wait once in a while for the slowest of them to finish up so we can regenerate a new set of relays to measure is going to significantly impact how long it takes to scan the entire network.
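A minimal sketch of that batch-and-wait loop, using ThreadPoolExecutor and hypothetical function names rather than sbws's actual worker pool:
{{{#!python
from concurrent.futures import ThreadPoolExecutor, wait

def scan_loop(get_prioritized_relays, measure_relay, threads=3):
    """Hand out one prioritized batch at a time; only recalculate
    priority and hand out new relays once every relay in the batch
    has finished, so no relay can be given to two workers."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        while True:
            batch = list(get_prioritized_relays())  # recalc priority here
            futures = [pool.submit(measure_relay, r) for r in batch]
            wait(futures)  # idle threads wait for the slowest measurement
}}}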
I had that branch running for 1 day and checked the dates of the results that do not pass the restrictions. They were hours apart, not minutes apart as happened before.
Compare the timestamps of the results measured yesterday:
2018-10-26 13:35:05,392 DEBUG MainThread v3bwfile.py:349 - results_away_each_other - Results are NOT away from each other in at least 86400s: ['2018-10-25T11:54:17', '2018-10-26T07:45:14']
2018-10-26 13:35:05,395 DEBUG MainThread v3bwfile.py:349 - results_away_each_other - Results are NOT away from each other in at least 86400s: ['2018-10-25T15:47:16', '2018-10-26T04:04:18']
...
To the ones from the days before:
2018-10-26 14:59:16,270 DEBUG MainThread v3bwfile.py:349 - results_away_each_other - Results are NOT away from each other in at least 60s: ['2018-10-25T12:58:22', '2018-10-25T12:58:40']
2018-10-26 14:59:18,271 DEBUG MainThread v3bwfile.py:349 - results_away_each_other - Results are NOT away from each other in at least 60s: ['2018-10-24T12:42:44', '2018-10-24T12:43:18']