#13600 (moved) introduced three new execution modes with a separate mode for writing files in the out/ directory. However, a newly written out/ directory may differ from a previous out/ directory even if the status/ directory has not changed. The main (or only?) reason is that documents containing graphs only contain the past 3 days, 1 week, etc. of data ending at the current system time when writing documents. We could change that and write the last known 3 days, 1 week, etc. of data to make the process of writing the out/ directory from data in the status/ directory more deterministic. We should try changing that to see if there are other differences. If it's easy, let's do it.
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items
0
Show closed items
No child items are currently assigned. Use child items to break down this issue into smaller parts.
Linked items
0
Link issues together to show that they're related.
Learn more.
Include exit list entries if they are more than 24 hours older than the last network status consensus containing the relay, rather than 24 hours prior to system time.
Make first_seen_days and last_seen_days parameters independent of system time by using publication time of latest network status instead of current system time. This has the advantage that results don't suddenly change without relays_published or bridges_published changing at the same time.
I already gave this a try a few months ago, and yesterday I went through that code again and improved it quite a bit. I think it's ready for review now. Please take a careful look at my task-16513 branch.
Note that I did not consider the two related thoughts stated above. Those are not directly related to writing the files in out/. Still worth considering, but as part of another (new) ticket.
I'm going to revise my branch based on an idea I had yesterday related to #24831 (moved): let's use the time when we last saw a relay in a consensus (and a bridge in a status) to determine which graphs to produce. This has advantages over other approaches:
It does not depend on system time, which is what we're trying to resolve here.
It does not rely on statistics reported by the relay or bridge, which breaks if a relay or bridge has a broken clock, and which may lead to different documents (bandwidth, weights, etc.) showing somewhat different time periods.
Branch follows, but I'll have to implement and test the idea first.
Please review commit 0de5770 in my task-16513-2 branch. Setting priority to high, because I'd like to base my #24831 (moved) work on this branch (or a revised version of it), and it would be cool to move forward with that while having a possible plan in mind. (Thanks!)
Trac: Status: needs_revision to needs_review Priority: Medium to High
[junit] Testcase: testIgnoreFuture took 0.392 sec [junit] FAILED [junit] expected:<1> but was:<0> [junit] junit.framework.AssertionFailedError: expected:<1> but was:<0> [junit] at org.torproject.onionoo.writer.BandwidthDocumentWriterTest.testIgnoreFuture(BandwidthDocumentWriterTest.java:74)
Uhm, yes. Please find the fixup commit that I just pushed to the same branch. (I changed that test a couple of times, and it looks like it ended up being in a mixed state at the end.)
Shouldn't the new method parameter 'long now' (in various methods) be named differently to reflect the meaning of the milliseconds timestamp? Maybe 'long lastMeasurementMillis' or similar?
How will the data be affected when deploying the fix? The stored data timestamps might be more recent than the last seen value used in the new code. Did you run system tests on a local instance?
This should be clarified before deployment, but the code is merge ready.
Shouldn't the new method parameter 'long now' (in various methods) be named differently to reflect the meaning of the milliseconds timestamp? Maybe 'long lastMeasurementMillis' or similar?
Good point. Let's go with lastSeenMillis which is what we use in NodeStatus, too. I'll change that before merging.
How will the data be affected when deploying the fix? The stored data timestamps might be more recent than the last seen value used in the new code. Did you run system tests on a local instance?
This should be clarified before deployment, but the code is merge ready.
I'll run another local test before merging. Once that's done, let's coordinate deployment as usual.
I renamed now to lastSeenMillis in commit 27794f7, and I realized that we'll have to make similar changes to updating status which I did in commit 7e99b95. Please take another look at least at that second commit.
Right now I'm downloading omeiense's status/ and out/ directories and will run a local Onionoo server for a few hours, just to be really sure that it's doing the right thing.
I renamed now to lastSeenMillis in commit 27794f7, and I realized that we'll have to make similar changes to updating status which I did in commit 7e99b95. Please take another look at least at that second commit.
Good that you thought of this!
When taking another look at the tests, I'd say it's time to start using new DateTime classes. For example:
This is a bit longer, but will make the tests more readable and it will be immediately obvious what timestamp is used in a test. Maybe, start at least with the newly introduced tests here?
Right now I'm downloading omeiense's status/ and out/ directories and will run a local Onionoo server for a few hours, just to be really sure that it's doing the right thing.
Tests and checks pass. Ready for merge when the system test succeeds.
Hmm, or should we replace all those milliseconds in the code with Java 8 date/time constructs all in a single step? I have to admit, I can't believe that that's the most elegant way of converting seven numbers into one number, yet you may be right. But I'd want to read up on Java 8 things first before changing things. (We do have a ticket for that, right?) Happy to add a ISO string what the long number means as a comment next to that line.
System tests have not even begun. I'm still working on downloading the status/ directory from omeiense...
Would you be able to review the last four commits in my updated task-16513-2 branch? It contains several fixes from local tests with omeiense's status/ directory.
I'm currently running another test for the last commit in that branch, and I'm considering a rewrite of the code that produces a graph history, so that we can reduce code duplication at least a bit. I'll see if that last part can be done within reasonable time, and if not, put it on hold again.
Would you be able to review the last four commits in my updated task-16513-2 branch? It contains several fixes from local tests with omeiense's status/ directory.
From skimming the commits:
Could you easily add simple tests for the comparison fix and the one-off-bug? These will document clearly what is intended with the current code.
I'm currently running another test for the last commit in that branch, and I'm considering a rewrite of the code that produces a graph history, so that we can reduce code duplication at least a bit. I'll see if that last part can be done within reasonable time, and if not, put it on hold again.
Would you be able to review the last four commits in my updated task-16513-2 branch? It contains several fixes from local tests with omeiense's status/ directory.
From skimming the commits:
Could you easily add simple tests for the comparison fix and the one-off-bug? These will document clearly what is intended with the current code.
All checks and tests pass; good to start using more java.time classes.
Would it be possible to rename the one-letter variables in BandwidthDocumentWriter; there are other places, too, but let's start slowly.
Merge ready, when all system tests ran/are running fine.
Thanks for checking. I just added one small (but important) fixup commit (c915794) and another big (and also important) refactoring commit (699d826) to my branch. That second commit reduces code duplication by a lot, which I think is an important step towards making this code more maintainable by other people than me in the future.
However, I'm running out of time here. The new code is way better documented than usual, it compiles, checkstyle doesn't have complaints, and all tests pass, including new ones. But I can't spend another day on running system tests, and I do expect the output to be somewhat different from current output. But each difference needs investigation, and maybe the old output was wrong and the new output is right.
iwakeh or maybe irl, can I ask either of you to take over here? Reviewing and testing this code is not a trivial task, and it will almost certainly take longer than a few hours. But I could imagine it's a good way to dive deeper into this part of Onionoo, and at the same time ensure that this code will be more readable.
The list of exit_addresses is one more data item in onionoo's details documents that is depending on the current system time. Depending on the current time the addresses are added or omitted to the stored collection (cf. NodeDetailsStatusUpdater). Could the validAfter time of the latest consensus processed be a replacement for the system time?
The list of exit_addresses is one more data item in onionoo's details documents that is depending on the current system time. Depending on the current time the addresses are added or omitted to the stored collection (cf. NodeDetailsStatusUpdater). Could the validAfter time of the latest consensus processed be a replacement for the system time?
Yes. We should probably do that. I haven't tried it out yet, nor have I looked around for similar places where we're depending on system time there.
However! That's a different issue. It's about making the update step less dependent on things like system time, though we won't be able to make it fully deterministic. Still a good thing to do. But it's different from making the write step deterministic where we're taking a status/ directory as input and writing files to the out/ directory. Should we collect this idea on a new ticket, like, "Make updating status files independent of system time"?
The list of exit_addresses is one more data item in onionoo's details documents that is depending on the current system time. Depending on the current time the addresses are added or omitted to the stored collection (cf. NodeDetailsStatusUpdater). Could the validAfter time of the latest consensus processed be a replacement for the system time?
Yes. We should probably do that. I haven't tried it out yet, nor have I looked around for similar places where we're depending on system time there.
However! That's a different issue. It's about making the update step less dependent on things like system time, though we won't be able to make it fully deterministic. Still a good thing to do. But it's different from making the write step deterministic where we're taking a status/ directory as input and writing files to the out/ directory. Should we collect this idea on a new ticket, like, "Make updating status files independent of system time"?
Oh, I took this ticket as meaning 'Make results from Onionoo deterministic', but writing and updating are different concerns. Will open a new ticket (rather with the meaning I perceived) and check for more issues in the update process while continuing the review.
Raising priority of this ticket even more, because it's currently blocking #24729 (moved) which confuses a lot of relay operators and leads to repeated questions.
The made changes look ok, although the new graph history compiler code is still not making much use the java.time classes. But, changing it raises some questions about the calculations and would take too long for this ticket. Thus, making a note later on the java8 ticket for Onionoo.
The changes here and the upcoming changes in 25002 will make Onionooo distances independent of their local state as far as possible.
The change here leads to identical 'out' directories when run with the same 'status' directory contents. I have a few minor unrelated commits.
Comparing the 'out' dir resulting from Onionoo-4.4-1.8.0 with the resulting 'out' from the new 5.0-1.9.0 can only be done manually. So far, I think those differences are due to the changes here and thus intended, but that's hard to verify. Maybe, make a copy of 'out' and 'status' before deployment to have a rollback option in the unlikely case there are unwanted differences?
Here's a possible improvement: how about we use the last consensus valid-after time or bridge network status publication timestamp as timestamp rather than the last time we saw a specific relay or bridge? I think we called that "Tor time" on #25002 (moved). This would lead to a smaller out/ directory, because we wouldn't write as detailed history files anymore for relays or bridges that are long gone. Does this make sense?
Here's a possible improvement: how about we use the last consensus valid-after time or bridge network status publication timestamp
as timestamp
Which timestamp in particular are you referring to?
rather than the last time we saw a specific relay or bridge? I think we called that "Tor time" on #25002 (moved). This would lead to a smaller out/ directory, because we wouldn't write as detailed history files anymore for relays or bridges that are long gone. Does this make sense?
Regardless, 'Tor time' makes sense, b/c it is a value from the network and not 'added' to the data.
I pushed two commits related to this branch to a branch that is under review for #24729 (moved). Those commits and comments there also ask the question above. Feel free to move that part of the discussion here, but I'm also happy to keep it there.
The two commits are fine. Could we use a more telling name for 'retrieveLastSeenMillis', maybe 'retrieveLatestNetworkUpdateMillis' or else? Also throughout replace 'lastSeenMillis' with something that indicates the usage better. I don't really have a good naming idea though.
Great! Renamed in a separate commit, squashed all fixup/squash commits in that branch, and pushed to master. I'd say this one is safe to be closed now. Closing. Thanks!
Trac: Status: merge_ready to closed Resolution: N/Ato fixed