00:02:09 [NOTICE] pathbias_count_build_success(): Bug: Unexpectedly high successes counts (203.784282/203.421875) for guard ...
00:01:52 [NOTICE] pathbias_count_build_success(): Bug: Unexpectedly high successes counts (202.784282/202.421875) for guard ...
00:01:46 [NOTICE] pathbias_count_build_success(): Bug: Unexpectedly high successes counts (201.784282/201.421875) for guard ...
00:01:01 [NOTICE] pathbias_count_build_success(): Bug: Unexpectedly high successes counts (200.784282/200.421875) for guard ...
00:00:58 [NOTICE] pathbias_count_build_success(): Bug: Unexpectedly high successes counts (199.784282/199.421875) for guard ...
00:00:41 [NOTICE] pathbias_count_build_success(): Bug: Unexpectedly high successes counts (198.784282/198.421875) for guard ...
14:54:21 [NOTICE] pathbias_count_build_success(): Bug: Unexpectedly high successes counts (179.568564/158.843750) for guard ...
14:54:15 [NOTICE] pathbias_count_build_success(): Bug: Unexpectedly high successes counts (178.568564/158.843750) for guard ...
14:54:03 [NOTICE] pathbias_count_build_success(): Bug: Unexpectedly high successes counts (177.568564/156.843750) for guard ...
14:54:02 [NOTICE] pathbias_count_build_success(): Bug: Unexpectedly high successes counts (176.568564/155.843750) for guard ...
14:54:02 [NOTICE] pathbias_count_build_success(): Bug: Unexpectedly high successes counts (175.568564/155.843750) for guard ...
14:53:53 [NOTICE] pathbias_count_build_success(): Bug: Unexpectedly high successes counts (174.568564/154.843750) for guard ...
Tor Version: Tor-0.2.4.10-alpha.
This trend repeats with the bias increasing on both sides.
The guard did not change. I have removed it for security reasons. I'll provide it if you need it, however.
FTR, I saw something similar while I was testing the path use bias code in 0.2.4.9-alpha. There was a source of overcounting when scaling our counts in versions prior to 0.2.4.10-alpha that may be what caused your high counts here. It is possible that the miscounting began while you were running those earlier versions.
If you want to start fresh, stop your tor daemon and run 'grep -v "EntryGuardPath" state > state.new ; mv state.new state' in your Tor's data directory, then start your tor daemon again. This will clear out the old path bias data.
If it happens again after that, we might need more general information about how you're using that Tor daemon (as a hidden service server? with a tor controller other than Vidalia? as a relay? with TBB-alpha? with a custom client app/script?) so we can try to reproduce here. But man, the state machine changes in tor 0.2.4.10 really should make it impossible to count the same circuit construction twice, and should have eliminated the error source during scaling from 0.2.4.9 and earlier...
Trac: Status: new to assigned; Owner: N/A to mikeperry
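To make that failure mode concrete, here is a toy illustration (deliberately not the real pathbias scaling code; all numbers and the handling of open circuits are made up) of how scaling attempts and successes by inconsistent factors while some circuits are still open can push the success count above the attempt count:

```c
#include <stdio.h>

/* Toy illustration only -- not the actual Tor scaling code.  Suppose
 * attempts includes circuits that are still open, successes does not,
 * and a buggy scaling step handles the open circuits inconsistently. */
int main(void)
{
  double attempts  = 160.0;  /* includes the open circuits below */
  double successes = 120.0;  /* closed, successful circuits only */
  const double open = 38.0;
  const double factor = 0.5;
  int round;

  for (round = 1; round <= 3; round++) {
    /* Buggy: open circuits are excluded from attempts before scaling,
     * but are (incorrectly) added back into successes as well. */
    attempts  = (attempts - open) * factor + open;
    successes = successes * factor + open;
    printf("round %d: successes=%.1f attempts=%.1f\n",
           round, successes, attempts);
  }
  /* After a couple of rounds, successes exceeds attempts, which is the
   * kind of inconsistency the "Unexpectedly high successes counts"
   * notices above are flagging. */
  return 0;
}
```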
> If you want to start fresh, stop your tor daemon and run 'grep -v "EntryGuardPath" state > state.new ; mv state.new state' in your Tor's data directory, then start your tor daemon again. This will clear out the old path bias data.
Hi mikeperry,
The issue has not shown itself again after following those steps. Here's hoping, eh? :)
Another thing you could try is to see if you are getting any Bug log messages about pathbias. I believe most of them are at info level. You can enable them with a torrc line like:
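For example, something along the lines of the Log line suggested later in this thread:

Log [circ]info notice file /var/lib/tor/tor.log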
mike, I think that if this happens at startup, we should just assume we have an old bad state file, clip the success count to the maximum, and not warn. That way if it happens later while Tor is running, we'll know that there's another bug. Sound plausible?
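Something like the following, say (a hypothetical sketch with a made-up function name, not an actual patch):

```c
/* Hypothetical sketch of the "clip at startup" suggestion -- not an
 * actual Tor patch; the function name is made up.  When loading path
 * bias counts from the state file, clamp successes to attempts instead
 * of warning, so that any warning seen later at runtime points at a
 * live bug rather than at a stale state file written by an older,
 * buggy version. */
static void
clip_pathbias_successes_on_load(double *successes, double attempts)
{
  if (*successes > attempts)
    *successes = attempts;
}
```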
Pushed to mikeperry/bug8235-diagnostic (585c3108e6bddcc7a42750129cfc2392adf7e27c). I also added an additional check to the scaling, just in case there are still more issues hiding there.
Still untested; I am about to start testing it.
This is going to want to get rebased back onto 0.2.4. Also the changes file should probably mention #8235 (moved). Also, "Changes File" is not a cool commit msg. Otherwise appears plausible. Let me know once it's tested.
If you are still doing tor_next branches, this might be a good candidate to throw on there just to see if anything obvious breaks. It's going to be a few days before I'll be able to test this thoroughly anyway.
It's in the latest tor-next. It doesn't merge cleanly into 0.2.4, since it isn't based on 0.2.4. I don't know if there are people running those though. We should dogfood.
2e1dd2d9d0ab11160ab6c45e6923551b50717027 looks fine to me
In c29da58ecb58f6513a4a3051144b285bb3653f3e, you're emitting a log message if the counts become inconsistent after scaling but were consistent before (counts_are_sane && ...); if !counts_are_sane, are we certain to have already warned when that condition first occurred, or can this cause it to silently not log? Change this or say you're sure this is the case, I think.
> 2e1dd2d9d0ab11160ab6c45e6923551b50717027 looks fine to me
> In c29da58ecb58f6513a4a3051144b285bb3653f3e, you're emitting a log message if the counts become inconsistent after scaling but were consistent before (counts_are_sane && ...); if !counts_are_sane, are we certain to have already warned when that condition first occurred, or can this cause it to silently not log? Change this or say you're sure this is the case, I think.
I don't believe the checks in this particular commit will silence any log messages. The changes to entrynodes.c in 2e1dd2d9 might silence some, but they are meant to cover the cases where previous tor versions had overcounting bugs that this one does not.
Should I perhaps promote 2e1dd2d9 to notice level? It might be nice to know if that keeps happening... I can add a version check in there for 0.2.4.10-alpha I guess (assuming that is easy).
Otherwise, I think these log messages will at least get us closer to solving this bug. Of course, I don't claim that any of these commits solve it for sure as of yet. (That's why the changes file is so gimpy and doesn't mention any solved bugs or versions).
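For reference, the shape of that post-scaling check is roughly the following (a toy sketch with made-up names, not the literal c29da58 code):

```c
#include <stdio.h>

/* Toy sketch of the post-scaling sanity check discussed above -- not
 * the literal c29da58 code; the helper and variable names are made up. */
static void
scale_with_check(double *attempts, double *successes, double factor)
{
  /* Remember whether the counts were consistent before scaling, so the
   * notice fires only if scaling itself broke them (the counts_are_sane
   * condition being asked about). */
  const int counts_were_sane = (*successes <= *attempts);

  /* Stand-in for the real scaling step. */
  *attempts *= factor;
  *successes *= factor;

  if (counts_were_sane && *successes > *attempts)
    printf("Bug: Scaling has mangled pathbias counts to %f/%f\n",
           *successes, *attempts);
}
```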
Ok, I pushed a fixup commit that changes 2e1dd2d9 to emit a notice message if the state file was written by 0.2.4.12-alpha or above (on the assumption that this will get merged in time to make 0.2.4.12-alpha).
The idea is that if 0.2.4.12-alpha or above wrote the state file, it should have already clipped the overcounting issue when it started up, and if we still see evidence of overcounting in such newer statefiles, the overcounting bug is still hiding somewhere.
The fixup commit is 2d8f8b2f86aad22e2bbf6d110814feace5c03094 on mikeperry/bug8235-diagnostic.
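Roughly, the idea behind the version gate looks like this (a sketch, not the literal fixup commit; tor_version_as_new_as() is Tor's existing version-comparison helper, while state_version and the function name here are placeholders):

```c
/* Sketch of the version gate described above -- not the literal fixup
 * commit.  A state file written by 0.2.4.12-alpha or later should
 * already have had the old overcounting clipped at startup, so evidence
 * of overcounting in such a file means the bug is still live. */
static int
pathbias_overcount_is_notice_worthy(const char *state_version)
{
  return state_version != NULL &&
         tor_version_as_new_as(state_version, "0.2.4.12-alpha");
}
```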
Ok. We have to remember to update that entrynodes.c version check for whatever 0.2.4.x release we actually backport this into.
I would also prefer sooner rather than later, of course, because these logs will clarify any further bug reports from people who are potentially using state files created before the bug was fixed in 0.2.4.10-alpha (like we suspect for this bug and #8580 (moved)). Otherwise we're just going to have to dup those to this one on the assumption that an old tor version caused their issue.
> Ok. We have to remember to update that entrynodes.c version check for whatever 0.2.4.x release we actually backport this into.
It has been merged into 0.2.4; no need to change the version. See above where I said "merged into maint-0.2.4". I had to rebase it, because it was built on master.
It's not the branch I just merged that is getting backported. The "024-backport" here means "If we find out that there is a fixable bug here, then consider backporting the fix".
Was this supposed to be fixed as of 0.2.4.10? I think that's what was said, but I got lost in all the "commits". Anyway, I experienced this issue with version 0.2.4.11-alpha. I stopped the tor daemon, ran 'grep -v "EntryGuardPath" state > state.new ; mv state.new state' in the tor data directory, and then started the tor daemon. After approximately 19.5 of uptime, I experienced the same message repeatedly in the logs. I am running as both a relay and a hidden service. I believe the service tends to have extremely long lived connections (days), if that makes a difference. In addition, the node accepts socks connections from the local network, and the software that is proxied also tends to have extremely long-lived circuits.
I have just upgraded to 0.2.4.12-alpha via the torproject repo for Debian Wheezy. I did not reset the state file again before 0.2.4.12 started up. Let me know if I should do so now, or just report here if the bug recurs regardless.
Issue began again after 43 minutes of uptime with 0.2.4.12-alpha. Excerpt of log below. I left some of the (seemingly) extraneous log items in there in case they turn out to be relevant.
Apr 19 05:36:45.000 [notice] pathbias_scale_close_rates(): Bug: Scaling has mangled pathbias counts to 164.500000/151.000000 (38/1 open) for guard...
Apr 19 05:36:48.000 [notice] pathbias_count_build_success(): Bug: Unexpectedly high successes counts (165.500000/153.000000) for guard...
Apr 19 05:36:52.000 [notice] pathbias_count_build_success(): Bug: Unexpectedly high successes counts (166.500000/155.000000) for guard...
Apr 19 05:36:53.000 [notice] pathbias_count_build_success(): Bug: Unexpectedly high successes counts (167.500000/155.000000) for guard...
Apr 19 05:36:55.000 [notice] Have tried resolving or connecting to address '[scrubbed]' at 3 different places. Giving up.
Apr 19 05:36:56.000 [notice] pathbias_count_build_success(): Bug: Unexpectedly high successes counts (168.500000/156.000000) for guard...
Apr 19 05:36:57.000 [notice] Closing stream for '[scrubbed].onion': hidden service is unavailable (try again later).
Apr 19 05:37:21.000 [notice] pathbias_count_build_success(): Bug: Unexpectedly high successes counts (169.500000/158.000000) for guard...
Apr 19 05:37:21.000 [notice] pathbias_count_build_success(): Bug: Unexpectedly high successes counts (170.500000/158.000000) for guard...
Apr 19 05:37:27.000 [warn] Didn't recognize cell, but circ stops here! Closing circ.
Apr 19 05:37:27.000 [warn] circuit_receive_relay_cell (forward) failed. Closing.
Apr 19 05:37:36.000 [notice] We tried for 15 seconds to connect to '[scrubbed]' using exit ... at .... Retrying on a new circuit.
Apr 19 05:37:37.000 [notice] Closing stream for '[scrubbed].onion': hidden service is unavailable (try again later).
Apr 19 05:37:46.000 [notice] pathbias_count_build_success(): Bug: Unexpectedly high successes counts (171.500000/159.000000) for guard...
Apr 19 05:37:49.000 [notice] pathbias_count_build_success(): Bug: Unexpectedly high successes counts (172.500000/160.000000) for guard...
Apr 19 05:37:52.000 [notice] pathbias_count_build_success(): Bug: Unexpectedly high successes counts (173.500000/161.000000) for guard...
Great! I was able to use those values to work through pathbias_scale_close_rates() on paper and spot the flaw. It should be fixed in mikeperry/bug8235-fix, but it would be nice if someone who can reproduce this easily could confirm it. Here's how you build that branch:
I just pushed another commit to bump that version check in entrynodes.c to 0.2.4.13-alpha, since we now know that the bug exists in 0.2.4.12-alpha, too (so no need to issue notices at startup for statefiles from then).
70552787aae864c4d3b69f7bc89428b82265ba03 is the new head of mikeperry/bug8235-fix with this commit.
Is there some way to confirm this other than "wait"? If I pasted just the EntryGuardPathBias and EntryGuardPathUseBias lines from the state file, would you be able to tell if the scaling had occurred without producing an error?
Well, assuming you stripped those out again, the one thing I would look for in your state file is pathbias counts with decimal values in them (which would indicate scaling).
Otherwise, you can do something like:
Log [circ]info notice file /var/lib/tor/tor.log
in your torrc to log circuit activity. You can then grep your logs for "Scaled pathbias counts".
However, in both cases this will only tell you if scaling happened. Triggering the overcounting bug itself is a function of how close your success rate is to 100%, and how many open circuits you have at any given time.