Some changes made to connection_consider_empty_write_buckets and connection_consider_empty_read_buckets before the following commits are causing infinite loops in Shadow. The infinite loops are fixed by the following two commits.
The latest stable 0.2.4.21 still has infinite loops. I am requesting that those commits get backported to 0.2.4.x to be included in the next 0.2.4.x stable release. They are relatively trivial changes.
Rob, those patches don't apply cleanly to maint-0.2.4. Have you tested 0.2.4 plus these patches? If so, do you have a copy of the patches as you backported them?
If not, I made a branch as "bug11638_024" where I tried to backport the full version of #9731 (moved) from 0.2.5.x as modified by those patches. Will you have a chance to try it out and see if it works for you?
b) we already backported most of this patch but missed these parts.
We actually didn't backport most of 9731; we backported a conservative version in 0.2.4.18-rc where we checked for conn->type == CONN_TYPE_CPUWORKER. The tickets above move the checks to the start of the function, but rely on an earlier commit that made the check into !connection_is_rate_limited(conn).
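For anyone following along, the difference between the two checks can be sketched roughly as below. This is a simplified illustration and not Tor's actual code: the sketch_ names, the struct fields, and the boolean rate_limited flag are all hypothetical stand-ins (the real connection_is_rate_limited() decides this from the connection's state rather than from a stored flag). It only shows the shape of the change: a narrow type test versus a general "is this connection rate limited at all?" guard placed at the top of the function.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only; not Tor's definitions. */
typedef enum {
  SKETCH_CONN_TYPE_OR,
  SKETCH_CONN_TYPE_CPUWORKER
} sketch_conn_type_t;

typedef struct {
  sketch_conn_type_t type;
  bool rate_limited;  /* assumption: a stored flag, to keep the sketch small */
  int read_bucket;    /* remaining read tokens */
  bool reading;       /* whether we are currently reading from this conn */
} sketch_conn_t;

/* Stand-in for the general predicate named in the ticket. */
static bool
sketch_is_rate_limited(const sketch_conn_t *conn)
{
  return conn->rate_limited;
}

/* Conservative variant: only exempt one specific connection type. */
static bool
sketch_skip_conservative(const sketch_conn_t *conn)
{
  return conn->type == SKETCH_CONN_TYPE_CPUWORKER;
}

/* General variant: exempt every connection that is not rate limited. */
static bool
sketch_skip_general(const sketch_conn_t *conn)
{
  return !sketch_is_rate_limited(conn);
}

/* Bucket check with the general guard at the start of the function.
 * Returns true if reading was stopped because the bucket is empty. */
static bool
sketch_consider_empty_read_bucket(sketch_conn_t *conn)
{
  if (sketch_skip_general(conn))
    return false;            /* never block connections without limits */
  if (conn->read_bucket > 0)
    return false;            /* tokens remain, keep reading */
  conn->reading = false;     /* bucket empty, stop reading for now */
  return true;
}
```

With the guard first, a connection that is not rate limited is never stopped because of an empty bucket, whatever its type; the conservative variant gives that guarantee only to the one type it names.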
Rob, those patches don't apply cleanly to maint-0.2.4. Have you tested 0.2.4 plus these patches? If so, do you have a copy of the patches as you backported them?
I did not apply those directly, and do not have a patch. I just noticed that following those commits, things started working again.
If not, I made a branch as "bug11638_024" where I tried to backport the full version of #9731 (moved) from 0.2.5.x as modified by those patches. Will you have a chance to try it out and see if it works for you?
An experiment is running now. If things go well, it should finish in ~6 hours, and I'll post back here with the results.
b) we already backported most of this patch but missed these parts.
We actually didn't backport most of 9731; we backported a conservative version in 0.2.4.18-rc where we checked for conn->type == CONN_TYPE_CPUWORKER. The tickets above move the checks to the start of the function, but rely on an earlier commit that made the check into !connection_is_rate_limited(conn).
Right, I noticed multiple sets of commits as well. I think the first set caused the infinite loops, and the last set fixed them.
If not, I made a branch as "bug11638_024" where I tried to backport the full version of #9731 (moved) from 0.2.5.x as modified by those patches. Will you have a chance to try it out and see if it works for you?
An experiment is running now. If things go well, it should finish in ~6 hours, and I'll post back here with the results.
The experiment failed due to configuration problems. Trying again now.
Nodes are still failing to bootstrap after a couple of quick config tweaks. I won't be able to debug this further until next week. Mahalo.
I did some more testing with Shadow v1.9.2 and Nick's "bug11638_024" Tor branch at commit efab3484e6ea3a799ccf61061450cfc35791ad41 (one before the backported patch), using my minimal Tor topology.
The bad news is that Tor nodes continue to have bootstrapping problems. The good news is that since the problems occur on the minimal network, we can reproduce them in a matter of seconds. I'm not sure if this is Tor's fault or Shadow's fault somehow, but the same Shadow version and minimal Tor configuration works fine on tor-v0.2.5.2-alpha.
Do we care to look into the bootstrapping issues? If so, does this need a new ticket? I have log files that I could use some help debugging.
I talked to Rob about bootstrapping issues at PETS. We didn't get to the bottom of his questions. Rob, yes, sounds like a new ticket for that would be useful once you have details (including the debug logs from startup).
It sounds like we should close this ticket, since 0.2.5.x is what Shadow uses, and nobody else is encountering these problems in 0.2.4.x.
Please reopen if I'm wrong.
Trac: Status: needs_review to closed; Resolution: N/A to wontfix
I talked to Rob about bootstrapping issues at PETS. We didn't get to the bottom of his questions. Rob, yes, sounds like a new ticket for that would be useful once you have details (including the debug logs from startup).
I discovered an --Address config error that was the cause of those problems. After setting the correct address of my node, everything worked swimmingly.
It sounds like we should close this ticket, since 0.2.5.x is what Shadow uses, and nobody else is encountering these problems in 0.2.4.x.
Please reopen if I'm wrong.
This decision means that no one will ever be able to experiment with 0.2.4.x in Shadow. It's true that Shadow uses 0.2.5.x by default these days, but that doesn't mean that no one ever does research with older Tor versions. In fact, I recently told some researchers that they need to port their code from 0.2.4.x to 0.2.5.x because of the bug described in this ticket, and haven't heard from them since...
I'm OK with not fixing it as long as we agree that it's worth more to save the cost of fixing this than to obtain the ability to simulate 0.2.4.x.
Is it okay to say "If you want 0.2.4.x to work with shadow, you need to apply this patch?"
I want 0.2.4.x to work with Shadow, since I've had a fair number of requests for it.
To clarify, this means that the approach of saying "apply this patch if you want to run Tor 0.2.4.x with Shadow" is not working?
I think I misunderstood what you meant by "apply this patch if you want to run Tor 0.2.4.x with Shadow". I originally thought you meant that I need to apply your patch and run more tests to make sure it is working correctly, since my tests from comment 15 above indicated that there were still problems (with your patch applied to 0.2.4.21-ish). I now think you may have meant that you have no plans to merge this into 0.2.4.x, and want Shadow users who need 0.2.4.x-stable to merge this patch manually.
So, now that I think I understand you, I will push back a bit. With my new experiments completing successfully (I also ran a 3600 relay experiment without the infinite loop issue), does this change your mind on whether or not you want to merge your patch into 0.2.4.x? Remember that the Shadow infinite loop symptom is usually caused by busy loops in Tor; although the busy loops are likely not a real performance issue in practice, removing them may still be desirable. Are there parts of your patch that you are uncomfortable merging?
If you are still hesitant to merge, I will respect your decision and tell people to merge your patch now that I know Shadow+Tor works correctly with it.
I'm inclined to leave it out -- Debian et al ship Tor 0.2.4.x now, and the question to ask for backports in this case is "does it merit a security advisory, or is it a major correctness fix?"
Also given that Tor 0.2.5.x will be going stable in not too long, and that it's a pretty short patch, hopefully the Shadow users can handle applying it?
That said, we (or you) could make a tarball with the patch applied, for users who like tarballs but can't run patch. Or we could make a Tor branch, for those who like those?
This is quite clearly a minor fix, and I agree it is simple enough that applying it shouldn't cause problems. I'll create a patch from Nick's branch to 0.2.4.23 (latest stable) and post it on the Shadow ticket. Thanks!
Trac: Resolution: N/A to wontfix; Status: reopened to closed