Opened 5 months ago

Closed 3 months ago

#28897 closed defect (fixed)

Stop running twice destination usability tests

Reported by: juga Owned by: juga
Priority: Medium Milestone: sbws: 1.0.x-final
Component: Core Tor/sbws Version: sbws: 1.0.2
Severity: Normal Keywords:
Cc: juga, teor Actual Points:
Parent ID: #28663 Points:
Reviewer: teor Sponsor:

Description

Every time that a relay is measured, it is first checked whether destination is usable, and then run connect_to_destination_over_circuit, which checks again if destination is "usable".
It implies two locks, and a destination can fail in a row due to the relay being unstable.
It'd be better to determine that a destination is not usable, either by trying to make a connection not over Tor, or after a X number of failures in a row.

Child Tickets

Change History (11)

comment:1 Changed 5 months ago by juga

Status: assignedneeds_review

I think this is what is making sbws stalls.
A backtrace [0] in the moment it's stalled shows two threads trying to get the next destination.
It locks https://github.com/torproject/sbws/blob/ee64d76df54ceb3a3c9e1e2a797fd70d68bb0035/sbws/lib/destination.py#L248, then while True, then if it enters in _perform_usability_test, will lock again, then call _is_usable, which call connect_destination_over_circuit which locks again. If it fails then it sleep.
Many times connect_to_destination_over_circuit fails several times in a row, cause it's not reliable to do it through Tor and random relays.
If _usable_dests is not overwritten every time that a destination is chosen, then a lock is only needed in connect_destination_over_circuit.
It would be better to refactor all this code to trace it and debug it easier.
As a temporal solution, the "usability" can be tracked out of the class and without locks.
For now i'd just disable checking for usability, and prioritize the refactoring ticket as soon as 1.0 is done.

[0] https://paste.debian.net/hidden/82fc9ec0/
Edit: add link

Last edited 5 months ago by juga (previous) (diff)

comment:2 Changed 5 months ago by juga

Status: needs_reviewneeds_revision

I thought that there's no need of a big refactor to check if a destination is "usable".
Every time that a destination is used (in connect_destination_over_circuit) and it fails, it can be recorded since there's already a lock.
Putting this in revision to implement this before review.

comment:3 Changed 5 months ago by juga

Status: needs_revisionneeds_review

Implemented
https://github.com/torproject/sbws/pull/320
Edit: add the GH PR

Last edited 5 months ago by juga (previous) (diff)

comment:4 Changed 5 months ago by dgoulet

Reviewer: teor

comment:5 Changed 3 months ago by juga

Reminder: when merging into master, add additional commit or fixup to change circuit_id = cb.build_circuit(circuit_path) into circuit_id, _ = cb.build_circuit(circuit_path) [0], as #28736 changed build_circuit to return a tuple.

[0] https://github.com/torproject/sbws/pull/320/commits/b0aaf5d806276bfa946422d016da8506cc48a209#diff-9714c4c15b47a818a1e2537f9e1cd6f2R36 and https://github.com/torproject/sbws/pull/320/commits/b0aaf5d806276bfa946422d016da8506cc48a209#diff-9714c4c15b47a818a1e2537f9e1cd6f2R57

comment:6 Changed 3 months ago by juga

Merged in a branch with current master and current needs_review tickets and tested a whole loop with the public Tor network

comment:7 in reply to:  5 Changed 3 months ago by teor

Replying to juga:

Reminder: when merging into master, add additional commit or fixup to change circuit_id = cb.build_circuit(circuit_path) into circuit_id, _ = cb.build_circuit(circuit_path) [0], as #28736 changed build_circuit to return a tuple.

[0] https://github.com/torproject/sbws/pull/320/commits/b0aaf5d806276bfa946422d016da8506cc48a209#diff-9714c4c15b47a818a1e2537f9e1cd6f2R36 and https://github.com/torproject/sbws/pull/320/commits/b0aaf5d806276bfa946422d016da8506cc48a209#diff-9714c4c15b47a818a1e2537f9e1cd6f2R57

I reviewed the pull request. I suggested some design changes to help sbws recover better from temporary failures.

Can you please do a rebase or merge before the next review, so I can review the code that will be merged to master?

Thanks!

comment:8 Changed 3 months ago by teor

Status: needs_reviewneeds_revision

comment:9 Changed 3 months ago by juga

Status: needs_revisionneeds_review

I fixed the non-design suggested changes, created #29589 to change the design and created https://github.com/torproject/sbws/pull/339 squashing the fixups, rebasing to master and adapting to the new changes in master.

comment:10 Changed 3 months ago by teor

Status: needs_reviewmerge_ready

Thanks, the fixups and squashed PR seem fine to me.

comment:11 Changed 3 months ago by juga

Resolution: fixed
Status: merge_readyclosed

Thanks, merged.

Note: See TracTickets for help on using tickets.