To check if an exit can exit to the destination, sbws is trying to resolve the domain locally, which fails in many cases.
Even if it does not fail, the IP obtained won't be the same IP to which the exit will make the HTTP request when the destination uses a CDN.
When the domain resolution is failing, sbws tries to choose another exit without checking that it does not have the BadExit flag and without checking that it can exit to an HTTP port.
sbws will also be faster and produce fewer errors by not resolving domains locally.
> sbws is trying to resolve the domain locally, which fails in many cases.

Really? That seems like a problem. Is the system's resolver having issues? Should sbws cache good results it gets back?

> Even if it does not fail, the IP obtained won't be the same IP to which the exit will make the HTTP request.

This could be the case, but isn't necessarily the case. For simple destinations (simple like a single webserver, as opposed to complex like a CDN), it's most likely not the case that the IPs will be different.

> When the domain resolution is failing, sbws tries to choose another relay that does not have the exit flag. If it is not an exit, it will fail to make an HTTP request.

If DNS fails at line 180, then we go to the else block at line 186, where the "second hop" we pick is an exit. So if we are trying to measure an exit and DNS fails, we treat the exit as a non-exit and find an exit to help measure it. This may not be ideal, but it works.

I don't see what you see: I don't see where sbws chooses a non-exit and then tries to use it as the last hop in a circuit.

Right now I don't like these changes and don't agree with merging them. I will also leave comments on the PR, but acknowledging/fixing just those does not mean I think the code is ready to be merged.
> > sbws is trying to resolve the domain locally, which fails in many cases.
>
> Really? That seems like a problem. Is the system's resolver having issues? Should sbws cache good results it gets back?

If the system resolver can't answer DNS queries, then the operator should install a caching/recursive local resolver. See #28461 (moved). sbws could cache results, but a proper DNS resolver is going to be better at DNS than any sbws code we can write.

> > Even if it does not fail, the IP obtained won't be the same IP to which the exit will make the HTTP request.
>
> This could be the case, but isn't necessarily the case. For simple destinations (simple like a single webserver, as opposed to complex like a CDN), it's most likely not the case that the IPs will be different.

Even if the IP address is different, the content should be the same: only the latency would vary.
> So if we are trying to measure an exit and DNS fails, we treat the exit as a non-exit and find an exit to help measure it. This may not be ideal, but it works.

What happens if sbws gets END_REASON_EXITPOLICY from the exit? (END_REASON_EXITPOLICY is the response when an exit's policy won't allow the IP address that the exit resolved.) This can happen because:

- the exit has just changed its policy, and sbws has an old version
- the exit resolves a different IP address from sbws

I think we should measure the relay as a non-exit in this case.

What happens if the exit can't connect because the connection is refused, or times out? This can happen because:

- the exit is busy
- the exit is censored
- the remote site is down

I think we should measure the relay as a non-exit in this case.

We could cover all these cases by choosing to measure exits as non-exit relays at random. About half the time would work.
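A minimal, self-contained Python sketch of that coin-flip idea (the function and its inputs are hypothetical, for illustration only, not sbws's actual code):

```python
import random

def circuit_positions(relay, relay_is_exit, helper_exits):
    """Decide which circuit position the relay under measurement takes."""
    if relay_is_exit and random.random() < 0.5:
        # Measure the relay in the exit (2nd hop) position; the 1st
        # hop is chosen elsewhere.
        return (None, relay)
    # Treat the relay as a non-exit: it goes in the 1st hop position
    # and a randomly chosen helper exits for it.
    return (relay, random.choice(helper_exits))

# An exit-flagged relay lands in either position about half the time.
print(circuit_positions('relayA', True, ['exitB', 'exitC']))
```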
> What happens if sbws gets END_REASON_EXITPOLICY from the exit? […]
>
> We could cover all these cases by choosing to measure exits as non-exit relays at random. About half the time would work.
I like the idea of recovering from END_REASON_EXITPOLICY better, but the 50% probability idea is waaayyy easier to implement. The former is harder in part because IIRC we currently don't ever look at circuit/stream close reasons, so we should implement doing that in a clean way.
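As a rough sketch of what "looking at stream close reasons" could mean with stem (not sbws's actual code; it assumes a control port listening on 9051), one can subscribe to STREAM events and check the remote close reason:

```python
from stem import EventType, StreamStatus
from stem.control import Controller

def on_stream(event):
    # The exit reports END_REASON_EXITPOLICY as remote_reason
    # 'EXITPOLICY' when its policy refuses the resolved address.
    if event.status in (StreamStatus.FAILED, StreamStatus.CLOSED) \
            and event.remote_reason == 'EXITPOLICY':
        print('exit policy refusal on circuit', event.circ_id)

with Controller.from_port(port=9051) as controller:
    controller.authenticate()
    controller.add_event_listener(on_stream, EventType.STREAM)
    input('listening for stream events; press enter to stop\n')
```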
Sorry I didn't put this ticket in needs_review before, because:

- I thought it was just Travis; then I realized it needed more work.
- I was not expecting any of you to look at this now (there have been tickets last month that nobody reviewed within a week, and then I merged them myself), so I thought I'd solve it fast.

We were checking the BadExit flag, the exit policy, and the Exit flag in different ways in different parts of the code.

The local domain resolver is only needed to check the IP against the policy, which I don't think is needed: the measurement will fail, and the relay will be measured later with a probably different exit.

With the changes I've made, I check only the port, but I check it in all attempts to get an exit. I also found that the port was not being parsed correctly.

The changes solve the problem of having a local domain resolver that fails. I'm running a testing instance, and it is faster and fails fewer circuits.
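A small sketch of scheme-aware port parsing, assuming the destination is configured as a URL (the defaults are the standard HTTP/HTTPS ports; the helper name is illustrative):

```python
from urllib.parse import urlsplit

def destination_port(url):
    """Return the explicit port, or a default based on the scheme."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return 443 if parts.scheme == 'https' else 80

print(destination_port('https://example.com/path'))  # 443
print(destination_port('http://example.com:8080/'))  # 8080
```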
> We were checking the BadExit flag, the exit policy, and the Exit flag in different ways in different parts of the code.

Sounds like you need to refactor the code so you check the flags and exit policy in a single function.

> The local domain resolver is only needed to check the IP against the policy, which I don't think is needed: the measurement will fail, and the relay will be measured later with a probably different exit.

But what happens when the exit that's being measured fails? Do we need to give it a 50% chance of being measured like a non-exit?
> But what happens when the exit that's being measured fails?

If the exit is the one being measured (1st hop), it is treated like any other relay: sbws stores the result as an error, and the relay will get measured in the next prioritization loop, which gives lower priority to relays that failed. A new exit is chosen for every relay that gets measured, so the next time the relay that failed gets measured, it will get an exit with similar bandwidth, chosen at random (see the sketch below).

> Do we need to give it a 50% chance of being measured like a non-exit?

I don't understand what you mean by "non-exit": all relays are measured as non-exits (1st hop); the exits are chosen independently for the 2nd hop.
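A hedged sketch of "an exit with similar bandwidth, chosen at random"; the relay objects, their `fingerprint` and consensus `bandwidth` attributes, and the ratio bounds are all assumptions for illustration:

```python
import random

def pick_exit_helper(relay, exits, low=0.5, high=2.0):
    """Randomly pick an exit whose bandwidth is near the relay's."""
    similar = [
        e for e in exits
        if e.fingerprint != relay.fingerprint
        and low * relay.bandwidth <= e.bandwidth <= high * relay.bandwidth
    ]
    return random.choice(similar) if similar else None
```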
Edited the title and description to match what is happening.
Trac:
Summary changed from "Stop resolving domains locally and stop using non-exits as 2nd hop" to "Stop resolving domains locally and check same flags for the 2nd hop".
Description changed from:

> To check if an exit can exit to the destination, sbws is trying to resolve the domain locally, which fails in many cases.
> Even if it does not fail, the IP obtained won't be the same IP to which the exit will make the HTTP request.
> When the domain resolution is failing, sbws tries to choose another relay that does not have the exit flag. If it is not an exit, it will fail to make an HTTP request.
> sbws will also be faster and produce fewer errors by not resolving domains locally.

to:

> To check if an exit can exit to the destination, sbws is trying to resolve the domain locally, which fails in many cases.
> Even if it does not fail, the IP obtained won't be the same IP to which the exit will make the HTTP request when the destination uses a CDN.
> When the domain resolution is failing, sbws tries to choose another exit without checking that it does not have the BadExit flag and without checking that it can exit to an HTTP port.
> sbws will also be faster and produce fewer errors by not resolving domains locally.
> > When the domain resolution is failing, sbws tries to choose another relay that does not have the exit flag. If it is not an exit, it will fail to make an HTTP request.
>
> If DNS fails at line 180, then we go to the else block at line 186, where the "second hop" we pick is an exit.

I've changed the title to describe what was really happening.
Changes requested in the PR. And now I'm going to repeat myself a bit.

Replying to [comment:6 juga]:

> I'm running a testing instance, and it is faster and fails fewer circuits.

Okay. Let's do away with resolving domain names locally. All we need to do to accomplish this is give stem a None in place of a host when asking it to check exit policies.
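For example, stem's exit policy check takes an optional address; passing None makes it check the port alone, so no local DNS lookup is ever needed (a minimal demonstration with a made-up policy):

```python
from stem.exit_policy import ExitPolicy

policy = ExitPolicy('accept *:80', 'accept *:443', 'reject *:*')

# With address=None, stem only needs the port to answer.
print(policy.can_exit_to(None, 443))  # True
print(policy.can_exit_to(None, 22))   # False
```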
Replying to [comment:10 juga]:

> > Do we need to give it a 50% chance of being measured like a non-exit?
>
> I don't understand what you mean by "non-exit": all relays are measured as non-exits (1st hop); the exits are chosen independently for the 2nd hop.

Right now, when sbws measures an exit, it tries to measure it in the 2nd hop position (ignoring DNS failures, etc.). With your changes, all relays (exits and non-exits) are measured in the 1st hop position. Why?

I think there are two changes that should be made as part of this ticket, and somehow that's exploded into a lot of little buggy revisions:

1. Give stem a None for a host.
2. When picking an exit helper (for measuring a non-exit, or for measuring an exit that can't exit to our destination port, it doesn't matter), be sure to check that its exit policy allows exiting to our destination port.
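A sketch of change 2, under the assumption that candidates are stem router status entries with `flags` and `exit_policy` attributes (the names and filtering details are illustrative, not sbws's actual code):

```python
def usable_exit_helpers(candidates, dest_port):
    """Keep exits that are not BadExit and can exit to dest_port."""
    return [
        relay for relay in candidates
        if 'BadExit' not in relay.flags
        and relay.exit_policy is not None
        # address=None: check the port alone, no DNS resolution.
        and relay.exit_policy.can_exit_to(port=dest_port)
    ]
```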
> > > Do we need to give it a 50% chance of being measured like a non-exit?
> >
> > I don't understand what you mean by "non-exit": all relays are measured as non-exits (1st hop); the exits are chosen independently for the 2nd hop.
>
> Right now, when sbws measures an exit, it tries to measure it in the 2nd hop position (ignoring DNS failures, etc.). With your changes, all relays (exits and non-exits) are measured in the 1st hop position. Why?

It was confusing:

- what `_pick_ideal_second_hop` is doing: it actually picks not a second hop but a helper.
- `is_exit`: it can be interpreted as describing the relay to measure, rather than what the candidates must be.
- the criteria used to pick the 2nd hop (see below).

I'll change those names in another ticket, so I don't get confused again.
> I think there are two changes that should be made as part of this ticket, and somehow that's exploded into a lot of little buggy revisions.

Solution:

- When you feel like reviewing a ticket, put yourself as reviewer; otherwise I'll assume nobody is going to review the ticket.
- I won't put any ticket into needs_review unless it has a reviewer.
- If it doesn't have a reviewer, I might do the refactors that help me solve the ticket in the same branch (in different commits).

If you'd like to continue to review this ticket, please put yourself as reviewer. If not, there is no need to read more.

> When picking an exit helper (for measuring a non-exit, or for measuring an exit that can't exit to our destination port, it doesn't matter), be sure to check that its exit policy allows exiting to our destination port.

So, you're saying: when the helper is in the 2nd hop, check the exit policy. In the current code, why would you check the policy of an exit and the BadExit flag when you use the relay to measure in the 2nd hop, but only check the Exit flag when the helper is in the 2nd position? I think we should check that it allows exiting to our destination port in all cases (and there is no need to check the Exit flag, because if it can exit, it's an exit). Would you stop checking that it does not have the BadExit flag?
> Solution:
>
> - I won't put any ticket into needs_review unless it has a reviewer.
> […]
In Core Tor/Tor, we set a ticket as needs_review when it is ready for review, then assign a reviewer.
You can use a different process for sbws if you like.
But if you want more reviews from people who usually work on Tor, please use the same review process.
We could also send the RESOLVE command, but I think ADDRMAP can only resolve IPv4, and I guess there could be a case in which a policy rejects IPv4 but accepts IPv6.
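A hedged sketch of that alternative, asking tor itself to resolve a name via the control port's RESOLVE command and reading the answer from ADDRMAP events (assumes a control port listening on 9051):

```python
import time

from stem import EventType
from stem.control import Controller

with Controller.from_port(port=9051) as controller:
    controller.authenticate()

    def on_addrmap(event):
        print(event.hostname, '->', event.destination)

    controller.add_event_listener(on_addrmap, EventType.ADDRMAP)
    controller.msg('RESOLVE example.com')
    time.sleep(10)  # the ADDRMAP answer arrives asynchronously
```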