Opened 20 months ago

Last modified 3 days ago

#21600 assigned defect

Hidden service introduction point retries occur at 1 second intervals

Reported by: teor
Owned by: asn
Priority: Medium
Milestone: Tor: 0.3.6.x-final
Component: Core Tor/Tor
Version: Tor: 0.2.7.2-alpha
Severity: Normal
Keywords: tor-hs, single-onion, prop224, 034-triage-20180328, 034-removed-20180328
Cc:
Actual Points:
Parent ID: #21446
Points: 1
Reviewer:
Sponsor:

Description

Tor will try to reconnect to an introduction point up to 3 times.
But rend_consider_services_intro_points() is called every second, so these retries are used up very quickly, particularly when the connection fails fast, as direct connections sometimes do on single onion services.

It might be more sensible to retry slightly more slowly.

On the other hand, maybe it's good that we fail fast and replace the introduction point.

This behaviour was introduced in commit 1125a4876b4.
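
As a rough illustration of the timing described above, here is a minimal sketch in C. It is a toy model and not Tor code: the function name and its parameters are hypothetical, and it assumes a periodic callback relaunches a failed circuit on the first tick strictly after the failure is seen.

```c
#include <assert.h>

/* Toy model (not Tor code): the first attempt is launched at t = 0, every
 * attempt fails `fail_after` seconds after launch, and a callback that runs
 * every `tick` seconds relaunches the circuit on the first tick strictly
 * after the failure. Returns the time at which the last retry fails. */
static unsigned
seconds_until_retries_exhausted(unsigned max_retries, unsigned fail_after,
                                unsigned tick)
{
  assert(tick > 0);
  unsigned launched_at = 0;
  unsigned failed_at = launched_at + fail_after;
  for (unsigned retry = 0; retry < max_retries; retry++) {
    /* Next callback tick strictly after the failure time. */
    launched_at = (failed_at / tick + 1) * tick;
    failed_at = launched_at + fail_after;
  }
  return failed_at;
}
```

Under these assumptions, 3 retries with instant failures and a 1 second callback are exhausted at t = 3 seconds, while attempts that each take 10 seconds to fail last until t = 43 seconds.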

Change History (26)

comment:1 Changed 20 months ago by dgoulet

Status: new → needs_information

Hrm... failing 3 times in 3 seconds means that every circuit creation failed right away? In that case, "tor" might have more problems... But I think the behavior here should be that we open a circuit and then if it fails like in 10 seconds after, we note down the try and retry a second later. That sounds reasonable to me?

comment:2 in reply to:  1 Changed 20 months ago by teor

Replying to dgoulet:

> Hrm... failing 3 times in 3 seconds means that every circuit creation failed right away? In that case, "tor" might have more problems... But I think the behavior here should be that we open a circuit and then if it fails like in 10 seconds after, we note down the try and retry a second later. That sounds reasonable to me?

Let's reword it to be something like:

When a circuit fails, we retry 10 seconds after we first detect the failure.

Then it's dynamic based on circuit failure time.
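
One way to picture the rule "retry 10 seconds after we first detect the failure" is the sketch below. It is only an illustration: the struct, the function names and the INTRO_RETRY_DELAY constant are hypothetical and do not come from the Tor source.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical constant: the 10 second delay suggested in this comment. */
#define INTRO_RETRY_DELAY 10

/* Hypothetical per-intro-point state, not a Tor structure. */
typedef struct {
  time_t failure_detected_at; /* 0 when no failure is pending */
  unsigned retries;           /* retries launched so far */
} intro_retry_state_t;

/* Record a failure; only the first detection time is kept. */
static void
note_circuit_failure(intro_retry_state_t *ip, time_t now)
{
  assert(ip);
  if (ip->failure_detected_at == 0)
    ip->failure_detected_at = now;
}

/* True once the delay since the first detected failure has elapsed. */
static bool
may_retry(const intro_retry_state_t *ip, time_t now)
{
  assert(ip);
  return ip->failure_detected_at != 0 &&
         now - ip->failure_detected_at >= INTRO_RETRY_DELAY;
}

/* Count the retry and clear the pending failure. */
static void
note_retry_launched(intro_retry_state_t *ip)
{
  assert(ip);
  ip->retries++;
  ip->failure_detected_at = 0;
}
```

A once-per-second callback would then check may_retry() instead of relaunching unconditionally, so the spacing of retries follows the time at which the circuit failed rather than the callback rate.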

comment:3 Changed 20 months ago by teor

Status: needs_information → assigned

comment:4 Changed 20 months ago by teor

See #21621 for notes on the timing of the retries here: I think we should retry after 30 seconds to match the connection timeout (and avoid penalising slow hidden services).

comment:5 Changed 20 months ago by teor

(Oh, and we should randomise each interval between 0.5 and 1.5 times, to avoid thundering herds.)

comment:6 Changed 20 months ago by teor

Let's try that again: the maximum we'll ever get is the circuit timeout, so let's make it a random value in [CircuitTimeout/3, CircuitTimeout].
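
A minimal sketch of picking a delay in [CircuitTimeout/3, CircuitTimeout] might look like the following. The function name is hypothetical, the timeout is assumed to be a whole number of seconds, and rand() is used only to keep the example self-contained; real code would presumably use Tor's own random number source.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not Tor code: return a retry delay in seconds,
 * chosen from the inclusive range [circuit_timeout / 3, circuit_timeout].
 * rand() % span is slightly biased and not cryptographically strong; it
 * stands in for a proper random source in this sketch. */
static int
intro_retry_delay(int circuit_timeout)
{
  assert(circuit_timeout > 0);
  int low = circuit_timeout / 3;         /* integer division rounds down */
  int span = circuit_timeout - low + 1;  /* number of candidate values */
  return low + rand() % span;
}
```

With a 60 second timeout this yields delays between 20 and 60 seconds, so services that fail at the same moment do not all retry together.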

Last edited 20 months ago by teor

comment:7 Changed 19 months ago by dgoulet

Sponsor: SponsorR-can

comment:8 Changed 18 months ago by teor

Owner: teor deleted

I will not have time to do this before the 0.3.1 code freeze.
It would be good if someone else fixed this bug in 0.3.1, because it affects hidden service reliability.

If you defer to 0.3.2, please reassign to me.

comment:9 Changed 17 months ago by dgoulet

Milestone: Tor: 0.3.1.x-final → Tor: 0.3.2.x-final

comment:10 Changed 17 months ago by dgoulet

Owner: set to teor

comment:11 Changed 17 months ago by dgoulet

Keywords: prop224 added

comment:12 Changed 14 months ago by dgoulet

Milestone: Tor: 0.3.2.x-final → Tor: 0.3.3.x-final

Still worth considering!

comment:13 Changed 9 months ago by teor

Milestone: Tor: 0.3.3.x-final → Tor: 0.3.4.x-final

Moving most of my assigned tickets to 0.3.4

comment:14 Changed 8 months ago by teor

Owner: teor deleted

I'm not going to get time to do this in 0.3.4

comment:15 Changed 7 months ago by nickm

Keywords: 034-triage-20180328 added

comment:16 Changed 7 months ago by nickm

Keywords: 034-removed-20180328 added

Per our triage process, these tickets are pending removal from 0.3.4.

comment:17 Changed 6 months ago by nickm

Milestone: Tor: 0.3.4.x-final → Tor: unspecified

These tickets, tagged with 034-removed-*, are no longer in-scope for 0.3.4. We can reconsider any of them, if time permits.

comment:18 Changed 6 months ago by arma

To be clear, this ticket is about the onion service retrying circuits to its already-announced intro points, so it can resume using these intro points, so clients won't be too impacted when e.g. the onion service loses its network connection?

I ask because #25882 seems to be thinking this ticket is about clients who access onion services launching too many requests.

comment:19 Changed 3 months ago by teor

Status: assigned → new

Make everything that is assigned to no-one new again.

comment:20 Changed 6 weeks ago by teor

Keywords: 035-must added
Milestone: Tor: unspecified → Tor: 0.3.5.x-final

Let's look at this again in 0.3.5. In Tor 0.3.4, we made these callbacks happen a few times a second.

comment:21 Changed 3 weeks ago by nickm

Sponsor: SponsorR-can

comment:22 Changed 3 weeks ago by nickm

Keywords: 035-must removed

Worth doing, but a long-deferred ticket can't really be a "must" IMO.

comment:23 Changed 3 weeks ago by nickm

Priority: Medium → High

comment:24 Changed 9 days ago by nickm

Owner: set to asn
Status: new → assigned

comment:25 Changed 3 days ago by asn

Milestone: Tor: 0.3.5.x-final → Tor: 0.3.6.x-final

No time for this in 035. Pushing to 036.

comment:26 Changed 3 days ago by asn

Priority: High → Medium