Opened 6 years ago

Closed 3 years ago

Last modified 3 years ago

#8864 closed defect (fixed)

Hidden service: Suddenly the service does not respond to new connections (INTRODUCE2 cell on intro circ with no corresponding rend_intro_point_t)

Reported by: reiam Owned by:
Priority: Medium Milestone: Tor: 0.2.7.x-final
Component: Core Tor/Tor Version: Tor: 0.2.7
Severity: Blocker Keywords: tor-hs, 023-backport, 025-triaged, 027-triaged-1-in
Cc: Actual Points:
Parent ID: Points: unclear
Reviewer: Sponsor: SponsorR

Description

I am hosting a hidden service with tor on "Windows Server 2008 R2 Standard 64-bit".

The clients can connect to my service without problems after I start tor (my service is available as it should be).

After some time (it varies: it can be 10 min, 30 min, up to 2 hours, but never longer than 2 hours) I get the following warning every few seconds:

[warn] rend_service_introduce(): Bug: Internal error: Got an INTRODUCE2 cell on an intro circ (for service "<censored onion address>") with no corresponding rend_intro_point_t.

(see attached tor logs)

Once I get these warnings, the service is not reachable for new connections, so I have to restart Tor (existing connections seem to keep working).

Tor on the client side shows no error message. The service appears reachable to the client, but the client never gets a response from it; otherwise a message like "hidden service is unavailable (try again later)" would appear, which is NOT the case.

After the restart the service is available again, but after a maximum of 2 hours the same problem appears (the service is not available for new clients).

I have nothing special in my torrc file (only the default file with HiddenServiceDir and HiddenServicePort additionally defined).

Child Tickets

Attachments (3)

tor log.txt (3.4 KB) - added by reiam 6 years ago.
log1
tor log 2.txt (2.8 KB) - added by reiam 6 years ago.
log2
20130515115701.rar (877.0 KB) - added by reiam 6 years ago.
Debug log file


Change History (47)

Changed 6 years ago by reiam

Attachment: tor log.txt added

log1

Changed 6 years ago by reiam

Attachment: tor log 2.txt added

log2

comment:1 Changed 6 years ago by reiam

I switched my server to Ubuntu 10.04, but the issue occurs again.

Is there any torrc config available to prevent this issue?

comment:2 Changed 6 years ago by arma

Keywords: tor-hs added; tor hidden services removed
Milestone: Tor: 0.2.3.x-final

comment:3 Changed 6 years ago by arma

This one is going to be a bear to track down, I suspect.

First step is to get a debug-level log from the Tor running the hidden service. The more the better. You'll probably prefer doing that on the Linux machine. Extra credit if you have debug-level logs of a Tor client (with the clock synchronized) trying to access the hidden service when it generates one of these log lines.
https://www.torproject.org/docs/faq#Logs

Step two, once you've got the logs so we're not destroying what might be our only hint, is to try upgrading to 0.2.4.12-alpha and see if that changes things.

If it doesn't change things, then we should add some more debugging hints to an upcoming 0.2.4 release.

Changed 6 years ago by reiam

Attachment: 20130515115701.rar added

Debug log file

comment:4 Changed 6 years ago by reiam

I attached my debug logfile.

The first time the problem appears is on line 384398.

I'll try the suggested alpha version and let you know if it happens again.

comment:5 in reply to:  4 Changed 6 years ago by reiam

I'll try the suggested alpha version and let you know if it happens again.

The problem also exists in "0.2.4.12-alpha"

comment:6 Changed 6 years ago by rransom

Some relevant lines:

  2122  May 15 11:53:06.000 [info] rend_service_intro_has_opened(): Established circuit 33837 as introduction point for service <servicename>

  2200  May 15 11:53:06.000 [info] rend_service_intro_has_opened(): Established circuit 33835 as introduction point for service <servicename>

  2252  May 15 11:53:07.000 [info] rend_service_intro_established(): Received INTRO_ESTABLISHED cell on circuit 33837 for service <servicename>
  2256  May 15 11:53:07.000 [info] rend_services_introduce(): Giving up on [scrubbed] as intro point for [scrubbed] (circuit disappeared).

  2398  May 15 11:53:07.000 [info] rend_service_intro_established(): Received INTRO_ESTABLISHED cell on circuit 33835 for service <servicename>

On line 379613, the HS finishes launching descriptor-upload directory connections.

On line 382784, the HS finishes uploading one copy of its descriptor. On line 383034, it finishes uploading another.

comment:7 Changed 6 years ago by rransom

find_intro_circuit is crap. It should associate intro circuits with rend_intro_point_t objects in the same way that find_intro_point does.

The if (!node || !intro_circ) { block in rend_services_introduce should be an if (!intro_circ) { block only. If node is NULL and intro_circ is not, the client probably wants to (emit a log message and) close intro_circ.
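The restructuring rransom proposes can be sketched as a small decision function. This is an illustrative sketch only, with hypothetical stand-in types and names (not Tor's actual API): it separates the two failure cases that the combined `if (!node || !intro_circ)` check conflates, so that a live circuit whose node left the consensus gets logged and closed instead of being silently orphaned.

```c
#include <stdio.h>

/* Hypothetical stand-ins for Tor's real logic; names are illustrative. */
typedef enum {
  ACTION_GIVE_UP_INTRO_POINT,  /* circuit gone: drop the rend_intro_point_t */
  ACTION_CLOSE_INTRO_CIRC,     /* node left the consensus: log and close circ */
  ACTION_KEEP                  /* both present: keep using the intro point */
} intro_action_t;

/* Decide what to do with an intro point, per comment:7: test the two
 * failure cases separately instead of one combined
 * `if (!node || !intro_circ)` check. */
static intro_action_t
decide_intro_action(int have_node, int have_intro_circ)
{
  if (!have_intro_circ) {
    /* Circuit disappeared: give up on this intro point entirely. */
    return ACTION_GIVE_UP_INTRO_POINT;
  }
  if (!have_node) {
    /* Node fell out of the consensus but the circuit is still up:
     * emit a log message and close the circuit, rather than leaving a
     * live circ with no corresponding rend_intro_point_t behind. */
    printf("Giving up on intro point: node no longer in consensus.\n");
    return ACTION_CLOSE_INTRO_CIRC;
  }
  return ACTION_KEEP;
}
```

The key point of the sketch is ordering: the missing-circuit case is handled first, so the missing-node branch only ever runs while a usable circuit still exists, which is exactly the state that produces the INTRODUCE2 warning in this ticket.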

comment:8 Changed 5 years ago by nickm

Keywords: 023-backport added
Milestone: Tor: 0.2.3.x-finalTor: 0.2.4.x-final

comment:9 Changed 5 years ago by arma

I closed #9625 as duplicate of this ticket.

comment:10 Changed 5 years ago by nickm

Milestone: Tor: 0.2.4.x-finalTor: 0.2.5.x-final

comment:11 Changed 5 years ago by nickm

Keywords: 025-triaged added

comment:12 Changed 5 years ago by nickm

For 0.2.5, we should investigate find_intro_circuit enough to see whether it's broken and responsible for this bug. If it is, we should fix it in 0.2.5. If not, we can defer this to 0.2.6 or 0.2.???

comment:13 Changed 4 years ago by nickm

Milestone: Tor: 0.2.5.x-finalTor: 0.2.6.x-final

comment:14 Changed 4 years ago by nickm

Milestone: Tor: 0.2.6.x-finalTor: 0.2.???

comment:15 Changed 4 years ago by cypherpunks

I get this too.

comment:16 in reply to:  15 Changed 4 years ago by cypherpunks

Replying to cypherpunks:

I get this too.

Can you get debug-level logs from a recent version? The rar file from last year seems to be damaged.

comment:17 Changed 4 years ago by cypherpunks

Are there still logs needed for this? I have been getting this problem since upgrading from 0.2.4.22 to 0.2.4.23. I am collecting a debug log file; once the issue shows up in it (it's an intermittent problem) I can provide it if it will help expedite a resolution.

comment:18 in reply to:  17 Changed 4 years ago by asn

Some notes from a brief look at 20130515115701.rar:

The Hidden Service establishes 3 introduction circuits (with circid 33837 / 33835 / 58471) in the beginning:

May 15 11:53:06.000 [info] rend_service_intro_has_opened(): Established circuit 33837 as introduction point for service <servicename>
May 15 11:53:06.000 [info] rend_service_intro_has_opened(): Established circuit 33835 as introduction point for service <servicename>
May 15 11:53:07.000 [info] rend_service_intro_has_opened(): Established circuit 58471 as introduction point for service <servicename>

After that, the hidden service doesn't like some of its introduction circuits and gives up on them a few times:

May 15 11:53:07.000 [info] rend_services_introduce(): Giving up on [scrubbed] as intro point for [scrubbed] (circuit disappeared).
May 15 11:53:08.000 [info] rend_services_introduce(): Giving up on [scrubbed] as intro point for [scrubbed] (circuit disappeared).
May 15 11:53:08.000 [info] rend_services_introduce(): Giving up on [scrubbed] as intro point for [scrubbed] (circuit disappeared).
May 15 11:53:08.000 [info] rend_services_introduce(): Giving up on [scrubbed] as intro point for [scrubbed] (circuit disappeared).
May 15 11:53:09.000 [info] rend_services_introduce(): Giving up on [scrubbed] as intro point for [scrubbed] (circuit disappeared).
...

The interesting thing now is that if you look at all the INTRODUCE2 cells that are being handled, there is some sort of pattern. That is, cells that arrive on circuit 33835 are handled properly, but all INTRODUCE2 cells on circuits 33837 or 58471 produce this internal error. You can check this by doing grep -A1 "Received INTRODUCE2" 20130515115701.log.

So what's the problem with those two specific circuits?

comment:19 in reply to:  7 ; Changed 4 years ago by asn

Replying to rransom:

The if (!node || !intro_circ) { block in rend_services_introduce should be an if (!intro_circ) { block only. If node is NULL and intro_circ is not, the client probably wants to (emit a log message and) close intro_circ.

Hm, that might be a good idea. IIUC, in the case where the IP is not in the consensus anymore, the problem is that even though the rend_intro_point_t gets freed, the intro_circ stays up and can be used by clients. So a client with an old consensus will use that node and send an INTRODUCE2 cell down the circuit that will trigger this warning.

This seeeeeems plausible. However, I'm not sure if it's what causes the problem in 20130515115701.rar. The weird thing there is that the log message appears only a few seconds after the IPs get picked.

comment:20 in reply to:  19 Changed 4 years ago by dgoulet

Replying to asn:

Replying to rransom:

The if (!node || !intro_circ) { block in rend_services_introduce should be an if (!intro_circ) { block only. If node is NULL and intro_circ is not, the client probably wants to (emit a log message and) close intro_circ.

Hm, that might be a good idea. IIUC, in the case where the IP is not in the consensus anymore, the problem is that even though the rend_intro_point_t gets freed, the intro_circ stays up and can be used by clients. So a client with an old consensus will use that node and send an INTRODUCE2 cell down the circuit that will trigger this warning.

This seeeeeems plausible. However, I'm not sure if it's what causes the problem in 20130515115701.rar. The weird thing there is that the log message appears only a few seconds after the IPs get picked.

Hrm, seems there is actually almost 3 minutes before seeing the first error message for this bad circuit 58471.

May 15 11:53:07.000 [info] rend_service_intro_has_opened(): Established circuit 58471 as introduction point for service <servicename>
May 15 11:53:07.000 [info] rend_service_intro_established(): Received INTRO_ESTABLISHED cell on circuit 58471 for service <servicename>
May 15 11:56:36.000 [info] rend_service_introduce(): Received INTRODUCE2 cell for service "<servicename>" on circ 58471.
May 15 11:56:36.000 [warn] rend_service_introduce(): Bug: Internal error: Got an INTRODUCE2 cell on an intro circ (for service "<servicename>") with no corresponding rend_intro_point_t.

Fun fact, same goes for the other bad circuit (33837), almost 3 minutes also:

May 15 11:53:06.000 [info] rend_service_intro_has_opened(): Established circuit 33837 as introduction point for service <servicename>
May 15 11:53:07.000 [info] rend_service_intro_established(): Received INTRO_ESTABLISHED cell on circuit 33837 for service <servicename>
May 15 11:56:30.000 [info] rend_service_introduce(): Received INTRODUCE2 cell for service "<servicename>" on circ 33837.
May 15 11:56:30.000 [warn] rend_service_introduce(): Bug: Internal error: Got an INTRODUCE2 cell on an intro circ (for service "<servicename>") with no corresponding rend_intro_point_t.

Where the good valid circuit (33835) sees the first cell ~30 seconds later:

May 15 11:53:06.000 [info] rend_service_intro_has_opened(): Established circuit 33835 as introduction point for service <servicename>
May 15 11:53:07.000 [info] rend_service_intro_established(): Received INTRO_ESTABLISHED cell on circuit 33835 for service <servicename>
May 15 11:53:42.000 [info] rend_service_introduce(): Received INTRODUCE2 cell for service "<servicename>" on circ 33835.

So with a ~3 minute delay, it might be possible that the intro point node was removed from the consensus. I also think that if (!node || !intro_circ) { is wrong here. The node might have disappeared from the list but the intro circuit could still be "valid", thus it should be marked as closed if the node is NULL.

comment:21 Changed 4 years ago by dgoulet

Status: newneeds_review

Ok, I took a stab at a fix. It's the simplest I could make it.

See branch in my personal repo: bug8864_025_v1

comment:22 Changed 4 years ago by nickm

I wonder if it's maybe wrong to close the intro circuit immediately when the node is removed from the router list. Doing so makes it possible to tell exactly when the hidden service fetched a new consensus. Does that hurt us?

comment:23 Changed 4 years ago by dgoulet

Small IRC log on that:

09:54 < dgoulet> nickm: so about #8864, it would leak that there is a new consensus for that HS but you'll only know if you are the IP no? I mean who can learn that the IP circuit is closing down is the HS and the IP or am I missing something?
09:55 < nickm> dgoulet: anybody who knows the IP can try connecting to it and see that the circuit is gone?

09:57 < dgoulet> nickm: yes but how can you differentiate from a node not in the consensus and IP rotation? hrm... I'm answering myself, you fetch the consensus, check if the node is in, connect to the IP, no HS there so you know that the HS has updated
10:00 < dgoulet> nickm: tbh, I'm not sure how bad it is though to know that the relay got a new consensus at a precise time :S
10:03 < nickm> dgoulet: what if you see when the relay downloads the consensus? That may be observable through other means.
10:03 < nickm> dgoulet: I'm not sure I'm 100% right, but I think we should figure this out

comment:24 Changed 4 years ago by asn

Some quick thoughts:

Assuming that the patch goes in, I guess the attack here is "You want to find when an HS fetches a new consensus. Monitor all its IPs, find the first time one of them is not in the consensus. At that time, keep on hitting the HS through that IP, and see when it stops responding. That's the point it probably loaded the new consensus.".

It's plausible I guess, but I can't find any plausible attacks that it allows. Maybe we are worrying that it might enable traffic confirmation attacks ("That HS caused this much traffic at exactly THEN")?
However, one might argue that that's already easy with hidden services anyway, since you can send (and receive) traffic to/from them whenever you want.

More thinking required?

comment:25 Changed 4 years ago by nickm

I'd be okay if we just did this at the latest time we could have fetched a new consensus. How hard/bad would that be? (And what's the harm of using an IP that's not in the consensus anyway?)

comment:26 in reply to:  25 Changed 4 years ago by asn

Replying to nickm:

I'd be okay if we just did this at the latest time we could have fetched a new consensus. How hard/bad would that be?

Hm, when is that I wonder?

Clients download a new consensus every hour, right? If yes, the latest time that an always-on client could have fetched a consensus would be its valid-after time + an hour? Or the time it's published + an hour, if that's a different thing.

(And what's the harm of using an IP that's not in the consensus anyway?)

Hm, good question.

I guess the main issue is that clients with a newer consensus won't be able to use that IP. Or maybe the IP was found to be a bad relay?

(There are not many nice attacks that IPs can launch right now, but maybe that will change in the future when we introduce scalability etc.)

comment:27 Changed 4 years ago by nickm

I guess the main issue is that clients with a newer consensus won't be able to use that IP.

No, they should be able to use it just fine; we include their onion keys in the hsdesc, right?

Or maybe the IP was found to be a bad relay?

That could be an issue.

comment:28 Changed 4 years ago by nickm

Milestone: Tor: 0.2.???Tor: 0.2.7.x-final

These might also be worth looking at in 0.2.7

comment:29 Changed 4 years ago by arma

Keywords: SponsorR added

comment:30 Changed 4 years ago by s7r

Keywords: SponsorR removed

I receive this warn too. Tor 0.2.5.10 on Debian Wheezy 64 bit.
One server hosts 4 hidden services, mirrors and free services, pushing a large amount of HS traffic. I was able to identify the warn message for 2 of the hosted hidden services.

I have 2 other servers hosting other hidden services (one HS each) and get the warn on both of them as well.

When I've seen the warn in the logs, all the HSes were accessible and functioning normally; I did not have to restart Tor, and CPU/RAM usage was also within normal parameters. At least in my case, the warn did not affect HS functionality: all the HSes named by the warn message were still reachable without restarting Tor when I checked (warn message timestamp: 09:11:08.000; I checked availability at 09:52:58, about 41 minutes later).

I don't know what happened to the clients trying to access the HSes exactly at the time of the warn (if there were any besides the one who caused the warning message).

comment:31 Changed 4 years ago by s7r

Keywords: SponsorR added

comment:32 Changed 4 years ago by nickm

Keywords: 027-triaged-1-in added

Marking some tickets as triaged-in for 0.2.7 based on early triage

comment:33 Changed 4 years ago by asn

Hello,

please see branch bug8864 in my repo for a fix to this issue according to comment:7 and subsequent discussion. I decided to go with the shortest fix possible, because that function is quite hairy and improving the code quality would not be easy. We should open a ticket for this. Instead, I decided to write some more docs for that part of the function, to make its behavior more clear.

It's also worth noting that this patch abides by two design decisions that are not clear in the spec. Namely:
a) If a relay is not in the consensus anymore, we close any intro circs to it.
b) If an intro circuit disappears, we free that intro point (this will change in #8239).
This might or might not be worth noting down in rend-spec.txt.

Also, I decided to mark this as a bugfix on 0.2.3.10-alpha, because that's when this log message was introduced.

FWIW, my patch is very similar to David's in comment:21, but splits changes into multiple commits, adds a changes file, and fixes some weirdness.

comment:34 Changed 4 years ago by isabela

Points: unclear
Version: Tor: 0.2.3.25Tor: 0.2.7

comment:35 Changed 4 years ago by nickm

I think the code is fine here, but I have one question: is it perhaps risky to leak, by the closure of an IP circuit, that the HS has just fetched a new consensus?

comment:36 in reply to:  35 ; Changed 4 years ago by dgoulet

Replying to nickm:

I think the code is fine here, but I have one question: is it perhaps risky to leak, by the closure of an IP circuit, that the HS has just fetched a new consensus?

Two elements I think here could notice, client and IP.

The IP doesn't know that this circuit belongs to a specific HS with xyz.onion, so at best it would know that "one of the HSes in the network" maybe has a new consensus (it could also be a restart). But even that isn't entirely true: if the circuit gets closed, it's because the IP is out of the consensus, and since we're looking at it from the IP's side, that would mean an attacker heavily firewalling the IP to keep it out of the consensus, or a modified tor. It could also be crazy network issues, I agree, but...

The client could maybe notice it by establishing a circuit to the IP every second and delaying the cells needed to keep it open, to see whether the HS closes it quickly. In theory you would learn the exact time of the new consensus, but only if the IP was removed from it.

So yeah it seems that it would be possible from the client perspective to learn that as long as the IP is out of the consensus (which could be controlled by an attacker).

comment:37 in reply to:  36 Changed 4 years ago by arma

Replying to dgoulet:

The IP doesn't know that this circuit is a specific HS with xyz.onion so at best it would know "one of the HS in the network" has maybe (could be a restart) a new consensus.

(An intro point can know that it's the intro point for a given hidden service, if it has a list of hidden services, fetches their descriptors, and sees if the intro point key is in any of them. This is a more active adversary but still a reasonable one.)

comment:38 in reply to:  35 Changed 4 years ago by asn

Replying to nickm:

I think the code is fine here, but I have one question: is it perhaps risky to leak, by the closure of an IP circuit, that the HS has just fetched a new consensus?

The alternative behavior to avoid this research question would be:

If we have established an intro point on a node that disappeared from the consensus, keep the circuit open and answer client requests as normal. Discard the circuit as normal (after its lifetime has expired or its max number of introductions has been performed).

I think this should be equivalent to not freeing the rend_intro_point_t in rend_services_introduce() if node is NULL.
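The alternative behavior described above can be sketched as an expiry check that deliberately ignores consensus membership. This is a sketch under stated assumptions: the struct, the field names, and both limits (the 18-hour lifetime and the introduction cap) are hypothetical placeholders, not Tor's actual values or API.

```c
#include <time.h>

/* Illustrative sketch of comment:38's alternative: when an intro point's
 * relay drops out of the consensus, keep the intro circuit (and its
 * rend_intro_point_t) alive, and retire it only via the normal expiry
 * checks. All names and limits here are hypothetical. */
typedef struct {
  time_t created_at;       /* when the intro point was established */
  unsigned introductions;  /* INTRODUCE2 cells handled so far */
  int node_in_consensus;   /* 0 once the relay leaves the consensus */
} intro_point_t;

#define INTRO_LIFETIME_SECS (18 * 60 * 60)  /* example lifetime, not Tor's */
#define INTRO_MAX_USES 16384                /* example cap, not Tor's */

/* Retire the intro point only when its normal lifetime or usage cap is
 * exhausted. node_in_consensus is deliberately not consulted, so circuit
 * closure does not leak when the HS fetched a new consensus. */
static int
intro_point_should_expire(const intro_point_t *ip, time_t now)
{
  if (now - ip->created_at > INTRO_LIFETIME_SECS)
    return 1;
  if (ip->introductions >= INTRO_MAX_USES)
    return 1;
  return 0;
}
```

The design choice this illustrates is the trade-off from comments 22-27: by keying retirement purely to lifetime and usage, an observer watching the intro circuit learns nothing about when the service fetched its consensus.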

comment:39 Changed 3 years ago by dgoulet

Branch under review in #4862 fixes this ticket.
(bug4862_027_02 - commit: 6d127695ea6640963e56857ab289df06d2f213aa)

comment:40 Changed 3 years ago by nickm

Merged #4862.

comment:41 Changed 3 years ago by nickm

Resolution: fixed
Status: needs_reviewclosed

Merged #4862.

comment:42 Changed 3 years ago by dgoulet

Keywords: SponsorR removed
Sponsor: SponsorR

comment:43 Changed 3 years ago by ageisp0lis

Severity: Blocker

Can't wait to see this fixed in the 0.2.7 stable release! We have Nagios monitoring of many hidden services, and we suspect this bug is a cause of weird socket timeouts and false positive alerts.

comment:44 in reply to:  43 Changed 3 years ago by asn

Replying to ageisp0lis:

Can't wait to see this fixed in the 0.2.7 stable release! We have Nagios monitoring of many hidden services, and we suspect this bug is a cause of weird socket timeouts and false positive alerts.

We attempted to fix the bug in 0.2.7, but it seems like it reappeared as #16702.
I'd suggest you use that ticket for your feedback.
