#8864 closed defect (fixed)
Hidden service: Suddenly the service does not respond to new connections (INTRODUCE2 cell on intro circ with no corresponding rend_intro_point_t)
Reported by: reiam | Owned by:
Priority: Medium | Milestone: Tor: 0.2.7.x-final
Component: Core Tor/Tor | Version: Tor: 0.2.7
Severity: Blocker | Keywords: tor-hs, 023-backport, 025-triaged, 027-triaged-1-in
Cc: | Actual Points:
Parent ID: | Points: unclear
Reviewer: | Sponsor: SponsorR
Description
I am hosting a hidden service with tor on "Windows Server 2008 R2 Standard 64-bit".
The clients can connect to my service without problems after I start tor (my service is available as it should be).
After some time (it varies: it can be 10 minutes, 30 minutes, up to 2 hours, but never longer than 2 hours) I get the following warning every few seconds:
[warn] rend_service_introduce(): Bug: Internal error: Got an INTRODUCE2 cell on an intro circ (for service "<censored onion address>") with no corresponding rend_intro_point_t.
(see attached tor logs)
When I get these warnings, the service is not reachable for new connections, so I have to restart tor (existing connections seem to keep working).
Tor on the client side shows no error message. The service appears reachable to the client, but the client never gets a response from the service; otherwise a message like "hidden service is unavailable (try again later)" would appear, which is NOT the case.
After the restart the service is available again, but after at most 2 hours the same problem appears again (the service is not available for new clients).
I have nothing special in my torrc file (only the default file, with HiddenServiceDir and HiddenServicePort additionally defined).
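For reference, the hidden service part of the torrc looks something like this (the directory path and port mapping below are placeholders, not my real values):

    HiddenServiceDir /var/lib/tor/my_service/
    HiddenServicePort 80 127.0.0.1:8080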
Child Tickets
Attachments (3)
Change History (47)
Changed 7 years ago by
Attachment: tor log.txt added
comment:1 Changed 7 years ago by
I switched my server to Ubuntu 10.04, but the issue occurs again.
Is there any torrc config available to prevent this issue?
comment:2 Changed 7 years ago by
Keywords: tor-hs added; tor hidden services removed
Milestone: → Tor: 0.2.3.x-final
comment:3 Changed 7 years ago by
This one is going to be a bear to track down, I suspect.
First step is to get a debug-level log from the Tor running the hidden service. The more the better. You'll probably prefer doing that on the Linux machine. Extra credit if you have debug-level logs of a Tor client (with the clock synchronized) trying to access the hidden service when it generates one of these log lines.
https://www.torproject.org/docs/faq#Logs
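(For the debug log, a line like this in the torrc should do it; the file path here is only an example:)

    Log debug file /var/log/tor/debug.log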
Step two, once you've got the logs (so we're not destroying what might be our only hint), is to try upgrading to 0.2.4.12-alpha and see if that changes things.
If it doesn't change things, then we should add some more debugging hints to an upcoming 0.2.4 release.
comment:4 follow-up: 5 Changed 7 years ago by
I attached my debug logfile.
The first time the problem appears is on line 384398.
I'll try the suggested alpha version and let you know if it happens again.
comment:5 Changed 7 years ago by
I'll try the suggested alpha version and let you know if it happens again
The problem also exists in "0.2.4.12-alpha"
comment:6 Changed 7 years ago by
Some relevant lines:
2122 May 15 11:53:06.000 [info] rend_service_intro_has_opened(): Established circuit 33837 as introduction point for service <servicename>
2200 May 15 11:53:06.000 [info] rend_service_intro_has_opened(): Established circuit 33835 as introduction point for service <servicename>
2252 May 15 11:53:07.000 [info] rend_service_intro_established(): Received INTRO_ESTABLISHED cell on circuit 33837 for service <servicename>
2256 May 15 11:53:07.000 [info] rend_services_introduce(): Giving up on [scrubbed] as intro point for [scrubbed] (circuit disappeared).
2398 May 15 11:53:07.000 [info] rend_service_intro_established(): Received INTRO_ESTABLISHED cell on circuit 33835 for service <servicename>
On line 379613, the HS finishes launching descriptor-upload directory connections.
On line 382784, the HS finishes uploading one copy of its descriptor. On line 383034, it finishes uploading another.
comment:7 follow-up: 19 Changed 7 years ago by
find_intro_circuit is crap. It should associate intro circuits with rend_intro_point_t objects in the same way that find_intro_point does.
The if (!node || !intro_circ) { block in rend_services_introduce should be an if (!intro_circ) { block only. If node is NULL and intro_circ is not, the client probably wants to (emit a log message and) close intro_circ.
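In code, the suggested restructuring of that block might look roughly like this (a sketch only; the variable names and the exact close reason are assumptions, and the real code would also have to drop the intro point from the service's list):

    /* Inside the intro-point loop of rend_services_introduce();
     * `intro` is the rend_intro_point_t, `node` and `intro_circ`
     * are the results of the lookups above. */
    if (!intro_circ) {
      /* The circuit is gone, so give up on this intro point. */
      log_info(LD_REND, "Giving up on intro point (circuit disappeared).");
      rend_intro_point_free(intro);
      continue;
    }
    if (!node) {
      /* The relay fell out of the consensus but its circuit is still up:
       * log it and close the circuit rather than leaking it. */
      log_info(LD_REND, "Intro point no longer in consensus; closing circuit.");
      circuit_mark_for_close(TO_CIRCUIT(intro_circ), END_CIRC_REASON_FINISHED);
      rend_intro_point_free(intro);
      continue;
    }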
comment:8 Changed 6 years ago by
Keywords: 023-backport added
Milestone: Tor: 0.2.3.x-final → Tor: 0.2.4.x-final
comment:10 Changed 6 years ago by
Milestone: Tor: 0.2.4.x-final → Tor: 0.2.5.x-final
comment:11 Changed 6 years ago by
Keywords: 025-triaged added
comment:12 Changed 6 years ago by
For 0.2.5, we should investigate find_intro_circuit enough to see whether it's broken and responsible for this bug. If it is, we should fix it in 0.2.5. If not, we can defer this to 0.2.6 or 0.2.???
comment:13 Changed 6 years ago by
Milestone: Tor: 0.2.5.x-final → Tor: 0.2.6.x-final
comment:14 Changed 5 years ago by
Milestone: Tor: 0.2.6.x-final → Tor: 0.2.???
comment:16 Changed 5 years ago by
Replying to cypherpunks:
I get this too.
Can you get debug-level logs from a recent version? The rar file from last year seems to be damaged.
comment:17 follow-up: 18 Changed 5 years ago by
Are there still logs needed for this? I have been getting this problem since upgrading from 0.2.4.22 to 0.2.4.23. I am collecting a debug log file; once the issue shows up in it (it's an intermittent problem) I can provide it, if that will help expedite a resolution.
comment:18 Changed 5 years ago by
Some notes from a brief look at 20130515115701.rar:
The Hidden Service establishes 3 introduction circuits (with circid 33837 / 33835 / 58471) in the beginning:
May 15 11:53:06.000 [info] rend_service_intro_has_opened(): Established circuit 33837 as introduction point for service <servicename>
May 15 11:53:06.000 [info] rend_service_intro_has_opened(): Established circuit 33835 as introduction point for service <servicename>
May 15 11:53:07.000 [info] rend_service_intro_has_opened(): Established circuit 58471 as introduction point for service <servicename>
After that, the hidden service doesn't like some of its introduction circuits and gives up on them a few times:
May 15 11:53:07.000 [info] rend_services_introduce(): Giving up on [scrubbed] as intro point for [scrubbed] (circuit disappeared).
May 15 11:53:08.000 [info] rend_services_introduce(): Giving up on [scrubbed] as intro point for [scrubbed] (circuit disappeared).
May 15 11:53:08.000 [info] rend_services_introduce(): Giving up on [scrubbed] as intro point for [scrubbed] (circuit disappeared).
May 15 11:53:08.000 [info] rend_services_introduce(): Giving up on [scrubbed] as intro point for [scrubbed] (circuit disappeared).
May 15 11:53:09.000 [info] rend_services_introduce(): Giving up on [scrubbed] as intro point for [scrubbed] (circuit disappeared).
...
The interesting thing now is that if you look at all the INTRODUCE2 cells that are being handled, there is some sort of pattern. That is, cells that arrive on circuit 33835 are handled properly, but all INTRODUCE2 cells on circuits 33837 or 58471 produce this internal error. You can check this by doing grep -A1 "Received INTRODUCE2" 20130515115701.log.
So what's the problem with those two specific circuits?
comment:19 follow-up: 20 Changed 5 years ago by
Replying to rransom:
The if (!node || !intro_circ) { block in rend_services_introduce should be an if (!intro_circ) { block only. If node is NULL and intro_circ is not, the client probably wants to (emit a log message and) close intro_circ.
Hm, that might be a good idea. IIUC, in the case where the IP is not in the consensus anymore, the problem is that even though the rend_intro_point_t gets freed, the intro_circ stays up and can be used by clients. So a client with an old consensus will use that node and send an INTRODUCE2 cell down the circuit that will trigger this warning.
This seeeeeems plausible. However, I'm not sure if it's what causes the problem in 20130515115701.rar. The weird thing there is that the log message appears only a few seconds after the IPs get picked.
comment:20 Changed 5 years ago by
Replying to asn:
Replying to rransom:
The if (!node || !intro_circ) { block in rend_services_introduce should be an if (!intro_circ) { block only. If node is NULL and intro_circ is not, the client probably wants to (emit a log message and) close intro_circ.
Hm, that might be a good idea. IIUC, in the case where the IP is not in the consensus anymore, the problem is that even though the rend_intro_point_t gets freed, the intro_circ stays up and can be used by clients. So a client with an old consensus will use that node and send an INTRODUCE2 cell down the circuit that will trigger this warning.
This seeeeeems plausible. However, I'm not sure if it's what causes the problem in 20130515115701.rar. The weird thing there is that the log message appears only a few seconds after the IPs get picked.
Hrm, it seems there are actually almost 3 minutes before the first error message appears for this bad circuit 58471:
May 15 11:53:07.000 [info] rend_service_intro_has_opened(): Established circuit 58471 as introduction point for service <servicename>
May 15 11:53:07.000 [info] rend_service_intro_established(): Received INTRO_ESTABLISHED cell on circuit 58471 for service <servicename>
May 15 11:56:36.000 [info] rend_service_introduce(): Received INTRODUCE2 cell for service "<servicename>" on circ 58471.
May 15 11:56:36.000 [warn] rend_service_introduce(): Bug: Internal error: Got an INTRODUCE2 cell on an intro circ (for service "<servicename>") with no corresponding rend_intro_point_t.
Fun fact: the same goes for the other bad circuit (33837), also almost 3 minutes:
May 15 11:53:06.000 [info] rend_service_intro_has_opened(): Established circuit 33837 as introduction point for service <servicename>
May 15 11:53:07.000 [info] rend_service_intro_established(): Received INTRO_ESTABLISHED cell on circuit 33837 for service <servicename>
May 15 11:56:30.000 [info] rend_service_introduce(): Received INTRODUCE2 cell for service "<servicename>" on circ 33837.
May 15 11:56:30.000 [warn] rend_service_introduce(): Bug: Internal error: Got an INTRODUCE2 cell on an intro circ (for service "<servicename>") with no corresponding rend_intro_point_t.
Whereas the good circuit (33835) sees its first cell ~30 seconds later:
May 15 11:53:06.000 [info] rend_service_intro_has_opened(): Established circuit 33835 as introduction point for service <servicename>
May 15 11:53:07.000 [info] rend_service_intro_established(): Received INTRO_ESTABLISHED cell on circuit 33835 for service <servicename>
May 15 11:53:42.000 [info] rend_service_introduce(): Received INTRODUCE2 cell for service "<servicename>" on circ 33835.
So, with a ~3 minute delay, it's possible that the intro point node was removed from the consensus. I also think that if (!node || !intro_circ) { is wrong here. The node might have disappeared from the list while the intro circuit is still "valid", so the circuit should be marked for close if the node is NULL.
comment:21 Changed 5 years ago by
Status: new → needs_review
Ok, I took a stab at a fix; it's as simple as I could make it.
See branch in my personal repo: bug8864_025_v1
comment:22 Changed 5 years ago by
I wonder if it's maybe wrong to close the intro circuit immediately when the node is removed from the router list. Doing so makes it possible to tell exactly when the hidden service fetched a new consensus. Does that hurt us?
comment:23 Changed 5 years ago by
Small IRC log on that:
09:54 < dgoulet> nickm: so about #8864, it would leak that there is a new consensus for that HS but you'll only know if you are the IP no? I mean who can learn that the IP circuit is closing down is the HS and the IP or am I missing something?
09:55 < nickm> dgoulet: anybody who knows the IP can try connecting to it and see that the circuit is gone?
09:57 < dgoulet> nickm: yes but how can you differentiate from a node not in the consensus and IP rotation? hrm... I'm answering myself, you fetch the consensus, check if the node is in, connect to the IP, no HS there so you know that the HS has updated
10:00 < dgoulet> nickm: tbh, I'm not sure how bad it is though to know that the relay got a new consensus at a precise time :S
10:03 < nickm> dgoulet: what if you see when the relay downloads the consensus? That may be observable through other means.
10:03 < nickm> dgoulet: I'm not sure I'm 100% right, but I think we should figure this out
comment:24 Changed 5 years ago by
Some quick thoughts:
Assuming that the patch goes in, I guess the attack here is: "You want to find when an HS fetches a new consensus. Monitor all its IPs, find the first time one of them is not in the consensus. At that time, keep on hitting the HS through that IP, and see when it stops responding. That's the point at which it probably loaded the new consensus."
It's plausible, I guess, but I can't think of any concrete attacks it enables. Maybe we are worried that it might enable traffic confirmation attacks ("That HS caused this much traffic at exactly THEN")?
However, one might argue that that's already easy with hidden services anyway, since you can send (and receive) traffic to/from them whenever you want.
More thinking required?
comment:25 follow-up: 26 Changed 5 years ago by
I'd be okay if we just did this at the latest time we could fetch a new consensus. How hard/bad would that be? (And what's the harm of using an IP that's not in the consensus anyway?)
comment:26 Changed 5 years ago by
Replying to nickm:
I'd be okay if we just did this at latest time we could fetch a new consensus. How hard/bad would that be?
Hm, when is that I wonder?
Clients download a new consensus every hour, right? If so, the latest time that an always-on client could have fetched a consensus would be its valid-after time + an hour? Or the time it's published + an hour, if that's a different thing.
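(Taking the hourly-download assumption in this comment at face value, the "latest possible fetch time" idea would amount to something like the sketch below; this is only illustrative, not a verified property of the consensus schedule.)

    #include <time.h>

    /* Illustrative only: latest time an always-on client could have fetched
     * a given consensus, under the hourly-download assumption above. */
    static time_t latest_possible_fetch(time_t valid_after) {
      return valid_after + 60 * 60;  /* valid-after plus one hour */
    }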
(And what's the harm of using an IP that's not in the consensus anyway?)
Hm, good question.
I guess the main issue is that clients with a newer consensus won't be able to use that IP. Or maybe the IP was found to be a bad relay?
(There are not many nice attacks that IPs can launch right now, but maybe that will change in the future when we introduce scalability etc.)
comment:27 Changed 5 years ago by
I guess the main issue is that clients with a newer consensus won't be able to use that IP.
No, they should be able to use it just fine; we include their onion keys in the hsdesc, right?
Or maybe the IP was found to be a bad relay?
That could be an issue.
comment:28 Changed 5 years ago by
Milestone: Tor: 0.2.??? → Tor: 0.2.7.x-final
These might also be worth looking at in 0.2.7
comment:29 Changed 5 years ago by
Keywords: SponsorR added
comment:30 Changed 5 years ago by
Keywords: SponsorR removed
I receive this warn too. Tor 0.2.5.10 on Debian Wheezy 64 bit.
One server hosts 4 hidden services (mirrors and free services), pushing a large amount of HS traffic. I was able to identify the warn message for 2 of the hosted hidden services.
I have 2 other servers hosting other hidden services (one HS each) and get the warn on both of them as well.
When I've seen the warn in the logs, all the HSes were accessible and functioning normally; I did not have to restart Tor, and CPU/RAM usage was also within normal parameters. At least in my case, the warn did not affect HS functionality: all HSes named by the warn message were still reachable without restarting Tor when I checked (warn message timestamp 09:11:08.000; I checked availability at 09:52:58, about 41 minutes later).
I don't know what happened to clients trying to access the HSes at exactly the time of the warn (if there were any besides the one that caused the warning message).
comment:31 Changed 5 years ago by
Keywords: SponsorR added
comment:32 Changed 5 years ago by
Keywords: 027-triaged-1-in added
Marking some tickets as triaged-in for 0.2.7 based on early triage
comment:33 Changed 5 years ago by
Hello,
please see branch bug8864 in my repo for a fix to this issue, following comment:7 and the subsequent discussion. I decided to go with the shortest fix possible, because that function is quite hairy and improving its code quality would not be easy; we should open a ticket for that. Instead, I wrote some more docs for that part of the function to make its behavior clearer.
It's also worth noting that this patch abides by two design decisions that are not clear in the spec. Namely:
a) If a relay is not in the consensus anymore, we close any intro circs to it.
b) If an intro circuit disappears, we free that intro point (this will change in #8239).
This might or might not be worth noting down in rend-spec.txt.
Also, I marked this as a bugfix on 0.2.3.10-alpha, because that's when this log message was introduced.
FWIW, my patch is very similar to David's in comment:21, but splits changes into multiple commits, adds a changes file, and fixes some weirdness.
comment:34 Changed 5 years ago by
Points: → unclear
Version: Tor: 0.2.3.25 → Tor: 0.2.7
comment:35 follow-ups: 36 38 Changed 5 years ago by
I think the code is fine here, but I have one question: is it perhaps risky to leak, by the closure of an IP circuit, that the HS has just fetched a new consensus?
comment:36 follow-up: 37 Changed 5 years ago by
Replying to nickm:
I think the code is fine here, but I have one question: is it perhaps risky to leak, by the closure of an IP circuit, that the HS has just fetched a new consensus?
Two parties could notice here, I think: the client and the IP.
The IP doesn't know that this circuit belongs to a specific HS with xyz.onion, so at best it would learn that "one of the HSes in the network" maybe has a new consensus (it could also just be a restart). But even that isn't entirely true: if the circuit gets closed, it's because the IP is out of the consensus, and since we are looking at it from the IP's side, that would require an attacker firewalling heavily to keep the IP out of the consensus, or a modified tor. It could be crazy network issues, I agree, but...
The client could maybe notice it by establishing a circuit to the IP every second and delaying the cells needed to keep it open, to see whether the HS closes it quickly. In theory you would learn the exact time of the new consensus, but only if the IP was removed from it.
So yeah, it seems it would be possible from the client's perspective to learn that, as long as the IP is out of the consensus (which could be controlled by an attacker).
comment:37 Changed 5 years ago by
Replying to dgoulet:
The IP doesn't know that this circuit belongs to a specific HS with xyz.onion, so at best it would learn that "one of the HSes in the network" maybe has a new consensus (it could also just be a restart).
(An intro point can know that it's the intro point for a given hidden service, if it has a list of hidden services, fetches their descriptors, and sees if the intro point key is in any of them. This is a more active adversary but still a reasonable one.)
comment:38 Changed 5 years ago by
Replying to nickm:
I think the code is fine here, but I have one question: is it perhaps risky to leak, by the closure of an IP circuit, that the HS has just fetched a new consensus?
The alternative behavior to avoid this research question would be:
If we have established an intro point on a node that has disappeared from the consensus, keep the circuit open and answer client requests as normal. Discard the circuit as usual (after its lifetime has expired or its maximum number of introductions has been performed).
I think this should be equivalent to not freeing the rend_intro_point_t in rend_services_introduce() if node is NULL.
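Roughly, the alternative would look like this (again just a sketch, using the same assumed variable names as in the loop in rend_services_introduce()):

    /* Alternative sketch: only the loss of the circuit itself retires
     * the intro point. */
    if (!intro_circ) {
      log_info(LD_REND, "Giving up on intro point (circuit disappeared).");
      rend_intro_point_free(intro);
      continue;
    }
    /* If node == NULL but intro_circ is still open, do nothing here:
     * the circuit keeps answering INTRODUCE2 cells until it expires or
     * hits its maximum number of introductions. */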
comment:39 Changed 5 years ago by
Branch under review in #4862 fixes this ticket.
(bug4862_027_02, commit 6d127695ea6640963e56857ab289df06d2f213aa)
comment:42 Changed 4 years ago by
Keywords: SponsorR removed
Sponsor: → SponsorR
comment:43 follow-up: 44 Changed 4 years ago by
Severity: → Blocker
Can't wait to see this fixed in the 0.2.7 stable release! We have Nagios monitoring of many hidden services, and we suspect this bug is a cause of weird socket timeouts and false positive alerts.
comment:44 Changed 4 years ago by
Replying to ageisp0lis:
Can't wait to see this fixed in the 0.2.7 stable release! We have Nagios monitoring of many hidden services, and we suspect this bug is a cause of weird socket timeouts and false positive alerts.
We attempted to fix the bug in 0.2.7, but it seems like it reappeared as #16702.
I'd suggest you use that ticket for your feedback.