Opened 14 months ago

Closed 4 months ago

#27471 closed defect (fixed)

HS intermittently fails: Non-fatal assertion failed in send_introduce1

Reported by: tgragnato
Owned by: dgoulet
Priority: Very High
Milestone: Tor: 0.4.2.x-final
Component: Core Tor/Tor
Version: Tor: 0.3.2.1-alpha
Severity: Minor
Keywords: tor-hs, network-team-roadmap-august
Cc:
Actual Points:
Parent ID: #29995
Points: 3
Reviewer: asn
Sponsor: Sponsor27-must

Description

When running 0.3.4.7-rc on Buster from deb.tpo

[WARN] tor_bug_occurred_(): Bug: ../src/or/hs_client.c:559: send_introduce1: Non-fatal assertion !(ip == NULL) failed.
[WARN] Bug: Non-fatal assertion !(ip == NULL) failed in send_introduce1 at ../src/or/hs_client.c:559. Stack trace:
[WARN] Bug:     /usr/bin/tor(log_backtrace+0x43) [0x55edbca537a3]
[WARN] Bug:     /usr/bin/tor(tor_bug_occurred_+0xb9) [0x55edbca6e499]
[WARN] Bug:     /usr/bin/tor(hs_client_send_introduce1+0x232) [0x55edbca31382]
[WARN] Bug:     /usr/bin/tor(connection_ap_handshake_attach_circuit+0x3bd) [0x55edbc9c0f3d]
[WARN] Bug:     /usr/bin/tor(connection_ap_attach_pending+0x178) [0x55edbc9e1e58]
[WARN] Bug:     /usr/bin/tor(hs_client_receive_rendezvous_acked+0x79) [0x55edbca31a89]
[WARN] Bug:     /usr/bin/tor(rend_process_relay_cell+0x293) [0x55edbc94b9b3]
[WARN] Bug:     /usr/bin/tor(+0x788bb) [0x55edbc9428bb]
[WARN] Bug:     /usr/bin/tor(circuit_receive_relay_cell+0x2c4) [0x55edbc9448f4]
[WARN] Bug:     /usr/bin/tor(command_process_cell+0x318) [0x55edbc9c2368]
[WARN] Bug:     /usr/bin/tor(channel_tls_handle_cell+0x4db) [0x55edbc9a3e4b]
[WARN] Bug:     /usr/bin/tor(+0x11e016) [0x55edbc9e8016]
[WARN] Bug:     /usr/bin/tor(connection_handle_read+0xa73) [0x55edbc9decf3]
[WARN] Bug:     /usr/bin/tor(+0x5271e) [0x55edbc91c71e]
[WARN] Bug:     /usr/lib/x86_64-linux-gnu/libevent-2.1.so.6(+0x229ba) [0x7fd26873e9ba]
[WARN] Bug:     /usr/lib/x86_64-linux-gnu/libevent-2.1.so.6(event_base_loop+0x5a7) [0x7fd26873f537]
[WARN] Bug:     /usr/bin/tor(do_main_loop+0x205) [0x55edbc91ea85]
[WARN] Bug:     /usr/bin/tor(tor_run_main+0x1005) [0x55edbc921155]
[WARN] Bug:     /usr/bin/tor(tor_main+0x3a) [0x55edbc9190ea]
[WARN] Bug:     /usr/bin/tor(main+0x19) [0x55edbc918e99]
[WARN] Bug:     /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7fd267a49b17]
[WARN] Bug:     /usr/bin/tor(_start+0x2a) [0x55edbc918eea]

Child Tickets

Change History (49)

comment:1 Changed 14 months ago by dgoulet

Keywords: tor-hs regression? added
Milestone: Tor: 0.3.5.x-final
Priority: Very Low → Very High

Wow... this one is very surprising; I've never seen it in the wild until now...

The big picture of what this means: we got the ACK from the Rendezvous, so we were about to send the INTRODUCE1 cell to the intro point, but, using the HS identifier on the introduction circuit we have, we were unable to find the intro point object, leading to the assert().

Thus, something, somehow, caused our intro point object to be removed, or the intro point authentication key on the circuit is unknown to the service...

comment:2 Changed 14 months ago by dgoulet

One theory right now that seems plausible, but is kind of a corner case:

1) When we look up the intro circuit in connection_ap_handshake_attach_circuit(), we go over all circuits and try to figure out which is the best for this connection. We use circuit_matches_with_rend_stream() to check whether the circuit is acceptable by comparing the .onion requested by the SOCKS connection with the .onion on the intro circuit (hs_ident).

What if we had a set of intro points that aren't used by the service anymore, so the client got a NACK on all of them, then fetched a new descriptor with new intro points, then launched a new intro circuit, and finally, when the RP acked, the lookup function selected an old intro circuit, but still one for the same service?
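If this theory holds, the core of the problem is that circuit selection matches on the service identity alone, so a circuit built from the old descriptor still looks acceptable. A hedged sketch of that matching shape, with stand-in types (the real check, per the comment above, is circuit_matches_with_rend_stream()):

{{{
#include <stdbool.h>
#include <string.h>

/* Stand-in for the circuit's HS identifier. */
typedef struct {
  unsigned char service_pk[32];      /* the .onion identity */
  unsigned char intro_auth_key[32];  /* which intro point the circuit targets */
} hs_ident_sketch_t;

/* The pending stream only knows which .onion it wants, so matching
 * compares the service identity alone.  Nothing validates intro_auth_key
 * against the *current* descriptor, which is how an intro circuit built
 * from an older descriptor can still be selected after a NACK/refetch
 * cycle. */
static bool
circuit_matches_stream_sketch(const unsigned char stream_service_pk[32],
                              const hs_ident_sketch_t *circ_ident)
{
  return memcmp(stream_service_pk, circ_ident->service_pk, 32) == 0;
}
}}}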


comment:3 Changed 14 months ago by dgoulet

Cc: asn added
Owner: set to dgoulet
Status: new → accepted

comment:4 Changed 14 months ago by dgoulet

I'm still unsure why this happened, but it is clear that the RP/IP circuits were established. Once the RP ACKed the request, the authentication key in the IP's hs_ident wasn't the one in the descriptor.

So between the initial requests for that .onion and the ACK from the RP, the descriptor simply changed and the tor client fetched/stored it.

That makes me think that when we store a *new* HS descriptor for A.onion (that is, we already had one in our cache and we are getting a new one), we should probably close every circuit related to the old one? Probably something like going over the intro points in the old one and, for each one that is not in the new one, closing any related circuits.

That way, we don't end up with leftover circuits for the *same* A.onion but with outdated data.
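A hedged sketch of the cleanup idea described above: on storing a replacement descriptor, walk the client intro circuits and mark for close any whose intro point no longer appears in the new descriptor. All types and names below are illustrative stand-ins, not the code from the eventual branch:

{{{
#include <stdbool.h>
#include <string.h>

/* Illustrative stand-ins, not tor's real structures. */
typedef struct ip_node {
  unsigned char auth_key[32];
  struct ip_node *next;
} ip_node_t;
typedef struct { ip_node_t *intro_points; } desc_sketch_t;
typedef struct circ_node {
  unsigned char intro_auth_key[32];
  bool marked_for_close;
  struct circ_node *next;
} circ_node_t;

static bool
desc_has_auth_key(const desc_sketch_t *desc, const unsigned char key[32])
{
  for (const ip_node_t *ip = desc->intro_points; ip; ip = ip->next) {
    if (memcmp(ip->auth_key, key, 32) == 0)
      return true;
  }
  return false;
}

/* On replacing the cached descriptor for A.onion, close every client
 * intro circuit whose intro point is absent from the new descriptor,
 * so no circuit with outdated data survives. */
static void
close_obsolete_intro_circs(circ_node_t *client_intro_circs,
                           const desc_sketch_t *new_desc)
{
  for (circ_node_t *c = client_intro_circs; c; c = c->next) {
    if (!desc_has_auth_key(new_desc, c->intro_auth_key))
      c->marked_for_close = true;
  }
}
}}}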

comment:5 in reply to:  4 Changed 14 months ago by asn

Replying to dgoulet:

I'm still unsure why this happened, but it is clear that the RP/IP circuits were established. Once the RP ACKed the request, the authentication key in the IP's hs_ident wasn't the one in the descriptor.

So between the initial requests for that .onion and the ACK from the RP, the descriptor simply changed and the tor client fetched/stored it.

Hm, plausible.

That makes me think that when we store a *new* HS descriptor for A.onion (that is, we already had one in our cache and we are getting a new one), we should probably close every circuit related to the old one? Probably something like going over the intro points in the old one and, for each one that is not in the new one, closing any related circuits.

That way, we don't end up with leftover circuits for the *same* A.onion but with outdated data.

Hm, how does v2 handle this situation?

Also, the above fix seems plausible, but it requires real engineering to get right. Another approach would be to handle this error gracefully and let the Tor client recover on its own by trying the next intro point (from the new desc this time)?

comment:6 in reply to:  4 Changed 14 months ago by asn

Replying to dgoulet:

I'm still unsure why this happened, but it is clear that the RP/IP circuits were established. Once the RP ACKed the request, the authentication key in the IP's hs_ident wasn't the one in the descriptor.

So between the initial requests for that .onion and the ACK from the RP, the descriptor simply changed and the tor client fetched/stored it.

That makes me think that when we store a *new* HS descriptor for A.onion (that is, we already had one in our cache and we are getting a new one), we should probably close every circuit related to the old one? Probably something like going over the intro points in the old one and, for each one that is not in the new one, closing any related circuits.

That way, we don't end up with leftover circuits for the *same* A.onion but with outdated data.

I'm good with the above "close obsolete circuits" fix as long as it's not complicated (ideally just a few simple lines of code). If it ends up being complicated code, we should consider other approaches, or maybe try to reproduce it.

comment:7 Changed 14 months ago by nickm

Keywords: 035-must added

Add the 035-must tag to some assertion failures, hangs, ci-blockers, etc.

comment:8 Changed 14 months ago by nickm

Mark all 035-must tickets as "very high"

comment:9 Changed 13 months ago by asan

I confirm the problem. I also got it on 0.3.4.8, Linux 64:

Oct 03 15:12:59.000 [warn] tor_bug_occurred_(): Bug: ../src/or/hs_client.c:559: send_introduce1: Non-fatal assertion !(ip == NULL) failed. (on Tor 0.3.4.8 )
Oct 03 15:12:59.000 [warn] Bug: Non-fatal assertion !(ip == NULL) failed in send_introduce1 at ../src/or/hs_client.c:559. Stack trace: (on Tor 0.3.4.8 )
Oct 03 15:12:59.000 [warn] Bug:     /usr/bin/tor(log_backtrace+0x42) [0x55f6b1b0a3e2] (on Tor 0.3.4.8 )
Oct 03 15:12:59.000 [warn] Bug:     /usr/bin/tor(tor_bug_occurred_+0xb7) [0x55f6b1b251d7] (on Tor 0.3.4.8 )
Oct 03 15:12:59.000 [warn] Bug:     /usr/bin/tor(hs_client_send_introduce1+0x232) [0x55f6b1ae7972] (on Tor 0.3.4.8 )
Oct 03 15:12:59.000 [warn] Bug:     /usr/bin/tor(connection_ap_handshake_attach_circuit+0x33c) [0x55f6b1a78f9c] (on Tor 0.3.4.8 )
Oct 03 15:12:59.000 [warn] Bug:     /usr/bin/tor(connection_ap_attach_pending+0x1a0) [0x55f6b1a999d0] (on Tor 0.3.4.8 )
Oct 03 15:12:59.000 [warn] Bug:     /usr/bin/tor(hs_client_receive_rendezvous_acked+0x79) [0x55f6b1ae80b9] (on Tor 0.3.4.8 )
Oct 03 15:12:59.000 [warn] Bug:     /usr/bin/tor(rend_process_relay_cell+0x28b) [0x55f6b1a0416b] (on Tor 0.3.4.8 )
Oct 03 15:12:59.000 [warn] Bug:     /usr/bin/tor(+0x79097) [0x55f6b19fb097] (on Tor 0.3.4.8 )
Oct 03 15:12:59.000 [warn] Bug:     /usr/bin/tor(circuit_receive_relay_cell+0x311) [0x55f6b19fd0b1] (on Tor 0.3.4.8 )
Oct 03 15:12:59.000 [warn] Bug:     /usr/bin/tor(command_process_cell+0x280) [0x55f6b1a7a420] (on Tor 0.3.4.8 )
Oct 03 15:12:59.000 [warn] Bug:     /usr/bin/tor(channel_tls_handle_cell+0x273) [0x55f6b1a5c213] (on Tor 0.3.4.8 )
Oct 03 15:12:59.000 [warn] Bug:     /usr/bin/tor(+0x11db2c) [0x55f6b1a9fb2c] (on Tor 0.3.4.8 )
Oct 03 15:12:59.000 [warn] Bug:     /usr/bin/tor(connection_handle_read+0x892) [0x55f6b1a96542] (on Tor 0.3.4.8 )
Oct 03 15:12:59.000 [warn] Bug:     /usr/bin/tor(+0x53a61) [0x55f6b19d5a61] (on Tor 0.3.4.8 )
Oct 03 15:12:59.000 [warn] Bug:     /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5(event_base_loop+0x7fc) [0x7efce43a43dc] (on Tor 0.3.4.8 )
Oct 03 15:12:59.000 [warn] Bug:     /usr/bin/tor(do_main_loop+0x25f) [0x55f6b19d7e0f] (on Tor 0.3.4.8 )
Oct 03 15:12:59.000 [warn] Bug:     /usr/bin/tor(tor_run_main+0x1165) [0x55f6b19da2e5] (on Tor 0.3.4.8 )
Oct 03 15:12:59.000 [warn] Bug:     /usr/bin/tor(tor_main+0x3a) [0x55f6b19d232a] (on Tor 0.3.4.8 )
Oct 03 15:12:59.000 [warn] Bug:     /usr/bin/tor(main+0x19) [0x55f6b19d2099] (on Tor 0.3.4.8 )
Oct 03 15:12:59.000 [warn] Bug:     /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7efce2d28b45] (on Tor 0.3.4.8 )
Oct 03 15:12:59.000 [warn] Bug:     /usr/bin/tor(+0x500e9) [0x55f6b19d20e9] (on Tor 0.3.4.8 )

It happened when I tried to reload a browser tab with an onion v3 service that I strongly suspect is being blocked or censored by any means available, including sophisticated network attacks.

comment:10 Changed 13 months ago by dgoulet

Status: accepted → needs_review

This adds support for closing client introduction circuits when a new descriptor replaces an old one, so we don't end up with unusable circuits leading to what I think the BUG() in this ticket is showing us.

Branch: ticket27471_035_01
PR: https://github.com/torproject/tor/pull/400

comment:11 Changed 13 months ago by dgoulet

Reviewer: asn

comment:12 in reply to:  10 ; Changed 13 months ago by asn

Status: needs_review → needs_revision

Replying to dgoulet:

This adds support for closing client introduction circuits when a new descriptor replaces an old one, so we don't end up with unusable circuits leading to what I think the BUG() in this ticket is showing us.

Branch: ticket27471_035_01
PR: https://github.com/torproject/tor/pull/400

Not a huge fan of this patch. I feel like I don't understand enough to ACK or NACK it.

The way I see it is that this adds 100 non-trivial LoC for a very unlikely edge case that we think is causing the issue. Basically, the assumption is that we just completed a rendezvous circuit, and between building the circuit and sending an INTRODUCE1 we happened to have fetched a new HS descriptor. Sounds plausible, but I don't quite understand why this edge case would happen to two people so quickly; it seems pretty rare. What do we think made the client fetch a new HS descriptor at that exact time?

Another thing I don't like here is that we are adding a whole new "feature" of closing these useless connections that usually don't exist, so all this new code will be unused 99.9% of the time.

Instead of that, I would try to handle the BUG() in question more gracefully: in the edge case where we can't find an ip object for the circuit, we close the intro circuit and establish a new one. This seems like a more natural way to do it, with fewer lines of code. The useless introduction circuit would then just time out at some point and become a measurement circuit or something.
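A hedged sketch of this graceful-handling alternative, with illustrative types and return codes (tor's real error handling differs in detail):

{{{
#include <stdbool.h>

/* Illustrative stand-ins. */
typedef struct { bool marked_for_close; bool has_ip; } intro_circ_sketch_t;
enum { HS_OK = 0, HS_ERR_TRANSIENT = -1 };

/* Stand-in for the descriptor lookup that can come back empty. */
static bool
lookup_ip_for_circ(const intro_circ_sketch_t *circ)
{
  return circ->has_ip;
}

/* Graceful variant of the BUG() site: instead of asserting, drop the
 * stale circuit and signal a transient error, so the caller parks the
 * stream and a fresh intro circuit is launched from the current
 * descriptor. */
static int
send_introduce1_graceful(intro_circ_sketch_t *intro_circ)
{
  if (!lookup_ip_for_circ(intro_circ)) {
    intro_circ->marked_for_close = true;
    return HS_ERR_TRANSIENT;
  }
  /* ... send the INTRODUCE1 cell as usual ... */
  return HS_OK;
}
}}}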

comment:13 in reply to:  12 Changed 13 months ago by dgoulet

Replying to asn:

Replying to dgoulet:

This adds support for closing client introduction circuits when a new descriptor replaces an old one, so we don't end up with unusable circuits leading to what I think the BUG() in this ticket is showing us.

Branch: ticket27471_035_01
PR: https://github.com/torproject/tor/pull/400

Not a huge fan of this patch. I feel like I don't understand enough to ACK or NACK it.

The way I see it is that this adds 100 non-trivial LoC for a very unlikely edge case that we think is causing the issue. Basically, the assumption is that we just completed a rendezvous circuit, and between building the circuit and sending an INTRODUCE1 we happened to have fetched a new HS descriptor. Sounds plausible, but I don't quite understand why this edge case would happen to two people so quickly; it seems pretty rare. What do we think made the client fetch a new HS descriptor at that exact time?

Right. Well, the fact that more people are hitting it might mean it's not that much of an edge case, and that HSv3 is simply being used more.

Second thing: that code doesn't add "non-trivial" lines imo... The majority of the code added (circuit_get_next_client_intro_circ()) mirrors a known function that we use for many things; this one is specifically for client intro circuits. That pattern is extremely well tested over time.

Another thing I don't like here is that we are adding a whole new "feature" of closing these useless connections that usually don't exist, so all this new code will be unused 99.9% of the time.

I don't agree. It is code used every time a client replaces a descriptor in its cache. I really doubt this will be unused... Considering that a service changes its descriptor every 3 hours, if the RP connection between service<->client is simply closed, this situation will happen.

Instead of that, I would try to handle the BUG() in question more gracefully: in the edge case where we can't find an ip object for the circuit, we close the intro circuit and establish a new one. This seems like a more natural way to do it, with fewer lines of code. The useless introduction circuit would then just time out at some point and become a measurement circuit or something.

Doing this would go like:

The caller (connection_ap_handshake_attach_circuit()) of hs_client_send_introduce1() handles 3 return codes: success, transient error, and permanent error.

The first is easy. The second is when the HS subsystem took some action to fix the problem, so the stream can go back into "wait mode". The third one will close the SOCKS conn.

In this case, we would need to return a transient error: once we realize we can't send the INTRO1 cell, we will want to reextend the failing intro circ to a new IP and wait. Not too bad. The bad circuit would be reextended, meaning it won't be looked up again.
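A hedged sketch of that caller-side contract, with illustrative names (the real caller is connection_ap_handshake_attach_circuit(), whose actual return-code handling differs in detail):

{{{
/* Illustrative three-way result: success, transient error (the HS
 * subsystem recovers on its own), permanent error. */
typedef enum { SEND_OK, SEND_TRANSIENT_ERR, SEND_PERM_ERR } send_result_t;

enum { STREAM_ATTACHED, STREAM_WAITING, STREAM_CLOSED };
typedef struct { int state; } stream_sketch_t;

/* Only the permanent error kills the SOCKS connection; a transient
 * error parks the stream in "wait mode" while the intro circuit is
 * reextended to a new intro point. */
static int
handle_introduce1_result(stream_sketch_t *stream, send_result_t res)
{
  switch (res) {
    case SEND_OK:
      stream->state = STREAM_ATTACHED;  /* wait for the rest of the dance */
      return 0;
    case SEND_TRANSIENT_ERR:
      stream->state = STREAM_WAITING;   /* subsystem is fixing the problem */
      return 0;
    case SEND_PERM_ERR:
    default:
      stream->state = STREAM_CLOSED;    /* close the SOCKS conn */
      return -1;
  }
}
}}}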

comment:14 Changed 13 months ago by asn

ACK on the above.

To simplify your patch, would it be possible to unify circuit_get_next_client_intro_circ() and circuit_get_next_service_intro_circ() to reduce duplicate code?

If we do that I think we can get this merged and see if this thing ever appears again.

comment:15 in reply to:  14 ; Changed 13 months ago by dgoulet

Replying to asn:

ACK on the above.

To simplify your patch, would it be possible to unify circuit_get_next_client_intro_circ() and circuit_get_next_service_intro_circ() to reduce duplicate code?

Doable.

If we do that I think we can get this merged and see if this thing ever appears again.

Hmmmm wait, do you want the "close all circuits" approach or the "reextend on error" one?

comment:16 in reply to:  15 Changed 13 months ago by asn

Replying to dgoulet:

Replying to asn:

ACK on the above.

To simplify your patch, would it be possible to unify circuit_get_next_client_intro_circ() and circuit_get_next_service_intro_circ() to reduce duplicate code?

Doable.

If we do that I think we can get this merged and see if this thing ever appears again.

Hmmmm wait, do you want the "close all circuits" approach or the "reextend on error" one?

Please check my branch ticket27471_035_01 and PR:
https://github.com/torproject/tor/pull/414

I went with your approach of closing all open circuits, plus I added a fixup that unifies the two "get_intro_circ" functions into one.

If you like it, let's squash it and put it in merge_ready. If you don't like it, we can put your branch into merge_ready.
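For illustration, unifying two near-identical list walkers usually means one helper parameterized by circuit purpose. A hedged sketch with stand-in types (tor's real functions walk the global circuit list and use CIRCUIT_PURPOSE_* constants):

{{{
#include <stddef.h>

/* Illustrative purposes; tor uses CIRCUIT_PURPOSE_* constants. */
typedef enum { PURPOSE_CLIENT_INTRO, PURPOSE_SERVICE_INTRO } purpose_sketch_t;

typedef struct circ_entry {
  purpose_sketch_t purpose;
  struct circ_entry *next;
} circ_entry_t;

/* One generic walker replacing two near-identical copies: resume the
 * walk after `start` (or from `head` when start is NULL) and return the
 * next circuit with the wanted purpose. */
static circ_entry_t *
circuit_get_next_intro_circ_sketch(circ_entry_t *head, circ_entry_t *start,
                                   purpose_sketch_t wanted)
{
  circ_entry_t *c = start ? start->next : head;
  for (; c != NULL; c = c->next) {
    if (c->purpose == wanted)
      return c;
  }
  return NULL;
}
}}}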

comment:17 Changed 13 months ago by nickm

Status: needs_revision → needs_review

comment:18 Changed 13 months ago by dgoulet

Status: needs_review → merge_ready

I'm happy with this! I thought about it a bit more, and I think we should close the circuits here instead of reextending on error. I think we'll fix more issues over time this way, instead of dealing with errors because we have unusable circuits lying around.

I thought about the attacker problem here, where someone induces a client to fetch a descriptor (let's say via a malicious web page) and then somehow notices the close of the intro circuits; because the attacker would know the descriptor has new intro points, it could maybe deduce that it was in fact that .onion and thus that "I'm its Guard node".

However, I doubt this is a practical attack considering the narrow window in which we replace a descriptor while we have an intro circuit in flight. It can happen once at a client, but I doubt it can happen often enough to lead to an information leak.

comment:19 Changed 13 months ago by nickm

Hi! Could one of you take care of squashing this against maint-0.3.5? I just tried, but it didn't squash cleanly.

comment:20 in reply to:  19 Changed 13 months ago by dgoulet

Replying to nickm:

Hi! Could one of you take care of squashing this against maint-0.3.5? I just tried, but it didn't squash cleanly.

Branch: ticket27471_035_02

comment:21 Changed 13 months ago by nickm

Keywords: 025-backport added; 035-must removed

thanks! Merged that to 0.3.6, and leaving here for possible backport assuming nothing goes wrong. :)

comment:22 Changed 13 months ago by nickm

Keywords: 035-backport added; 025-backport removed

comment:23 Changed 13 months ago by asan

It happened again on Linux 0.3.4.8:

Oct 18 14:10:09.000 [warn] tor_bug_occurred_(): Bug: ../src/or/hs_client.c:559: send_introduce1: Non-fatal assertion !(ip == NULL) failed. (on Tor 0.3.4.8 )
Oct 18 14:10:09.000 [warn] Bug: Non-fatal assertion !(ip == NULL) failed in send_introduce1 at ../src/or/hs_client.c:559. Stack trace: (on Tor 0.3.4.8 )
Oct 18 14:10:09.000 [warn] Bug:     /usr/bin/tor(log_backtrace+0x42) [0x558198c723e2] (on Tor 0.3.4.8 )
Oct 18 14:10:09.000 [warn] Bug:     /usr/bin/tor(tor_bug_occurred_+0xb7) [0x558198c8d1d7] (on Tor 0.3.4.8 )
Oct 18 14:10:09.000 [warn] Bug:     /usr/bin/tor(hs_client_send_introduce1+0x232) [0x558198c4f972] (on Tor 0.3.4.8 )
Oct 18 14:10:09.000 [warn] Bug:     /usr/bin/tor(connection_ap_handshake_attach_circuit+0x33c) [0x558198be0f9c] (on Tor 0.3.4.8 )
Oct 18 14:10:09.000 [warn] Bug:     /usr/bin/tor(connection_ap_attach_pending+0x1a0) [0x558198c019d0] (on Tor 0.3.4.8 )
Oct 18 14:10:09.000 [warn] Bug:     /usr/bin/tor(hs_client_receive_rendezvous_acked+0x79) [0x558198c500b9] (on Tor 0.3.4.8 )
Oct 18 14:10:09.000 [warn] Bug:     /usr/bin/tor(rend_process_relay_cell+0x28b) [0x558198b6c16b] (on Tor 0.3.4.8 )
Oct 18 14:10:09.000 [warn] Bug:     /usr/bin/tor(+0x79097) [0x558198b63097] (on Tor 0.3.4.8 )
Oct 18 14:10:09.000 [warn] Bug:     /usr/bin/tor(circuit_receive_relay_cell+0x311) [0x558198b650b1] (on Tor 0.3.4.8 )
Oct 18 14:10:09.000 [warn] Bug:     /usr/bin/tor(command_process_cell+0x280) [0x558198be2420] (on Tor 0.3.4.8 )
Oct 18 14:10:09.000 [warn] Bug:     /usr/bin/tor(channel_tls_handle_cell+0x273) [0x558198bc4213] (on Tor 0.3.4.8 )
Oct 18 14:10:09.000 [warn] Bug:     /usr/bin/tor(+0x11db2c) [0x558198c07b2c] (on Tor 0.3.4.8 )
Oct 18 14:10:09.000 [warn] Bug:     /usr/bin/tor(connection_handle_read+0x892) [0x558198bfe542] (on Tor 0.3.4.8 )
Oct 18 14:10:09.000 [warn] Bug:     /usr/bin/tor(+0x53a61) [0x558198b3da61] (on Tor 0.3.4.8 )
Oct 18 14:10:09.000 [warn] Bug:     /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5(event_base_loop+0x7fc) [0x7ff36f52d3dc] (on Tor 0.3.4.8 )
Oct 18 14:10:09.000 [warn] Bug:     /usr/bin/tor(do_main_loop+0x25f) [0x558198b3fe0f] (on Tor 0.3.4.8 )
Oct 18 14:10:09.000 [warn] Bug:     /usr/bin/tor(tor_run_main+0x1165) [0x558198b422e5] (on Tor 0.3.4.8 )
Oct 18 14:10:09.000 [warn] Bug:     /usr/bin/tor(tor_main+0x3a) [0x558198b3a32a] (on Tor 0.3.4.8 )
Oct 18 14:10:09.000 [warn] Bug:     /usr/bin/tor(main+0x19) [0x558198b3a099] (on Tor 0.3.4.8 )
Oct 19 12:30:09.000 [warn] Bug:     /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7ff36deb1b45] (on Tor 0.3.4.8 )
Oct 19 12:30:09.000 [warn] Bug:     /usr/bin/tor(+0x500e9) [0x558198b3a0e9] (on Tor 0.3.4.8 )

I hope that you will backport your changes to the stable releases.

comment:24 Changed 12 months ago by dgoulet

Cc: asn removed
Keywords: 034-backport added; regression? removed
Version: Tor: 0.3.4.7-rc → Tor: 0.3.2.1-alpha

This actually applies back to 032, so flagging for our latest stable backport at least.

I'll provide an 034 branch soon; the merge isn't that simple...

comment:25 Changed 12 months ago by dgoulet

Status: merge_ready → needs_revision

comment:26 Changed 12 months ago by dgoulet

Status: needs_revision → merge_ready

Pushed an 034 branch for the backport:

034: ticket27471_034_01
035: ticket27471_035_01

comment:27 Changed 12 months ago by nickm

Milestone: Tor: 0.3.5.x-final → Tor: 0.3.4.x-final

If I merge this 035 one into 0.3.5, it has conflicts with master. Did you want me to try to resolve them, or just take the master version, or what?

comment:28 Changed 12 months ago by dgoulet

See comment:21, it is already merged into master apparently?

comment:29 Changed 12 months ago by nickm

Milestone: Tor: 0.3.4.x-final → Tor: 0.3.5.x-final

so, I could take it in 0.3.5, then do an "ours" merge into master. Is that what you have in mind?

comment:30 Changed 12 months ago by dgoulet

Correction (_02 branch for 035):

034: ticket27471_034_01
035: ticket27471_035_02

comment:31 Changed 12 months ago by nickm

Milestone: Tor: 0.3.5.x-final → Tor: 0.3.4.x-final

Merged to 0.3.5; marking for backport.

comment:32 Changed 10 months ago by nickm

I'm not so sure about a backport here -- it's a pretty big patch. If it's not too bad, I suggest "no backport" for this.

comment:33 Changed 10 months ago by traumschule

This happened once after a clock jump on nightly (#28992).

comment:34 in reply to:  31 Changed 10 months ago by asan

Replying to nickm:

Merged to 0.3.5

The ticket is still in the merge_ready state. Is this patch already applied in the recently released 0.3.5.7 or 0.4.0.1-alpha? If yes, and you don't have plans to backport it, we should mark it as resolved.

comment:35 Changed 10 months ago by asn

Resolution: fixed
Status: merge_ready → closed

Let's not backport this, given that it's a pretty big patch that doesn't fix a security issue or a fatal assertion. If someone disagrees, please reopen the ticket.

comment:36 Changed 7 months ago by asan

Resolution: fixed
Status: closed → reopened

Was it backported to 0.3.5? I got this in 0.3.5.8:

15:33:01.000 [warn] tor_bug_occurred_(): Bug: ../src/feature/hs/hs_client.c:597: send_introduce1: Non-fatal assertion !(ip == NULL) failed. (on Tor 0.3.5.8 )
15:33:01.000 [warn] Bug: Non-fatal assertion !(ip == NULL) failed in send_introduce1 at ../src/feature/hs/hs_client.c:597. Stack trace: (on Tor 0.3.5.8 )
15:33:01.000 [warn] Bug:     /usr/bin/tor(log_backtrace_impl+0x47) [0x5606cacbcab7] (on Tor 0.3.5.8 )
15:33:01.000 [warn] Bug:     /usr/bin/tor(tor_bug_occurred_+0xbe) [0x5606cacb822e] (on Tor 0.3.5.8 )
15:33:01.000 [warn] Bug:     /usr/bin/tor(hs_client_send_introduce1+0x202) [0x5606cabbe3c2] (on Tor 0.3.5.8 )
15:33:01.000 [warn] Bug:     /usr/bin/tor(connection_ap_handshake_attach_circuit+0x2e4) [0x5606cab54284] (on Tor 0.3.5.8 )
15:33:01.000 [warn] Bug:     /usr/bin/tor(connection_ap_attach_pending+0x1a0) [0x5606cab58080] (on Tor 0.3.5.8 )
15:33:01.000 [warn] Bug:     /usr/bin/tor(hs_client_receive_rendezvous_acked+0x79) [0x5606cabbea59] (on Tor 0.3.5.8 )
15:33:01.000 [warn] Bug:     /usr/bin/tor(rend_process_relay_cell+0x273) [0x5606cac05d23] (on Tor 0.3.5.8 )
15:33:01.000 [warn] Bug:     /usr/bin/tor(+0xb03fe) [0x5606cab723fe] (on Tor 0.3.5.8 )
15:33:01.000 [warn] Bug:     /usr/bin/tor(circuit_receive_relay_cell+0x369) [0x5606cab746e9] (on Tor 0.3.5.8 )
15:33:01.000 [warn] Bug:     /usr/bin/tor(command_process_cell+0x278) [0x5606cab556b8] (on Tor 0.3.5.8 )
15:33:01.000 [warn] Bug:     /usr/bin/tor(channel_tls_handle_cell+0x273) [0x5606cab3a3f3] (on Tor 0.3.5.8 )
15:33:01.000 [warn] Bug:     /usr/bin/tor(+0x9c26f) [0x5606cab5e26f] (on Tor 0.3.5.8 )
15:33:01.000 [warn] Bug:     /usr/bin/tor(connection_handle_read+0x96e) [0x5606cab27f7e] (on Tor 0.3.5.8 )
15:33:01.000 [warn] Bug:     /usr/bin/tor(+0x6bc41) [0x5606cab2dc41] (on Tor 0.3.5.8 )
15:33:01.000 [warn] Bug:     /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5(event_base_loop+0x7fc) [0x7fbb500373dc] (on Tor 0.3.5.8 )
15:33:01.000 [warn] Bug:     /usr/bin/tor(do_main_loop+0xb1) [0x5606cab2f021] (on Tor 0.3.5.8 )
15:33:01.000 [warn] Bug:     /usr/bin/tor(tor_run_main+0x11f5) [0x5606cab1d055] (on Tor 0.3.5.8 )
15:33:01.000 [warn] Bug:     /usr/bin/tor(tor_main+0x3a) [0x5606cab1a22a] (on Tor 0.3.5.8 )
15:33:01.000 [warn] Bug:     /usr/bin/tor(main+0x19) [0x5606cab19d89] (on Tor 0.3.5.8 )
15:33:01.000 [warn] Bug:     /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7fbb4e9bbb45] (on Tor 0.3.5.8 )
15:33:01.000 [warn] Bug:     /usr/bin/tor(+0x57dd9) [0x5606cab19dd9] (on Tor 0.3.5.8 )

Is it the same bug?

comment:37 Changed 7 months ago by asn

Actual Points: set to #29995
Points: set to 3
Sponsor: set to Sponsor27-must

Seems like the same bug, and that version should contain the fix. Thanks for letting us know.

comment:38 Changed 7 months ago by asn

Status: reopened → new

comment:39 Changed 7 months ago by asn

Actual Points: #29995 deleted
Parent ID: set to #29995

comment:40 Changed 7 months ago by asn

Potentially related to #29034.

comment:41 Changed 7 months ago by gaba

Keywords: network-team-roadmap-2019-Q1Q2 added

Add keyword for tickets in the network team roadmap.

comment:42 Changed 5 months ago by asn

Unclear if the fix from #29034 helped this issue. We managed to reproduce #29034 with HS circuits but need more investigation.


comment:43 Changed 5 months ago by nickm

Keywords: 034-unreached-backport added; 034-backport removed

0.3.4 is EOL; this ticket gets no further backporting.

comment:44 Changed 5 months ago by gaba

Owner: dgoulet deleted
Status: new → assigned

dgoulet will assign himself to the ones he is working on right now.

comment:45 Changed 5 months ago by dgoulet

Owner: set to dgoulet
Status: assigned → accepted

My main theory here is related to #28970.

We've wiped a descriptor from the client cache but kept an intro circuit open. Then a new descriptor came in for the same service. The RP circuit was launched, and when it joined, the old intro circuit was selected, leading to this BUG() about a missing IP object.

The original fix we did was good but not enough: we closed intro circuits when we were _replacing_ a descriptor, but the case of cache cleanup + descriptor fetch remained.
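A hedged sketch of that remaining gap, with illustrative stand-ins (the actual fix is in #30921): both descriptor-removal paths need to drop dependent intro circuits, but originally only the replacement path did.

{{{
#include <stdbool.h>

/* Illustrative cache entry; the real structures live in hs_cache.c. */
typedef struct {
  bool has_desc;
  /* ... descriptor, intro points, expiry time ... */
} cache_entry_sketch_t;

/* Stand-in for the cleanup the original patch performs on replacement. */
static void
close_intro_circuits_for_entry(cache_entry_sketch_t *entry)
{
  (void)entry;
  /* ... walk client intro circuits built from this entry's descriptor
   * and mark them for close ... */
}

/* Replacement path: covered by the original fix in this ticket. */
static void
cache_replace_entry(cache_entry_sketch_t *old_entry)
{
  close_intro_circuits_for_entry(old_entry);
  /* ... store the new descriptor ... */
}

/* Cleanup path (cache wipe / expiry): the gap described above; it used
 * to remove the entry without touching circuits, leaving a stale intro
 * circuit to be selected once a fresh descriptor arrived. */
static void
cache_cleanup_entry(cache_entry_sketch_t *entry)
{
  close_intro_circuits_for_entry(entry);  /* the missing call */
  entry->has_desc = false;
}
}}}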

comment:46 Changed 5 months ago by dgoulet

Fix should be in #30921

comment:47 Changed 5 months ago by dgoulet

Keywords: 035-backport 034-unreached-backport removed
Milestone: Tor: 0.3.4.x-final → Tor: 0.4.2.x-final

No backport.

comment:48 Changed 4 months ago by gaba

Keywords: network-team-roadmap-august added; network-team-roadmap-2019-Q1Q2 removed

comment:49 Changed 4 months ago by dgoulet

Resolution: fixed
Status: accepted → closed

As per discussion in Stockholm, we consider that most likely #29034, #30921 and #30871 (merged in master, awaiting backport) should make this problem go away. Same assumption for #28970.

Take 4, let's be hopeful! We can re-open if this still happens on >= tor-0.4.1.3-alpha.
