Opened 5 years ago

Last modified 8 hours ago

#8185 merge_ready defect

circuit_package_relay_cell(): Bug: outgoing relay cell has n_chan==NULL. Dropping.

Reported by: mr-4 Owned by: nickm
Priority: High Milestone: Tor: 0.3.1.x-final
Component: Core Tor/Tor Version: Tor: 0.2.4.9-alpha
Severity: Major Keywords: tor-relay logging needs-analysis harmless? annoyance 031-backport 030-backport 029-backport
Cc: user33331111 Actual Points:
Parent ID: Points: 3
Reviewer: isis Sponsor:

Description

Don't know whether the above message is related to bugs #8093 or #6252 (if so, apologies!), but decided to report this anyway. The full message is:

[warn] {BUG} circuit_package_relay_cell(): Bug: outgoing relay cell has n_chan==NULL. Dropping.

There are no particular circumstances which I've noticed this message to appear (in fact, I've noticed it only when checking the tor logs a couple of hours later) - tor has been running more than a day at that point.

Child Tickets

Change History (57)

comment:1 Changed 5 years ago by nickm

Milestone: Tor: 0.2.4.x-final

comment:2 Changed 5 years ago by nickm

Keywords: tor-relay added

comment:3 Changed 5 years ago by nickm

Status: newneeds_review

Hm. I think the best we can do here is some kind of diagnostic branch to improve the bug message. Somebody is calling relay_send_command_from_edge on a circuit that hasn't actually negotiated with anything yet. (Do you have optimistic data enabled?)

Branch "bug8185_diagnostic" in my public repository tries to improve the log message in a way that might help to track this down.

comment:4 in reply to:  3 Changed 5 years ago by mr-4

Replying to nickm:

Hm. I think the best we can do here is some kind of diagnostic branch to improve the bug message. Somebody is calling relay_send_command_from_edge on a circuit that hasn't actually negotiated with anything yet. (Do you have optimistic data enabled?)

What is "optimistic data"?

Branch "bug8185_diagnostic" in my public repository tries to improve the log message in a way that might help to track this down.

Do I apply that on top of the latest tor version?

comment:5 Changed 5 years ago by nickm

Status: needs_reviewneeds_information

Merged the diagnostic branch to 0.2.4. Let's see if this causes anything we can fix. Putting this in needs-information pending more info.

(You probably aren't using optimistic data unless you know about it. Generally, yes, you can test branches by applying them to the latest tor)

comment:6 Changed 5 years ago by mr-4

I see that in the latest tor version (0.2.4.12) there is a message related to that bug, so I assume I don't need to do anything except deploy that tor version and check the logs (something I will do at the weekend), right?

comment:7 Changed 4 years ago by nickm

Right; the latest Tor version has new diagnostic code that tries to do a better job of figuring out why this bug might be happening.

comment:8 Changed 4 years ago by mr-4

Nothing to report so far for the above - I am not seeing this message after about 3 days of continuous tor operation, though I've got 2 other bug messages appearing, which are described in bugs #8858 and #8859.

I'll keep you posted if something turns up in my logs.

comment:9 Changed 4 years ago by mikeperry

This likely was a dup of #8024.

comment:10 Changed 4 years ago by nickm

Keywords: 024-backport added
Milestone: Tor: 0.2.4.x-finalTor: 0.2.5.x-final

Marking as 0.2.5 since we don't know why it's happening, and hasn't happened again. Marking as 024-backport in case the code that diagnoses it actually finds something.

comment:11 Changed 4 years ago by mr-4

Don't know whether the message below is related to this bug, but thought to report it here: Soon after tor start, I get this:

Aug 31 22:59:04.000 [warn] {OR} connection_edge_process_relay_cell (at origin) failed.
Aug 31 22:59:04.000 [warn] {PROTOCOL} circuit_receive_relay_cell (backward) failed. Closing.

It only happened about 3 times in quick succession and then stopped. I am still on Tor version 0.2.4.14-alpha (git-f5729b8c1d45933f)

comment:12 Changed 4 years ago by nickm

Milestone: Tor: 0.2.5.x-finalTor: 0.2.???

comment:13 Changed 4 years ago by andrea

Sending to 0.2.?? due to lack of information or evidence this is still occurring.

comment:14 Changed 4 years ago by mr-4

I haven't seen this since I last reported it (comment 11).

comment:15 Changed 20 months ago by cypherpunks

Severity: Normal

circuit_package_relay_cell: Bug: outgoing relay cell sent from src/or/relay.c:701 has n_chan==NULL. Dropping. (on Tor 0.2.8.1-alpha-dev 2d879bd39febfcce)

comment:16 Changed 20 months ago by cypherpunks

 relay_send_command_from_edge_(streamid_t stream_id, circuit_t *circ,
                               uint8_t relay_command, const char *payload,
                               size_t payload_len, crypt_path_t *cpath_layer,
                               const char *filename, int lineno)
 {
   cell_t cell;
   relay_header_t rh;
   cell_direction_t cell_direction;
   /* XXXX NM Split this function into a separate versions per circuit type? */
 
   tor_assert(circ);
   tor_assert(payload_len <= RELAY_PAYLOAD_SIZE);
 
+  if (circ->marked_for_close) /* XXX bug, race, normal? */
+    return -1;

comment:17 Changed 20 months ago by nickm

Milestone: Tor: 0.2.???Tor: 0.2.8.x-final
Status: needs_informationneeds_review

comment:18 Changed 20 months ago by nickm

Keywords: must-fix-before-028-rc added

Marking these as must-fix-before-028-rc.

Actually, some of them may not need to get 'fixed' before the rc, but I believe that they should either get fixed, or we should have a good explanation of why they don't need to get fixed.

comment:19 Changed 19 months ago by nickm

Keywords: 024-backport must-fix-before-028-rc removed
Milestone: Tor: 0.2.8.x-finalTor: 0.2.9.x-final

comment:20 Changed 19 months ago by nickm

Points: medium

comment:21 Changed 19 months ago by nickm

Keywords: TorCoreTeam201604 added

Every postponed needs_review ticket should get a review in April

comment:22 Changed 18 months ago by isis

Reviewer: isis

Adding myself as the reviewer because I started hitting this bug at some point between 0.2.7.x and 0.2.8.x while running my patches for #7144 under chutney (seemingly unrelated to my patches).

comment:23 Changed 18 months ago by isis

I've tested:

  • 0.2.9-alpha
  • 0.2.8 (maint)
  • 0.2.8.1-alpha
  • 0.2.7.2-alpha
  • 0.2.7.6

but I can't seem to trigger this bug anymore.

comment:24 Changed 18 months ago by cypherpunks

This bug message depends circuit in question and data client plans to append there. If entry relay lost n_chan (connection to middle relay attached to used circuit) then it sends destroy cell for some reason to client. Such cell processed by command_process_destroy_cell, thus marks circuit for close, clears n_chan, while keeps strteams attached to marked for close circuit. If any those stream's data client plans to package before circuit freed then message triggered.

Last edited 18 months ago by cypherpunks (previous) (diff)

comment:25 Changed 18 months ago by cypherpunks

Status: needs_reviewneeds_information

thus marks circuit for close, clears n_chan, while keeps strteams attached to marked for close circuit.

No. Seems like wrong patch. circuit_mark_for_close calls

    origin_circuit_t *ocirc = TO_ORIGIN_CIRCUIT(circ);
    edge_connection_t *conn;
    for (conn=ocirc->p_streams; conn; conn=conn->next_stream)
      connection_edge_destroy(circ->n_circ_id, conn);
    ocirc->p_streams = NULL;

Marked for close circuit has no attached streams.

comment:26 Changed 18 months ago by cypherpunks

Status: needs_informationneeds_revision
   if (CIRCUIT_IS_ORIGIN(circ)) {
-    if (pathbias_check_close(TO_ORIGIN_CIRCUIT(circ), reason) == -1) {
+    if (pathbias_check_close(TO_ORIGIN_CIRCUIT(circ), reason) == -1 &&
+        circ->n_chan) {
       /* Don't close it yet, we need to test it first */
       return;
     }

Maybe?

comment:27 Changed 18 months ago by cypherpunks

Ouch, circuit_mark_for_close_ for 0.2.8 is different, different reasons for different versions could lead to the same message.

comment:28 Changed 18 months ago by cypherpunks

Status: needs_revisionneeds_review

Maybe?

No, pathbias_send_usable_probe checks for such case already.
Lets to fix it for >=0.2.8.x only. Review comment:16 patch again plus comment:24 details.

comment:29 Changed 18 months ago by isis

Status: needs_reviewneeds_information

On 0.2.8.x, sending a destroy cell from the entry to the client while the client is trying to send data still doesn't seem to trigger this for me…

Since a bunch of the code for marking circuits for close and freeing them changed in 0.2.8, I agree that trying to track this bug down in N different tor versions is probably going to be too confusing. We should probably just focus on trying to find it in maint-0.2.8?

I'm putting this ticket back into 'needs_information' state, and hopefully more people will come along and try to figure out where this bug is coming from.

comment:30 in reply to:  29 Changed 18 months ago by cypherpunks

Replying to isis:

On 0.2.8.x, sending a destroy cell from the entry to the client while the client is trying to send data still doesn't seem to trigger this for me…

Since a bunch of the code for marking circuits for close and freeing them changed in 0.2.8, I agree that trying to track this bug down in N different tor versions is probably going to be too confusing. We should probably just focus on trying to find it in maint-0.2.8?

I'm putting this ticket back into 'needs_information' state, and hopefully more people will come along and try to figure out where this bug is coming from.

     if (!chan) {
       log_warn(LD_BUG,"outgoing relay cell sent from %s:%d has n_chan==NULL."
                " Dropping.", filename, lineno);
+      if (circ->marked_for_close) {
+        log_warn(LD_BUG,
+                 "Circuit n_circ_id %u was marked for close at %s:%d"
+                 " for reason %d", (unsigned)circ->n_circ_id,
+                 circ->marked_for_close_file, circ->marked_for_close,
+                 circ->marked_for_close_orig_reason);
+      }
       return 0; /* just drop it */
     }

comment:31 Changed 18 months ago by nickm

Keywords: TorCoreTeam-postponed-201604 added; TorCoreTeam201604 removed

These two weren't done in april. They *could* get done in May, if progress happens.

comment:32 Changed 18 months ago by gk

FWIW: I just hit this with a hardened-nightly:

May 10 10:33:06.000 [warn] circuit_package_relay_cell(): Bug: outgoing relay cell sent from src/or/relay.c:701 has n_chan==NULL. Dropping. (on Tor 0.2.9.0-alpha-dev 641cdc345c7a0e81)

comment:33 Changed 17 months ago by gk

Cc: user33331111 added

#19136 is a duplicate.

comment:34 Changed 17 months ago by isabela

Points: medium3

comment:35 Changed 15 months ago by isabela

Keywords: isaremoved added
Milestone: Tor: 0.2.9.x-finalTor: 0.2.???

comment:36 Changed 11 months ago by teor

Milestone: Tor: 0.2.???Tor: 0.3.???

Milestone renamed

comment:37 Changed 10 months ago by nickm

Keywords: tor-03-unspecified-201612 added
Milestone: Tor: 0.3.???Tor: unspecified

Finally admitting that 0.3.??? was a euphemism for Tor: unspecified all along.

comment:38 Changed 8 months ago by s7r

#19711 is a newer report of the same issue, happening on 0.3.x - hopefully we can work there and close both.

comment:39 Changed 7 months ago by cypherpunks

I've just noticed it on my latest stable Tor version. I have no idea what triggered this bug.

Mar 22 XX:XX:23.000 [warn] circuit_package_relay_cell(): Bug: outgoing relay cell sent from ../src/or/relay.c:701 has n_chan==NULL. Dropping. (on Tor 0.2.9.10 )
Mar 22 XX:XX:23.000 [warn] circuit_package_relay_cell(): Bug: outgoing relay cell sent from ../src/or/relay.c:701 has n_chan==NULL. Dropping. (on Tor 0.2.9.10 )
Mar 22 XX:XX:23.000 [warn] circuit_package_relay_cell(): Bug: outgoing relay cell sent from ../src/or/relay.c:701 has n_chan==NULL. Dropping. (on Tor 0.2.9.10 )

comment:40 Changed 5 months ago by nickm

Keywords: tor-03-unspecified-201612 removed

Remove an old triaging keyword.

comment:41 Changed 5 months ago by nickm

Keywords: TorCoreTeam-postponed-201604 removed

comment:42 Changed 5 months ago by nickm

Keywords: isaremoved removed

comment:43 Changed 5 months ago by nickm

Keywords: logging needs-analysis added
Milestone: Tor: unspecifiedTor: 0.3.2.x-final
Status: needs_informationnew

Kicking this into 0.3.2. The fix may be to edit the log message so it only logs above PROTOCOL_WARN if our favorite theory turns out to be false.

comment:44 Changed 4 weeks ago by cypherpunks

Keywords: harmless? annoyance added
Milestone: Tor: 0.3.2.x-finalTor: 0.3.3.x-final
Priority: MediumHigh

For recent duplicate tickets: #19136 #19711 #23105

comment:45 Changed 3 weeks ago by cypherpunks

#23679 is a duplicate.

comment:46 Changed 3 weeks ago by cypherpunks

Milestone: Tor: 0.3.3.x-finalTor: 0.3.2.x-final
Severity: NormalMajor

Got it again, seems it's affecting many of us,

[warn] circuit_package_relay_cell(): Bug: outgoing relay cell sent from src/or/relay.c:836 has n_chan==NULL. Dropping. (on Tor 0.3.1.5-alpha 83389502ee631465)
[warn] circuit_package_relay_cell(): Bug: outgoing relay cell sent from src/or/relay.c:836 has n_chan==NULL. Dropping. (on Tor 0.3.1.5-alpha 83389502ee631465)

comment:47 Changed 3 weeks ago by cypherpunks

Okay, the stack trace that Nick added may prove useful, here are the ones that popped up for me:

[warn] circuit_package_relay_cell(): Bug: outgoing relay cell sent from src/or/relay.c:825 has n_chan==NULL. Dropping. (on Tor 0.3.2.1-alpha )
[warn] Bug: . Stack trace: (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(log_backtrace+0x42) [0x55a025ae0c72] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(relay_send_command_from_edge_+0x50f) [0x55a0259d598f] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(connection_edge_send_command+0x62) [0x55a0259d5c72] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(connection_edge_package_raw_inbuf+0x1fa) [0x55a0259d5f9a] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(connection_edge_process_inbuf+0xd7) [0x55a025a702c7] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(+0x1071e9) [0x55a025a681e9] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(+0x4e1f1) [0x55a0259af1f1] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/libevent-2.0.so.5(event_base_loop+0x812) [0x7ff8cbc66002] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(do_main_loop+0x23d) [0x55a0259b028d] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(tor_main+0x1c15) [0x55a0259b3c05] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(main+0x19) [0x55a0259abab9] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1) [0x7ff8cadfd3f1] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(+0x4ab09) [0x55a0259abb09] (on Tor 0.3.2.1-alpha )
[warn] circuit_package_relay_cell(): Bug: outgoing relay cell sent from src/or/relay.c:825 has n_chan==NULL. Dropping. (on Tor 0.3.2.1-alpha )
[warn] Bug: . Stack trace: (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(log_backtrace+0x42) [0x55a025ae0c72] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(relay_send_command_from_edge_+0x50f) [0x55a0259d598f] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(connection_edge_send_command+0x62) [0x55a0259d5c72] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(connection_edge_package_raw_inbuf+0x1fa) [0x55a0259d5f9a] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(connection_edge_process_inbuf+0xd7) [0x55a025a702c7] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(+0x1071e9) [0x55a025a681e9] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(+0x4e1f1) [0x55a0259af1f1] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/libevent-2.0.so.5(event_base_loop+0x812) [0x7ff8cbc66002] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(do_main_loop+0x23d) [0x55a0259b028d] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(tor_main+0x1c15) [0x55a0259b3c05] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(main+0x19) [0x55a0259abab9] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1) [0x7ff8cadfd3f1] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(+0x4ab09) [0x55a0259abb09] (on Tor 0.3.2.1-alpha )
[warn] circuit_package_relay_cell(): Bug: outgoing relay cell sent from src/or/relay.c:825 has n_chan==NULL. Dropping. (on Tor 0.3.2.1-alpha )
[warn] Bug: . Stack trace: (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(log_backtrace+0x42) [0x55a025ae0c72] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(relay_send_command_from_edge_+0x50f) [0x55a0259d598f] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(connection_edge_send_command+0x62) [0x55a0259d5c72] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(connection_edge_package_raw_inbuf+0x1fa) [0x55a0259d5f9a] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(connection_edge_process_inbuf+0xd7) [0x55a025a702c7] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(+0x1071e9) [0x55a025a681e9] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(+0x4e1f1) [0x55a0259af1f1] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/libevent-2.0.so.5(event_base_loop+0x812) [0x7ff8cbc66002] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(do_main_loop+0x23d) [0x55a0259b028d] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(tor_main+0x1c15) [0x55a0259b3c05] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(main+0x19) [0x55a0259abab9] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1) [0x7ff8cadfd3f1] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(+0x4ab09) [0x55a0259abb09] (on Tor 0.3.2.1-alpha )
[warn] circuit_package_relay_cell(): Bug: outgoing relay cell sent from src/or/relay.c:825 has n_chan==NULL. Dropping. (on Tor 0.3.2.1-alpha )
[warn] Bug: . Stack trace: (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(log_backtrace+0x42) [0x55a025ae0c72] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(relay_send_command_from_edge_+0x50f) [0x55a0259d598f] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(connection_edge_send_command+0x62) [0x55a0259d5c72] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(connection_edge_package_raw_inbuf+0x1fa) [0x55a0259d5f9a] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(connection_edge_process_inbuf+0xd7) [0x55a025a702c7] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(+0x10763b) [0x55a025a6863b] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(+0x4e1f1) [0x55a0259af1f1] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/libevent-2.0.so.5(event_base_loop+0x812) [0x7ff8cbc66002] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(do_main_loop+0x23d) [0x55a0259b028d] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(tor_main+0x1c15) [0x55a0259b3c05] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(main+0x19) [0x55a0259abab9] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1) [0x7ff8cadfd3f1] (on Tor 0.3.2.1-alpha )
[warn] Bug:     /home/alfred/tor-browser_en-US/Browser/TorBrowser/Tor/tor(+0x4ab09) [0x55a0259abb09] (on Tor 0.3.2.1-alpha )
Last edited 3 weeks ago by cypherpunks (previous) (diff)

comment:48 Changed 3 weeks ago by nickm

Owner: set to nickm
Status: newaccepted

Interesting. It looks like the callstack here is:

  • circuit_package_relay_cell()
  • relay_send_command_from_edge_()
  • relay_send_command_from_edge()
  • connection_edge_package_raw_inbuf()
  • connection_edge_process_inbuf()

What else do we know?

  • This is an entry connection attached to an origin circuit.
  • As of connection_edge_process_inbuf(), the connection was in state AP_CONN_STATE_OPEN or AP_CONN_STATE_CONNECT_WAIT. (or else we wouldn't have called package_raw_inbuf())
  • As of connection_edge_package_raw_inbuf(), the connection was not marked for close. (Or else the function would have exited.)
  • By the time we reach the bottom of the callstack, the circuit's n_chan was NULL, which shouldn't be possible.

I think what I'd most like to know at this point is the exact status of the circuit -- is it completely corrupted, marked for close, or in-progress for a build?

I'm guessing that this connection was once attached to a working circuit (otherwise the earlier attempt to send the BEGIN cell would have failed with the same error) and then the circuit became non-working.

What if the channel becomes closed, and so channel_closed() we call circuit_unlink_all_from_channel()? That will clear the n_chan field of the circuit, and mark the circuit. But the streams attached to the circuit won't get marked for close or have their circuits removed until circuit_about_to_free() calls connection_edge_destroy().

I'm going to go with the theory that the circuit is marked, and add logging to catch in case it isn't. Cooking up a patch now...

comment:49 Changed 3 weeks ago by nickm

Status: acceptedneeds_review

Two branches in my public git repository:

  • bug8185_diagnostic_032 logs more information if this bug triggers.
  • bug8185_025 has a fix, assuming my theory above is right.
  • bug8185_031 is the fix, merged forward to 0.3.1, with merge conflict fixed.

comment:50 Changed 3 weeks ago by ahf

I think both the diagnostic patch and the proposed fix looks good.

Isis: do you think this should go into 0.3.2.2-alpha? I think this could be marked as merge ready.

comment:51 Changed 3 weeks ago by nickm

Status: needs_reviewmerge_ready

comment:52 in reply to:  50 Changed 3 weeks ago by isis

Replying to ahf:

I think both the diagnostic patch and the proposed fix looks good.

Isis: do you think this should go into 0.3.2.2-alpha? I think this could be marked as merge ready.


Yeah, this should go in, especially the diagnostics. I'm not wholly convinced the fix will actually fix this bug, since (in my experience) I believe I was hitting this bug while in the process of constructing a (loose-source) circuit, but nonetheless it's a good theory and the fix is a logical thing to do even if it doesn't end up fixing this bug entirely.

comment:53 Changed 3 weeks ago by nickm

Keywords: 031-backport 030-backport 029-backport added
Milestone: Tor: 0.3.2.x-finalTor: 0.3.1.x-final

Okay. I've merged bug8185_diagnostic_032 and bug8185_031 into master. I'll mark the latter for possible backport.

comment:54 Changed 2 weeks ago by cypherpunks

Okay. What do we have after hibernation?

Tor NOTICE: Your system clock just jumped 45518 seconds forward; assuming established circuits no longer work. 
Tor NOTICE: Heartbeat: Tor's uptime is 3:46 hours, with 1 circuits open. I've sent 1.97 MB and received 12.17 MB. 
Tor NOTICE: Average packaged cell fullness: 53.187%. TLS write overhead: 4% 
Tor WARN: tor_bug_occurred_(): Bug: scheduler_kist.c:520: kist_scheduler_schedule: Non-fatal assertion !((diff < 0)) failed. (Future instances of this warning will be silenced.) (on Tor 0.3.2.1-alpha ) 
Tor WARN: Bug: Non-fatal assertion !((diff < 0)) failed in kist_scheduler_schedule at scheduler_kist.c:520. (Stack trace not available) (on Tor 0.3.2.1-alpha ) 
Tor WARN: circuit_package_relay_cell(): Bug: outgoing relay cell sent from src/or/relay.c:825 has n_chan==NULL. Dropping. (on Tor 0.3.2.1-alpha ) 
Tor WARN: Bug: . (Stack trace not available) (on Tor 0.3.2.1-alpha ) 
Tor NOTICE: Tor has successfully opened a circuit. Looks like client functionality is working.
Tor WARN: Failed to find node for hop #1 of our path. Discarding this circuit. 
Tor NOTICE: Scheduler type KISTLite has been enabled. 

Seems with 1 circuits open points to the root cause of this bug.

Update: after second hibernation

Tor NOTICE: Our directory information is no longer up-to-date enough to build circuits: We're missing descriptors for some of our primary entry guards 
Tor NOTICE: I learned some more directory information, but not enough to build a circuit: We're missing descriptors for some of our primary entry guards 
Tor NOTICE: We now have enough directory information to build circuits.

joined the previous logs.

Last edited 2 weeks ago by cypherpunks (previous) (diff)

comment:55 Changed 2 weeks ago by teor

FYI, the scheduler bug in the log is fixed in #23696:

Tor WARN: tor_bug_occurred_(): Bug: scheduler_kist.c:520: kist_scheduler_schedule: Non-fatal assertion !((diff < 0)) failed. (Future instances of this warning will be silenced.) (on Tor 0.3.2.1-alpha ) 
Tor WARN: Bug: Non-fatal assertion !((diff < 0)) failed in kist_scheduler_schedule at scheduler_kist.c:520. (Stack trace not available) (on Tor 0.3.2.1-alpha ) 

comment:56 Changed 2 weeks ago by cypherpunks

Good news. With hibernation we have: before

[10-07 19:14:15] Torbutton INFO: controlPort >> 650 STREAM 901 CLOSED 192 5.8.6.1:443 REASON=DONE
[10-07 19:14:40] Torbutton INFO: controlPort >> 650 STREAM 898 CLOSED 192 5.8.6.1:443 REASON=DONE
...

~20 streams. Streams of the closed tab are closing for ~30 seconds. After hibernation

[10-08 00:21:39] Torbutton INFO: controlPort >> 650 STREAM 884 CLOSED 192 5.8.6.1:443 REASON=DONE
...

~30 streams. And for each of those we have

Tor WARN: tor_bug_occurred_(): Bug: scheduler_kist.c:520: kist_scheduler_schedule: Non-fatal assertion !((diff < 0)) failed. (Future instances of this warning will be silenced.) (on Tor 0.3.2.1-alpha ) 
Tor WARN: Bug: Non-fatal assertion !((diff < 0)) failed in kist_scheduler_schedule at scheduler_kist.c:520. (Stack trace not available) (on Tor 0.3.2.1-alpha ) 
Tor WARN: circuit_package_relay_cell(): Bug: outgoing relay cell sent from src/or/relay.c:825 has n_chan==NULL. Dropping. (on Tor 0.3.2.1-alpha ) 
Tor WARN: Bug: . (Stack trace not available) (on Tor 0.3.2.1-alpha ) 

comment:57 Changed 8 hours ago by cypherpunks

#23928 is a duplicate

Note: See TracTickets for help on using tickets.