Opened 11 months ago

Last modified 4 months ago

#32564 needs_information defect

Assertion pol->magic failed

Reported by: Logforme Owned by:
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version: Tor: 0.4.1.6
Severity: Normal Keywords: assert, crash, backport
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Guard relay 855BC2DABE24C861CD887DB9B2E950424B49FC34 crashed with the following log:

Nov 21 00:18:01.000 [err] tor_assertion_failed_(): Bug: ../src/core/or/circuitmux_ewma.c:165: TO_EWMA_POL_CIRC_DATA: Assertion pol->magic == 0x761e7747U failed; aborting. (on Tor 0.4.1.6 )
Nov 21 00:18:01.000 [err] Bug: Assertion pol->magic == 0x761e7747U failed in TO_EWMA_POL_CIRC_DATA at ../src/core/or/circuitmux_ewma.c:165: . Stack trace: (on Tor 0.4.1.6 )
Nov 21 00:18:01.000 [err] Bug:     /usr/bin/tor(log_backtrace_impl+0x47) [0x55d5e9f968e7] (on Tor 0.4.1.6 )
Nov 21 00:18:01.000 [err] Bug:     /usr/bin/tor(tor_assertion_failed_+0x147) [0x55d5e9f919c7] (on Tor 0.4.1.6 )
Nov 21 00:18:01.000 [err] Bug:     /usr/bin/tor(+0x8fe84) [0x55d5e9e14e84] (on Tor 0.4.1.6 )
Nov 21 00:18:01.000 [err] Bug:     /usr/bin/tor(+0xb839f) [0x55d5e9e3d39f] (on Tor 0.4.1.6 )
Nov 21 00:18:01.000 [err] Bug:     /usr/bin/tor(circuit_receive_relay_cell+0x29a) [0x55d5e9e419fa] (on Tor 0.4.1.6 )
Nov 21 00:18:01.000 [err] Bug:     /usr/bin/tor(command_process_cell+0x2fc) [0x55d5e9e23a1c] (on Tor 0.4.1.6 )
Nov 21 00:18:01.000 [err] Bug:     /usr/bin/tor(channel_tls_handle_cell+0x333) [0x55d5e9e030d3] (on Tor 0.4.1.6 )
Nov 21 00:18:01.000 [err] Bug:     /usr/bin/tor(+0xa773f) [0x55d5e9e2c73f] (on Tor 0.4.1.6 )
Nov 21 00:18:01.000 [err] Bug:     /usr/bin/tor(connection_handle_read+0x990) [0x55d5e9df0500] (on Tor 0.4.1.6 )
Nov 21 00:18:01.000 [err] Bug:     /usr/bin/tor(+0x707ee) [0x55d5e9df57ee] (on Tor 0.4.1.6 )
Nov 21 00:18:01.000 [err] Bug:     /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5(event_base_loop+0x6a0) [0x7fb82bb5f5a0] (on Tor 0.4.1.6 )
Nov 21 00:18:01.000 [err] Bug:     /usr/bin/tor(do_main_loop+0x105) [0x55d5e9df6b25] (on Tor 0.4.1.6 )
Nov 21 00:18:01.000 [err] Bug:     /usr/bin/tor(tor_run_main+0x1225) [0x55d5e9de4545] (on Tor 0.4.1.6 )
Nov 21 00:18:01.000 [err] Bug:     /usr/bin/tor(tor_main+0x3a) [0x55d5e9de193a] (on Tor 0.4.1.6 )
Nov 21 00:18:01.000 [err] Bug:     /usr/bin/tor(main+0x19) [0x55d5e9de14b9] (on Tor 0.4.1.6 )
Nov 21 00:18:01.000 [err] Bug:     /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1) [0x7fb82a3b32e1] (on Tor 0.4.1.6 )
Nov 21 00:18:01.000 [err] Bug:     /usr/bin/tor(_start+0x2a) [0x55d5e9de150a] (on Tor 0.4.1.6 )

After the relay automatically restarted the log had the following warning:

Nov 21 00:18:07.000 [warn] Incorrect ed25519 signature(s)

Possibly the same as #16423 according to Nick.

Relay is one of two relays running on a Debian box. Memory and CPU usage are normal.

Odd things about the relay:

  • 1. About a week ago I had log entries (rotated unfortunately) to the effect of not being able to apply consensus diffs (wrong hash) and eventually "no longer serving directory info to clients"
  • 2. Lately the two relays has seen an upswing in traffic. Sometimes their combined BW hits my ISPs ceiling.
  • 3. Over time the memory usages of the relays grows. Initially they are around 700MB. Once they use most of the RAM (4GB) I reboot the machine. When the assert happened the RAM usage was nowhere near that.
  • 4. I run my home brewed monitoring software that uses "SETEVENT BW" and calls "GETINFO orconn-status ns/id/<fingerprint> status/fresh-relay-descs" every 10 minutes.

Child Tickets

TicketStatusOwnerSummaryComponent
#33290merge_readynickmAdd diagnostics for confusing corruption issue #32564 in ewmaCore Tor/Tor

Change History (12)

comment:1 Changed 11 months ago by dgoulet

So the only path into the EWMA subsystem I can find from circuit_receive_relay_cell() is through this path:

  • circuit_receive_relay_cell()
    • append_cell_to_circuit_queue()
      • update_circuit_on_cmux_()
        • circuitmux_set_num_cells()
          • circuitmux_make_circuit_inactive() _OR_
          • circuitmux_make_circuit_active()

Both either call notify_circ_inactive() or notify_circ_active() which EWMA subsystem sets those function pointers.

Which means that we have in the chanid_circid_muxinfo_map map an entry that has either things that were freed() or never initialized.

comment:2 Changed 11 months ago by nickm

Keywords: crash backport added
Milestone: Tor: 0.4.3.x-final
Priority: MediumHigh

comment:3 Changed 9 months ago by ahf

Keywords: 043-must added

comment:4 Changed 9 months ago by nickm

Owner: set to nickm
Status: newassigned

I'll try to figure out what's going on here.

comment:5 Changed 9 months ago by nickm

Mark 043-must tickets as high priority

comment:6 Changed 8 months ago by nickm

So, there is only one kind of circuitmux_policy_circ_data_t in 0.4.1.x, so we don't need to worry about the case where this is another kind of data. The pointer here is either:

  • An invalid pointer that is not pointing to an ewma_policy_circ_data_t.
  • A stale pointer that used to point to an ewma_policy_circ_data_t which has been freed.
  • A pointer that points to an ewma_policy_circ_data_t which has gotten corrupt.

But I can't find plausible vectors for any of these things to happen. Because of the infrequency of this bug, I am almost tempted to suspect faulty memory, but that should always be a last-resort hypothesis.

The best I can figure out here right now is to add a diagnostic patch. I'll open a child ticket.

comment:7 Changed 8 months ago by nickm

Owner: nickm deleted

comment:8 Changed 8 months ago by nickm

Status: assignedneeds_information

Unless somebody else has insight here, I'm going to have to hope that the diagnostic patch in #33290 finds something.

comment:9 Changed 7 months ago by dgoulet

Milestone: Tor: 0.4.3.x-finalTor: 0.4.4.x-final

Until we can get more information, deferring to 044.

comment:10 Changed 7 months ago by teor

Keywords: 044-should added; 043-must removed

This ticket is in 0.4.4, so it can't be 043-must.

Last edited 7 months ago by teor (previous) (diff)

comment:11 Changed 5 months ago by nickm

Keywords: 044-must added

Add 044-must to all crash tickets in 0.4.4

comment:12 Changed 4 months ago by nickm

Keywords: 044-should 044-must removed
Milestone: Tor: 0.4.4.x-finalTor: unspecified
Priority: HighMedium

This doesn't seem to be happening a lot and it is still needs_info; we can't do it in 044 without knowing that it is still happening and getting the above info solved.

Note: See TracTickets for help on using tickets.