Opened 4 months ago

Closed 3 months ago

#21369 closed defect (fixed)

Tor crashes with tor_assertion_failed_() [Assertion buf->datalen < INT_MAX failed in write_to_buf at ../src/or/buffers.c:832]

Reported by: svengo Owned by: nickm
Priority: Very High Milestone: Tor: 0.3.0.x-final
Component: Core Tor/Tor Version: Tor: 0.2.9.9
Severity: Critical Keywords: 029-backport, review-group-16
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

One (of two) tor instance has crashed two times today:

Feb  1 13:15:58 arnor Tor-eriador[5110]: tor_assertion_failed_(): Bug: ../src/or/buffers.c:832: write_to_buf: Assertion buf->datalen < INT_MAX failed; aborting. (on Tor 0.2.9.9 )
Feb  1 13:15:58 arnor Tor-eriador[5110]: Bug: Assertion buf->datalen < INT_MAX failed in write_to_buf at ../src/or/buffers.c:832.
Stack trace: (on Tor 0.2.9.9 )
Feb  1 13:15:58 arnor Tor-eriador[5110]: Bug:     /usr/bin/tor(log_backtrace+0x42) [0x556620673ba2] (on Tor 0.2.9.9 )
Feb  1 13:15:58 arnor Tor-eriador[5110]: Bug:     /usr/bin/tor(tor_assertion_failed_+0x8d) [0x55662068b71d] (on Tor 0.2.9.9 )
Feb  1 13:15:58 arnor Tor-eriador[5110]: Bug:     /usr/bin/tor(+0xadf7a) [0x5566205daf7a] (on Tor 0.2.9.9 )
Feb  1 13:15:58 arnor Tor-eriador[5110]: Bug:     /usr/bin/tor(connection_write_to_buf_impl_+0x294) [0x556620620294] (on Tor 0.2.9.9 )
Feb  1 13:15:58 arnor Tor-eriador[5110]: Bug:     /usr/bin/tor(+0x1081e6) [0x5566206351e6] (on Tor 0.2.9.9 )
Feb  1 13:15:58 arnor Tor-eriador[5110]: Bug:     /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5(event_base_loop+0x7fc) [0x7f5a40489a0c] (on Tor 0.2.9.9 )
Feb  1 13:15:58 arnor Tor-eriador[5110]: Bug:     /usr/bin/tor(do_main_loop+0x27d) [0x55662057079d] (on Tor 0.2.9.9 )
Feb  1 13:15:58 arnor Tor-eriador[5110]: Bug:     /usr/bin/tor(tor_main+0x1c25) [0x556620574105] (on Tor 0.2.9.9 )
Feb  1 13:15:58 arnor Tor-eriador[5110]: Bug:     /usr/bin/tor(main+0x19) [0x55662056c1b9] (on Tor 0.2.9.9 )
Feb  1 13:15:58 arnor Tor-eriador[5110]: Bug:     /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f5a3f5c4830] (on Tor 0.2.9.9 )
Feb  1 13:15:58 arnor Tor-eriador[5110]: Bug:     /usr/bin/tor(_start+0x29) [0x55662056c209] (on Tor 0.2.9.9 )
Feb  1 13:15:58 arnor systemd[1]: tor@eriador.service: Main process exited, code=killed, status=6/ABRT

And three hours later:

Feb  1 16:54:26 arnor Tor-eriador[24009]: tor_assertion_failed_(): Bug: ../src/or/buffers.c:832: write_to_buf: Assertion buf->datalen < INT_MAX failed; aborting. (on Tor 0.2.9.9 )
Feb  1 16:54:26 arnor Tor-eriador[24009]: Bug: Assertion buf->datalen < INT_MAX failed in write_to_buf at ../src/or/buffers.c:832. Stack trace: (on Tor 0.2.9.9 )
Feb  1 16:54:26 arnor Tor-eriador[24009]: Bug:     /usr/bin/tor(log_backtrace+0x42) [0x55744384dba2] (on Tor 0.2.9.9 )
Feb  1 16:54:26 arnor Tor-eriador[24009]: Bug:     /usr/bin/tor(tor_assertion_failed_+0x8d) [0x55744386571d] (on Tor 0.2.9.9 )
Feb  1 16:54:26 arnor Tor-eriador[24009]: Bug:     /usr/bin/tor(+0xadf7a) [0x5574437b4f7a] (on Tor 0.2.9.9 )
Feb  1 16:54:26 arnor Tor-eriador[24009]: Bug:     /usr/bin/tor(connection_write_to_buf_impl_+0x294) [0x5574437fa294] (on Tor 0.2.9.9 )
Feb  1 16:54:26 arnor Tor-eriador[24009]: Bug:     /usr/bin/tor(+0x1081e6) [0x55744380f1e6] (on Tor 0.2.9.9 )
Feb  1 16:54:26 arnor Tor-eriador[24009]: Bug:     /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5(event_base_loop+0x7fc) [0x7f164f746a0c] (on Tor 0.2.9.9 )
Feb  1 16:54:26 arnor Tor-eriador[24009]: Bug:     /usr/bin/tor(do_main_loop+0x27d) [0x55744374a79d] (on Tor 0.2.9.9 )
Feb  1 16:54:26 arnor Tor-eriador[24009]: Bug:     /usr/bin/tor(tor_main+0x1c25) [0x55744374e105] (on Tor 0.2.9.9 )
Feb  1 16:54:26 arnor Tor-eriador[24009]: Bug:     /usr/bin/tor(main+0x19) [0x5574437461b9] (on Tor 0.2.9.9 )
Feb  1 16:54:26 arnor Tor-eriador[24009]: Bug:     /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f164e881830] (on Tor 0.2.9.9 )
Feb  1 16:54:26 arnor Tor-eriador[24009]: Bug:     /usr/bin/tor(_start+0x29) [0x557443746209] (on Tor 0.2.9.9 )
Feb  1 16:54:26 arnor systemd[1]: tor@eriador.service: Main process exited, code=killed, status=6/ABRT

The rest of the time everything was okay.

Child Tickets

Change History (17)

comment:1 Changed 4 months ago by nickm

  • Component changed from - Select a component to Core Tor/Tor
  • Keywords 029-backport added
  • Milestone set to Tor: 0.3.0.x-final
  • Priority changed from Medium to Very High
  • Summary changed from Tor crashes with tor_assertion_failed_() to Tor crashes with tor_assertion_failed_() [Assertion buf->datalen < INT_MAX failed in write_to_buf at ../src/or/buffers.c:832]

comment:2 Changed 3 months ago by nickm

So, I can only think of two possibilities here:

  1. There was somehow an underflow in buf->datalen, and we subtracted more from it than we should have, thus taking it to a large positive value.
  1. There were really 2 gigabytes of data allocated in a single buffer.

comment:3 Changed 3 months ago by nickm

Also, I sure wish I knew what this function was:

 Feb  1 16:54:26 arnor Tor-eriador[24009]: Bug:     /usr/bin/tor(+0x1081e6) [0x55744380f1e6

comment:4 follow-ups: Changed 3 months ago by nickm

How is this Tor configured? Is it a relay, a hidden service, a client? Can you paste the torrc (or as much of it as is not private)?

comment:5 Changed 3 months ago by nickm

  • Owner set to nickm
  • Status changed from new to assigned

comment:6 Changed 3 months ago by nickm

  • Status changed from assigned to needs_review

My branch bug21269_check adds some testing code to check some possible causes of this bug. If you can reproduce the bug with this code enabled, that would be great.

It's also worth reviewing this for merge into 0.3.0, to see if it helps prevent or locate the bug.

comment:7 Changed 3 months ago by nickm

err, that should be bug21369_check.

comment:8 in reply to: ↑ 4 Changed 3 months ago by teor

Replying to nickm:

How is this Tor configured? Is it a relay, a hidden service, a client? Can you paste the torrc (or as much of it as is not private)?

I strongly suspect it's this relay:

https://atlas.torproject.org/#details/6DE61A6F72C1E5418A66BFED80DFB63E4C77668F

It has a DirPort, and also supports begindir over ORPort.

Replying to nickm:

Also, I sure wish I knew what this function was:

 Feb  1 16:54:26 arnor Tor-eriador[24009]: Bug:     /usr/bin/tor(+0x1081e6) [0x55744380f1e6

I think connection_dirserv_add_dir_bytes_to_outbuf() is the most likely candidate here. remaining/bytes is a signed integer that could easily go negative if the offset gets out of sync, and then the int64_t/ssize_t/size_t cast would make it a large positive integer (it's on 64-bit, note the x86_64-linux-gnu backtrace line).

Every other caller is either irrelevant to a relay, or passes a strlen(), or an existing size_t (ok, they might be corrupted earlier, but that would cause other issues, right?).

comment:9 Changed 3 months ago by teor

Also, eriador is a FallbackDir, and it is on IPv6.

So its begindir load is likely to be a little higher than other relays, and clients will arbitrarily terminate connections with it, if they get another successful connection first.

comment:10 Changed 3 months ago by nickm

Teor said:

I think connection_dirserv_add_dir_bytes_to_outbuf() is the most likely candidate here. remaining/bytes is a signed integer that could easily go negative if the offset gets out of sync, and then the int64_t/ssize_t/size_t cast would make it a large positive integer (it's on 64-bit, note the x86_64-linux-gnu backtrace line).

Hm. If this is the case, then I bet my patch would detect that case and stop it. What if I add an extra check in connection_dirserve_add_dir_bytes_to_outbuf() ?

comment:11 Changed 3 months ago by nickm

  • Keywords review-group-16 added

comment:12 in reply to: ↑ 4 Changed 3 months ago by svengo

Replying to nickm:

How is this Tor configured? Is it a relay, a hidden service, a client? Can you paste the torrc (or as much of it as is not private)?

Eriador is a simple, no exit relay. Here is my configuration:

sven@arnor:~$ cat /var/run/tor-instances/eriador.defaults
DataDirectory /var/lib/tor-instances/eriador
PidFile /var/run/tor-instances/eriador/tor.pid
RunAsDaemon 0
User _tor-eriador
SyslogIdentityTag eriador

ControlSocket /var/run/tor-instances/eriador/control GroupWritable RelaxDirModeCheck
SocksPort unix:/var/run/tor-instances/eriador/socks WorldWritable

CookieAuthentication 1
CookieAuthFileGroupReadable 1
CookieAuthFile /var/run/tor-instances/eriador/control.authcookie

Log notice syslog
sven@arnor:~$ cat /etc/tor/instances/eriador/torrc
# This is the tor configuration file for tor instance eriador.
#
# To start/reload/etc this instance, run "systemctl tor@eriador start" (or reload, or..).
# This instance will run as user _tor-eriador; its data directory is /var/lib/tor-instances/eriador.
#
# Append to the list of socks interfaces configured via
# /usr/share/tor/tor-service-defaults-torrc-instances
# which is unix:/var/run/tor-instances/eriador/socks

+SocksPort auto

#ControlPort 9051
HashedControlPassword 16:CBBCB54F3924BCA06001CFBFE8C15F7D8DDB768CE9B34E567E6A36303B
ORPort 4051
Nickname eriador
ContactInfo hwertiout695@safe-mail.net
DirPort 4951
ExitPolicy reject *:* # no exits allowed
NumCPUs 2
EntryStatistics 1
ConnDirectionStatistics 1
HiddenServiceStatistics 1
DirReqStatistics 1
MyFamily 9FBEB75E8BC142565F12CBBE078D63310236A334

# IPv6
ORPort [2001:41d0:1:8989::1]:4051
DirPort [2001:41d0:1:8989::1]:4951 NoAdvertise

The bug has not occurred since then. The machine has many free resources (4 cpu cores), so this cannot be the problem:

sven@arnor:~$ uptime
 16:28:00 up 12 days, 12:27,  1 user,  load average: 0,41, 0,46, 0,45
sven@arnor:~$ free
              total        used        free      shared  buff/cache   available
Mem:       16337668      822700    13083000      183580     2431968    14917912
Swap:       1046524           0     1046524

comment:13 Changed 3 months ago by nickm

Teor made some suggestions, which I'm trying to take into account in a new branch.

The new branch is bug21369_check_029 -- I've rebased the old patch onto 0.2.9, since we might want to backport a fix here eventually.

comment:14 Changed 3 months ago by teor

  • Status changed from needs_review to merge_ready

Seems sensible to me. Let's get it merged and start squashing bugs.
(Someone has run this, right?)

comment:15 Changed 3 months ago by nickm

Squashed the branch as bug21369_check_029_squashed.

Passes check and test-network-all.

comment:16 Changed 3 months ago by teor

Passes make test-network-all with the #21570 logging changes, so the new BUG warnings aren't being logged all the time.

Let's merge this.

comment:17 Changed 3 months ago by nickm

  • Resolution set to fixed
  • Status changed from merge_ready to closed

Merged; thanks for the review!

Note: See TracTickets for help on using tickets.