Opened 8 years ago

Closed 5 years ago

#8031 closed defect (fixed)

Assertion fast_memeq(md->body, "onion-key", 9) failed;

Reported by: cypherpunks Owned by:
Priority: High Milestone: Tor: 0.2.4.x-final
Component: Core Tor/Tor Version: Tor: 0.2.4.6-alpha
Severity: Keywords: tor-relay assertion crash 023-backport
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

I am running Tor v0.2.4.6-alpha (git-b13c6becc892d971) running on Linux with Libevent 1.4.13-stable and OpenSSL 0.9.8o.

The system is running i686 debian linux 2.6.38.2 with some grsec patches applied.

Tor crashed with the following message in the console:

console

[... cut... ] [notice] Opening Directory listener on 0.0.0.0:3128

src/or/microdesc.c:492 microdesc_cache_rebuild: Assertion

fast_memeq(md->body, "onion-key", 9) failed; aborting.

Aborted.

end of console

/var/log/tor/log from just before:

Jan 17 23:44:09.000 [warn] Your computer is too slow to handle this many circuit creation requests! Please consider using the MaxAdvertisedBandwidth config option or choosing a more restricted exit policy. [50 similar message(s) suppressed in last 60 seconds]
Jan 17 23:45:09.000 [warn] Your computer is too slow to handle this many circuit creation requests! Please consider using the MaxAdvertisedBandwidth config option or choosing a more restricted exit policy. [184 similar message(s) suppressed in last 60 seconds]
Jan 17 23:46:09.000 [warn] Your computer i

end of /var/log/tor/log (it ends abruptly)

The log is big (17M) and dates a year or so back.
Looking through it now, I discovered that it routinely complained about things such as:
====
Jan 04 09:04:31.000 [warn] Failing because we have 1016 connections already. Please raise your ulimit -n. [272740629 similar message(s) suppressed in last 21600 seconds]
Jan 04 09:48:12.000 [warn] Cannot seed RNG -- no entropy source found.
Jan 04 09:55:16.000 [warn] Couldn't open "/var/lib/tor2/state.tmp" (/var/lib/tor2/state) for writing: Too many open files
Jan 04 09:55:16.000 [warn] Unable to write state to file "/var/lib/tor2/state"; will try again later
====

This feels like mismanagement on my part, but the console didn't say anything about it, so I never checked the logs before. It was running in a chroot and I must have forgotten to give it read permission for the chrooted /dev/{u,}random.
I hope this has nothing to do with the bug, but maybe a new change request should be added to write messages like that on the console?

I am available on irc as "sadoper" if you need further information.

Child Tickets

Change History (29)

comment:1 Changed 8 years ago by asn

Keywords: tor-relay added

comment:2 Changed 8 years ago by nickm

Milestone: Tor: 0.2.4.x-final

Do you have a copy of the cached-microdescs and cached-microdescs.new files that caused this error?

comment:3 Changed 8 years ago by asn

(This looks related to #2022, btw.)

comment:4 Changed 8 years ago by rransom

This is probably also related to #2954 and a bunch of other tickets over the years about Tor failing to write cached-* properly, then complaining later when it can't read the junk that it wrote.

Switch to storing each directory document (e.g. descriptor or microdescriptor) in its own file (the DJB-style option), or switch to SQLite (the Windows/Sendmail/Firefox/Tor-style option). You're never going to fix Tor's file-writing code.

comment:5 Changed 8 years ago by cypherpunks

I just started my tor relay again, it crashed again.

Same error:
src/or/microdesc.c:492 microdesc_cache_rebuild: Assertion fast_memeq(md->body, "onion-key", 9) failed; aborting.

The microdescs are available here:

https://jacquemain.info/cached-microdescs

https://jacquemain.info/cached-microdescs.new

comment:6 Changed 8 years ago by cypherpunks

Could you please gzip original cached-microdescs.new and post link to archived file? And for cached-microdescs probably the same.

comment:7 Changed 8 years ago by cypherpunks

Except memory corruption there are only one way to get this assert. Only if dump_microdescriptor() failed in some but wrote something to file, like annotation only or md->body partially. If that happens then every next md->off will point to some wrong place, and assert happen. We need to see actual log files, at least we should to see "After rebuilding microdesc cache, offsets seem wrong. " string there, even if theory wrong and code contains another way to trigger this assert.

comment:8 Changed 8 years ago by cypherpunks

fsync() may need to call during finish_writing_to_file(). mmap just after non flushed fclose, just wonder what it could map and where if buffers around.

comment:9 Changed 8 years ago by cypherpunks

The fdopen() function associates a stream with the existing file
descriptor, fd. The mode of the stream (one of the values "r", "r+",
"w", "w+", "a", "a+") must be compatible with the mode of the file
descriptor.

Is

a Open for appending (writing at end of file). The file is cre‐

ated if it does not exist. The stream is positioned at the end
of the file.

compatible with:

O_CREAT|O_TRUNC

?
What happen if such file already exist?

comment:10 Changed 8 years ago by cypherpunks

https://lwn.net/Articles/457667/
fsync is hard, subtle and fragile.

comment:11 Changed 8 years ago by cypherpunks

there are only one way to get this assert.

finish_writing_to_file() is another way to trigger assert, if it failed then no cache replace happens so offsets can't be correct for any md.

comment:12 in reply to:  description Changed 8 years ago by arma

Replying to cypherpunks:

It was running in a chroot and I must have forgotten to give it read permission for the chrooted /dev/{u,}random.

It's more likely in this case that your Tor didn't have any available file descriptors, and it wanted to use a file descriptor to open your /dev/random.

comment:13 Changed 8 years ago by cypherpunks

You can reproduce crash for second way. First you need to trigger rebuilding the microdescriptor cache, the simple way is to swap cached-microdescs and cached-microdescs.new files so size of journal was bigger than cache for sure. Second you need to change file attributes on a file system to attribute file as immutable: chattr +i cached-microdescs (it probably need a root privs). Start Tor, see crash:

[Warning] Error replacing "./cached-microdescs": Operation not permitted
[Error] microdesc_cache_rebuild(): Bug: After rebuilding microdesc cache, offsets seem wrong.  At offset xxx, I expected to find a microdescriptor starting  with "onion-key".  Instead I got "xxxxxxxxxxxxxxxxx".
[Error] microdesc_cache_rebuild(): Bug: microdesc.c:491: microdesc_cache_rebuild: Assertion fast_memeq(md->body, "onion-key", 9) failed; aborting.

comment:14 Changed 8 years ago by cypherpunks

Posted cached-microdescs older than cached-microdescs.new, If jamming was happen during flush to cached-microdescs.tmp then cached-microdescs should be newer. Jamming occurs in the cached-microdescs.new, found 1 for a while. offset 0x73000 (btw, 0x73000 % 512 == 0)

@last-listed 2013-01-18 01:59:21
onion-key
-----BEGIN RSA PUBLIC KEY-----
MIGJAoGBAK1INwbsiJkcOQxNfZAvfBtNk2Iy3Sd+iqmsHiKhjmmbD0WL3WT3UBPz
xGGd6duwk2aJyNgkhdYXhG5E+GhYYvLPfmP2caybDsS/WocseFZILbqBDrvilXGV
8mW+Mkqge7ezm241fktXxU8HxQsWMnelOdaPBM/7/eRxe+3PFCNfAgMBAAE=
-----END RSA PUBLIC KEY-----
p accept 20-23,43,53,79-81,88,110,143,194,220,389,443,464,531,543-544,554,563,636,706,749,873,902-904,981,989-995,1194,1220,1293,1500,1533,1677,1723,1755,1863,2082-2083,2086-2087,2095-2096,2102-2104,3128,3389,3690,@last-listed 2013-01-21 00:38:41
onion-key
-----BEGIN RSA PUBLIC KEY-----
MIGJAoGBAOKDEtBuLINkwnt0Dbdvgu8V+7FiY6JzV35KYmR9lDXddk2BoVSP5Z2z
QTvtNDnf/vHPD/PDZHGudTjhsyka3wYqXf9TVcef+P2Aju/bPPcDwuYceFC1IWuP
N6lpgWVhUR7ZIaoS8kgR7Yc9gbimtvBl/KG7L0hEZLS8DHZsNhwhAgMBAAE=
-----END RSA PUBLIC KEY-----

Missed any md for 19 and 20 Jan.

comment:15 Changed 8 years ago by cypherpunks

    f = start_writing_to_stdio_file(cache->journal_fname,
                                    OPEN_FLAGS_APPEND|O_BINARY,
                                    0600, &open_file);

That means append to cached-microdescs.new.tmp not to cached-microdescs.new
What purpose to append to file that shouldn't exist at first? Any random junk after some failure no need to count as valid file.

comment:16 Changed 8 years ago by cypherpunks

oops. wrong comment.

comment:17 Changed 8 years ago by cypherpunks

Sorry I press buttons faster than I can think. Too hard without irc.

comment:18 Changed 8 years ago by nickm

Hm. Looking at what's different here from the non-md case:

1) We're using stdio. stdio has buffering. Maybe we shouldn't be using stdio.

2) We're asserting on failure. That's *definitely* stupid. Even if we can't figure out how to get the file corruption to go away in all cases, we can definitely try to make this not be an assertion.

comment:19 Changed 7 years ago by nickm

(Closed #8267 as a duplicate of this)

comment:20 Changed 7 years ago by nickm

Priority: normalmajor

comment:21 in reply to:  18 ; Changed 7 years ago by nickm

Keywords: 023-backport added
Status: newneeds_review

I've got an attempt to fix this stuff in branch "bug8031" in my public repository. It's against 0.2.3, but I'm only suggesting it for 0.2.4 for now. What it does is:

  • Uses fd syscalls instead of stdio to rebuild microdescriptor stuff.
  • Actually checks error codes on return values from finish_writing_to_file.
  • Makes sure that our offset in the file keeps pace with the computed offset as we're rebuilding the cache. (I think this is needless, but hey, we have a bug.)

I don't know if we can call this one 'fixed' after merging these, but we can rule out a couple of possible causes.

comment:22 in reply to:  21 Changed 7 years ago by andrea

Replying to nickm:

I've got an attempt to fix this stuff in branch "bug8031" in my public repository. It's against 0.2.3, but I'm only suggesting it for 0.2.4 for now. What it does is:

  • Uses fd syscalls instead of stdio to rebuild microdescriptor stuff.
  • Actually checks error codes on return values from finish_writing_to_file.
  • Makes sure that our offset in the file keeps pace with the computed offset as we're rebuilding the cache. (I think this is needless, but hey, we have a bug.)

I don't know if we can call this one 'fixed' after merging these, but we can rule out a couple of possible causes.

I don't know if we can call it fixed either, but these changes all look like an improvement to me and should be merged.

comment:23 Changed 7 years ago by nickm

Milestone: Tor: 0.2.4.x-finalTor: 0.2.3.x-final

Merged that into 0.2.4 ; leaving this open to see if it recurs, and putting into 0.2.3 for possible backport if it doesn't.

comment:24 Changed 7 years ago by arma

looks too early to consider this for 0.2.3.26.

that said, is there some really trivial version of a backport, like "don't assert and ignore the file"?

comment:25 Changed 7 years ago by nickm

that said, is there some really trivial version of a backport, like "don't assert and ignore the file"?

Hm. That isn't actually something we *did* in 0.2.4, so I can't really call doing anything like that a backport. It would be a new untested thing, which isn't such a good idea.

I think the minimal thing that might fix it, if our theory above is right, would be a backport of 6905c1f60d2d0e658481df70fdcaa73251becacc alone.

comment:26 Changed 7 years ago by arma

Sounds like we could also let it bake for a while in 0.2.4, and maybe we'd discover an actual fix (or maybe 0.2.3 would become oldstable and not matter anymore).

comment:27 Changed 7 years ago by nickm

6905c1f60d2d0e658481df70fdcaa73251becacc introduced bug #8822, so if we backport this, we need to backport that. Good call on not backporting this right away.

comment:28 Changed 5 years ago by arma

Recommend closing rather than trying to mess with an 0.2.3.26 here.

comment:29 Changed 5 years ago by nickm

Milestone: Tor: 0.2.3.x-finalTor: 0.2.4.x-final
Resolution: fixed
Status: needs_reviewclosed

Recommend closing rather than trying to mess with an 0.2.3.26 here.

So mote it be.

Note: See TracTickets for help on using tickets.