Opened 8 years ago

Closed 7 years ago

Last modified 7 years ago

#2954 closed defect (fixed)

ides corrupted its cached-microdescs.new file

Reported by: mikeperry Owned by: nickm
Priority: Medium Milestone: Tor: 0.2.3.x-final
Component: Core Tor/Tor Version:
Severity: Keywords: tor-auth
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

I spent some time trying to diagnose some very strange micro-descriptor behaviour that seemed to be causing my directory authority to oom. After some investigation involving truncated/embedded microdescriptors, it turned out to most likely be due to my box ooming and corrupting the microdescriptor cache which then triggered more ooms, and not the other way around.

Removing the corrupted cache stopped the ooming and the warns about large keys and malloc failures. (Or, it could have just been that the exploit attempt stopped sometime during all the ooming, and then removing the cache ended the issue?).

The microdesc code seems to be pretty sparse when it comes to logs, dumping weird microdescs, and other diagnostics. This took much longer than it should have to get to the bottom of it. But perhaps it was just due to byzantine failures and won't happen again.

Child Tickets

Change History (20)

comment:1 Changed 8 years ago by mikeperry

Actual Points: 2
Points: 2
Resolution: fixed
Status: newclosed
Summary: Investigate weird dirauth warns/errors/oom/exploit attempts?Weird dirauth warns/errors/oom/exploit attempts?

This ticket is just for record keeping. I'm satisfied assuming this was just Eris, not Mallory.

comment:2 Changed 8 years ago by mikeperry

Summary: Weird dirauth warns/errors/oom/exploit attempts?Weird dirauth microdesc malloc failures, warns, ooms, exploit attempts?

comment:3 Changed 8 years ago by mikeperry

FYI: Here were the log lines:

Apr 09 07:33:59.585 [warn] Unparseable microdescriptor
Apr 09 07:33:59.905 [warn] parse error: Malformed object: mismatched end tag RSA PUBLIC KEY
Apr 09 07:33:59.585 [warn] parse error: Malformed object: missing object end line
Apr 09 07:33:59.585 [warn] Unparseable microdescriptor
Apr 09 07:33:59.905 [warn] parse error: Malformed object: mismatched end tag RSA PUBLIC KEY
Apr 09 07:33:59.905 [warn] Unparseable microdescriptor
Apr 09 07:34:00.143 [warn] parse error: Malformed object: missing object end line
Apr 09 07:34:00.143 [warn] Unparseable microdescriptor
Apr 09 07:34:05.296 [err] Out of memory on realloc(). Dying.

Apr 09 09:25:57.799 [warn] Unparseable microdescriptor
Apr 09 09:25:57.799 [warn] crypto error while reading public key from string: malloc failure (in bignum routines:BN_EXPAND_INTERNAL)
Apr 09 09:25:57.799 [warn] crypto error while reading public key from string: nested asn1 error (in asn1 encoding routines:ASN1_TEMPLATE_NOEXP_D2I)
Apr 09 09:25:57.799 [warn] crypto error while reading public key from string: ASN1 lib (in PEM routines:PEM_ASN1_read_bio)
Apr 09 09:25:57.799 [warn] parse error: Couldn't parse public key.
Apr 09 09:25:57.799 [warn] Unparseable microdescriptor

Apr 09 10:41:06.278 [warn] parse error: Malformed object: missing object end line
Apr 09 10:41:06.279 [warn] Unparseable microdescriptor: @last-listed 2010-02-04 01:50:01
Apr 09 10:41:07.486 [warn] parse error: Malformed object: missing object end line
Apr 09 10:41:07.486 [warn] Unparseable microdescriptor: @last-listed 2010-02-06 05:50:01
Apr 09 10:41:09.900 [warn] parse error: Malformed object: missing object end line

The microdesc code apparently does not log anything below warn, nor does it log unparseable descriptors. Inspecting the microdesc cache revealed that several microdescs appeared to be just running into the next without proper termination, perhaps a side effect of earlier crashes/ooms.

comment:4 Changed 8 years ago by mikeperry

Keywords: MikePerryIterationFires20110417 added

comment:5 in reply to:  3 ; Changed 8 years ago by rransom

Milestone: Tor: 0.2.3.x-final
Resolution: fixed
Status: closedreopened

Replying to mikeperry:

FYI: Here were the log lines:

Apr 09 10:41:06.278 [warn] parse error: Malformed object: missing object end line
Apr 09 10:41:06.279 [warn] Unparseable microdescriptor: @last-listed 2010-02-04 01:50:01
Apr 09 10:41:07.486 [warn] parse error: Malformed object: missing object end line
Apr 09 10:41:07.486 [warn] Unparseable microdescriptor: @last-listed 2010-02-06 05:50:01
Apr 09 10:41:09.900 [warn] parse error: Malformed object: missing object end line

(ides emitted these log lines while loading microdescriptors from its cached-microdescs.new file.)

Notice the @last-listed dates -- ides had been corrupting its microdesc cache for over a year, but didn't OOM in the process of trying to parse the entire tail of its MD cache until this month, when the file had become much longer.

Here is a longer piece of one of those ‘Unparseable microdescriptor’s:

Apr 09 10:41:14.550 [warn] Unparseable microdescriptor: @last-listed 2010-08-13 07:50:01
onion-key
-----BEGIN RSA PUBLIC KEY-----
MIGJAoGBAMeiFlr3EKP5qVMthV8Mi6NYvONH1ZlWNrg3947qNQj6OOE57hK/qT61
Ovx717sEtdfuksSXxxVVd8K1ym5gMP4ffAZWFYc5Z3PxusNEs+0EjwyVLxrrwnY/
hKG+XjXdW48TWQoad3HyRMMdQUfm+sSf6nEusEeRgg9gv+JHF1G/AgMBAAE=
---@last-listed 2010-08-18 04:20:01
onion-key
-----BEGIN RSA PUBLIC KEY-----
MIGJAoGBALOgBn1u7gQCEIiowkX0cMVi20yZNoUXFbEn2HKreqGO/ZssPEcdAXbS
1QdONiazdwVC7oFmdJ0OtS+OPyKPkoBqw0lR9CtOBXlJ45n+r7X2Yks0BHCt68Xx
uqnP/1jODPsex2hxaa5WU0HXIh7idsIdJCrfZPw39V/Abw4mllKNAgMBAAE=
-----END RSA PUBLIC KEY-----
family slippy
@last-listed 2010-08-18 04:20:01
onion-key
-----BEGIN RSA PUBLIC KEY-----
MIGJAoGBALTs9+vmYkA4VIlzbeRydehhMVEYyifxCm1dibfv9A93we8QM/UvUkSk

The microdesc code apparently does not log anything below warn, nor does it log unparseable descriptors.

(Mike had to modify the ‘Unparseable microdescriptor’ log_warn call to dump the descriptor into the log file.)

Inspecting the microdesc cache revealed that several microdescs appeared to be just running into the next without proper termination, perhaps a side effect of earlier crashes/ooms.

microdescs_add_list_to_cache and dump_microdescriptor are scary. Perhaps we should be prefixing each item in the cached-*.new files with a line containing the cached item's length and a short (32 or fewer bits) hash, and trying to resynchronize if we read a damaged item.

I'm reopening this ticket because I see no evidence that the underlying bug has been fixed. In particular, git blame shows that nothing relevant in src/or/microdesc.c or src/common/util.c has been changed since 2010-01-25, and microdescs were still being written improperly months later.

Mike: Did you keep a copy of your cached-microdescs* files, or just delete them?

comment:6 Changed 8 years ago by rransom

Actual Points: 2
Keywords: MikePerryIterationFires20110417 removed
Owner: changed from mikeperry to nickm
Points: 2
Status: reopenedassigned
Summary: Weird dirauth microdesc malloc failures, warns, ooms, exploit attempts?ides corrupted its cached-microdescs.new file

I've moved Mike's ‘agile’ markings for his investigation of the April 9, 2011 ides crash to ticket #2957, so we can use this ticket to describe the actual bug.

comment:7 in reply to:  5 Changed 8 years ago by rransom

Priority: criticalnormal

Replying to rransom:

Notice the @last-listed dates -- ides had been corrupting its microdesc cache for over a year, but didn't OOM in the process of trying to parse the entire tail of its MD cache until this month, when the file had become much longer.

Again, I speculated rather than RTFSing... It didn't try to parse the entire tail of the MD cache, just the text until just before the next line that begins with “@” or “onion-key”.

Inspecting the microdesc cache revealed that several microdescs appeared to be just running into the next without proper termination, perhaps a side effect of earlier crashes/ooms.

microdescs_add_list_to_cache and dump_microdescriptor are scary. Perhaps we should be prefixing each item in the cached-*.new files with a line containing the cached item's length and a short (32 or fewer bits) hash, and trying to resynchronize if we read a damaged item.

This doesn't seem to be necessary in order to prevent a runaway parser, but it would help us recover more items from the cache when something does go wrong.

Decreasing priority, because I am now convinced that the cache corruption didn't increase ides's memory consumption significantly.

comment:8 in reply to:  description Changed 8 years ago by cypherpunks

Replying to mikeperry:

corrupted its cached-microdescs.new file

It's impossible by Tor itself.

    f = start_writing_to_stdio_file(cache->journal_fname,
                                    OPEN_FLAGS_APPEND|O_BINARY,
                                    0600, &open_file);

cached-microdescs could be broken during calls of microdesc_cache_rebuild().

microdesc_cache_clean() happens during rebuild only, so every old md storing until then.

#2230 and

finish_writing_to_file(open_file); /*XXX Check me.*/

is reasons of OOM and corruptions.

comment:9 Changed 8 years ago by cypherpunks

FOI: ides running 0.2.2.x.
microdesc stuff is different there. no microdesc_cache_clean() at all.

comment:10 Changed 8 years ago by cypherpunks

microdescs_parse_from_string() could trigger double microdesc_free() for the same md.

  while (s < eos) {
    start_of_next_microdesc = find_start_of_next_microdesc(s, eos);
    if (!start_of_next_microdesc)
      start_of_next_microdesc = eos;

    if (tokenize_string(area, s, start_of_next_microdesc, tokens,
                        microdesc_token_table, flags)) {
      log_warn(LD_DIR, "Unparseable microdescriptor");
      goto next;
    }

    md = tor_malloc_zero(sizeof(microdesc_t));
    {
      const char *cp = tor_memstr(s, start_of_next_microdesc-s,
                                  "onion-key");
      tor_assert(cp);

      md->bodylen = start_of_next_microdesc - cp;
      if (copy_body)
        md->body = tor_strndup(cp, md->bodylen);
      else
        md->body = (char*)cp;
      md->off = cp - start;
    }

    if ((tok = find_opt_by_keyword(tokens, A_LAST_LISTED))) {
      if (parse_iso_time(tok->args[0], &md->last_listed)) {
        log_warn(LD_DIR, "Bad last-listed time in microdescriptor");
        goto next;
      }
    }
...
    md = NULL;
  next:
    microdesc_free(md);

    memarea_clear(area);
    smartlist_clear(tokens);
    s = start_of_next_microdesc;
  }

With series of corrupted microdescs, so that first has broken A_LAST_LISTED and next to it has another broken token.

fixed code could be like:

  next:
    microdesc_free(md);
    md = NULL;

    memarea_clear(area);
    smartlist_clear(tokens);
    s = start_of_next_microdesc;

comment:11 Changed 8 years ago by cypherpunks

For downloaded microdescs it's trigerable remotely with series: 1. with illegal K_FAMILY, 2. with another unparsable md.

comment:12 in reply to:  10 ; Changed 8 years ago by arma

Replying to cypherpunks:

fixed code could be like:

md = NULL;

That fix is scheduled to go out in 0.2.2.25-alpha shortly. Thanks!

(I wonder if that is the cause of #2954 or just something related.)

comment:13 in reply to:  12 Changed 8 years ago by rransom

Replying to arma:

Replying to cypherpunks:

fixed code could be like:

md = NULL;

That fix is scheduled to go out in 0.2.2.25-alpha shortly. Thanks!

(I wonder if that is the cause of #2954 or just something related.)

It isn't the cause of #2954.

comment:14 Changed 7 years ago by nickm

I think for this one, the problem might have been (as I think cpunks was saying above?) the lack of error handling code in microdescs_add_list_to_cache() interacting badly with some other bug. That would explain why this hasn't recurred, in spite of there not having been anything obvious to have fixed the corruption. The fix in that case would be to address the XXXs in microdescs_add_list_to_cache(). This leads to the question: what's the right thing to do if we can't write microdesciptors to the cache? I lean towards, "Abort, don't make the change, and log a warning."

comment:15 Changed 7 years ago by arma

Sounds fine to me.

comment:16 Changed 7 years ago by nickm

Status: assignedneeds_review

Please review branch bug2954_more.

comment:17 Changed 7 years ago by ln5

Looks OK. Haven't tested it though. Please let me know if you think that'd be worth the time.

comment:18 Changed 7 years ago by nickm

Resolution: fixed
Status: needs_reviewclosed

Thanks for reviewing! It still looks good to me, so I'll merge.

comment:19 Changed 7 years ago by nickm

Keywords: tor-auth added

comment:20 Changed 7 years ago by nickm

Component: Tor Directory AuthorityTor
Note: See TracTickets for help on using tickets.