ides corrupted its cached-microdescs.new file

changed milestone to %Tor: 0.2.3.x-final

added component::core tor/tor milestone::Tor: 0.2.3.x-final owner::nickm priority::medium resolution::fixed status::closed tor-auth type::defect labels

This ticket is just for record keeping. I'm satisfied assuming this was just Eris, not Mallory.

Trac:
Actualpoints: N/A to 2
Points: N/A to 2
Status: new to closed
Summary: Investigate weird dirauth warns/errors/oom/exploit attempts? to Weird dirauth warns/errors/oom/exploit attempts?
Resolution: N/A to fixed

Trac:
Summary: Weird dirauth warns/errors/oom/exploit attempts? to Weird dirauth microdesc malloc failures, warns, ooms, exploit attempts?

FYI: Here were the log lines:

Apr 09 07:33:59.585 [warn] Unparseable microdescriptor
Apr 09 07:33:59.905 [warn] parse error: Malformed object: mismatched end tag RSA PUBLIC KEY
Apr 09 07:33:59.585 [warn] parse error: Malformed object: missing object end line
Apr 09 07:33:59.585 [warn] Unparseable microdescriptor
Apr 09 07:33:59.905 [warn] parse error: Malformed object: mismatched end tag RSA PUBLIC KEY
Apr 09 07:33:59.905 [warn] Unparseable microdescriptor
Apr 09 07:34:00.143 [warn] parse error: Malformed object: missing object end line
Apr 09 07:34:00.143 [warn] Unparseable microdescriptor
Apr 09 07:34:05.296 [err] Out of memory on realloc(). Dying.

Apr 09 09:25:57.799 [warn] Unparseable microdescriptor
Apr 09 09:25:57.799 [warn] crypto error while reading public key from string: malloc failure (in bignum routines:BN_EXPAND_INTERNAL)
Apr 09 09:25:57.799 [warn] crypto error while reading public key from string: nested asn1 error (in asn1 encoding routines:ASN1_TEMPLATE_NOEXP_D2I)
Apr 09 09:25:57.799 [warn] crypto error while reading public key from string: ASN1 lib (in PEM routines:PEM_ASN1_read_bio)
Apr 09 09:25:57.799 [warn] parse error: Couldn't parse public key.
Apr 09 09:25:57.799 [warn] Unparseable microdescriptor

Apr 09 10:41:06.278 [warn] parse error: Malformed object: missing object end line
Apr 09 10:41:06.279 [warn] Unparseable microdescriptor: @last-listed 2010-02-04 01:50:01
Apr 09 10:41:07.486 [warn] parse error: Malformed object: missing object end line
Apr 09 10:41:07.486 [warn] Unparseable microdescriptor: @last-listed 2010-02-06 05:50:01
Apr 09 10:41:09.900 [warn] parse error: Malformed object: missing object end line

The microdesc code apparently does not log anything below warn, nor does it log unparseable descriptors. Inspecting the microdesc cache revealed that several microdescs appeared to be just running into the next without proper termination, perhaps a side effect of earlier crashes/ooms.

Trac:
Keywords: N/A deleted, MikePerryIterationFires20110417 added

Replying to mikeperry:

FYI: Here were the log lines:

Apr 09 10:41:06.278 [warn] parse error: Malformed object: missing object end line
Apr 09 10:41:06.279 [warn] Unparseable microdescriptor: @last-listed 2010-02-04 01:50:01
Apr 09 10:41:07.486 [warn] parse error: Malformed object: missing object end line
Apr 09 10:41:07.486 [warn] Unparseable microdescriptor: @last-listed 2010-02-06 05:50:01
Apr 09 10:41:09.900 [warn] parse error: Malformed object: missing object end line

(ides emitted these log lines while loading microdescriptors from its cached-microdescs.new file.)

Notice the @last-listed dates -- ides had been corrupting its microdesc cache for over a year, but didn't OOM in the process of trying to parse the entire tail of its MD cache until this month, when the file had become much longer.

Here is a longer piece of one of those ‘Unparseable microdescriptor’s:

Apr 09 10:41:14.550 [warn] Unparseable microdescriptor: @last-listed 2010-08-13 07:50:01
onion-key
-----BEGIN RSA PUBLIC KEY-----
MIGJAoGBAMeiFlr3EKP5qVMthV8Mi6NYvONH1ZlWNrg3947qNQj6OOE57hK/qT61
Ovx717sEtdfuksSXxxVVd8K1ym5gMP4ffAZWFYc5Z3PxusNEs+0EjwyVLxrrwnY/
hKG+XjXdW48TWQoad3HyRMMdQUfm+sSf6nEusEeRgg9gv+JHF1G/AgMBAAE=
---@last-listed 2010-08-18 04:20:01
onion-key
-----BEGIN RSA PUBLIC KEY-----
MIGJAoGBALOgBn1u7gQCEIiowkX0cMVi20yZNoUXFbEn2HKreqGO/ZssPEcdAXbS
1QdONiazdwVC7oFmdJ0OtS+OPyKPkoBqw0lR9CtOBXlJ45n+r7X2Yks0BHCt68Xx
uqnP/1jODPsex2hxaa5WU0HXIh7idsIdJCrfZPw39V/Abw4mllKNAgMBAAE=
-----END RSA PUBLIC KEY-----
family slippy
@last-listed 2010-08-18 04:20:01
onion-key
-----BEGIN RSA PUBLIC KEY-----
MIGJAoGBALTs9+vmYkA4VIlzbeRydehhMVEYyifxCm1dibfv9A93we8QM/UvUkSk

The microdesc code apparently does not log anything below warn, nor does it log unparseable descriptors.

(Mike had to modify the ‘Unparseable microdescriptor’ log_warn call to dump the descriptor into the log file.)

Inspecting the microdesc cache revealed that several microdescs appeared to be just running into the next without proper termination, perhaps a side effect of earlier crashes/ooms.

microdescs_add_list_to_cache and dump_microdescriptor are scary. Perhaps we should be prefixing each item in the cached-*.new files with a line containing the cached item's length and a short (32 or fewer bits) hash, and trying to resynchronize if we read a damaged item.

I'm reopening this ticket because I see no evidence that the underlying bug has been fixed. In particular, git blame shows that nothing relevant in src/or/microdesc.c or src/common/util.c has been changed since 2010-01-25, and microdescs were still being written improperly months later.

Mike: Did you keep a copy of your cached-microdescs* files, or just delete them?

Trac:
Status: closed to reopened
Resolution: fixed to N/A
Milestone: N/A to Tor: 0.2.3.x-final

I've moved Mike's ‘agile’ markings for his investigation of the April 9, 2011 ides crash to ticket #2957 (closed), so we can use this ticket to describe the actual bug.

Trac:
Summary: Weird dirauth microdesc malloc failures, warns, ooms, exploit attempts? to ides corrupted its cached-microdescs.new file
Owner: mikeperry to nickm
Keywords: MikePerryIterationFires20110417 deleted, N/A added
Points: 2 to N/A
Status: reopened to assigned
Actualpoints: 2 to N/A

Replying to rransom:

Notice the @last-listed dates -- ides had been corrupting its microdesc cache for over a year, but didn't OOM in the process of trying to parse the entire tail of its MD cache until this month, when the file had become much longer.

Again, I speculated rather than RTFSing... It didn't try to parse the entire tail of the MD cache, just the text until just before the next line that begins with “@” or “onion-key”.
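To make that boundary rule concrete, here is a minimal standalone sketch of it. It is not the code in Tor: next_item_start() and line_starts_item() are hypothetical names, and the real function has extra length checks that are left out here. It only shows how splitting on lines that begin with "@" or "onion-key" behaves on a truncated entry like the one quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return nonzero if the line starting at p (bounded by eos) begins with
 * '@' or with the keyword "onion-key". */
static int line_starts_item(const char *p, const char *eos)
{
  static const char kw[] = "onion-key";
  const size_t kwlen = sizeof(kw) - 1;
  if (p >= eos)
    return 0;
  if (*p == '@')
    return 1;
  return (size_t)(eos - p) >= kwlen && memcmp(p, kw, kwlen) == 0;
}

/* Hypothetical helper: given the start s of one cached item, return the
 * start of the next item, or NULL if none begins before eos. */
const char *next_item_start(const char *s, const char *eos)
{
  const char *nl;
  assert(s && eos && s <= eos);

  /* Skip this item's own leading '@' annotation lines. */
  while (s < eos && *s == '@') {
    nl = memchr(s, '\n', (size_t)(eos - s));
    if (!nl)
      return NULL;
    s = nl + 1;
  }
  /* Skip this item's own "onion-key" line, if we are looking at one. */
  if (line_starts_item(s, eos)) {
    nl = memchr(s, '\n', (size_t)(eos - s));
    if (!nl)
      return NULL;
    s = nl + 1;
  }
  /* Any later line that begins with '@' or "onion-key" starts a new item. */
  while (s < eos) {
    if (line_starts_item(s, eos))
      return s;
    nl = memchr(s, '\n', (size_t)(eos - s));
    if (!nl)
      return NULL;
    s = nl + 1;
  }
  return NULL;
}
```

On input shaped like the damaged entry above, the line "---@last-listed ..." does not start with "@", so it is swallowed into the broken item and the split lands on the following "onion-key" line. The damage stays confined to one item plus the next item's annotation.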

Inspecting the microdesc cache revealed that several microdescs appeared to be just running into the next without proper termination, perhaps a side effect of earlier crashes/ooms.

microdescs_add_list_to_cache and dump_microdescriptor are scary. Perhaps we should be prefixing each item in the cached-*.new files with a line containing the cached item's length and a short (32 or fewer bits) hash, and trying to resynchronize if we read a damaged item.

This doesn't seem to be necessary in order to prevent a runaway parser, but it would help us recover more items from the cache when something does go wrong.
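A rough sketch of what such framing could look like follows. Everything in it is an assumption made for illustration: the "@frame" marker, the header layout, the choice of 32-bit FNV-1a as the short hash, and the names frame_encode() and frame_next(). None of it exists in Tor. The reader expects the buffer to be NUL-terminated at buf[buflen].

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FRAME_TAG "@frame "  /* hypothetical marker, not a Tor keyword */

/* 32-bit FNV-1a: short and non-cryptographic, only meant to spot damage. */
static uint32_t fnv1a32(const char *p, size_t n)
{
  uint32_t h = 2166136261u;
  while (n--) {
    h ^= (unsigned char)*p++;
    h *= 16777619u;
  }
  return h;
}

/* Write "@frame <len> <hash>\n<body>" into out. Return the number of
 * bytes written, or 0 if the frame does not fit. */
size_t frame_encode(char *out, size_t outlen, const char *body, size_t bodylen)
{
  int hdr;
  assert(out && body);
  hdr = snprintf(out, outlen, FRAME_TAG "%lu %08lx\n",
                 (unsigned long)bodylen,
                 (unsigned long)fnv1a32(body, bodylen));
  if (hdr < 0 || (size_t)hdr >= outlen || bodylen > outlen - (size_t)hdr)
    return 0;
  memcpy(out + hdr, body, bodylen);
  return (size_t)hdr + bodylen;
}

/* Find the next intact frame at or after *pos. On success set *body and
 * *bodylen, move *pos past the frame, and return 1. A frame whose header
 * or hash does not check out is skipped by scanning on for the next
 * marker. Return 0 when no intact frame remains. */
int frame_next(const char *buf, size_t buflen, size_t *pos,
               const char **body, size_t *bodylen)
{
  const size_t taglen = sizeof(FRAME_TAG) - 1;
  size_t p;
  for (p = *pos; p + taglen <= buflen; p++) {
    char *end;
    unsigned long len, hash;
    size_t start;
    if (memcmp(buf + p, FRAME_TAG, taglen) != 0)
      continue;
    len = strtoul(buf + p + taglen, &end, 10);
    if (end == buf + p + taglen || *end != ' ')
      continue;
    hash = strtoul(end + 1, &end, 16);
    if (*end != '\n')
      continue;
    start = (size_t)(end + 1 - buf);
    if (start > buflen || len > buflen - start ||
        hash != fnv1a32(buf + start, len))
      continue;  /* damaged item: resynchronize on the next marker */
    *body = buf + start;
    *bodylen = len;
    *pos = start + len;
    return 1;
  }
  return 0;
}
```

With three framed items and one flipped bit in the middle item's body, frame_next() returns the first and third items and silently drops the second, which is the recovery behavior the proposal is after.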

Decreasing priority, because I am now convinced that the cache corruption didn't increase ides's memory consumption significantly.

Trac:
Priority: critical to normal

Replying to mikeperry:

corrupted its cached-microdescs.new file

Tor by itself can't have corrupted that file; the journal is only ever opened for appending:

    f = start_writing_to_stdio_file(cache->journal_fname,
                                    OPEN_FLAGS_APPEND|O_BINARY,
                                    0600, &open_file);

cached-microdescs, on the other hand, could be corrupted during a call to microdesc_cache_rebuild().

microdesc_cache_clean() runs only during a rebuild, so every old md stays stored until then.

#2230 (moved) and the unchecked

finish_writing_to_file(open_file); /*XXX Check me.*/

are the reasons for the OOM and the corruption.

FYI: ides is running 0.2.2.x. The microdesc code is different there: it has no microdesc_cache_clean() at all.

microdescs_parse_from_string() could trigger double microdesc_free() for the same md.

  while (s < eos) {
    start_of_next_microdesc = find_start_of_next_microdesc(s, eos);
    if (!start_of_next_microdesc)
      start_of_next_microdesc = eos;

    if (tokenize_string(area, s, start_of_next_microdesc, tokens,
                        microdesc_token_table, flags)) {
      log_warn(LD_DIR, "Unparseable microdescriptor");
      goto next;
    }

    md = tor_malloc_zero(sizeof(microdesc_t));
    {
      const char *cp = tor_memstr(s, start_of_next_microdesc-s,
                                  "onion-key");
      tor_assert(cp);

      md->bodylen = start_of_next_microdesc - cp;
      if (copy_body)
        md->body = tor_strndup(cp, md->bodylen);
      else
        md->body = (char*)cp;
      md->off = cp - start;
    }

    if ((tok = find_opt_by_keyword(tokens, A_LAST_LISTED))) {
      if (parse_iso_time(tok->args[0], &md->last_listed)) {
        log_warn(LD_DIR, "Bad last-listed time in microdescriptor");
        goto next;
      }
    }
...
    md = NULL;
  next:
    microdesc_free(md);

    memarea_clear(area);
    smartlist_clear(tokens);
    s = start_of_next_microdesc;
  }

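This happens with a series of corrupted microdescs in which the first has a broken A_LAST_LISTED and the one right after it has another broken token. The first takes the goto next after md was allocated, so md is freed but still points at the freed memory; the second fails in tokenize_string() before md is reassigned, so the same pointer is freed again.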

The fixed code could look like:

  next:
    microdesc_free(md);
    md = NULL;

    memarea_clear(area);
    smartlist_clear(tokens);
    s = start_of_next_microdesc;

For downloaded microdescs it's triggerable remotely with a series of two: 1. one with an illegal K_FAMILY, 2. another unparseable md.
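The control flow can be reproduced in isolation with a small model of the loop. This is only a sketch: fake_md_t, fake_md_free() and run_loop() are made-up stand-ins, and the "free" merely counts calls so that the double release can be observed without touching freed memory.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for microdesc_t: counts releases instead of freeing. */
typedef struct { int releases; } fake_md_t;

static void fake_md_free(fake_md_t *md)
{
  if (md)
    md->releases++;
}

enum outcome { PARSE_OK, FAIL_BEFORE_ALLOC, FAIL_AFTER_ALLOC };

/* Walk n inputs with the same shape as the parsing loop. pool supplies
 * one slot per input. If reset_md is nonzero, md is cleared after the
 * free, as in the proposed fix. Return the highest release count seen
 * on any slot; anything above 1 is a double free. */
int run_loop(const enum outcome *in, fake_md_t *pool, size_t n, int reset_md)
{
  fake_md_t *md = NULL;
  int worst = 0;
  size_t i;

  assert(in && pool);
  for (i = 0; i < n; i++)
    pool[i].releases = 0;

  for (i = 0; i < n; i++) {
    if (in[i] == FAIL_BEFORE_ALLOC)
      goto next;              /* like a tokenize_string() failure */
    md = &pool[i];            /* stands in for tor_malloc_zero() */
    if (in[i] == FAIL_AFTER_ALLOC)
      goto next;              /* like a bad last-listed time */
    md = NULL;                /* success: the result list owns it now */
  next:
    fake_md_free(md);
    if (reset_md)
      md = NULL;              /* the one-line fix */
  }

  for (i = 0; i < n; i++)
    if (pool[i].releases > worst)
      worst = pool[i].releases;
  return worst;
}
```

Feeding it FAIL_AFTER_ALLOC followed by FAIL_BEFORE_ALLOC releases the first slot twice when reset_md is 0 and once when it is 1.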

Replying to cypherpunks:

fixed code could be like: md = NULL;

That fix is scheduled to go out in 0.2.2.25-alpha shortly. Thanks!

(I wonder if that is the cause of #2954 (moved) or just something related.)

Replying to arma:

Replying to cypherpunks:

fixed code could be like: md = NULL;

That fix is scheduled to go out in 0.2.2.25-alpha shortly. Thanks!

(I wonder if that is the cause of #2954 (moved) or just something related.)

It isn't the cause of #2954 (moved).

I think for this one, the problem might have been (as I think cpunks was saying above?) the lack of error handling code in microdescs_add_list_to_cache() interacting badly with some other bug. That would explain why this hasn't recurred, in spite of there not having been anything obvious to have fixed the corruption. The fix in that case would be to address the XXXs in microdescs_add_list_to_cache(). This leads to the question: what's the right thing to do if we can't write microdescriptors to the cache? I lean towards, "Abort, don't make the change, and log a warning."
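As a sketch of that policy only (this is not the code in the branch): cache_state_t and cache_add() are hypothetical, plain stdio stands in for Tor's file helpers, and fprintf stands in for log_warn. The in-memory state changes only after the append has been written and flushed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily reduced stand-in for the cache bookkeeping. */
typedef struct {
  size_t n_items;      /* items accepted into the in-memory cache */
  size_t journal_len;  /* bytes we believe are in the journal */
} cache_state_t;

/* Append one item to the journal and, only if that worked, record it in
 * the cache state. Return 0 on success, -1 if the item was not added. */
int cache_add(cache_state_t *c, FILE *journal, const char *body, size_t len)
{
  assert(c && body);
  if (!journal ||
      fwrite(body, 1, len, journal) != len ||
      fflush(journal) != 0) {
    /* Abort, don't make the change, and log a warning. */
    fprintf(stderr, "[warn] Couldn't append microdescriptor to journal; "
                    "not adding it to the cache.\n");
    return -1;
  }
  c->n_items++;
  c->journal_len += len;
  return 0;
}
```

A short write can still leave a partial item at the end of the journal, so this does not replace a reader that copes with a damaged tail; it only keeps the in-memory offsets from describing bytes that were never written.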

Sounds fine to me.

Please review branch bug2954_more.

Trac:
Status: assigned to needs_review

Looks OK. Haven't tested it though. Please let me know if you think that'd be worth the time.

Thanks for reviewing! It still looks good to me, so I'll merge.

Trac:
Status: needs_review to closed
Resolution: N/A to fixed

Trac:
Keywords: N/A deleted, tor-auth added

Trac:
Component: Tor Directory Authority to Tor

closed

mentioned in issue #2957 (closed)

mentioned in issue #8031 (moved)

moved to tpo/core/tor#2954 (closed)
