Opened 4 years ago

Last modified 3 years ago

#19834 assigned enhancement

Rethink how we handle issues while sanitizing bridge descriptors

Reported by: karsten
Owned by: metrics-team
Priority: Low
Milestone:
Component: Metrics/CollecTor
Version:
Severity: Normal
Keywords: metrics-2018
Cc: iwakeh
Actual Points:
Parent ID: #20548
Points:
Reviewer:
Sponsor:


The bridge descriptor sanitizer parses tarballs containing non-sanitized bridge descriptors, modifies their content by removing bridge IP addresses and other sensitive parts, and writes sanitized versions of those bridge descriptors to disk.

The sanitizer needs to recognize the lines contained in bridge descriptors to distinguish between lines that must be changed and others that can be kept unchanged, and it needs to be able to understand the exact format of certain lines in order to sanitize their contents.

This process can go wrong in various ways, and we need to decide how to handle those situations. Possible situations are:

  1. A tarball is malformed or cannot be opened for some other reason.
  2. A tarball contains one or more files that cannot be opened.
  3. A file in the tarball contains an unknown descriptor type.
  4. An internal problem prevents sanitizing parts of a descriptor (e.g., a missing secret for sanitizing IP addresses).
  5. A descriptor is missing parts that are required for properly sanitizing its contents.
  6. A descriptor contains an unrecognized line.
  7. A descriptor line doesn't follow the expected format, contains fewer or more arguments, etc.
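
To make situations 4, 6, and 7 a bit more concrete, here is a minimal sketch of a per-line sanitizer that raises a distinct exception for each of them. Everything in it is an assumption for illustration: the exception names, the tiny keyword table, and the HMAC-based IP replacement are stand-ins and not what CollecTor actually does. Only the argument layout of the `router` line follows the Tor directory specification.

```python
import hashlib
import hmac


class MissingSecret(Exception):
    """Situation 4: the sanitizer lacks the secret it needs."""


class UnrecognizedLine(Exception):
    """Situation 6: the line's keyword is not in any known table."""


class MalformedLine(Exception):
    """Situation 7: a known line has an unexpected number of arguments."""


# Hypothetical, deliberately tiny keyword table; not CollecTor's real list.
KEEP_UNCHANGED = {"published", "bandwidth"}


def pseudonymize_ip(ip, secret):
    """Replace an IP address with a stable placeholder in 10.0.0.0/8.

    Stand-in scheme for illustration, not the one CollecTor uses.
    """
    if not secret:
        raise MissingSecret("no secret available for sanitizing IP addresses")
    digest = hmac.new(secret, ip.encode("ascii"), hashlib.sha256).digest()
    return "10.%d.%d.%d" % (digest[0], digest[1], digest[2])


def sanitize_line(line, secret):
    """Return the sanitized form of one descriptor line, or raise."""
    keyword, _, rest = line.partition(" ")
    if keyword in KEEP_UNCHANGED:
        return line
    if keyword == "router":
        # Expected: router <nickname> <address> <ORPort> <SOCKSPort> <DirPort>
        args = rest.split()
        if len(args) != 5:
            raise MalformedLine(line)
        args[1] = pseudonymize_ip(args[1], secret)
        return "router " + " ".join(args)
    raise UnrecognizedLine(line)
```

Keeping the three failure types separate means the caller, not the line parser, gets to decide how much work to throw away for each of them.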

Possible ways of handling such situations are:

  A. Skip a line we don't understand and keep the rest of the descriptor.
  B. Skip a descriptor.
  C. Skip the file contained in the tarball and continue with the next.
  D. Abort processing the tarball.
  E. Skip the entire tarball, discard any descriptors processed before running into the problem, and attempt to process the tarball again in the next execution.
  F. Abstain from processing a given descriptor type until a problem has been resolved.
  G. Discard any descriptors already processed from the tarball before running into the problem, abort the current execution, and refuse to start the next execution until the problem has been resolved.
  H. (In addition to A-G.) Inform the operator by logging the problem.
  I. (In addition to A-G.) Warn the operator and ask them to resolve the problem.

Looking at this list, I think that my preferred ways of handling problems would be something like:

  • B+H in situations 5, 6, and 7;
  • E+I in situations 1, 2, and 3; and
  • G+I in situation 4.
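
One way to keep such preferences adjustable per operator would be to hold them in a single lookup table rather than spreading them through the parsing code. The following is only a sketch of that idea: the enum names, the `PREFERRED` table, and the `plan` helper are hypothetical and do not exist in CollecTor. The table encodes the preferences listed above and nothing else.

```python
from enum import Enum


class Situation(Enum):
    TARBALL_UNREADABLE = 1
    FILE_UNREADABLE = 2
    UNKNOWN_DESCRIPTOR_TYPE = 3
    INTERNAL_PROBLEM = 4
    MISSING_REQUIRED_PART = 5
    UNRECOGNIZED_LINE = 6
    MALFORMED_LINE = 7


class Action(Enum):
    # Values are the letters used in the list of handling options.
    SKIP_DESCRIPTOR = "B"
    RETRY_TARBALL_NEXT_RUN = "E"
    HALT_UNTIL_RESOLVED = "G"
    LOG = "H"
    WARN_OPERATOR = "I"


# Each situation maps to (what to discard or skip, how to tell the operator).
PREFERRED = {
    Situation.TARBALL_UNREADABLE: (Action.RETRY_TARBALL_NEXT_RUN, Action.WARN_OPERATOR),
    Situation.FILE_UNREADABLE: (Action.RETRY_TARBALL_NEXT_RUN, Action.WARN_OPERATOR),
    Situation.UNKNOWN_DESCRIPTOR_TYPE: (Action.RETRY_TARBALL_NEXT_RUN, Action.WARN_OPERATOR),
    Situation.INTERNAL_PROBLEM: (Action.HALT_UNTIL_RESOLVED, Action.WARN_OPERATOR),
    Situation.MISSING_REQUIRED_PART: (Action.SKIP_DESCRIPTOR, Action.LOG),
    Situation.UNRECOGNIZED_LINE: (Action.SKIP_DESCRIPTOR, Action.LOG),
    Situation.MALFORMED_LINE: (Action.SKIP_DESCRIPTOR, Action.LOG),
}


def plan(situation, policy=PREFERRED):
    """Return the handling for a situation in the ticket's notation, e.g. 'B+H'."""
    scope, notify = policy[situation]
    return scope.value + "+" + notify.value
```

Another operator could pass a different `policy` dict to `plan` without touching the sanitizer itself.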

That's not exactly what we're currently doing. And I'm not even sure if somebody else operating a CollecTor instance with the bridgedescs module would have the same preferences.

Let's discuss!

Change History (3)

comment:1 Changed 4 years ago by iwakeh

Parent ID: #20548

Depends on parent #20548.

comment:2 Changed 3 years ago by karsten

Keywords: metrics-2018 added

comment:3 Changed 3 years ago by karsten

Owner: set to metrics-team
Status: changed from new to assigned