Opened 7 weeks ago

Closed 3 weeks ago

#31611 closed defect (fixed)

Work out why chutney didn't fail due to #31495 cannot configure bridges

Reported by: teor Owned by: nickm
Priority: Medium Milestone: Tor: 0.4.2.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: network-team-roadmap-september, 042-should
Cc: nickm, teor, gaba Actual Points:
Parent ID: #29211 Points:
Reviewer: teor Sponsor: Sponsor31-must

Description

In #31495, Tor Browser failed, but our chutney CI job did not fail. Let's work out why that happened.

Child Tickets

Change History (8)

comment:1 Changed 6 weeks ago by nickm

Keywords: 042-should added

comment:2 Changed 3 weeks ago by ahf

Owner: set to nickm
Status: newassigned

Distributing 0.4.2 tickets between network team members.

comment:3 Changed 3 weeks ago by nickm

So, on investigation: bug #31495 only happened when the configuration change arrived over the control port via SETCONF. If instead it arrived in the initial configuration, or via a SIGHUP, then #31495 would not occur and the configuration would not be rejected. Chutney doesn't use SETCONF, so it didn't find #31495.

But, why were the two cases handled differently?

That's because when we build a new configuration from scratch, we initialize a new configuration objects, assign the defaults to it, and then assign each of the user-provided options in sequence. But when we get new values via SETCONF, we first duplicate the existing configuration, and then use setconf to apply new values on top of it. The bug here was in config_dup()'s handling of NULL routerset values. Instead of encoding them as a null entry, we encoded them as an empty string, which was then re-parsed as a semantically different value. This was due to a bug in routerset's var_type implementation.

We could add a check here to make sure that config_dup() always returns an equal object, although that wouldn't have helped in this instance, since the defeault "equal" implementation for a var_type defers to the "encode" implementation, which is what had the bug.

comment:4 Changed 3 weeks ago by nickm

Moving forward, here are the possible solutions I can see:

  1. When we fuzz configurations, we test that if a configuration passes validation, it still passes validation when it is duplicated.
  1. (a) We can document more carefully the dangers of configuration values where NULL and "" mean different things. (In the case of EntryNodes, NULL means "all nodes are included", and "" means "no nodes are included".)
  1. (b) We avoid configuration types where NULL and "" mean different things. We would have to add a new routerset type meaning "all routers", (maybe, "*"?) and have that be the default for EntryNodes. [We probably could not make NULL mean the same as "" for all cases, since there are lots of string-valued configuration types.]
  1. We write a testing tool that uses stem to try assigning configurations and making sure that they pass and fail with the expected errors messages.
  1. We change the implementation of SETCONF so that instead of starting with config_dup(), it remembers the actual sequence of configuration values, appends the controller's new configuration values, and replays all of them in sequence. [This solution would require arbitrary amounts of memory.]

comment:5 Changed 3 weeks ago by nickm

Status: assignedneeds_review

Putting this in needs_review -- there is not a chosen solution here yet, and there is no code, but I would like feedback on which of the several possible solutions look like good ideas. None of them is trivial.

comment:6 in reply to:  4 ; Changed 3 weeks ago by teor

Reviewer: teor
Status: needs_reviewnew

Replying to nickm:

Moving forward, here are the possible solutions I can see:

  1. When we fuzz configurations, we test that if a configuration passes validation, it still passes validation when it is duplicated.

Seems sensible, but fuzzing takes much longer to show failures than tests.
Let's defer this task, unless the other options don't get us what we want.

  1. (a) We can document more carefully the dangers of configuration values where NULL and "" mean different things. (In the case of EntryNodes, NULL means "all nodes are included", and "" means "no nodes are included".)

Yes, let's add comments in 0.4.2.

  1. (b) We avoid configuration types where NULL and "" mean different things. We would have to add a new routerset type meaning "all routers", (maybe, "*"?) and have that be the default for EntryNodes. [We probably could not make NULL mean the same as "" for all cases, since there are lots of string-valued configuration types.]

Let's consider making NULL and "" mean the same thing in 0.4.3.

  1. We write a testing tool that uses stem to try assigning configurations and making sure that they pass and fail with the expected errors messages.

Let's consider a change like this.

Here are some specific suggestions:

  • 0.4.2: add an extra config test to stem, so it is run by tor's "make test-stem"
    • let's try to add a stem test that would have caught this bug?
  • 0.4.3: add a SETCONF step to the newly added config parsing tests run by "make check"
    • maybe? This seems like overkill.
  1. We change the implementation of SETCONF so that instead of starting with config_dup(), it remembers the actual sequence of configuration values, appends the controller's new configuration values, and replays all of them in sequence. [This solution would require arbitrary amounts of memory.]

This feels like a bad idea, it seems to lock in a particular config model. It might make configs really hard to split or maintain in future.

comment:7 in reply to:  6 Changed 3 weeks ago by nickm

Status: newneeds_review

Replying to teor:

Replying to nickm:

Moving forward, here are the possible solutions I can see:

  1. When we fuzz configurations, we test that if a configuration passes validation, it still passes validation when it is duplicated.

Seems sensible, but fuzzing takes much longer to show failures than tests.
Let's defer this task, unless the other options don't get us what we want.

  1. (a) We can document more carefully the dangers of configuration values where NULL and "" mean different things. (In the case of EntryNodes, NULL means "all nodes are included", and "" means "no nodes are included".)

Yes, let's add comments in 0.4.2.

Added #31907 for this.

  1. (b) We avoid configuration types where NULL and "" mean different things. We would have to add a new routerset type meaning "all routers", (maybe, "*"?) and have that be the default for EntryNodes. [We probably could not make NULL mean the same as "" for all cases, since there are lots of string-valued configuration types.]

Let's consider making NULL and "" mean the same thing in 0.4.3.

Added as #31908.

  1. We write a testing tool that uses stem to try assigning configurations and making sure that they pass and fail with the expected errors messages.

Let's consider a change like this.

Here are some specific suggestions:

  • 0.4.2: add an extra config test to stem, so it is run by tor's "make test-stem"
    • let's try to add a stem test that would have caught this bug?

#31909 is for this.

  • 0.4.3: add a SETCONF step to the newly added config parsing tests run by "make check"
    • maybe? This seems like overkill.
  1. We change the implementation of SETCONF so that instead of starting with config_dup(), it remembers the actual sequence of configuration values, appends the controller's new configuration values, and replays all of them in sequence. [This solution would require arbitrary amounts of memory.]

This feels like a bad idea, it seems to lock in a particular config model. It might make configs really hard to split or maintain in future.

I agree, but I thought I should mention it for completeness.

Now that the tickets above are opened, can we close this? Please close if you agree.

comment:8 Changed 3 weeks ago by teor

Resolution: fixed
Status: needs_reviewclosed
Note: See TracTickets for help on using tickets.