Opened 7 months ago

Last modified 3 weeks ago

#21642 assigned enhancement

Prop275: Eliminate "published" times from microdescriptor consensus

Reported by: nickm Owned by: nickm
Priority: Medium Milestone: Tor: 0.3.4.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords:
Cc: arma, prop275, dir-bandwidth, needs-analysis Actual Points: .5
Parent ID: Points: 2
Reviewer: Sponsor: Sponsor4

Description

Proposal 275 will proposal will save a great deal of consensus and consensus-diff bandwidth down the road, with comparatively little effort. Let's get it implemented.

Child Tickets

Change History (32)

comment:1 Changed 7 months ago by nickm

Keywords: TorCoreTeam201703 added
Owner: set to nickm
Status: newassigned

comment:2 Changed 7 months ago by nickm

Steps to take now:

  • In 029, allow published times in the future.
  • In 031, never look at published times in consensus documents.
  • Once 029 is the oldest version still supported, put all microdescriptor published times at 2038-01-01 00:00:00 or something.

comment:3 Changed 7 months ago by nickm

To review:

  • In my public tor repository, prop275_minimal_029, which permits published times to be in the future without breaking Tor.
  • In my public tor repository, prop275_more_031, which stops all reliance on the published time in non-vote documents.
  • In my public torspec.git repository, prop275_part1, where I update the specs to warn about the eventual behavior change.

For later:

* #21669 tracks the change where we eventually make the 'published' times uniform.

  • #21659 tracks the change where we eventually make the 'published' times uniform.
Last edited 7 months ago by nickm (previous) (diff)

comment:4 Changed 7 months ago by nickm

Actual Points: .5
Status: assignedneeds_review

comment:5 in reply to:  3 ; Changed 7 months ago by teor

Replying to nickm:

To review:

  • In my public tor repository, prop275_minimal_029, which permits published times to be in the future without breaking Tor.

I thought we were only doing security backports to 0.2.9?
Or does LTS buy us more latitude for change?

For later:

  • #21669 tracks the change where we eventually make the 'published' times uniform.

The ticket #21669 doesn't exist yet.

comment:6 in reply to:  5 ; Changed 7 months ago by nickm

Replying to teor:

Replying to nickm:

To review:

  • In my public tor repository, prop275_minimal_029, which permits published times to be in the future without breaking Tor.

I thought we were only doing security backports to 0.2.9?
Or does LTS buy us more latitude for change?

I think that the LTS gives us a tiny bit more latitude, early in its LTS status, to do very-safe futureproofing. Of course, I'm making up the rules as I go along here---what do you think?

For later:

  • #21669 tracks the change where we eventually make the 'published' times uniform.

The ticket #21669 doesn't exist yet.

Whoops, typo. Should have said "#21659". Will edit.

comment:7 in reply to:  6 Changed 7 months ago by teor

Replying to nickm:

Replying to teor:

Replying to nickm:

To review:

  • In my public tor repository, prop275_minimal_029, which permits published times to be in the future without breaking Tor.

I thought we were only doing security backports to 0.2.9?
Or does LTS buy us more latitude for change?

I think it's important, ok, and low risk, as long as the major distributions will take it.
(Are any of them only taking security fixes for 0.2.9?)

comment:8 Changed 7 months ago by nickm

I think it's important, ok, and low risk, as long as the major distributions will take it.
(Are any of them only taking security fixes for 0.2.9?)

I think we've been judicious enough about backporting stuff that this shouldn't have too hard a time of getting in. And if it does, then they'll backport it later on, when their not-quite-Tor-029 series break in 2018/2019.

comment:9 Changed 6 months ago by nickm

Keywords: review-group-17 added

comment:10 Changed 6 months ago by dgoulet

Status: needs_reviewneeds_information

In prop275_minimal_029, the changes file is confusing to me:

+      [...] This change will
+      eventually allow us to stop listing meaningful "published" dates

Kind of weird that we want to stop doing something "meaningful" :)... maybe it was suppose to be "meaningless" or "useless" or "wrong" or ... ? If not, what does that mean exactly?

comment:11 Changed 6 months ago by nickm

The plan would be to change all the 'published' dates to something like 2030-01-01.

comment:12 Changed 6 months ago by dgoulet

Status: needs_informationneeds_review

Aaaaaaah! ok that makes more sense. Thanks. Back in need review

comment:13 Changed 5 months ago by dgoulet

Status: needs_reviewneeds_revision

lgtm; One quick comment:

In routerstatus_format_entry(), this is used if not vote status is present:

format_iso_time(published, rs->published_on_to_format);

This matches the commit description which is that published_on_to_format is only used has an output for the control port. However, I think we should add a strong validation of that and possibly a tor_assert(format == NS_CONTROL_PORT) in that if? If not, at least add a comment on how that is used for that function and what the caller should expect?

comment:14 Changed 5 months ago by nickm

Status: needs_revisionneeds_review

We can't use an assertion there -- we also set it for the routerstatus_format_entry() call in networkstatus_compute_consensus(). I've added one comment and clarified another.

comment:15 Changed 5 months ago by dgoulet

Status: needs_reviewmerge_ready

lgtm!

comment:16 Changed 5 months ago by arma

Thoughts:

A) I think the patch in prop275_minimal_029 is safe and fine.

B) Check out in mark_my_descriptor_dirty_if_too_old() how we decide to republish our descriptor if our consensus entry says our descriptor is more than 18 hours old (#3327):

    else if (rs->published_on < slow_cutoff)
      retry_fast_reason = "version listed in consensus is quite old";

So if we do the next step, which would be setting the timestamp to 2037 or whatever, we will disable this hack. Are we still using the hack much? I think we added it because we have a bunch of subtle bugs where relays forget to republish, or where relays *do* republish but the dir auths drop their newly published descriptor because it's too similar to the last one. I don't think we found or fixed those bugs.

At first I thought "hey, no problem, relays cache both flavors of consensus, because directory_caches_dir_info is so lenient." But then I realized that in the above code snippet, the rs is gotten from networkstatus_get_latest_consensus(), which will be the microdesc consensus. So I have resumed thinking that if we set published_on to always 2037, we will be disabling this hack. :(

comment:17 in reply to:  16 Changed 5 months ago by arma

Replying to arma:

So I have resumed thinking that if we set published_on to always 2037, we will be disabling this hack. :(

If we want to re-enable the hack (assuming that when we do the published_on change, we do it only in the microdescriptor consensus, and not in the vanilla consensus), here is code that would do that:

My bug21642-hack1 branch.

Last edited 5 months ago by arma (previous) (diff)

comment:18 Changed 5 months ago by arma

Cc: arma added

comment:19 Changed 5 months ago by arma

Another option for hack1 is to just change the hack to "if you are missing from the consensus, decide that the authorities forgot about you so now's a good time to republish". The problem with that option though is that each time we hit whatever bugs there are, there's a several-hour gap where the relay isn't used by clients. That's not ideal, but maybe it's good enough? Yuck.

comment:20 Changed 5 months ago by arma

I added a separate bug21642-hack2 branch which plays around with allowing the r lines to omit timestamp in the future. It wasn't so bad actually. Not intended for actual use, but rather as a proof of concept for one way we could do it.

comment:21 Changed 5 months ago by arma

nickm's prop275_more_031 looks fine to me, other than the mark_my_descriptor_dirty_if_too_old() issue.

comment:22 in reply to:  16 Changed 5 months ago by arma

Replying to arma:

So if we do the next step, which would be setting the timestamp to 2037 or whatever, we will disable this hack. Are we still using the hack much?

It turns out we have a way to check -- when relays publish a new descriptor, they can include a header:

    case DIR_PURPOSE_UPLOAD_DIR: {
      const char *why = router_get_descriptor_gen_reason();
[...]
      if (why) {
        smartlist_add_asprintf(headers, "X-Desc-Gen-Reason: %s\r\n", why);
      }

So I instrumented moria1 like so:

diff --git a/src/or/directory.c b/src/or/directory.c
index 67c28e1..4eb461f 100644
--- a/src/or/directory.c
+++ b/src/or/directory.c
@@ -4462,6 +4462,12 @@ directory_handle_command_post,(dir_connection_t *conn, co
     const char *msg = "[None]";
     uint8_t purpose = authdir_mode_bridge(options) ?
                       ROUTER_PURPOSE_BRIDGE : ROUTER_PURPOSE_GENERAL;
+    {
+      char *genreason = http_get_header(headers, "X-Desc-Gen-Reason: ");
+      log_info(LD_DIRSERV,
+               "New descriptor post, because: %s",
+               genreason ? genreason : "not specified");
+    }
     was_router_added_t r = dirserv_add_multiple_descriptors(body, purpose,
                                              conn->base_.address, &msg);
     tor_assert(msg);

Over the course of about 15 minutes, I see:

$ grep "New descriptor post, because:" moria1-info|cut -d: -f5-|sort|uniq -c
     24  bandwidth has changed
      5  config change
     10  DirPort found reachable
     88  not listed in consensus
   9017  not specified
     20  ORPort found reachable
      9  rotated onion key
     79  time for new descriptor
     15  version listed in consensus is quite old

So, initial conclusion is "yes, it's still used".

comment:23 Changed 5 months ago by nickm

That's 15/>9000 uploads, though? It seems like it might not matter a great deal then. I think we can also merge these patches now, and investigate how to fix the "version listed in consensus is quite old" bug before we actually change the published logic. Agreed?

comment:24 in reply to:  23 Changed 5 months ago by arma

Replying to nickm:

That's 15/>9000 uploads, though? It seems like it might not matter a great deal then. I think we can also merge these patches now, and investigate how to fix the "version listed in consensus is quite old" bug before we actually change the published logic. Agreed?

Alas, disagree. I think those "not specified" uploads represent a bug where some relays upload when they don't need to (i.e. they didn't mark their descriptor dirty in any way before uploading), and where nearly all of those uploads are discarded because they have only cosmetic changes.

Here's the stat after 12.5 hours:

$ grep "New descriptor post, because:" moria1-info|cut -d: -f5-|sort|uniq -c
   1324  bandwidth has changed
    279  config change
      3  configured managed proxies
    436  DirPort found reachable
      2  dns resolvers back
      2  IP address changed
   3434  not listed in consensus
 421560  not specified
   1120  ORPort found reachable
    674  rotated onion key
   4003  time for new descriptor
    868  version listed in consensus is quite old

And out of the "is quite old", we have

$ grep -A1 "is quite old" moria1-info|grep "Added descriptor"|cut -d\' -f2-|sort|uniq -c|wc -l
426

So 426 of our ~7300 relays stayed in the consensus in the last 12.5 hours because of this hack.

comment:25 Changed 5 months ago by nickm

Okay, I'm convinced that we shouldn't merge prop275_more_031 now, until we have a fix for the "version listed in consensus is quite old" bug. I think that means we can return to this in 0.3.2. But I still think we can safely merge the prop275_minimal_029 part. Y/N?

comment:26 in reply to:  25 Changed 5 months ago by arma

Replying to nickm:

But I still think we can safely merge the prop275_minimal_029 part. Y/N?

Yes!

comment:27 Changed 5 months ago by arma

In fact, I was pondering either a world where we merge something like my hack1 branch, and then relays wouldn't need to look at the microdesc consensus published_on values anymore.

Or a world where we design some new hack where we put all the timestamps to the same future value, *except* for the ones that are sufficiently old to trigger the "is quite old" hack.

I'm going to try to track down the "woah that's a lot of descriptor publish attempts" bug more in the meantime.

comment:28 Changed 5 months ago by nickm

Merged prop275_minimal_029 into maint-0.2.9

comment:29 Changed 5 months ago by nickm

Milestone: Tor: 0.3.1.x-finalTor: 0.3.2.x-final
Status: merge_readyassigned

Deferring the rest.

comment:30 Changed 4 months ago by nickm

Keywords: review-group-17 removed

comment:31 Changed 4 months ago by nickm

Keywords: TorCoreTeam201703 removed

comment:32 Changed 3 weeks ago by nickm

Cc: prop275 dir-bandwidth needs-analysis added
Milestone: Tor: 0.3.2.x-finalTor: 0.3.4.x-final
Note: See TracTickets for help on using tickets.