Opened 10 years ago

Closed 7 years ago

#1035 closed defect (fixed)

Relay on dynamic IP marked as stable and guard

Reported by: rudis Owned by: Sebastian
Priority: Medium Milestone: Tor: 0.2.4.x-final
Component: Core Tor/Tor Version: 0.2.0.35
Severity: Keywords: tor-auth
Cc: rudis, karsten, Sebastian Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description (last modified by nickm)

Hi,

I'm running a Tor relay on a dynamic IP which changes every day. At the moment
my node is marked as stable and guard which AFAIK shouldn't happen as it isn't
stable. I got the information from https://trunk.torstatus.kgprog.com:444/.

This may be related to bug #969 which was marked as fixed.

Thanks,
Simon

[Automatically added by flyspray2trac: Operating System: All]

Child Tickets

Attachments (2)

dynamic-stables-guards-fractions-2010-08-26.png (56.7 KB) - added by karsten 9 years ago.
Fraction of relays on static/dynamic IP addresses obtaining the Guard/Stable flag
dynamic-stables-guards-fractions-2010-08-31.png (56.2 KB) - added by karsten 9 years ago.
Fraction of relays on static/dynamic IP addresses obtaining the Guard/Stable flag where dynamic IP address means 4 or more different IP addresses each assigned for less than 1 week

Download all attachments as: .zip

Change History (47)

comment:1 Changed 9 years ago by nickm

Milestone: Tor: 0.2.2.x-final

comment:2 Changed 9 years ago by nickm

Let's have a look through the networkstatus db to see how often routers with changed IPs are marked as Stable/Guard to see if this is still happening.

comment:3 Changed 9 years ago by nickm

Description: modified (diff)
Owner: set to karsten
Status: newassigned

Assigning this one to Karsten for investigation, since he seems to be Keeper of Networkstatus Deltas these days. Karsten, please un-assign it if it isn't up your alley.

comment:4 Changed 9 years ago by karsten

Attached is a graph on the fraction of relays on dynamic and static IP addresses obtaining the Guard and/or Stable flag. I'm considering a relay to be on a dynamic IP address if it's listed with 4 or more different IP addresses in the past 12 months.

So, only very few relays on dynamic IP addresses obtain the Guard flag, but up to 1/4 of them get the Stable flag. What did we expect these two fractions to be? Something close to zero?

I can look more at the details if someone tells me what to look for. Maybe there's a better way to distinguish static and dynamic IP addresses?

And yes, investigating these things is easy for me. If there are other bugs that need similar analysis, feel free to assign them to me.

Changed 9 years ago by karsten

Fraction of relays on static/dynamic IP addresses obtaining the Guard/Stable flag

comment:5 Changed 9 years ago by nickm

Having 4 IP addresses in 12 months isn't a big deal as far as stability is concerned. Having 20 or more -- or having many IPs that were held for less than a week -- would probably pose more of a problem for stability. Are there many Stable nodes like that?

Changed 9 years ago by karsten

Fraction of relays on static/dynamic IP addresses obtaining the Guard/Stable flag where dynamic IP address means 4 or more different IP addresses each assigned for less than 1 week

comment:6 Changed 9 years ago by karsten

I changed the criterion for calling an IP address dynamic by requiring 4 or more different IP addresses each assigned for less than 1 week. This reduces the number of relays classified as being on a dynamic IP address from 10273 to 8001 in the same time period. The graph changes a bit, but the number of relays on dynamic IP addresses having the Stable or Guard flag is still not close to zero. Is this expected?

comment:7 Changed 9 years ago by nickm

According to Roger, no.

comment:8 Changed 9 years ago by nickm

06:27 < armadev> i think you shouldn't get the stable flag if you're never up

more than 24 hours

06:28 < armadev> guard would be ok with me

comment:9 Changed 9 years ago by arma

Nick suggests there's a bug where we don't reset wfu/mtbf when the relay's IP address changes. That sounds quite plausible: if it's recently reachable every time we check whether we found it reachable, we'll never mark it not-running, so we'll never register a state change.

comment:10 Changed 9 years ago by arma

Triage: this bug should not block 0.2.2.x-rc. When we finally get around to fixing the wfu/mtbf bugs (which may well involve a proposal and a new design), we can fix them on the authority side, so there's no need to hold up the new stable.

comment:11 Changed 9 years ago by karsten

Owner: changed from karsten to nickm

Reassigning to nickm for actual bug fixing. Feel free to reassign to me if
there's more archive parsing required.

comment:12 Changed 9 years ago by nickm

So to clarify, the desired behavior is that when your IP changes, it should be as if you just shut down your router and came back as a new router? That seems suboptimal.

Perhaps instead if your IP changes, we should treat you as having been down since the last time you were observed to be up. This will count as downtime for the purpose of resetting the current MTBF run, and also take a little chunk out of WFU. But maybe it doesn't limit WFU enough. Authorities learn about IP changes more frequently than clients, after all, so an IP change that is only 15 minutes long from the authorities' perspective will be a couple hours from the clients' perspective.

So maybe we should treat an IP change as if it were equal to automatic downtime of the same length as the average propagation time of a server descriptor to a client?

comment:13 Changed 9 years ago by Sebastian

It does appear to me that average propagation time is the right thing here. But what is the average propagation time? This is likely to be two different things when microdescs and normal descriptors are served in parallel, and we don't want to punish the IP-changing relay as much.

comment:14 Changed 9 years ago by nickm

Status: assignedneeds_review

Patch in branch bug1035 in my public. Please review?

comment:15 Changed 9 years ago by Sebastian

Looks good to me with a trivial compile without warnings fix in my branch bug1035. My question about what to do with the penalty calculation when microdescriptors are involved still stands, tho, but I can agree to defer that until we have fully switched to microdescs

comment:16 Changed 9 years ago by nickm

Merged your changes onto my branch. I'd still like another set of eyes here; this code is a bit tricky.

comment:17 Changed 9 years ago by rransom

We should penalize relays that change their ports, too.

comment:18 Changed 9 years ago by Sebastian

Owner: changed from nickm to Sebastian
Status: needs_reviewassigned

working on a patch for that

comment:19 in reply to:  14 Changed 9 years ago by rransom

Replying to nickm:

Patch in branch bug1035 in my public. Please review?

In commit bea0a31c1c859:

  • 78s/sucessfully/successfully/
  • /^#define MIN_DOWNTIME_FOR_ADDR_CHANGE/d

Other than that, looks good through commit 59a3d536d80d4 (except for the port-change issue Sebastian is fixing).

comment:20 Changed 9 years ago by Sebastian

Status: assignedneeds_review

I pushed a new branch (bug1035) to fix all the aforementioned issues. Also added a commit that adds a question, please drop it before merging if I am missing something

comment:21 Changed 9 years ago by Sebastian

pushed a forced update with requested changes from nick. One thing to consider is if we want to avoid logging bridge identities as bridge authority

comment:22 Changed 9 years ago by nickm

arma doesn't like the idea of not logging here, since "we log things to help users, and some of those logs help users."

My rationale is that users tend to assume that if SafeLogging is set, they can post their logs without negative repercussions, and posting a big pile of bridge addresses does have negative repurcussions.

comment:23 Changed 9 years ago by arma

sebastian's 9b64227ffd38e would appear to have added an assert possibility,
where if the function gets called with at_port != 0, but at_addr 0, then

+  } else if (addr_changed || port_changed) {

will be true, but then

    tor_assert(at_addr);

I don't think we call it that way, currently, but still worth fixing.

comment:24 Changed 9 years ago by arma

As for Sebastian's idea of omitting the log line entirely for the bridge authority, check out #1117 -- we'll want log lines like this when trying to sort out that bug.

comment:25 in reply to:  24 ; Changed 9 years ago by Sebastian

Replying to arma:

As for Sebastian's idea of omitting the log line entirely for the bridge authority, check out #1117 -- we'll want log lines like this when trying to sort out that bug.

I copied this approach from dirserv_orconn_tls_done() where we also don't log some things when we are a bridge authority. Maybe that will need fixing too when this is resolved.

comment:26 in reply to:  23 ; Changed 9 years ago by arma

Replying to arma:

 +  } else if (addr_changed || port_changed) {

This patch could be as easy as changing to

} else if (addr_changed && port_changed) {

comment:27 Changed 9 years ago by arma

Ok. I like the overall idea here. Merge whenever other people become happy.

I think the "omit the log message when you're a bridge authority" is pointless and could be mildly harmful down the road, but I don't feel strongly enough to fight it if everybody else thinks it's critical to keep the info-level logs that Lucky doesn't keep by default anyway from having sensitive info.

comment:28 in reply to:  26 ; Changed 9 years ago by rransom

Replying to arma:

Replying to arma:

 +  } else if (addr_changed || port_changed) {

This patch could be as easy as changing to

} else if (addr_changed && port_changed) {

No.

comment:29 in reply to:  27 Changed 9 years ago by rransom

Replying to arma:

Ok. I like the overall idea here. Merge whenever other people become happy.

I think the "omit the log message when you're a bridge authority" is pointless and could be mildly harmful down the road, but I don't feel strongly enough to fight it if everybody else thinks it's critical to keep the info-level logs that Lucky doesn't keep by default anyway from having sensitive info.

The bridge authority must be operated by someone who has a clue. I see absolutely no reason to keep sensitive information out of the bridge authority's info-level logs.

comment:30 in reply to:  28 Changed 9 years ago by arma

Replying to rransom:

No.

Fair point.

Another option is

  if (at_port)
    tor_assert(at_addr);

somewhere toward the top.

I guess technically this isn't much different from the current behavior, except we fail earlier and more obviously.

comment:31 in reply to:  25 Changed 9 years ago by arma

Replying to Sebastian:

Replying to arma:

As for Sebastian's idea of omitting the log line entirely for the bridge authority, check out #1117 -- we'll want log lines like this when trying to sort out that bug.

I copied this approach from dirserv_orconn_tls_done() where we also don't log some things when we are a bridge authority. Maybe that will need fixing too when this is resolved.

Ah. Go back and read that function again -- it does the opposite of what you think.

      if (!bridge_auth || ri->purpose == ROUTER_PURPOSE_BRIDGE) {

means "if you're a bridge authority, only do this for descriptors with purpose bridge." Basically that means this function, if you're a bridge authority, logs and remembers reachability info only about bridges.

comment:32 Changed 8 years ago by Sebastian

Staring at this more, I'm convinced that all the ways how we could possibly end up calling this with an orport and not an address is a serious bug, even with nickm's safety check before calling rep_hist_note_router_reachable() from dirserv_orconn_tls_done(). I think asserting that either both are set or unset is the right thing to do here.

Updated my branch with a commit to change that, also taking out the conditional logging. Please review again?

comment:33 Changed 8 years ago by nickm

I am liking this now. I think it is good to merge. I'll give others a day or two to look at it also.

One subtle point is that now we must only call rep_hist_note_router_reachable() when the address of the router matches the one in the routerinfo we have. (We do this.) If we were to call it for addresses other than the listed address, then an attacker could do an IP-forwarding MITM to make a node appear to be at two addresses in rapid succession over and over, thus hammering the node's apparent uptime. Maybe we should comment that.

comment:34 Changed 8 years ago by nickm

Resolution: Nonefixed
Status: needs_reviewclosed

On second review, I still like it. Merging.

comment:35 Changed 8 years ago by BarkerJr

I think this bug needs to be reopened and revisited. The stability of a router is not impacted by an IP or port change. The connections to the old IP/port remain active. The point of the Stable flag is not if a router is reachable, but instead if long-running transactions get broken.

comment:36 Changed 8 years ago by arma

Resolution: fixed
Status: closedreopened

comment:37 Changed 8 years ago by arma

Reopened so we answer barkerjr's question rather than forgetting about it.

comment:38 in reply to:  35 Changed 8 years ago by rransom

Replying to BarkerJr:

I think this bug needs to be reopened and revisited. The stability of a router is not impacted by an IP or port change. The connections to the old IP/port remain active. The point of the Stable flag is not if a router is reachable, but instead if long-running transactions get broken.

A change in IP address and/or port clearly affects the WFU (weighted fractional uptime) that clients will experience, so the authorities should consider it in their calculation of WFU (and in their decision as to whether to assign the Guard flag).

We should assign the Stable flag based on MTBF (mean time between failures), but we cannot easily measure that. If you know how to improve our measurement of MTBF, patches welcome. (A specification of a good algorithm would be helpful, too.)

However, most events that will cause a relay to change its IP address do cause all of its connections to break, so IP address changes will still need to be counted against a relay's MTBF.

comment:39 Changed 8 years ago by Sebastian

Component: Tor RelayTor Directory Authority
Milestone: Tor: 0.2.2.x-finalTor: unspecified

We could do a reachability test to the old ip/port, and only if that fails count the change against the relay.

comment:40 Changed 8 years ago by arma

Oh hey, this bug is still open. I have a bug with what we committed a few months back (9b64227ffd38e9):

Jul 07 14:08:27.000 [info] rep_hist_note_router_reachable(): Router BFA1BA13CAC96F8C13689FE1FDC65359ED6AC6D7 still seems Running, but its address appears to have changed since the last time it was reachable.  I'm going to treat it as having been down for 7200 seconds

When moria1 restarts, it says this to every relay that it finds reachable. Presumably that's because

  addr_changed = at_addr &&
    tor_addr_compare(at_addr, &hist->last_reached_addr, CMP_EXACT) != 0;
  port_changed = at_port && at_port != hist->last_reached_port;

and last_reached_foo starts out empty.

Perhaps we want to only say addr or port changed if they changed from something non-empty?

comment:41 Changed 8 years ago by Sebastian

Milestone: Tor: unspecifiedTor: 0.2.3.x-final
Priority: minornormal
Status: reopenedassigned

Sounds like we should definitely fix this

comment:42 Changed 7 years ago by nickm

Milestone: Tor: 0.2.3.x-finalTor: 0.2.4.x-final

comment:43 Changed 7 years ago by nickm

Keywords: tor-auth added

comment:44 Changed 7 years ago by nickm

Component: Tor Directory AuthorityTor

comment:45 in reply to:  41 Changed 7 years ago by arma

Resolution: fixed
Status: assignedclosed

Replying to Sebastian:

Sounds like we should definitely fix this

Agree.

I filed it as #8218, and I'm going to close this ticket since #1035 was an earlier, and finished, issue.

Note: See TracTickets for help on using tickets.