Opened 6 months ago

Last modified 4 months ago

#33880 assigned defect

Confusing "Your relay has a very large number of connections to other relays" relay message

Reported by: arma
Owned by: nickm
Priority: Medium
Milestone: Tor: unspecified
Component: Core Tor/Tor
Version: Tor: 0.3.1.1-alpha
Severity: Normal
Keywords: prop311, ipv6, 035-backport, 042-backport, 043-backport, 044-deferred
Cc: rany@…, nickm
Actual Points:
Parent ID: #33048
Points: 0.5
Reviewer:
Sponsor: Sponsor55-can

Description

A new relay operator reports this complaint in their logs, showing up hourly:

Your relay has a very large number of connections to other relays. Is your
outbound address the same as your relay address? Found 13 connections to 8
relays. Found 13 current canonical connections, in 0 of which we were a
non-canonical peer. 5 relays had more than 1 connection, 0 had more than 2, and
0 had more than 4 connections.

I checked, and their outbound address was the same as their relay address.

Upon investigation, it looks like the redundant connections are to directory authorities.

My theory is that the change from #17592 (which went into 0.3.1.1-alpha, commit d5a151a0) is responsible. Before that change, canonical relay-to-relay connections expired when either side first reached its randomized "15 to 22.5 minute" timeout; now they expire when either side reaches its "45 to 75 minute" timeout. Since directory authorities test reachability every 1280 seconds (around 21.3 minutes), we should expect most relays to have duplicate canonical connections with directory authorities.
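A rough way to quantify this theory is to ask how likely a connection is to stay open for one full 1280-second reachability interval, in which case it is certainly still open when the authority's next test arrives. The sketch below assumes that each side draws its timeout uniformly and independently from its range, that the connection closes at the earlier of the two timeouts, and that nothing else closes it first. These are simplifying assumptions, and `prob_outlives()` is a hypothetical helper, not Tor's timer code.

```c
#include <assert.h>

/* Directory authorities' reachability-test interval from the description:
 * 1280 seconds, about 21.3 minutes. */
#define DIRAUTH_REACHABILITY_INTERVAL 1280.0

/* Probability that a connection stays open for at least t seconds,
 * ASSUMING each side independently draws its timeout uniformly from
 * [lo, hi] seconds and the connection closes at the earlier timeout:
 * P(min(X, Y) > t) = P(X > t) * P(Y > t). */
static double
prob_outlives(double lo, double hi, double t)
{
  assert(hi > lo);
  double p_one_side;
  if (t <= lo)
    p_one_side = 1.0;                  /* t is at or below the whole range */
  else if (t >= hi)
    p_one_side = 0.0;                  /* no draw can exceed t */
  else
    p_one_side = (hi - t) / (hi - lo); /* fraction of the range above t */
  return p_one_side * p_one_side;
}
```

Under these assumptions, the old 15 to 22.5 minute range (900 to 1350 seconds) gives a probability of about 0.024, while the 45 to 75 minute range gives 1.0: even the shortest new timeout (2700 seconds) is longer than two reachability intervals.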

Possible fixes:

(A) Change the notice-level log to make it clearer that it's not scary, or at least that it's not actionable. Maybe that means making it info-level so nobody will see it. This is probably not the best option, assuming there *are* cases where we do want relay operators to hear it.

(B) In channel_check_for_duplicates(), change MIN_RELAY_CONNECTIONS_TO_WARN from 5 to a number high enough that, even if we have 2 canonical conns per authority plus a few more, the log message still doesn't trigger.

(C) In channel_check_for_duplicates(), skip over connections to directory authorities in some way, since we know they will be special.

(D) Make connections to or from directory authorities expire quicker, on the theory that they don't really need the same level of padding protection as other connections.

(E) Your idea here?
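Option (C) could look roughly like the sketch below: authority peers are skipped before computing the totals that feed the warning. The per-relay summary type `relay_conn_summary_t` and the function `count_non_dirauth()` are hypothetical illustrations, not Tor's data structures or API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-relay summary; not Tor's real data structure. */
typedef struct {
  bool is_dirauth;      /* peer is a directory authority */
  int open_connections; /* open connections to this peer */
} relay_conn_summary_t;

/* Option (C): total up relays and connections for the duplicate check,
 * skipping directory authorities entirely, since they are expected to
 * have extra connections. */
static void
count_non_dirauth(const relay_conn_summary_t *relays, size_t n,
                  int *total_relays, int *total_conns)
{
  assert(relays != NULL || n == 0);
  *total_relays = 0;
  *total_conns = 0;
  for (size_t i = 0; i < n; i++) {
    if (relays[i].is_dirauth)
      continue;                 /* ignore authority peers */
    (*total_relays)++;
    *total_conns += relays[i].open_connections;
  }
}
```

As an assumed breakdown consistent with the reported log (13 connections to 8 relays, 5 of them with more than 1 connection), take 5 authorities with 2 connections each and 3 ordinary relays with 1 each. After skipping the authorities, only 3 connections to 3 relays remain, which is nowhere near a duplicate-connection problem.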

I'd be fine with any of B, C, or D. My favorite is whichever one can be done with an easy, short, and non-invasive patch. Maybe that's "B, raise it to 30"? That would make the message trigger only when we have connections to more than 30 relays and also have more than 45 connections open. Or we could pick the more conservative "raise it to 40", on the theory that small numbers are more likely to hit edge cases and are less indicative of major network problems anyway.
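The threshold arithmetic for option (B) can be checked with a small predicate. It assumes the warning fires when there are more than `min_relays` relays and, on average, at least 1.5 connections per relay. That reading is inferred from the "30 relays, 45 connections" figures above, and `should_warn_excess_connections()` is a hypothetical name, not code taken from Tor.

```c
#include <assert.h>
#include <stdbool.h>

/* ASSUMED trigger condition: more than min_relays relays, and an average
 * of at least 1.5 connections per relay. */
static bool
should_warn_excess_connections(int total_relays, int total_conns,
                               int min_relays)
{
  assert(total_relays >= 0 && total_conns >= 0);
  /* "conns >= 1.5 * relays" written in integer arithmetic. */
  return total_relays > min_relays && 2 * total_conns >= 3 * total_relays;
}
```

With the reported 13 connections to 8 relays, a threshold of 5 triggers the message (8 is more than 5, and 13 is at least 12), while thresholds of 30 or 40 do not. With a threshold of 30, 31 relays would need at least 47 connections to trigger it.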

And while we're at it, it might be smart to say in the log message what action we want the relay operator to take, e.g. "Please report:".

Child Tickets

Ticket | Status | Owner | Summary | Component
#24841 | closed | nickm | Your relay has a very large number of connections to other relays. Is your outbound address the same as your relay address? | Core Tor/Tor

Change History (13)

comment:1 Changed 6 months ago by rany

Cc: rany@… added

comment:2 Changed 6 months ago by arma

See #24841 for what I believe is the same bug.

comment:3 Changed 6 months ago by arma

rany: did your relay have an open ipv6 orport at the time?

comment:4 Changed 6 months ago by rany

Yes, I had an open IPv6 ORPort. If it helps, I was previously using HE.NET TunnelBroker but later switched to native IPv6.

comment:5 Changed 6 months ago by arma

Intriguing!

That supports teor's theory that these are actually parallel IPv4 and IPv6 connections from the directory authorities, not two overlapping IPv4 connections.

Especially since your number ranged from 4 to 5, and the numbers in #24841 and #26199 ranged from 3 to 6. Not all of the dir auths do IPv6 testing, so if anybody had a number of 8 or 9, it would be easier to rule out the IPv6 theory.

comment:6 Changed 6 months ago by rany

I would like to add that the message is now gone, so I don't think I can provide any more logs.

It is still very intriguing. While I think this is a minor issue, it should be resolved because it can be confusing for a new relay operator like myself.

I was actually confused and thought I did something wrong.

comment:7 Changed 6 months ago by rany

A good solution, I think, might be to only show this message after a set number of days have passed. That way the Tor relay would have more total connections and the message wouldn't be triggered.

Another solution might be to not count directory authorities in this check.

comment:8 Changed 6 months ago by teor

Parent ID: #24841

comment:9 in reply to: 7; Changed 6 months ago by teor

Sponsor: Sponsor55-can

Replying to rany:

Another solution might be to not count directory authorities in this check.

I don't think it matters exactly how we do that. I have a slight preference for doing all these things:
(A1) Log at info level until we reach MIN_RELAY_CONNECTIONS_TO_WARN.
(A2) At warn level, ask operators to report the message; at info level, tell them they can ignore it.
(B1) Change MIN_RELAY_CONNECTIONS_TO_WARN to be at least 2*9 (2 connections to each of the 9 directory authorities), let's say 50.
(B2) Also warn if any relay has more than 4 connections (that is, more than 2 sides multiplied by 2 IP addresses, if there is disagreement over canonical connections).
(C) Ignore 2 connections to directory authorities, and 1 connection to other relays.
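One way to combine (A1), (B1), (B2), and (C) is sketched below. Everything here is an illustrative assumption: the names are hypothetical, 50 is the value suggested in (B1), and "too many" after applying the (C) allowances is taken to mean the existing 1.5 connections-per-relay average, which works out to at least one excess connection for every two relays.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-relay summary; not Tor's real data structure.
 * Each entry is assumed to have at least one open connection. */
typedef struct {
  bool is_dirauth;      /* peer is a directory authority */
  int open_connections; /* open connections to this peer */
} relay_conn_summary_t;

typedef enum {
  DUP_LOG_NONE,  /* nothing unexpected */
  DUP_LOG_INFO,  /* (A1): below the threshold, operators can ignore it */
  DUP_LOG_WARN   /* (A2): ask operators to report */
} dup_log_level_t;

#define MIN_RELAY_CONNECTIONS_TO_WARN 50  /* (B1) */

static dup_log_level_t
classify_duplicate_connections(const relay_conn_summary_t *relays, size_t n)
{
  assert(relays != NULL || n == 0);
  int total_relays = 0;
  int excess = 0;
  bool any_gt_four = false;

  for (size_t i = 0; i < n; i++) {
    /* (C): allow 2 connections to an authority, 1 to any other relay. */
    int allowance = relays[i].is_dirauth ? 2 : 1;
    int conns = relays[i].open_connections;
    total_relays++;
    if (conns > allowance)
      excess += conns - allowance;
    if (conns > 4)
      any_gt_four = true;  /* (B2) */
  }

  if (any_gt_four)
    return DUP_LOG_WARN;
  /* Keep the 1.5-per-relay average: (relays + excess) >= 1.5 * relays. */
  if (excess == 0 || 2 * excess < total_relays)
    return DUP_LOG_NONE;
  /* (A1): stay at info level until we reach the threshold. */
  return total_relays >= MIN_RELAY_CONNECTIONS_TO_WARN ? DUP_LOG_WARN
                                                       : DUP_LOG_INFO;
}
```

With an assumed breakdown of the reported log (5 authorities with 2 connections each, 3 other relays with 1 each), this logs nothing, while 5 connections to any single peer would still warn under (B2).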

I think we should backport A1 and A2. We should also backport at least one of B1 or C.

(Note that in #33048, relays start making IPv4 and IPv6 connections to other relays. So we might want to always ignore 2 connections to each relay. We may also need to backport that change.)

comment:10 Changed 6 months ago by teor

Cc: nickm added
Keywords: 035-backport 042-backport 043-backport added
Milestone: Tor: 0.4.4.x-final
Owner: set to nickm
Parent ID: changed from #24841 to #33048
Points: 0.5
Status: changed from new to assigned
Version: Tor: 0.3.1.1-alpha

Copying over details from #24841.

comment:11 Changed 6 months ago by teor

Keywords: prop311 ipv6 added

comment:12 in reply to: 9; Changed 6 months ago by teor

Replying to teor:

(Note that in #33048, relays start making IPv4 and IPv6 connections to other relays. So we might want to always ignore 2 connections to each relay. We may also need to backport that change.)

I opened #33905 for this change; it shouldn't need to be backported.

comment:13 Changed 4 months ago by nickm

Keywords: 044-deferred added
Milestone: changed from Tor: 0.4.4.x-final to Tor: unspecified

Bulk-remove tickets from 0.4.4. Add the 044-deferred label to them.
