Opened 4 years ago

Closed 4 years ago

#13667 closed defect (fixed)

Prevent port scanning of hidden services

Reported by: arma Owned by:
Priority: High Milestone: Tor: 0.2.8.x-final
Component: Core Tor/Tor Version:
Severity: Keywords: tor-hs, 025-backport
Cc: tom Actual Points:
Parent ID: Points:
Reviewer: Sponsor: SponsorR

Description

If you connect to a hidden service that's listening on virtual port 5222, and send it a begin cell for port 80, it will send you back an end cell but leave the circuit up.

I actually thought the design was more defensive: that if you ever asked for a virtual port that wasn't assigned, then it would close the circuit on you, to prevent scanning to find out what ports *are* open.

But it turns out I never built it that way. We should fix it.

With thanks to Ivan Pustogarov for noticing.

Child Tickets

Change History (33)

comment:1 Changed 4 years ago by arma

The code to look at is in connection_exit_begin_conn() where it does

    if (rend_service_set_connection_addr_port(n_stream, origin_circ) < 0) {
      log_info(LD_REND,"Didn't find rendezvous service (port %d)",
               n_stream->base_.port);
      relay_send_end_cell_from_edge(rh.stream_id, circ,
                                    END_STREAM_REASON_EXITPOLICY,
                                    origin_circ->cpath->prev);
      connection_free(TO_CONN(n_stream));
      tor_free(address);
      return 0;
    }

and instead it should return something negative (and maybe clean up differently, I haven't looked).

comment:2 Changed 4 years ago by dgoulet

Started poking around a bit with this one. Here is what I think is possible.

1) Send back "END_STREAM_REASON_CONNECTREFUSED" which does *not* fully tell if the port is open or not. It could be that the service *might* exists but misconfigured or not started.

2) Drop the circuit with no reason which seems like a firewall with a DROP policy. However, I think it's pretty easy to link that to a "close port" for HS.

3) Send back a mysterious reason with "END_STREAM_REASON_MISC" but I think it is also too easy to associate that reason with a closed port for HS.

Usually firewall have two policies either DROP or REJECT (that sends a TCP reset) so at best I think we can make port scanning harder but not completely stop it. I would advocate for 1) here since it's actually a true and valid reason if the port is not supported by the HS and the other side (client) can't really confirm if it's a problem of the remote service or HS port.

comment:3 Changed 4 years ago by nickm

Keywords: 025-backport added
Milestone: Tor: 0.2.5.x-finalTor: 0.2.6.x-final

comment:4 Changed 4 years ago by nickm

Priority: normalmajor

comment:5 Changed 4 years ago by nickm

Changing this to END_STREAM_REASON_CONNECTREFUSED is fine. Also let's consider

4) DROP after sending some number of CONNECTREFUSED ?

comment:6 Changed 4 years ago by arma

'1' doesn't make me very satisfied. It means that if there is a port that's open, you can keep asking and you'll find it. That sounds like the same situation as now.

'2' indeed doesn't hide whether the port worked, but it sure slows down scanning. Can we argue that it slows down scanning enough to make it basically useless on a large scale? (A downside is that if somebody *does* decide to scan anyway, they'll sure be putting a lot of pain on the network.)

Does '4', for a low number, basically approximate one of the earlier options? E.g. we'd have to also include configured but actually down services, or you could just ask for the same one k times in a row and if it hangs up then you know it was the 'defense'.

Are there arguments against '2' other than 'it's not a complete solution'?

comment:7 Changed 4 years ago by yawning

Drawing from the PT bag of tricks...

5) Send back END_STREAM_REASON_DONE as if the application closed the connection, optionally after either a certain amount of time has passed or a certain amount of data has been received from the peer. Once this happens once, tag the circuit as "kind of sketch", and apply the same policy for all new streams regardless of if the destination port is actually valid or not.

It makes debugging harder, but slows down scanning as much as 2 (though a intelligent scanner would try common ports first), and it gives the bad guys more work to do post processing the results.

Not sure about the implications for "close the stream after a while" in this context, parts of it scare me but it may be something that's less scary if the parameters are set correctly.

comment:8 Changed 4 years ago by dgoulet

It seems clear here that we can't stop a network scan to finally succeed whatever behavior so our best course of actions is to slow down and made it as hard as we can for the attacker to scan.

Option '5' scares me a bit in terms of added overhead. Seems like adding delay to the bad circuit opens the door to some DoS for which an attacker could just open 65534 circuits to the HS with the wrong port and those circuits would stay open for an unknown amount of time?... If that would be acceptable for some reasons, adding a random delay before sending back the reason will also slow down quite a bit the scanner :).

I would go for sending back END_STREAM_REASON_DONE (application closed the connection) and DROP the circuit after.

comment:9 in reply to:  8 Changed 4 years ago by yawning

Replying to dgoulet:

It seems clear here that we can't stop a network scan to finally succeed whatever behavior so our best course of actions is to slow down and made it as hard as we can for the attacker to scan.

Indeed.

Option '5' scares me a bit in terms of added overhead. Seems like adding delay to the bad circuit opens the door to some DoS for which an attacker could just open 65534 circuits to the HS with the wrong port and those circuits would stay open for an unknown amount of time?... If that would be acceptable for some reasons, adding a random delay before sending back the reason will also slow down quite a bit the scanner :).

Yeah, I suggested doing that on the assumption (possibly wrong) that once established, keeping circuits around has minimal overhead. Establishing them is not cheap (because crypto), but keeping them around doesn't involve work right?

Right the idea is to try to get them to waste as much time as possible by adding randomized delay before the reason (and allowing the attacker to send a few cells of payload, once few is reached will also trigger sending REASON_DONE).

The 2nd part of 5 involves not dropping the circuit. Since we know they're up to no good, and have a way to waste their time and feed back garbage data into the scanner, we let the attacker "open" as many streams as they want using the known "eeevill" circuit, only to delay then close the stream down. (So say, someone starts scanning a HS that only is running httpd on a standard port, when their scanner gets to 80, it gets the "fake" discard+close behavior.).

We could add a maximum lifetime to evil portscanning circuits (based on number of stream open attempts and time), but all of this may be more effort than it's worth.

I would go for sending back END_STREAM_REASON_DONE (application closed the connection) and DROP the circuit after.

If doing the full blown "pretend we are there even accepting payload and injecting delay" is too expensive, this seems like the best lightweight alternative.

comment:10 Changed 4 years ago by tom

Cc: tom added

comment:11 Changed 4 years ago by dgoulet

After a discussion on IRC at the little-t tor meeting, here is the consensus. In a nutshell, reason done and kill the circuit.

< nickm> Ultimately, there is no solution to #13667.  If a client can try to connect 
         to a port, and if that client can differentiate success from failure, and   
         the scanner knows everything that the client does, then ultimately the   
         scanner can scan ports.
< dgoulet> yes exactly so our best course of action is to make it harder as we can I  
           guess
< nickm> so, if we do END_REASON_DONE and drop, they have to build more circuits and   
         do more introduction handshakes.
< dgoulet> "2)" has the possible drawback of the HS having a lot of opened circ.
< nickm> If we  do "insert random delays and finally drop at some point", they have  
         to open just as many circuits, maybe, and their programming job gets a  
         little harder, but they can do multiple queries in parallel, so ultimately  
         we're not slowing them down much
< dgoulet> nickm: yeah the parallel scanning makes that solution a bit useless
< nickm> I think that "drop and kill the circuit" is probably a reasonable thing to  
         do, in terms of trade-off between how much it helps and how hard it is.
< Yawning> ;_;
< Yawning> yeah
< dgoulet> yeah
< Yawning> if people are more paranoid, they could use authenticated HSes or  
           something
< nickm> Yeah. For a real answer, I'd think that better access control is the answer.

comment:12 Changed 4 years ago by dgoulet

Status: newneeds_review

Patch available in my tpo dgoulet repo on branch bug13667_025_v1. This also applies without any problem on master (0.2.6)

comment:13 Changed 4 years ago by arma

Patch looks like it'll work to me. Thanks!

1) Did you test it?

And since I'm helping you get up to speed as a developer, some minor points (which maybe you and nickm/andrea should talk through and agree about how to handle):

2)

+                                            Circuit is origin here thus
+       * will be set automatically to REASON_NONE anyway. (ref: #13667) */
+      circuit_mark_for_close(circ, END_CIRC_REASON_NONE);

But in circuit_mark_for_close_ I see

  if (reason == END_CIRC_AT_ORIGIN) {
    if (!CIRCUIT_IS_ORIGIN(circ)) {
      log_warn(LD_BUG, "Specified 'at-origin' non-reason for ending circuit, "
               "but circuit was not at origin. (called %s:%d, purpose=%d)",
               file, line, circ->purpose);
    }
    reason = END_CIRC_REASON_NONE;
  }

Does that mean saying END_CIRC_AT_ORIGIN is the safer plan here? How come we don't use ND_CIRC_AT_ORIGIN more often? If we opt not to use it here, should we just get rid of it?

3) Is "(ref: #13667)" the way we want to point people at the bugtracker? I know in the past nickm has wanted to make sure that the code stands independently of the bugtracker, but I think it does in this case.

4) "Note that this does not mitigate port scanning" -- I would say it actually does mitigate it. It just doesn't completely solve it.

5) If you happen to reformat the comments and they ended up a few columns shorter, I wouldn't object. :)

6) The first two rows of your changes file want to right-shift two spaces, and s/immetiately/immediately/

comment:14 in reply to:  13 Changed 4 years ago by dgoulet

Replying to arma:

Patch looks like it'll work to me. Thanks!

1) Did you test it?

Yes I did. Chutney network, confirmed with torsocks and logs. I should have mention it, good point.

And since I'm helping you get up to speed as a developer, some minor points (which maybe you and nickm/andrea should talk through and agree about how to handle):

2)

+                                            Circuit is origin here thus
+       * will be set automatically to REASON_NONE anyway. (ref: #13667) */
+      circuit_mark_for_close(circ, END_CIRC_REASON_NONE);

But in circuit_mark_for_close_ I see

  if (reason == END_CIRC_AT_ORIGIN) {
    if (!CIRCUIT_IS_ORIGIN(circ)) {
      log_warn(LD_BUG, "Specified 'at-origin' non-reason for ending circuit, "
               "but circuit was not at origin. (called %s:%d, purpose=%d)",
               file, line, circ->purpose);
    }
    reason = END_CIRC_REASON_NONE;
  }

Does that mean saying END_CIRC_AT_ORIGIN is the safer plan here? How come we don't use ND_CIRC_AT_ORIGIN more often? If we opt not to use it here, should we just get rid of it?

Not sure it's actually a safer plan. I don't know why we don't use it more often but providing the real reason on why we are closing the circuit seems a better option instead of having it changed for us. IMO, this is the caller responsability to provide the right reason, not the callee that set it to something else for some conditions.

3) Is "(ref: #13667)" the way we want to point people at the bugtracker? I know in the past nickm has wanted to make sure that the code stands independently of the bugtracker, but I think it does in this case.

Clarified by nickm on IRC, I removed it :).

4) "Note that this does not mitigate port scanning" -- I would say it actually does mitigate it. It just doesn't completely solve it.

Fixed.

5) If you happen to reformat the comments and they ended up a few columns shorter, I wouldn't object. :)

Just for you, I setup my vim for the entire tor code base to use 76 char :D.

6) The first two rows of your changes file want to right-shift two spaces, and s/immetiately/immediately/

Fixed by 5).

Updated patch in bug13667_025_v2

Thanks!

comment:15 Changed 4 years ago by dgoulet

Hrm on second thought, connection_exit_begin_conn() specifies that it should return a negative reason if we want the caller to close the circuit. However, fun fact, END_CIRC_REASON_NONE == 0 and closing the circuit right in that function is actually against the "rule" ... Thus returning "END_CIRC_AT_ORIGIN" could be more appropriate since it's value is -1.

IMO, this function should close the circuit itself on error or, for some reason, if we don't want that, set explicitely an err. reason in a variable and return -1 if we want the circuit to be closed. The NONE reason is 0 and also a valid reason to actually close a circuit. I don't see any palces where we send back a NONE and want to close the circuit but it's error prone in the long run...

Thoughts?

comment:16 Changed 4 years ago by nickm

We could add a higher-value return code that gets sent as NONE? Like, if we return -260 (value chosen at random), that could mean "close and send END_CIRC_REASON_NONE." I agree this probably needs api rethinking...

comment:17 Changed 4 years ago by dgoulet

That means you have two values in the ABI for the same return code but behaves differently which will probably end up confusing some people when adding code or reviewing it I think. And it gets crazier when at some point in time you end up to -260 for a legitimate reasons (or other code). I feel that this could simply be an other ticket for a discussion on that and at some point a fix.

However, the fix for this current issue for now, I feel should be closing the circuit right away with NONE (current patch in bug13667_025_v2) or returning END_CIRC_AT_ORIGIN for a negative value and let the caller close the circuit.

comment:18 Changed 4 years ago by dgoulet

Ok for this to move forward, I went with the "return negative circuit reason" so it can be close by the caller which is what is documented for connection_exit_begin_conn()

Fix in bug13667_025_v3

comment:19 Changed 4 years ago by nickm

Under this patch, won't it be the case that an attacker can still easily tell exit-policy failures from a missing application, since exit policy failures will make us close the circuit, but missing applications won't?

If that's the case, I don't know what the reason is for sending END_STREAM_REASON_DONE, if we're going to close too.

comment:20 Changed 4 years ago by dgoulet

After an IRC dicussion, the current solution is a bit weak because now a REASON_DONE + Killing circuit basically means an EXIT_POLICY reason which apart from adding some delay to the port scanning (attacker needs to open lots of circuit), it doesn't do much.

nickm came up with an interesting idea.

< nickm> yeah.  If we really want to be sneaky, the way to do it is...
< nickm> have a flag on the circuit that means "say DONE to everything".
< nickm> Set that flag when we would otherwise send EXITPOLICY
< nickm> Then refuse all connections on that circuit
< nickm> This forces them to open 65535 circuits  to do a portscan, *and* doesn't leak down vs exitpolicy

I tend to like that.

comment:21 Changed 4 years ago by nickm

Status: needs_reviewneeds_revision

ok; moving this to needs_revision

comment:22 Changed 4 years ago by dgoulet

Status: needs_revisionneeds_review

After discussion on IRC, 3 things came up that we want:

  1. Users are not terribly inconvenienced.
  2. Scanners cannot easily tell whether a port is firewalled or whether the service is down.
  3. Scanning requires a new circuit per port.

The branch below addresses 1) and 3) since 2) is quite incompatible with 1). The idea in comment:20 is not really working for any legitimate user since the circuit stays open and tor on the client side has no way of knowing that it should use a new circuit for a new port thus stalling until a timeout...

Solution is to go back to the DONE + kill circuit idea.

See bug13667_025_v4.

comment:23 Changed 4 years ago by nickm

Milestone: Tor: 0.2.6.x-finalTor: 0.2.5.x-final

Merged that; thanks! Marking for possible backport.

comment:24 Changed 4 years ago by cypherpunks

Resolution: not a bug
Status: needs_reviewclosed

But I don't want it to do this. I know what ports I have configured on my HS. I don't want to build a new circuit for every port a portscanner scans! Please, at least make this misfeature configurable.

comment:25 Changed 4 years ago by nickm

I agree with not backporting to 0.2.5. Adding a new ticket for a config option.

comment:26 Changed 4 years ago by dgoulet

Resolution: not a bug
Status: closedreopened

Ok so I'm here with syverson, robjansen and Roger discussing about this one and a possible "attack" came up. Here is the scenario:

Alice has a long lived connection to let say an IRC server on aaaa.onion port 6667. Now, Alice receives an email saying "Hey, can you connect to aaaa.onion:6668?" for which the HS does have that port in its exit policy. So Alice, connects, the tor client reuses the RP circuit but on a wrong port and blam the circuit is killed thus the long lasting connection.

A solution we thought of here is to pin a *good* virtual port on a circuit and not accepting connections on it with an other port.

Makes sense?

comment:27 in reply to:  26 Changed 4 years ago by arma

Replying to dgoulet:

Alice has a long lived connection to let say an IRC server on aaaa.onion port 6667. Now, Alice receives an email saying "Hey, can you connect to aaaa.onion:6668?" for which the HS does have that port in its exit policy. So Alice, connects, the tor client reuses the RP circuit but on a wrong port and blam the circuit is killed thus the long lasting connection.

I think you can do this automatically by having an img link in your attacking page, which when Alice visits it, it blows away her existing irc connection to that other service.

A solution we thought of here is to pin a *good* virtual port on a circuit and not accepting connections on it with an other port.

Right -- basically the proposed fix is to have one circuit per virtualport Alice is reaching.

(It might be that we should re-close this ticket and open a new one for the new issue.)

comment:28 Changed 4 years ago by nickm

Milestone: Tor: 0.2.5.x-finalTor: 0.2.6.x-final
Resolution: fixed
Status: reopenedclosed

Not getting backported; please open new tickets for new issues.

comment:29 Changed 4 years ago by asn

Resolution: fixed
Status: closedreopened

I will reopen this for a bit, to decide whether we want to keep this feature in or not.
It was shortly discussed in the dev meeting that we might want to strip off this feature, so that jerks who port scan anyway don't cause extra damage to the network.

This is on the SponsorR roadmap, so let's discuss this a bit over April.

comment:30 Changed 4 years ago by nickm

Milestone: Tor: 0.2.6.x-finalTor: 0.2.7.x-final

comment:31 Changed 4 years ago by nickm

Milestone: Tor: 0.2.7.x-finalTor: 0.2.8.x-final

comment:32 Changed 4 years ago by nickm

Keywords: SponsorR removed
Sponsor: SponsorR

Bulk-replace SponsorR keyword with SponsorR sponsor field in Tor component.

comment:33 in reply to:  29 Changed 4 years ago by asn

Resolution: fixed
Status: reopenedclosed

Replying to asn:

I will reopen this for a bit, to decide whether we want to keep this feature in or not.
It was shortly discussed in the dev meeting that we might want to strip off this feature, so that jerks who port scan anyway don't cause extra damage to the network.

This is on the SponsorR roadmap, so let's discuss this a bit over April.

People don't seem to have strong opinions about this, and I'm personally OK with the current behavior.

Feel free to reopen the ticket if you disagree.

Note: See TracTickets for help on using tickets.