Opened 7 months ago

Closed 5 months ago

#25400 closed defect (fixed)

CIRC_BW event miscounts, should count all circ data

Reported by: mikeperry Owned by: mikeperry
Priority: Medium Milestone: Tor: 0.3.4.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-stats, review-group-34, 034-roadmap-subtask, 034-triage-20180328, 034-included-20180328
Cc: karsten, atagar Actual Points:
Parent ID: #25546 Points:
Reviewer: dgoulet Sponsor:

Description

The CIRC_BW event as implemented seems to be totaling only explicit application data on the circuit. To have an accurate representation of circuit usage, it should total all bytes sent on the circuit, including padding, dropped cells, and partially full cells.

Furthermore, it currently counts bytes differently depending on if you have the STREAM event enabled (see control_event_stream_bandwidth()). This is definitely a bug.

Child Tickets

Change History (16)

comment:1 Changed 7 months ago by teor

Keywords: tor-stats added
Milestone: Tor: 0.3.4.x-final

How is this event used?
I'd like to know how important it is, so I can prioritise this ticket.

comment:2 Changed 7 months ago by mikeperry

Status: assignedneeds_review

Patch in mikeperry/bug25400. Spec patch in my torspec remote, also branch bug25400.

Right now this branch only counts relay cells, which is similar to the behavior before. I decided against counting CREATED cells, and VAR cells, because those are a bit weird since there is not a circuit ID on the outgoing cell, only upon the response. Let me know what you think.

For what it's worth, counting all cells (including padding/dropped/rejected/partially full ones) in circuit bandwidth totals allows a Tor controller to check for side channel attacks by doing accounting on STREAM_BW totals on a circuit and comparing that to the CIRC_BW totals. Large differences indicate side channel abuse (depending on the application protocol).

comment:3 Changed 7 months ago by mikeperry

Cc: atagar added

To Teor's question, Karsten implemented the event, possibly for metrics usage. I don't know if metrics relies upon the existing behavior in a critical way. atagar may also have more information on CIRC_BW usage by Nyx and other stem apps.

comment:4 Changed 7 months ago by mikeperry

Another additional wrinkle to discuss: I chose to use CELL_PAYLOAD_SIZE as the number of bytes to count per cell. That value is the total bytes in a cell, less cell header size. I also briefly debated subtracting the common relay header size, but since the true relay header size varies by relay cell type, I decided not to subtract the common relay header from the totals. The Tor controller can decide to subtract the common relay header out itself (as it represents a constant rate of overhead). The vanguards controller will do this subtraction.

comment:5 Changed 7 months ago by atagar

atagar may also have more information on CIRC_BW usage by Nyx and other stem apps

Stem and Nyx don't use CIRC_BW. Stem has parsing code for the events, but that's all.

comment:6 Changed 7 months ago by nickm

Keywords: review-group-34 added

comment:7 Changed 7 months ago by dgoulet

Reviewer: dgoulet

Reviewer week 03/16th

comment:8 Changed 6 months ago by asn

Parent ID: #25546

comment:9 in reply to:  4 ; Changed 6 months ago by dgoulet

Status: needs_reviewneeds_information

Good stuff Mike. I want to ask and clarify couple of things here.

  1. Why not put this counting in command_process_relay_cell()? I'm asking because that function, before calling circuit_receive_relay_cell() can close the circuit because of invalid RELAY_EARLY cell or too many of them or if cells are received but the circuit state doesn't allow it. Don't you want to catch those also in your calculation? Looking at this comment, it seems you might?
+  /* Count all circuit bytes here for control port accuracy. We want
+   * to count even invalid/dropped relay cells, hence counting
+   * before the recognized check and the connection_edge_process_relay
+   * cell checks.
  1. The CELL_PAYLOAD_SIZE, as you stated in your previous comment, doesn't take into account the header size but that header _can_ bring the cell size up to 514 bytes (CELL_MAX_NETWORK_SIZE). I'm not sure I understand the reasoning behind not counting the header. You mention that the controller application should subtract the header size but it really shouldn't if tor is not counting them in the first place? Actually, the cell size can vary depending on the use of "wide circ ids" so shouldn't get_cell_network_size(chan->wide_circ_ids) be used instead of CELL_PAYLOAD_SIZE?
  1. The counting bytes code for read/written is really the same so I would be much happier with a function that does that and could take a pointer to n_read_circ_bw or n_written_circ_bw for the calculation (or not a pointer, just return the new value). Seems to me will be easier to unit test but also better code semantic as well.

comment:10 Changed 6 months ago by nickm

Keywords: 034-roadmap-subtask added

comment:11 Changed 6 months ago by nickm

Keywords: 034-triage-20180328 added

comment:12 Changed 6 months ago by nickm

Keywords: 034-included-20180328 added

comment:13 in reply to:  9 Changed 6 months ago by mikeperry

Status: needs_informationneeds_review

Replying to dgoulet:

Good stuff Mike. I want to ask and clarify couple of things here.

  1. Why not put this counting in command_process_relay_cell()? I'm asking because that function, before calling circuit_receive_relay_cell() can close the circuit because of invalid RELAY_EARLY cell or too many of them or if cells are received but the circuit state doesn't allow it. Don't you want to catch those also in your calculation? Looking at this comment, it seems you might?

Actually, yes, let's move this block up towards the top of circuit_receive_relay_cell(). I had put it below since those violations closed the circuit, but you're right, let's count them too. Thanks!

https://oniongit.eu/mikeperry/tor/commit/4371b54854545a3962f9455fe318711dc7b799f9

+  /* Count all circuit bytes here for control port accuracy. We want
+   * to count even invalid/dropped relay cells, hence counting
+   * before the recognized check and the connection_edge_process_relay
+   * cell checks.
  1. The CELL_PAYLOAD_SIZE, as you stated in your previous comment, doesn't take into account the header size but that header _can_ bring the cell size up to 514 bytes (CELL_MAX_NETWORK_SIZE). I'm not sure I understand the reasoning behind not counting the header. You mention that the controller application should subtract the header size but it really shouldn't if tor is not counting them in the first place? Actually, the cell size can vary depending on the use of "wide circ ids" so shouldn't get_cell_network_size(chan->wide_circ_ids) be used instead of CELL_PAYLOAD_SIZE?

I'm using the definition of "total bandwidth carried by the circuit". This means that the payload size is the natural value here. For purposes of the payload carried, it does not matter if the *circuit* headers are different sizes. My earlier comment was about *relay* headers, which are part of the (encrypted) circuit payload, and I think should be counted in this stat.

  1. The counting bytes code for read/written is really the same so I would be much happier with a function that does that and could take a pointer to n_read_circ_bw or n_written_circ_bw for the calculation (or not a pointer, just return the new value). Seems to me will be easier to unit test but also better code semantic as well.

Good point. I made this u32 overflow add into a util.c helper, since it is a common pattern, independent of this circuit accounting:

https://oniongit.eu/mikeperry/tor/commit/b9f28f475ecc546503bd85d1d4a0564f09b638a3 (impl + tests)
https://oniongit.eu/mikeperry/tor/commit/4e86178122c7aa20af91838c59ee5dd6fe6bb7d5 (use)

comment:14 Changed 5 months ago by dgoulet

Status: needs_reviewmerge_ready

Good stuff!

Your top fixup commit that uses tor_add_u32_nowrap() won't squash cleanly since that function is added in its own commit so you might want to either do a new commit or base it on that helper function commit.

Happy with the fixups! lgtm;

comment:15 Changed 5 months ago by mikeperry

Ok I rebased on to latest origin/master, and squashed and changed that second fixup commit message.

Branch is mikeperry/bug25400_squashed.

comment:16 Changed 5 months ago by nickm

Resolution: fixed
Status: merge_readyclosed

lgtm; merged to master!

Note: See TracTickets for help on using tickets.