Opened 6 months ago

Closed 2 months ago

#29231 closed defect (fixed)

Relays vastly underreport write-total in padding-counts line in extrainfo descriptor

Reported by: arma Owned by:
Priority: Very High Milestone: Tor: 0.4.1.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-relay padding wtf-pad
Cc: mikeperry Actual Points:
Parent ID: #28634 Points:
Reviewer: dgoulet Sponsor:

Description

Here is a snippet from WhatsGoingOn's extrainfo descriptor:

write-history 2019-01-27 18:10:06 (86400 s) 7709989888,6911394816,6582383616,7393989632,6065009664
read-history 2019-01-27 18:10:06 (86400 s) 7644498944,6824735744,6550761472,7355173888,6059125760
[...]
padding-counts 2019-01-27 22:06:23 (86400 s) bin-size=10000 write-drop=0 write-pad=220000 write-total=250000 read-drop=0 read-pad=210000 read-total=12310000 enabled-read-pad=10000 enabled-read-total=20000 enabled-write-pad=10000 enabled-write-total=10000 max-chanpad-timers=6

Notice how the read-total is reasonable compared to read-history, but the write-total is way undercounting compared to write-history.

I suspect that write-total is the wrong one.

Child Tickets

Change History (10)

comment:1 Changed 6 months ago by dgoulet

Two functions are used to count the total read and write bytes:

rep_hist_padding_count_read()
rep_hist_padding_count_write()

The read one is called for each cell that enters tor within channel_tls_handle_cell() which is what we want for the total cell counter.

However, for the write counter, it is within connection_or_write_cell_to_buf() called by chan->write_cell() function pointer that currently points to channel_tls_write_cell_method().

The problem is that currently, only the channel padding subsystem (netflow padding) is using chan->write_cell(). Furthermore, connection_or_write_cell_to_buf() is directly called by:

  • NETINFO cell: connection_or_send_netinfo()
  • KEEPALIVE: run_connection_housekeeping()
  • Channel write_cell() function pointer.

Thus, we are effectively only counting NETINFO, keepalive (PADDING cell) and netflow padding.

comment:2 Changed 6 months ago by dgoulet

Cc: dgoulet removed
Keywords: tor-relay padding added
Milestone: Tor: 0.4.1.x-final

comment:3 Changed 5 months ago by arma

Oh, I should probably say: credit to karsten and irl for finding the anomaly. I was just nearby at the hack meeting and overheard them talking about it and filed the ticket.

comment:4 Changed 3 months ago by mikeperry

Parent ID: #28634

comment:5 Changed 3 months ago by mikeperry

Keywords: wtf-pad added
Priority: MediumVery High

comment:6 Changed 3 months ago by mikeperry

Status: newneeds_review

https://github.com/torproject/tor/pull/998 - after conversations with dgoulet, I think this is now correct. Unfortunately rephist doesn't have unit tests or good instrumentation. I could expose some of its global vars and add some tests to channeltls that covers some of this stuff, if we feel that is important.

comment:7 Changed 2 months ago by asn

Reviewer: dgoulet

comment:8 Changed 2 months ago by dgoulet

Status: needs_reviewmerge_ready

lgtm;

Just to clarify Mike, in write_packed_cell(), we count the cell only if the cell was successfully put in the connection buffer. Any error there will not count the cell.

That it OK? As I recall some stats we want to count every cell, not just the ones that succeeded.

comment:9 in reply to:  8 Changed 2 months ago by mikeperry

Replying to dgoulet:

lgtm;

Just to clarify Mike, in write_packed_cell(), we count the cell only if the cell was successfully put in the connection buffer. Any error there will not count the cell.

That it OK? As I recall some stats we want to count every cell, not just the ones that succeeded.

Yes, this is what we want here -- we want to know the actual written cell counts and actual written padding counts, so we can measure what the overhead is on the wire and make sure the actual overhead is reasonable for the current network conditions.

The patch also relocates the circuitpadding stat accounting to behave similarly -- if the #29204 fix causes us to not pad because the queue is full, we should not count those cells in our statistics.

comment:10 Changed 2 months ago by nickm

Resolution: fixed
Status: merge_readyclosed

merged to master.

Note: See TracTickets for help on using tickets.