Opened 2 years ago

Last modified 11 months ago

#23878 new enhancement

Attempt rewriting buffers.c in Rust

Reported by: isis Owned by:
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Normal Keywords: rust-pilot, rust, datatypes, 034-triage-20180328, 034-removed-20180328
Cc: manishearth@…, chelseakomlo Actual Points:
Parent ID: Points: 3
Reviewer: Sponsor:

Description

In buffers.c, we define buf_t, which is essentially a doubly-linked list comprised of chunks of contiguously-allocated memory. During the Montréal meeting, we identified buf_t as a potentially good candidate datatype for reimplementation in Rust.

My understanding of possibly the ideal way to do this (after talking with Alex Crichton, without boats, nickm, and Nika Layzell) would be to entirely rethink the implementation in terms of a VecDeque<Bytes> using VecDeque from the stdlib and Bytes or another buffer type from the bytes crate. If this is something which works out, we could then (hopefully!) expose a similar API as to the C interface. (If that doesn't work out, there's only a couple points in the code which appear to rely on the current implementation of buf_t.)

Child Tickets

TicketTypeStatusOwnerSummary
#26754enhancementneeds_revisionstop using BUFFERS_PRIVATE
#27319defectnewremove buf_get_oldest_chunk_timestamp()

Change History (10)

comment:1 Changed 2 years ago by manish.earth

Cc: manishearth@… added

comment:2 Changed 2 years ago by manish.earth

Probably a thin wrapper around VecDeque<u8> would work better; Bytes seems to itself be a growable buffer (which also does stuff like reference counting).

Looking at the C API in buffers.h there doesn't seem to be any reason why we can't expose the same API.

(I don't mind taking a crack at this if I get the time, but I also don't want to take the place of someone who wants to learn Rust/Rust FFI through this exercise)

comment:3 Changed 2 years ago by manish.earth

Oh, I misunderstood the code; yeah, VecDeque<Vec<u8>> would be good (VecDeque<Bytes> is similar but has extremely minor refcounting overhead which we probably can live with)

comment:4 Changed 2 years ago by chelseakomlo

Cc: chelseakomlo added

comment:5 Changed 21 months ago by nickm

Keywords: 034-triage-20180328 added

comment:6 Changed 21 months ago by nickm

Keywords: 034-removed-20180328 added

Per our triage process, these tickets are pending removal from 0.3.4.

comment:7 Changed 20 months ago by nickm

Milestone: Tor: 0.3.4.x-finalTor: unspecified

These tickets, tagged with 034-removed-*, are no longer in-scope for 0.3.4. We can reconsider any of them, if time permits.

comment:8 Changed 17 months ago by cypherpunks

Since d8b34e0886c9fd08fb51e0de4fadca7da82056ff, buffers.c has been split and refactored four ways now, so there are three callers in the C code that still dig into the implementation detail guts of buf_t in search of raw pointers, presumably with the benefit of zero-copy performance.

Could compress_buf.c, buffers_net.c, and buffers_tls.c just switch to using the single-copy public API, or should a new zero-copy API be added that the Rust version can implement too?

It could be something that synchronously invokes a callback exposing a raw pointer, so that the return value tells buf_t how many bytes were written/read, and the callback serves as a clear sign to users that the pointer is not guaranteed to remain valid past the end of that function.

typedef size_t (*buf_read_cb)(const char *string, size_t string_len, void *data);
typedef size_t (*buf_write_cb)(char *string, size_t string_len, void *data);

void buf_read_call(buf_t *buf, buf_read_cb *cb, void *data);
void buf_write_call(buf_t *buf, buf_write_cb *cb, void *data);
Last edited 17 months ago by cypherpunks (previous) (diff)

comment:9 in reply to:  8 Changed 17 months ago by nickm

Replying to cypherpunks:

Since d8b34e0886c9fd08fb51e0de4fadca7da82056ff, buffers.c has been split and refactored four ways now, so there are three callers in the C code that still dig into the implementation detail guts of buf_t in search of raw pointers, presumably with the benefit of zero-copy performance.

Could compress_buf.c, buffers_net.c, and buffers_tls.c just switch to using the single-copy public API, or should a new zero-copy API be added that the Rust version can implement too?

So, I'd be fine with either one of these as a separate patch. Rather than needing a callback, I'd prefer an approach that exposed the contents of the underlying buf_t as something iovec-like.

The reason I'd be okay with either approach for now is that in any reasonable implementation of memcpy, copying data is far cheaper than TLS or compression, and we don't need to worry about an extra copy too much.

comment:10 Changed 11 months ago by gaba

Sponsor: Sponsor8-can
Note: See TracTickets for help on using tickets.