Opened 16 months ago

Last modified 2 months ago

#24249 needs_information enhancement

Create automated mechanism for C/Rust types to stay in sync

Reported by: chelseakomlo Owned by: chelseakomlo
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Normal Keywords: rust, 040-deferred-201915
Cc: isis, teor Actual Points:
Parent ID: Points:
Reviewer: isis Sponsor:

Description (last modified by chelseakomlo)

In transitioning parts of tor to Rust, some parts of the code will either need to temporarily exist in both C and Rust (such as protover), or will be highly coupled (such as enums that are passed between the FFI boundary).

It would be good to automatically verify these areas of the code don't get out of sync. This could either be a post-hoc verifier, or a generator that takes a higher-level specification and generates both C and Rust types.

Ideally, the coupling between C and Rust will be as minimal as possible, so this probably does not need to be a heavyweight solution.
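As an illustration of the lightweight "post-hoc verifier" option, here is a minimal sketch that compares `#define` values in a C header against `const` items in a Rust source file. It assumes one constant per line and only compares the literal text of each value; the function names and the constants used in the usage note are hypothetical and do not come from the tor codebase.

```rust
use std::collections::BTreeMap;

/// Collects `#define NAME VALUE` pairs from C header text (one per line).
fn c_defines(header: &str) -> BTreeMap<String, String> {
    let mut out = BTreeMap::new();
    for line in header.lines() {
        if let Some(rest) = line.trim().strip_prefix("#define ") {
            let mut parts = rest.trim().splitn(2, char::is_whitespace);
            if let (Some(name), Some(value)) = (parts.next(), parts.next()) {
                out.insert(name.to_string(), value.trim().to_string());
            }
        }
    }
    out
}

/// Collects `[pub] const NAME: TYPE = VALUE;` pairs from Rust source text.
fn rust_consts(source: &str) -> BTreeMap<String, String> {
    let mut out = BTreeMap::new();
    for line in source.lines() {
        let line = line.trim();
        let line = line.strip_prefix("pub ").unwrap_or(line);
        if let Some(rest) = line.strip_prefix("const ") {
            if let (Some(colon), Some(eq)) = (rest.find(':'), rest.find('=')) {
                if colon < eq {
                    let name = rest[..colon].trim();
                    let value = rest[eq + 1..].trim().trim_end_matches(';').trim();
                    out.insert(name.to_string(), value.to_string());
                }
            }
        }
    }
    out
}

/// Returns the names whose values differ, or that exist on only one side.
fn out_of_sync(header: &str, source: &str) -> Vec<String> {
    let c = c_defines(header);
    let r = rust_consts(source);
    let mut names = Vec::new();
    for (name, value) in &r {
        if c.get(name) != Some(value) {
            names.push(name.clone());
        }
    }
    for name in c.keys() {
        if !r.contains_key(name) {
            names.push(name.clone());
        }
    }
    names
}
```

Given a header containing `#define TIMEOUT_SECS 30` and a Rust file containing `pub const TIMEOUT_SECS: u32 = 60;`, `out_of_sync` would report `TIMEOUT_SECS`. A real check would need to handle expressions, casts, and multi-line definitions, which this sketch does not attempt.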

Change History (17)

comment:1 Changed 16 months ago by chelseakomlo

Description: modified (diff)

comment:2 Changed 10 months ago by chelseakomlo

Owner: set to chelseakomlo
Status: new → assigned

Started initial work this week on keeping constants in sync across the language boundary; this should be extensible to enums, structs, etc.

I'll put up an initial POC for review, and then we can talk about some questions I have around project structure and external dependencies (e.g., where generated files on the Rust/C side should live in the project structure, and whether it is OK for a pre-processing step that generates files to rely on an external library).

comment:3 Changed 10 months ago by chelseakomlo

Cc: isis added

comment:4 Changed 9 months ago by chelseakomlo

Initial implementation at http://github.com/chelseakomlo/types-parser. The documentation is currently a bit rough, so please let me know how it can be improved.

One TODO that I'm currently working on is a suggestion from Nick: allow items to have a prefix in C that they don't have in Rust. That is fairly easy, and I'll fix this up.
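To make the prefix idea concrete, here is a minimal sketch of how a generator might render a Rust string constant as a C `#define` with an optional C-only prefix. This is not the types-parser API; the function name and the `TOR_` prefix are hypothetical, and the sketch assumes string constants should be emitted as quoted C string literals.

```rust
/// Renders one string constant as a C `#define`, prepending `prefix`
/// (which may be empty) to the name on the C side only.
fn c_define(prefix: &str, name: &str, value: &str) -> String {
    // Escape backslashes and double quotes so the result is a valid C literal.
    let escaped = value.replace('\\', "\\\\").replace('"', "\\\"");
    format!("#define {}{} \"{}\"", prefix, name, escaped)
}
```

For example, `c_define("TOR_", "PERSON_NAME", "HELLO")` yields `#define TOR_PERSON_NAME "HELLO"`, while the Rust side keeps the unprefixed name `PERSON_NAME`.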

This tool can be extended to enums and structs, but I think we should take a look at Bindgen again before deciding we want to build capacity for generating FFI.

A few questions:
1) Is this something we want in the core tor codebase, or should this be an external tool (as it relies on https://github.com/dtolnay/syn)? I'm leaning towards this remaining external, but would be interested to hear what others think. One argument for making this internal is that we could have a make task that searches for all _generated.rs files and generates the corresponding C files (as opposed to this being a manual task during development).

2) What conventions do we want for where generated files should live in the tor codebase? I think it might be good to have a separate file for each subdomain/subsystem (e.g., bridges, hidden services, dirauths), but it should also be easy to tell which files are generated and which are not. Curious what others think.
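For the make-task idea in question 1, a rough sketch of the file discovery step might look like the following. The mapping from `*_generated.rs` to a `.h` file in the same directory is only an assumption for illustration; where generated C files should actually live is exactly what question 2 leaves open.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// If `rs_path` names a `*_generated.rs` file, returns the (assumed)
/// path of the C header that would be generated next to it.
fn generated_header_for(rs_path: &Path) -> Option<PathBuf> {
    let file_name = rs_path.file_name()?.to_str()?;
    if file_name.ends_with("_generated.rs") {
        Some(rs_path.with_extension("h"))
    } else {
        None
    }
}

/// Recursively collects every `*_generated.rs` file under `dir`.
fn find_generated(dir: &Path, found: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            find_generated(&path, found)?;
        } else if generated_header_for(&path).is_some() {
            found.push(path);
        }
    }
    Ok(())
}
```

A build step could call `find_generated` on the Rust source root and run the generator once per returned file.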

comment:5 Changed 9 months ago by chelseakomlo

Cc: teor added

comment:6 Changed 9 months ago by chelseakomlo

Status: assigned → needs_review

comment:7 Changed 9 months ago by asn

Reviewer: isis

comment:8 Changed 9 months ago by isis

Status: needs_review → needs_information

Chelsea and I talked about this a bit and we want to take a minute to explore how difficult it would be to get bindgen to do this for us (without generating the entire world), but in the opposite direction. This could probably build off of Nick's work on #26383.

comment:9 Changed 9 months ago by chelseakomlo

Ok, here is the overall difference between using bindgen (C to Rust) or a custom tool (Rust to C).

Bindgen (C to Rust):

Takes C data types and generates FFI bindings for them. https://rust-lang-nursery.github.io/rust-bindgen

As shown in their documentation, a struct in C such as:

typedef struct Person {
    int height;
    char *name;
} Person;

Becomes in Rust:

#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct Person {
    pub height: ::std::os::raw::c_int,
    pub name: *mut ::std::os::raw::c_char,
}

and a constant in C

#define PERSON_NAME "HELLO"

becomes in Rust:

pub const PERSON_NAME: &'static [u8; 6usize] = b"HELLO\x00";

If we were to use bindgen, we could generate types, but we would still need to write wrapper "glue" to translate specific libc types into Rust types (such as a *mut c_char into a String, smartlists into vectors, etc).
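As a rough idea of what that hand-written glue involves, here is a minimal sketch that copies a C string such as the `name` field into an owned Rust `String`. The function name is hypothetical; it uses only `std::ffi::CStr` from the standard library and makes a copy, which is the cost mentioned in the disadvantages listed for bindgen.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Copies a NUL-terminated C string into an owned Rust `String`.
/// Returns `None` for a null pointer.
///
/// # Safety
/// `ptr` must be null or point to a valid NUL-terminated string that
/// stays alive for the duration of the call.
unsafe fn c_name_to_string(ptr: *const c_char) -> Option<String> {
    if ptr.is_null() {
        return None;
    }
    // `to_string_lossy` replaces invalid UTF-8 with U+FFFD;
    // `into_owned` copies the bytes into a Rust-owned String.
    Some(unsafe { CStr::from_ptr(ptr) }.to_string_lossy().into_owned())
}
```

Every FFI type that does not map directly (strings, smartlists, and so on) would need a similar conversion function, each with its own safety contract.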

Advantages of using bindgen:

  • Using a library we don't have to maintain
  • Ability to autogenerate multiple types (functions, structs, enums, etc).

Disadvantages of using bindgen:

  • Hand-writing FFI type translation code
  • More FFI code which is unsafe and (for now) often requires copies

Custom library (Rust to C):

(something like http://github.com/chelseakomlo/types-parser)

Takes Rust data types and generates equivalent C types.

For example, a struct in Rust such as:

struct Person {
    height: i32,
    name: String,
}

could be autogenerated to the following struct in C:

typedef struct Person {
    int height;
    char *name;
} Person;

and a constant in Rust:

const FIRST_CONTSTANT: &'static str = "first_constant";

becomes in C:

#define FIRST_CONTSTANT first_constant

Advantages of using a custom library:

  • Compile types into Rust and C without further translation/error handling code

Disadvantages of using a custom library:

  • Maintain our own library
  • Effort to scale out features. Supporting only constants, as we do currently, is simple, but adding new data types will take effort (for example, structs with smartlists as fields won't be the easiest to compile, I think).

Overall Thoughts:

While I'm not crazy about maintaining our own library, I do prefer the option of being able to cleanly keep Rust and C types separate without the need to hand write FFI type-conversion glue. However, both approaches certainly have drawbacks, and overall this effort should be minimal as we are striving to keep the Rust/C FFI interface small to begin with.

Considering that we currently mainly need to keep constants and enums in sync, both options are roughly equal. The complexity of both increases as the data types to keep in sync become more complex.
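To illustrate why enums in particular need to stay in sync, here is a minimal sketch of checked conversion from a raw C integer into a Rust enum. The `Proto` enum and its variants are hypothetical. If the C side gained a new value that the Rust side did not know about, this conversion would return an error instead of silently producing an invalid enum value.

```rust
use std::convert::TryFrom;
use std::os::raw::c_int;

/// Hypothetical enum shared across the FFI boundary.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Proto {
    Link = 0,
    Relay = 1,
    DirCache = 2,
}

impl TryFrom<c_int> for Proto {
    type Error = c_int;

    /// Maps a raw value received from C onto a known variant, or
    /// returns the unrecognised value as the error.
    fn try_from(raw: c_int) -> Result<Self, Self::Error> {
        match raw {
            0 => Ok(Proto::Link),
            1 => Ok(Proto::Relay),
            2 => Ok(Proto::DirCache),
            other => Err(other),
        }
    }
}
```

The `match` arms duplicate the discriminant values, which is the kind of repetition that either a generator or a verifier would remove or check.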

comment:10 Changed 9 months ago by chelseakomlo

Status: needs_information → needs_review

comment:11 Changed 9 months ago by chelseakomlo

Ok, Manish pointed me to this project, which generates C bindings for Rust. https://github.com/eqrion/cbindgen

I definitely think this is the approach we should take, but I'll dig into this project to see if it does what we need.

comment:12 Changed 9 months ago by chelseakomlo

Status: needs_review → needs_information

comment:13 Changed 7 months ago by nickm

Milestone: Tor: unspecified

comment:14 Changed 6 months ago by chelseakomlo

Alex Crichton pointed me to this, which can be helpful for verifying Rust C bindings: https://github.com/alexcrichton/ctest
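As I understand it, ctest generates tests that compare the Rust declarations against what the C compiler reports (sizes, alignments, field offsets, and so on). The sketch below shows only the Rust half of such a check for the `Person` struct discussed earlier in this ticket; it does not use ctest itself, and the function name is hypothetical. The values it returns would be compared against `sizeof(Person)`, `_Alignof(Person)`, and `offsetof(Person, name)` on the C side.

```rust
use std::mem::{align_of, size_of};
use std::os::raw::{c_char, c_int};

/// Rust-side declaration that is supposed to mirror the C `Person` struct.
#[repr(C)]
pub struct Person {
    pub height: c_int,
    pub name: *mut c_char,
}

/// Returns (size, alignment, byte offset of `name`) for the Rust-side
/// `Person`, i.e. the numbers a verifier would compare with the C values.
fn person_layout() -> (usize, usize, usize) {
    let p = Person {
        height: 0,
        name: std::ptr::null_mut(),
    };
    let base = &p as *const Person as usize;
    let name_addr = &p.name as *const *mut c_char as usize;
    (size_of::<Person>(), align_of::<Person>(), name_addr - base)
}
```

If a field were added or reordered on only one side, at least one of these three numbers would normally stop matching, which is how a layout check catches drift between the two definitions.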

comment:15 Changed 6 months ago by nickm

Milestone: Tor: unspecified → Tor: 0.3.6.x-final

comment:16 Changed 4 months ago by nickm

Milestone: Tor: 0.3.6.x-final → Tor: 0.4.0.x-final

Tor 0.3.6.x has been renamed to 0.4.0.x.

comment:17 Changed 2 months ago by nickm

Keywords: 040-deferred-201915 added
Milestone: Tor: 0.4.0.x-final → Tor: unspecified

Deferring some tickets from 0.4.0 without proposing them for later. Please tag with 041-proposed if you want to do them.
