Implement a way to utilise tor's logging system from Rust code

changed milestone to %Tor: 0.3.3.x-final

added component::core tor/tor milestone::Tor: 0.3.3.x-final owner::isis points::3 priority::high resolution::fixed review-group-29 review-group-34 reviewer::nickm rust rust-pilot severity::normal sponsor::M-can status::closed type::enhancement labels

Trac:
Username: manish.earth
Cc: N/A to manishearth@gmail.com

Trac:
Cc: manishearth@gmail.com to manishearth@gmail.com, chelseakomlo

Trac:
Status: new to assigned
Owner: N/A to chelseakomlo

I have been working on this and hopefully will have a patch within the week. One thing that has taken some time is deciding which logging function to use- on the C side, we have several macros which wrap log_fn_. We can implement similar functionality in Rust, or we can create helper functions in C- if anyone has arguments for/against either of these, let me know.

So my first thought is to take this approach:

Refactor the C a bit, though, so that there is a tor_log_string() function taking a severity, domain, and message. It should work the same as tor_log(severity, domain, message);, but instead it should be safe to call it function with strings containing %. (I can write this one if you want: it will involve refactoring logv a little.
Expose that function to rust.
Write a single rust macro that takes severity, domain, message, and some format arguments, and uses the format! mechanism to generate the log message.

Optionally afterwards:

Expose a view of log_global_min_severity to rust, maybe via a function.
Update the rust macro to compare log_global_min_severity to the declared severity, and avoid formatting the message if no logs actually want it.

Replying to nickm:

So my first thought is to take this approach:

Refactor the C a bit, though, so that there is a tor_log_string() function taking a severity, domain, and message. It should work the same as tor_log(severity, domain, message);, but instead it should be safe to call it function with strings containing %. (I can write this one if you want: it will involve refactoring logv a little.

This sounds great- I was hesitant to add a bunch of functionality to Rust just to make it compatible with the existing C api. If you have time to do this, that would be great! Otherwise I can take a stab at it this week.

Expose that function to rust.

Write a single rust macro that takes severity, domain, message, and some format arguments, and uses the format! mechanism to generate the log message.

This is approximately what I have, with the ability to optionally compile with a no-op implementation (for running tests at the rust module level).

Optionally afterwards:

Expose a view of log_global_min_severity to rust, maybe via a function.

Update the rust macro to compare log_global_min_severity to the declared severity, and avoid formatting the message if no logs actually want it. This sounds great- we can do this either in this ticket or an enhancement.

So, I did a really simple untested implementation of the backend functions. It turned out not to actually need any refactoring, which is good, because that code is a mess. :)

It's in a branch rust_log_backend in my public repository, and it should be a good basis to build on.

Ok, see the merge request on gitlab here: https://oniongit.eu/network/tor/merge_requests/7

This PR creates a minimal implementation for Rust modules to log using tor's existing logger.

One issue that I ran into is that log levels/domains are defined as macros in C. I redeclared a small subset of these as constants- if this is the approach that we want to take, I can fill out the remainder of necessary constants. Another alternative is to define log level/log domain macros as constants in C.

In order to run module-level Rust tests without linking to C, it is necessary to "stub" out direct calls to C functions. This PR adds a "testing" feature in order to optionally compile no-op implementations for testing. (The approach of "write C integration tests and test Rust in isolation" is an approach taken by Servo, although it would be great to get Rust people's eyes on this to double check how I've done this).

Looking forward to feedback and suggestions how to improve!

Trac:
Cc: manishearth@gmail.com, chelseakomlo to manishearth@gmail.com, chelseakomlo, isis
Status: assigned to needs_review

This looks pretty good! I like how you've set up the testing code.

Some review comments

const int _LOG_WARN = LOG_WARN;

This is interesting. Usually I'd just mirror the constants and leave comments, and bindgen is useful for automirroring (but y'all aren't using bindgen and that's ok).

I guess this works. I've never seen constants mirrored this way but I do feel it is more resilient to code changing.

    /// Allow for default cases in case Rust and C log types get out of sync
    #[allow(unreachable_patterns)]
    pub unsafe fn translate_severity(severity: LogSeverity) -> c_int {
        match severity {
            LogSeverity::Warn => _LOG_WARN,
            LogSeverity::Notice => _LOG_NOTICE,
            _ => _LOG_NOTICE,
        }
    }

There's no need for that unreachable_patterns. The only way stuff will go out of sync is if LogSeverity gets a variant added, however that will trigger a compiler error in this match (if you leave off the _ branch). It's nice to be able to modify enums and immediately have the compiler tell you where you need to update things

            unsafe {
                tor_log_string(translate_severity($severity),
                translate_domain($domain),
                func_ptr, msg_ptr
                )
            }

These will fail to resolve if called outside of the correct module with the right imports.

The best thing to do is to use $crate::tor_log_string(..) (see the docs on $crate, https://doc.rust-lang.org/1.7.0/book/macros.html#the-variable-crate) which will ensure these will resolve wherever you use them.

const ERR_LOG_MSG: &'static str

Rust lets you omit the 'static now (this is recent, so it works on the latest stable but may not work on the compiler you're using)

One concern of mine is that the tor_log_msg macro contains unsafe code, and more problematically, it contains macro variables in that unsafe code.

Now this is firmly in "why would anyone even do that" territory, but for example someone could call tor_log_msg!({some code}, ....), and that code would be expanded to be inside an unsafe block without the caller ever having to use unsafe.

You should probably have severity/domain get bound to variables outside of the unsafe block.

Ideally the entire macro body would be moved to a function, making the body just handle the format! stuff and calling the function. However, this means you can no longer do the testing trick where the test function itself returns what was printed.

Trac:
Username: manish.earth

Trac:
Status: needs_review to needs_revision

Thanks for the really helpful review! All of these suggestions make sense- I wasn't sure about the best way to import dependencies for a macro in other modules.

It is interesting to hear your recommendation to use a function for the macro body. Would you recommend using the #[inline] tag for that function then?

I'll make these fixes over the weekend, thanks again for the help.

Would you recommend using the #[inline] tag for that function then?

yep.

To give an exact description of #[inline]:

For the same crate it acts as a small hint for inlining. This is also true for generics that get instantiated in a different crate.

For non-generic functions from a different crate, it is impossible for rustc to inline without this, because by the time you get to the current crate it's already been codegenned. #[inline] forces rustc to include extra metadata that lets the next crate re-codegen it if it wishes to. (It also contains the small hint for inlining)

#[inline(always)] is a strong hint, and will force inlining unless it's impossible (e.g. in the presence of a lot of recursion). #[inline(never)] does what it says on the tin.

In this case it's a macro and going to be used from a different crate, so we need #[inline] so that the other crate can inline it if the optimizer thinks it is a good idea. We don't need inline(always) -- the optimizer is smart.

Trac:
Username: manish.earth

See fixes from Manish's review on the same branch: https://oniongit.eu/network/tor/merge_requests/7

A couple thoughts:

Testing

I wasn't able to test the inline logging function, as I wasn't sure the best way how to. I think I could properly test this with a global variable that was only defined for unit tests (such as a Cursor, and "log" to it in the no-op implementation. However, I ran into issues due to the limitation of creating non-struct static types in stable Rust.

For example, for the following declaration:

pub static mut log_cursor: Cursor<Vec<String>> = Cursor::new(Vec::new());

I received this error:
calls in statics are limited to struct and enum constructors

And for this declaration:

{{{
struct LastLogged {
function: Option
message: Option
}
}}}

I received this error:
destructors in statics are an unstable feature

Maybe we just need to wait for this feature to land in stable, but it would be nice to have the flexibility for functions that are inherently hard to test, without having to change the code itself.

Feature flags for cargo test

I am unable to run cargo test with a specific flag. For example, I wanted to be able to run cargo test --feature "testing", but this flag is not applied to other modules as well (i.e, running tests in protover will not result in tor_log being compiled with the "testing" flag). I'm able to run cargo test --all-features but being able to turn on features at a fine-grained level for tests would be nice.

Binding compile-time constants to log level macros

Nick- as for the declaration of log levels in src/common/torlog.h, if we can refactor these into compile-time constants (instead of macros) that would remove the need to declare _constant for each. Let me know what you think, I can help do this refactor.

For example, instead of binding _LD_NET to LD_NET, we could declare LD_NET as a constant directly.

Trac:
Status: needs_revision to needs_review

However, I ran into issues due to the limitation of creating non-struct static types in stable Rust.

The lazy_static crate helps here, static mut is something you rarely use.

I am unable to run cargo test with a specific flag. For example, I wanted to be able to run cargo test --feature "testing",

You'll want to make the feature depend on features from other crates.

Trac:
Username: manish.earth

Replying to manish.earth:

However, I ran into issues due to the limitation of creating non-struct static types in stable Rust.

The lazy_static crate helps here, static mut is something you rarely use.

I don't think we should import this crate just for this test case- I did see that come up as an option though.

I am unable to run cargo test with a specific flag. For example, I wanted to be able to run cargo test --feature "testing",

You'll want to make the feature depend on features from other crates.

Thanks! I didn't see that that was a possibility, that is super helpful, I added this in the last commit in this branch. We can now run tests via cargo test --features "testing" where the code has a dependency to C with a no-op (such as in the protover module), and still unit test the Rust code. I'm pretty happy about that.

I don't think we should import this crate just for this test case- I did see that come up as an option though.

you can use a dev-dependency so it only gets used in testing

Alternatively, have a static FOO: *const YourType = 0 as *const _; and allocate it at the beginning of the test program. That's basically what lazy_static does, except lazily.

Trac:
Username: manish.earth

If you give me code that works using lazy_static I can de-lazy-static it, if you want

Trac:
Username: manish.earth

Replying to manish.earth:

Alternatively, have a static FOO: *const YourType = 0 as *const _; and allocate it at the beginning of the test program. That's basically what lazy_static does, except lazily.

Good suggestion- I took a stab at implementing something similar to what lazy_static does (declare a static mutable pointer and allocate data on the heap during assignment). As this code is purely for testing, I don't have as many safety checks, but open to feedback.

See commit 4cdf0c8899796582b058d289bcd21b95dd1bd452.

(that commit doesn't seem to be on your oniongit repo -- got a link?)

Trac:
Username: manish.earth

Replying to manish.earth:

(that commit doesn't seem to be on your oniongit repo -- got a link?)

Apologies! Multiple remotes- it should be updated now. Thanks for taking a look.

https://oniongit.eu/network/tor/merge_requests/7

Trac:
Keywords: N/A deleted, review-group-28 added

It would also be great to hear recommendations (Nick might be the best person) on whether refactoring the C side should be in scope for this ticket. See this comment regarding log level macros https://trac.torproject.org/projects/tor/ticket/23881#comment:14

Trac:
Reviewer: N/A to nickm

I'd be fine with the change in log level and/or domain macros. Reviewing the branch now...

I added some comments to the merge request, but then apparently oniongit.eu decided to start giving me 500 errors.

Here are the other questionsissues that I didn't enter into oniongit -- I hope that it comes back up again soon, and we can find the other questions/issues too.

I'd recommend a rename on rust_welcome_string() to rust_log_welcome_string(): by convention, a function whose primary purpose is to do something should have a verb name.
Will the translate_* functions get inlined and compiled down to constants? Logging is pretty critical-path.
The C log functions exit fast, without formatting the message, if the log message isn't "interesting" because its severity is too low. I think that if possible the Rust log macro should do the same, and avoid formatting messages when the function log_message_is_interesting() returns false. That could be a separate commit, though.

Newbie questions from oniongit that I can remember:

Does CString::new() do a complete copy on the message?
What part of the Rust code removes the indentation on the log strings?

Change requests I can remember from oniongit:

The tor_log_msg!() macro documentation should document its support for format! syntax.
In C, all identifiers that start with an underscore are reserved: we can create constants like LOG_WARN_, but using _LOG_WARN is risky and we try not to do that.

Trac:
Status: needs_review to needs_revision

Does CString::new() do a complete copy on the message?

It depends what you give it. If you give it borrowed data, it will copy. If you give it a String or a Vec, it will copy.

(I'll have a look at the code soon, missed that it was updated)

Trac:
Username: manish.earth

It looks like the merge request on oniongit is back up! Please let me know if you'd like me to copy my comments from here there, or from there here.

Trac:
Keywords: review-group-28 deleted, review-group-29 added

(I'm still okay with taking this in 0.3.3 postfreeze if it remains pretty much rust-only.)

Sounds good! I got a late start on this but about 3/4 of the way through, hoping to have everything up soon.

[non sequitur] Wow! Today I learned, after like eight years of using Trac, from Manish, that you can do {{{#!rust and get syntax highlighting on Trac. I literally had no idea. Amazing.

I had a call with Chelsea last weekend, and we went over this patch. Chelsea has been super busy, so I offered to make whatever little tweaks are necessary to address the comments on this ticket.

Trac:
Status: needs_revision to accepted
Owner: chelseakomlo to isis

Okay, to answer the question of "Does CString::new perform a copy?"

Manish answered above, but I think there was a typo in the answer? (It seems to be missing the word "not" in the latter clause?) I had assumed it always copied, so I went and dug through the stdlib to figure it out.

So looking at the source of CString::new:

    pub fn new<T: Into<Vec<u8>>>(t: T) -> Result<CString, NulError> {
        Self::_new(t.into())
    }

    fn _new(bytes: Vec<u8>) -> Result<CString, NulError> {
        match memchr::memchr(0, &bytes) {
            Some(i) => Err(NulError(i, bytes)),
            None => Ok(unsafe { CString::from_vec_unchecked(bytes) }),
        }
    }

[wow!!! check out that ↑↑↑↑↑↑ frickin syntax highlighting!!!!! zomg, my life is changed!!]

it takes a any T which has implemented a way (Into<T>) to transmute it into a Vec<u8> and calls Cstring::_new() on it. There exists impl<'a> From<&'a str> for Vec<u8>:

impl<'a> From<&'a str> for Vec<u8> {
    fn from(s: &'a str) -> Vec<u8> {
        From::from(s.as_bytes())
    }
}

The _new() function checks that is has no NUL bytes, and errors if it does, then calls CString::from_vec_unchecked():

    pub unsafe fn from_vec_unchecked(mut v: Vec<u8>) -> CString {
        v.reserve_exact(1);
        v.push(0);
        CString { inner: v.into_boxed_slice() }
    }

And then Vec::into_boxed_slice():

    pub fn into_boxed_slice(mut self) -> Box<[T]> {
        unsafe {
            self.shrink_to_fit();
            let buf = ptr::read(&self.buf);
            mem::forget(self);
            buf.into_box()
        }
    }

which is doing some olympic level gymnastics where std::ptr::read is returning a Rawvec, then it mem::forget()s itself so that the deallocator never runs, and finally it Boxes the memory it forgot about by calling RawVec::into_box():

impl<T> RawVec<T, Heap> {
    /// Converts the entire buffer into `Box<[T]>`.
    ///
    /// While it is not *strictly* Undefined Behavior to call
    /// this procedure while some of the RawVec is uninitialized,
    /// it certainly makes it trivial to trigger it.
    ///
    /// Note that this will correctly reconstitute any `cap` changes
    /// that may have been performed. (see description of type for details)
    pub unsafe fn into_box(self) -> Box<[T]> {
        // NOTE: not calling `cap()` here, actually using the real `cap` field!
        let slice = slice::from_raw_parts_mut(self.ptr(), self.cap);
        let output: Box<[T]> = Box::from_raw(slice);
        mem::forget(self);
        output
    }
}

which is doing even more olympic level gymnastics with slice::from_raw_parts, which is like the King of Capital-U Unsafety, because it just says "N bytes from this pointer forward are now a slice, with an lifetime that I decided by royal proclamation, and which is not necessarily the correct lifetime given that I was handed a raw pointer that I know nothing about". But it doesn't copy. Then it does a similar pirouette as the trick we saw before with std::ptr::read and then Boxing the RawVec, but even more complicated and dangerous, like if the last one was a pirouette, this is like a pirouette while juggling machetes. Box::from_raw is only really supposed to be called on pointers produced via Box::into_raw, but whatever, so many crazy things have already been done with these pointers to this memory. So Box::from_raw creates a Unique<T>, which just says, "I own this raw pointer to a T (*mut T), and I'm just going to pretend it is a T." and then calls Box::from_unique on the resulting Unique<T>:

    pub unsafe fn from_unique(u: Unique<T>) -> Self {
        mem::transmute(u)
    }

which, as you can see, finishes its machete-juggling pirouette off with a triple axel flamethrower party by calling mem::transmute, re-interpretting the raw bits of the wrapped raw pointer as a Box. So much Unsafe! Seemingly no additional allocations. :)

Yeah, I'd missed the "not" in the second half.

You're right that CString::_new does no copies or allocations; the sneaky bit is the t.into() -- the From<&str> impl for Vec<u8>. That itself calls the From<&[u8]> for Vec<u8> impl (https://github.com/rust-lang/rust/blob/29f5c699b11a6a148f097f82eaa05202f8799bbc/src/liballoc/vec.rs#L2161-L2171) :

impl<'a, T: Clone> From<&'a [T]> for Vec<T> {
    #[cfg(not(test))]
    fn from(s: &'a [T]) -> Vec<T> {
        s.to_vec()
    }
    #[cfg(test)]
    fn from(s: &'a [T]) -> Vec<T> {
        ::slice::to_vec(s)
    }
}

which calls alloc::slice::to_vec (https://github.com/rust-lang/rust/blob/29f5c699b11a6a148f097f82eaa05202f8799bbc/src/liballoc/slice.rs#L164-L170), which creates a new vector and copies stuff into it via extend_from_slice

    pub fn to_vec<T>(s: &[T]) -> Vec<T>
        where T: Clone
    {
        let mut vector = Vec::with_capacity(s.len());
        vector.extend_from_slice(s);
        vector
}

If T is a String or Vec<u8>, it won't be copied, because the From<String> for Vec<u8> impl is basically a no-op move. But if it's a borrow (e.g. &str or &[u8] or something), it will be copied into a new allocation. It has to be -- you can't guarantee that borrow will live forever.

Trac:
Username: manish.earth

Replying to manish.earth:

Yeah, I'd missed the "not" in the second half.

You're right that CString::_new does no copies or allocations; the sneaky bit is the t.into() -- the From<&str> impl for Vec<u8>. That itself calls the From<&[u8]> for Vec<u8> impl (https://github.com/rust-lang/rust/blob/29f5c699b11a6a148f097f82eaa05202f8799bbc/src/liballoc/vec.rs#L2161-L2171) :

{{{#!rust impl<'a, T: Clone> From<&'a [T]> for Vec { #[cfg(not(test))] fn from(s: &'a [T]) -> Vec { s.to_vec() } #[cfg(test)] fn from(s: &'a [T]) -> Vec { ::slice::to_vec(s) } } }}}

which calls alloc::slice::to_vec (https://github.com/rust-lang/rust/blob/29f5c699b11a6a148f097f82eaa05202f8799bbc/src/liballoc/slice.rs#L164-L170), which creates a new vector and copies stuff into it via extend_from_slice

{{{#!rust pub fn to_vec(s: &[T]) -> Vec where T: Clone { let mut vector = Vec::with_capacity(s.len()); vector.extend_from_slice(s); vector } }}}

If T is a String or Vec<u8>, it won't be copied, because the From<String> for Vec<u8> impl is basically a no-op move. But if it's a borrow (e.g. &str or &[u8] or something), it will be copied into a new allocation. It has to be -- you can't guarantee that borrow will live forever.

Aha, I missed that part, thanks! So it'll be copied even if the originally borrowed thing is &'static?

Aha, I missed that part, thanks! So it'll be copied even if the originally borrowed thing is &'static?

Currently, yes. In the future there may be an optimization that allows &'static strings to be treated as Strings, though that may not affect this because it gets converted to a Vec anyway.

Trac:
Username: manish.earth

Yesterday I tried to find a way to get the name of the current function we're being called from, so that we wouldn't have to manually type it out everytime we log something. It seems like there was a proposed RFC to make this info available, but it unfortunately stalled out. One of the major logger implementation in Rust, slog, also doesn't/can't log function names currently but instead stubs it out with function: "" and logs lines/columns/filenames instead. We could maybe do something like what this nameof crate is doing, but then we'd have to find some way to call the macro outside of the function that is calling the logger somehow, and I still don't really see a way to make it automatic. Perhaps this could be something for the Rust wishlist?

There's a couple tiny fixes for typos and to address Nick's concerns in my bug23881 branch (I also answered I think all of the review comments on oniongit). I reviewed Chelsea's code. If someone wants to review my fixups then we could maybe merge?

Trac:
Sponsor: N/A to SponsorM-can
Status: accepted to needs_review

Trac:
Keywords: N/A deleted, review-group-34 added

Replying to isis:

Yesterday I tried to find a way to get the name of the current function we're being called from, so that we wouldn't have to manually type it out everytime we log something. It seems like there was a proposed RFC to make this info available, but it unfortunately stalled out.

Hm. Yeah, if this isn't available, it isn't available. Let's put it on the wishlist (assuming we have one?)

When I try to build this, I get some wide-line warnings in torlog.h from your d560384fd948ad24625312ee07e04f0cb60ac5b5 .

Also, I think that d560384fd948ad24625312ee07e04f0cb60ac5b5 is breaking the compilation, since I see a bunch of these linker warnings:

src/common/libor-event.a(procmon.o):/home/nickm/src/tor/src/common/torlog.h:146: multiple definition of `LOG_WARN_'
src/or/libtor.a(main.o):/home/nickm/src/tor/./src/common/torlog.h:146: first defined here

Maybe we should just remove d560384fd948ad24625312ee07e04f0cb60ac5b5 and merge the rest?

Trac:
Status: needs_review to needs_revision

Other than that, I'm pretty satisfied that my review issues were all addressed. If I find anything else, it can be a post-merge followup ticket.

Replying to nickm:

When I try to build this, I get some wide-line warnings in torlog.h from your d560384fd948ad24625312ee07e04f0cb60ac5b5 .

Also, I think that d560384fd948ad24625312ee07e04f0cb60ac5b5 is breaking the compilation, since I see a bunch of these linker warnings: {{{ src/common/libor-event.a(procmon.o):/home/nickm/src/tor/src/common/torlog.h:146: multiple definition of `LOG_WARN_' src/or/libtor.a(main.o):/home/nickm/src/tor/./src/common/torlog.h:146: first defined here }}}

Maybe we should just remove d560384fd948ad24625312ee07e04f0cb60ac5b5 and merge the rest?

Sounds good to me. I've made bug23881_r1 which is exactly the same branch, just without d560384f.

Great! Merged that one to master, with minor conflict resolution, in d8893bc93c52e1edd0a76b81f5d65bb10d790474

Trac:
Status: needs_revision to closed
Resolution: N/A to implemented

Clang builds are failing on Travis because of -Wmissing-variable-definitions:

src/common/log.c:57:11: error: no previous extern declaration for non-static
      variable 'LOG_WARN_' [-Werror,-Wmissing-variable-declarations]
const int LOG_WARN_ = LOG_WARN;

Trac:
Resolution: implemented to N/A
Status: closed to reopened

I think that my branch move_torlog_rust_declarations shoudl fix this, if I have it diagnosed properly. Currently trying it with Travis on github.

Trac:
Status: reopened to needs_review

It seems to fix the problem, so I'm merging it to master.

Trac:
Status: needs_review to closed
Resolution: N/A to fixed

Cool, thanks Nick!

closed

changed time estimate to 24h

mentioned in issue #24659 (moved)

mentioned in issue #25560 (moved)

moved to tpo/core/tor#23881 (closed)

mentioned in issue tpo/core/tor#25560 (closed)

Implement a way to utilise tor's logging system from Rust code

Child items 0

Activity