We really need a way to use tor's logging subsystem from Rust code. I haven't ever really looked at our logging code because it always Just Works™, but it seems possible that we could construct/format Strings to log in Rust code, choose a logging level, and throw the String across the FFI boundary, have the C code log it, then have the Rust code free it?
I'm not sure what we'll do about logging in general moving forward, once (and if) more and more of tor is rewritten in Rust.
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items
0
Show closed items
No child items are currently assigned. Use child items to break down this issue into smaller parts.
Linked items
0
Link issues together to show that they're related.
Learn more.
I have been working on this and hopefully will have a patch within the week. One thing that has taken some time is deciding which logging function to use- on the C side, we have several macros which wrap log_fn_. We can implement similar functionality in Rust, or we can create helper functions in C- if anyone has arguments for/against either of these, let me know.
Refactor the C a bit, though, so that there is a tor_log_string() function taking a severity, domain, and message. It should work the same as tor_log(severity, domain, message);, but instead it should be safe to call it function with strings containing %. (I can write this one if you want: it will involve refactoring logv a little.
Expose that function to rust.
Write a single rust macro that takes severity, domain, message, and some format arguments, and uses the format! mechanism to generate the log message.
Optionally afterwards:
Expose a view of log_global_min_severity to rust, maybe via a function.
Update the rust macro to compare log_global_min_severity to the declared severity, and avoid formatting the message if no logs actually want it.
Refactor the C a bit, though, so that there is a tor_log_string() function taking a severity, domain, and message. It should work the same as tor_log(severity, domain, message);, but instead it should be safe to call it function with strings containing %. (I can write this one if you want: it will involve refactoring logv a little.
This sounds great- I was hesitant to add a bunch of functionality to Rust just to make it compatible with the existing C api. If you have time to do this, that would be great! Otherwise I can take a stab at it this week.
Expose that function to rust.
Write a single rust macro that takes severity, domain, message, and some format arguments, and uses the format! mechanism to generate the log message.
This is approximately what I have, with the ability to optionally compile with a no-op implementation (for running tests at the rust module level).
Optionally afterwards:
Expose a view of log_global_min_severity to rust, maybe via a function.
Update the rust macro to compare log_global_min_severity to the declared severity, and avoid formatting the message if no logs actually want it.
This sounds great- we can do this either in this ticket or an enhancement.
So, I did a really simple untested implementation of the backend functions. It turned out not to actually need any refactoring, which is good, because that code is a mess. :)
It's in a branch rust_log_backend in my public repository, and it should be a good basis to build on.
This PR creates a minimal implementation for Rust modules to log using tor's existing logger.
One issue that I ran into is that log levels/domains are defined as macros in C. I redeclared a small subset of these as constants- if this is the approach that we want to take, I can fill out the remainder of necessary constants. Another alternative is to define log level/log domain macros as constants in C.
In order to run module-level Rust tests without linking to C, it is necessary to "stub" out direct calls to C functions. This PR adds a "testing" feature in order to optionally compile no-op implementations for testing. (The approach of "write C integration tests and test Rust in isolation" is an approach taken by Servo, although it would be great to get Rust people's eyes on this to double check how I've done this).
Looking forward to feedback and suggestions how to improve!
This looks pretty good! I like how you've set up the testing code.
Some review comments
const int _LOG_WARN = LOG_WARN;
This is interesting. Usually I'd just mirror the constants and leave comments, and bindgen is useful for automirroring (but y'all aren't using bindgen and that's ok).
I guess this works. I've never seen constants mirrored this way but I do feel it is more resilient to code changing.
/// Allow for default cases in case Rust and C log types get out of sync #[allow(unreachable_patterns)] pub unsafe fn translate_severity(severity: LogSeverity) -> c_int { match severity { LogSeverity::Warn => _LOG_WARN, LogSeverity::Notice => _LOG_NOTICE, _ => _LOG_NOTICE, } }
There's no need for that unreachable_patterns. The only way stuff will go out of sync is if LogSeverity gets a variant added, however that will trigger a compiler error in this match (if you leave off the _ branch). It's nice to be able to modify enums and immediately have the compiler tell you where you need to update things
Rust lets you omit the 'static now (this is recent, so it works on the latest stable but may not work on the compiler you're using)
One concern of mine is that the tor_log_msg macro contains unsafe code, and more problematically, it contains macro variables in that unsafe code.
Now this is firmly in "why would anyone even do that" territory, but for example someone could call tor_log_msg!({some code}, ....), and that code would be expanded to be inside an unsafe block without the caller ever having to use unsafe.
You should probably have severity/domain get bound to variables outside of the unsafe block.
Ideally the entire macro body would be moved to a function, making the body just handle the format! stuff and calling the function. However, this means you can no longer do the testing trick where the test function itself returns what was printed.
Thanks for the really helpful review! All of these suggestions make sense- I wasn't sure about the best way to import dependencies for a macro in other modules.
It is interesting to hear your recommendation to use a function for the macro body. Would you recommend using the #[inline] tag for that function then?
I'll make these fixes over the weekend, thanks again for the help.
Would you recommend using the #[inline] tag for that function then?
yep.
To give an exact description of #[inline]:
For the same crate it acts as a small hint for inlining. This is also true for generics that get instantiated in a different crate.
For non-generic functions from a different crate, it is impossible for rustc to inline without this, because by the time you get to the current crate it's already been codegenned. #[inline] forces rustc to include extra metadata that lets the next crate re-codegen it if it wishes to. (It also contains the small hint for inlining)
#[inline(always)] is a strong hint, and will force inlining unless it's impossible (e.g. in the presence of a lot of recursion). #[inline(never)] does what it says on the tin.
In this case it's a macro and going to be used from a different crate, so we need#[inline] so that the other crate can inline it if the optimizer thinks it is a good idea. We don't need inline(always) -- the optimizer is smart.
See fixes from Manish's review on the same branch: https://oniongit.eu/network/tor/merge_requests/7
A couple thoughts:
Testing
I wasn't able to test the inline logging function, as I wasn't sure the best way how to. I think I could properly test this with a global variable that was only defined for unit tests (such as a Cursor, and "log" to it in the no-op implementation. However, I ran into issues due to the limitation of creating non-struct static types in stable Rust.
I received this error: destructors in statics are an unstable feature
Maybe we just need to wait for this feature to land in stable, but it would be nice to have the flexibility for functions that are inherently hard to test, without having to change the code itself.
Feature flags for cargo test
I am unable to run cargo test with a specific flag. For example, I wanted to be able to run cargo test --feature "testing", but this flag is not applied to other modules as well (i.e, running tests in protover will not result in tor_log being compiled with the "testing" flag). I'm able to run cargo test --all-features but being able to turn on features at a fine-grained level for tests would be nice.
Binding compile-time constants to log level macros
Nick- as for the declaration of log levels in src/common/torlog.h, if we can refactor these into compile-time constants (instead of macros) that would remove the need to declare _constant for each. Let me know what you think, I can help do this refactor.
For example, instead of binding _LD_NET to LD_NET, we could declare LD_NET as a constant directly.
However, I ran into issues due to the limitation of creating non-struct static types in stable Rust.
The lazy_static crate helps here, static mut is something you rarely use.
I don't think we should import this crate just for this test case- I did see that come up as an option though.
I am unable to run cargo test with a specific flag. For example, I wanted to be able to run cargo test --feature "testing",
You'll want to make the feature depend on features from other crates.
Thanks! I didn't see that that was a possibility, that is super helpful, I added this in the last commit in this branch. We can now run tests via cargo test --features "testing" where the code has a dependency to C with a no-op (such as in the protover module), and still unit test the Rust code. I'm pretty happy about that.
I don't think we should import this crate just for this test case- I did see that come up as an option though.
you can use a dev-dependency so it only gets used in testing
Alternatively, have a static FOO: *const YourType = 0 as *const _; and allocate it at the beginning of the test program. That's basically what lazy_static does, except lazily.
Alternatively, have a static FOO: *const YourType = 0 as *const _; and allocate it at the beginning of the test program. That's basically what lazy_static does, except lazily.
Good suggestion- I took a stab at implementing something similar to what lazy_static does (declare a static mutable pointer and allocate data on the heap during assignment). As this code is purely for testing, I don't have as many safety checks, but open to feedback.
See commit 4cdf0c8899796582b058d289bcd21b95dd1bd452.
It would also be great to hear recommendations (Nick might be the best person) on whether refactoring the C side should be in scope for this ticket. See this comment regarding log level macros https://trac.torproject.org/projects/tor/ticket/23881#comment:14
I added some comments to the merge request, but then apparently oniongit.eu decided to start giving me 500 errors.
Here are the other questionsissues that I didn't enter into oniongit -- I hope that it comes back up again soon, and we can find the other questions/issues too.
I'd recommend a rename on rust_welcome_string() to rust_log_welcome_string(): by convention, a function whose primary purpose is to do something should have a verb name.
Will the translate_* functions get inlined and compiled down to constants? Logging is pretty critical-path.
The C log functions exit fast, without formatting the message, if the log message isn't "interesting" because its severity is too low. I think that if possible the Rust log macro should do the same, and avoid formatting messages when the function log_message_is_interesting() returns false. That could be a separate commit, though.
Newbie questions from oniongit that I can remember:
Does CString::new() do a complete copy on the message?
What part of the Rust code removes the indentation on the log strings?
Change requests I can remember from oniongit:
The tor_log_msg!() macro documentation should document its support for format! syntax.
In C, all identifiers that start with an underscore are reserved: we can create constants like LOG_WARN_, but using _LOG_WARN is risky and we try not to do that.
[non sequitur] Wow! Today I learned, after like eight years of using Trac, from Manish, that you can do {{{#!rust and get syntax highlighting on Trac. I literally had no idea. Amazing.
I had a call with Chelsea last weekend, and we went over this patch. Chelsea has been super busy, so I offered to make whatever little tweaks are necessary to address the comments on this ticket.
Trac: Status: needs_revision to accepted Owner: chelseakomlo to isis
Okay, to answer the question of "Does CString::new perform a copy?"
Manish answered above, but I think there was a typo in the answer? (It seems to be missing the word "not" in the latter clause?) I had assumed it always copied, so I went and dug through the stdlib to figure it out.
[wow!!! check out that ↑↑↑↑↑↑ frickin syntax highlighting!!!!! zomg, my life is changed!!]
it takes a any T which has implemented a way (Into<T>) to transmute it into a Vec<u8> and calls Cstring::_new() on it. There existsimpl<'a> From<&'a str> for Vec<u8>:
which is doing some olympic level gymnastics where std::ptr::read is returning a Rawvec, then it mem::forget()s itself so that the deallocator never runs, and finally it Boxes the memory it forgot about by calling RawVec::into_box():
impl<T> RawVec<T, Heap> { /// Converts the entire buffer into `Box<[T]>`. /// /// While it is not *strictly* Undefined Behavior to call /// this procedure while some of the RawVec is uninitialized, /// it certainly makes it trivial to trigger it. /// /// Note that this will correctly reconstitute any `cap` changes /// that may have been performed. (see description of type for details) pub unsafe fn into_box(self) -> Box<[T]> { // NOTE: not calling `cap()` here, actually using the real `cap` field! let slice = slice::from_raw_parts_mut(self.ptr(), self.cap); let output: Box<[T]> = Box::from_raw(slice); mem::forget(self); output }}
which is doing even more olympic level gymnastics with slice::from_raw_parts, which is like the King of Capital-U Unsafety, because it just says "N bytes from this pointer forward are now a slice, with an lifetime that I decided by royal proclamation, and which is not necessarily the correct lifetime given that I was handed a raw pointer that I know nothing about". But it doesn't copy. Then it does a similar pirouette as the trick we saw before with std::ptr::read and then Boxing the RawVec, but even more complicated and dangerous, like if the last one was a pirouette, this is like a pirouette while juggling machetes. Box::from_raw is only really supposed to be called on pointers produced via Box::into_raw, but whatever, so many crazy things have already been done with these pointers to this memory. So Box::from_raw creates a Unique<T>, which just says, "I own this raw pointer to a T (*mut T), and I'm just going to pretend it is a T." and then calls Box::from_unique on the resulting Unique<T>:
which, as you can see, finishes its machete-juggling pirouette off with a triple axel flamethrower party by calling mem::transmute, re-interpretting the raw bits of the wrapped raw pointer as a Box. So much Unsafe! Seemingly no additional allocations. :)
pub fn to_vec<T>(s: &[T]) -> Vec<T> where T: Clone { let mut vector = Vec::with_capacity(s.len()); vector.extend_from_slice(s); vector}
If T is a String or Vec<u8>, it won't be copied, because the From<String> for Vec<u8> impl is basically a no-op move. But if it's a borrow (e.g. &str or &[u8] or something), it will be copied into a new allocation. It has to be -- you can't guarantee that borrow will live forever.
{{{#!rust
pub fn to_vec(s: &[T]) -> Vec
where T: Clone
{
let mut vector = Vec::with_capacity(s.len());
vector.extend_from_slice(s);
vector
}
}}}
If T is a String or Vec<u8>, it won't be copied, because the From<String> for Vec<u8> impl is basically a no-op move. But if it's a borrow (e.g. &str or &[u8] or something), it will be copied into a new allocation. It has to be -- you can't guarantee that borrow will live forever.
Aha, I missed that part, thanks! So it'll be copied even if the originally borrowed thing is &'static?
Aha, I missed that part, thanks! So it'll be copied even if the originally borrowed thing is &'static?
Currently, yes. In the future there may be an optimization that allows &'static strings to be treated as Strings, though that may not affect this because it gets converted to a Vec anyway.
Yesterday I tried to find a way to get the name of the current function we're being called from, so that we wouldn't have to manually type it out everytime we log something. It seems like there was a proposed RFC to make this info available, but it unfortunately stalled out. One of the major logger implementation in Rust, slog, also doesn't/can't log function names currently but instead stubs it out with function: "" and logs lines/columns/filenames instead. We could maybe do something like what this nameofcrate is doing, but then we'd have to find some way to call the macro outside of the function that is calling the logger somehow, and I still don't really see a way to make it automatic. Perhaps this could be something for the Rust wishlist?
There's a couple tiny fixes for typos and to address Nick's concerns in my bug23881branch (I also answered I think all of the review comments on oniongit). I reviewed Chelsea's code. If someone wants to review my fixups then we could maybe merge?
Trac: Sponsor: N/Ato SponsorM-can Status: accepted to needs_review
Yesterday I tried to find a way to get the name of the current function we're being called from, so that we wouldn't have to manually type it out everytime we log something. It seems like there was a proposed RFC to make this info available, but it unfortunately stalled out.
Hm. Yeah, if this isn't available, it isn't available. Let's put it on the wishlist (assuming we have one?)
When I try to build this, I get some wide-line warnings in torlog.h from your d560384fd948ad24625312ee07e04f0cb60ac5b5 .
Also, I think that d560384fd948ad24625312ee07e04f0cb60ac5b5 is breaking the compilation, since I see a bunch of these linker warnings:
src/common/libor-event.a(procmon.o):/home/nickm/src/tor/src/common/torlog.h:146: multiple definition of `LOG_WARN_'src/or/libtor.a(main.o):/home/nickm/src/tor/./src/common/torlog.h:146: first defined here
Maybe we should just remove d560384fd948ad24625312ee07e04f0cb60ac5b5 and merge the rest?
When I try to build this, I get some wide-line warnings in torlog.h from your d560384fd948ad24625312ee07e04f0cb60ac5b5 .
Also, I think that d560384fd948ad24625312ee07e04f0cb60ac5b5 is breaking the compilation, since I see a bunch of these linker warnings:
{{{
src/common/libor-event.a(procmon.o):/home/nickm/src/tor/src/common/torlog.h:146: multiple definition of `LOG_WARN_'
src/or/libtor.a(main.o):/home/nickm/src/tor/./src/common/torlog.h:146: first defined here
}}}
Maybe we should just remove d560384fd948ad24625312ee07e04f0cb60ac5b5 and merge the rest?
Sounds good to me. I've made bug23881_r1 which is exactly the same branch, just without d560384f.
Clang builds are failing on Travis because of -Wmissing-variable-definitions:
src/common/log.c:57:11: error: no previous extern declaration for non-static variable 'LOG_WARN_' [-Werror,-Wmissing-variable-declarations]const int LOG_WARN_ = LOG_WARN;
Trac: Resolution: implemented toN/A Status: closed to reopened