Currently, when using a torrc.d directory, for example:
%include /etc/torrc.d/
Every file in the directory will be treated and parsed as a valid Tor
configuration file. However, sometime, this may not be what users and
developers want.
For example, users may use /etc/torrc.d/50_user.torrc as the place to
put their own torrc configurations. But sometimes, when they use a
text editor to edit it, the text editor will leave a
/etc/torrc.d/50_user.torrc~ file which will also be treated as a valid
torrc file.
Another example that also happens very frequently is, when dpkg does
an update on /etc/torrc.d/30_distribution.torrc, users' previous
configuration can be saved as
/etc/torrc.d/30_distribution.torrc.dpkg-old which will also be parsed
by Tor.
In best case users will just be frustrated because Tor does not work
as expected and in worst case this could be dangerous. This could be a
severe problem especially because of the following reasons:
filename.torrc~ filename.torrc.dpkg-old has higher priority than
filename.torrc when Tor does the parsing.
In most cases, this will happen without being noticed by the normal
suer.
teor suggested on the tor-dev@:
To be more precise, most tools accept files ending in ".conf".
We might want tor to accept ".conf" for consistency.
I suggest we also accept files called "torrc", or ending in ".torrc".
This should probably also include files called literally ".torrc".
Trac: Description: Currently, when using a torrc.d directory, for example:
%include /etc/torrc.d/
Every file in the directory will be treated and parsed as a valid Tor
configuration file. However, sometime, this may not be what users and
developers want.
For example, users may use /etc/torrc.d/50_user.torrc as the place to
put their own torrc configurations. But sometimes, when they use a
text editor to edit it, the text editor will leave a
/etc/torrc.d/50_user.torrc~ file which will also be treated as a valid
torrc file.
Another example that also happens very frequently is, when dpkg does
an update on /etc/torrc.d/30_distribution.torrc, users' previous
configuration can be saved as
/etc/torrc.d/30_distribution.torrc.dpkg-old which will also be parsed
by Tor.
In best case users will just be frustrated because Tor does not work
as expected and in worst case this could be dangerous. This could be a
severe problem especially because of the following reasons:
filename.torrc~ filename.torrc.dpkg-old has higher priority than
filename.torrc when Tor does the parsing.
In most cases, this will happen without being noticed by the normal
suer.
Therefore, we should let Tor parse only the files whose names
end with .torrc which "is standard behaviour among many tools".
to
Currently, when using a torrc.d directory, for example:
%include /etc/torrc.d/
Every file in the directory will be treated and parsed as a valid Tor
configuration file. However, sometime, this may not be what users and
developers want.
For example, users may use /etc/torrc.d/50_user.torrc as the place to
put their own torrc configurations. But sometimes, when they use a
text editor to edit it, the text editor will leave a
/etc/torrc.d/50_user.torrc~ file which will also be treated as a valid
torrc file.
Another example that also happens very frequently is, when dpkg does
an update on /etc/torrc.d/30_distribution.torrc, users' previous
configuration can be saved as
/etc/torrc.d/30_distribution.torrc.dpkg-old which will also be parsed
by Tor.
In best case users will just be frustrated because Tor does not work
as expected and in worst case this could be dangerous. This could be a
severe problem especially because of the following reasons:
filename.torrc~ filename.torrc.dpkg-old has higher priority than
filename.torrc when Tor does the parsing.
In most cases, this will happen without being noticed by the normal
suer.
teor suggested on the tor-dev@:
To be more precise, most tools accept files ending in ".conf".
We might want tor to accept ".conf" for consistency.
I suggest we also accept files called "torrc", or ending in ".torrc".
This should probably also include files called literally ".torrc".
This is a good idea, and I wish we had done it in the first version of the %include feature. One thing I'm concerned about: How do we handle the migration here? Anybody who currently has a working setup will have it fail if we start requiring a suffix that they didn't know to provide, and I really don't like breaking compatibility.
I have been thinking about this for a while. I still find supporting a very flexible extension name may cause incompatibility between different distros and confusion to users and developers.
How do you like the idea to add another flag named something like StrictParsing which will let Tor only parse .config files when it is set to 1?
No, I really don't think so. This is much closer to a new feature than it is to a bugfix. Moreover, it promises to involve stuff (pattern matching, file systems, string handling) that have a way of getting unexpected bugs the first time around.
No, I really don't think so. This is much closer to a new feature than it is to a bugfix. Moreover, it promises to involve stuff (pattern matching, file systems, string handling) that have a way of getting unexpected bugs the first time around.
Thank you so much for your explanation, nickm! Your immediate reply is really helpful for the downstream to make decisions on how to deal with it.
I agree that supporting wildcards in %includes would be a nice feature. I remember looking for code for processing wildcards in files in tor and not finding any so this might take some effort to develop.
If someone wants to take on this, the relevant changes are on ./src/or/config.c
In this file, there is a method config_process_include which is called for every %include found and receives the string after %include as first argument.
This method then calls config_get_file_list which is responsible for returning a list of files to be processed for that %include line. Note that config_get_file_list does not care about recursive includes, it just returns a list of files in a folder if the included path was a folder or a list with a single file if it was a file.
So, the only existing method that needs changes to support this is config_get_file_list.
3ca5f8173298a4fa3bda6c5c3c3da528fc7cf85a: Looks fine to me.
edd70916b66fca41fc8aab10893ba731c1b75840:
Would it make sense to ensure that the tor_glob() function works the same way on both Windows and POSIX when glob'ing on directory and file? A part of the glob test case is split between Windows and POSIX there.
5aec94bc3bb6623bd8f62aedeca8aaf699a0e8b6:
In config_get_file_list(): Why must the matches be sorted using smartlist_sort_strings()? Shouldn't that happen in the glob function then?
I think config_get_glob_opened_files() might benefit from having some of this logic split into some more functions.
15de04088b40efc85dc32f26fe7ec4a1abebb5d7: Looks fine. Good with some documentation.
Generally it looks good. It is very nice with a lot of tests for this code. Only concern for me is that there is some fairly big functions in this that I think could leverage from being split into smaller functions where possible.
I was finally able to find time to get back to this.
In the meantime, there was a big refactoring of the tor code base so I've created a new branch based on the code after the refactoring.
This means that the commit hashes on previous comments are now useless but I've added the equivalents on the new branch to the comments.
For my first 4 commits the code is mostly the same as before just in different files. The only change I recall is removing PATH_MAX from tor_glob and config tests due to #26873 (moved). There are also new commits that either address previous mentioned issues or other issues I've found while testing.
Would it make sense to ensure that the tor_glob() function works the same way on both Windows and POSIX when glob'ing on directory and file? A part of the glob test case is split between Windows and POSIX there.
This has been improved: now wildcards are supported on all path components on Windows (commit 8bf917c4d8369bde9be53a6746c532d9932b05f3) and the behavior of the path separator at the end of the path is the same as on POSIX (commit ba8f29f24a353892a1e9dfab397ce543a6d192d3).
In config_get_file_list(): Why must the matches be sorted using smartlist_sort_strings()? Shouldn't that happen in the glob function then?
I followed the approach of the existing tor_listdir function which also leaves sorting as a responsibility of the caller.
I do not know if there was a reason for this, but I followed it for consistency.
2. I think `config_get_glob_opened_files()` might benefit from having some of this logic split into some more functions.
This has been addressed in commit ba8f29f24a353892a1e9dfab397ce543a6d192d3: as the Windows version of tor_gob ended up sharing much of the code with get_glob_opened_files (due to both needing to expand path components with globs manually), all the common code has been split into smaller functions.
Also, I noticed that #27186 (moved) is addressed by my first commit (61ee3726caad0aa8399ad07af1fe6eb6aa6aecb2).
In the future, please submit patches as pull requests via Github's interface by adding a PR to the torproject/tor repository. This allows us to see if your code passes all test via Travis.
Once the errors have been fixed, please open a new PR and I will close the one I opened to do a Travis test run.
Also, I've rebased the branch to the current master and reverted commit ba3c785092fce23ce9ec52c89ed69b2c0f0f92f6 which fixed #27186 (moved) because I've already implemented the same feature on my branch with additional information: my version shows both the pattern on %include and the files that will be included when expanding it.
I've fixed the mentioned issues. Additionally, I've rebased to the current master and added support for bool type on the checkSpace.pl script because it started complaining when I had functions returning bool.
Looks good. Only a small correction needed. Please change the changes file's tag (between ( and )) to be 'configuration'. Once that is done I think we'll move this to merge_ready and then Nick can have a look at it too before it goes in.
Not only that tor_glob() function is kind of insane but also it seems most of it was copied from tor_listdir. I would really like us to avoid code duplication from that madness.
That function should be simplified to have a tor_listdir() per platform instead of a mayhem of #ifdef. And then extract the common part that we can use within tor_glob() in smaller functions that lib/path/ can all use.
Sorry to put that back in needs_revision but we just can't pile up technical debt like that.
I think it might be easier to review this pull request code by looking at the diff for the whole pull request (the files changed tab in github). I've avoided squashing the first 4 commits because they were already reviewed in https://trac.torproject.org/projects/tor/ticket/25140#comment:19 but I now wish I hadn't as the pull request would be easier to read if they had been squashed.
I've just been over this again, and I think it's starting to look good. I'd like to get one more pair of eyes on this because it's a slightly big change.
It seems like the documentation and tests are incomplete.
I'd like to see documentation for:
escaping wildcards (for non-Windows)
globs are only allowed at the end of paths (for Windows)
I'd like to see more tests for these cases, and for the dot-file case, where the documentation does not match the code.
I don't know what will happen when this code interacts with the sandbox.
I also didn't review memory management in detail.
Please make sure it has been tested with "Sandbox 1" and ASan. It might be useful to have an integration test script in src/test that launches tor with some example configs. The existing sh and py files in src/test are good examples to follow.
I've written a few commits to address your concerns, check the pull request comments for details.
I've tested with Sandbox enabled and it should work. The only restriction is that reloading the tor config after adding new files or folders to any of the folders where tor is looking for config files will not work. This restriction has been added to the documentation.
I also used valgrind during development and testing. Hopefully that has caught any memory issues.
It might be useful to have an integration test script in src/test that launches tor with some example configs. The existing sh and py files in src/test are good examples to follow.
Did you write an integration test script?
These kinds of scripts are useful for testing the external config interface of Tor, to make sure we haven't broken it. (Unit tests only test so much.)
It tests using wildcards with %include and the controls commands to reload torrc, SAVECONF and getinfo config-can-saveconf.
I also wanted to use the same test to test reloading the configuration file with the seccomp sanbox enabled but I discovered that doing so crashes tor on Travis CI. The problem is unrelated to this code as I've tested and it happens when %including a folder, which uses the tor_listdir function that was refactored but functionally unchanged (it will use the same syscalls and the same arguments as before). This issue is likely related to #27315 (moved).
I will look into this problem further but it is going to be difficult to solve as I am still not able to reproduce the issue on my system.
I was finally able to run Travis CI on my system and reproduce the problem.
The crash when reloading the configuration file with seccomp sandbox enabled was actually a bug that only happens with libc < 2.26. libc versions before 2.26 did not use openat to open files but used it on readdir. The previous seccomp rules for %included files and directories allowed them to be opened, which added either the rule for openat or for open, depending on your libc version. Because the initial %include sandbox rules were only tested on libc >= 2.26, where everything used openat, this was fine. When testing on libc < 2.26, only open (not openat) was allowed for the %included files and directories. Because readdir, which will be called for any %included directories, used openat, it caused tor to crash.
The fix was always adding the openat rule for %included directories regardless of the libc version in use.
Additionally, another problem was found and fixed: when tor is compiled with NSS, it will call getpeername and socket with some arguments that were not allowed by the sandbox rules. This caused tor to crash on startup. I've added the required rules and tor no longer crashes.
Also, I have mixed feelings on adding an automatic test for the seccomp sandbox. While it is great that this functionally can now be tested automatically (and it caught two bugs already), the seccomp sandbox is, by its nature, very fragile and this test will easily break when people start running on systems with different configurations. What do you think? Should this test configuration enable the seccomp sandbox?
My apologies for the delay with this one. I still have no idea why Trac ate my comment after our meeting. The comment I wrote back then is the following according to my notes:
We are concerned right now about the complexity of these patches, particularly around the glob() implementation, but also the size of the changes to the conffile.c file.
Nick and I discussed the patches and came up with the following rather radical suggestions to make the code simpler:
Switch to use fnmatch() on the platforms where that function is available.
This would allow you to implement more or less the same functionality in a simpler way using fnmatch() and listdir() while simplifying the code rather a lot and reducing the overall size of the patches.
For platforms that DOES NOT support fnmatch(), we bundle an implementation from another project (with a similar license as Tor) in the `src/ext/'. This could for example be this one from Git (which they took from the OpenBSD project): https://github.com/libgit2/libgit2/blob/master/src/fnmatch.c
You can see how we do this with other functions by looking at the other code imports found in the src/ext directory tree.
This would make the code more edible for us and overall much less complex than before. fnmatch() and listdir() would not support the same amount of features as a real glob() implementation would, but do we really need those features?