See bug22752_mitigation_031 for a mitigation and diagnostic branch. It doesn't fix this, but it makes it nonfatal and tries to get more useful info if it happens again.
Unable to unlink ".\\Data\\Tor\\LocalState\\diff-cache/1001" while removing file: Permission denied
tor_bug_occurred_(): Bug: consdiffmgr.c:1289: store_multiple: Non-fatal assertion !(ent == NULL) failed. (on Tor 0.3.1.4-alpha fab91a290ded3e74)
Bug: Non-fatal assertion !(ent == NULL) failed in store_multiple at consdiffmgr.c:1289. (Stack trace not available) (on Tor 0.3.1.4-alpha fab91a290ded3e74)
tor_bug_occurred_(): Bug: consdiffmgr.c:328: cdm_diff_ht_purge: Non-fatal assertion !((*diff)->entry == NULL) failed. (on Tor 0.3.1.4-alpha fab91a290ded3e74)
Bug: Non-fatal assertion !((*diff)->entry == NULL) failed in cdm_diff_ht_purge at consdiffmgr.c:328. (Stack trace not available) (on Tor 0.3.1.4-alpha fab91a290ded3e74)
eventdns: All nameservers have failed
This bug can be reproduced with two virtual machines:
Windows 7, Tor 0.3.1.5-alpha relay.
Ubuntu 16.04, chutney "basic" network.
It takes about 10 minutes until the cache fills up with 256 entries and assertions start appearing in the log file.
But there is one problem with this test:
After about a minute, the Windows 7-based relay starts to use 100% of the CPU.
(It looks like this load is caused by a large number of tor_cond_wait() calls.)
So it needs to be limited to a single core, or your system will lag.
I think this bug might be caused by the fact that (I think!) on Windows, you can't unlink a file that's in use. But our code tries to unlink these files while they are still mapped.
I think this bug might be caused by the fact that (I think!) on Windows, you can't unlink a file that's in use.
Yes, adding a consensus_cache_entry_unmap call hides the "unlink" warnings:
unmap_hack.patch
But it adds other ones, of course.
But our code tries to unlink these files while they are still mapped.
Using a deleted file seems strange to me.
It's pretty normal on Unix-derived systems. Files are reference-counted, and not actually deleted until nothing else is using them. The "unlink()" system call doesn't actually delete a file -- it just removes a name from the filesystem. Only when all links and references are gone is the actual data deleted. That's why it's called "unlink()" instead of "delete()".
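To make the Unix behaviour described above concrete, here is a minimal sketch (POSIX only, not Tor code). The function name unlink_while_mapped_demo and the /tmp path are assumptions made up for illustration. It maps a temporary file, unlinks its name, and then checks that the name is gone while the mapped data is still readable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical demo, not part of Tor. Returns 0 if the mapped data stayed
 * readable after the file name was unlinked, -1 on any failure. */
int unlink_while_mapped_demo(void)
{
  char path[] = "/tmp/unlink_demo_XXXXXX"; /* assumes /tmp is writable */
  const char msg[] = "still here";
  struct stat st;
  char *map;
  int result = -1;

  int fd = mkstemp(path);
  if (fd < 0)
    return -1;
  if (write(fd, msg, sizeof(msg)) != (ssize_t)sizeof(msg)) {
    close(fd);
    unlink(path);
    return -1;
  }

  map = mmap(NULL, sizeof(msg), PROT_READ, MAP_SHARED, fd, 0);
  close(fd); /* the mapping holds its own reference to the file */
  if (map == MAP_FAILED) {
    unlink(path);
    return -1;
  }

  /* Remove the name. The data must survive because the map still uses it. */
  if (unlink(path) == 0 &&
      stat(path, &st) == -1 && errno == ENOENT && /* the name is gone */
      memcmp(map, msg, sizeof(msg)) == 0)         /* the contents are not */
    result = 0;

  /* Dropping the last reference is what lets the kernel free the data. */
  munmap(map, sizeof(msg));
  return result;
}
```

Calling unlink_while_mapped_demo() on a Unix-like system should return 0. The log at the top of this ticket suggests that the equivalent unlink on Windows fails with "Permission denied" while the file is still mapped.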
Concerning architecture:
Does Tor unlink the file just to free the name, or is its goal to free disk space?
What is the reason for having exactly 256 names, but gigabytes of used space?
Sadly, I don't completely understand what is happening here.
File names are scarce under the Linux seccomp2 sandbox code, where they need to be preallocated and reserved. Attempts to limit disk usage here are over the long term -- everything that's unlinked but kept on disk should be removed as soon as the mmap is closed, which should happen as soon as it's done spooling.
One option here is to just use a different strategy under Windows, since we don't need the same filename limits. Another option is to kill off the maps when deleting on Windows, since we can't keep them around.
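A rough sketch of the second option, unmapping before unlinking, might look like the following. Everything here is hypothetical: demo_entry and its helper functions are stand-ins invented for illustration, not Tor's real cache structures, and the code uses POSIX mmap/munmap so that it runs on Unix. On Windows the same ordering would have to be expressed with the platform's own unmapping call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-in for a cache entry; not Tor's real structure. */
typedef struct demo_entry {
  char path[64];
  void *map;  /* NULL when the file is not mapped */
  size_t len; /* length of the mapping, 0 when unmapped */
} demo_entry;

/* Create a small temporary file and map it. Returns 0 on success. */
int demo_entry_create(demo_entry *ent)
{
  static const char data[] = "cached object";
  void *map;
  int fd;

  strcpy(ent->path, "/tmp/cache_demo_XXXXXX"); /* assumes /tmp is writable */
  ent->map = NULL;
  ent->len = 0;

  fd = mkstemp(ent->path);
  if (fd < 0)
    return -1;
  if (write(fd, data, sizeof(data)) != (ssize_t)sizeof(data)) {
    close(fd);
    unlink(ent->path);
    return -1;
  }
  map = mmap(NULL, sizeof(data), PROT_READ, MAP_SHARED, fd, 0);
  close(fd);
  if (map == MAP_FAILED) {
    unlink(ent->path);
    return -1;
  }
  ent->map = map;
  ent->len = sizeof(data);
  return 0;
}

/* Drop the mapping, if there is one. */
void demo_entry_unmap(demo_entry *ent)
{
  if (ent->map) {
    munmap(ent->map, ent->len);
    ent->map = NULL;
    ent->len = 0;
  }
}

/* Remove the entry from disk: unmap first, then unlink. The order makes no
 * difference on Unix, but it is the order that a platform refusing to unlink
 * mapped files would need. Returns the result of unlink(). */
int demo_entry_remove(demo_entry *ent)
{
  demo_entry_unmap(ent);
  return unlink(ent->path);
}
```

The cost of this approach is that anything still reading from the map loses it at deletion time, so callers would need to check for a NULL map or remap the file. That may be related to the extra warnings seen with unmap_hack.patch.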