Opened 10 months ago

Last modified 2 days ago

#21366 reopened enhancement

Support whitespace in search term (as does Onionoo)

Reported by: cypherpunks
Owned by: karsten
Priority: Medium
Milestone: Onionoo-1.7.0
Component: Metrics/Onionoo
Version:
Severity: Normal
Keywords: metrics-2018
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Change History (33)

comment:1 in reply to:  description Changed 10 months ago by cypherpunks

Replying to cypherpunks:

does not work:
https://atlas.torproject.org/#search/contact:Neel%20Chauhan

Atlas uses https://onionoo.torproject.org/summary?search=contact:Neel%20Chauhan with that query which does result in two empty arrays.

comment:2 Changed 10 months ago by cypherpunks

So this is an onionoo bug?

Should semantics for
https://onionoo.torproject.org/summary?search=contact:Neel%20Chauhan
and
https://onionoo.torproject.org/summary?contact=Neel%20Chauhan
match? (result in the same search result)

comment:3 Changed 10 months ago by cypherpunks

Component: changed from Metrics/Atlas to Metrics/Onionoo
Owner: changed from irl to metrics-team

comment:4 Changed 10 months ago by karsten

Well, I wouldn't say that this is an Onionoo bug. Onionoo interprets spaces in the search parameter as separators between search terms, and search terms can optionally be prefixed with a qualifier.

In the first example above, search=contact:Neel%20Chauhan is interpreted as "contains Neel in the contact line and contains Chauhan in the usual fields we search for".

You could approximate the second example by using search=contact:Neel%20contact:Chauhan, but that will also return all relays that have those two strings somewhere in the contact line, rather than just Neel Chauhan.

What we could do is support quotes in qualified search terms to include spaces. That would be search=contact:"Neel%20Chauhan". But how intuitive is that? And would it produce new problems that I don't think of right now? Or are there any alternatives that are more intuitive? Hmmmmm.
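
The interpretation described above can be sketched in a few lines of Python. This is a hypothetical illustration, not Onionoo's actual code: the function name and the set of known qualifiers are assumptions made for the example.

```python
# Hypothetical sketch; NOT Onionoo's actual implementation.
# The qualifier set below is an assumption made for illustration.
KNOWN_QUALIFIERS = {"contact", "country", "as"}


def split_search_terms(search):
    """Split a decoded search value into (qualifier, value) pairs.

    Terms are separated by whitespace; a term counts as qualified only
    if the text before its first colon is a known qualifier.
    """
    terms = []
    for term in search.split():
        key, sep, value = term.partition(":")
        if sep and key in KNOWN_QUALIFIERS:
            terms.append((key, value))
        else:
            terms.append((None, term))  # unqualified: searched in the usual fields
    return terms


print(split_search_terms("contact:Neel Chauhan"))
# [('contact', 'Neel'), (None, 'Chauhan')]
```

Under these assumptions, the space ends the contact term, so Chauhan becomes an unqualified term of its own.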

comment:5 Changed 10 months ago by cypherpunks

Component: changed from Metrics/Onionoo to Metrics/Atlas
Owner: changed from metrics-team to irl

So then the easy fix is to tell atlas to use

https://onionoo.torproject.org/summary?contact=...
instead of
https://onionoo.torproject.org/summary?search=contact:...

when the search term starts with "contact:"?

comment:6 Changed 10 months ago by karsten

Well, no. What if a user wants to search by contact and by another search part, say, nickname? Or by contact and another qualified search term like AS number? We cannot merge everything following the first qualifier into a single qualified search term. And we can also not define the "contact:" qualifier to come last, because what if we add another qualified search term in the future that permits spaces? No, this won't work.

comment:7 Changed 10 months ago by cypherpunks

Oh this is bad. So atlas will never support whitespace?

What if we introduce a special new atlas-level qualifier with which the user says "I want to search for contact, yes contact only"?

Let's say "contactonly:foo bar" and that gets mapped to
https://onionoo.torproject.org/summary?contact=...

What do you think?
(btw: thanks for the quick feedback so far!)

comment:8 Changed 10 months ago by karsten

Well, the idea of using (double) quotes for qualified search terms containing spaces would work. I'm just not sure how intuitive that would be. And somebody would have to build it. :)

Another option would be that Atlas provides an extended search of some kind where it has inputs for other parameters than Onionoo's search parameter. For example, there could be a contact input field, and Atlas would simply pass anything in that field to Onionoo's contact parameter, including contained spaces. That would still leave qualified search term as shortcut for pro users with almost the same functionality (except for this case and maybe a few others). But the more advanced users would go to that extended search and see what they can search for, rather than having to remember what qualified search terms exist. (Somebody would have to build this as well.)

comment:9 in reply to:  8 Changed 10 months ago by cypherpunks

Replying to karsten:

Well, the idea of using (double) quotes for qualified search terms containing spaces would work. I'm just not sure how intuitive that would be. And somebody would have to build it. :)

I assume this would require an onionoo change (I guess this is less likely to be implemented).

Another option would be that Atlas provides an extended search of some kind where it has inputs for other parameters than Onionoo's search parameter. For example, there could be a contact input field, and Atlas would simply pass anything in that field to Onionoo's contact parameter

ok, so the option with the least amount of effort would be the contactonly:...
option to tell atlas to ask
https://onionoo.torproject.org/summary?contact=...
instead of
https://onionoo.torproject.org/summary?search=contact:...

since it does not require any atlas UI changes.

I have no opinion on how this is actually done as long as the use case is possible.

Either way the implementer decides - as usual, I hope there will be one.

It is somehow sad that you can use atlas to make a powerful search over many fields but you cannot make a simple search using a single field if your search term contains whitespace.

comment:10 Changed 10 months ago by karsten

The contactonly: suggestion is a hack. We shouldn't go that route.

The stop-gap solution would be that you prefix each contact part with contact:, as in: https://atlas.torproject.org/#search/contact:Neel%20contact:Chauhan.

One possible real solution would be to extend Onionoo to accept quoted strings. Needs more discussion and somebody to write it.

comment:11 in reply to:  10 Changed 10 months ago by cypherpunks

Great to see that others (teor) also have a use case here.

Replying to karsten:

The contactonly: suggestion is a hack. We shouldn't go that route.

I agree that would be a hack, but a hack is still better than no solution at all - for me.
(I'm not in favor of that hack if there is someone implementing a proper solution.)

If anyone implements something proper, let me add something that I didn't mention until now because it reduces the likelihood of any solution at all.

I'd actually like to search for perfect matches only.

Search term:
"Neel Chauhan"

should not match on

Chauhan Neel
or
Neel Chauhan 123

comment:12 Changed 10 months ago by teor

In #21373, I had no idea that atlas had qualifiers, or I had forgotten.

So my use case would be resolved by making "contact" part of "the usual fields we search for".
Or I can work around it by searching once for the name I remember, then again using "contact:".

(I really don't care much about names with spaces, or getting the query exactly right: I am quite capable of refining my search using an appropriately unique string. Is there a specific use case that requires a quoted string? Perhaps a programmatic search by contact in an application?)

comment:13 in reply to:  12 Changed 10 months ago by cypherpunks

Replying to teor:

Is there a specific use case that requires a quoted string? Perhaps a programmatic search by contact in an application?

The use case is: creating atlas URLs that use contact as identifier,
to list all relays with a given contact (described in #21368).

I would use that functionality to make the contact column in this table a URL:
https://raw.githubusercontent.com/ornetstats/stats/master/o/potentially_dangerous_relaygroups.txt

comment:14 Changed 7 months ago by nusenu

https://lists.torproject.org/pipermail/metrics-team/2017-April/000323.html:

(trying to find a stop-gap solution for
https://trac.torproject.org/projects/tor/ticket/21366)

from onionoo.tpo:

search

Return only (1) relays with the parameter value matching (part of a)
nickname, (possibly $-prefixed) [...]
If multiple search terms are given, separated by spaces, the
intersection of all relays and bridges matching all search terms
will be returned.

Karsten wrote
(https://trac.torproject.org/projects/tor/ticket/21366#comment:4)

You could approximate the second example by using
search=contact:Neel%20contact:Chauhan, but that will also return all
relays that have those two strings somewhere in the contact line,
rather than just Neel Chauhan.

So I would assume that
https://atlas.torproject.org/#search/contact:Neel%20contact:Chauhan
(backend:
https://onionoo.torproject.org/summary?search=contact:Neel%20contact:Chauhan)

should only return relays where the contact field contains "Neel"
and "Chauhan" but it also returns relays that have only "Neel" (and
no "Chauhan"), so I would deduce that search terms are OR connected.

example contact result:
"<neel AT rdkr DOT uk> 0xBBC1514B34CFB0F10231280F2FC36F0EF7887127"

If search terms are OR connected (not the "intersection")
then I would simply list all the fingerprints to get a list of all
relevant relays, but that does not work either (no results)
example (2 fingerprints separated by a single space):
https://atlas.torproject.org/#search/D5B8C38539C509380767D4DE20DE84CF84EE8299%201602E42D1DE3C7B3EF042F357F906DE55FA6C7C6
Also tried:
"lookup:D5B8C38539C509380767D4DE20DE84CF84EE8299%20lookup:1602E42D1DE3C7B3EF042F357F906DE55FA6C7C6"

In this search, the search terms are AND connected:

https://onionoo.torproject.org/summary?search=contact:Neel%20D46175487C3

So I'm not sure

  • if the current behavior works as documented and intended
  • How to get a stop-gap solution without false-positives/false-negative search results

Is this a bug or am I misunderstanding something?
(or does the AND/OR mode depend on whether search qualifiers are used?)


https://lists.torproject.org/pipermail/metrics-team/2017-April/000324.html:
Hi nusenu,

(trying to find a stop-gap solution for
https://trac.torproject.org/projects/tor/ticket/21366)

We should probably discuss this on the ticket, not here. Quick response
below though.

from onionoo.tpo:

search

Return only (1) relays with the parameter value matching (part of a)
nickname, (possibly $-prefixed) [...]
If multiple search terms are given, separated by spaces, the
intersection of all relays and bridges matching all search terms
will be returned.

Karsten wrote
(https://trac.torproject.org/projects/tor/ticket/21366#comment:4)

You could approximate the second example by using
search=contact:Neel%20contact:Chauhan, but that will also return all
relays that have those two strings somewhere in the contact line,
rather than just Neel Chauhan.

So I would assume that
https://atlas.torproject.org/#search/contact:Neel%20contact:Chauhan
(backend:
https://onionoo.torproject.org/summary?search=contact:Neel%20contact:Chauhan)

should only return relays where the contact field contains "Neel"
and "Chauhan" but it also returns relays that have only "Neel" (and
no "Chauhan"), so I would deduce that search terms are OR connected.

example contact result:
"<neel AT rdkr DOT uk> 0xBBC1514B34CFB0F10231280F2FC36F0EF7887127"

Hmm, looks like my suggestion was misleading. The two qualified search
terms are not OR-connected, but the second search term is simply
discarded. Try swapping the two and see how that changes the result.

The spec says: "If the same parameter is specified more than once, only
the first parameter value is considered."

Now, the search term is only given once, but the qualified search terms
are treated as if the user passed values to keys matching search
qualifiers. And the second contact parameter is simply dropped.

This is expected behavior that we might be able to document better.

All the best,
Karsten
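
The "only the first value is considered" behavior described in this reply can be sketched as follows. This is a hypothetical Python illustration under the assumption that `contact` is the only qualifier of interest; it is not taken from the Onionoo code base.

```python
def first_value_wins(search, qualifiers=("contact",)):
    """Keep only the first value seen for each qualifier.

    Hypothetical sketch of the behavior described above, not Onionoo code.
    Returns (qualified, unqualified): a dict and a list.
    """
    qualified = {}
    unqualified = []
    for term in search.split():
        key, sep, value = term.partition(":")
        if sep and key in qualifiers:
            qualified.setdefault(key, value)  # later duplicates are dropped
        else:
            unqualified.append(term)
    return qualified, unqualified


print(first_value_wins("contact:Neel contact:Chauhan"))
# ({'contact': 'Neel'}, [])
print(first_value_wins("contact:Chauhan contact:Neel"))
# ({'contact': 'Chauhan'}, [])
```

Swapping the two terms changes which value survives, which matches the suggestion to swap them and compare the results.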

comment:15 Changed 7 months ago by nusenu

Since the stop-gap solution does not work, is there any other way how I can build a single atlas URL to find a list of relays with a given contact string (I also know their fingerprint)?

comment:16 in reply to:  15 Changed 7 months ago by cypherpunks

Type: changed from defect to enhancement

I've created #22063 so the documentation task in comment:14 isn't forgotten (although it becomes obsolete by #22064).

Replying to nusenu:

Since the stop-gap solution does not work, is there any other way how I can build a single atlas URL to find a list of relays with a given contact string (I also know their fingerprint)?

FWICT there doesn't seem to be a way to do this currently. The search parameter intersects the results from the given search terms and the qualified search terms do not permit multiple search terms separated by spaces.

I like the suggestion in comment:8 about adding an advanced search to Atlas. It would better communicate the parameters that Onionoo (and therefore Atlas) supports. I also opened #22064 to deprecate the qualified search terms in the search parameter because I believe it weakly duplicates existing functionality.

comment:17 Changed 2 months ago by karsten

Summary: changed from "support whitespace in search term (as does onionoo)" to "Support whitespace in search term (as does Onionoo)"

Capitalize summary.

comment:18 Changed 8 weeks ago by karsten

Keywords: metrics-2018 added

comment:19 Changed 8 weeks ago by karsten

Owner: changed from irl to metrics-team
Status: changed from new to assigned

comment:20 Changed 5 weeks ago by karsten

Owner: changed from metrics-team to karsten
Status: changed from assigned to accepted

So, I just rediscovered this ticket after seeing the link from #23829. Let's try harder to resolve it, as it might help us with that other ticket.

I'm summarizing and commenting on all possible solutions to this issue in order of appearance:

  1. Allow parameters and qualified search terms to be specified more than once. As a result, search=contact:Neel%20contact:Chauhan would become a valid search for all contacts containing both name parts. This is not exactly what the ticket reporter needed to satisfy their use case. But I could see similar use cases where this would be useful. Let's move this to a new ticket if there's general agreement on whether this would be useful at all.
  2. Support quotes in qualified search terms. This was my original suggestion, and even though I raised some mild concerns about this approach being less intuitive than it could be, I think it's the best option we have. After all, we'd be adding this option for pro users who want to perform an exact match on a contact line or partial contact line that just happens to contain spaces. I wrote a surprisingly short patch for this in my task-21366 branch that still needs review. We might document this change by saying: "Qualified search terms have the form 'key:value' or 'key:"value"' (both without surrounding single quotes) with "key" being one of the parameters [...]". I'm inclined to say that this requires just a minor protocol version bump, because we're currently returning a 400 status code for this newly introduced syntax. Let me know what you think!
  3. Add an Atlas-level qualifier to search only in the contact line. I'm not a fan of this approach, because it seems too limiting. We cannot combine the search with any other parameter. And we'd have this same discussion again when adding an as_name parameter (#23713). And wouldn't it be bad to introduce another Atlas-level qualifier to search only in the AS name which couldn't be combined with other search terms like the country, whereas AS number and country could still be combined? All in all, I'd say let's drop this idea.
  4. Add an extended search form to Atlas with inputs for other parameters than Onionoo's search parameter. This is not my call. Personally, I'd rather go in the other direction and make the search parameter as powerful as the other parameters. But I can see value in providing a complete overview of possible parameters in Atlas. Maybe that's just a documentation issue. But if it helps to really provide such an extended search form to Atlas, I won't object.
  5. Extend Onionoo's search parameter to also look at contact line parts when processing unqualified search terms. That would mean not only looking at nicknames, fingerprints, IP addresses, etc., but also at contact line parts. I'm opposed to that idea. The effect would be that a simple search for a relay nickname part would suddenly produce lots of seemingly unrelated hits, because those relays have the nickname part in their contact line. Let's better not touch the Onionoo basics and drop this idea.

So, my suggestion is to create a new ticket for 1, get 2 reviewed and merged, and possibly create another ticket for 4. Thoughts?

comment:21 in reply to:  20 ; Changed 5 weeks ago by nusenu

Thank you Karsten for your comprehensive summary!

  1. Allow parameters and qualified search terms to be specified more than once. [...] Let's move this to a new ticket

I created #23913 now.

  2. [...] We might document this change by saying: "Qualified search terms have the form 'key:value' or 'key:"value"' (both without surrounding single quotes) with "key" being one of the parameters [...]". I'm inclined to say that this requires just a minor protocol version bump, because we're currently returning a 400 status code for this newly introduced syntax. Let me know what you think!

I agree that this does not need a major version bump - because it is unlikely to break existing use-cases. How should users escape the quote sign? (\")

  4. Add an extended search form to Atlas with inputs for other parameters than Onionoo's search parameter.

I made a ticket for that not all too long ago: #23782

I agree that this is a documentation issue but the search form would make the power of atlas/onionoo more accessible, visible and usable to a broader audience.

  5. Extend Onionoo's search parameter to also look at contact line parts when processing unqualified search terms. That would mean not only looking at nicknames, fingerprints, IP addresses, etc., but also at contact line parts. I'm opposed to that idea.

I'm opposing as well.

So, my suggestion is to create a new ticket for 1, get 2 reviewed and merged, and possibly create another ticket for 4. Thoughts?

Looking forward to that new feature! :)

comment:22 Changed 5 weeks ago by nusenu

Component: changed from Metrics/Atlas to Metrics/Onionoo

comment:23 in reply to:  21 Changed 5 weeks ago by karsten

Replying to nusenu:

Thank you Karsten for your comprehensive summary!

  1. Allow parameters and qualified search terms to be specified more than once. [...] Let's move this to a new ticket

I created #23913 now.

Thanks!

  2. [...] We might document this change by saying: "Qualified search terms have the form 'key:value' or 'key:"value"' (both without surrounding single quotes) with "key" being one of the parameters [...]". I'm inclined to say that this requires just a minor protocol version bump, because we're currently returning a 400 status code for this newly introduced syntax. Let me know what you think!

I agree that this does not need a major version bump - because it is unlikely to break existing use-cases. How should users escape the quote sign? (\")

Ah, good question about escaping. Yes, I think \" would be good as escaped quote sign. The current branch does not support that yet, but that shouldn't be too hard to implement. (Famous last words?...)

  4. Add an extended search form to Atlas with inputs for other parameters than Onionoo's search parameter.

I made a ticket for that not all too long ago: #23782

I agree that this is a documentation issue but the search form would make the power of atlas/onionoo more accessible, visible and usable to a broader audience.

  5. Extend Onionoo's search parameter to also look at contact line parts when processing unqualified search terms. That would mean not only looking at nicknames, fingerprints, IP addresses, etc., but also at contact line parts. I'm opposed to that idea.

I'm opposing as well.

So, my suggestion is to create a new ticket for 1, get 2 reviewed and merged, and possibly create another ticket for 4. Thoughts?

Looking forward to that new feature! :)

Great! I'll have to do other things first today, but I hope that I can write some more Onionoo code next week.

comment:24 Changed 5 weeks ago by irl

If this is implemented in Onionoo, I guess no changes in Atlas are needed as long as there's no internal escaping going on that I haven't spotted?

comment:25 in reply to:  24 Changed 5 weeks ago by karsten

Replying to irl:

If this is implemented in Onionoo, I guess no changes in Atlas are needed as long as there's no internal escaping going on that I haven't spotted?

Right, and it looks like there's no such internal escaping going on. If I enter contact:"John Doe" into Atlas' search bar, I see a request to https://onionoo.torproject.org/summary?search=contact:&quot;John Doe&quot; going out. That request is a 400 so far, but that's what we would change on Onionoo's side. So, I'd say nothing to do on the Atlas side.
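
For anyone constructing such a request by hand, here is a minimal sketch of percent-encoding a quoted search term. The base URL is the one used in this ticket; the helper name `build_search_url` is hypothetical, and the sketch makes no claim about how Atlas encodes its requests or how Onionoo answers them.

```python
from urllib.parse import quote, unquote

# Base URL taken from the ticket; everything else is illustrative.
BASE = "https://onionoo.torproject.org/summary?search="


def build_search_url(search):
    """Percent-encode a search value, keeping ':' readable."""
    return BASE + quote(search, safe=":")


url = build_search_url('contact:"John Doe"')
print(url)
# https://onionoo.torproject.org/summary?search=contact:%22John%20Doe%22

# Decoding the query value gives back the original search term.
print(unquote(url[len(BASE):]))
# contact:"John Doe"
```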

comment:26 Changed 4 weeks ago by karsten

Status: changed from accepted to needs_review

Please review my task-21366 branch with a new commit to also support escaping double quotes inside double-quoted qualified search terms.
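
As an illustration of the syntax under review, here is a hypothetical Python sketch of a tokenizer that keeps spaces inside double quotes and treats a backslash followed by a double quote as a literal quote. It is not the code from the task-21366 branch; the function name and the error handling are assumptions.

```python
def tokenize(search):
    """Split on spaces, except inside double quotes.

    Inside a quoted value, a backslash followed by a double quote yields
    a literal double quote. Hypothetical sketch, not the branch's code.
    Raises ValueError on an unterminated quote.
    """
    terms, current, in_quotes, i = [], [], False, 0
    while i < len(search):
        ch = search[i]
        if in_quotes and ch == "\\" and search[i + 1:i + 2] == '"':
            current.append('"')        # escaped quote inside a quoted value
            i += 1                     # skip the quote that was just consumed
        elif ch == '"':
            in_quotes = not in_quotes  # quote marks are not part of the value
        elif ch == " " and not in_quotes:
            if current:
                terms.append("".join(current))
                current = []
        else:
            current.append(ch)
        i += 1
    if in_quotes:
        raise ValueError("unterminated double quote")
    if current:
        terms.append("".join(current))
    return terms


print(tokenize('contact:"Neel Chauhan" country:de'))
# ['contact:Neel Chauhan', 'country:de']
```

Each returned term can then be split at its first colon into qualifier and value, as with unquoted terms.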

comment:27 Changed 3 weeks ago by iwakeh

Status: changed from needs_review to merge_ready

Tests and checks pass.

I had to increase the timeout value for 'testCountryDeDe' to 200ms (was 100ms) in order to make the test pass. Maybe, we should watch out for such errors and find the reason.

(The changelog entry is missing.)

comment:28 Changed 3 weeks ago by iwakeh

Milestone: Onionoo-1.7.0

Adding the next milestone as these tickets are close to completion.

comment:29 Changed 3 weeks ago by karsten

Resolution: fixed
Status: changed from merge_ready to closed

Thanks for the review! I increased the timeout value to 200ms, which is unrelated to this change but which seemed like a plausible workaround. And I added a change log entry. Rebased and pushed to master. Closing. Thanks again!

comment:30 Changed 5 days ago by karsten

Resolution: fixed
Status: changed from closed to reopened

Re-opening, because we might have merged something that doesn't work yet. Needs closer investigation next week. Oops.

comment:31 Changed 4 days ago by karsten

Here's a possible explanation: maybe this feature worked just fine in Tomcat and doesn't work anymore after the Jetty switch. Or, maybe it has to do with Servlet API versions. I didn't start investigating yet, but these may be possible starting points.

comment:32 Changed 2 days ago by karsten

Uhm, wait, we made that switch in ExoneraTor, not Onionoo. Please ignore my previous comment.

comment:33 Changed 2 days ago by karsten

See https://trac.torproject.org/projects/tor/ticket/24311#comment:1 for a possible explanation of what went wrong.
