Opened 19 months ago

Last modified 16 months ago

#24384 needs_revision defect

Decode percent-encoded characters in qualified search terms

Reported by: karsten Owned by: metrics-team
Priority: Medium Milestone: Onionoo 2.0.0
Component: Metrics/Onionoo Version:
Severity: Normal Keywords:
Cc: irl Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

From #24311: "Second, Oniono does not decode percent-encoded characters in qualified search terms, even though it probably should do that. That means that even if Atlas sent over a query like ​https://onionoo.torproject.org/summary?search=contact:%3Cnone%3E, Onionoo wouldn't decode it. It would just return an empty result set, because there are no contact lines with %3Cnone%3E. That's different from the contact parameter which is decoded correctly."

Child Tickets

Change History (18)

comment:1 Changed 19 months ago by karsten

Priority: MediumHigh

Please review a relatively short patch in my task-24384 branch. I didn't get to write tests yet, though it shouldn't be terribly hard to write some.

comment:2 Changed 19 months ago by irl

This looks good to me.

The + issue is interesting. Were you getting + from Relay Search or are you just adding this pre-emptively?

PHP seems to encode spaces to + in urlencode() but JavaScript's encodeURIComponent() makes them into %20. Behaviour seems to annoyingly vary depending on what you're encoding with.

comment:3 Changed 19 months ago by karsten

Fine question. The main reason I put it in was for the unit tests to still succeed. In particular, the following two tests stop working without that replacement:

  • /summary?search=ACXBNsHzqe7+KuP5GPA7+iG1Bws
  • /summary?search=contact:<tor+steven.murdoch@cl.cam.ac.uk>.

I didn't check how Relay Search would encode these searches on the wire. Do you have an easy way to say that?

It seems that we'd still need it for PHP variants of Relay Search to work. Unless PHP is doing it "wrong". If we can reasonably expect + to be encoded as %20 I could imagine taking out that replaceAll() and changing unit tests accordingly. That would probably require a major protocol version bump.

comment:4 Changed 19 months ago by iwakeh

Status: newmerge_ready

URI syntax allows for different types of delimeters. For generic delimeters, e.g. / and ?, the percent encoding is only used when they appear inside data transferred in the URI.
There is another subset of delimiters (referred to as sub-delims in rfc3986), which can be used as separators in sub-components as an application defines it: For example, if the plus + was used as a delimiter between several search terms it ought to be percent encoded when being part of the data. Without any special meaning (as done in Onionoo) the plus doesn't need to be encoded and whether only the non-encoded or both are accepted is courtesy of the processing application.

With the current patch Onionoo allows for plus to be submitted as + and %B2 in searches. The following tests pass (ResourceServletTest) as well as their already existing counterparts with just +:

  @Test(timeout = 100)
  public void testSearchBase64FingerprintPlusEncoded() {
    this.assertSummaryDocument(
        "/summary?search=ACXBNsHzqe7%2BKuP5GPA7+iG1Bws", 1,
        new String[] { "TimMayTribute" }, 0, null);
  }

  @Test(timeout = 100)
  public void testSearchEmailAddressEncodedPlus() {
    this.assertSummaryDocument(
        "/summary?search=contact:<tor%2Bsteven.murdoch@cl.cam.ac.uk>", 1,
        new String[] { "TimMayTribute" }, 0, null);
  }

Regarding the space character:

Replacing it by + is done for form encoding (cf. PHP manual, w3c HTML spec) whereas encodeURIComponent() properly uses the percent encoding according to the applied URI spec.

Onionoo does not receive form data, therefore accepting plus encoded as + or %2B and space as %20 is perfectly fine.

The patch passes all checks and tests, and is ready to be merged. Maybe, with some tests added to make clear what data is accepted.

comment:5 Changed 19 months ago by iwakeh

Milestone: Onionoo-1.8.0

comment:6 Changed 19 months ago by iwakeh

Milestone: Onionoo-1.8.0Onionoo-2.0.0

comment:7 Changed 19 months ago by iwakeh

Milestone: Onionoo-2.0.0Onionoo-1.8.0

comment:8 Changed 19 months ago by karsten

Status: merge_readyneeds_revision

Okay, I looked deeper into this issue. I'm writing down here what I found, though I admit that some facts are only based on my memory, as I have a few dozen browser tabs open here and have totally lost control over my minddesktop. Anyway, here's what I found, to be treated with care:

  • Relay Search encodes the space character as %20. So far so good, we handled that before, and we handle that with the patch.
  • Relay Search encodes " as %22, ' as %27, < as %3C, and > as %3E. We'll have to decode these in Onionoo, which we don't do yet.
  • Relay Search does not encode # as %23 but instead cuts off anything starting at that character. This means we cannot search for # in contact lines or other, future qualified search terms. I believe it should percent-encode the # character, but I could be wrong.
  • Relay Search does not encode % as %25. This leads to an IllegalArgumentException in the newly patched (and not yet merged) Onionoo code. I'm not entirely certain, but I'd think that this is illegal on the Relay Search side. And we'll have to catch this in Onionoo and send back a 400 status code.
  • Relay Search does not encode & as %26. I think it'll have to do this, because we're looking for the & character as delimiter to the next parameter. I think the current Onionoo patch doesn't handle this very well, by first decoding the entire query string and then splitting it up at & characters which may have been %26's that we decoded ourselves.
  • Relay Search does not encode + as %2B. And to be clear, this is not strictly required by the Onionoo specification. On the Onionoo side we should treat the unencoded + the same regardless of whether it appears in, say, search=contact:bs+tor or contact=bs+tor. But we don't:
    • In the first case, search=contact:bs+tor, we treat the + as + and not as space character and only return relays with bs+tor in the contact line. This seems reasonable to me, even though we specify in the "contact" parameter that + must be encoded as %2B. But this is the "search" parameter.
    • In the second case, contact=bs+tor, Jetty converts the + to a space character, so that we return all relays with bs and with tor in the contact line, which clearly outnumber those with bs+tor in the contact line. I think we should fix this second case by doing the conversion of all parameter values ourselves and not leaving that to Jetty. If we do, we can remove the requirement to encode + in the "contact" parameter. But let's focus on the Relay Search side first.
  • Relay Search does not encode / as %2F. Random example: The search for AJQ2YvMOznqi0SYrNk/AQx94+Aw does not even trigger a query to Onionoo. Only if I truncate at the / by searching for AJQ2YvMOznqi0SYrNk, I get a result. This is a bug in Relay Search that we need to fix.
  • Our unit tests in ResponseServletTest treat the + character different than Jetty by not decoding it to a space character.
  • Last but not least, when we attempted to support space characters in the qualified search term, like search=contact:"a b c", we overlooked that this does not match only relays with "a b c" as contact line part but all contact lines containing "a", "b", and "c" somewhere in the contact line. Potentially useful, but not the use case we had in mind. Hilarious!

How about we take this out of 1.8.0 and resolve all related issues with more time on our hands? And I don't mean to ignore this ticket until a few days before the next planned release. But let's take the time we need to fix everything that needs fixing, and let's include it in the next release when it's ready. Sound okay?

comment:9 Changed 19 months ago by iwakeh

Milestone: Onionoo-1.8.0Onionoo-1.9.0

This needs more thinking and defining what to accept in Onionoo even though many of the above bullet points are rather Relay Search issues. The missing part is a Onionoo's search&query specification.
New ticket and revise this ticket after such a spec is available.

Move to 1.9.0

comment:10 Changed 19 months ago by iwakeh

Waiting for #24458.

comment:11 in reply to:  10 ; Changed 19 months ago by karsten

Replying to iwakeh:

Waiting for #24458.

Unclear. We might answer some of the open issues above before moving forward with #24458. And maybe it will be easier in this order. Maybe let's do both in parallel.

comment:12 in reply to:  11 Changed 19 months ago by iwakeh

Replying to karsten:

Replying to iwakeh:

Waiting for #24458.

Unclear. We might answer some of the open issues above before moving forward with #24458. And maybe it will be easier in this order. Maybe let's do both in parallel.

I think, the unambiguous Relay Search bullet points from comment:8, that identify deviations from general standards, should become Relay Search tickets. The remaining need to be discussed on different tickets, if unrelated to Onionoo or on #24458, if the topic concerns both.

For this ticket I see three options:
1) merge as-is and then work on #24458 to identify what standards should be met and work toward the newly set goals.
2) only merge the double-quote-fix without the plus-encoding-decoding parts and then proceed with #24458 as in 1).
3) leave this ticket open and do #24458 first.

comment:13 Changed 19 months ago by irl

Cc: irl added

I've made some changes to the way that URLs are encoded and uploaded to a webserver. Could you let me know if this fixes the issues?

https://people.torproject.org/~irl/atlas/#

comment:14 Changed 19 months ago by karsten

irl, can you push your changes to a branch somewhere? (I'm using a local atlas.git with changed base URLs and a local Onionoo server to debug this.) Thanks!

comment:15 Changed 19 months ago by irl

The changes are pushed to my task/24384 branch.

The following files contain the base URL:

js/collections/aggregates.js
js/collections/results.js
js/router.js
js/models/graph.js
js/models/relay.js

If you're doing this often, please open a ticket and I'll refactor the host name portion of the URL into router.js so it's only needing to be changed in one place.

Last edited 19 months ago by irl (previous) (diff)

comment:16 Changed 18 months ago by iwakeh

Milestone: Onionoo-1.9.0Onionoo-2.0.0

comment:17 Changed 16 months ago by karsten

Priority: HighMedium

This ticket has not been modified in the last 2 months or even longer. Setting priority to medium.

comment:18 Changed 16 months ago by karsten

Milestone: Onionoo-2.0.0Onionoo 2.0.0

Milestone renamed

Note: See TracTickets for help on using tickets.