Opened 4 years ago

Closed 2 years ago

#16553 closed enhancement (implemented)

Add support for searching by (partial) host name

Reported by: karsten Owned by: metrics-team
Priority: Very Low Milestone: Onionoo-1.7.0
Component: Metrics/Onionoo Version:
Severity: Normal Keywords: metrics-2018
Cc: irl Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Searching by host name is currently not supported, but it might be useful to add that. I could imagine supporting it by returning all host names ending with a given search term. For example, sampo.ru would return that relay and all others in the same domain. It would be a new qualified search term though, that is, one would have to search for it in Atlas like this: "hostname:sampo.ru".

This enhancement request originates from #10128 and has low priority.

Child Tickets

Change History (28)

comment:1 Changed 2 years ago by karsten

Severity: Normal
Summary: Consider adding support for searching by host nameAdd support for searching by host name

Simplify summary. (Considering whether something is a good idea or not is always part of a ticket.)

comment:2 Changed 2 years ago by karsten

Keywords: metrics-2018 added

comment:3 Changed 2 years ago by karsten

Owner: set to metrics-team
Status: newassigned

comment:4 Changed 2 years ago by karsten

Cc: irl added
Status: assignedneeds_information

irl, you said something today that you don‘t really see the use case here. Do you think we can close this ticket?

comment:5 Changed 2 years ago by cypherpunks

I would find this useful to have for https://nusenu.github.io/OrNetRadar
but it somewhat depends on proper reverse DNS data (currently onionoo does not provide raw PTR data, see #18342).

comment:6 Changed 2 years ago by irl

Aaaaah

Do you mean search by *partial* host name? This makes a lot more sense as a use case.

Instead of searching for a single host, you're searching for groups of hosts.

comment:7 Changed 2 years ago by karsten

Status: needs_informationneeds_review
Summary: Add support for searching by host nameAdd support for searching by (partial) host name

Alright, let's do it.

How about we add a new parameter "host_name" with the following specification: "Return only relays with a domain name ending in the given (partial) host name. Filtering by host name is case-insensitive."

Please review my task-16553 branch with an implementation.

Requires a minor protocol version bump when merging, because it adds a new parameter.

comment:8 Changed 2 years ago by irl

+  @Test
+  public void testHostNameUmlaut() {
+    this.assertErrorStatusCode("/summary?host_name=äöü", 400);
+  }

We should say that Unicode must be encoded as punycode, for the avoidance of doubt.

comment:9 Changed 2 years ago by karsten

Aha, right, we should avoid confusions there. However, wouldn't we also have to encode domain names that we look up to punycode ourselves in order to say that? Could we limit this parameter to ASCII characters? Or would that be too much of a hack?

That would be (with new parts in italics): "Return only relays with a domain name ending in the given (partial) host name. Only ASCII characters are supported as host names. Filtering by host name is case-insensitive."

comment:10 Changed 2 years ago by irl

The punycode version *is* the ASCII version, and should be what we get from doing a lookup unless we've explicitly done some IDN conversion.

https://www.ietf.org/rfc/rfc3492.txt

"ASCII characters in the Unicode string are represented literally, and non-ASCII characters are represented by ASCII characters that are allowed in host name labels (letters, digits, and hyphens)."

comment:11 Changed 2 years ago by karsten

Okay, we're not doing anything with lookup responses, so let's say: "Return only relays with a domain name ending in the given (partial) host name. Non-ASCII host name characters must be encoded as punycode. Filtering by host name is case-insensitive." Please tweak as necessary! Thanks!

comment:12 Changed 2 years ago by irl

That description looks good to me functionally. (:

It might be useful to also add a tip to users:

To search only for subdomains of a specific domain, prefix your search string with a period, for example: "host_name:.csail.mit.edu".

comment:13 in reply to:  12 Changed 2 years ago by karsten

Replying to irl:

That description looks good to me functionally. (:

It might be useful to also add a tip to users:

To search only for subdomains of a specific domain, prefix your search string with a period, for example: "host_name:.csail.mit.edu".

Looks good! Rephrasing a tiny bit for uniform style and to avoid the ":" notation that is only used for qualified search terms:

"Return only relays with a domain name ending in the given (partial) host name. Searches for subdomains of a specific domain should ideally be prefixed with a period, for example: ".csail.mit.edu". Non-ASCII host name characters must be encoded as punycode. Filtering by host name is case-insensitive."

Obviously, feel free to use your description for Atlas where instructions are more user-facing and where qualified search terms are the only way to even use this parameter.

comment:14 Changed 2 years ago by irl

Description looks good to me.

comment:15 Changed 2 years ago by iwakeh

Milestone: Onionoo-1.7.0

Adding the next milestone as these tickets are close to completion.

comment:16 Changed 2 years ago by iwakeh

There are checkstyle complaints for the tests (line length and method names).

For SummayDocument the variables should be named correctly and instead of suppressing warnings using @SuppressWarnings("checkstyle:membername") the exposed values should be annotated (@Expose) and the @SerializedName should be defined. Of course, here only variable h was added, but this is a good time to overhaul all fields.

Rather use if(null == this.hostName) than if(this.hostName == null) (cf. RequestHandler).

Consider using Java8 idiom, for example:

--- a/src/main/java/org/torproject/onionoo/server/RequestHandler.java
+++ b/src/main/java/org/torproject/onionoo/server/RequestHandler.java
@@ -551,15 +551,12 @@ public class RequestHandler {
     }
     String hostName = this.hostName.toLowerCase();
     Set<String> removeRelays = new HashSet<>(this.filteredRelays.keySet());
-    for (Map.Entry<String, Set<String>> e :
-        this.nodeIndex.getRelaysByHostName().entrySet()) {
-      if (e.getKey().endsWith(hostName)) {
-        removeRelays.removeAll(e.getValue());
-      }
-    }
-    for (String fingerprint : removeRelays) {
-      this.filteredRelays.remove(fingerprint);
-    }
+    this.nodeIndex.getRelaysByHostName().entrySet().stream()
+      .filter((mapEntry) -> mapEntry.getKey().endsWith(hostName))
+      .forEach((mapEntry) -> removeRelays.removeAll(mapEntry.getValue()));
+    removeRelays.stream()
+        .forEach((fingerprint) -> this.filteredRelays.remove(fingerprint));
     this.filteredBridges.clear();
   }

(Again timeout for testCountryDeDe needed to be increased; weird, the stack trace is usually in a native method when the timeout occurs. Maybe, simply increase this timeout to 200ms?)

comment:17 Changed 2 years ago by iwakeh

Status: needs_reviewneeds_revision

comment:18 Changed 2 years ago by iwakeh

See #24059 for documenting the protocol changes.

comment:19 Changed 2 years ago by karsten

I'll work on the checkstyle complaints (oops!) and on renaming variables.

Regarding that Java 8 idiom, do you know whether it's possible to rewrite that to something that doesn't require a temporary data structure like removeRelays? Would be cool to simplify that code and related code in the same class even more.

comment:20 Changed 2 years ago by karsten

Pushed two commits to my task-16553 branch, not including the Java 8 change.

comment:21 Changed 2 years ago by karsten

(Plus another fixup commit... Bah.)

comment:22 in reply to:  19 Changed 2 years ago by iwakeh

Replying to karsten:

I'll work on the checkstyle complaints (oops!) and on renaming variables.

Regarding that Java 8 idiom, do you know whether it's possible to rewrite that to something that doesn't require a temporary data structure like removeRelays? Would be cool to simplify that code and related code in the same class even more.

A simplification rather due to using the hostname from the summary doc than to j8:

diff --git a/src/main/java/org/torproject/onionoo/server/RequestHandler.java b/src/main/java/org/torproject/onionoo/server/RequestHandler.java
index 23af60b..dd923d4 100644
--- a/src/main/java/org/torproject/onionoo/server/RequestHandler.java
+++ b/src/main/java/org/torproject/onionoo/server/RequestHandler.java
@@ -17,6 +17,7 @@ import java.util.List;
 import java.util.Map;
 import java.util.Set;
 import java.util.SortedMap;
+import java.util.stream.Collectors;
 
 public class RequestHandler {
 
@@ -550,16 +551,12 @@ public class RequestHandler {
       return;
     }
     String hostName = this.hostName.toLowerCase();
-    Set<String> removeRelays = new HashSet<>(this.filteredRelays.keySet());
-    for (Map.Entry<String, Set<String>> e :
-        this.nodeIndex.getRelaysByHostName().entrySet()) {
-      if (e.getKey().endsWith(hostName)) {
-        removeRelays.removeAll(e.getValue());
-      }
-    }
-    for (String fingerprint : removeRelays) {
-      this.filteredRelays.remove(fingerprint);
-    }
+    this.filteredRelays
+        .entrySet().retainAll(this.filteredRelays.entrySet().stream()
+        .filter(entry
+            -> null != entry.getValue().getHostName()
+              && entry.getValue().getHostName().endsWith(hostName))
+        .collect(Collectors.toSet()));
     this.filteredBridges.clear();
   }

comment:23 in reply to:  20 Changed 2 years ago by iwakeh

Replying to karsten:

Pushed two commits to my task-16553 branch, not including the Java 8 change.

These changes and the additional fixup look fine. So much more readable now! I'll wait with more than visual inspection for the final branch :-)

comment:24 Changed 2 years ago by karsten

Hmm, can you rewrite that Java 8 snippet to something that uses the node index rather than iterate over all remaining SummaryDocuments? One reason is that it's potentially faster (otherwise we could as well drop the index entirely), and another reason is that even if it doesn't make a huge performance difference in this case, we could adapt the Java 8 approach more easily to the other methods in RequestHandler that do rely on the index.

comment:25 Changed 2 years ago by karsten

How about we merge my branch for the upcoming release and leave the Java 8 stuff for the next release?

comment:26 Changed 2 years ago by iwakeh

Status: needs_revisionmerge_ready

Merge ready.

comment:27 Changed 2 years ago by karsten

Rebased and merged to master. Not closing yet until we have moved the Java 8 part to another ticket.

comment:28 Changed 2 years ago by iwakeh

Resolution: implemented
Status: merge_readyclosed

Java8 remark added to #23752.

Closing, thanks!

Note: See TracTickets for help on using tickets.