Opened 19 months ago

Last modified 15 months ago

#25799 new enhancement

Utilize all Onionoo instances

Reported by: iwakeh
Owned by: metrics-team
Priority: Medium
Component: Metrics
Severity: Normal

Description

This issue regards Onionoo as data provider, RelaySearch as consumer, and maybe also operation of instances. Hence, adding it to the main Metrics component.

RelaySearch only requests data from the main onionoo.tp.o instance. Sometimes that instance serves stale data while oo-hetzner-03.tp.o keeps providing up-to-date data.

There are at least two options to utilize the second instance (oo-hetzner-03) better and keep providing useful data on RS:

  • operational level: requests to onionoo.tp.o should be directed to both instances and, when one of them serves stale data, only to the up-to-date instance; this would be transparent for RS, and all onionoo-data consuming clients would benefit.
  • RS could be configured with a list of onionoo data providers and choose alternate instances when the main one serves stale data (or is unavailable). This would require some programming effort.
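To make the second option more concrete, here is a minimal sketch of the selection logic, written in Python for brevity rather than in whatever language RS would actually use. It assumes each provider exposes a publication timestamp in `YYYY-MM-DD hh:mm:ss` UTC form (such as the `relays_published` field of an Onionoo response). The 6-hour threshold, the function names, and the `fetch_published` callable are all hypothetical; the ticket does not define when data counts as stale.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical freshness threshold; the ticket does not define "stale".
MAX_AGE = timedelta(hours=6)


def parse_published(value):
    """Parse a 'YYYY-MM-DD hh:mm:ss' timestamp and mark it as UTC."""
    parsed = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)


def is_stale(published, now, max_age=MAX_AGE):
    """True if the published timestamp is older than max_age."""
    return now - parse_published(published) > max_age


def pick_provider(providers, fetch_published, now):
    """Return the first reachable provider with fresh data.

    providers: base URLs in order of preference (main instance first).
    fetch_published: callable returning a provider's publication
        timestamp string, or raising OSError if it is unavailable.
    If every reachable provider is stale, the least stale one is
    returned; if none is reachable, the result is None.
    """
    best = None  # (url, published) of the least stale provider so far
    for url in providers:
        try:
            published = fetch_published(url)
        except OSError:
            continue  # unavailable, try the next provider
        if not is_stale(published, now):
            return url
        if best is None or parse_published(published) > parse_published(best[1]):
            best = (url, published)
    return best[0] if best else None
```

Note that this check costs one extra request per provider that turns out to be stale or unavailable, and those requests would be made by the client.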

Please add other options to make best use of all onionoo instances.

Change History (6)

comment:1 Changed 18 months ago by karsten

Wait, I think it's not the case that RS only requests data from the main Onionoo instance. IIUC, onionoo.tpo sometimes goes to omeiense and sometimes to oo-hetzner-03. Or rather, it goes to one of the several caches, which in turn use both backends. The effect is that some requests made by RS are answered by the first instance, directly or indirectly, and some by the second. This switch may even happen in the middle of a user session.

Regarding your second suggestion to have RS fetch data from several instances, I think that wouldn't scale. Remember that it's really the clients/browsers making those requests. So, basically, that would double the number of requests.

Regarding your first suggestion, I don't really know how the DNS round-robin thing works or how we would change that to detect stale data. It might be that this requires some programming/configuration effort, too.

What else could we do? How about we teach Onionoo instances to detect when their data has become stale. In that case they could check whether other instances have more recent data and reply with redirects to other instances until their data is not stale anymore.
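A rough sketch of what such a redirect decision could look like, in Python for illustration only. The 6-hour threshold, the function names, the peer URLs, and the choice of a 307 status code are all assumptions; how an instance would learn its peers' publication times is left open here.

```python
from datetime import datetime, timedelta


def redirect_target(own_published, peer_published, now,
                    max_age=timedelta(hours=6)):
    """Return the peer base URL to redirect to, or None to answer locally.

    own_published: datetime of this instance's most recent data.
    peer_published: dict mapping peer base URL to that peer's datetime.
    max_age: hypothetical staleness threshold.
    """
    if now - own_published <= max_age:
        return None  # own data is fresh, no redirect needed
    # Only consider peers that are both fresher than us and fresh in
    # absolute terms, so the chosen peer will not redirect again.
    candidates = {
        url: published
        for url, published in peer_published.items()
        if published > own_published and now - published <= max_age
    }
    if not candidates:
        return None  # nobody is better off, keep serving what we have
    return max(candidates, key=candidates.get)


def respond(path, own_published, peer_published, now):
    """Return (status, location) for a request to the given path."""
    target = redirect_target(own_published, peer_published, now)
    if target is None:
        return 200, None
    return 307, target.rstrip("/") + path
```

Because a redirect only ever points at a peer whose own data passes the freshness check, two stale instances cannot bounce a client back and forth.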

comment:2 in reply to:  1 Changed 18 months ago by iwakeh

Replying to karsten:

...

What else could we do? How about we teach Onionoo instances to detect when their data has become stale. In that case they could check whether other instances have more recent data and reply with redirects to other instances until their data is not stale anymore.

This solution reduces the admins' workload (no additional scripting, monitoring, etc. on the admins' side). The necessary configuration and implementation should pay attention to the decisions made in #24041. That's ok.

Seems like we have an agreement on keeping the solution inside Onionoo rather than pushing the task toward operations?

If yes, the next steps would be:

  • define the redirect algorithm
  • define the configuration accordingly
  • implement the above.

comment:3 Changed 16 months ago by irl

Perhaps the easiest solution is to have Onionoo instances die if their data has become stale. We should make sure that in this case, Varnish then automatically stops routing requests to that instance and instead routes to another live instance (one that is not stale).

comment:4 Changed 15 months ago by karsten

Priority: changed from High to Medium

This ticket is about the same priority as most other Metrics/* tickets. Setting priority back to medium.

comment:5 in reply to:  3 Changed 15 months ago by karsten

Replying to irl:

Perhaps the easiest solution is to have Onionoo instances die if their data has become stale. We should make sure that in this case, Varnish then automatically stops routing requests to that instance and instead routes to another live instance (one that is not stale).

While being simple, this solution has a major disadvantage: what if all Onionoo instances receive stale data as input and each of them decides for itself that they should rather shut down? Hmm.
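One possible way around this, sketched in Python under assumptions not made anywhere in this ticket: instead of shutting down, an instance reports itself unhealthy to the cache's health probe only when at least one peer has fresh data. The function name, the 6-hour threshold, and the idea that instances know each other's publication times are all hypothetical.

```python
from datetime import datetime, timedelta


def health_status(own_published, peer_published, now,
                  max_age=timedelta(hours=6)):
    """HTTP status for a health probe.

    200 means "keep routing requests here", 503 means "take this
    instance out of rotation".
    own_published: datetime of this instance's most recent data.
    peer_published: iterable of the peers' datetimes.
    max_age: hypothetical staleness threshold.
    """
    if now - own_published <= max_age:
        return 200  # fresh data, stay in rotation
    # Stale: step aside only if some peer can take over with fresh data.
    if any(now - published <= max_age for published in peer_published):
        return 503
    # Everyone is stale (for example, stale input everywhere): serving
    # stale data is still better than serving nothing.
    return 200
```

With this rule, stale input on all instances leaves every instance in rotation rather than taking the whole service down.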

comment:6 Changed 15 months ago by irl

We would notice more quickly perhaps. (:
