#25575 closed project (fixed)

Server space request (175 GB total) for hosting Tor Browser downloads

Reported by: arthuredelstein Owned by: tpa
Priority: Medium Milestone:
Component: Internal Services/Tor Sysadmin Team Version:
Severity: Normal Keywords:
Cc: weasel, gk, phoul, erinm@…, boklm, brade, mcs Actual Points:
Parent ID: #20628 Points:
Reviewer: Sponsor:

Description

We would like to offer more Tor Browser locales for download, and we'll need more disk space. Currently we serve 16 locales using a total of 41.23 GB of disk space (with two releases of Tor Browser currently on disk). In addition, there are currently 16 further locales for which translations are 100% complete.

A single locale (including alpha and stable, bundles and update (mar) files, all platforms, and two consecutive releases) requires 2.5 GB, which means we expect approximately 40 GB of additional space would be needed to host two releases of the remaining locales. That means a total of roughly 82 GB.

However, we might sometimes host as many as three consecutive releases simultaneously, which would take 123 GB. On top of that, we will be adding mobile support in the future, which increases the estimate to ~150 GB.

In the future, we will likely also want to add more locales. It's difficult to know exactly how many, but I would guess 5 or fewer. Therefore I'm inclined to request that we increase the total space to 175 GB for this year (including the ~42 GB already in use) and re-evaluate 1 year from now. Happy to discuss my calculations further as needed.
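
For anyone who wants to re-run the arithmetic, here is a rough sketch of the estimate in Python. The per-locale and current-usage figures come from the description; the 1.2 multiplier for mobile is an assumption borrowed from the cdn calculation in comment 5, since the description only gives the resulting ~150 GB. The function and constant names are illustrative. Because it starts from 41.23 GB rather than a rounded value, it lands slightly below the rounded figures quoted above.

```python
# Figures from the ticket description; names are illustrative only.
CURRENT_USAGE_GB = 41.23   # 16 locales, two releases on disk
GB_PER_LOCALE = 2.5        # one locale, two consecutive releases
MOBILE_FACTOR = 1.2        # assumption: taken from the cdn estimate in comment 5


def dist_estimate_gb(new_locales, releases_on_disk=3):
    """Return (two-release total, scaled total, scaled total with mobile) in GB."""
    two_releases = CURRENT_USAGE_GB + new_locales * GB_PER_LOCALE
    # The per-locale figure covers two releases, so scale linearly from there.
    scaled = two_releases * releases_on_disk / 2
    return two_releases, scaled, scaled * MOBILE_FACTOR


# 16 locales that are already fully translated.
two, three, with_mobile = dist_estimate_gb(new_locales=16)
print(f"two releases:   {two:.1f} GB")
print(f"three releases: {three:.1f} GB")
print(f"with mobile:    {with_mobile:.1f} GB")

# Same calculation with the guessed upper bound of 5 further locales.
_, _, with_future = dist_estimate_gb(new_locales=16 + 5)
print(f"with 5 more locales: {with_future:.1f} GB")
```

Under these assumptions the last figure stays below the 175 GB requested, which leaves a small margin.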

Change History (9)

comment:1 Changed 17 months ago by boklm

Cc: boklm added

comment:2 Changed 17 months ago by mcs

Cc: brade mcs added

comment:3 Changed 17 months ago by weasel

Right now, torbrowser seems to ship things via at least three different services:

  1. aus1.torproject.org/torbrowser
  2. dist.torproject.org/torbrowser
  3. cdn.torproject.org/aus1/torbrowser

Of these, (1) has negligible disk space requirements, as all the big files are in both (2) and (3).

E.g., currently:

 5.2M    aus1.torproject.org/
 42G     dist.torproject.org/torbrowser/
 28G     cdn.torproject.org/aus1/torbrowser/

Do you have an estimate how those 175GB would be spread across the services?

comment:4 Changed 16 months ago by weasel

Status: new → needs_information

comment:5 in reply to:  3 Changed 16 months ago by arthuredelstein

Replying to weasel:

 5.2M    aus1.torproject.org/
 42G     dist.torproject.org/torbrowser/
 28G     cdn.torproject.org/aus1/torbrowser/

Do you have an estimate how those 175GB would be spread across the services?

Great question. Unfortunately I was ignorant of the cdn.torproject.org part. The 175 GB applied to dist.torproject.org only.

For cdn.torproject.org, I am making a similar calculation. Currently there are already 4 releases of Tor Browser mar files on disk. (I don't expect more than this.) But we do expect to double the number of locales, and add mobile. So that suggests a total of 28 GB * 2 * 1.2 = 67 GB. Adding a margin of safety, I would suggest then we could use a total of 80 GB on cdn.torproject.org.

So to summarize, the total disk space requested (including space already in use) is:
dist.torproject.org: 175 GB
cdn.torproject.org: 80 GB
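
The cdn figure can be checked the same way. This is only a sketch of the multiplication in this comment; the constant names are illustrative and the factors are the ones stated above.

```python
CDN_USAGE_GB = 28      # current size of cdn.torproject.org/aus1/torbrowser/
LOCALE_FACTOR = 2      # the number of locales is expected to double
MOBILE_FACTOR = 1.2    # allowance for adding mobile
REQUEST_GB = 80        # requested total, including the safety margin

estimate = CDN_USAGE_GB * LOCALE_FACTOR * MOBILE_FACTOR
margin = REQUEST_GB - estimate
print(f"estimate: {estimate:.1f} GB, margin: {margin:.1f} GB")
```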

Now, I should mention that boklm explained to me some details of the servers that I had not previously been aware of. One important detail he explained is that multiple copies of all files are stored on staticiforme.

As far as I understand (gk and boklm, please correct me if I'm wrong) first the files intended for dist.torproject.org are staged in
/srv/dist-master.torproject.org.
Some of these (the mar files) are then hard-linked from
/srv/cdn-master.torproject.org/
(Hard-links imply no extra space is needed.)

Then the files from each of these master directories are rsync'd to
/srv/static.torproject.org/master/cdn.torproject.org-current-live
and
/srv/static.torproject.org/master/dist.torproject.org-current-live
respectively.

So, because of this duplication on the disk, it seems we would need substantially more storage than I had previously understood. Namely, 175 GB for the master and then 175 GB + 80 GB for the live directories.
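
As a side note on the hard-link point, here is a small Python sketch showing why a hard link takes no extra space: both names refer to the same inode. It runs in a temporary directory with made-up file names and says nothing about how the actual staging scripts on the server work. Hard links only work within a single filesystem.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file names, used only for this demonstration.
    original = os.path.join(tmp, "example.mar")
    linked = os.path.join(tmp, "example-link.mar")

    with open(original, "wb") as f:
        f.write(b"\0" * 4096)

    # Create a second directory entry pointing at the same inode.
    os.link(original, linked)

    a, b = os.stat(original), os.stat(linked)
    same_inode = (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
    link_count = a.st_nlink
    print("same inode:", same_inode, "link count:", link_count)
```

The rsync'd copies in the live directories are separate files, so they do count toward disk usage.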

comment:6 Changed 16 months ago by arthuredelstein

Parent ID: #20628


comment:7 Changed 16 months ago by weasel

The hardlinking thing is not quite as you describe. In particular, we never hardlink files owned by the non-mirror user. So we copy things from dist-master, and then we hardlink the various trees under control of the mirroring stuff.

The mirroring stuff itself does not necessarily need twice the disk space for an update, but it needs the disk space of the union of the old and new tree. This is true for the master *and* for each mirror. In general, more smaller updates are better than one big one.
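
To illustrate the "union of the old and new tree" point, here is a minimal Python sketch. It assumes each tree is a mapping from path to (size, version tag), and that a file with the same path, size and tag in both trees can be shared and counted once. That representation and the function name are hypothetical; the real mirroring scripts are not shown in this ticket.

```python
def peak_update_size(old_tree, new_tree):
    """Size of the union of two trees, counting unchanged files once.

    Each tree maps path -> (size, version_tag). Entries that are identical
    in both trees collapse into one element of the set union; changed files
    contribute both their old and their new size.
    """
    entries = set(old_tree.items()) | set(new_tree.items())
    return sum(size for _path, (size, _version) in entries)


# Made-up example sizes, in GB.
old = {"a.mar": (60, "v1"), "b.mar": (40, "v1")}
new = {"a.mar": (60, "v1"), "b.mar": (45, "v2"), "c.mar": (30, "v2")}

peak = peak_update_size(old, new)
new_only = sum(size for size, _version in new.values())
print(f"new tree alone: {new_only} GB, needed during update: {peak} GB")
```

In this example only b.mar changes and c.mar is new, so the update needs 40 GB more than the new tree alone rather than double the space. Splitting a large change into several smaller updates keeps that overhead small at each step.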

Let's assume that between two updates not more than 100 GB changes, and that you need 250 GB in total, of which 100 GB is already being used.

Then the disk space on the static-master is tight, but if we run out there it's easy to expand and also we notice quickly. Of the static mirrors that cover dist, archeotrichon, listera, and saxatile are fine (now that some of them have been resized). savii isn't and will need to be retired and/or moved.

The CDN backends are ok for now too, but they will need some redesigning soonish anyway.

So, todo list:

  • move/retire savii as a dist.tpo mirror.
  • decide who the cdn redirectors should be and make sure they have sufficient disk space.
  • decide who the cdn backends should be and make sure they have sufficient disk space.

comment:8 Changed 15 months ago by weasel

retired savii as a dist mirror.

comment:9 Changed 15 months ago by weasel

Resolution: fixed
Status: needs_information → closed

I think we're good to go on this. Please report back after adding a few locales, so we can keep an eye on things.
