stop GitLab: assume that we don't, after all, want to use GitLab and shut it down. I am not sure anyone is actually proposing this, but I'm putting it out there.
status quo: no or piecemeal migration. dip/gitlab exists and teams migrate organically or not at all, when/if they want, and we keep trac forever. probably not acceptable.
migrate team by team: a) pick a team. b) convince team to migrate. c) migrate all issues, code and wiki pages to gitlab d) move on to next team e) congrats, you're done. seems like the ideal plan to me, because it can be done incrementally and allows for progressive testing and ironing out of issues. might be difficult to automate.
migrate in one shot. we just bite the bullet and migrate everything and everyone all at once, with a flag day when Trac becomes readonly. radical solution. might be faster and easier to perform than the other solution (less labor) but is much riskier because, if things break, we need to fix them VERY FAST NOW and people will/may be upset
In the parent ticket, I mentioned tracboat as a tool that might be used to migrate from Trac to GitLab. I am not sure it supports migrating one project/component at a time, at least it's not obvious how to do so in the documentation.
Another problem is how to deal with Trac in the long term. A complete migration wouldn't be complete if Trac still requires maintenance. For this, I see these options:
the golden redirect set: every migrated ticket and wiki page has a corresponding ticket/wiki page in GitLab and a gigantic set of redirection rules makes sure they are mapped correctly. probably impractical, but solves the maintenance problem possibly forever.
read-only Trac: user creation is disabled and existing users are locked from making any change to the site. only a temporary or intermediate measure.
fossilization: Trac is turned into a static HTML site that can be mirrored like any other site. can be a long term solution and a good compromise with a possibly impossible to design and therefore failing (because incomplete) set of redirection rules.
destruction: we hate the web and pretend link rot is not a problem and just get rid of the old site, assuming everything is migrated and people will find their stuff eventually. probably not an option.
As a safety precaution, I have already started on option 3 (fossilization), in a way. I am working with Archive Team to send a copy of the Trac website into the Internet Archive, thanks to archivebot. This will also allow us to build a good "ignore set", a list of patterns to ignore to avoid getting lost in the website when/if we decide to create a static HTML copy. It's also good practice to have a backup of all of our stuff in the Internet Archive.
This currently consists of two crawl jobs:
https://trac.torproject.org/ - just feed the site into wpull (this is what archivebot does, basically) and tweak the ignores to skip the nastier stuff. Ignore set currently includes:
^https?://trac\.torproject\.org/projects/tor/wiki/.*[?&]action=diff(&|$): diffs covered by previous revisions
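To make the effect of that ignore pattern concrete, here is a minimal sketch that applies it to a few URLs. It assumes the pattern is evaluated with Python `re` semantics; the sample URLs and the `is_ignored` helper are made up for illustration and are not part of the actual crawl configuration.

```python
import re

# Pattern copied from the ignore set above; everything else is illustrative.
IGNORE_PATTERNS = [
    re.compile(
        r"^https?://trac\.torproject\.org/projects/tor/wiki/.*[?&]action=diff(&|$)"
    ),
]


def is_ignored(url: str) -> bool:
    """Return True if any ignore pattern matches the URL."""
    return any(pattern.search(url) for pattern in IGNORE_PATTERNS)


# Hypothetical URLs, invented for this example.
samples = [
    "https://trac.torproject.org/projects/tor/wiki/WikiStart?action=diff&version=3",
    "https://trac.torproject.org/projects/tor/wiki/WikiStart?version=3",
    "https://trac.torproject.org/projects/tor/ticket/1234",
]

for url in samples:
    print("skip " if is_ignored(url) else "fetch", url)
```

Only the first sample is skipped: the pattern is limited to wiki pages and requires `action=diff` as a complete query parameter.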
Update: Fossilization seems less and less practical. The archivebot jobs are yielding large results, with an archive of only the tickets (https://trac.torproject.org/projects/tor/ticket/\d+) at 400MB after 6000 tickets (1/5th of the tickets), which would yield around 2GB, excluding the wiki. The full crawl is close to 1GB with less than 10% of the crawl done.
Therefore a full static copy of the Trac website would be at least 10GB, quite impractical. It might be worth looking into proper redirects or whether it's acceptable to have those links broken. Alternatively, we could simply redirect to IA or assume people will look there for missing bits.
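The estimates above are simple linear extrapolations. A rough sketch of the arithmetic, assuming the pages not yet crawled are about as large on average as the ones already fetched (the `extrapolate` helper is only for illustration):

```python
# Figures quoted in the update above.
ticket_archive_mb = 400      # ticket-only archive so far
fraction_of_tickets = 1 / 5  # 6000 tickets is about 1/5th of all tickets

full_crawl_gb = 1            # full crawl so far
max_fraction_done = 0.10     # "less than 10%" of the crawl is done


def extrapolate(size_so_far: float, fraction_done: float) -> float:
    """Linear extrapolation of the final size from a partial crawl."""
    return size_so_far / fraction_done


tickets_total_gb = extrapolate(ticket_archive_mb, fraction_of_tickets) / 1000
crawl_lower_bound_gb = extrapolate(full_crawl_gb, max_fraction_done)

print(f"tickets only: about {tickets_total_gb:.0f} GB")
print(f"full crawl: at least {crawl_lower_bound_gb:.0f} GB")
```

Because the crawl is less than 10% done, the second figure is a lower bound, not a prediction.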
Hi anarcat. Why is 10 GB impractical? In terms of disk storage that's pretty tiny. Is there something about the content that makes that size problematic?
It's around 700MB, compressed. That might seem like a lot, but that's just for the tickets. The crawl job for the entire Trac site is still ongoing, and is currently at 40GB, with 160,000 URLs crawled, and still 500,000 more to go, so we can assume it will be at least 200GB, but we just don't really know until the crawl is finished (because each new page can yield new links).
The problem is not 10GB, it's the 200GB or 500GB or more. :) Maybe it's fine to have such a large dataset around forever, but from other experience, I see we have trouble holding on to that stuff (see for example the problems we have with archive.tpo now, in #29697 (moved)).
So, TL;DR: it's not 10GB. It will be closer to 200GB, maybe a terabyte, for the full crawl.
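As a sanity check on those numbers, here is a small sketch that projects the final size from the average size per URL so far. It assumes that average stays constant, which is a guess; the helper names are hypothetical.

```python
# Figures quoted above for the full-site crawl.
crawled_gb = 40
crawled_urls = 160_000
queued_urls = 500_000

avg_mb_per_url = crawled_gb * 1000 / crawled_urls


def projected_gb(total_urls: int) -> float:
    """Projected archive size if every URL costs the average seen so far."""
    return total_urls * avg_mb_per_url / 1000


def urls_needed_for(target_gb: float) -> int:
    """How many URLs the crawl would have to reach to hit a given size."""
    return round(target_gb * 1000 / avg_mb_per_url)


known_urls = crawled_urls + queued_urls
print(f"average per URL: {avg_mb_per_url} MB")
print(f"URLs known today: {projected_gb(known_urls):.0f} GB")
print(f"URLs needed for 200 GB: {urls_needed_for(200):,}")
print(f"URLs needed for 1 TB: {urls_needed_for(1000):,}")
```

The URLs already known account for roughly 165GB under this assumption. Since each new page can add links to the queue, the real total depends on how much further the queue grows.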
Is each page large, or have we missed a cardinality explosion somewhere?
Are there some elements we can strip out of each page?
Are there some links or sections we can ignore (for example, ticket queries?)
Archivebot crawls outgoing links as well, and there are probably a lot of those in the wiki. It crawls only one level out, but it probably still adds up. A simpler crawl would obviously be smaller.
There are probably cardinality explosions and things we can ignore. You are welcome to contribute to improving the crawl, still in progress here:
Hi anarcat. This ticket's description argues against having both trac and gitlab, but does not explain why we're migrating to gitlab. Where was this decided? Mind pointing this ticket toward that thread?
Last year there was some discussion regarding github, but my understanding was that we're keeping our tpo git and trac instances as they are.
So while I personally believe we should migrate to GitLab for a variety of reasons, we may want to keep running Trac forever instead. I suspect that will not be the case, but I'm open to the idea.
What I am strongly against, however, is running both pieces of software indefinitely. They are both complex pieces of machinery (GitLab maybe even more so) and it would be nonsensical to run them both at the same time. That would make basically everyone unhappy: people unhappy with trac or GitLab would still have to deal with it, and I, as a sysadmin, would still need to maintain both as well. That's the "status quo" option 1, above, and I really think it's a bad idea.
If we're going to use GitLab, we should migrate. I don't think it's reasonable to maintain both services forever. I must admit that I was assuming that, by setting up the GitLab instance, a decision had been made to migrate as well, but you are right that this decision hasn't been clearly made yet either.
Therefore, this is also a space to have that discussion. I have heard rumours of concern about GitLab, but nothing clearly substantiated yet.
Also, just to keep the idea open and recognize a decision hasn't actually been made, I'll add the "option zero" of not using GitLab at all and shutting down the experiment. I feel that's also the wrong way to go as people are generally enthusiastic about the project, but I'll keep an open mind. :)
Trac: Description: having both Trac and GitLab for TPO might not be desirable in the long term, both for maintenance and consistency. if GitLab is okay for people, we should consider migrating to it and turning off (or turning into a static website) this Trac instance.
to
Having both Trac and GitLab for TPO is not desirable in the long term, both for maintenance and consistency across projects.
If GitLab is okay for people, we should consider migrating to it and turning off (or turning into a static website) this Trac instance.
This ticket explores the practicalities behind this project.
The Trac wiki can easily be redirected without a gigantic redirect file because the redirect can be set in the section and page directly.
Tickets are a different story.
Gitlab is also organized in projects and we have been using Trac with tags. We might not have a complete mapping between the two that doesn't overlap two projects and we might have to make some hard choices.
Furthermore, when we did the last survey about trac vs github a few years ago, we discussed that trac links had been used as references in papers, so preserving those was a hard requirement.
I am personally for making trac read only and above all not searchable. That way we save a lot of resources and we still preserve old tickets.
Finally, I think we should start by migrating only the active tickets of certain projects at this point, so that we don't go through a radical switch from one system to another, while we simply freeze old tickets.
The Trac wiki can easily be redirected without a gigantic redirect file because the redirect can be set in the section and page directly.
Tickets are a different story.
Gitlab is also organized in projects and we have been using Trac with tags. We might not have a complete mapping between the two that doesn't overlap two projects and we might have to make some hard choices.
I think it's a similar problem, actually: every ticket, every wiki page is in this single project in Trac. I doubt that it makes sense to keep the same "one gigantic wiki" approach if/when we migrate to GitLab: each project or team could have its own wiki...
So we will probably want to split wikis and tickets up by "component" or some sort of delimiter. This could be done in the migration or after, in GitLab.
Furthermore, when we did the last survey about trac vs github a few years ago, we discussed that trac links had been used as references in papers, so preserving those was a hard requirement.
Yes, that was my understanding as well. One thing I am thinking of is to make sure that, in the migration, the original URL of the migrated page (ticket or wiki) is retained somewhere in GitLab so that it can be searched. That way we could have a redirection that finds that stuff more easily. I don't know how practical that can be, but that's the kind of stuff we'll find out about when we start working on the migration.
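One way to picture "retaining the original URL somewhere in GitLab" is to append it to each migrated description and look it up later. This is only a sketch: the marker wording, the helper names and the in-memory dictionary are assumptions standing in for whatever the migration tool and GitLab's own search would provide.

```python
from typing import Dict, List

# Ticket URL layout as used by the existing Trac site.
TRAC_TICKET_URL = "https://trac.torproject.org/projects/tor/ticket/{number}"
MARKER_PREFIX = "Migrated from "  # hypothetical wording


def migrated_description(number: int, original: str) -> str:
    """Append the original Trac URL on its own line at the end."""
    url = TRAC_TICKET_URL.format(number=number)
    return f"{original}\n\n{MARKER_PREFIX}{url}"


def find_by_trac_url(descriptions: Dict[int, str], trac_url: str) -> List[int]:
    """Return the ids of issues whose description carries exactly this URL.

    Matching a whole line keeps ticket/12 from also matching ticket/123.
    """
    wanted = MARKER_PREFIX + trac_url
    return [
        issue_id
        for issue_id, text in descriptions.items()
        if wanted in text.splitlines()
    ]


# Toy data standing in for imported issues, keyed by GitLab issue id.
issues = {
    7: migrated_description(12, "Fix the frobnicator"),
    8: migrated_description(123, "Document the frobnicator"),
}

print(find_by_trac_url(issues, "https://trac.torproject.org/projects/tor/ticket/12"))
```

A redirection service could use the same lookup to send an old Trac link to the matching issue, wherever that issue ends up after being moved between projects.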
I am personally for making trac read only and above all not searchable. That way we save a lot of resources and we still preserve old tickets.
Well, maybe I'm not familiar enough with Trac, but what does that actually mean? We might be able to disable all users and disable user registration, but then people can still search for tickets and crawl the website and cause trouble. If we disable all queries, then people can't find tickets any more, and I'm not even sure we can just disable searching like that.
In any case, this all means maintaining Trac forever: "readonly" still means "Trac is installed, running, upgraded and maintained", and I would very much like to stop doing that eventually.
Finally, I think we should start by migrating only the active tickets of certain projects at this point, so that we don't go through a radical switch from one system to another, while we simply freeze old tickets.
My enquiries into the migration tools so far suggest that they (well, "it", really) proceed in one big batch, per Trac project. Because we have a single Trac project, it will actually be pretty difficult to migrate tickets one at a time: I suspect that will not be possible at all, and especially tricky if we want to retain ticket number associations.
To put it quite bluntly, we need to shit or get off the pot here at some point. :) Maybe a few teams can start using it for new projects: the website stuff is a good example. Or small projects can be migrated if they don't mind losing ticket references.
But if ticket portability is that critical, I think the only way this can be ensured in the long term is to do a proper migration, with Trac ticket metadata embedded inside GitLab tickets.
Because there's no way we can keep maintaining Trac forever and I am not sure at all we'll be able to permanently archive it. That's still an open question, mind you, but I can't help but feel that if we're going to migrate tickets anyways, we might as well do it correctly...
and before everyone goes off the rails freaking out about me shutting down Trac tomorrow forever, please:
= DON'T PANIC!
the main objective in opening this ticket is to brainstorm and document how to possibly migrate from Trac to GitLab. it's going to take some time and we'll get to talk about it.
if you have objections, they are welcome and you can state them here! but please stay calm, we're doing this for the win and hopefully make everyone happier with our tools, not the opposite. :)
the actual server uses around 25GB of disk space because of random junk here and there but that's the very minimum it can be trimmed down to. naturally, we can keep that data forever, the problem is keeping the app running on top of that...
I must admit that I was assuming that, by setting up the GitLab instance, a decision had been made to migrate as well, but you are right that this decision hasn't been clearly made yet either.
Therefore, this is also a space to have that discussion. I have heard rumours of concern about GitLab, but nothing clearly substantiated yet.
Ahhh! Thank you anarcat, this makes a lot more sense.
If some folks prefer GitLab that's great! But migrating us away from Trac is not a decision to be taken lightly, and requires community buy-in.
I was around a decade ago for our migration to Trac, and what you're proposing is a big move that impacts us all. Especially if you want to propose shutting Trac down without redirects. As I see it there's three open questions...
Do we want to migrate away from Trac at all?
If so, what would we prefer to move to? GitHub? GitLab? Something else?
What will happen with Trac's ticket and wiki data?
This ticket is not the proper place to develop consensus on such a large move. If you'd care to pursue this I'd suggest...
Open the GitLab instance up. I tried to look at https://dip.torproject.org/ to see what my projects look like on it but I'm presented with a login page. As an open source developer this makes it DOA right from the starting gate. :)
Begin a thread on tor-project@ to see how the community feels about this. I suspect if we move at all folks will prefer GitHub to GitLab, but I'm definitely curious to see what people think.
If some folks prefer GitLab that's great! But migrating us away from Trac is not a decision to be taken lightly, and requires community buy-in.
I totally agree. I consider we're at the "feasibility study" stage. :)
I was around a decade ago for our migration to Trac, and what you're proposing is a big move that impacts us all. Especially if you want to propose shutting Trac down without redirects. As I see it there's three open questions...
For the record, I really, really want to have redirects, if we can't archive the entire website. I understand it's a solid requirement.
Do we want to migrate away from Trac at all?
If so, what would we prefer to move to? GitHub? GitLab? Something else?
What will happen with Trac's ticket and wiki data?
I think there are some answers to this above, from my perspective, but this should definitely be discussed more widely, once we have a clearer idea of a possible way forward.
This ticket is not the proper place to develop consensus on such a large move. If you'd care to pursue this I'd suggest...
Open the GitLab instance up. I tried to look at https://dip.torproject.org/ to see what my projects look like on it but I'm presented with a login page. As an open source developer this makes it DOA right from the starting gate. :)
Begin a thread on tor-project@ to see how the community feels about this. I suspect if we move at all folks will prefer GitHub to GitLab, but I'm definitely curious to see what people think.
Hear hear. I'm all for discussing this more widely, but I also think it's a good idea to have a plan first.
I intend to research the topic a little more, maybe do a few actual tests (archiving trac into HTML, testing a migration to a test gitlab project or fake instance) to see what a migration could look like and/or how much time it would take. Then we can come up with something more concrete that people will understand better than the current vague idea of where we're going. :)
More concretely, I'm thinking of writing a design doc for the migration; hopefully it will make everything and the options a little more concrete. Trac was one of the first things added to my priority list when I was hired at TPI, three months ago, and it's still high on my radar. I haven't had any concrete bug reports other than the occasional "trac is slow" which is generally transient, so it's hard to figure out what the next step is. But people are getting more and more aggravated about the service and I think we need to start to think about the exit strategy...
How does that sound?
One thing I don't really want is a huge flamewar/bikeshed on this, so i think doing this research is definitely useful.
Also note that this ticket is part of #29400 (moved) which explicitly says:
We are going to evaluate Gitlab as a replacement for trac, gitweb.tpo, git-rw.tpo, github.com.
Some team (snowflake?) to use gitlab exclusively. move (copy + add link to gl) existing tickets to gitlab service (not by tsa but by gitlab team)
Runners could be provided by anyone. so, it could be done outside of tpa/tpo for evaluation, and if we like it in the end we can add some runners later.
So there's a precedent in the idea of migrating at least some teams to GitLab permanently. I'm taking the next step and asking what's going to happen with the rest of trac, because I certainly don't want to keep that technical debt around too long. ;)
Great! Think we're completely on the same page. :)
Some team (snowflake?) to use gitlab exclusively... So there's a precedent in the idea of migrating at least some teams to GitLab permanently.
Gotcha. My understanding is that Snowflake uses GitLab whereas the Network team and Ooni (?) are moving toward GitHub. Snowflake is tiny by comparison, which is why I suspect if we're going to move at all it will be toward GitHub rather than GitLab. That said, delighted for folks to experiment.
Gotcha. My understanding is that Snowflake uses GitLab whereas the Network team and Ooni (?) are moving toward GitHub. Snowflake is tiny by comparison, which is why I suspect if we're going to move at all it will be toward GitHub rather than GitLab. That said, delighted for folks to experiment.
I can't speak for either team, but I do not think there's a consensus there yet, although Ooni do seem to be using GitHub extensively already.
Great! Think we're completely on the same page. :)
Some team (snowflake?) to use gitlab exclusively... So there's a precedent in the idea of migrating at least some teams to GitLab permanently.
Gotcha. My understanding is that Snowflake uses GitLab whereas the Network team and Ooni (?) are moving toward GitHub. Snowflake is tiny by comparison, which is why I suspect if we're going to move at all it will be toward GitHub rather than GitLab. That said, delighted for folks to experiment.
Ooni use GitHub as their main development platform, including tickets and pull requests.
In the network team, we've tried using GitHub and various GitLab instances for a few different things. But we tend to want to retain control of our git and tickets. So at the moment, we use GitHub as a git mirror, for pull request review, and to trigger branch and pull request CI on Travis and Appveyor. If GitLab can work with Travis and Appveyor, then that would make the transition easier for us. (We also use tor's git and Jenkins, for CI, and to build nightly binaries.)
But I have no idea what GitLab will do for us. So it's very hard for me to have an informed opinion on any transition.
Can we please create a list of:
the things GitLab MUST give us
these are our acceptance criteria: if the migration doesn't do the thing, we should roll it back and try again
the things GitLab SHOULD NOT give us:
these are our known sacrifices: if the migration loses the thing, we agree to accept it anyway
Anything not listed might be included, if it's easy to do. But we can't rely on it.
Who can create a list like this, and when can we have it ready?
(Or is there an existing list?)
So far it seems that we only have 1 feature from trac that cannot migrate into gitlab (the parent/child relationship between tickets), but we can have something similar by adding relationships (links) between tickets.
most of the time, i use parent/child relationships as just that, a relationship, not specifically for a hierarchy. this could easily be replaced by just mentioning tickets in the summary. for more elaborate things, the milestone support for gitlab is enough, imho.
There is one other big issue to resolve in gitlab. Right now people need to have an account in gitlab to be able to file new issues. We need anybody to be able to create issues in gitlab (cypherpunks account in trac).
Open the GitLab instance up. I tried to look at https://dip.torproject.org/ to see what my projects look like on it but I'm presented with a login page. As an open source developer this makes it DOA right from the starting gate. :)
Atagar, projects have not been migrated yet, but you can still check what everything looks like by using your tpo email and requesting a password reset.
There is one other big issue to resolve in gitlab. Right now people need to have an account in gitlab to be able to file new issues. We need anybody to be able to create issues in gitlab (cypherpunks account in trac).
What's "cypherpunks account in Trac"?
We could just open registration on GitLab. We need to keep in mind this could create exactly the same kind of issue we're having right now in Trac, namely that we have thousands of "junk" users (see #29420 (moved)).
most of the time, i use parent/child relationships as just that, a relationship, not specifically for a hierarchy. this could easily be replaced by just mentioning tickets in the summary. for more elaborate things, the milestone support for gitlab is enough, imho.
I use parent/child relationships to get an automatically updated list of related tickets. I don't know if GitLab does that.
There is one other big issue to resolve in gitlab. Right now people need to have an account in gitlab to be able to file new issues. We need anybody to be able to create issues in gitlab (cypherpunks account in trac).
We could just open registration on GitLab. We need to keep in mind this could create exactly the same kind of issue we're having right now in Trac, namely that we have thousands of "junk" users (see #29420 (moved)).
I'd like to know what our solution is for account and form spammers.
I've added a "need to check" section to the document, and moved some of the features to that section.
I checked the GitLab integrations: Jenkins works, but Travis and Appveyor don't. So we need to keep mirroring git to GitHub for Travis and Appveyor.
Yeah, I don't know if we're talking about getting rid of GitHub here yet. One thing at a time. GitLab CI, however, might allow us to replace Travis eventually...
I also wonder how many GitLab CI runner machines we will have, and who will pay for them.
... however this is also out of scope for now. We're talking about (possibly) replacing Trac, not Trac and Jenkins all at once. One thing at a time. :)
(That said, if we do eventually replace jenkins with Gitlab CI, the builders we have now can just be repurposed for GitLab CI. It's all hardware in the end.)
So, TL;DR: no runners yet, as far as I know, but if people want to provision external ones, the sweet thing is this can be done without intervention from TPA or GitLab admins...
There are a few blockers (from network team people) about this migration:
ticket number preservation
They want to avoid collisions between trac ticket numbers and gitlab issue numbers. That would mean having new numbers for new tickets when starting to use gitlab officially.
add all tickets (including closed ones)
They want to have ALL tickets from trac in gitlab to preserve the history of Tor in one place.
get all info from each ticket into an issue (including comments in the trac ticket added as a 'trac user' to the gitlab issue)
This would mean to have each comment from each trac ticket as a comment in the gitlab issue. The possible solution would be to have a 'trac user' in gitlab that is the one making all the comments that are being migrated from trac.
If we include these 3 points in the migration, then we do not need to archive trac, and it could be decommissioned once the migration is complete.
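The third point, posting every imported comment through a single 'trac user' while keeping the original author visible, could be sketched as follows. The `TracComment` fields, the `IMPORT_USER` name and the shape of the returned dictionary are all assumptions made for illustration; this is not the payload of any real migration tool or of the GitLab API.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TracComment:
    author: str  # Trac username of the original commenter
    date: str    # kept as an opaque string in this sketch
    text: str    # comment body as stored in Trac


IMPORT_USER = "trac"  # hypothetical name for the shared import account


def to_gitlab_note(comment: TracComment) -> Dict[str, str]:
    """Build a note posted by the import account that credits the author."""
    header = f"{comment.author} commented on {comment.date} (imported from Trac):"
    return {
        "posted_by": IMPORT_USER,
        "body": f"{header}\n\n{comment.text}",
    }


# Made-up comment used only to show the resulting note.
note = to_gitlab_note(TracComment("alice", "2019-03-01", "Patch attached."))
print(note["posted_by"])
print(note["body"])
```

Keeping the author and date in the body means the history stays readable even though every imported note belongs to the same account.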
Agreed. I think it would be essential to keep that. Any self-respecting migration tool should allow us to "dump" all the trac tickets into a (single!) GitLab project, keeping ticket numbers.
They want to avoid collisions between trac ticket numbers and gitlab issue numbers.
This, however, seems to say something else: does it mean that we don't want Trac ticket #1 to be the same ticket as GitLab ticket #1? That would be in contradiction with "ticket number preservation" in my mind.
That would mean having new numbers for new tickets when starting to use gitlab officially.
I interpret this as meaning that, assuming we migrate Trac tickets from 1 to N when Trac is made readonly (for the migration, it can be turned off after), the next ticket in gitlab will be N+1?
add all tickets (including closed ones)
They want to have ALL tickets from trac in gitlab to preserve the history of Tor in one place.
Sure, that should be done. Then we have this "legacy" gitlab project with a humongous pile of tickets like we have in Trac right now, but we can "split" those up as needed by moving tickets around with the API.
get all info from each ticket into an issue (including comments in the trac ticket added as a 'trac user' to the gitlab issue)
This would mean to have each comment from each trac ticket as a comment in the gitlab issue. The possible solution would be to have a 'trac user' in gitlab that is the one making all the comments that are being migrated from trac.
That makes sense as well, I'd be happy to see that happen, and I think this is all the kind of stuff Tracboat should do.
I would still put Trac readonly during and after the migration, then do one last archival to the Internet archive. I would then create a "redirection site" that would do things like:
And then trac can be totally decommissioned (although I would keep backups for a while, just to be sure, of course, but that's part of our decommissioning procedure anyways).
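As a rough idea of what such a "redirection site" might do for tickets, here is a minimal sketch that maps an old Trac ticket path to an issue in a single imported project. The target host and project path are placeholders, and the assumption that ticket numbers survive the import unchanged is exactly the requirement discussed above, not an established fact.

```python
import re
from typing import Optional

# Placeholder target: one "legacy" project holding all imported tickets.
GITLAB_LEGACY_ISSUES = "https://gitlab.example.org/legacy/trac/issues/{number}"

# Path layout of ticket pages on the existing Trac site.
TICKET_PATH = re.compile(r"^/projects/tor/ticket/(\d+)/?$")


def redirect_target(path: str) -> Optional[str]:
    """Return the new URL for a Trac ticket path, or None if unknown."""
    match = TICKET_PATH.match(path)
    if match is None:
        return None
    return GITLAB_LEGACY_ISSUES.format(number=int(match.group(1)))


print(redirect_target("/projects/tor/ticket/1234"))
print(redirect_target("/projects/tor/wiki/WikiStart"))
```

Paths that return `None` would need their own rule, for example a wiki mapping or a fallback to the Internet Archive copy.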
Agreed. I think it would be essential to keep that. Any self-respecting migration tool should allow us to "dump" all the trac tickets into a (single!) GitLab project, keeping ticket numbers.
Tickets will be imported by team/project. It will not work for us to have ALL trac tickets in one project in gitlab.
And that brings me to the question of where we are going to keep sysadmin tickets in gitlab. I was thinking of its own group in gitlab, but you may have another idea for it.
They want to avoid collisions between trac ticket numbers and gitlab issue numbers.
This, however, seems to say something else: does it mean that we don't want Trac ticket #1 to be the same ticket as GitLab ticket #1? That would be in contradiction with "ticket number preservation" in my mind.
Sorry that I was not clear. Any new ticket in gitlab will have a number that has not been assigned in trac yet. We preserve the number for tickets that already exist.
That would mean having new numbers for new tickets when starting to use gitlab officially.
I interpret this as meaning that, assuming we migrate Trac tickets from 1 to N when Trac is made readonly (for the migration, it can be turned off after), the next ticket in gitlab will be N+1?
Yes.
There is one other big issue to resolve in gitlab. Right now people need to have an account in gitlab to be able to file new issues. We need anybody to be able to create issues in gitlab (cypherpunks account in trac).
I feel like this is one of the most important issues, given how Github and Gitlab treat Tor users (try to make an account on Gitlab with a throwaway mail using Tor).
Tickets will be imported by team/project. It will not work for us to have ALL trac tickets in one project in gitlab.
I don't see why that has to be the case. We could (more!) easily import everything in a single project and then, post-import, split tickets up between projects.
Not doing so will make it impossible to fulfill that first requirement, as there will not be a stable URL on GitLab's side for ticket #1234 (closed) from Trac.
And that brings me to the question of where we are going to have sysadmin tickets in gitlab. I was thinking of its own group in gitlab but you may have another idea for it.
Sure, they can be moved to their own group after the import, like everything else.
Sorry that I was not clear. Any new ticket in gitlab will have a number that has not been assigned in trac yet. We preserve the number for tickets that already exist.
Agreed, although you need to understand that ticket numbering is per project in GitLab. (Strictly speaking, that's also the case in Trac, but we have only a single project in Trac, while we already have multiple projects in GitLab.)
So in practice, we will have multiple #1234 (closed) tickets in GitLab. This is why we need to import everything in a single project at first so that we have consistent numbering. Then when we move issues around in GitLab, the numbers will change, but there will be a note in the "legacy" tickets pointing to the new one.
I don't know how else you could implement those constraints otherwise.
We need to find a way to meet this requirement (ticket numbers unique across the tor project group) while keeping each ticket in its own project. Check the plan document to see the structure we are proposing (it is at the end of the document).
Ahf is working on that already. I think the idea is to have gaps in ticket numbers in projects to be able to fulfill this requirement.
We need to find a way to meet this requirement (ticket numbers unique across the tor project group) while keeping each ticket in its own project. Check the plan document to see the structure we are proposing (it is at the end of the document).
The process I'm suggesting (import everything in a single project and move it into separate projects in a subsequent operation) fulfills this requirement.
Ahf is working on that already. I think the idea is to have gaps in ticket numbers in projects to be able to fulfill this requirement.
I don't think it does. It will work for a single project (say the tor little t project), but it can't work for all.
Just to be clear, I'm fine with having tickets split up in different projects. I just don't think it's possible to have redirections working if we split them up at import time.
The answer is: there's no way to know, short of making an explicit, 40 thousand long list of redirections. I think that's deeply impractical, and counter to the spirit of the requirement.
Instead, what I am proposing is this: tickets #1 and #2 (closed) would map into:
Issue #1 in GitLab would have a label "component: Core tor/tor" and #2 (closed) would have a label "component: Internal Services/Services Admin team". Then a post-processing script, which can easily be made by only talking with the GitLab API, moves those tickets to the right project, their final destination stated above:
Yes. I understand the problem you are describing and the solution you have. And I'm not sure how we are going to have something usable in gitlab with all the issues in one project (legacy in your example).
We could have all trac issues in a 'legacy project' and then any new issue in its own project (the structure that we proposed in the gitlab migration document). But that will still make it hard to manage issues.
Right now we have (as a way to test) a project Scalability (https://dip.torproject.org/torproject/scalability) that is at the base of the Tor project group and is shared with other groups (like metrics and core). Still we can not add issues from scalability to the metrics and core kanban boards...
I understand the problem but I do not think the legacy project is a solution that works for us.
You're absolutely right: it would be awful to have all tickets in the same project in GitLab.
That's not what I'm proposing here.
What I am proposing is that we import all tickets in the same legacy project BUT we then move each ticket to the right project outside of legacy.
The goal of importing everything in the same project is to make redirections workable. Without this, we have to guess, on the redirection side, which project the ticket ended up in. This could be quite difficult to implement and will lead to a complex redirection system. We're lucky enough to have a "flat" numbering space for the ticket numbers in Trac (there's only one list of tickets), so it would be great to have the same thing on GitLab's side.
By importing all tickets in the same project and then moving them, we accomplish this: the redirector can point to the legacy project, which in turn will point to the right project the issue has been moved to. I think it's a win-win...
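To make the redirection argument concrete, here is a minimal sketch of what the redirector could compute when all tickets land in one legacy project first: a single pattern covers every ticket, and GitLab's own "moved" pointer takes care of the second hop. The legacy project path `torproject/legacy`, the Trac path pattern and the exact GitLab issue URL layout are all assumptions for illustration and would need checking against the real sites.

```python
import re
from typing import Optional

GITLAB_BASE = "https://dip.torproject.org"  # GitLab instance mentioned in this thread
LEGACY_PROJECT = "torproject/legacy"        # hypothetical name for the import project

# Assumed Trac path shape: /ticket/<n>, optionally under /projects/tor
_TICKET_RE = re.compile(r"^/(?:projects/tor/)?ticket/(\d+)/?$")


def redirect_target(trac_path: str) -> Optional[str]:
    """Map a Trac ticket path to the same-numbered issue in the legacy project.

    Returns None for paths that are not ticket URLs.
    """
    match = _TICKET_RE.match(trac_path)
    if match is None:
        return None
    number = int(match.group(1))
    # Assumed GitLab URL layout; older versions omit the "/-" segment.
    return f"{GITLAB_BASE}/{LEGACY_PROJECT}/-/issues/{number}"
```

With per-project imports, the same function would instead need a lookup table from ticket number to project, which is the ~40,000 entry list discussed in this thread.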
I understand the problem but I do not think the legacy project is a solution that works for us.
... but I'm ready to accept that as well. It's the best solution I can think of but I'd be happy to hear about possible alternatives. The only one I can think of is to have an explicit list of ticket N -> GITLAB_PROJECT_NAME/Y with ~40,000 entries, and I think that would be a pain in the ass to create and carry around forever. :)
In summary, I agree with you that having all tickets in the same project is not workable, and that's not what I'm suggesting.
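For reference, the "import into legacy, then move" step could be sketched roughly as follows. The planning part is a pure function that picks a destination from the component label; the actual move uses GitLab's issue move endpoint (`POST /projects/:id/issues/:issue_iid/move`). The numeric project IDs, the label-to-project table and the token handling are placeholders invented for this example, not the real configuration.

```python
import json
import urllib.request
from typing import Dict, List, Tuple

API = "https://dip.torproject.org/api/v4"

# Hypothetical mapping from component label to destination project ID.
DESTINATIONS: Dict[str, int] = {
    "component: Core tor/tor": 101,
    "component: Internal Services/Services Admin team": 102,
}


def plan_moves(issues: List[dict], destinations: Dict[str, int]) -> List[Tuple[int, int]]:
    """Return (issue_iid, to_project_id) pairs for issues with a known component label."""
    moves = []
    for issue in issues:
        for label in issue["labels"]:
            if label in destinations:
                moves.append((issue["iid"], destinations[label]))
                break  # one destination per issue
    return moves


def move_issue(legacy_project_id: int, iid: int, to_project_id: int, token: str) -> dict:
    """Ask GitLab to move one issue out of the legacy project."""
    request = urllib.request.Request(
        f"{API}/projects/{legacy_project_id}/issues/{iid}/move",
        data=json.dumps({"to_project_id": to_project_id}).encode(),
        headers={"PRIVATE-TOKEN": token, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Issues without a recognized component label are simply left in the legacy project by `plan_moves`, so they can be triaged by hand later.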
I added comments in the GitLab migration plan. The gist of my modifications is as follows:
added the migration itself as a "challenge"
added "milestones" as a possible solution for "ticket relationships"
added details and possible solutions for the irc bot problem
added another possible solution (OpenPGP signatures on commits and tags) to the "gitolite" problem
expanded on the CI section (we will still use jenkins at first)
i'm not sure it's totally accurate to say trac is unmaintained upstream. the 1.2.x branch had a release about a month ago (aug. 2019) and they also released a new stable branch (1.4) not long ago... so it's still maintained
also outlined that Trac also uses javascript in the table
finally, i think i identified a new issue with git repository redirections:
== New issue: git repository redirections
Finally, i'm a little confused about the way the group/project namespace is organized... i see that everything seems to be under "torproject/foo" except "web/foo" and i wonder why it's been done that way. I would definitely put stuff under tpa/* for example, and have one project per service, with all the service admins stuff under services/ maybe?
I'm not sure how best to organize this, but having "everything under torproject/ except not quite" doesn't seem like a great match ;) Couldn't we replicate the hierarchy from https://gitweb.torproject.org/ ? that would make git repository redirections much easier...
Note that renaming projects in gitlab is cheap and reliable (it keeps redirects) so we can also fix this later if we need to, i think, but i'd like to get it right, at least in terms of redirections. After all, we don't want to tell people that all their git URLs are broken now.