Opened 4 months ago

Last modified 5 weeks ago

#30857 new project

migrate (some projects? everything?) from trac to gitlab

Reported by: anarcat
Owned by:
Priority: Medium
Milestone:
Component: Internal Services/Services Admin Team
Version:
Severity: Normal
Keywords: tickets-migration
Cc: gk, intrigeri, hiro, gaba, mcs, boklm
Actual Points:
Parent ID: #29400
Points:
Reviewer:
Sponsor:

Description (last modified by anarcat)

Having both Trac and GitLab for TPO is not desirable in the long term, both for maintenance and consistency across projects.

If GitLab is okay for people, we should consider migrating to it and turning off (or turning into a static website) this Trac instance.

This ticket explores the practicalities behind this project.

Child Tickets

Ticket | Status | Owner | Summary | Component
#31052 | new | qbi | Guest accounts in the ticketing system | Internal Services/Services Admin Team
#31690 | new | qbi | study trac.torproject.org archival possibilities | Internal Services/Service - trac

Change History (50)

comment:1 Changed 4 months ago by gaba

Cc: gaba added

comment:2 Changed 4 months ago by mcs

Cc: mcs added

comment:3 Changed 4 months ago by anarcat

there are four possible ways to go forward here:

  0. stop GitLab: assume that we don't, after all, want to use GitLab and shut it down. I am not sure anyone is actually proposing this, but I'm putting it out there.
  1. status quo: no or piecemeal migration. dip/gitlab exists and teams migrate organically or not at all, when/if they want, and we keep trac forever. probably not acceptable.
  2. migrate team by team: a) pick a team. b) convince team to migrate. c) migrate all issues, code and wiki pages to gitlab d) move on to next team e) congrats, you're done. seems like the ideal plan to me, because it can be done incrementally and allows for progressive testing and ironing out of issues. might be difficult to automate.
  3. migrate in one shot. we just bite the bullet and migrate everything and everyone all at once, with a flag day when Trac becomes readonly. radical solution. might be faster and easier to perform than the other solution (less labor) but is much riskier because, if things break, we need to fix them VERY FAST NOW and people will/may be upset

In the parent ticket, I mentioned tracboat as a tool that might be used to migrate from Trac to GitLab. I am not sure it supports migrating one project/component at a time, at least it's not obvious how to do so in the documentation.

Another problem is how to deal with Trac in the long term. A complete migration wouldn't be complete if Trac still requires maintenance. For this, I see those options:

  1. the golden redirect set: every migrated ticket and wiki page has a corresponding ticket/wiki page in GitLab and a gigantic set of redirection rules makes sure they are mapped correctly. probably impractical, but solves the maintenance problem possibly forever.
  2. read-only Trac: user creation is disabled and existing users are locked from making any change to the site. only a temporary or intermediate measure.
  3. fossilization: Trac is turned into a static HTML site that can be mirrored like any other site. can be a long term solution and a good compromise with a possibly impossible to design and therefore failing (because incomplete) set of redirection rules.
  4. destruction: we hate the web and pretend link rot is not a problem and just get rid of the old site, assuming everything is migrated and people will find their stuff eventually. probably not an option.
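To make the "golden redirect set" option a bit more tangible: for tickets, the redirect could in principle be computed from a lookup table rather than written out as one rule per page. The following is only a sketch under assumptions. `GITLAB_BASE`, the project paths, the issue URL layout and the `TICKET_MAP` table are all hypothetical placeholders; the real values would have to come out of an actual migration.

```python
import re
from typing import Optional

# Hypothetical values, for illustration only: the real GitLab URL layout
# and the Trac-to-GitLab mapping would be produced by the migration.
GITLAB_BASE = "https://gitlab.example.org"
TICKET_MAP = {
    # Trac ticket number -> (GitLab project path, GitLab issue id)
    30857: ("tpa/team", 1),
    29400: ("tpa/team", 2),
}

# Path of a ticket page on the current Trac instance.
TRAC_TICKET_RE = re.compile(r"^/projects/tor/ticket/(\d+)/?$")


def redirect_target(trac_path: str) -> Optional[str]:
    """Return the GitLab URL for a Trac ticket path, or None if unmapped."""
    match = TRAC_TICKET_RE.match(trac_path)
    if match is None:
        return None  # not a ticket URL; wiki pages would need their own rule
    target = TICKET_MAP.get(int(match.group(1)))
    if target is None:
        return None  # ticket not migrated; caller could fall back to an archive
    project, issue_id = target
    return f"{GITLAB_BASE}/{project}/-/issues/{issue_id}"
```

Anything that returns `None` here is exactly the "incomplete set of redirection rules" risk mentioned above, so a fallback (static copy or Internet Archive) would still be needed.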

As a safety precaution, I have already started step 3, in a way. I am working with Archive Team to send a copy of the Trac website into the internet archive, thanks to archivebot. This will also allow us to build a good "ignore set", a list of patterns to ignore to avoid getting lost in the website when/if we decide to create a static HTML copy. It's also a good practice to have a backup of all of our stuff in the internet archive.

This currently consists of two crawl jobs:

  • https://trac.torproject.org/ - just feed the site into wpull (this is what archivebot does, basically) and tweak the ignores to skip the nastier stuff. Ignore set currently includes:
    • ^https?://trac\.torproject\.org/projects/tor/wiki/.*[?&]action=diff(&|$): diffs covered by previous revisions
    • ^https?://trac\.torproject\.org/projects/tor/timeline\?: requires login
    • ^https?://trac\.torproject\.org/projects/tor/query.*[?&]order=(?!priority): tries to avoid cardinality explosion in sorted results
    • ^https?://trac\.torproject\.org/projects/tor/query.*[&?]desc=1: skip reverse-sorted entries, redundant
    • ^https?://trac\.torproject\.org/projects/tor/changeset: changesets not configured in Trac
    • ^https?://gitweb\.torproject\.org/: not under threat right now
    • ^https?://trac\.torproject\.org/projects/tor/query\?(.*&)?(reporter|priority|component|severity|cc|owner|version)=
  • a single pull of each ticket, from 1 to the latest ticket (#30856 at the time of writing, full list)
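For anyone who wants to check a URL against the ignore set before suggesting changes, here is a small sketch that applies the patterns listed above with Python's `re` module. It assumes Python's regex dialect is close enough to what the crawler uses for these patterns; the helper names `is_ignored` and `ticket_urls` are made up for this example.

```python
import re

# Ignore patterns copied from the list above.
IGNORE_PATTERNS = [
    r"^https?://trac\.torproject\.org/projects/tor/wiki/.*[?&]action=diff(&|$)",
    r"^https?://trac\.torproject\.org/projects/tor/timeline\?",
    r"^https?://trac\.torproject\.org/projects/tor/query.*[?&]order=(?!priority)",
    r"^https?://trac\.torproject\.org/projects/tor/query.*[&?]desc=1",
    r"^https?://trac\.torproject\.org/projects/tor/changeset",
    r"^https?://gitweb\.torproject\.org/",
    r"^https?://trac\.torproject\.org/projects/tor/query\?(.*&)?"
    r"(reporter|priority|component|severity|cc|owner|version)=",
]
IGNORE_RES = [re.compile(pattern) for pattern in IGNORE_PATTERNS]


def is_ignored(url: str) -> bool:
    """Return True if the URL matches any pattern in the ignore set."""
    return any(regex.search(url) for regex in IGNORE_RES)


def ticket_urls(last_ticket: int) -> list:
    """Build the list of ticket URLs from 1 to last_ticket, inclusive."""
    base = "https://trac.torproject.org/projects/tor/ticket/"
    return [f"{base}{number}" for number in range(1, last_ticket + 1)]
```

Note that a query sorted by priority is deliberately kept by the third pattern (negative lookahead), while any other sort order is skipped.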

This is being coordinated on the #archivebot channel in efnet.

Last edited 4 months ago by anarcat

comment:4 Changed 4 months ago by anarcat

Update: Fossilization seems less and less practical. The archivebot jobs are yielding large results: an archive of only the tickets (https://trac.torproject.org/projects/tor/ticket/\d+) is at 400MB after 6000 tickets (1/5th of the tickets), which would yield around 2GB, excluding the wiki. The full crawl is close to 1GB with less than 10% of the crawl done.

Therefore a full static copy of the Trac website would be at least 10GB, quite impractical. It might be worth looking into proper redirects or whether it's acceptable to have those links broken. Alternatively, we could simply redirect to IA or assume people will look there for missing bits.
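The estimates above are plain linear extrapolation. As a sketch, assuming the archive grows proportionally with the number of pages fetched (which is optimistic, since each new page can add links to the queue), the arithmetic looks like this. The total of 30000 tickets is derived from "6000 tickets is 1/5th", and the function name is made up for this example.

```python
def extrapolate(observed_size: float, units_done: float, units_total: float) -> float:
    """Linearly scale an observed archive size to the whole crawl."""
    if units_done <= 0 or units_total < units_done:
        raise ValueError("need 0 < units_done <= units_total")
    return observed_size * units_total / units_done


# Tickets only: 400MB after 6000 of roughly 30000 tickets.
tickets_mb = extrapolate(400, 6000, 30000)   # 2000.0 MB, i.e. around 2GB

# Full crawl: about 1GB with (at most) 10% of the crawl done.
full_crawl_gb = extrapolate(1, 10, 100)      # 10.0 GB, a lower bound
```

Because the "10% done" figure is itself an upper bound on progress, the 10GB result should be read as a floor, not a forecast.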

Last edited 4 months ago by anarcat

comment:5 Changed 4 months ago by atagar

Hi anarcat. Why is 10 GB impractical? In terms of disk storage that's pretty tiny. Is there something about the content that makes that size problematic?

comment:6 Changed 4 months ago by boklm

Cc: boklm added

comment:7 Changed 4 months ago by anarcat

10GB was a low-end estimate. The final crawl will be much, much larger. Just to give you a range, the crawl of each ticket page was completed, here:

https://archive.fart.website/archivebot/viewer/job/5vytc

It's around 700MB, compressed. That might seem like a lot, but that's just for the tickets. The crawl job for the entire Trac site is still ongoing, and is currently at 40GB, with 160,000 URLs crawled, and still 500,000 more to go, so we can assume it will be at least 200GB, but we just don't really know until the crawl is finished (because each new page can yield new links).

The problem is not 10GB, it's the 200GB or 500GB or more. :) Maybe it's fine to have such a large dataset around forever, but from other experience, I see we have trouble holding on to that stuff (see for example the problems we have with archive.tpo now, in #29697).

So, TL;DR: it's not 10GB. It will be closer to 200GB, maybe a terabyte, for the full crawl.

comment:8 Changed 4 months ago by teor

Do we know *why* it's so big?

Is each page large, or have we missed a cardinality explosion somewhere?
Are there some elements we can strip out of each page?
Are there some links or sections we can ignore (for example, ticket queries?)

comment:9 Changed 4 months ago by anarcat

I don't know, but there are a few theories.

Archivebot crawls outgoing links as well, and there are probably a lot of those in the wiki. It crawls only one level out, but it probably still adds up. A simpler crawl would obviously be smaller.

There are probably cardinality explosions and things we can ignore. You are welcome to contribute to improving the crawl, still in progress here:

http://dashboard.at.ninjawedding.org/?showNicks=1

grep for trac or torproject.

It's now at 400k links parsed and still 400k to go, and 47GB.

comment:10 Changed 4 months ago by atagar

Hi anarcat. This ticket's description argues against having both trac and gitlab, but does not explain why we're migrating to gitlab. Where was this decided? Mind pointing this ticket toward that thread?

Last year there was some discussion regarding github, but my understanding was that we're keeping our tpo git and trac instances as they are.

comment:11 Changed 4 months ago by anarcat

So while I personally believe we should migrate to GitLab for a variety of reasons, we may want to keep running Trac forever instead. I suspect that will not be the case, but I'm open to the idea.

What I am strongly against, however, is running *both* pieces of software indefinitely. They are *both* complex pieces of machinery (GitLab maybe even more so) and it would be nonsensical to run them both at the same time. That would make basically *everyone* unhappy: people unhappy with trac or GitLab would still have to deal with it, and I, as a sysadmin, would still need to maintain both as well. That's the "status quo" option 1, above, and I really think it's a bad idea.

If we're going to use GitLab, we should migrate. I don't think it's reasonable to maintain both services forever. I must admit that I was assuming that, by setting up the GitLab instance, a decision had been made to migrate as well, but you are right that this decision hasn't been clearly made yet either.

Therefore, this is also a space to have that discussion. I have heard rumours of concern about GitLab, but nothing clearly substantiated yet.

Also, just to keep the idea open and recognize a decision hasn't actually been made, I'll add the "option zero" of not using GitLab at all and shutting down the experiment. I feel that's also the wrong way to go as people are generally enthusiastic about the project, but I'll keep an open mind. :)

comment:12 Changed 4 months ago by anarcat

Description: modified (diff)

comment:13 Changed 4 months ago by hiro

The wiki of trac can easily be redirected without a gigantic redirect file because it can be set in the section and page directly.

Tickets are a different story.

Gitlab is also organized in projects and we have been using Trac with tags. We might not have a complete mapping between the two that doesn't overlap two projects and we might have to make some hard choices.

Furthermore, when we did the last survey about trac vs github a few years ago, we discussed that trac links had been used as references in papers, so preserving those was a hard requirement.

I am personally for making trac read only and above all not searchable. That way we save a lot of resources and we still preserve old tickets.

Finally, I think we should start to migrate active tickets of certain projects only at this point, so that we don't go through a radical switch from one system to another, also while we just freeze old tickets.

comment:14 Changed 4 months ago by anarcat

The wiki of trac can easily be redirected without a gigantic redirect file because it can be set in the section and page directly.

Tickets are a different story.

Gitlab is also organized in projects and we have been using Trac with tags. We might not have a complete mapping between the two that doesn't overlap two projects and we might have to make some hard choices.

I think it's a similar problem, actually: every ticket, every wiki page is in this single project in Trac. I doubt that it makes sense to keep the same "one gigantic wiki" approach if/when we migrate to GitLab: each project or team could have its own wiki...

So we will *probably* want to split wikis *and* tickets up by "component" or some sort of delimiter. This could be done in the migration or after, in GitLab.
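A rough sketch of what "split by component" could mean in practice, assuming tickets are exported as simple records with a `component` field. The two component names are taken from the child tickets of this ticket; the GitLab project paths and the fallback project are hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from Trac component to GitLab project path.
COMPONENT_TO_PROJECT = {
    "Internal Services/Services Admin Team": "tpa/team",
    "Internal Services/Service - trac": "tpa/trac",
}
# Hypothetical catch-all for components nobody has claimed yet.
DEFAULT_PROJECT = "legacy/trac"


def split_by_component(tickets):
    """Group ticket ids by the GitLab project their component maps to."""
    grouped = defaultdict(list)
    for ticket in tickets:
        project = COMPONENT_TO_PROJECT.get(ticket["component"], DEFAULT_PROJECT)
        grouped[project].append(ticket["id"])
    return dict(grouped)
```

The catch-all project is where the "hard choices" mentioned earlier would show up: anything that lands there has no obvious home in GitLab.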

Furthermore, when we did the last survey about trac vs github a few years ago, we discussed that trac links had been used as references in papers, so preserving those was a hard requirement.

Yes, that was my understanding as well. One thing I am thinking of is to make sure that, in the migration, the original URL of the migrated page (ticket or wiki) is retained somewhere in GitLab so that it can be searched. That way we could have a redirection that finds that stuff more easily. I don't know how practical that can be, but that's the kind of stuff we'll find out about when we start working on the migration.

I am personally for making trac read only and above all not searchable. That way we save a lot of resources and we still preserve old tickets.

Well, maybe I'm not familiar enough with Trac, but what does that *actually* mean? We might be able to disable all users and disable user registration, but then people can still search for tickets and crawl the website and cause trouble. If we disable all queries, then people can't find tickets any more, and I'm not even sure we can just disable searching like that.

In any case, this all means maintaining Trac forever: "readonly" still means "Trac is installed, running, upgraded and maintained", and I would very much like to stop doing that eventually.

Finally, I think we should start to migrate active tickets of certain projects only at this point, so that we don't go through a radical switch from one system to another, also while we just freeze old tickets.

So my enquiries so far into the migration tools suggest that they (well, "it", really) proceed in one big batch, per Trac project. Because we have a single Trac project, it will actually be pretty difficult to migrate tickets one at a time: I suspect that will not be possible at all, and especially tricky if we want to retain ticket number associations.

To put it quite bluntly, we need to shit or get off the pot here at some point. :) Maybe a few teams can start using it for new projects: the website stuff is a good example. Or small projects can be migrated if they don't mind losing ticket references.

But if ticket portability is that critical, I think the only way this can be ensured in the long term is to do a proper migration, with Trac ticket metadata embedded inside GitLab tickets.
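To illustrate what "Trac ticket metadata embedded inside GitLab tickets" might look like, here is a minimal sketch that prepends the original Trac URL and a few fields to the issue body, so the old reference stays searchable. The record layout and the header format are assumptions made for this example, not what any existing migration tool produces.

```python
TRAC_TICKET_URL = "https://trac.torproject.org/projects/tor/ticket/{id}"


def issue_description(ticket: dict) -> str:
    """Build an issue body that keeps the Trac reference searchable."""
    header = [
        f"Migrated from Trac: {TRAC_TICKET_URL.format(id=ticket['id'])}",
        f"Trac ticket: #{ticket['id']}",
        f"Reported by: {ticket['reporter']}",
        f"Component: {ticket['component']}",
    ]
    # Metadata first, then a blank line, then the original description.
    return "\n".join(header) + "\n\n" + ticket["description"]
```

With the full Trac URL in the body, a search for an old link finds the migrated issue even if ticket numbers change during the migration.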

Because there's no way we can keep maintaining Trac forever and I am not sure at all we'll be able to permanently archive it. That's still an open question, mind you, but I can't help but feel that if we're going to migrate tickets anyways, we might as well do it correctly...

comment:15 Changed 4 months ago by anarcat

and before everyone goes off the rails freaking out about me shutting down Trac tomorrow forever, please:

💖 💖 💖 😅 DON'T PANIC! 😛 😻 💖 💖 💖

the main objective in opening this ticket is to brainstorm and document how to *possibly* migrate from Trac to GitLab. it's going to take some time and we'll get to talk about it.

if you have objections, they are welcome and you can state them here! but please stay calm, we're doing this for the win and hopefully make everyone happier with our tools, not the opposite. :)

comment:16 Changed 4 months ago by anarcat

some more numbers on trac:

  • ~1GB of attachments
  • 4GB pgsql database

the actual server uses around 25GB of disk space because of random junk here and there but that's the very minimum it can be trimmed down to. naturally, we can keep *that* data forever, the problem is keeping the app running on top of that...

comment:17 Changed 4 months ago by atagar

I must admit that I was assuming that, by setting up the GitLab instance, a decision had been made to migrate as well, but you are right that this decision hasn't been clearly made yet either.

Therefore, this is also a space to have that discussion. I have heard rumours of concern about GitLab, but nothing clearly substantiated yet.

Ahhh! Thank you anarcat, this makes a lot more sense.

If some folks prefer GitLab that's great! But migrating us away from Trac is not a decision to be taken lightly, and requires community buy-in.

I was around a decade ago for our migration to Trac, and what you're proposing is a big move that impacts us all. Especially if you want to propose shutting Trac down without redirects. As I see it there's three open questions...

  1. Do we want to migrate away from Trac at all?
  2. If so, what would we prefer to move to? GitHub? GitLab? Something else?
  3. What will happen with Trac's ticket and wiki data?

This ticket is not the proper place to develop consensus on such a large move. If you'd care to pursue this I'd suggest...

  1. Open the GitLab instance up. I tried to look at https://dip.torproject.org/ to see what my projects look like on it but I'm presented with a login page. As an open source developer this makes it DOA right from the starting gate. :)
  2. Begin a thread on tor-project@ to see how the community feels about this. I suspect if we move at all folks will prefer GitHub to GitLab, but I'm definitely curious to see what people think.

comment:18 Changed 4 months ago by anarcat

If some folks prefer GitLab that's great! But migrating us away from Trac is not a decision to be taken lightly, and requires community buy-in.

I totally agree. I consider we're at the "feasibility study" stage. :)

I was around a decade ago for our migration to Trac, and what you're proposing is a big move that impacts us all. Especially if you want to propose shutting Trac down without redirects. As I see it there's three open questions...

For the record, I really, really want to have redirects, if we can't archive the entire website. I understand it's a solid requirement.

  1. Do we want to migrate away from Trac at all?
  2. If so, what would we prefer to move to? GitHub? GitLab? Something else?
  3. What will happen with Trac's ticket and wiki data?

I think there are some answers to this above, from my perspective, but this should definitely be discussed more widely, once we have a clearer idea of a possible way forward.

This ticket is not the proper place to develop consensus on such a large move. If you'd care to pursue this I'd suggest...

  1. Open the GitLab instance up. I tried to look at https://dip.torproject.org/ to see what my projects look like on it but I'm presented with a login page. As an open source developer this makes it DOA right from the starting gate. :)

It's already kind of open, here: https://dip.torproject.org/explore

We could improve the splash page, for sure.

  2. Begin a thread on tor-project@ to see how the community feels about this. I suspect if we move at all folks will prefer GitHub to GitLab, but I'm definitely curious to see what people think.

Hear hear. I'm all for discussing this more widely, but I also think it's a good idea to have a plan first.

I intend to research the topic a little more, maybe do a few actual tests (archiving trac into HTML, testing a migration to a test gitlab project or fake instance) to see what a migration could look like and/or how much time it would take. Then we can come up with something more concrete that people will understand better than the current vague idea of where we're going. :)

More concretely, I'm thinking of writing a design doc for the migration; hopefully it will make everything and the options a little more concrete. Trac was one of the first things added to my priority list when I was hired at TPI, three months ago, and it's still high on my radar. I haven't had any concrete bug reports other than the occasional "trac is slow" which is generally transient, so it's hard to figure out what the next step is. But people are getting more and more aggravated about the service and I think we need to start to think about the exit strategy...

How does that sound?

One thing I don't really want is a huge flamewar/bikeshed on this, so I think doing this research is definitely useful.

Also note that this ticket is part of #29400 which explicitly says:

We are going to evaluate Gitlab as a replacement for trac, gitweb.tpo, git-rw.tpo, github.com.

Then it is mentioned in the Brussels admin meeting minutes that:

Some team (snowflake?) to use gitlab exclusively. move (copy + add link to gl) existing tickets to gitlab service (not by tsa but by gitlab team)

Runners could be provided by anyone. so, it could be done outside of tpa/tpo for evaluation, and if we like it in the end we can add some runners later.

So there's a precedent in the idea of migrating at *least* some teams to GitLab permanently. I'm taking the next step and asking what's going to happen with the rest of trac, because I certainly don't want to keep that technical debt around too long. ;)

comment:19 Changed 4 months ago by atagar

How does that sound?

Great! Think we're completely on the same page. :)

Some team (snowflake?) to use gitlab exclusively... So there's a precedent in the idea of migrating at *least* some teams to GitLab permanently.

Gotcha. My understanding is that Snowflake uses GitLab whereas the Network team and Ooni (?) are moving toward GitHub. Snowflake is tiny by comparison, which is why I suspect if we're going to move at all it will be toward GitHub rather than GitLab. That said, delighted for folks to experiment.

comment:20 Changed 4 months ago by anarcat

Gotcha. My understanding is that Snowflake uses GitLab whereas the Network team and Ooni (?) are moving toward GitHub. Snowflake is tiny by comparison, which is why I suspect if we're going to move at all it will be toward GitHub rather than GitLab. That said, delighted for folks to experiment.

I can't speak for either team, but I do not think there's a consensus there yet, although Ooni do seem to be using GitHub extensively already.

comment:21 in reply to:  19 Changed 4 months ago by teor

Replying to atagar:

How does that sound?

Great! Think we're completely on the same page. :)

Some team (snowflake?) to use gitlab exclusively... So there's a precedent in the idea of migrating at *least* some teams to GitLab permanently.

Gotcha. My understanding is that Snowflake uses GitLab whereas the Network team and Ooni (?) are moving toward GitHub. Snowflake is tiny by comparison, which is why I suspect if we're going to move at all it will be toward GitHub rather than GitLab. That said, delighted for folks to experiment.

Ooni use GitHub as their main development platform, including tickets and pull requests.

In the network team, we've tried using GitHub and various GitLab instances for a few different things. But we tend to want to retain control of our git and tickets. So at the moment, we use GitHub as a git mirror, for pull request review, and to trigger branch and pull request CI on Travis and Appveyor. If GitLab can work with Travis and Appveyor, then that would make the transition easier for us. (We also use tor's git and Jenkins, for CI, and to build nightly binaries.)

But I have no idea what GitLab will do for us. So it's very hard for me to have an informed opinion on any transition.

Can we please create a list of:

  • the things GitLab MUST give us
    • these are our acceptance criteria: if the migration doesn't do the thing, we should roll it back and try again
  • the things GitLab SHOULD NOT give us:
    • these are our known sacrifices: if the migration loses the thing, we agree to accept it anyway

Anything not listed might be included, if it's easy to do. But we can't rely on it.

Who can create a list like this, and when can we have it ready?
(Or is there an existing list?)

comment:22 in reply to:  19 Changed 4 months ago by gaba

Replying to atagar:

How does that sound?

Great! Think we're completely on the same page. :)

Some team (snowflake?) to use gitlab exclusively... So there's a precedent in the idea of migrating at *least* some teams to GitLab permanently.

Gotcha. My understanding is that Snowflake uses GitLab whereas the Network team and Ooni (?) are moving toward GitHub. Snowflake is tiny by comparison, which is why I suspect if we're going to move at all it will be toward GitHub rather than GitLab. That said, delighted for folks to experiment.

  • Gettor is using gitlab now

About tickets:

I'm up for archiving trac (have it as a read only) with gitlab issues linking historical information about specific tickets that continue from trac.

I agree that this needs to be done component by component.
I agree that there has to be consensus in this migration for it to happen successfully.

Last edited 4 months ago by gaba

comment:23 in reply to:  21 Changed 4 months ago by gaba

Replying to teor:

Replying to atagar:

How does that sound?

Great! Think we're completely on the same page. :)

Some team (snowflake?) to use gitlab exclusively... So there's a precedent in the idea of migrating at *least* some teams to GitLab permanently.

Gotcha. My understanding is that Snowflake uses GitLab whereas the Network team and Ooni (?) are moving toward GitHub. Snowflake is tiny by comparison, which is why I suspect if we're going to move at all it will be toward GitHub rather than GitLab. That said, delighted for folks to experiment.

Ooni use GitHub as their main development platform, including tickets and pull requests.

In the network team, we've tried using GitHub and various GitLab instances for a few different things. But we tend to want to retain control of our git and tickets. So at the moment, we use GitHub as a git mirror, for pull request review, and to trigger branch and pull request CI on Travis and Appveyor. If GitLab can work with Travis and Appveyor, then that would make the transition easier for us. (We also use tor's git and Jenkins, for CI, and to build nightly binaries.)

But I have no idea what GitLab will do for us. So it's very hard for me to have an informed opinion on any transition.

Can we please create a list of:

  • the things GitLab MUST give us
    • these are our acceptance criteria: if the migration doesn't do the thing, we should roll it back and try again
  • the things GitLab SHOULD NOT give us:
    • these are our known sacrifices: if the migration loses the thing, we agree to accept it anyway

Anything not listed might be included, if it's easy to do. But we can't rely on it.

Who can create a list like this, and when can we have it ready?
(Or is there an existing list?)

ok. It started here: https://nc.riseup.net/s/TYX37BDT4eQfTiW

comment:24 in reply to:  23 Changed 4 months ago by teor

Replying to gaba:

Replying to teor:

But I have no idea what GitLab will do for us. So it's very hard for me to have an informed opinion on any transition.

Can we please create a list of:

  • the things GitLab MUST give us
    • these are our acceptance criteria: if the migration doesn't do the thing, we should roll it back and try again
  • the things GitLab SHOULD NOT give us:
    • these are our known sacrifices: if the migration loses the thing, we agree to accept it anyway

Anything not listed might be included, if it's easy to do. But we can't rely on it.

Who can create a list like this, and when can we have it ready?
(Or is there an existing list?)

ok. It started here: https://nc.riseup.net/s/TYX37BDT4eQfTiW

It looks like you've written a wish list: a list of features that we want GitLab to have.
That's a good start.

But I was asking for acceptance criteria for the data migration:

  • a list of features that GitLab must have
  • a list of data that we must migrate from Trac to GitLab

And known sacrifices:

  • a list of features that we use in Trac, that GitLab doesn't have
  • a list of data that we won't migrate to GitLab

Maybe we should set up a table for each feature, so we can track our progress:

  • how important is this feature?
  • who needs this feature?
  • are they happy with the gitlab version of this feature?
  • does trac have it?
  • does gitlab have it?
  • can we migrate the data for this feature?
  • does the data migration actually work?
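One possible shape for such a table, sketched as a small Python record so the questions above map onto explicit fields. The field names, the importance labels and the `blockers` helper are assumptions made for this example; the real table could just as well live in a spreadsheet or wiki page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FeatureStatus:
    name: str
    importance: str                 # how important: "must", "should", "nice to have"
    needed_by: List[str]            # who needs this feature
    in_trac: bool                   # does trac have it?
    in_gitlab: bool                 # does gitlab have it?
    users_happy: Optional[bool] = None         # happy with the gitlab version?
    data_migratable: Optional[bool] = None     # can we migrate the data?
    migration_verified: Optional[bool] = None  # does the migration actually work?


def blockers(features: List[FeatureStatus]) -> List[str]:
    """Names of 'must' features that are missing in GitLab or not yet verified."""
    return [
        feature.name
        for feature in features
        if feature.importance == "must"
        and (not feature.in_gitlab or feature.migration_verified is not True)
    ]
```

`None` means "not evaluated yet", so an untested "must" feature counts as a blocker until someone has actually checked the migrated data.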

We should also look at the survey we did in 2018? so we can make sure we are not missing anything.

comment:25 in reply to:  24 Changed 4 months ago by gaba

Replying to teor:

Replying to gaba:

Replying to teor:

But I have no idea what GitLab will do for us. So it's very hard for me to have an informed opinion on any transition.

Can we please create a list of:

  • the things GitLab MUST give us
    • these are our acceptance criteria: if the migration doesn't do the thing, we should roll it back and try again
  • the things GitLab SHOULD NOT give us:
    • these are our known sacrifices: if the migration loses the thing, we agree to accept it anyway

Anything not listed might be included, if it's easy to do. But we can't rely on it.

Who can create a list like this, and when can we have it ready?
(Or is there an existing list?)

ok. It started here: https://nc.riseup.net/s/TYX37BDT4eQfTiW

It looks like you've written a wish list: a list of features that we want GitLab to have.
That's a good start.

Actually I was mostly thinking about the things that we *already* have in gitlab that are things we use in trac or we can use when we migrate.

But I was asking for acceptance criteria for the data migration:

  • a list of features that GitLab must have
  • a list of data that we must migrate from Trac to GitLab

And known sacrifices:

  • a list of features that we use in Trac, that GitLab doesn't have
  • a list of data that we won't migrate to GitLab

Maybe we should set up a table for each feature, so we can track our progress:

  • how important is this feature?
  • who needs this feature?
  • are they happy with the gitlab version of this feature?
  • does trac have it?
  • does gitlab have it?
  • can we migrate the data for this feature?
  • does the data migration actually work?

Sounds good.

We should also look at the survey we did in 2018? so we can make sure we are not missing anything.

I saw the survey data before at the end of 2018 but when I went to look for it recently I couldn't find it. Maybe it was in a riseup pad...

comment:26 Changed 4 months ago by gaba

I added the survey data to the document.

So far it seems that we only have 1 feature from trac that can not migrate into gitlab (the parent/child relationship between tickets) but we can have something similar that is adding relationship (links) between tickets.

comment:27 Changed 4 months ago by anarcat

most of the time, i use parent/child relationships as just that, a relationship, not specifically for a hierarchy. this could easily be replaced by just mentioning tickets in the summary. for more elaborate things, the milestone support for gitlab is enough, imho.

comment:28 Changed 4 months ago by gaba

There is one other big issue to resolve in gitlab. Right now people need to have an account in gitlab to be able to file new issues. We need anybody to be able to create issues in gitlab (cypherpunks account in trac).

comment:29 in reply to:  17 Changed 4 months ago by hiro

Replying to atagar:

I must admit that I was assuming that, by setting up the GitLab instance, a decision had been made to migrate as well, but you are right that this decision hasn't been clearly made yet either.

Therefore, this is also a space to have that discussion. I have heard rumours of concern about GitLab, but nothing clearly substantiated yet.

Ahhh! Thank you anarcat, this makes a lot more sense.

If some folks prefer GitLab that's great! But migrating us away from Trac is not a decision to be taken lightly, and requires community buy-in.

I was around a decade ago for our migration to Trac, and what you're proposing is a big move that impacts us all. Especially if you want to propose shutting Trac down without redirects. As I see it there's three open questions...

  1. Do we want to migrate away from Trac at all?
  2. If so, what would we prefer to move to? GitHub? GitLab? Something else?
  3. What will happen with Trac's ticket and wiki data?

This ticket is not the proper place to develop consensus on such a large move. If you'd care to pursue this I'd suggest...

  1. Open the GitLab instance up. I tried to look at https://dip.torproject.org/ to see what my projects look like on it but I'm presented with a login page. As an open source developer this makes it DOA right from the starting gate. :)

Atagar, projects have not been migrated yet, but you can still check what everything looks like by using your tpo email and requesting a password reset.

comment:30 Changed 4 months ago by anarcat

There is one other big issue to resolve in gitlab. Right now people need to have an account in gitlab to be able to file new issues. We need anybody to be able to create issues in gitlab (cypherpunks account in trac).

What's "cypherpunks account in Trac"?

We could just open registration on GitLab. We need to keep in mind this could create exactly the same kind of issue we're having right now in Trac, namely that we have thousands of "junk" users (see #29420).

comment:31 in reply to:  27 Changed 4 months ago by teor

Replying to anarcat:

most of the time, i use parent/child relationships as just that, a relationship, not specifically for a hierarchy. this could easily be replaced by just mentioning tickets in the summary. for more elaborate things, the milestone support for gitlab is enough, imho.

I use parent/child relationships to get an automatically updated list of related tickets. I don't know if GitLab does that.

I also use queries on wiki pages to get automatically updating lists of tickets. For example:
https://trac.torproject.org/projects/tor/wiki/user/teor

I would miss this feature if it went away. I will add it to the list of features we want.

comment:32 in reply to:  30 Changed 4 months ago by teor

Replying to anarcat:

There is one other big issue to resolve in gitlab. Right now people need to have an account in gitlab to be able to file new issues. We need anybody to be able to create issues in gitlab (cypherpunks account in trac).

What's "cypherpunks account in Trac"?

A shared anonymous account:
https://trac.torproject.org/projects/tor/wiki/WikiStart#UnofficialDocumentation

We could just open registration on GitLab. We need to keep in mind this could create exactly the same kind of issue we're having right now in Trac, namely that we have thousands of "junk" users (see #29420).

I'd like to know what our solution is for account and form spammers.

I've added a "need to check" section to the document, and moved some of the features to that section.

comment:33 Changed 4 months ago by teor

If I remember correctly, we can't move tickets between projects in gitlab. I've added it to the list as something to check.

I checked the GitLab integrations: Jenkins works, but Travis and Appveyor don't. So we need to keep mirroring git to GitHub for Travis and Appveyor.

I also wonder how many GitLab CI runner machines we will have, and who will pay for them.

comment:34 Changed 4 months ago by anarcat

If I remember correctly, we can't move tickets between projects in gitlab. I've added it to the list as something to check.

I think that was solved recently:

https://about.gitlab.com/2016/04/20/feature-highlight-move-issues/

I checked the GitLab integrations: Jenkins works, but Travis and Appveyor don't. So we need to keep mirroring git to GitHub for Travis and Appveyor.

Yeah, I don't know if we're talking about getting rid of GitHub here yet. One thing at a time. GitLab CI, however, might allow us to replace Travis eventually...

I also wonder how many GitLab CI runner machines we will have, and who will pay for them.

... however this is also out of scope for now. We're talking about (possibly) replacing Trac, not Trac and Jenkins all at once. One thing at a time. :)

(That said, if we do eventually replace jenkins with Gitlab CI, the builders we have now can just be repurposed for GitLab CI. It's all hardware in the end.)

So, TL;DR: no runners yet, as far as I know, but if people want to provision external ones, the sweet thing is this can be done without intervention from TPA or GitLab admins...

Last edited 4 months ago by anarcat

comment:35 Changed 3 months ago by anarcat

just a quick mention that gitlab could solve a request for private wikis we had about a year ago: #26714

comment:37 Changed 2 months ago by gaba

Trying to convene a plan for gitlab here: https://nc.riseup.net/s/SnQy3yMJewRBwA7

comment:38 Changed 5 weeks ago by anarcat

FWIW, I created a subticket for the trac archival questions, which are relevant regardless of whether we switch to gitlab or not, see #31690.

comment:39 Changed 5 weeks ago by gaba

thanks!

There are a few blockers (from network team people) about this migration:

1) ticket number preservation

They want no collision between trac ticket numbers and gitlab issue numbers. That would mean having new numbers for new tickets when we start using gitlab officially.

2) add all tickets (including closed ones)

They want to have ALL tickets from trac in gitlab to preserve the history of Tor in one place.

3) get all info from each ticket into an issue (including comments in the trac ticket added as a 'trac user' to the gitlab issue)

This would mean having each comment from each trac ticket as a comment in the gitlab issue. A possible solution would be to have a 'trac user' in gitlab that makes all the comments that are being migrated from trac.

If we include these 3 points in the migration, then we do not need to archive trac, and it could be decommissioned once the migration is complete.

comment:40 Changed 5 weeks ago by gaba

Keywords: tickets-migration added

comment:41 Changed 5 weeks ago by anarcat

  1. ticket number preservation

Agreed. I think it would be essential to keep that. Any self-respecting migration tool should allow us to "dump" all the trac tickets into a (single!) GitLab project, keeping ticket numbers.

They want no collision between trac ticket numbers and gitlab issue numbers.

This, however, seems to say something else: does it mean that we don't want Trac ticket #1 to be the same ticket as GitLab ticket #1? That would be in contradiction with "ticket number preservation" in my mind.

That would mean having new numbers for new tickets when we start using gitlab officially.

I interpret this as meaning that, assuming we migrate Trac tickets from 1 to N when Trac is made readonly (for the migration, it can be turned off after), the next ticket in gitlab will be N+1?

2) add all tickets (including closed ones)

They want to have ALL tickets from trac in gitlab to preserve the history of Tor in one place.

Sure, that should be done. Then we have this "legacy" gitlab project with a humongous pile of tickets like we have in Trac right now, but we can "split" those up as needed by moving tickets around with the API.

3) get all info from each ticket into an issue (including comments in the trac ticket added as a 'trac user' to the gitlab issue)

This would mean having each comment from each trac ticket as a comment in the gitlab issue. A possible solution would be to have a 'trac user' in gitlab that makes all the comments that are being migrated from trac.

That makes sense as well, I'd be happy to see that happen, and I think this is all the kind of stuff Tracboat should do.

I would still put Trac readonly during and after the migration, then do one last archival to the Internet archive. I would then create a "redirection site" that would do things like:

https://trac.torproject.org/projects/tor/ticket/N -> https://dip.torproject.org/tor/legacy/issues/N
https://trac.torproject.org/projects/tor/wiki/PAGE -> https://dip.torproject.org/tor/legacy/wiki/PAGE
(...anything else?)

And *then* trac can be totally decommissioned (although I would keep backups for a while, just to be sure, of course, but that's part of our decommissioning procedure anyways).

comment:42 in reply to:  41 Changed 5 weeks ago by gaba

Replying to anarcat:

  1. ticket number preservation

Agreed. I think it would be essential to keep that. Any self-respecting migration tool should allow us to "dump" all the trac tickets into a (single!) GitLab project, keeping ticket numbers.

Tickets will be imported by team/project. It will not work for us to have ALL trac tickets in one project in gitlab.

And that brings me to the question of where we are going to have sysadmin tickets in gitlab. I was thinking of its own group in gitlab, but you may have another idea for it.

They want no collision between trac ticket numbers and gitlab issue numbers.

This, however, seems to say something else: does it mean that we don't want Trac ticket #1 to be the same ticket as GitLab ticket #1? That would be in contradiction with "ticket number preservation" in my mind.

Sorry that I was not clear. Any new ticket in gitlab will have a number that has not been assigned in trac yet. We preserve the number for tickets that already exist.

That would mean having new numbers for new tickets when we start using gitlab officially.

I interpret this as meaning that, assuming we migrate Trac tickets from 1 to N when Trac is made readonly (for the migration, it can be turned off after), the next ticket in gitlab will be N+1?

Yes.

2) add all tickets (including closed ones)

They want to have ALL tickets from trac in gitlab to preserve the history of Tor in one place.

Sure, that should be done. Then we have this "legacy" gitlab project with a humongous pile of tickets like we have in Trac right now, but we can "split" those up as needed by moving tickets around with the API.

3) get all info from each ticket into an issue (including comments in the trac ticket added as a 'trac user' to the gitlab issue)

This would mean having each comment from each trac ticket as a comment in the gitlab issue. A possible solution would be to have a 'trac user' in gitlab that makes all the comments that are being migrated from trac.

That makes sense as well, I'd be happy to see that happen, and I think this is all the kind of stuff Tracboat should do.

I would still put Trac readonly during and after the migration, then do one last archival to the Internet archive. I would then create a "redirection site" that would do things like:

https://trac.torproject.org/projects/tor/ticket/N -> https://dip.torproject.org/tor/legacy/issues/N
https://trac.torproject.org/projects/tor/wiki/PAGE -> https://dip.torproject.org/tor/legacy/wiki/PAGE
(...anything else?)

And *then* trac can be totally decommissioned (although I would keep backups for a while, just to be sure, of course, but that's part of our decommissioning procedure anyways).

Yes.

comment:43 in reply to:  28 Changed 5 weeks ago by cypherpunks

Replying to gaba:

There is one other big issue to resolve in gitlab. Right now people need to have an account in gitlab to be able to file new issues. We need anybody to be able to create issues in gitlab (cypherpunks account in trac).

I feel like this is one of the most important issues, given how Github and Gitlab treat Tor users (try to make an account on Gitlab with a throwaway mail using Tor).

comment:44 Changed 5 weeks ago by anarcat

Tickets will be imported by team/project. It will not work for us to have ALL trac tickets in one project in gitlab.

I don't see why that has to be the case. We could (more!) easily import everything in a single project and then, post-import, split tickets up between projects.

Not doing so will make it impossible to fulfill that first requirement, as there will not be a stable URL on GitLab's side for ticket #1234 from Trac.

And that brings me to the question of where we are going to have sysadmin tickets in gitlab. I was thinking of its own group in gitlab, but you may have another idea for it.

Sure, they can be moved to its own group after the import, like everything else.

Sorry that I was not clear. Any new ticket in gitlab will have a number that has not been assigned in trac yet. We preserve the number for tickets that already exist.

Agreed, although you need to understand that ticket numbering is *per project* in GitLab. (Strictly speaking, that's also the case in Trac, but we have only a single project in Trac, while we already have multiple projects in GitLab.)

So in practice, we will have multiple #1234 tickets in GitLab. This is why we need to import everything in a single project at first so that we have consistent numbering. *Then* when we move issues around in GitLab, the numbers will change, but there will be a note in the "legacy" tickets pointing to the new one.

I don't know how else you could implement those constraints.

comment:45 in reply to:  44 Changed 5 weeks ago by gaba

Replying to anarcat:

Tickets will be imported by team/project. It will not work for us to have ALL trac tickets in one project in gitlab.

I don't see why that has to be the case. We could (more!) easily import everything in a single project and then, post-import, split tickets up between projects.

Not doing so will make it impossible to fulfill that first requirement, as there will not be a stable URL on GitLab's side for ticket #1234 from Trac.

We need to find a way to meet this requirement (ticket numbers unique across the tor project group) while keeping tickets in their own projects. Check the plan document to see the structure we are proposing (it is at the end of the document).

And that brings me to the question of where we are going to have sysadmin tickets in gitlab. I was thinking of its own group in gitlab, but you may have another idea for it.

Sure, they can be moved to its own group after the import, like everything else.

Sorry that I was not clear. Any new ticket in gitlab will have a number that has not been assigned in trac yet. We preserve the number for tickets that already exist.

Agreed, although you need to understand that ticket numbering is *per project* in GitLab. (Strictly speaking, that's also the case in Trac, but we have only a single project in Trac, while we already have multiple projects in GitLab.)

So in practice, we will have multiple #1234 tickets in GitLab. This is why we need to import everything in a single project at first so that we have consistent numbering. *Then* when we move issues around in GitLab, the numbers will change, but there will be a note in the "legacy" tickets pointing to the new one.

I don't know how else you could implement those constraints.

Ahf is working on that already. I think the idea is to have gaps in ticket numbers in projects to be able to fulfill this requirement.

comment:46 Changed 5 weeks ago by anarcat

We need to find a way to meet this requirement (ticket numbers unique across the tor project group) while keeping tickets in their own projects. Check the plan document to see the structure we are proposing (it is at the end of the document).

The process I'm suggesting (import everything in a single project and move into separate projects in a subsequent operation) fulfills this requirement.

Ahf is working on that already. I think the idea is to have gaps in ticket numbers in projects to be able to fulfill this requirement.

I don't think it does. It will work for a single project (say the tor little t project), but it can't work for all.

Just to be clear, I'm fine with having tickets split up in different projects. I just don't think it's possible to have redirections working if we split them up at import time.

Say you have:

Under the process you propose, those would map into:

(project names may vary, this is just an example)

How do I map https://trac.torproject.org/projects/tor/ticket/2 to https://dip.torproject.org/tor/sysadmin/issues/2? More generally, how would I know which GitLab project an arbitrary https://trac.torproject.org/projects/tor/ticket/N would map into?

The answer is: there's no way to know, short of making an explicit list of 40 thousand redirections. I think that's deeply impractical, and counter to the spirit of the requirement.

Instead, what I am proposing is this: tickets #1 and #2 would map into:

Issue #1 in GitLab would have a label "component: Core tor/tor" and #2 would have a label "component: Internal services/Services Admin team". Then a post-processing script, which can easily be made by only talking with the GitLab API, moves those tickets to the right project, their final destination stated above:

... but because ticket moves in GitLab leave a trace, we can *still* redirect from:

And we can therefore have a generic redirector that looks like:

https://trac.torproject.org/projects/tor/ticket/N -> https://dip.torproject.org/tor/legacy/issues/N

It's fundamentally the same idea, it just differs as to where we first import the tickets.
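
To make the generic redirector concrete, here is a minimal sketch in Python. It is only an illustration: the function name `redirect_target` is made up, and "tor/legacy" is the placeholder project path from the example above, not a decided name.

```python
import re

# Path layout of Trac ticket URLs as they appear in this thread.
TRAC_TICKET = re.compile(r"^/projects/tor/ticket/(\d+)/?$")

# Assumption: every Trac ticket is first imported into one "legacy" project,
# so ticket N on Trac is issue N there. The project path is a placeholder.
LEGACY_ISSUES = "https://dip.torproject.org/tor/legacy/issues/"


def redirect_target(path):
    """Return the GitLab URL for a Trac ticket path, or None for other paths."""
    match = TRAC_TICKET.match(path)
    if match is None:
        return None
    return LEGACY_ISSUES + match.group(1)
```

The point of the sketch is that the rule needs no lookup table: the ticket number is copied through unchanged, and GitLab's own "moved" note takes care of the second hop.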

comment:47 Changed 5 weeks ago by gaba

Yes. I understand the problem you are describing and the solution you have. And I'm not sure how we are going to have something usable in gitlab with all the issues in one project (legacy in your example).

We could have all trac issues in a 'legacy project' and then any new issue in its own project (the structure that we proposed in the gitlab migration document). But that will still make it hard to manage issues that way.

Right now we have (as a way to test) a project Scalability (https://dip.torproject.org/torproject/scalability) that is at the base of the Tor project group and is shared with other groups (like metrics and core). Still, we cannot add issues from scalability to the metrics and core kanban boards...

I understand the problem but I do not think the legacy project is a solution that works for us.

comment:48 Changed 5 weeks ago by anarcat

Yes. I understand the problem you are describing and the solution you have. And I'm not sure how we are going to have something usable in gitlab with all the issues in one project (legacy in your example).

We could have all trac issues in a 'legacy project' and then any new issue in its own project (the structure that we proposed in the gitlab migration document). But that will still make it hard to manage issues that way.

You're absolutely right: it would be awful to have all tickets in the same project in GitLab.

That's not what I'm proposing here.

What I am proposing is that we import all tickets in the same legacy project BUT we then move each ticket to the right project outside of legacy.

The goal of importing everything in the same project is to make redirections workable. Without this, we have to guess, on the redirection side, which project the ticket ended up in. This could be quite difficult to implement and will lead to a complex redirection system. We're lucky enough to have a "flat" numbering space for the ticket numbers in Trac (there's only one list of tickets), so it would be great to have the same thing on GitLab's side.

By importing all tickets in the same project and then moving them, we accomplish this: the redirector can point to the legacy project, which in turn will point to the right project the issue has been moved to. I think it's a win-win...

I understand the problem but I do not think the legacy project is a solution that works for us.

... but I'm ready to accept that as well. It's the best solution I can think of but I'd be happy to hear about possible alternatives. The only one I can think of is to have an explicit list of ticket N -> GITLAB_PROJECT_NAME/Y with ~40,000 entries, and I think that would be a pain in the ass to create and carry around forever. :)

In summary, I agree with you that having all tickets in the same project is not workable, and that's not what I'm suggesting.
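
For illustration, the "move after import" step could be planned along these lines. This is a sketch under stated assumptions, not a working migration tool: the API base URL, the "tor/legacy" project path, the label strings and the numeric project IDs are all hypothetical. The only thing taken from GitLab itself is the shape of the issue move endpoint (`POST /projects/:id/issues/:issue_iid/move` with a `to_project_id` parameter). The function only builds the request; nothing is sent.

```python
from urllib.parse import quote

API_BASE = "https://dip.torproject.org/api/v4"  # assumed API location
LEGACY_PROJECT = "tor/legacy"  # placeholder name for the import project

# Hypothetical mapping from imported component labels to target project IDs.
TARGET_PROJECT_IDS = {
    "component: Core tor/tor": 101,
    "component: Internal services/Services Admin team": 102,
}


def plan_move(issue_iid, labels):
    """Return (url, payload) for the move request, or None to leave the issue in legacy."""
    for label in labels:
        target = TARGET_PROJECT_IDS.get(label)
        if target is not None:
            # Project paths must be URL-encoded when used in place of a numeric ID.
            project = quote(LEGACY_PROJECT, safe="")
            url = f"{API_BASE}/projects/{project}/issues/{issue_iid}/move"
            return url, {"to_project_id": target}
    return None
```

Issues without a recognized component label simply stay in the legacy project, which keeps the redirector valid for them too.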

comment:49 Changed 5 weeks ago by anarcat

I added comments in the GitLab migration plan. The gist of my modifications is as follows:

  1. added the migration itself as a "challenge"
  2. added "milestones" as a possible solution for "ticket relationships"
  3. added details and possible solutions for the irc bot problem
  4. added another possible solution (OpenPGP signatures on commits and tags) to the "gitolite" problem
  5. expanded on the CI section (we will still use jenkins at first)
  6. i'm not sure it's totally accurate to say trac is unmaintained upstream. the 1.2.x branch had a release about a month ago (aug. 2019) and they also released a new stable branch (1.4) not long ago... so it's still maintained
  7. also outlined that Trac also uses javascript in the table
  8. finally, i think i identified a new issue with git repository redirections:

New issue: git repository redirections

Finally, i'm a little confused about the way the group/project namespace is organized... i see that everything seems to be under "torproject/foo" *except* "web/foo" and i wonder why it's been done that way. I would definitely put stuff under tpa/* for example, and have one project per service, with all the service admins stuff under services/ maybe?

I'm not sure how best to organize this, but having "everything under torproject/ except not quite" doesn't seem like a great match ;) Couldn't we replicate the hierarchy from https://gitweb.torproject.org/ ? that would make git repository redirections much easier...

Note that renaming projects in gitlab is cheap and reliable (it keeps redirects) so we can also fix this later if we need to, i think, but i'd like to get it right, at least in terms of redirections. After all, we don't want to tell people that all their git URLs are broken now.

comment:50 Changed 5 weeks ago by anarcat

IRC bot issues discussions

To expand on what is discussed in the document about IRC.... There are two things we do with Trac on IRC, as far as I know:

  • "#1234" gets turned into a message by zwiebelbot showing the ticket title, status and URL, this happens in multiple channels
  • another bot ("nsa") announces new commits and Trac ticket changes in #tor-bots

Examples:

09:47:20 <anarcat> test: #1234
09:47:21 -zwiebelbot:#tpo-admin- tor#1234: Exception in Firebug console in FF3.6 Sessionstore - [closed] - https://bugs.torproject.org/1234
09:40:53 <+nsa> or: [Tor Bug Tracker & Wiki] #30857 was updated:  #30857: migrate (some projects? everything?) from trac to gitlab - 
                https://trac.torproject.org/projects/tor/ticket/30857#comment:49
09:40:53 <+nsa> or: Comment (by anarcat):
09:40:53 <+nsa> or:  I added comments in the GitLab migration plan. The gist of my
09:40:53 <+nsa> or:  modifications is as follows:[...]
09:41:43 <+nsa> or: [styleguide/master] b75855b 2019-09-12 13:41:26 hiro <hiro@torproject.org>: Add .github/FUNDING.yml

The above is me pinging the bot for information about ticket #1234, and the NSA bot announcing, without being prompted, a modification to ticket #30857 and a commit from hiro in the styleguide project.

There are multiple projects to do the latter: I wrote one myself based on the irker irc bot (which doesn't work very well):

https://gitlab.com/anarcat/irklab

Another implementation is the "KGB" bot which can interpret GitLab webhooks on its own:

https://salsa.debian.org/kgb-team/kgb

This is what's used on salsa: https://salsa.debian.org/kgb-team/kgb/wikis/usage

Then there are two more similar bots:

https://github.com/chkelly/gitlab-irc
https://github.com/nTraum/gitlab-irc

And finally, GitLab itself has "native" "integration" with irker, provided you set it up somewhere.

All of those (except the native integration) generally work as "webhooks" in that they "ping" (make an HTTP request) to a web server endpoint, which in turn talks to IRC.

We'd need to do this to replace the "nsa" service. A friend implemented this with the KGB bot by setting up a webhook in GitLab that points to his own KGB install. He set up a reverse proxy in Apache with a configuration that looks like this:

ProxyPassReverse /kgb/ http://127.0.0.1:5391/
<Location /kgb/>
   Require expr %{HTTP:X-Gitlab-Token} == 'GITLAB TOKEN HERE'
</Location>
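
The `Require expr` line above only lets requests through when the `X-Gitlab-Token` header matches the secret configured on the webhook. If the check had to live in the receiving service instead of Apache, a rough equivalent in Python might look like this (a sketch; `token_matches` is a made-up helper and `headers` is assumed to be a plain dict keyed by the exact header name):

```python
import hmac


def token_matches(headers, expected_token):
    """Return True when the X-Gitlab-Token header equals the shared secret."""
    supplied = headers.get("X-Gitlab-Token", "")
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(supplied.encode(), expected_token.encode())
```

A request with a missing or wrong token would then be rejected before anything is relayed to IRC.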

Finally, note that group-level webhooks are a paid feature, which means that we'd need to hook each project to those bots *individually* which is pretty annoying. It should probably be part of the migration to facilitate our lives. Alternatively, I think the debian.org folks wrote commandline shortcuts to configure a project like this automatically.

For the zwiebelbot functionality, something new would need to be implemented. We should check with the Debian folks if that already exists, and if not, we will need to do it ourselves.
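
As a starting point for that discussion, the "#1234" detection part is small. The sketch below is not zwiebelbot's code, just an assumption of how the matching could work; it only builds the short URL seen in the example above and leaves out the title/status lookup, which would need a call to whatever tracker ends up holding the ticket.

```python
import re

# "#" followed by digits, not glued to a preceding word character or another "#".
TICKET_REF = re.compile(r"(?<![\w#])#(\d+)\b")

# Prefix taken from the zwiebelbot reply in the example above.
SHORT_URL = "https://bugs.torproject.org/"


def ticket_urls(message):
    """Return one short URL per ticket reference found in an IRC message."""
    return [SHORT_URL + number for number in TICKET_REF.findall(message)]
```

Channel names like #tor-bots are ignored because the pattern requires digits right after the "#".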

comment:51 Changed 5 weeks ago by anarcat

re the webhooks configuration, the Debian Perl team has this program which installs a kgb hook automatically and does other things:

https://manpages.debian.org/buster/pkg-perl-tools/dpt-salsa.1.en.html

it's part of the pkg-perl-tools debian package and might need a little bit of configuration to do what we want, but it's a thing.
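
If we end up scripting this ourselves instead, the per-project hook setup is one API call per project. Here is a sketch that only builds the requests (nothing is sent). The API base URL, the hook URL and the choice of events are assumptions; the endpoint shape (`POST /projects/:id/hooks` with `url`, `token` and the `*_events` flags) is GitLab's project hooks API.

```python
from urllib.parse import quote

API_BASE = "https://dip.torproject.org/api/v4"  # assumed API location


def hook_requests(project_paths, hook_url, token):
    """Return a list of (url, payload) pairs, one hook creation request per project."""
    payload = {
        "url": hook_url,          # where the bot (e.g. a KGB install) listens
        "token": token,           # sent back by GitLab as X-Gitlab-Token
        "push_events": True,      # commits
        "issues_events": True,    # ticket changes
        "note_events": True,      # comments
    }
    return [
        (f"{API_BASE}/projects/{quote(path, safe='')}/hooks", dict(payload))
        for path in project_paths
    ]
```

Running this over the list of migrated projects would replace the missing group-level webhook with a loop.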
