
Splitting source-git and upstream

Some ideas, possibilities, pros and cons of moving out the source-git related work.

What does the source-git workflow mean?

Must have:

  • If a user creates a merge-request on the source-git repository:
    • Create a matching merge-request to the dist-git repository.
    • Sync the CI results from the dist-git merge-request to the source-git merge-request.
  • If the dist-git is updated, update the source-git repository by opening a PR.
  • User is able to convert a source-git change to a dist-git change locally via the CLI.
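The first must-have boils down to mapping source-git coordinates to dist-git ones. A minimal sketch of that mapping, assuming the CentOS Stream GitLab layout (`src/` and `rpms/` namespaces); the helper names and branch-naming convention are illustrative, not actual Packit API:

```python
# Hypothetical helpers: derive the dist-git target for a source-git MR.
# The src/ -> rpms/ convention follows the CentOS Stream GitLab layout;
# function and branch names here are assumptions for illustration.

def dist_git_project(source_git_project: str) -> str:
    """Map a source-git project path to its dist-git counterpart."""
    namespace, _, name = source_git_project.partition("/")
    if namespace != "src":
        raise ValueError(f"not a source-git project: {source_git_project}")
    return f"rpms/{name}"

def mr_branch(source_mr_iid: int) -> str:
    """Branch name used for the matching dist-git merge-request."""
    return f"source-git-mr-{source_mr_iid}"
```

The real handler would then push the converted change to that branch and open the dist-git MR from it.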

Should have:

  • If the source-git merge-request is updated, update the dist-git merge-request.
  • If the source-git merge-request is closed, close the dist-git merge-request.

Could have:

  • User is able to re-trigger the dist-git CI from the source-git merge-request.
  • User is able to re-create the dist-git MR from the source-git merge-request.

Key questions:

  1. Should we start developing a new service or modify the existing packit-service to be able to deploy only with a gitlab-endpoint?
  2. How to link merge requests in the src namespace to the ones in the rpms namespace using the GitLab API? This should be bidirectional.
  3. Check the GitLab API to learn more about working with merge-trains and pipelines, in order to support the UX for merging source-git MRs (see the doc linked below).
  4. How are CI results going to be displayed in dist-git MRs? We want to know this so that we can think about ways to take those results and display them for contributors on the source-git MRs.

Split

We have multiple options:

0. no split

  • No extra cost of two deployments and two codebases.
  • New jobs will be implemented as new handlers.
  • We don't have support for multiple identities for one forge.
    • Not hard to do but requires some work.
  • Fedora-source-git friendly: easy combination of events (different for Fedora and Stream) and handlers (=implementation, can be shared).

1. same codebase, new deployment

  • No extra cost of maintenance of two codebases.
  • New jobs are implemented as new handlers.
  • Different identities can be used in one git forge (=gitlab.com).
  • Resources can be tweaked separately.
  • Fedora-source-git friendly: easy combination of events (different for Fedora and Stream) and handlers (=implementation, can be shared).

2. separate workers

  • New jobs are implemented as new handlers in a separate repository.
  • The centos-stream related code is in one place, based on the packit-service code.
  • We can use the same or a separate deployment. (Fedora-source-git can be separate or go together with Stream.)

3. split the packit-service repo and build upstream/centos-stream workers

  • One repo with the scheduler. Two repositories with the worker (=handlers) definitions: one for upstream, one for the stream.
  • Requires more work.
  • Can lead to a cleaner architecture. Something we were discussing for some time.
    • The centos-stream related code is in one place, upstream code is in one place and the shared code is in one place.
  • Another dependency in the chain.
    • It's sometimes hard to work on the functionality that goes across multiple git projects.
  • We can use the same or a separate deployment. (Fedora-source-git can be separate or go together with Stream.)

4. fork and improve

  • The benefits of the current service code can be preserved.
  • The non-relevant/bad code can be removed.
  • More time needed for development.
  • Improvements relevant for both are hard to sync.
  • More time needed for maintenance.

5. separate project from scratch

  • The new service can be more lightweight and efficient.
  • We can iterate on the prototype more quickly.
  • We can get rid of the old bad stuff in the packit-service.
  • We risk going through the same pain we've already gone through.
  • We need to maintain two separate projects. (We either need more people or our productivity drops.)
  • We are not motivated to improve the current projects.
  • To share the code between the upstream and source project, we need to create some shared libraries.
    • Can lead to 3. and/or having another project on our dependency chain.

Dashboard

  • For the current goals, there is no need for having a dashboard.
    • GitLab is our interface, and we can link the same result links as the CI in dist-git.

Database

What are the differences in the schema?

  • Event related models can be shared (GitProject, JobTriggerModel, RunModel, PullRequestModel, ...).
  • Result models are probably not necessary (SRPMBuildModel, CoprBuildModel, ...).
  • Allow/deny list can be preserved.
  • Models for the setup procedure aren't necessary for stream (InstallationModel, ProjectAuthenticationIssueModel).
  • Connection between source-git and dist-git MRs can be done by creating a new join table.
  • If we need to track the results, we need to create a new model for that.
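The join table mentioned above can be sketched quickly. This uses stdlib sqlite3 purely for illustration (packit-service itself defines SQLAlchemy models); the table and column names are assumptions:

```python
# Sketch of a join table linking source-git MRs to dist-git MRs.
# Illustrative schema only; packit-service uses SQLAlchemy, and the
# real table/column names may differ.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pull_requests (
    id INTEGER PRIMARY KEY,
    pr_id INTEGER NOT NULL,      -- MR iid on the forge
    project_url TEXT NOT NULL
);
CREATE TABLE source_git_dist_git_mr (
    source_git_pull_request_id INTEGER REFERENCES pull_requests(id),
    dist_git_pull_request_id INTEGER REFERENCES pull_requests(id),
    PRIMARY KEY (source_git_pull_request_id, dist_git_pull_request_id)
);
""")
# Link MR !1 in src/httpd to MR !7 in rpms/httpd.
conn.execute("INSERT INTO pull_requests VALUES (1, 1, 'https://gitlab.com/src/httpd')")
conn.execute("INSERT INTO pull_requests VALUES (2, 7, 'https://gitlab.com/rpms/httpd')")
conn.execute("INSERT INTO source_git_dist_git_mr VALUES (1, 2)")

# Look up the dist-git MR matching source-git MR with id 1.
row = conn.execute("""
    SELECT d.project_url, d.pr_id
    FROM source_git_dist_git_mr l
    JOIN pull_requests d ON d.id = l.dist_git_pull_request_id
    WHERE l.source_git_pull_request_id = 1
""").fetchone()
```

The lookup works in both directions by swapping which column is joined, which covers the bidirectional requirement from key question 2.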

We have multiple ways to manage the database schema after the split:

One schema for all

  • One schema that works for all use-cases.
  • The schema is defined in one place.
  • Databases can contain unused tables.
  • Schema can be more complicated.

Independent schemas

  • Each schema fits the use-case.
  • Common changes are harder to share.

Multiple Alembic branches

Multiple Alembic bases

Linking of the merge-requests

We have two goals:

  1. User can easily find the related merge request in the web UI. (In both directions.)
  2. We can get the related merge request from the service.
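Since GitLab auto-links cross-project MR references of the form `namespace/project!iid`, one comment on each side can satisfy both goals. A small sketch of building those comment bodies (the wording is an assumption; posting them would go through the notes API):

```python
# Sketch: build the two cross-linking comments for a pair of MRs.
# GitLab renders "rpms/httpd!7" as a clickable cross-project MR reference,
# so each comment links users to the other side. Wording is illustrative.

def link_comments(src_project: str, src_mr: int,
                  dist_project: str, dist_mr: int) -> tuple[str, str]:
    on_source_git = f"Created a matching dist-git MR: {dist_project}!{dist_mr}"
    on_dist_git = f"Opened from source-git MR: {src_project}!{src_mr}"
    return on_source_git, on_dist_git
```

The service can later recover the linked MR either from its own database or by parsing these references back out of the comments.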

Some related GitLab issues:

Merge trains

Allow merging multiple MRs in one target branch safely:

  • We put MRs into a queue.
  • For each MR, we run a pipeline on the code containing this MR and all the ones before it.
  • Pipelines are run in parallel to save time.

Conclusion:

  • This feature is something like Zuul's gating pipeline with auto-rebase done in parallel.
  • It's not meant for cross-project MRs => not useful for us.

Sources:

Multi-project pipelines

Conclusion:

  • Doesn't give us many benefits. (We would need to work with a dynamic reference on the second repository.)

Parent-child pipelines

  • https://docs.gitlab.com/ee/ci/parent_child_pipelines.html
  • Pipeline can trigger a set of concurrently running child pipelines within the same project.
    • Child jobs are not dependent on the state of unrelated jobs in the parent pipeline.
    • Configuration can be split into multiple smaller, easy-to-understand parts.
    • Avoids name collisions. (Compared to a plain include.)
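For illustration, triggering a child pipeline looks roughly like this in `.gitlab-ci.yml` (the job name and file path are made up for this sketch):

```yaml
# Parent pipeline job that triggers a child pipeline defined in a
# separate file within the same project. "strategy: depend" makes the
# parent job mirror the child pipeline's status.
trigger-child:
  trigger:
    include: ci/child-pipeline.yml
    strategy: depend
```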

Pipelines API

It looks like pipelines need to be defined beforehand; we can manipulate only the ones that are already defined.

Pipelines for merge requests are configured. A detached pipeline runs in the context of the merge request, and not against the merged result. Learn more in the documentation for Pipelines for Merged Results.

To support this, we can define the pipelines, then wait for and fetch the results from the Packit API. (This goes against the current Packit workflow.)

We can also have a custom GitLab runner running our implementation in our infrastructure.

Potentially, we can use only `- external` when defining the pipeline and combine it with commit statuses:
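Commit statuses posted via the API show up as an "external" stage of the pipeline. A sketch of the payload for GitLab's `POST /projects/:id/statuses/:sha` endpoint; the field names match the documented Commit Status API, while the check name and URL are illustrative:

```python
# Sketch: build a payload for GitLab's Commit Status API
# (POST /projects/:id/statuses/:sha). Statuses appear as an
# "external" stage in the commit's pipeline view.

VALID_STATES = {"pending", "running", "success", "failed", "canceled"}

def commit_status_payload(state: str, check: str, url: str) -> dict:
    """Build the request body for one commit status update."""
    if state not in VALID_STATES:
        raise ValueError(f"unknown state: {state}")
    return {
        "state": state,        # e.g. "success"
        "context": check,      # name of the check, e.g. "rpm-build"
        "target_url": url,     # link to the dist-git CI result
    }
```

This would let us mirror dist-git CI results onto source-git commits without defining any pipeline in the source-git repository itself.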

Some related GitLab issues:

Commit status

CI results

Answers to Key questions

  1. Should we start developing a new service or modify the existing packit-service to be able to deploy only with a gitlab-endpoint?
    • If we want to have a clean architecture, we can use version 3 with separate deployments. Version 2 can be done as a middle step.
  2. How to link merge requests in the src namespace to the ones in the rpms namespace using the GitLab API? This should be bidirectional.
    • There is only a GUI to do that; via the API, we can use comments instead.
  3. Check the GitLab API to learn more about working with merge-trains and pipelines, in order to support the UX for merging source-git MRs (see the doc linked below).
    • Looks like we can't use any GitLab structure to make this automatic, but can provide the UX independently.
    • We can define pipelines or use commit statuses (=detached pipelines).
  4. How are CI results going to be displayed in dist-git MRs? We want to know this so that we can think about ways to take those results and display them for contributors on the source-git MRs.
    • Displayed as pipelines.

The plan

  1. Set up a new repository for the stream worker.
    • stream-worker/packit-stream-worker/source-git-worker/...
    • Build the image in Quay.
    • Set up Zuul and pre-commit.ci.
    • Create a stable branch.
  2. Set up a new deployment repository for stream.
    • Create the new playbooks and share as much as possible with the current workflow.
    • We will have only one deployment for stream for start.
    • Increase (=buy) the resources in OpenShift Online.
    • Deploy the stream service to OpenShift Online.
    • Update the script for moving stable branches.
  3. Implement the stream worker.
    • New celery tasks are defined.
    • Implementation is done as new handlers.
    • Start with the really basic version so we can work on deployment ASAP:
      • If user creates a merge-request on the source-git repository, create a matching merge-request to the dist-git repository.
  4. Move the current worker out of the service.
    • packit-worker/packit-upstream-worker/...
    • What about the process_message task? (SPIKE card for that has been created.)
      • Do we want to share it? What about a dedicated worker just for this?
    • Move the build process.
    • Set up Zuul and pre-commit.ci.
    • Update deployment if necessary.
    • Update the script for moving stable branches.
    • (Can be done in parallel with other steps.)
  5. Implement "Sync the CI results from the dist-git merge-request to the source-git merge-request."
  6. Implement "If the dist-git is updated, update the source-git repository by opening a PR."
  7. Implement "If the source-git merge-request is updated, update the dist-git merge-request."
  8. Implement "If the source-git merge-request is closed, close the dist-git merge-request."
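Step 3 of the plan ("implementation is done as new handlers") can be sketched with a small event-to-handler registry. This is a stdlib-only illustration; the decorator, handler, and event names are assumptions, not the actual packit-service classes:

```python
# Minimal sketch of handler-based dispatch for the stream worker.
# In the real service, tasks arrive via celery; here we dispatch
# directly. All names are illustrative.

HANDLERS: dict[str, list[type]] = {}

def reacts_to(event_type: str):
    """Register a handler class for a given event type."""
    def decorator(cls):
        HANDLERS.setdefault(event_type, []).append(cls)
        return cls
    return decorator

@reacts_to("merge_request.opened")
class SyncToDistGitHandler:
    def run(self, event: dict) -> str:
        # The real worker would clone the source-git repo, convert the
        # change, push it, and open the matching dist-git MR.
        return f"opened dist-git MR for {event['project']}!{event['iid']}"

def process(event_type: str, event: dict) -> list[str]:
    """Run every handler registered for the event type."""
    return [handler().run(event) for handler in HANDLERS.get(event_type, [])]
```

New must-have and should-have jobs from the list above would then be added as further `@reacts_to(...)` handlers without touching the dispatch code.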