Usage of composite type to store the pipelines

Composite column types: Sets of columns can be associated with a single user-defined datatype, which in modern use is normally a Python dataclass. The ORM provides a single attribute which represents the group of columns using the class you provide.

Examples of how this can be done:

sqlalchemy_utils CompositeType: a custom SQLAlchemy type designed to work with PostgreSQL's composite types:

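from sqlalchemy import Column, Integer, String
from sqlalchemy_utils import CompositeType

class MyModel(Base):
    __tablename__ = 'my_table'

    id = Column(Integer, primary_key=True)
    # CompositeType is instantiated with the PostgreSQL type name
    # and the list of columns that make up the composite.
    my_composite = Column(
        CompositeType(
            'my_composite_type',
            [Column('attribute1', Integer), Column('attribute2', String)],
        )
    )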

sqlalchemy.orm.composite(): maps a group of columns to a single attribute backed by a plain Python class:

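from sqlalchemy import Column, Integer
from sqlalchemy.orm import composite

class MyCompositeType(object):
    def __init__(self, attribute1, attribute2):
        self.attribute1 = attribute1
        self.attribute2 = attribute2

    # Required for non-dataclass composites: values in column order.
    def __composite_values__(self):
        return self.attribute1, self.attribute2

class MyModel(Base):
    __tablename__ = 'my_table'

    id = Column(Integer, primary_key=True)
    composite = composite(MyCompositeType, Column('attribute1', Integer), Column('attribute2', Integer))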

sqlalchemy.types.TypeDecorator: subclass TypeDecorator and implement the conversion methods between the Python class and the stored column value:

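from dataclasses import dataclass

from sqlalchemy import Column, Integer, String
from sqlalchemy.types import TypeDecorator

@dataclass
class MyCompositeType:
    attribute1: int
    attribute2: int

class MyCompositeTypeDecorator(TypeDecorator):
    # The value is serialized as "attribute1,attribute2",
    # so the underlying column type has to be a string.
    impl = String
    cache_ok = True

    def process_bind_param(self, value, dialect):
        # Python object -> value stored in the database
        if value is not None:
            return f"{value.attribute1},{value.attribute2}"

    def process_result_value(self, value, dialect):
        # value loaded from the database -> Python object
        if value is not None:
            attribute1, attribute2 = map(int, value.split(","))
            return MyCompositeType(attribute1, attribute2)

class MyModel(Base):
    __tablename__ = 'my_table'

    id = Column(Integer, primary_key=True)
    composite = Column(MyCompositeTypeDecorator())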
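
The two conversion methods of the TypeDecorator approach can be checked in isolation, without a database. The sketch below mirrors them as plain functions; `to_db` and `from_db` are hypothetical helper names used only for illustration, and the comma-separated format is the one used in the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MyCompositeType:
    attribute1: int
    attribute2: int


def to_db(value: Optional[MyCompositeType]) -> Optional[str]:
    """Mirror of process_bind_param: dataclass -> "a,b" string."""
    if value is None:
        return None
    return f"{value.attribute1},{value.attribute2}"


def from_db(value: Optional[str]) -> Optional[MyCompositeType]:
    """Mirror of process_result_value: "a,b" string -> dataclass."""
    if value is None:
        return None
    attribute1, attribute2 = map(int, value.split(","))
    return MyCompositeType(attribute1, attribute2)


original = MyCompositeType(1, 2)
stored = to_db(original)      # what would be written to the column
restored = from_db(stored)    # what the ORM would hand back
print(stored, restored == original)  # prints: 1,2 True
```

Because the stored value is a string, the database cannot filter or index on the individual attributes, which is worth keeping in mind when comparing this approach with the other two.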

For simplification, let's work with a pipeline model that has only SRPM, Copr build and test.

Current pipeline model

The pipeline model in its current form:

class PipelineModel(Base):
    __tablename__ = "pipelines"
    id = Column(Integer, primary_key=True)
    datetime = Column(DateTime, default=datetime.utcnow)

    job_trigger_id = Column(Integer, ForeignKey("job_triggers.id"))
    job_trigger = relationship("JobTriggerModel", back_populates="runs")

    srpm_build_id = Column(Integer, ForeignKey("srpm_builds.id"), index=True)
    srpm_build = relationship("SRPMBuildModel", back_populates="runs")
    copr_build_group_id = Column(
        Integer, ForeignKey("copr_build_groups.id"), index=True
    )
    copr_build_group = relationship("CoprBuildGroupModel", back_populates="runs")
    test_run_group_id = Column(
        Integer, ForeignKey("tft_test_run_groups.id"), index=True
    )
    test_run_group = relationship("TFTTestRunGroupModel", back_populates="runs")

The model has foreign keys to other tables that reference the steps of the pipeline (groups), and each group in turn references its particular targets. Getting the whole pipeline therefore requires a join across multiple tables.
With composite types, we would like to remove the current need for multiple joins to get the data about one pipeline.
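
To make the join cost concrete, here is a minimal sketch using the standard-library `sqlite3` module instead of the real PostgreSQL database. The table names `pipelines`, `srpm_builds`, `copr_build_groups`, `copr_build_targets` and `tft_test_run_groups` come from the model and queries in this document; the `tft_test_run_targets` table and all `status`/`target` columns are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE srpm_builds (id INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE copr_build_groups (id INTEGER PRIMARY KEY);
CREATE TABLE copr_build_targets (
    id INTEGER PRIMARY KEY,
    copr_build_group_id INTEGER REFERENCES copr_build_groups(id),
    target TEXT,
    status TEXT
);
CREATE TABLE tft_test_run_groups (id INTEGER PRIMARY KEY);
-- Assumed name for the per-target test run table.
CREATE TABLE tft_test_run_targets (
    id INTEGER PRIMARY KEY,
    tft_test_run_group_id INTEGER REFERENCES tft_test_run_groups(id),
    target TEXT,
    status TEXT
);
CREATE TABLE pipelines (
    id INTEGER PRIMARY KEY,
    srpm_build_id INTEGER REFERENCES srpm_builds(id),
    copr_build_group_id INTEGER REFERENCES copr_build_groups(id),
    test_run_group_id INTEGER REFERENCES tft_test_run_groups(id)
);
INSERT INTO srpm_builds VALUES (1, 'success');
INSERT INTO copr_build_groups VALUES (1);
INSERT INTO copr_build_targets VALUES (1, 1, 'fedora-rawhide-x86_64', 'success');
INSERT INTO tft_test_run_groups VALUES (1);
INSERT INTO tft_test_run_targets VALUES (1, 1, 'fedora-rawhide-x86_64', 'running');
INSERT INTO pipelines VALUES (1, 1, 1, 1);
""")

# Fetching one complete pipeline touches six tables (five joins).
rows = conn.execute("""
SELECT p.id, s.status, c.target, c.status, t.status
FROM pipelines p
JOIN srpm_builds s ON s.id = p.srpm_build_id
JOIN copr_build_groups cg ON cg.id = p.copr_build_group_id
JOIN copr_build_targets c ON c.copr_build_group_id = cg.id
JOIN tft_test_run_groups tg ON tg.id = p.test_run_group_id
JOIN tft_test_run_targets t ON t.tft_test_run_group_id = tg.id
WHERE p.id = 1
""").fetchall()
print(rows)
```

Every additional pipeline step would add two more joins (group and target) to this query.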

1.option

How the model could look when using a composite type on the target level:

the data for each particular target would be stored directly in the pipeline
this would be a step back, since we have already done the grouping refactoring

Issues

1. querying concrete steps of the pipeline

Examples of when this happens:

when updating Copr build status in DB, we get the corresponding builds via the build ID from Copr (index)
when updating TF run status in DB, we get the corresponding test run via the TF pipeline ID (index)
babysit tasks - they get all the pending Copr builds / Test runs
when triggering /packit test - we get the latest Copr build model with corresponding commit SHA
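
A rough sketch of why these lookups get harder once the target data lives inside the pipeline row. It uses `sqlite3` and a hypothetical flattened table `pipelines_flat`; the table, its columns and the sample IDs are all assumptions, not the proposed schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical flattened layout: target data stored inline per pipeline.
CREATE TABLE pipelines_flat (
    id INTEGER PRIMARY KEY,
    copr_build_id TEXT,
    copr_status TEXT,
    tf_pipeline_id TEXT
);
CREATE INDEX ix_flat_copr_build_id ON pipelines_flat (copr_build_id);
-- Copr build 'b1' is tested twice, so it appears in two pipeline rows.
INSERT INTO pipelines_flat VALUES (1, 'b1', 'pending', 'tf-1');
INSERT INTO pipelines_flat VALUES (2, 'b1', 'pending', 'tf-2');
INSERT INTO pipelines_flat VALUES (3, 'b2', 'success', 'tf-3');
""")

# Lookup by Copr build ID no longer yields a single build record.
by_build_id = conn.execute(
    "SELECT id FROM pipelines_flat WHERE copr_build_id = ? ORDER BY id",
    ("b1",),
).fetchall()

# A babysit-style query sees the same pending build once per pipeline row
# and has to deduplicate explicitly.
pending_rows = conn.execute(
    "SELECT copr_build_id FROM pipelines_flat WHERE copr_status = 'pending'"
).fetchall()
pending_builds = conn.execute(
    "SELECT DISTINCT copr_build_id FROM pipelines_flat "
    "WHERE copr_status = 'pending'"
).fetchall()
print(by_build_id, len(pending_rows), pending_builds)
# prints: [(1,), (2,)] 2 [('b1',)]
```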

2. data duplication

updating SRPM build would require getting all pipelines and updating the data everywhere
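
The difference in write cost can be sketched as follows, again with `sqlite3` and a hypothetical flattened table (`pipelines_flat` and its columns are assumptions). With a separate `srpm_builds` table, a status change is one row; with the SRPM data copied into every pipeline, it is one row per pipeline that shares the build.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE srpm_builds (id INTEGER PRIMARY KEY, status TEXT);
INSERT INTO srpm_builds VALUES (1, 'pending');

-- Hypothetical flattened layout: SRPM data duplicated per pipeline row.
CREATE TABLE pipelines_flat (
    id INTEGER PRIMARY KEY,
    srpm_build_id INTEGER,
    srpm_status TEXT,
    copr_target TEXT
);
INSERT INTO pipelines_flat VALUES (1, 1, 'pending', 'fedora-rawhide-x86_64');
INSERT INTO pipelines_flat VALUES (2, 1, 'pending', 'fedora-rawhide-aarch64');
INSERT INTO pipelines_flat VALUES (3, 1, 'pending', 'epel-9-x86_64');
""")

# Separate table: the SRPM build is updated in exactly one place.
normalized = conn.execute(
    "UPDATE srpm_builds SET status = 'success' WHERE id = 1"
)
# Flattened: every pipeline carrying a copy has to be rewritten.
flattened = conn.execute(
    "UPDATE pipelines_flat SET srpm_status = 'success' WHERE srpm_build_id = 1"
)
print(normalized.rowcount, flattened.rowcount)  # prints: 1 3
```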

Average number of Copr builds using the same SRPM build

packit=# SELECT AVG(copr_build_target_count) FROM (
    SELECT COUNT(DISTINCT copr_build_targets.id) AS copr_build_target_count
    FROM pipelines
    JOIN copr_build_targets ON pipelines.copr_build_group_id = copr_build_targets.copr_build_group_id
    GROUP BY pipelines.srpm_build_id
) AS copr_targets_count;
        avg
--------------------
 6.4208079446515274
(1 row)

A similar situation applies between Copr builds and TF runs, since the relationship can be 1:n.

Average number of tests using the same Copr build

packit=# SELECT AVG(test_run_count) AS average_test_runs FROM (
    SELECT copr_id, COUNT(DISTINCT tft_id) AS test_run_count
    FROM tf_copr_build_association_table
    GROUP BY copr_id
) AS tf_counts;
 average_test_runs
--------------------
 1.2618628164375168
(1 row)

