Counting patches in specfiles

Ways to count the number of patches in the packages in a distribution.

Fedora Rawhide

Download and unpack the specfile archive for the packages currently in Rawhide:

$ curl -LO https://src.fedoraproject.org/repo/rpm-specs-latest.tar.xz
$ tar -xvf rpm-specs-latest.tar.xz

Count the number of specfiles with a given number of patches:

$ grep -crE '^Patch[0-9]*:' rpm-specs/ | grep --only-matching -E '[0-9]+$' | sort -g | uniq -c

Each output line shows a number of specfiles followed by the patch count they share, so the output tells how many specfiles there are with 0, 1, 2, ... patches.
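The same statistic can be sketched in Python, which may be handy when the counts are needed for further processing. This is a minimal sketch, not a drop-in replacement: the function names `count_patches` and `patch_histogram` are made up for illustration, and unlike `grep -r` it only looks at files ending in `.spec`.

```python
import re
from collections import Counter
from pathlib import Path

# Same pattern as the grep command: a line starting with "Patch",
# optional digits, then a colon.
PATCH_RE = re.compile(r"^Patch[0-9]*:", re.MULTILINE)


def count_patches(spec_text):
    """Return the number of Patch tags in the text of one specfile."""
    return len(PATCH_RE.findall(spec_text))


def patch_histogram(spec_dir):
    """Map each patch count to the number of specfiles that have it."""
    histogram = Counter()
    # Assumption: only files named *.spec are of interest.
    for spec in Path(spec_dir).rglob("*.spec"):
        # Specfiles are not guaranteed to be valid UTF-8, so replace bad bytes.
        text = spec.read_text(encoding="utf-8", errors="replace")
        histogram[count_patches(text)] += 1
    return histogram
```

For example, `sorted(patch_histogram("rpm-specs").items())` gives (patches, specfiles) pairs in ascending order of patch count.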

CentOS Stream

Use download-stream.py and specify the stream from which specfiles should be downloaded. The script uses the GitLab.com API, so you'll need to provide a token, either through the GITLAB_TOKEN environment variable or with the '--gitlab-token' option.

$ ./download-stream.py c8s

After the specfiles are downloaded, run the grep command above against them to produce the same statistic.

RHEL

Use download-rhel.py to download specfiles for a given RHEL version.

The following will attempt to download specfiles from the rhel-8.7.0 branch and save them in the rhel-8 directory.

$ ./download-rhel.py rhel-8.7.0 rhel-8

When multiple branches are specified, each specfile is downloaded from the first of these branches found in its repository:

$ ./download-rhel.py rhel-9.0.0 rhel-9.1.0 rhel-9.2.0 rhel-9

A Deeper Look

To have a better understanding of this data, one could produce CSV files with "package" and "patches" columns, and load this data with pandas for further inspection.

$ grep -crE '^Patch[0-9]*:' rhel-9/ | sed -e 's/:/,/' -e 's/^rhel-9\///' -e 's/\.spec,/,/' > rhel-9.csv
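The CSV file could also be written from Python instead of grep and sed. The sketch below is one possible way to do it, with a hypothetical helper name `write_patch_csv`. It assumes the package name is the specfile name without the `.spec` suffix, which differs from the sed version if specfiles sit in subdirectories, and it writes no header row.

```python
import csv
import re
from pathlib import Path

# Same pattern as the grep command above.
PATCH_RE = re.compile(r"^Patch[0-9]*:", re.MULTILINE)


def write_patch_csv(spec_dir, csv_path):
    """Write one "package,patches" row per specfile found under spec_dir."""
    rows = []
    for spec in sorted(Path(spec_dir).rglob("*.spec")):
        text = spec.read_text(encoding="utf-8", errors="replace")
        # Assumption: the file name without ".spec" is the package name.
        rows.append((spec.stem, len(PATCH_RE.findall(text))))
    with open(csv_path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    return rows
```

Calling `write_patch_csv("rhel-9", "rhel-9.csv")` would produce a file with the same two columns as the shell pipeline.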

Then in Python (c9s.csv is assumed to have been produced the same way from the CentOS Stream 9 specfiles):

import pandas as pd

rhel_9 = pd.read_csv("rhel-9.csv", names=["package", "patches"], index_col=0)
c9s = pd.read_csv("c9s.csv", names=["package", "patches"], index_col=0)

# Get packages with at least 10 patches
p_rhel_9 = set(rhel_9[rhel_9["patches"] >= 10].index)
p_c9s = set(c9s[c9s["patches"] >= 10].index)

# Packages which have at least 10 patches both in c9s and rhel-9
print(p_rhel_9 & p_c9s)
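
As one further inspection step, the two frames could be joined on the package name to see where the patch counts diverge. This is only a sketch: `compare_patch_counts` and the `difference` column are hypothetical names, and the frames are assumed to have the shape loaded above (package name as index, a single "patches" column).

```python
import pandas as pd


def compare_patch_counts(left, right, left_name, right_name):
    """Join two patch-count frames on package name and keep rows that differ."""
    # Both frames have a "patches" column, so the suffixes tell them apart.
    merged = left.join(
        right, how="inner", lsuffix=f"_{left_name}", rsuffix=f"_{right_name}"
    )
    merged["difference"] = (
        merged[f"patches_{left_name}"] - merged[f"patches_{right_name}"]
    )
    # Only packages present in both frames with unequal counts remain.
    return merged[merged["difference"] != 0].sort_values("difference")
```

With the frames from above, `compare_patch_counts(rhel_9, c9s, "rhel_9", "c9s")` would list the packages whose patch count differs between the two, with positive values meaning more patches in rhel-9.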