Counting patches in specfiles

Ways to count the number of patches in the packages in a distribution.

Fedora Rawhide

Download and unpack the specfile archive for the packages currently in Rawhide:

$ curl -LO
$ tar -xvf rpm-specs-latest.tar.xz

Count the number of specfiles with a given number of patches:

$ grep -crE '^Patch[0-9]*:' rpm-specs/ | grep --only-matching -E '[0-9]+$' | sort -g | uniq -c

The command above shows how many specfiles have 0, 1, 2, ... patches: the first column of the output is the number of specfiles, the second is the number of patches.
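For readers who prefer to stay in Python, the same count can be sketched without grep. This is a minimal sketch, not part of the original tooling: the function names `count_patches` and `patch_histogram` are made up for illustration, and the directory name `rpm-specs` is taken from the tar command above. Like `grep -r`, it reads every regular file under the directory, not only files ending in `.spec`.

```python
import re
from collections import Counter
from pathlib import Path

# Same pattern as the grep command: a line that starts with "Patch",
# optionally followed by digits, then a colon.
PATCH_RE = re.compile(r"^Patch[0-9]*:", re.MULTILINE)


def count_patches(spec_text: str) -> int:
    """Return the number of Patch tag lines in the text of one specfile."""
    return len(PATCH_RE.findall(spec_text))


def patch_histogram(spec_dir: str) -> Counter:
    """Map each patch count to the number of files that have that count."""
    histogram = Counter()
    for path in Path(spec_dir).rglob("*"):
        if path.is_file():
            # Some specfiles are not valid UTF-8; replace undecodable bytes.
            text = path.read_text(encoding="utf-8", errors="replace")
            histogram[count_patches(text)] += 1
    return histogram


if __name__ == "__main__":
    # Print "<number of specfiles> <number of patches>", like `uniq -c`.
    for patches, specfiles in sorted(patch_histogram("rpm-specs").items()):
        print(f"{specfiles:7d} {patches}")
```

Keeping `count_patches` separate from the directory walk makes it easy to check the pattern against a single specfile before running it over the whole archive.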

CentOS Stream

Use and specify the stream from which specfiles should be downloaded. This uses the GitLab API, so you need to supply a token, either through the GITLAB_TOKEN environment variable or with the '--gitlab-token' option.

$ ./ c8s

After the specfiles are downloaded, use the grep command above to produce the same statistics for the result.


Use to download specfiles for a given RHEL version.

The following will attempt to download specfiles from the rhel-8.7.0 branch and save them in the rhel-8 directory.

$ ./ rhel-8.7.0 rhel-8

When multiple branches are specified, each specfile is downloaded from the first of the listed branches that is found in the package's repository:

$ ./ rhel-9.0.0 rhel-9.1.0 rhel-9.2.0 rhel-9

A Deeper Look

To better understand this data, one can produce CSV files with "package" and "patches" columns and load them with pandas for further inspection.

$ grep -crE '^Patch[0-9]*:' rhel-9/  | sed -e 's/:/,/' -e 's/^rhel-9\///' -e 's/\.spec,/,/' > rhel-9.csv
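To make the three sed expressions easier to follow, here is the same per-line transformation written out in Python. It is only an illustrative sketch: the function name `grep_line_to_csv` and the sample input line are made up, and the `prefix` parameter stands in for the directory name hard-coded in the sed command.

```python
def grep_line_to_csv(line: str, prefix: str = "rhel-9/") -> str:
    """Turn one line of `grep -c` output into a "package,patches" CSV row."""
    # s/:/,/  -> replace the first colon (between file name and count)
    line = line.replace(":", ",", 1)
    # s/^rhel-9\///  -> drop the leading directory name
    if line.startswith(prefix):
        line = line[len(prefix):]
    # s/\.spec,/,/  -> drop the ".spec" extension in front of the comma
    line = line.replace(".spec,", ",", 1)
    return line


if __name__ == "__main__":
    # Hypothetical grep output line, used only to show the transformation.
    print(grep_line_to_csv("rhel-9/bash.spec:12"))  # prints "bash,12"
```

For another directory, only the prefix changes, for example `grep_line_to_csv(line, prefix="c9s/")`.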

Then in Python:

import pandas as pd

rhel_9 = pd.read_csv("rhel-9.csv", names=["package", "patches"], index_col=0)
c9s = pd.read_csv("c9s.csv", names=["package", "patches"], index_col=0)

# Get packages with more than 10 patches
p_rhel_9 = set(rhel_9[rhel_9["patches"] > 10].index)
p_c9s = set(c9s[c9s["patches"] > 10].index)

# Packages which have more than 10 patches both in c9s and rhel-9
print(p_rhel_9 & p_c9s)
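As one possible next step, the two tables can be joined on the package name to see where the patch counts differ. This is a sketch under assumptions: both frames are indexed by package and have a single "patches" column, as in the `read_csv` calls above, and the function name `compare_patch_counts` and the "difference" column are made up for illustration.

```python
import pandas as pd


def compare_patch_counts(rhel_9: pd.DataFrame, c9s: pd.DataFrame) -> pd.DataFrame:
    """Join both frames on the package index and add the patch-count difference."""
    # Keep only packages present in both frames; the overlapping "patches"
    # column is renamed to "patches_rhel_9" and "patches_c9s".
    both = rhel_9.join(c9s, how="inner", lsuffix="_rhel_9", rsuffix="_c9s")
    # Positive values mean the package carries more patches in rhel-9.
    both["difference"] = both["patches_rhel_9"] - both["patches_c9s"]
    return both.sort_values("difference", ascending=False)
```

With the frames loaded as above, `compare_patch_counts(rhel_9, c9s).head(20)` would list the packages that carry the most additional patches in rhel-9 compared to c9s.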