@fmoessbauer
Member

We add the merge subcommand which runs against a download directory and creates combined archive files, which can be used as input to license clearing tools that only support a single archive per component.

@Urist-McGit
Collaborator

Urist-McGit commented Sep 8, 2025

merge makes it sound like we merge SBOMs, a very common thing to do with them. Maybe something like archive instead?

@fmoessbauer
Member Author

> merge makes it sound like we merge SBOMs, a very common thing to do with them. Maybe something like archive instead?

Which name do you propose instead for the command?

@Urist-McGit
Collaborator

> > merge makes it sound like we merge SBOMs, a very common thing to do with them. Maybe something like archive instead?
>
> Which name do you propose instead for the command?

source-merge or archive?

@fmoessbauer force-pushed the fm/merge-tars branch 3 times, most recently from 31b1f44 to 88886bf on September 8, 2025 15:10
@Urist-McGit changed the title from "feat: add merge subcommand" to "feat: add source-merge subcommand" on Sep 9, 2025
@fmoessbauer marked this pull request as ready for review on September 9, 2025 14:48
On repeated executions, some files have already been downloaded.
Currently, these are skipped when downloading again, but they still
show up in the statistics. We now split the statistics into the total
data and the data we already have. This gives a better overview of what
is still missing and needs to be downloaded. We further change the
return type to a named tuple to make it easier for downstream users to
access the individual values.

Signed-off-by: Felix Moessbauer <[email protected]>
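
The split between total and already-present data could look roughly like the following sketch. The names `DownloadStats` and `collect_stats`, the field names, and the `(filename, size)` input shape are illustrative assumptions, not the identifiers used in this PR.

```python
from pathlib import Path
from typing import Iterable, NamedTuple, Tuple


class DownloadStats(NamedTuple):
    """Hypothetical result type; the real field names may differ."""

    total_files: int
    total_bytes: int
    cached_files: int
    cached_bytes: int

    @property
    def missing_bytes(self) -> int:
        # Data that still has to be fetched.
        return self.total_bytes - self.cached_bytes


def collect_stats(
    entries: Iterable[Tuple[str, int]], download_dir: Path
) -> DownloadStats:
    """Count all entries and, separately, those already present on disk."""
    total_files = total_bytes = cached_files = cached_bytes = 0
    for name, size in entries:
        total_files += 1
        total_bytes += size
        if (download_dir / name).is_file():
            cached_files += 1
            cached_bytes += size
    return DownloadStats(total_files, total_bytes, cached_files, cached_bytes)
```

With a named tuple, callers can read `stats.cached_bytes` by name instead of relying on positional unpacking, while existing tuple-style unpacking keeps working.
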
All Debian source packages come with a .dsc file that provides the links
to the other files (e.g. .orig.tar or .debian.tar), along with some
other information. As a safety measure, we check during download that
every source package has this file.

Signed-off-by: Felix Moessbauer <[email protected]>
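
A minimal sketch of such a check, assuming the downloader already knows which file names belong to which source package. The function name and the input mapping are hypothetical and only illustrate the idea.

```python
from typing import Dict, List


def packages_without_dsc(files_by_package: Dict[str, List[str]]) -> List[str]:
    """Return the (sorted) package names whose file list contains no .dsc."""
    return sorted(
        package
        for package, files in files_by_package.items()
        if not any(name.endswith(".dsc") for name in files)
    )
```

A caller could treat a non-empty result as an error, or log it as a warning and continue.
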
A source package consists of multiple individual parts (e.g. policy 4.x
diff files, .orig and .debian tarballs). To create a single artifact
which can be used for license clearing, we need to merge these into a
single archive. For that, we introduce the SourceArchiveMerger class
which performs the merge based on the .dsc data.

Signed-off-by: Felix Moessbauer <[email protected]>
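
To illustrate what "based on the .dsc data" can mean, the sketch below reads the file names listed in the `Files:` field of a .dsc control file. It is not the SourceArchiveMerger implementation; `list_dsc_files` is a hypothetical helper, and it assumes each entry has the usual three columns (checksum, size, file name).

```python
from typing import List


def list_dsc_files(dsc_text: str) -> List[str]:
    """Extract the file names from the 'Files:' field of a .dsc file."""
    names: List[str] = []
    in_files = False
    for line in dsc_text.splitlines():
        if line.startswith("Files:"):
            in_files = True
            continue
        if in_files:
            # Continuation lines of a field start with whitespace.
            if line.startswith(" ") and line.strip():
                parts = line.split()
                if len(parts) == 3:  # checksum, size, file name
                    names.append(parts[2])
            else:
                # Any other line ends the Files field.
                in_files = False
    return names
```

The resulting list tells a merger which parts (for example the .orig and .debian tarballs) belong together for one source package.
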
We add the merge subcommand which runs against a download directory and
creates combined archive files, which can be used as input to license
clearing tools that only support a single archive per component.

Signed-off-by: Felix Moessbauer <[email protected]>
We add a unit test that checks the various Debian source formats and
compresses the output tar with all supported compressors.

Signed-off-by: Felix Moessbauer <[email protected]>
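
A round-trip check over several compressors can be sketched with the standard `tarfile` module. The list of compressors here (`gz`, `bz2`, `xz`) is an assumption; the set actually supported by the subcommand may differ, and `roundtrip` is a hypothetical helper rather than the test added in this PR.

```python
import io
import tarfile

# Assumed set of compressors; adjust to what the tool really supports.
COMPRESSORS = ("gz", "bz2", "xz")


def roundtrip(compression: str, name: str, payload: bytes) -> bytes:
    """Write one member into a compressed tar in memory and read it back."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=f"w:{compression}") as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode=f"r:{compression}") as tar:
        member = tar.extractfile(name)
        if member is None:
            raise ValueError(f"{name} is not a regular file in the archive")
        return member.read()
```

In a pytest suite, the compressor would typically be a parametrized argument so that each format shows up as its own test case.
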
The mirror might have the same file under various names and paths.
Currently, we only return the first instance, but this is not sufficient
if other files link to a filename that is not the first instance. We now
expand this list and return all instances.

Signed-off-by: Felix Moessbauer <[email protected]>
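
The difference between returning the first instance and returning all of them can be shown with a small sketch. `FileInstance`, its fields, and both helper functions are hypothetical and do not mirror the snapshot client's real data model.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class FileInstance(NamedTuple):
    """Hypothetical record: one location of a file on the mirror."""

    sha1: str
    archive_path: str
    filename: str


def group_instances(
    records: Iterable[FileInstance],
) -> Dict[str, List[FileInstance]]:
    """Keep every instance per hash instead of only the first one."""
    grouped: Dict[str, List[FileInstance]] = defaultdict(list)
    for record in records:
        grouped[record.sha1].append(record)
    return dict(grouped)


def filenames_for(
    grouped: Dict[str, List[FileInstance]], sha1: str
) -> List[str]:
    """All names under which the content with this hash is published."""
    return [instance.filename for instance in grouped.get(sha1, [])]
```

If a .dsc refers to the second name of a file, a lookup that only kept the first instance would not find it, while the grouped form does.
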
As the snapshot client now returns all file instances, we also download
them multiple times. To optimize this, we check if we already have a
file with that hash and just link it.

Signed-off-by: Felix Moessbauer <[email protected]>
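
One way to realize the "link instead of download" idea is sketched below. It assumes SHA-256 hashes, hard links via `os.link`, and an in-memory index from hash to an existing path; the commit message does not state which hash or link type is used, so all of these are assumptions, as are the function names.

```python
import hashlib
import os
from pathlib import Path
from typing import Dict


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large tarballs are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def link_if_known(target: Path, expected_hash: str, known: Dict[str, Path]) -> bool:
    """Hard-link target to an already downloaded file with the same hash.

    Returns True if a link was created, False if the caller still has to
    download the file (no known copy) or the target already exists.
    """
    existing = known.get(expected_hash)
    if existing is None or target.exists():
        return False
    os.link(existing, target)
    return True
```

Hard links require both paths to be on the same file system; a symlink or a plain copy would be the fallback where that does not hold.
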
@Urist-McGit merged commit 195804b into siemens:main on Sep 10, 2025
3 checks passed