snappy pull-raw-data subcommand does not find all fastq files #306

**Describe the bug**
`cubi-tk snappy pull-raw-data` does not seem to find fastq files whose names differ from the name of the collection they are stored in. This is a problem because important data providers, for example DKFZ Heidelberg, use their own file naming conventions.


**To Reproduce**

```
# SODAR project
project=7097920f-d1ce-4014-a4f0-97f2c2ef9b81
assay=9b897bea-e233-4610-9f9c-5536ec850c3f

# Create snappy environment
mkdir -p dir/.snappy_pipeline RAW_DATA
cat > dir/.snappy_pipeline/config.yaml << __EOF
data_sets:
  exomes:
    sodar_uuid: 7097920f-d1ce-4014-a4f0-97f2c2ef9b81
    file: SampleSheet.tsv
    search_patterns:
    - {"left": "*.read1.fastq.gz", "right": "*.read2.fastq.gz"}
    search_paths:
    - /data/hdd/eblanc/tmp/tmp/2025-09-22_cubi_tk_cancer/RAW_DATA
    type: matched_cancer
    naming_scheme: only_secondary_id
__EOF
cat > dir/.snappy_pipeline/SampleSheet.tsv << __EOF
[Metadata]
schema  cancer_matched
schema_version  v1
title   Becnel public dataset
description     Multiple tumor/normal pairs with WES, WGS & RNA-seq data

[Data]
patientName     sampleName      libraryType     folderName      isTumor
case001 N1      WES     case001-N1-DNA1-WES1    N
case001 T1      WES     case001-T1-DNA1-WES1    Y
case002 N1      WES     case002-N1-DNA1-WES1    N
case002 T1      WES     case002-T1-DNA1-WES1    Y
__EOF

# Offending command
cubi-tk snappy pull-raw-data \
    --tsv-shortcut cancer \
    --assay-uuid $assay \
    --output-directory RAW_DATA --base-path dir \
    --samples case001 \
    $project
```

**Command output**

```
I - 22.09.2025 18:00:32 - Will start at dir
I - 22.09.2025 18:00:32 - Loading configuration file and look for dataset
I - 22.09.2025 18:00:32 - => will download to RAW_DATA
I - 22.09.2025 18:00:32 - Will start at dir
W - 22.09.2025 18:00:32 - No file was found using the selected criteria.
Available files (limited to first 50):
TCRBOA1-N-WEX.read1.fastq.gz
TCRBOA1-N-WEX.read2.fastq.gz
TCRBOA1-T-WEX.read1.fastq.gz
TCRBOA1-T-WEX.read2.fastq.gz
TCRBOA2-N-WEX.read1.fastq.gz
TCRBOA2-N-WEX.read2.fastq.gz
TCRBOA2-T-WEX.read1.fastq.gz
TCRBOA2-T-WEX.read2.fastq.gz
TCRBOA3-N-WEX.read1.fastq.gz
TCRBOA3-N-WEX.read2.fastq.gz
TCRBOA3-T-WEX.read1.fastq.gz
TCRBOA3-T-WEX.read2.fastq.gz
...
S - 22.09.2025 18:00:32 - All done. Have a nice day!
```

**Expected behavior**
`TCRBOA1-N-WEX.read1.fastq.gz` and `TCRBOA1-N-WEX.read2.fastq.gz` should be downloaded to the folder `RAW_DATA/case001-N1-DNA1-WES1`, and
`TCRBOA1-T-WEX.read1.fastq.gz` and `TCRBOA1-T-WEX.read2.fastq.gz` to `RAW_DATA/case001-T1-DNA1-WES1`.
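
The expected layout can also be written as a small check to run after the command. This is only a sketch that restates the four paths listed here; `check_expected_layout` is a hypothetical helper name, not part of cubi-tk.

```bash
# Hypothetical helper (not part of cubi-tk): report which of the expected
# fastq files are missing below a download directory.
# Prints one "missing: <path>" line per absent file and returns 1 if any
# file is missing, 0 otherwise.
check_expected_layout() {
    base_dir="$1"
    status=0
    for rel in \
        "case001-N1-DNA1-WES1/TCRBOA1-N-WEX.read1.fastq.gz" \
        "case001-N1-DNA1-WES1/TCRBOA1-N-WEX.read2.fastq.gz" \
        "case001-T1-DNA1-WES1/TCRBOA1-T-WEX.read1.fastq.gz" \
        "case001-T1-DNA1-WES1/TCRBOA1-T-WEX.read2.fastq.gz"
    do
        if [ ! -f "$base_dir/$rel" ]; then
            echo "missing: $base_dir/$rel"
            status=1
        fi
    done
    return "$status"
}
```

Calling `check_expected_layout RAW_DATA` after the offending command should print nothing once the bug is fixed.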

**Additional context**
I also tried using `case001-N1` and `case001-N1-DNA1-WES1` as values of the `--sample-list` argument, without success. The message is identical in both cases.

This feature is important for the cancer branch, because DKFZ Heidelberg (an important provider of cancer-related sequencing data) has its own file naming convention.

In particular, it uses the extension `.md5sum` for the files that store raw data checksums. It would be very useful to have an optional flag that uploads `*.md5sum` files as `*.md5` to the landing zone.
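
Until such a flag exists, one possible workaround is to create the `.md5` copies locally before uploading. The sketch below is not a cubi-tk feature: `copy_md5sum_as_md5` is a hypothetical helper, and it assumes that the content of the `.md5sum` files is already acceptable as-is, so only the extension needs to change.

```bash
# Hypothetical helper (not part of cubi-tk): for every "<name>.md5sum" file
# in a directory, create a "<name>.md5" copy next to it.
# The original files are kept, and existing .md5 files are never overwritten.
copy_md5sum_as_md5() {
    dir="$1"
    for src in "$dir"/*.md5sum; do
        [ -e "$src" ] || continue          # the glob matched nothing
        dst="${src%.md5sum}.md5"           # strip ".md5sum", append ".md5"
        [ -e "$dst" ] || cp -- "$src" "$dst"
    done
}
```

For example, `copy_md5sum_as_md5 /path/to/delivery` would be run on the (hypothetical) delivery directory before the upload step.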
