Code for paired TCR data analysis from TIRTL-seq experiments. Please see the preprint for details. Data for the manuscript is available at Zenodo .
For Nvidia GPU functionality you will need to have CUDA installed. You can install it from here. For Apple Silicon GPU install Apple MLX library from here.
R dependencies are Matrix (vesrsion >=1.5) and data.table (v.>1.15) packages, available by default with most distributions. If not, you can install them by running from R console:
install.packages("Matrix")
install.packages("data.table")Python3 dependencies are numpy (v.>1.26), pandas (v.>2.2), cupy (v. >13.2 for Nvidia GPU runs, optional), mlx (v. >=0.13 for Apple GPU runs, optional). You can install them by running:
pip install numpy pandas
pip install cupy
pip install mlxIf the dependencies are installed, just copying the repository is enough to run the code, no additional install or setup needed. The code was tested on macOS 14-15 (CPU/GPU), Red Hat Enterprise Linux 8.8 (CPU/Nvidia GPU) and Windows 11 Enterprise (tested on CPU only).
Install dependencies, download repository and run run the following commands in R:
source("TIRTL_functions.R")
# to run on cpu
test<-run_single_point_analysis_sub_gpu("data/")
# to run on Nvidia gpu
test_gpu_nvidia<-run_single_point_analysis_sub_gpu("data/",backend="cupy")
# to run on Apple Silicon gpu
test_gpu_apple<-run_single_point_analysis_sub_gpu("data/",backend="mlx")
# to run on 192 wells, first 12 columns of 384-well plate
test_half_plate<-run_single_point_analysis_sub_gpu("data/",wellset1=get_well_subset(1:16,1:12))
# DEMO! lets run on the small dataset from the manuscript (download and unpack data from zenodo https://doi.org/10.5281/zenodo.14010377 first!)
# This test takes ~6 minutes on CPU on a 2021 Macbook Pro (M1 Pro 32 GB RAM)
cd8_tp2<-run_single_point_analysis_sub_gpu("exp3_clones/TCR_clones_ID03/",wellset1=get_well_subset(1:16,1:12),backend="numpy") # CD8 repertoire for time point 2 for COVID patient (left half of 384-well plate)
print(cd8_tp2) # we got alpha-beta pairs!
table(cd8_tp2$method) # note that same alpha-beta pairs can be called by different methods, so there are duplicates
#expected output:
#madhype tshell
# 16369 9891 The input data should be a folder with TCRalpha and TCRbeta repertoires from one plate in separate tab separated files from standard mixcr output. Filenames should contain alphanumeric well id in the format "A1", "B2", etc separated by underbars.
Other parameters are optional and can be set to change the default behavior of the pipeline:
prefix (default="tmp") - prefix for the output files
well_filter_thres (default=0.5 of mean number of unique clones across alpha wells) - threshold for the number unique clones in a well to be considered as a valid well. Decreasing this threshold will increase the number of wells considered as valid, but may also increase the number of false positives.
min_reads (default=0) - minimum number of reads a clone to be included. By default all clones are included. Increase this number if you suspect cross-contamination between wells or high number of sequencing errors.
well_pos (default=3) - position of the well id in the file name if split by underbar. For example, if the file name is "3123233_MVP093_TIRTLseq_A1_S8.alpha_old_ID10.clones_TRA.tsv", the well_pos should be 4.
wellset1 (default=get_well_subset(1:16,1:24)) - subset of wells to be analyzed (if you loaded your sample to part of the plate). By default all wells in 384 well plate are analyzed. For example, to analyze the first 12 columns of a 384-well plate, you can use get_well_subset(1:16,1:12).
compute (default=T) - whether to run backend script for pairing. Only set F if you want to run the python backend script separately.
backend (default="numpy") - backend for the pairing. Can be "numpy" for CPU, "cupy" for Nvidia GPU, or "mlx" for Apple Silicon GPU.
This will take TCRalpha and TCRbeta repertoires from the folder "data/" and pair them using the default parameters. The output will be saved in the working directory under tmp_TIRTLoutput.tsv and also returned to R as data.table.
wi - number of wells where alpha is found, but not beta
wj - number of wells where beta is found, but not alpha
wij - number of wells where both alpha and beta are found
wa - number of wells where alpha is found
wb -number of wells where beta is found
alpha_nuc_seq, alpha_nuc - alpha CDR3nuc sequence
beta_nuc_seq, beta_nuc - beta CDR3nuc sequence
alpha_beta - alpha-beta pair concatenated nucleotide sequence
method - method used for pairing t-shell, or MAD-HYPE
r - for t-shell, the Pearson correlation coefficient between alpha and beta frequencies
ts - for t-shell, the t-statistic of the correlation
pval - for t-shell, the p-value of the correlation
pval_adj - for t-shell, the adjusted p-value of the correlation
loss_a_frac - the fraction of wells with lost alpha chain
loss_b_frac - the fraction of wells with lost beta chain
score - for MAD-HYPE, the score of the pairing (higher is better)
cdr3a - alpha CDR3aa sequence
va - alpha V gene
ja - alpha J gene
cdr3b - beta CDR3aa sequence
vb - beta V gene
jb - beta J gene
To run the pipeline on your own data, you need to have TCRalpha and TCRbeta repertoires processed with mixcr (v. >= 4.6) separate for each well.
We use the following mixcr pipeline for each well:
$MIXCR analyze generic-amplicon --species hsa --rna \
--tag-pattern "^N{0:1}(SAMPLE:N{7})(R1:*)\^(R2:*)" \
--floating-left-alignment-boundary \
--floating-right-alignment-boundary C \
--split-by-sample \
--sample-sheet plate_index.tsv \
${fp}_L{{n}}_R1.fastq.gz ${fp}_L{{n}}_R2.fastq.gz \
$OUTDIR/${sample} \
--use-local-tempwhere plate_index.tsv is a tab separated file containing you PCR I plate indices (see example in this repo!).
MIXCR variable contains path to mixcr executable.
fp is the path to the fastq files for given wells without the _L{n}_R1.fastq.gz and _L{n}_R2.fastq.gz suffixes.
OUTDIR is the output directory for the mixcr output.
sample is the file name for the output for the given well (usually only filename from fp).