TIRTL-seq data analysis pipeline

Code for paired TCR data analysis from TIRTL-seq experiments. Please see the preprint for details. Data for the manuscript is available at Zenodo .

Installation

Requirements

For Nvidia GPU functionality you will need to have CUDA installed. You can install it from here. For Apple Silicon GPU install Apple MLX library from here.

R dependencies are Matrix (vesrsion >=1.5) and data.table (v.>1.15) packages, available by default with most distributions. If not, you can install them by running from R console:

install.packages("Matrix")
install.packages("data.table")

Python3 dependencies are numpy (v.>1.26), pandas (v.>2.2), cupy (v. >13.2 for Nvidia GPU runs, optional), mlx (v. >=0.13 for Apple GPU runs, optional). You can install them by running:

pip install numpy pandas
pip install cupy
pip install mlx

If the dependencies are installed, just copying the repository is enough to run the code, no additional install or setup needed. The code was tested on macOS 14-15 (CPU/GPU), Red Hat Enterprise Linux 8.8 (CPU/Nvidia GPU) and Windows 11 Enterprise (tested on CPU only).

Quick Start

Install dependencies, download repository and run run the following commands in R:

source("TIRTL_functions.R")
# to run on cpu
test<-run_single_point_analysis_sub_gpu("data/")
# to run on Nvidia gpu
test_gpu_nvidia<-run_single_point_analysis_sub_gpu("data/",backend="cupy")
# to run on Apple Silicon gpu
test_gpu_apple<-run_single_point_analysis_sub_gpu("data/",backend="mlx")
# to run on 192 wells, first 12 columns of 384-well plate
test_half_plate<-run_single_point_analysis_sub_gpu("data/",wellset1=get_well_subset(1:16,1:12))
# DEMO! lets run on the small dataset from the manuscript (download and unpack data from zenodo https://doi.org/10.5281/zenodo.14010377 first!)
# This test takes ~6 minutes on CPU on a 2021 Macbook Pro (M1 Pro 32 GB RAM)
cd8_tp2<-run_single_point_analysis_sub_gpu("exp3_clones/TCR_clones_ID03/",wellset1=get_well_subset(1:16,1:12),backend="numpy") # CD8 repertoire for time point 2 for COVID patient (left half of 384-well plate)
print(cd8_tp2) # we got alpha-beta pairs!
table(cd8_tp2$method) # note that same alpha-beta pairs can be called by different methods, so there are duplicates
#expected output: 
#madhype  tshell 
#  16369    9891

Input for pairing pipeline

The input data should be a folder with TCRalpha and TCRbeta repertoires from one plate in separate tab separated files from standard mixcr output. Filenames should contain alphanumeric well id in the format "A1", "B2", etc separated by underbars.

Other parameters are optional and can be set to change the default behavior of the pipeline:

prefix (default="tmp") - prefix for the output files

well_filter_thres (default=0.5 of mean number of unique clones across alpha wells) - threshold for the number unique clones in a well to be considered as a valid well. Decreasing this threshold will increase the number of wells considered as valid, but may also increase the number of false positives.

min_reads (default=0) - minimum number of reads a clone to be included. By default all clones are included. Increase this number if you suspect cross-contamination between wells or high number of sequencing errors.

well_pos (default=3) - position of the well id in the file name if split by underbar. For example, if the file name is "3123233_MVP093_TIRTLseq_A1_S8.alpha_old_ID10.clones_TRA.tsv", the well_pos should be 4.

wellset1 (default=get_well_subset(1:16,1:24)) - subset of wells to be analyzed (if you loaded your sample to part of the plate). By default all wells in 384 well plate are analyzed. For example, to analyze the first 12 columns of a 384-well plate, you can use get_well_subset(1:16,1:12).

compute (default=T) - whether to run backend script for pairing. Only set F if you want to run the python backend script separately.

backend (default="numpy") - backend for the pairing. Can be "numpy" for CPU, "cupy" for Nvidia GPU, or "mlx" for Apple Silicon GPU.

Output for pairing pipeline

This will take TCRalpha and TCRbeta repertoires from the folder "data/" and pair them using the default parameters. The output will be saved in the working directory under tmp_TIRTLoutput.tsv and also returned to R as data.table.

wi - number of wells where alpha is found, but not beta

wj - number of wells where beta is found, but not alpha

wij - number of wells where both alpha and beta are found

wa - number of wells where alpha is found

wb -number of wells where beta is found

alpha_nuc_seq, alpha_nuc - alpha CDR3nuc sequence

beta_nuc_seq, beta_nuc - beta CDR3nuc sequence

alpha_beta - alpha-beta pair concatenated nucleotide sequence

method - method used for pairing t-shell, or MAD-HYPE

r - for t-shell, the Pearson correlation coefficient between alpha and beta frequencies

ts - for t-shell, the t-statistic of the correlation

pval - for t-shell, the p-value of the correlation

pval_adj - for t-shell, the adjusted p-value of the correlation

loss_a_frac - the fraction of wells with lost alpha chain

loss_b_frac - the fraction of wells with lost beta chain

score - for MAD-HYPE, the score of the pairing (higher is better)

cdr3a - alpha CDR3aa sequence

va - alpha V gene

ja - alpha J gene

cdr3b - beta CDR3aa sequence

vb - beta V gene

jb - beta J gene

Running on your own data

To run the pipeline on your own data, you need to have TCRalpha and TCRbeta repertoires processed with mixcr (v. >= 4.6) separate for each well.

We use the following mixcr pipeline for each well:

$MIXCR analyze generic-amplicon --species hsa --rna \
--tag-pattern "^N{0:1}(SAMPLE:N{7})(R1:*)\^(R2:*)" \
--floating-left-alignment-boundary \
--floating-right-alignment-boundary C \
--split-by-sample \
--sample-sheet plate_index.tsv \
${fp}_L{{n}}_R1.fastq.gz ${fp}_L{{n}}_R2.fastq.gz \
$OUTDIR/${sample} \
--use-local-temp

where plate_index.tsv is a tab separated file containing you PCR I plate indices (see example in this repo!).

MIXCR variable contains path to mixcr executable.

fp is the path to the fastq files for given wells without the _L{n}_R1.fastq.gz and _L{n}_R2.fastq.gz suffixes.

OUTDIR is the output directory for the mixcr output.

sample is the file name for the output for the given well (usually only filename from fp).

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
LICENSE		LICENSE
README.md		README.md
TIRTL_functions.R		TIRTL_functions.R
cupy_backend_script.py		cupy_backend_script.py
mad_hyper.R		mad_hyper.R
mlx_backend_script.py		mlx_backend_script.py
numpy_backend_script.py		numpy_backend_script.py
plate_index.tsv		plate_index.tsv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

TIRTL-seq data analysis pipeline

Installation

Requirements

Quick Start

Input for pairing pipeline

Output for pairing pipeline

Running on your own data

About

Uh oh!

Releases

Packages

Languages

License

vsuppiyar/TIRTL

Folders and files

Latest commit

History

Repository files navigation

TIRTL-seq data analysis pipeline

Installation

Requirements

Quick Start

Input for pairing pipeline

Output for pairing pipeline

Running on your own data

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages