Overview of the Pipeline
To set up the pipeline for a project, perform the following steps:
- Set up the wildcards for the project (see Wildcard Processing).
- Select a module suited to the experiment (see Selectable Modules).
- As required and wherever possible, set up the folder structures for the different programs.
- Check and modify the parameters of the programs to be run (in the new_config.yaml file).
- Make sure all required Python and R packages are installed (currently a manual step).
- Check and change the global variables in the Snakefile as required.
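The package check above is currently manual. A minimal sketch of the Python side of that check follows. The names in `REQUIRED_PYTHON_PACKAGES` are illustrative assumptions, not the pipeline's actual requirement list. Import names can differ from install names (PyYAML is imported as `yaml`). R packages would need a separate check run from within R.

```python
import importlib.util

# Hypothetical list: replace with the packages your chosen module actually imports.
REQUIRED_PYTHON_PACKAGES = ["yaml", "pandas", "pysam"]


def missing_python_packages(packages):
    """Return the names in `packages` that cannot be found for import."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_python_packages(REQUIRED_PYTHON_PACKAGES)
    if missing:
        print("Missing Python packages: " + ", ".join(missing))
    else:
        print("All listed Python packages are available.")
```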
Wildcard Processing
To create wildcards, the pipeline is given a list of the samples to be processed. There are three ways to provide this list:
- a list of samples/pools (as a folder structure)
- a YAML file containing the list of samples/pools
- a directory containing the input files
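For the folder-structure and input-directory options, collecting sample names amounts to listing a directory. The sketch below assumes either one subfolder per sample/pool, or input files named `<sample><suffix>` (for example `pool1_R1.fastq.gz`). Both layouts and the default `suffix` are assumptions for illustration and are not taken from the pipeline's code. Inside a Snakefile, Snakemake's own `glob_wildcards` function serves a similar purpose.

```python
from pathlib import Path


def samples_from_folders(root):
    """Assumed layout: one sample/pool per subdirectory of `root`."""
    return sorted(p.name for p in Path(root).iterdir() if p.is_dir())


def samples_from_input_files(input_dir, suffix="_R1.fastq.gz"):
    """Derive sample names by stripping an assumed filename suffix."""
    return sorted(
        p.name[: -len(suffix)]
        for p in Path(input_dir).iterdir()
        if p.is_file() and p.name.endswith(suffix)
    )
```

A YAML list of samples/pools would instead be read with a YAML parser such as PyYAML.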
List of samples
The pipeline provides many combinations of the aforementioned programs as built-in modules, each of which is run by specifying its keyword.
Selectable Modules
The following combinations of programs can be run:
- all
- starsolo
- starsolo_rnaseqmet
- starsolo_gcbiasmet
- starsolo_kb_solo
- starsolo_picard
- starsolo_gt_demux
- starsolo_split_bams
- starsolo_split_bams_gt_demux
- starsolo_split_bams_gt_demux_multi_vcf
- starsolo_gt_demux_multi_vcf
- starsolo_cellsnp
- starsolo_rnaseqmet_kb_solo
- starsolo_gcbiasmet_kb_solo
- starsolo_gt_demux_identify_swaps
- starsolo_resolve_swaps_gt_demux
where:
- starsolo represents STARsolo;
- rnaseqmet and gcbiasmet refer to PICARD's CollectRnaSeqMetrics and CollectGcBiasMetrics, respectively, while picard includes both of these programs;
- kb_solo refers to using kallisto, bustools and calico_solo for demultiplexing;
- gt_demux refers to genotype-based demultiplexing using cellSNP and vireoSNP;
- split_bams refers to splitting pooled/multiplexed BAMs using hashsolo's output, while split_bams_gt_demux refers to splitting them using vireo's output;
- identify_swaps refers to using qtltools_mbv;
- multi_vcf allows multiple runs (i.e. multiple sets of VCF inputs) for the same sample.
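Because module keywords are built from the component names described above, a keyword can be split back into its components by longest match on underscore-separated tokens. This is a hypothetical helper for illustration only; the pipeline itself may parse keywords differently. The `COMPONENTS` set is assumed from the keyword descriptions, plus `cellsnp` and `resolve_swaps`, which appear in module names but are not described above.

```python
# Assumed component names found in the module keywords.
COMPONENTS = {
    "all", "starsolo", "rnaseqmet", "gcbiasmet", "picard", "kb_solo",
    "gt_demux", "split_bams", "cellsnp", "identify_swaps",
    "resolve_swaps", "multi_vcf",
}


def split_module(keyword):
    """Split a module keyword into its components using longest-match tokens."""
    tokens = keyword.split("_")
    parts, i = [], 0
    while i < len(tokens):
        # Try the longest run of tokens first so that "kb_solo" is matched whole.
        for j in range(len(tokens), i, -1):
            candidate = "_".join(tokens[i:j])
            if candidate in COMPONENTS:
                parts.append(candidate)
                i = j
                break
        else:
            raise ValueError(f"unknown component in {keyword!r} at {tokens[i]!r}")
    return parts
```

For example, `split_module("starsolo_gt_demux_identify_swaps")` returns `['starsolo', 'gt_demux', 'identify_swaps']`, and an unrecognised keyword raises `ValueError` instead of silently running the wrong combination.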
Module description
Module Name | Module Info | Sub-workflows Involved |
---|---|---|
all | module_info (more desc in its own file) | sub_wkfl |
all_multi_vcf | module_info (more desc in its own file) | sub_wkfl |
starsolo | module_info (more desc in its own file) | sub_wkfl |
starsolo_kb_solo | module_info (more desc in its own file) | sub_wkfl |
starsolo_gt_demux | module_info (more desc in its own file) | sub_wkfl |
starsolo_split_bams | module_info (more desc in its own file) | sub_wkfl |
starsolo_split_bams_gt_demux | module_info (more desc in its own file) | sub_wkfl |
starsolo_split_bams_gt_demux_multi_vcf | module_info (more desc in its own file) | sub_wkfl |
starsolo_gt_demux_multi_vcf | module_info (more desc in its own file) | sub_wkfl |
starsolo_cellsnp | module_info (more desc in its own file) | sub_wkfl |
starsolo_gt_demux_identify_swaps | module_info (more desc in its own file) | sub_wkfl |
starsolo_resolve_swaps_gt_demux | module_info (more desc in its own file) | sub_wkfl |
Sub-Snakemake workflows
The pipeline divides each module into self-contained sub-workflows. These are:
Name of Workflow | Description |
---|---|
resources.snkmk | Contains the memory (in MB per thread) and time (in minutes) requirements for each rule. |
calico_solo_demux.snkmk | Contains the hashsolo rule. |
split_bams.snkmk¹ | Contains rules that split pooled BAMs into individual BAMs, based on the output of either hashsolo or vireoSNP, using custom scripts. |
input_processing.snkmk | Contains rules that collect the values for all wildcards. |
STARsolo.snkmk | Contains the rules for STARsolo. |
produce_targets.snkmk | Contains rule all and the functions it needs. |
snv_aware_align.snkmk² | Might be removed soon. |
kite.snkmk | Contains rules for the kite workflow. |
picard_metrics.snkmk | Contains rules for all PICARD metrics (CollectGcBiasMetrics and CollectRnaSeqMetrics). |
pheno_demux3.snkmk | Contains rules for the cellSNP-vireoSNP pipeline. |
split_bams_gt.snkmk¹ | Contains rules that split pooled BAMs into individual BAMs based on vireo's output. |
demultiplex_no_argp.snkmk | Contains rules for demultiplexing using hashsolo and/or vireoSNP output and creating a count matrix file. |
identify_swaps.snkmk | Contains rules for identifying swaps using QTLtools mbv. |
Return HTO information and classification for each cell barcode. |
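resources.snkmk records memory per thread (in MB) and time (in minutes) for each rule, so a rule's total memory request is the per-thread value multiplied by its thread count. The sketch below shows that arithmetic and converts minutes into an HH:MM:SS walltime string. The rule names, values and the `RESOURCES` dictionary layout are hypothetical and are not taken from resources.snkmk.

```python
# Hypothetical per-rule values in the units resources.snkmk uses:
# memory in MB per thread, time in minutes.
RESOURCES = {
    "STARsolo": {"mem_mb_per_thread": 4000, "threads": 8, "time_min": 240},
    "hashsolo": {"mem_mb_per_thread": 2000, "threads": 1, "time_min": 30},
}


def total_mem_mb(rule):
    """Total memory for a rule: MB per thread times the number of threads."""
    r = RESOURCES[rule]
    return r["mem_mb_per_thread"] * r["threads"]


def walltime(rule):
    """Format the rule's time limit (in minutes) as HH:MM:SS."""
    hours, minutes = divmod(RESOURCES[rule]["time_min"], 60)
    return f"{hours:02d}:{minutes:02d}:00"
```

With these example values, `total_mem_mb("STARsolo")` gives 32000 MB and `walltime("STARsolo")` gives `"04:00:00"`.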