Demultiplex pooled snRNA-seq datasets

This tutorial walks through a complex workflow that this pipeline simplifies and streamlines.

To make it more interesting, this tutorial annotates individual samples through both genotype-based demultiplexing (using the cellSNP-vireoSNP workflow) and HTO-based demultiplexing (using the kite-hashsolo workflow).

Pipeline overview

The pipeline can be visualized as:

%%{ init: { "theme":"neutral", "themeVariables": { "fontSize":20, "primaryColor":"#BB2528", "primaryTextColor":"#fff", "primaryBorderColor":"#7C0000", "lineColor":"#F8B229", "secondaryColor":"#006100", "tertiaryColor":"#fff" }, "flowchart": { "wrap": true, "width": 300 } } }%%
flowchart TB
    subgraph cDNA
        direction TB
        id1[/cDNA fastqs/]-->|align|id2(STARsolo)-->|"filter cells"|id3("cellSNP (cellsnp-lite)")
        id3-->id6{"all genotypes available?"}
        subgraph PICARD
            direction LR
            A(CollectGcBiasMetrics)
            B(CollectRnaSeqMetrics)
        end
    end
    subgraph genotype-based
        id6-->|yes|id8("vireoSNP without genotypes")-->|identify swaps|id9(QTLtools-mbv)
        id9-->|"rectify swaps"|id11(vireoSNP)
        subgraph SNPs
            direction LR
            id4[("External Genotypes (SNParray or WGS)")]
            id5[("1000 Genomes Project")]
            style id4 fill:#348ceb,stroke:#333,stroke-width:4px
            style id5 fill:#348ceb,stroke:#333,stroke-width:4px
        end
    end
    subgraph demultiplex
        direction TB
        id12(custom scripts)-->id13[/"final count matrix"/]
    end
    subgraph kite
        direction TB
        id14[/HTO fastqs/]-->id18("run kallisto")
        id16("create feature barcode file")-->|"create mismatch FASTA and t2g files"|id17("featuremap (pachter/kite lab)")
        id17-->|mismatch FASTA|id15("build kallisto index")
        id15-->id18
        id18-->|"run bustools"|id19("correct, sort and count")
        id19-->|"hashing count matrix"|id20(hashsolo)
    end
    id2-->|"collect read stats"|PICARD
    SNPs-->|"common SNPs"|id3
    id4-->id9
    id4-->|"correct donors"|id11
    id11-->id12
    id2-->|"filter cells"|id20
    id20-->id12
    id6-->|no|kite
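
The branch point in the diagram ("all genotypes available?") can be read as a simple decision. The sketch below only illustrates that logic and is not code from the pipeline; the function name `choose_demux_route` and the donor IDs are hypothetical.

```python
def choose_demux_route(pool_donors, genotyped_donors):
    """Return which demultiplexing branch of the diagram applies to a pool.

    pool_donors: donor IDs expected in the pooled sample (hypothetical input).
    genotyped_donors: donor IDs with external genotypes (SNP array or WGS).
    """
    missing = set(pool_donors) - set(genotyped_donors)
    if not missing:
        # "yes" edge: vireoSNP without genotypes -> QTLtools-mbv -> vireoSNP
        return "genotype-based"
    # "no" edge: kite (kallisto/bustools) -> hashsolo
    return "kite"


print(choose_demux_route(["D1", "D2"], ["D1", "D2", "D3"]))
print(choose_demux_route(["D1", "D2"], ["D1"]))
```

In this reading, a single donor without external genotypes is enough to send the pool down the HTO-based branch.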

Preparing target files

First, we need to create a list describing the file structure (derived from our FASTQ files), which the rule input_processing uses to read in wildcards.
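
As a minimal sketch of this step, the snippet below derives one entry per sample from a set of FASTQ file names. The naming pattern (`<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz`), the example file names, and the helper `list_samples` are assumptions made for illustration; the layout that input_processing actually expects may differ.

```python
import re
from pathlib import PurePosixPath

# Hypothetical naming pattern: <sample>_R1.fastq.gz or <sample>_R2.fastq.gz
FASTQ_RE = re.compile(r"^(?P<sample>.+)_R[12]\.fastq\.gz$")


def list_samples(fastq_paths):
    """Return sorted, de-duplicated '<parent dir>/<sample>' prefixes."""
    samples = set()
    for path in fastq_paths:
        p = PurePosixPath(path)
        match = FASTQ_RE.match(p.name)
        if match:  # skip files that do not follow the assumed pattern
            samples.add(str(p.parent / match.group("sample")))
    return sorted(samples)


# Made-up file names, used only to exercise the helper
example = [
    "round1/poolA-cDNA_R1.fastq.gz",
    "round1/poolA-cDNA_R2.fastq.gz",
    "round1/poolA-HTO_R1.fastq.gz",
    "round1/poolA-HTO_R2.fastq.gz",
    "round1/notes.txt",
]
for entry in list_samples(example):
    print(entry)
```

Each printed prefix stands for one read pair, which is the kind of value a wildcard would typically capture.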

Fastq File Structure


Configuration File

Any use of this pipeline starts with setting up the configuration file new_config.yaml.

This YAML config file (new_config.yaml) contains all relevant options for each rule in the pipeline. Comments divide the file into sections, one per sub-workflow module, and each section holds the rule-specific options and parameters in the order in which the rules appear in the sub-workflow scripts. Typically, certain parameters need not be changed, regardless of the project the pipeline is used for.

Common (project-specific) parameters

The following pictures show the parameters that are project-specific.

DAG control and project info params

Folder structures

Extra Info (can be removed soon!)

Module selector

last_step: This key must be set to one of the pre-selected modules.
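
A check of this kind could be sketched as follows. The module names in `ALLOWED_MODULES` are hypothetical, loosely based on the stages in the pipeline diagram, and `check_last_step` is an illustrative helper rather than part of the pipeline.

```python
# Hypothetical module names, loosely based on the stages in the diagram;
# the values the pipeline actually accepts may differ.
ALLOWED_MODULES = ("STARsolo", "cellSNP", "vireoSNP", "kite", "hashsolo", "demultiplex")


def check_last_step(config):
    """Return config['last_step'] if it names an allowed module, else raise."""
    last_step = config.get("last_step")
    if last_step not in ALLOWED_MODULES:
        raise ValueError(
            f"last_step={last_step!r} is not one of: {', '.join(ALLOWED_MODULES)}"
        )
    return last_step


# The dict stands in for the parsed contents of new_config.yaml
print(check_last_step({"last_step": "vireoSNP"}))
```

Failing early on an unknown value avoids building a DAG for a module that does not exist.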

Project-specific changes to rules

Changes to executor script

Finally, we have to set up the two executor scripts:

..Snakefile: