demultiplex_helper_funcs module

auto_read(fname, **kwargs) → DataFrame
demux_by_calico_solo(bcs: Series, df_s: DataFrame, samp: str, sep: str, col_list: list[str, str], dem_cs: Series, donor_convert: bool) → list[pandas.core.series.Series, pandas.core.series.Series, list[str]]

Main function for classification by calico_solo.

This function assigns the calico_solo classification by calling the helper function demultiplex_helper_funcs.ret_htos_calico_solo().

demux_by_vireo(bcs: Series, vir_out_file: str, conv_df: Optional[DataFrame] = None) → list[pandas.core.series.Series, list[str]]

Main function for classification by vireoSNP.

This function assigns the vireo classification to each cell barcode.

Parameters
  • bcs – A pd series of cell barcodes from the gene count matrix

  • vir_out_file – Path to the donor_ids.tsv file produced by vireo

  • conv_df – A converter file used to rename donors in the vireo output, if needed

Returns
  • pd.Series – Classification by vireo per cell

  • list – Demux stats
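To make the flow of demux_by_vireo concrete, here is a minimal sketch, not the module's actual implementation. It assumes that donor_ids.tsv is tab-separated with vireo's "cell" and "donor_id" columns, and that the converter table keeps the vireo name in its first column and the new name in its second. The function name demux_by_vireo_sketch and the "name: count" stats format are hypothetical.

```python
import io
from typing import Optional

import pandas as pd


def demux_by_vireo_sketch(bcs: pd.Series, vir_out_file, conv_df: Optional[pd.DataFrame] = None) -> list:
    """Hypothetical sketch of vireo-based classification, not the module's code."""
    # donor_ids.tsv is tab-separated; only the barcode and call columns are used here
    vir = pd.read_csv(vir_out_file, sep="\t", usecols=["cell", "donor_id"])
    calls = vir.set_index("cell")["donor_id"]
    # Align to the count-matrix barcodes; barcodes absent from vireo become NaN
    calls = calls.reindex(bcs.values)
    if conv_df is not None:
        # Assumed converter layout: column 0 = vireo name, column 1 = new name
        mapping = dict(zip(conv_df.iloc[:, 0], conv_df.iloc[:, 1]))
        # Names not found in the converter are kept unchanged
        calls = calls.map(lambda d: mapping.get(d, d))
    # Illustrative per-class counts standing in for the module's demux stats
    stats = [f"{name}: {n}" for name, n in calls.value_counts(dropna=False).items()]
    return [calls, stats]


if __name__ == "__main__":
    tsv = io.StringIO("cell\tdonor_id\tprob_max\nAAA\tdonor0\t0.99\nCCC\tdoublet\t0.90\n")
    bcs = pd.Series(["AAA", "CCC", "GGG"])
    conv = pd.DataFrame({"old": ["donor0"], "new": ["SUBJ_1"]})
    calls, stats = demux_by_vireo_sketch(bcs, tsv, conv)
    print(calls.tolist(), stats)
```

Reindexing on the count-matrix barcodes keeps the output aligned with bcs even when vireo saw a different barcode set.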

demux_stats(demux_freq: Series, demux_name: str) → list[str]
get_don_ids(x: str, t_df: DataFrame) → str

Converts donor names in the vireo output.

Parameters
  • x – A string representing a donor

  • t_df – A converter file that contains each donor and its converted name

Returns

String representing converted donor name

Return type

str
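The following is a minimal sketch of a per-donor lookup like the one get_don_ids describes. It is not the module's code. The converter column names "donor" and "converted" are hypothetical, and so is the fallback of returning the original name when no match exists.

```python
import pandas as pd


def get_don_ids_sketch(x: str, t_df: pd.DataFrame) -> str:
    """Hypothetical sketch: return the converted name for one donor."""
    # Assumed converter layout: "donor" holds the original name,
    # "converted" holds the replacement (column names are illustrative)
    hits = t_df.loc[t_df["donor"] == x, "converted"]
    # Fall back to the original name when the donor is not in the converter
    return hits.iloc[0] if not hits.empty else x


conv = pd.DataFrame({"donor": ["donor0", "donor1"], "converted": ["SUBJ_A", "SUBJ_B"]})
# Applied element-wise over a Series of vireo calls
renamed = pd.Series(["donor0", "donor1", "doublet"]).apply(get_don_ids_sketch, args=(conv,))
```

A function with this scalar signature fits naturally with Series.apply, which passes the converter table through args.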

parse_file(wet_lab_df, cols, s_name, hs, d_con) → Union[list[str], list[list[str], list[str]]]
ret_htos_calico_solo(bcs: Series, df_s: DataFrame, samp: str, sep: Optional[str], col_list: list[str, str], dem_cs: Series, donor_convert: bool) → list[pandas.core.series.Series, pandas.core.series.Series, int, int]

Return HTO information and classification for each cell barcode.

This function returns two pandas Series, one holding donor IDs and one holding HTO names (used for calico_solo), together with the numbers of doublets and negatives identified.

Parameters
  • bcs – A pd series of cell barcodes from gene count matrix

  • df_s – Wet lab file containing HTO information and SubID (donor IDs) for each pool

  • samp – Pool name (present in df_s file)

  • sep – Separator used if all HTOs and donors are present in one row; otherwise None

  • col_list – List of column names (first val HTO, second val SubID)

  • dem_cs – A pd series with cell barcodes as index and “HTO classification” (solo output)

  • donor_convert – Whether donor names must be converted from the names used in the calico_solo (hashsolo) demultiplexing method

Returns

  • pd.Series – Contains donor IDs with cell barcodes as index

  • pd.Series – Contains HTO name with cell barcodes as index

  • int – number of doublets

  • int – number of negatives.
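Here is a minimal sketch of the HTO-to-donor mapping that ret_htos_calico_solo describes. It is not the module's implementation. It assumes that df_s has a hypothetical "pool" column identifying each row's pool and that dem_cs uses hashsolo-style labels ("Doublet", "Negative", or an HTO name). It also leaves out donor_convert for brevity.

```python
from typing import Optional

import pandas as pd


def ret_htos_sketch(bcs: pd.Series, df_s: pd.DataFrame, samp: str, sep: Optional[str],
                    col_list: list, dem_cs: pd.Series) -> list:
    """Hypothetical sketch: map hashsolo HTO calls to donor IDs for one pool."""
    hto_col, sub_col = col_list
    # Assumed: a "pool" column names the pool each wet-lab row belongs to
    rows = df_s[df_s["pool"] == samp]
    if sep is not None:
        # All HTOs and donors for the pool are packed into a single row
        htos = str(rows[hto_col].iloc[0]).split(sep)
        subs = str(rows[sub_col].iloc[0]).split(sep)
    else:
        # One row per HTO
        htos = rows[hto_col].astype(str).tolist()
        subs = rows[sub_col].astype(str).tolist()
    hto_to_donor = {h.strip(): s.strip() for h, s in zip(htos, subs)}

    # Align hashsolo calls to the count-matrix barcodes
    hto_calls = dem_cs.reindex(bcs.values)
    # Labels without a donor (e.g. "Doublet", "Negative") pass through unchanged
    donors = hto_calls.map(lambda h: hto_to_donor.get(h, h))
    n_doublets = int((hto_calls == "Doublet").sum())
    n_negatives = int((hto_calls == "Negative").sum())
    return [donors, hto_calls, n_doublets, n_negatives]
```

The sep branch covers the case where a single row lists every HTO and donor. The other branch covers one row per HTO, which mirrors the two layouts the sep parameter distinguishes.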

ret_subj_ids(ser: list, t_df: DataFrame) → DataFrame

Returns vireo demux stats

This function returns extra stats from the vireo demux output.

Parameters
  • ser – A pd series of cell barcodes from gene count matrix

  • t_df – Vireo output (donor_ids.tsv)

Returns

A dataframe with extra stats

Return type

pd.DataFrame

set_don_ids(x: str) → str

Change naming conventions of Vireo

Use this function to change the naming convention used in the vireo output (donor_ids.tsv). This is generally done to make the names match those of calico_solo/hashsolo, but it also tidies donor names so that they can be classified with a converter file.

Parameters

x – A string representing a donor classification from vireo

Returns

The renamed classification

Return type

str
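Below is a minimal sketch of the kind of renaming set_don_ids performs. It is not the module's code. Vireo labels special cells "doublet" and "unassigned". The mapping of those labels to hashsolo-style "Doublet" and "Negative" is an assumption chosen for illustration, and the dictionary name VIREO_TO_HASHSOLO is hypothetical.

```python
import pandas as pd

# Assumed mapping from vireo's special labels to hashsolo-style labels (illustrative)
VIREO_TO_HASHSOLO = {"doublet": "Doublet", "unassigned": "Negative"}


def set_don_ids_sketch(x: str) -> str:
    """Hypothetical sketch: rename vireo's special labels, keep donor names as-is."""
    return VIREO_TO_HASHSOLO.get(x, x)


calls = pd.Series(["donor0", "doublet", "unassigned"])
renamed = calls.map(set_don_ids_sketch)
```

Leaving donor names untouched means a converter file can still be applied afterwards to rename the donors themselves.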