demultiplex_helper_funcs module

auto_read(fname, lev=1, **kwargs) DataFrame
demux_by_calico_solo(bcs: Series, df_s: DataFrame, samp: str, sep: str, col_list: list[str, str], dem_cs: Series, donor_convert: bool, hto_count: int, multi_hto_sep: str = '') list[Series, Series, list[str]]

Main function for classification by calico_solo.

This function assigns the calico_solo classification by calling demultiplex_helper_funcs.ret_htos_calico_solo().

demux_by_vireo(bcs: Series, vir_out_file: str, conv_df: DataFrame | None = None, donor_col: str | None = None, conv_col: str | None = None, pool_col: str | None = None, pool_name: str | None = None) tuple[Series, list[str], Series | None]

Main function for classification by vireoSNP.

This function assigns vireo classification.

Parameters:

bcs

A pd series of cell barcodes from gene count matrix.

vir_out_file

Path to donor_ids.tsv file produced by vireo.

conv_df

A converter file to change donor names in the vireo output, if needed.

donor_col

Column, in the converter file, containing the donor names that match the vireo output.

conv_col

Column, in the converter file, containing the converted names.

pool_col

Column, in the converter file, containing the pool names.

pool_name

Pool name.

Returns:
  • pd.Series – Classification by vireo per cell

  • list – Demux stats

  • pd.Series – Converted donor names of classification by vireo per cell
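
As an illustration of the classification and conversion steps described above, here is a minimal sketch, not the module's actual implementation. It assumes that donor_ids.tsv has "cell" and "donor_id" columns, that barcodes missing from the vireo output receive a placeholder label ("Not Present" is a made-up choice), and that the converter columns are named "donor" and "SubID" (both hypothetical).

```python
import io
import pandas as pd

# Hypothetical stand-in for a donor_ids.tsv file; the "cell" and
# "donor_id" column names are assumptions about the vireo output.
tsv = io.StringIO(
    "cell\tdonor_id\n"
    "AAAC-1\tdonor0\n"
    "AAAG-1\tdoublet\n"
    "AAAT-1\tdonor1\n"
)
vir = pd.read_csv(tsv, sep="\t", index_col="cell")

# Cell barcodes from the gene count matrix; the last one is absent from vireo.
bcs = pd.Series(["AAAC-1", "AAAG-1", "AAAT-1", "CCCC-1"])

# Look up each barcode in the vireo output; label misses with an assumed placeholder.
vir_class = bcs.map(vir["donor_id"]).fillna("Not Present")
vir_class.index = bcs.values

# Optional name conversion with a converter table (column names are hypothetical).
conv_df = pd.DataFrame({"donor": ["donor0", "donor1"], "SubID": ["S1", "S2"]})
mapping = conv_df.set_index("donor")["SubID"]

# Labels without a converter entry (e.g. "doublet") keep their original value.
converted = vir_class.map(mapping).fillna(vir_class)
```

Here vir_class plays the role of the first returned series and converted the role of the third; how demux_by_vireo itself handles missing barcodes or unmatched donors may differ.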

demux_stats(demux_freq: Series, demux_name: str) list[str]
get_donor_info(hto_df: DataFrame, pool_info_df: DataFrame, sep: str, col_list: list)

Return donor information for each cell barcode for multi-HTO setup.

This function returns a pandas series of demultiplexed donor info, derived from the data in the wet lab info file. It is intended specifically for the multi-HTO setup.

Parameters:
  • hto_df – A series of cell barcodes from gene count matrix

  • pool_info_df – Subset of wet lab file containing multi-HTO information and SubID (donor IDs)

  • col_list – List of column names in the wet lab file in the sequence: pool name, HTO names (separated by ‘sep’), HTO barcode, donor info

Returns:

Contains donor IDs with cell barcodes as index

Return type:

pd.Series

parse_file(wet_lab_df, cols, s_name, hs, d_con) list[str] | tuple[list[str], list[str]]
ret_htos_calico_solo(bcs: Series, df_s: DataFrame, samp: str, sep: str | None, col_list: list[str, str], dem_cs: Series, donor_convert: bool, hto_count: int, multi_hto_setp: bool) list[Series, Series, int, int]

Return HTO information and classification for each cell barcode.

This function returns two pandas series, holding the donor IDs and the HTO names (used for calico_solo), along with the number of doublets and negatives identified.

Parameters:
  • bcs – A pd series of cell barcodes from gene count matrix

  • df_s – Wet lab file containing HTO information and SubID (donor IDs) for each pool

  • samp – Pool name (present in df_s file)

  • sep – Separator used if all HTOs and donors are present in one row, or in a multi-HTO setup; otherwise None

  • col_list – List of column names (first val HTO, second val SubID)

  • dem_cs – A pd series with cell barcodes as index and “HTO classification” (solo output)

  • donor_convert – Whether donor names have to be converted from the names used by the calico_solo (hashsolo) demultiplexing method

  • hto_count – When run for a multi-HTO setup, the position of the HTO in the sequence

  • multi_hto_setp – True for multi-HTO setup

Returns:

  • pd.Series – Contains donor IDs with cell barcodes as index

  • pd.Series – Contains HTO name with cell barcodes as index

  • int – number of doublets

  • int – number of negatives.
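
The single-row case of the sep parameter can be sketched as follows. This is an illustration under stated assumptions rather than the function's real logic: the "pool" column name, the "Doublet" and "Negative" labels in the solo output, and the example values are all hypothetical.

```python
import pandas as pd

# Hypothetical wet lab table: one row per pool, HTOs and donors joined by sep.
df_s = pd.DataFrame({
    "pool": ["poolA"],
    "HTO": ["HTO1,HTO2"],
    "SubID": ["donor_A,donor_B"],
})
samp, sep, col_list = "poolA", ",", ["HTO", "SubID"]

# Split the pool's row into an HTO name -> donor ID lookup.
row = df_s.loc[df_s["pool"] == samp].iloc[0]
hto_to_donor = dict(zip(row[col_list[0]].split(sep), row[col_list[1]].split(sep)))

# Hypothetical solo output: classification per barcode ("Doublet"/"Negative" assumed).
dem_cs = pd.Series(
    ["HTO1", "Doublet", "HTO2", "Negative"],
    index=["AAAC-1", "AAAG-1", "AAAT-1", "CCCC-1"],
)
bcs = pd.Series(dem_cs.index)

# HTO name per barcode, in the order of the count-matrix barcodes.
hto_names = dem_cs.reindex(bcs.values)

# Donor ID per barcode; non-singlet labels are passed through unchanged.
donor_ids = hto_names.map(hto_to_donor).fillna(hto_names)

n_doublets = int((hto_names == "Doublet").sum())
n_negatives = int((hto_names == "Negative").sum())
```

The four resulting values mirror the documented return shape (donor IDs, HTO names, doublet count, negative count).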

ret_subj_ids(ser: list, t_df: DataFrame) DataFrame

Returns vireo demux stats

This function returns extra stats from the vireo demux output.

Parameters:
  • ser – A pd series of cell barcodes from gene count matrix

  • t_df – Vireo output (donor_ids.tsv)

Returns:

A dataframe with extra stats

Return type:

pd.DataFrame

set_don_ids(x: str) str

Change naming conventions of Vireo

Use this function to change the naming convention used in the vireo output (donor_ids.tsv), generally to make it similar to that of calico_solo/hashsolo. It can also be used to tidy donor names so that they can be classified using a converter file.

Parameters:

x – A string representing a donor classification from vireo

Returns:

The ‘changed’ classification

Return type:

str
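
A string-to-string function of this shape is typically applied element-wise to a series of vireo calls. The renaming rule is not specified here, so the sketch below uses a purely hypothetical one (mapping "doublet" and "unassigned" to hashsolo-style "Doublet" and "Negative") under a hypothetical function name; it is not the behaviour of set_don_ids itself.

```python
import pandas as pd


def rename_vireo_label(x: str) -> str:
    """Hypothetical renaming rule, for illustration only."""
    # Assumed mapping from vireo-style labels to hashsolo-style labels.
    special = {"doublet": "Doublet", "unassigned": "Negative"}
    # Any label not listed above (e.g. a donor name) is returned unchanged.
    return special.get(x, x)


vireo_calls = pd.Series(["donor0", "doublet", "unassigned"])

# Apply the per-value rule to every classification.
renamed = vireo_calls.map(rename_vireo_label)
```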