scmagnify.tools.infer_signal_pairs


scmagnify.tools.infer_signal_pairs(data, meta_mdata, liana_res, rtf_prior_net='scMLnet_RTF', target_celltypes=None, rna_key='RNA', grn_key='GRN', use_layer='log1p_norm', pseudotime_key='dpt_pseudotime', metacell_key='SEACell', num_perms=1000, p_adj_method='fdr_bh')

Infer receptor-to-transcription factor (RTF) downstream activity.

This function analyzes the temporal correlation between receptor expression and the expression of its downstream transcription factors (TFs) along a pseudotime trajectory at the metacell level. It uses permutation testing to assess significance.
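The permutation idea can be illustrated with a minimal sketch. It is not the scMagnify implementation: the helper name `permutation_scores`, the choice to shuffle the TF profile, and the one-sided (upper-tail) p-value with an add-one correction are all assumptions made for illustration.

```python
import numpy as np


def permutation_scores(receptor, tf, num_perms=1000, seed=0):
    """Hypothetical helper: score one receptor/TF pair and test it by permutation.

    Both inputs are expression profiles over metacells that are already
    ordered by pseudotime.
    """
    rng = np.random.default_rng(seed)
    receptor = np.asarray(receptor, dtype=float)
    tf = np.asarray(tf, dtype=float)

    # Observed scores for the pseudotime-ordered profiles.
    obs_dot = float(receptor @ tf)
    obs_cov = float(np.cov(receptor, tf)[0, 1])

    # Null distribution: shuffle the TF profile to break the temporal pairing.
    null_dot = np.empty(num_perms)
    null_cov = np.empty(num_perms)
    for i in range(num_perms):
        shuffled = rng.permutation(tf)
        null_dot[i] = receptor @ shuffled
        null_cov[i] = np.cov(receptor, shuffled)[0, 1]

    # Upper-tail p-values; the +1 terms keep them strictly positive (assumption).
    pval_dot = (np.sum(null_dot >= obs_dot) + 1) / (num_perms + 1)
    pval_cov = (np.sum(null_cov >= obs_cov) + 1) / (num_perms + 1)
    return {
        "dot_product": obs_dot,
        "covariance": obs_cov,
        "pval_dot": float(pval_dot),
        "pval_cov": float(pval_cov),
    }


# Toy profiles: the TF rises together with the receptor along pseudotime.
pseudotime = np.linspace(0.0, 1.0, 50)
scores = permutation_scores(pseudotime, 2.0 * pseudotime + 0.1, num_perms=200)
print(scores)
```

Increasing `num_perms` refines the smallest p-value the test can report, at the cost of proportionally more computation per receptor/TF pair.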

Parameters:
  • data (AnnData | MuData) – A single-cell AnnData or MuData object. Used to calculate average pseudotime for metacells if not already present in meta_mdata.

  • meta_mdata (MuData) – A MuData object containing metacell data, with an RNA modality. The core analysis is performed on this object.

  • liana_res (DataFrame) – A DataFrame from a cell-cell communication tool like liana+, containing at least ‘target’ and ‘receptor_complex’ columns.

  • rtf_prior_net (str | DataFrame (default: 'scMLnet_RTF')) –

    A DataFrame or a string specifying the prior knowledge network of Receptor-TF interactions.

    • If a DataFrame is provided, it must have ‘Receptor’ and ‘TF’ columns.

    • If a string is provided, it can be a file path to a CSV or one of the built-in network names:

      • ‘combined_RTF’: Loads the combined network from OmniPath, TRRUST, etc.

      • ‘scMLnet_RTF’: Loads a subset of the combined network sourced from scMLnet. (Default)

  • target_celltypes (Optional[list[str]] (default: None)) – A list of cell type names in the ‘target’ column of liana_res to be analyzed.

  • rna_key (str (default: 'RNA')) – The key for the RNA modality in data and meta_mdata.

  • grn_key (str (default: 'GRN')) – The key for the GRN (Gene Regulatory Network) modality in meta_mdata. Used to identify the list of TFs.

  • use_layer (str (default: 'log1p_norm')) – The layer in meta_mdata[rna_key] to use for expression values. If this layer is not present, .X is used and a log1p-normalized version is stored.

  • pseudotime_key (str (default: 'dpt_pseudotime')) – The key in .obs that stores pseudotime values. This function will first look in meta_mdata and, if not found, calculate it from data.

  • metacell_key (str (default: 'SEACell')) – The key in data.obs that stores metacell assignments. Required to calculate average pseudotime if it is not in meta_mdata.obs.

  • num_perms (int (default: 1000)) – The number of permutations to perform for the significance test.

  • p_adj_method (str (default: 'fdr_bh')) – The method for multiple testing correction (see statsmodels.stats.multitest.multipletests).
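The pseudotime fallback described for pseudotime_key and metacell_key amounts to averaging single-cell pseudotime within each metacell. The following is a minimal sketch of that step using a toy `.obs`-like table; the cell names and values are made up, and the exact aggregation used internally is an assumption.

```python
import pandas as pd

# Toy stand-in for data.obs: one row per cell, with a metacell assignment
# ('SEACell') and a pseudotime value ('dpt_pseudotime'). Values are invented.
cell_obs = pd.DataFrame(
    {
        "SEACell": ["mc0", "mc0", "mc1", "mc1", "mc1"],
        "dpt_pseudotime": [0.1, 0.3, 0.6, 0.8, 1.0],
    },
    index=[f"cell{i}" for i in range(5)],
)

# Average pseudotime per metacell (assumed to mirror the documented fallback).
metacell_pseudotime = cell_obs.groupby("SEACell")["dpt_pseudotime"].mean()

# Metacells ordered along the trajectory, earliest first.
ordered_metacells = metacell_pseudotime.sort_values().index.tolist()
print(metacell_pseudotime)
print(ordered_metacells)
```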

Return type:

DataFrame

Returns:

A DataFrame containing the inferred RTF activities, with scores (dot product, covariance) and associated p-values. Columns include: ‘Receptor’, ‘TF’, ‘dot_product’, ‘covariance’, ‘pval_dot’, ‘pval_cov’, ‘pval_dot_adj’, ‘pval_cov_adj’.
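A typical next step is to keep only the significant pairs and rank them by score. The sketch below works on a hand-built table with the documented columns; the receptor and TF labels and all numbers are placeholders, not real output, and the 0.05 threshold is an arbitrary choice.

```python
import pandas as pd

# Placeholder table with the documented column names (values are invented).
results = pd.DataFrame(
    {
        "Receptor": ["R1", "R1", "R2"],
        "TF": ["TF_A", "TF_B", "TF_A"],
        "dot_product": [12.4, 3.1, 8.7],
        "covariance": [0.82, 0.05, 0.41],
        "pval_dot": [0.001, 0.40, 0.02],
        "pval_cov": [0.001, 0.55, 0.01],
        "pval_dot_adj": [0.003, 0.40, 0.03],
        "pval_cov_adj": [0.003, 0.55, 0.015],
    }
)

alpha = 0.05  # significance threshold on the adjusted p-value (assumption)

# Keep pairs whose adjusted covariance p-value passes, strongest score first.
significant = (
    results[results["pval_cov_adj"] < alpha]
    .sort_values("covariance", ascending=False)
    .reset_index(drop=True)
)
print(significant[["Receptor", "TF", "covariance", "pval_cov_adj"]])
```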