scmagnify.tools.extract_regfactor_genes
- scmagnify.tools.extract_regfactor_genes(data, regfactor_key='regfactors', mode='TF', threshold=0.0, n_top=None, percentile=None, plot=False, ncols=3, figsize=(15, 8), bins=30, kde=True, palette=None, context=None, font_scale=1, default_context=None, theme='whitegrid', save=None, show=None)
Extract transcription factors (TFs) or target genes (TGs) with high loadings for each RegFactor, and optionally plot the loading distributions.
- Parameters:
data (AnnData | MuData | GRNMuData) – Single-cell data object. Can be an anndata.AnnData, mudata.MuData, or scmagnify.GRNMuData.
threshold (float (default: 0.0)) – The minimum loading value a gene must have to be included (used only if n_top and percentile are both None).
n_top (Optional[int] (default: None)) – The number of top-loading genes to extract for each RegFactor.
percentile (Optional[float] (default: None)) – The percentile of the loadings to use as the selection threshold.
regfactor_key (str (default: 'regfactors')) – The key in data.uns where the RegFactor loadings are stored.
mode (str (default: 'TF')) – Whether to extract TFs ('TF') or target genes ('TG').
plot (bool (default: False)) – Whether to plot the distribution of loadings together with the selection thresholds.
ncols (int (default: 3)) – Number of columns in the plot grid.
figsize (tuple[int, int] (default: (15, 8))) – Figure size as (width, height) in inches.
bins (int (default: 30)) – Number of bins for each histogram.
kde (bool (default: True)) – Whether to overlay a kernel density estimate (KDE) on each histogram.
palette (Optional[str] (default: None)) – Color palette for the plots.
context (Optional[str] (default: None)) – Seaborn context for the plots.
font_scale (float | None (default: 1)) – Scaling factor for fonts in the plots.
default_context (Optional[dict] (default: None)) – Default context settings for the plots.
theme (str | None (default: 'whitegrid')) – Seaborn theme for the plots.
save (Union[bool, str, None] (default: None)) – Whether to save the plot. If a string is provided, it is used as the filename.
show (Optional[bool] (default: None)) – Whether to display the plot.
- Return type:
Dict[str, pd.DataFrame]
- Returns:
A dictionary whose keys are RegFactor names and whose values are DataFrames of the genes with high loadings for that RegFactor.
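The selection rule implied by the threshold, n_top, and percentile parameters can be illustrated with a small pandas sketch. This is not scmagnify's implementation. The function name select_high_loading_genes is hypothetical, and it takes a genes × RegFactors loading table directly instead of reading data.uns[regfactor_key], whose layout is not documented here. It also assumes that n_top takes priority when both n_top and percentile are set, and it treats threshold as inclusive (>= threshold), following the "minimum loading value" wording.

```python
from typing import Dict, Optional

import numpy as np
import pandas as pd


def select_high_loading_genes(
    loadings: pd.DataFrame,
    threshold: float = 0.0,
    n_top: Optional[int] = None,
    percentile: Optional[float] = None,
) -> Dict[str, pd.DataFrame]:
    """Hypothetical sketch of per-RegFactor gene selection.

    Assumes `loadings` has genes (TFs or TGs) as rows and RegFactors as
    columns. Assumed precedence: n_top, then percentile, then threshold.
    """
    result: Dict[str, pd.DataFrame] = {}
    for factor in loadings.columns:
        col = loadings[factor]
        if n_top is not None:
            # Keep the n_top genes with the largest loadings.
            selected = col.nlargest(n_top)
        elif percentile is not None:
            # Use this RegFactor's own loading percentile as the cutoff.
            cutoff = np.percentile(col.to_numpy(), percentile)
            selected = col[col >= cutoff]
        else:
            # Fall back to a fixed minimum loading (assumed inclusive).
            selected = col[col >= threshold]
        # One DataFrame per RegFactor, highest loadings first.
        result[factor] = (
            selected.sort_values(ascending=False).rename("loading").to_frame()
        )
    return result


# Toy loadings: 4 genes x 2 RegFactors (illustrative values only).
loadings = pd.DataFrame(
    {"RF1": [0.9, 0.1, 0.5, 0.0], "RF2": [0.2, 0.8, 0.3, 0.7]},
    index=["GATA1", "SPI1", "TAL1", "KLF1"],
)
top2 = select_high_loading_genes(loadings, n_top=2)
print(top2["RF1"].index.tolist())  # ['GATA1', 'TAL1']
```

Returning one DataFrame per RegFactor mirrors the documented Dict[str, pd.DataFrame] return type. It also means a percentile cutoff is computed separately for each RegFactor rather than across all loadings at once, which is an assumption of this sketch.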