scmagnify.tools.FuncEnrich
- class scmagnify.tools.FuncEnrich(gene_sets, geneset_col='geneset', genesymbol_col='genesymbol')
Performs Over-Representation Analysis (ORA) to identify enriched biological pathways or gene sets from a given list of genes.
- Parameters:
gene_sets – The source of gene sets. Can be:
  - A pre-loaded long-format pandas DataFrame.
  - A full path to a .gmt file.
  - The name of a built-in gene set (e.g., 'msigdb_gobp'), which will be loaded from the package's default data directory.
geneset_col (str (default: 'geneset')) – If gene_sets is a DataFrame, this specifies the column containing the gene set names.
genesymbol_col (str (default: 'genesymbol')) – If gene_sets is a DataFrame, this specifies the column containing the gene symbols.
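To make the "long-format" input concrete, the sketch below converts GMT-style text (one gene set per line: name, description, then tab-separated gene symbols) into one row per gene set and gene symbol pair. It uses only the standard library; the helper name `gmt_to_long_rows` and the example content are hypothetical and are not part of scmagnify.

```python
import io


def gmt_to_long_rows(handle):
    """Convert GMT lines into (geneset, genesymbol) rows (hypothetical helper)."""
    rows = []
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        # One output row per gene symbol in the set.
        rows.extend((name, gene) for gene in genes if gene)
    return rows


# Hypothetical two-set GMT content, used only for illustration.
example_gmt = io.StringIO(
    "SET_A\tdescription A\tGENE1\tGENE2\n"
    "SET_B\tdescription B\tGENE2\tGENE3\tGENE4\n"
)
rows = gmt_to_long_rows(example_gmt)
```

Rows of this shape map directly onto a two-column DataFrame whose columns are named by geneset_col and genesymbol_col.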
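Methods table
| add_genesets(new_sets[, geneset_col, ...]) | Adds new gene sets to the object from a dictionary or DataFrame. |
| filter_genesets(pattern[, case, regex, inplace]) | Filters the gene sets based on a keyword or regular expression. |
| get_overlap_genes(terms[, sortby, n_top]) | Retrieves the overlapping genes for specified enriched terms. |
| run_ora(gene_list[, n_background, ...]) | Performs Over-Representation Analysis (ORA) on a given list of genes. |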
Methods
- FuncEnrich.add_genesets(new_sets, geneset_col='geneset', genesymbol_col='genesymbol', inplace=True)
Adds new gene sets to the object from a dictionary or DataFrame.
If any of the new gene set names already exist in the object, they will be overwritten by the new definitions.
- Parameters:
new_sets – The new gene sets to add. Can be:
  - A dictionary where keys are gene set names and values are lists of gene symbols (e.g., {'MY_SET': ['GENE1', 'GENE2']}).
  - A long-format pandas DataFrame.
geneset_col (str (default: 'geneset')) – If new_sets is a DataFrame, this specifies the column with gene set names.
genesymbol_col (str (default: 'genesymbol')) – If new_sets is a DataFrame, this specifies the column with gene symbols.
inplace (bool (default: True)) – If True, modifies the current object directly. If False, returns a new FuncEnrich object with the added gene sets.
- Return type: Optional[FuncEnrich]
- Returns: If inplace=False, returns a new FuncEnrich object. If inplace=True, returns None.
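The overwrite and inplace behaviour described above can be illustrated with plain dictionaries. This is a minimal sketch of the documented semantics, not the library's implementation; `merge_gene_sets` is a hypothetical helper.

```python
def merge_gene_sets(existing, new_sets, inplace=True):
    """Merge new_sets into existing; same-named sets are overwritten (hypothetical helper)."""
    if inplace:
        target = existing
    else:
        # Copy so the caller's dictionary is left untouched.
        target = {name: list(genes) for name, genes in existing.items()}
    for name, genes in new_sets.items():
        target[name] = list(genes)  # replaces any previous definition
    return None if inplace else target


current = {"MY_SET": ["GENE1"], "OTHER_SET": ["GENE9"]}
merged = merge_gene_sets(current, {"MY_SET": ["GENE1", "GENE2"]}, inplace=False)
```

With inplace=False the original mapping keeps its old definition of MY_SET, while the returned copy carries the new one.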
- FuncEnrich.filter_genesets(pattern, case=False, regex=True, inplace=True)
Filters the gene sets based on a keyword or regular expression.
This method allows you to narrow down the analysis to a subset of gene sets (e.g., only those related to ‘T_CELL’ or ‘KEGG_’).
- Parameters:
pattern (str) – The keyword or regular expression pattern to search for in gene set names.
case (bool (default: False)) – If True, the pattern matching is case-sensitive.
regex (bool (default: True)) – If True, treats the pattern as a regular expression. If False, treats it as a literal string.
inplace (bool (default: True)) – If True, modifies the current object directly. If False, returns a new FuncEnrich object with the filtered gene sets.
- Return type: Optional[FuncEnrich]
- Returns: If inplace=False, returns a new filtered FuncEnrich object. If inplace=True, returns None.
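The interaction of case and regex can be sketched with the standard `re` module. The sketch assumes the pattern may match anywhere in the gene set name (a substring search), which the text above does not state explicitly; `filter_gene_set_names` and the example names are hypothetical.

```python
import re


def filter_gene_set_names(names, pattern, case=False, regex=True):
    """Return names matching pattern (hypothetical helper; assumes substring search)."""
    flags = 0 if case else re.IGNORECASE
    # With regex=False the pattern is escaped so it is matched literally.
    compiled = re.compile(pattern if regex else re.escape(pattern), flags)
    return [name for name in names if compiled.search(name)]


names = [
    "KEGG_T_CELL_RECEPTOR_SIGNALING",
    "GOBP_T_CELL_ACTIVATION",
    "KEGG_CELL_CYCLE",
]
t_cell_sets = filter_gene_set_names(names, "t_cell")   # case-insensitive keyword
kegg_sets = filter_gene_set_names(names, "^KEGG_")     # anchored regular expression
```

Here the lowercase keyword still matches both T_CELL sets because case defaults to False, and the anchored expression keeps only names that start with KEGG_.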
- FuncEnrich.get_overlap_genes(terms, sortby=None, n_top=5)
Retrieves the overlapping genes for specified enriched terms.
- Parameters:
terms (list[str]) – A list of enriched term names for which to retrieve overlapping genes.
sortby (Optional[DataFrame] (default: None)) – An optional DataFrame with gene symbols as the index and a numeric column to sort the overlapping genes by (e.g., log fold change).
n_top (int | None (default: 5)) – If sortby is provided, this specifies the number of top genes to return for each term based on the sorting. If None or <= 0, returns all overlapping genes without sorting.
- Return type: Dict[str, List[str]]
- Returns: A dictionary where keys are term names and values are lists of overlapping gene symbols.
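The ranking step can be sketched with plain dictionaries standing in for the sortby DataFrame. Two assumptions are made here that the text above does not confirm: genes are ranked in descending order of the score, and genes without a score are placed last. `top_overlap_genes` is a hypothetical helper.

```python
def top_overlap_genes(overlaps, scores=None, n_top=5):
    """Pick the top-scoring overlap genes per term (hypothetical helper)."""
    result = {}
    for term, genes in overlaps.items():
        if scores is None or n_top is None or n_top <= 0:
            # No ranking requested: return every overlapping gene as-is.
            result[term] = list(genes)
            continue
        # Assumption: higher score ranks first; unscored genes sort last.
        ranked = sorted(
            genes,
            key=lambda gene: scores.get(gene, float("-inf")),
            reverse=True,
        )
        result[term] = ranked[:n_top]
    return result


overlaps = {"TERM_A": ["GENE1", "GENE2", "GENE3"]}
scores = {"GENE1": 0.5, "GENE2": 2.0, "GENE3": 1.2}  # e.g., log fold changes
top_two = top_overlap_genes(overlaps, scores, n_top=2)
```

Passing n_top=None returns all three genes in their original order, mirroring the "without sorting" case described for n_top.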
- FuncEnrich.run_ora(gene_list, n_background=None, top_n_results=10)
Performs Over-Representation Analysis (ORA) on a given list of genes.
- Parameters:
gene_list (list | Series | Index) – A list, Series, or Index of significant gene symbols to be tested for enrichment.
n_background (Optional[int] (default: None)) – The total number of genes in the background universe. If None, the background is defined as all unique genes present in the loaded gene_sets. It is highly recommended to provide the total number of genes detected in your experiment.
top_n_results (int (default: 10)) – The number of top enriched terms to display in a summary table after the run.
- Return type: pd.DataFrame
- Returns: A DataFrame containing the ORA results, sorted by the 'Combined score'.
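To show why n_background matters, the sketch below computes an upper-tail hypergeometric p-value, a common choice of test for ORA. Which test run_ora actually uses, and how the 'Combined score' is defined, are not stated above, so this is an illustration under that assumption; `ora_p_value` is a hypothetical helper.

```python
from math import comb


def ora_p_value(n_overlap, n_set, n_query, n_background):
    """Upper-tail hypergeometric probability P(X >= n_overlap) (hypothetical helper).

    n_set: genes in the gene set; n_query: genes in the input list;
    n_background: size of the gene universe.
    """
    total = comb(n_background, n_query)
    upper = min(n_set, n_query)
    tail = sum(
        comb(n_set, k) * comb(n_background - n_set, n_query - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Same overlap (2 of 3 query genes in a 4-gene set), two background sizes.
p_small_background = ora_p_value(2, 4, 3, 10)
p_large_background = ora_p_value(2, 4, 3, 100)
```

The same overlap looks far more significant against the larger universe, so the choice of background directly changes the reported p-values under this test.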