scmagnify.tools.MotifScanner
- class scmagnify.tools.MotifScanner(motif_db=None, motif_objects=None, genome_file=None)
A class for scanning DNA sequences for motifs using position frequency matrices (PFMs).
- Parameters:
motif_db (Optional[str] (default: None)) – Path to a motif database file. If a basename is provided, it is treated as a database name from the default motif directory (MOTIF_DIR) and loaded in PFM format.
motif_objects (Optional[list[PFM]] (default: None)) – List of PFM objects to initialize the scanner with.
genome_file (Optional[str] (default: None)) – Path to the genome FASTA file. Defaults to the path in the scmagnify settings.
Methods table
| add_custom_motif | Adds a custom motif to the collection from a pandas DataFrame. |
| export_motifs | Exports the motifs stored in the scanner to a specified format. |
| import_motifs | Imports motifs from a file, skipping duplicates and optionally filtering by factors. |
| match | Performs motif scanning on the selected peaks using stored motifs. |
| show_motif_databases | Lists available motif databases in the specified directory. |
Methods
- MotifScanner.add_custom_motif(matrix_id, name, counts_df)
Adds a custom motif to the collection from a pandas DataFrame.
- MotifScanner.export_motifs(output_path, format, **kwargs)
Exports the motifs stored in the scanner to a specified format.
- MotifScanner.import_motifs(motif_db='HOCOMOCOv11_HUMAN', format=None, factor_file=None, factor_list=None, target_organism=None)
Imports motifs from a file, skipping duplicates and optionally filtering by factors.
If motif_db is provided without an extension, it is treated as a database name from the default motif directory (MOTIF_DIR) and loaded in PFM format.
- Parameters:
motif_db (str (default: 'HOCOMOCOv11_HUMAN')) – Path to the input motif file or a database name (e.g., “HOCOMOCOv11_HUMAN”).
format (Optional[Literal['meme', 'jaspar', 'pfm']] (default: None)) – The format of the input file (‘meme’, ‘jaspar’, or ‘pfm’). Defaults to ‘pfm’ if a basename is provided.
factor_file (Optional[str] (default: None)) – Path to the motif-to-factors mapping file. Required for ‘pfm’ format.
factor_list (Optional[list[str]] (default: None)) – A list of factor names. If provided, only motifs associated with these factors will be imported.
target_organism (str | None)
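The basename rule described above can be pictured with a small standalone sketch. This is not scmagnify's code: `resolve_motif_source` is a hypothetical helper, the `MOTIF_DIR` value is a placeholder, and the exact file layout inside the motif directory is an assumption. It only illustrates the documented behaviour that a name without an extension is looked up in the default motif directory and read as PFM.

```python
import os

# Placeholder for scmagnify's MOTIF_DIR; the real value is set by the package.
MOTIF_DIR = os.path.join("path", "to", "motifs")


def resolve_motif_source(motif_db, fmt=None, motif_dir=MOTIF_DIR):
    """Hypothetical helper: return (path, format) for a motif_db argument."""
    _, ext = os.path.splitext(motif_db)
    if ext == "":
        # No extension: treat the value as a database name in the motif
        # directory and fall back to the 'pfm' format.
        return os.path.join(motif_dir, motif_db), fmt or "pfm"
    # An extension is present: treat the value as a file path and leave the
    # format exactly as the caller supplied it.
    return motif_db, fmt


print(resolve_motif_source("HOCOMOCOv11_HUMAN"))
print(resolve_motif_source("motifs/custom.meme", fmt="meme"))
```

Under this reading, passing a file path means the caller is responsible for supplying `format` explicitly.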
- MotifScanner.match(data, peak_selected=None, pseudocounts=0.0001, p_value=5e-05, background='even', threshold=0, modal='ATAC')
Perform motif scanning on the selected peaks using stored motifs.
- Parameters:
data (AnnData | MuData) – AnnData object with peak counts, or MuData object with an ‘ATAC’ modality.
peak_selected (Optional[list[str]] (default: None)) – List of selected peaks. If None, uses all peaks in data.uns["peak_gene_corrs"]["filtered_corrs"].
pseudocounts (float (default: 0.0001)) – Pseudocounts for each nucleotide. For comparison, the values noted for other tools are moods-dna.py: 0.01, pychromVAR: 0.0001, motifmatchr: 0.8.
p_value (float (default: 5e-05)) – P-value threshold for motif matching.
background (str (default: 'even')) – Background distribution of nucleotides used to compute score thresholds from the p-value. Three options are available: “subject” to use the subject sequences, “genome” to use the whole genome (requires a genome file), or “even” to use 0.25 for each base.
threshold (float (default: 0)) – Score threshold for motif matches.
modal (str | None)
- Return type:
Union[AnnData, MuData]
- Returns:
Updated AnnData or MuData object with motif scanning results.
motif_score (pd.DataFrame) – DataFrame containing motif scanning results. Columns:
- seqname (str) – Peak name.
- motif_id (str) – Motif ID.
- score (float) – Motif scanning score.
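To make the roles of `pseudocounts` and the “even” background more concrete, here is a minimal pure-Python sketch of one common way to turn a count matrix into log-odds scores. It is a generic illustration, not scmagnify's scanning backend: `pfm_to_log_odds` and `score_sequence` are hypothetical names, the toy motif is invented, and the library's exact pseudocount convention may differ.

```python
import math

BASES = "ACGT"


def pfm_to_log_odds(counts, pseudocount=0.0001, background=None):
    """Convert per-position base counts into log2-odds scores.

    counts maps each base to a list of counts, one per motif position.
    """
    # "even" background: 0.25 for each base.
    background = background or {b: 0.25 for b in BASES}
    length = len(counts["A"])
    pwm = []
    for i in range(length):
        # The pseudocount is added to every base so that zero counts
        # do not produce log(0).
        total = sum(counts[b][i] for b in BASES) + 4 * pseudocount
        column = {}
        for b in BASES:
            freq = (counts[b][i] + pseudocount) / total
            column[b] = math.log2(freq / background[b])
        pwm.append(column)
    return pwm


def score_sequence(pwm, seq):
    """Sum the per-position log-odds for a sequence of the motif's length."""
    return sum(column[base] for column, base in zip(pwm, seq))


# Toy motif that strongly prefers "ACG" (invented for illustration).
toy_counts = {"A": [10, 0, 0], "C": [0, 10, 0], "G": [0, 0, 10], "T": [0, 0, 0]}
toy_pwm = pfm_to_log_odds(toy_counts)
print(score_sequence(toy_pwm, "ACG"), score_sequence(toy_pwm, "TTT"))
```

A smaller pseudocount penalises unobserved bases more heavily, which is one reason the default differs between tools.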
- MotifScanner.show_motif_databases(motif_dir=MOTIF_DIR)
Lists available motif databases in the specified directory.
- Parameters:
motif_dir (str (default: MOTIF_DIR)) – Directory containing motif databases. MOTIF_DIR is the data/motifs directory inside the installed scmagnify package.
- Return type:
pd.DataFrame
- Returns:
DataFrame listing available motif databases.