scmagnify.tools.MotifScanner


class scmagnify.tools.MotifScanner(motif_db=None, motif_objects=None, genome_file=None)

A class for scanning DNA sequences for motifs using position frequency matrices (PFMs).

Parameters:

motif_db (Optional[str] (default: None)) – Path to a motif database file. If a name without a file extension is provided, it is treated as a database name in the default motif directory (MOTIF_DIR) and loaded in PFM format.
motif_objects (Optional[list[PFM]] (default: None)) – List of PFM objects to initialize the scanner with.
genome_file (Optional[str] (default: None)) – Path to a genome FASTA file. If None, the genome path configured in scmagnify settings is used.

Methods table

`add_custom_motif`(matrix_id, name, counts_df)	Adds a custom motif to the collection from a pandas DataFrame.
`export_motifs`(output_path, format, **kwargs)	Exports the motifs stored in the scanner to a specified format.
`import_motifs`([motif_db, format, ...])	Imports motifs from a file, skipping duplicates and optionally filtering by factors.
`match`(data[, peak_selected, pseudocounts, ...])	Perform motif scanning on the selected peaks using stored motifs.
`show_motif_databases`([motif_dir])	Lists available motif databases in the specified directory.

Methods

MotifScanner.add_custom_motif(matrix_id, name, counts_df)

Adds a custom motif to the collection from a pandas DataFrame.

Parameters:

matrix_id (str) – A unique identifier for the new motif. Must not already exist.
name (str) – The name of the transcription factor or motif.
counts_df (DataFrame) – A DataFrame with columns ‘A’, ‘C’, ‘G’, ‘T’ representing the PFM.
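The counts_df argument is a per-position count table with columns A, C, G and T. The sketch below shows, under stated assumptions, the validation and normalization such a table implies, together with the uniqueness rule on matrix_id. The helpers counts_to_frequencies and add_custom are hypothetical, a plain dict of column lists stands in for the DataFrame, and storing frequencies rather than raw counts is a choice of this sketch, not necessarily what add_custom_motif does.

```python
# Hypothetical sketch (not scmagnify API): validate a counts table shaped like
# counts_df and register it under a unique matrix_id. A dict of column lists
# stands in for the pandas DataFrame so only the standard library is needed.
BASES = ("A", "C", "G", "T")


def counts_to_frequencies(counts: dict) -> list:
    """Return one {base: frequency} dict per motif position."""
    missing = [b for b in BASES if b not in counts]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    lengths = {len(counts[b]) for b in BASES}
    if len(lengths) != 1:
        raise ValueError("all columns must have the same number of positions")
    frequencies = []
    for i in range(lengths.pop()):
        row = {b: float(counts[b][i]) for b in BASES}
        total = sum(row.values())
        if total <= 0:
            raise ValueError(f"position {i} has no counts")
        frequencies.append({b: row[b] / total for b in BASES})
    return frequencies


def add_custom(collection: dict, matrix_id: str, name: str, counts: dict) -> None:
    """Store a motif, refusing IDs that are already present."""
    if matrix_id in collection:
        raise KeyError(f"motif {matrix_id!r} already exists")
    collection[matrix_id] = {"name": name, "frequencies": counts_to_frequencies(counts)}


counts = {"A": [10, 0, 2], "C": [0, 0, 2], "G": [0, 10, 2], "T": [0, 0, 4]}
motifs = {}
add_custom(motifs, "MY0001", "MYTF", counts)
print(motifs["MY0001"]["frequencies"][2])  # {'A': 0.2, 'C': 0.2, 'G': 0.2, 'T': 0.4}
```

Calling add_custom a second time with "MY0001" raises KeyError, mirroring the documented requirement that matrix_id must not already exist.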

Return type:

None

MotifScanner.export_motifs(output_path, format, **kwargs)

Exports the motifs stored in the scanner to a specified format.

Parameters:

output_path (str) – Path for the output file or directory.
format (Literal['meme', 'jaspar', 'pfm']) – The target motif format (‘meme’, ‘jaspar’, or ‘pfm’).
**kwargs – Additional arguments passed to the respective write function (e.g., pseudo_counts).
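For orientation, the sketch below writes count matrices in the minimal MEME text format: a header followed by one letter-probability matrix per motif. The helper to_meme_text is hypothetical, and its pseudo_counts treatment (add the value to every count, then renormalize each position) is an assumed convention; the writer behind export_motifs may differ.

```python
# Hypothetical sketch (not scmagnify API) of a minimal MEME-format writer.
# Each motif is a list of [A, C, G, T] count rows. The pseudo_counts handling
# (add to every cell, then renormalize the row) is an assumed convention.
def to_meme_text(motifs: dict, pseudo_counts: float = 0.0) -> str:
    lines = [
        "MEME version 4",
        "",
        "ALPHABET= ACGT",
        "",
        "strands: + -",
        "",
        "Background letter frequencies",
        "A 0.25 C 0.25 G 0.25 T 0.25",
        "",
    ]
    for motif_id, rows in motifs.items():
        lines.append(f"MOTIF {motif_id}")
        lines.append(f"letter-probability matrix: alength= 4 w= {len(rows)}")
        for row in rows:
            padded = [c + pseudo_counts for c in row]
            total = sum(padded)
            # One line per position: probabilities for A, C, G, T.
            lines.append(" ".join(f"{c / total:.6f}" for c in padded))
        lines.append("")
    return "\n".join(lines)


text = to_meme_text({"MY0001": [[10, 0, 0, 0], [0, 0, 10, 0]]}, pseudo_counts=1.0)
print(text)
```

With a pseudocount of 1, a position with counts [10, 0, 0, 0] becomes 11/14 for A and 1/14 for each other base, so no probability is exactly zero.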

Return type:

None

MotifScanner.import_motifs(motif_db='HOCOMOCOv11_HUMAN', format=None, factor_file=None, factor_list=None, target_organism=None)

Imports motifs from a file, skipping duplicates and optionally filtering by factors.

If motif_db is provided without an extension, it is treated as a database name from the default motif directory (MOTIF_DIR) and loaded in PFM format.
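The name-versus-path rule just described can be sketched as follows. resolve_motif_source is a hypothetical helper; how database files are actually laid out inside MOTIF_DIR is not documented here, so the returned location is only illustrative.

```python
import os


# Hypothetical sketch (not scmagnify API): a motif_db with no file extension
# is treated as a database name inside motif_dir and read as PFM; otherwise
# it is used as a file path with whatever format the caller passed.
def resolve_motif_source(motif_db, fmt=None, motif_dir="MOTIF_DIR"):
    _, ext = os.path.splitext(motif_db)
    if not ext:
        # Database name: look it up in the motif directory, default to 'pfm'.
        return os.path.join(motif_dir, motif_db), fmt or "pfm"
    # Explicit file path: keep the path and the caller's format unchanged.
    return motif_db, fmt


print(resolve_motif_source("HOCOMOCOv11_HUMAN", motif_dir="/data/motifs"))
print(resolve_motif_source("my_motifs.meme", "meme"))
```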

Parameters:

motif_db (str (default: 'HOCOMOCOv11_HUMAN')) – Path to the input motif file or a database name (e.g., “HOCOMOCOv11_HUMAN”).
format (Optional[Literal['meme', 'jaspar', 'pfm']] (default: None)) – The format of the input file (‘meme’, ‘jaspar’, or ‘pfm’). Defaults to ‘pfm’ when motif_db is a database name without a file extension.
factor_file (Optional[str] (default: None)) – Path to the motif-to-factors mapping file. Required for ‘pfm’ format.
factor_list (Optional[list[str]] (default: None)) – A list of factor names. If provided, only motifs associated with these factors will be imported.
target_organism (Optional[str] (default: None))
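A minimal sketch of the two import filters (skip duplicates, optionally keep only motifs linked to requested factors), assuming motifs are identified by ID and the motif-to-factors mapping has already been read into a dict of sets; the factor_file layout is not described here. filter_motifs is a hypothetical name, not part of scmagnify.

```python
# Hypothetical sketch (not scmagnify API): skip motif IDs already in the
# collection (or repeated in the input) and, when factor_list is given, keep
# only motifs mapped to at least one requested factor.
def filter_motifs(incoming, existing_ids, motif_to_factors, factor_list=None):
    wanted = set(factor_list) if factor_list is not None else None
    kept, seen = [], set(existing_ids)
    for motif_id in incoming:
        if motif_id in seen:
            continue  # duplicate: already stored or seen earlier in this input
        if wanted is not None and not (motif_to_factors.get(motif_id, set()) & wanted):
            continue  # not associated with any requested factor
        kept.append(motif_id)
        seen.add(motif_id)
    return kept


mapping = {"M1": {"GATA1"}, "M2": {"SPI1"}, "M3": {"GATA1", "TAL1"}}
print(filter_motifs(["M1", "M2", "M3", "M1"], {"M2"}, mapping, ["GATA1"]))  # ['M1', 'M3']
```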

MotifScanner.match(data, peak_selected=None, pseudocounts=0.0001, p_value=5e-05, background='even', threshold=0, modal='ATAC')

Perform motif scanning on the selected peaks using stored motifs.

Parameters:

data (AnnData | MuData) – AnnData object with peak counts or MuData object with ‘ATAC’ modality.
peak_selected (Optional[list[str]] (default: None)) – List of selected peaks. If None, uses all peaks in data.uns["peak_gene_corrs"]["filtered_corrs"].
pseudocounts (float (default: 0.0001)) – Pseudocount added to each nucleotide count, by default 0.0001. For comparison, other tools use: moods-dna.py 0.01, pychromVAR 0.0001, motifmatchr 0.8.
p_value (float (default: 5e-05)) – P-value threshold for motif matching, by default 5e-05
background (str (default: 'even')) – Background nucleotide distribution used to convert the p-value into a score threshold. Three options are available: “subject” uses the base composition of the scanned (subject) sequences, “genome” uses the whole genome (requires a genome file), and “even” uses 0.25 for each base. By default “even”.
threshold (float (default: 0)) – Score threshold for motif matches, by default 0
modal (str | None (default: 'ATAC')) – Name of the modality holding the peak data when data is a MuData object.
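The pseudocounts, background and p_value parameters fit together in the usual way for PWM scanning: counts plus pseudocounts become log-odds scores against the background, and the p-value is turned into the score that a random background word reaches with probability at most p_value. The sketch below assumes log2 odds and uses brute-force enumeration over all words of the motif's length, which only works for short motifs; dedicated tools such as MOODS use faster algorithms, and the “genome” background option is not reproduced. All function names are hypothetical.

```python
import itertools
import math

BASES = "ACGT"


# Hypothetical sketch (not scmagnify API) of background, log-odds and
# p-value-to-threshold computations for a position weight matrix.
def background_from(sequences=None):
    """Even background (0.25 per base) if sequences is None; otherwise the
    base composition of the given sequences, like the 'subject' option."""
    if sequences is None:
        return {b: 0.25 for b in BASES}
    counts = {b: 0 for b in BASES}
    for seq in sequences:
        for base in seq.upper():
            if base in counts:
                counts[base] += 1
    total = sum(counts.values())
    return {b: counts[b] / total for b in BASES}


def log_odds(pfm_rows, background, pseudocounts=1e-4):
    """pfm_rows holds one [A, C, G, T] count list per motif position.
    Each cell becomes log2(frequency with pseudocount / background)."""
    matrix = []
    for row in pfm_rows:
        total = sum(row) + 4 * pseudocounts
        matrix.append({b: math.log2((row[i] + pseudocounts) / total / background[b])
                       for i, b in enumerate(BASES)})
    return matrix


def threshold_for_p(matrix, background, p_value):
    """Smallest score t with P(score >= t) <= p_value for a random word drawn
    from the background. Enumerates all 4**w words, so only short motifs work."""
    mass = {}
    for word in itertools.product(BASES, repeat=len(matrix)):
        # Round so that floating-point ties are counted as one score level.
        score = round(sum(col[b] for col, b in zip(matrix, word)), 9)
        mass[score] = mass.get(score, 0.0) + math.prod(background[b] for b in word)
    tail, threshold = 0.0, math.inf  # inf means no score is rare enough
    for score in sorted(mass, reverse=True):
        tail += mass[score]
        if tail > p_value:
            break
        threshold = score
    return threshold


bg = background_from()  # the "even" option
pwm = log_odds([[10, 0, 0, 0], [0, 0, 10, 0]], bg)  # a toy "AG" motif
print(threshold_for_p(pwm, bg, p_value=0.0625))  # only the word "AG" qualifies
```

Under an even background each 2-mer has probability 1/16, so a p_value below 0.0625 admits no word at all for this toy motif, which the sketch reports as an infinite threshold.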

Return type:

AnnData | MuData

Returns:

Union[AnnData, MuData] – Updated AnnData or MuData object with motif scanning results, including motif_score (pd.DataFrame), a DataFrame of motif scanning results with the following columns:

seqname : str
Peak name.

motif_id : str
Motif ID.

score : float
Motif scanning score.
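Conceptually, each result row pairs a peak (seqname) with a motif (motif_id) and a score. The sketch below produces rows of that shape from literal sequences, scanning both strands with a log2-odds matrix against an even background and keeping, for each peak and motif, the best window whose score reaches threshold. Keeping only the best hit, and taking sequences directly instead of extracting them from genome_file, are assumptions of this sketch; the helper names are hypothetical.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


# Hypothetical sketch (not scmagnify API): produce seqname/motif_id/score rows
# by scanning literal peak sequences on both strands.
def log_odds(pfm_rows, pseudocounts=1e-4):
    """Log2-odds matrix against an even (0.25) background."""
    matrix = []
    for row in pfm_rows:
        total = sum(row) + 4 * pseudocounts
        matrix.append({b: math.log2((row[i] + pseudocounts) / total / 0.25)
                       for i, b in enumerate(BASES)})
    return matrix


def best_hit(seq, matrix):
    """Highest window score over both strands; windows with non-ACGT bases are skipped."""
    w = len(matrix)
    seq = seq.upper()
    best = -math.inf
    for strand in (seq, seq.translate(COMPLEMENT)[::-1]):
        for start in range(len(strand) - w + 1):
            window = strand[start:start + w]
            if set(window) <= set(BASES):
                best = max(best, sum(col[b] for col, b in zip(matrix, window)))
    return best


def scan(peaks, motifs, threshold=0.0):
    """peaks: {seqname: sequence}; motifs: {motif_id: list of [A, C, G, T] counts}."""
    rows = []
    for seqname, seq in peaks.items():
        for motif_id, pfm_rows in motifs.items():
            score = best_hit(seq, log_odds(pfm_rows))
            if score >= threshold:
                rows.append({"seqname": seqname, "motif_id": motif_id, "score": score})
    return rows


rows = scan({"peak_1": "TTAGTT", "peak_2": "CCCCCC"},
            {"MY0001": [[10, 0, 0, 0], [0, 0, 10, 0]]})
print(rows)
```

Here only peak_1 contains the toy "AG" site; peak_2 scores below the default threshold of 0 on both strands and produces no row.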

MotifScanner.show_motif_databases(motif_dir=MOTIF_DIR)

Lists available motif databases in the specified directory.

Parameters:

motif_dir (str (default: MOTIF_DIR)) – Directory containing motif databases. Defaults to MOTIF_DIR, the motif directory bundled with the installed scmagnify package (scmagnify/data/motifs).

Return type:

DataFrame

Returns:

pd.DataFrame listing the available motif databases.
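A standard-library sketch of listing a motif directory, assuming each database is a file whose extension names its format. list_motif_databases and the extension mapping are hypothetical; the real method returns a pandas DataFrame, which pandas.DataFrame(rows) could build from the list of dicts returned here.

```python
import os
import tempfile

# Hypothetical sketch (not scmagnify API): list database files in a directory
# as {name, format} rows, inferring the format from the file extension.
KNOWN_FORMATS = {".meme": "meme", ".jaspar": "jaspar", ".pfm": "pfm"}


def list_motif_databases(motif_dir):
    rows = []
    for entry in sorted(os.listdir(motif_dir)):
        name, ext = os.path.splitext(entry)
        fmt = KNOWN_FORMATS.get(ext.lower())
        if fmt is not None:  # ignore files that are not motif databases
            rows.append({"name": name, "format": fmt})
    return rows


with tempfile.TemporaryDirectory() as d:
    # Illustrative file names only; the real MOTIF_DIR layout may differ.
    for fname in ("HOCOMOCOv11_HUMAN.pfm", "custom.meme", "README.txt"):
        open(os.path.join(d, fname), "w").close()
    print(list_motif_databases(d))
```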
