scmagnify.tools.MotifScanner

class scmagnify.tools.MotifScanner(motif_db=None, motif_objects=None, genome_file=None)

A class for scanning DNA sequences for motifs using position frequency matrices (PFMs).

Parameters:
  • motif_db (Optional[str] (default: None)) – Path to a motif database file. If a name without a file extension is given, it is treated as a database name in the default motif directory (MOTIF_DIR) and loaded in PFM format.

  • motif_objects (Optional[list[PFM]] (default: None)) – List of PFM objects to initialize the scanner with.

  • genome_file (Optional[str] (default: None)) – Path to genome FASTA file. Defaults to path in scmagnify settings.
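A minimal usage sketch, assuming scmagnify is installed and that passing motif_db to the constructor loads that database (implied by the parameter description, but not stated outright). The helper name scan_with_defaults and its scanner_cls parameter are hypothetical; scanner_cls exists only so the sketch can be exercised with a stand-in class.

```python
def scan_with_defaults(data, genome_file=None, scanner_cls=None):
    """Build a scanner from a named database and scan the peaks in `data`."""
    if scanner_cls is None:
        # The real class; this import requires scmagnify to be installed.
        from scmagnify.tools import MotifScanner as scanner_cls

    # A name without an extension is treated as a database name in MOTIF_DIR.
    scanner = scanner_cls(motif_db="HOCOMOCOv11_HUMAN", genome_file=genome_file)

    # Keyword values mirror the documented defaults of match().
    return scanner.match(data, p_value=5e-05, background="even")
```

Leaving genome_file as None falls back to the path configured in the scmagnify settings.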

Methods table

add_custom_motif(matrix_id, name, counts_df)

Adds a custom motif to the collection from a pandas DataFrame.

export_motifs(output_path, format, **kwargs)

Exports the motifs stored in the scanner to a specified format.

import_motifs([motif_db, format, ...])

Imports motifs from a file, skipping duplicates and optionally filtering by factors.

match(data[, peak_selected, pseudocounts, ...])

Perform motif scanning on the selected peaks using stored motifs.

show_motif_databases([motif_dir])

Lists available motif databases in the specified directory.

Methods

MotifScanner.add_custom_motif(matrix_id, name, counts_df)

Adds a custom motif to the collection from a pandas DataFrame.

Parameters:
  • matrix_id (str) – A unique identifier for the new motif. Must not already exist.

  • name (str) – The name of the transcription factor or motif.

  • counts_df (DataFrame) – A DataFrame with columns ‘A’, ‘C’, ‘G’, ‘T’ representing the PFM.

Return type:

None
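The counts table can be sketched as follows. This assumes each row of counts_df is one motif position, which the column layout suggests but the description does not state. TOY_COUNTS, motif_length, add_toy_motif, the ID "CUSTOM_0001" and the name "ToyTF" are all hypothetical.

```python
# Toy counts for a 4-position motif: one list per base, one entry per position.
TOY_COUNTS = {
    "A": [10, 0, 0, 2],
    "C": [0, 12, 1, 3],
    "G": [1, 0, 11, 4],
    "T": [1, 0, 0, 3],
}


def motif_length(counts):
    """Check the table has exactly the columns A, C, G, T and return its length."""
    if set(counts) != {"A", "C", "G", "T"}:
        raise ValueError("counts must have exactly the columns A, C, G, T")
    lengths = {len(column) for column in counts.values()}
    if len(lengths) != 1:
        raise ValueError("all columns must have the same number of positions")
    return lengths.pop()


def add_toy_motif(scanner, matrix_id="CUSTOM_0001", name="ToyTF"):
    """Register the toy motif on an existing MotifScanner instance."""
    import pandas as pd  # imported lazily; only needed to build the DataFrame

    motif_length(TOY_COUNTS)  # fail early on a malformed table
    scanner.add_custom_motif(matrix_id, name, pd.DataFrame(TOY_COUNTS))
```

Because matrix_id must not already exist, calling add_toy_motif twice with the same ID on one scanner should be expected to fail.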

MotifScanner.export_motifs(output_path, format, **kwargs)

Exports the motifs stored in the scanner to a specified format.

Parameters:
  • output_path (str) – Path for the output file or directory.

  • format (Literal['meme', 'jaspar', 'pfm']) – The target motif format (‘meme’, ‘jaspar’, or ‘pfm’).

  • **kwargs – Additional arguments passed to the respective write function (e.g., pseudo_counts).

Return type:

None
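A short sketch of exporting the same motif collection in each documented format. The helper name export_all_formats and the output file names are placeholders; whether a given format expects a file or a directory at output_path is not specified here, so adjust the paths as needed.

```python
import os


def export_all_formats(scanner, out_dir):
    """Export the scanner's motifs once per documented format; return the paths used."""
    paths = []
    for fmt in ("meme", "jaspar", "pfm"):
        # Placeholder naming scheme, e.g. <out_dir>/motifs.meme
        path = os.path.join(out_dir, f"motifs.{fmt}")
        scanner.export_motifs(path, format=fmt)
        paths.append(path)
    return paths
```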

MotifScanner.import_motifs(motif_db='HOCOMOCOv11_HUMAN', format=None, factor_file=None, factor_list=None, target_organism=None)

Imports motifs from a file, skipping duplicates and optionally filtering by factors.

If motif_db is provided without an extension, it is treated as a database name from the default motif directory (MOTIF_DIR) and loaded in PFM format.

Parameters:
  • motif_db (str (default: 'HOCOMOCOv11_HUMAN')) – Path to the input motif file or a database name (e.g., “HOCOMOCOv11_HUMAN”).

  • format (Optional[Literal['meme', 'jaspar', 'pfm']] (default: None)) – The format of the input file (‘meme’, ‘jaspar’, or ‘pfm’). Defaults to ‘pfm’ if motif_db is a database name (no file extension).

  • factor_file (Optional[str] (default: None)) – Path to the motif-to-factors mapping file. Required for ‘pfm’ format.

  • factor_list (Optional[list[str]] (default: None)) – A list of factor names. If provided, only motifs associated with these factors will be imported.

  • target_organism (Optional[str] (default: None))
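The name-versus-path rule and two typical calls can be sketched as below. is_database_name is one plausible reading of "provided without an extension", not the library's own check. The file my_motifs.meme and the factor names are hypothetical, and the factor names are assumed to match those in the motif-to-factors mapping.

```python
import os


def is_database_name(motif_db):
    """Return True when motif_db has no file extension (the documented rule)."""
    return os.path.splitext(motif_db)[1] == ""


def import_examples(scanner):
    """Import from a named database, then from an explicit file."""
    # No extension: looked up in MOTIF_DIR and read as PFM.
    # factor_list restricts the import to motifs tied to these factors.
    scanner.import_motifs("HOCOMOCOv11_HUMAN", factor_list=["GATA1", "SPI1"])

    # Explicit file path: the format is stated rather than inferred.
    scanner.import_motifs("my_motifs.meme", format="meme")
```

Motifs already present in the scanner are skipped as duplicates, so repeating an import should not add second copies.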

MotifScanner.match(data, peak_selected=None, pseudocounts=0.0001, p_value=5e-05, background='even', threshold=0, modal='ATAC')

Perform motif scanning on the selected peaks using stored motifs.

Parameters:
  • data (AnnData | MuData) – AnnData object with peak counts or MuData object with ‘ATAC’ modality.

  • peak_selected (Optional[list[str]] (default: None)) – List of selected peaks. If None, uses all peaks in data.uns["peak_gene_corrs"]["filtered_corrs"].

  • pseudocounts (float (default: 0.0001)) – Pseudocount added to each nucleotide. For comparison, the defaults used by other tools are 0.01 (moods-dna.py), 0.0001 (pychromVAR), and 0.8 (motifmatchr).

  • p_value (float (default: 5e-05)) – P-value threshold for motif matching.

  • background (str (default: 'even')) – Background nucleotide distribution used to compute score thresholds from the p-value. Three options are available: “subject” uses the subject sequences, “genome” uses the whole genome (requires a genome file), and “even” uses 0.25 for each base.

  • threshold (float (default: 0)) – Score threshold for motif matches.

  • modal (str | None (default: 'ATAC'))

Return type:

AnnData | MuData

Returns:

Union[AnnData, MuData] – Updated AnnData or MuData object with motif scanning results.

motif_score (pd.DataFrame) – DataFrame containing motif scanning results. Columns:

  • seqname (str) – Peak name.

  • motif_id (str) – Motif ID.

  • score (float) – Motif scanning score.
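To make the pseudocounts and background parameters concrete, here is a generic log-odds scoring sketch in plain Python. It is a textbook formulation and not scmagnify's scanning code; the function names are hypothetical, and sequences are assumed to contain only uppercase A, C, G, and T.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=0.0001, background=None):
    """Turn per-base count lists into one log2-odds dict per motif position."""
    if background is None:
        # The "even" background: 0.25 for each base.
        background = {base: 0.25 for base in BASES}
    matrix = []
    for i in range(len(counts["A"])):
        # The pseudocount keeps zero counts from producing log(0).
        total = sum(counts[base][i] for base in BASES) + 4 * pseudocount
        matrix.append({
            base: math.log2(((counts[base][i] + pseudocount) / total) / background[base])
            for base in BASES
        })
    return matrix


def best_window_score(sequence, matrix):
    """Return the highest summed log-odds score over all windows of motif width."""
    width = len(matrix)
    return max(
        sum(matrix[j][sequence[i + j]] for j in range(width))
        for i in range(len(sequence) - width + 1)
    )
```

A smaller pseudocount penalizes unobserved bases more heavily, which is one reason the defaults of different tools are not interchangeable.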

MotifScanner.show_motif_databases(motif_dir=MOTIF_DIR)

Lists available motif databases in the specified directory.

Parameters:

motif_dir (str (default: MOTIF_DIR)) – Directory containing motif databases. MOTIF_DIR is the data/motifs directory inside the installed scmagnify package.

Return type:

DataFrame

Returns:

pd.DataFrame – DataFrame listing available motif databases.