Misha Basics (Short Guide)

This page gives a compact mental model for misha. Read it first, before the full Manual vignette.

The Core Idea

Most analyses follow the same pattern:

  1. Choose where to look (intervals / scope).
  2. Choose how to walk through it (iterator).
  3. Evaluate a track expression over those iterator intervals.

In misha this is usually one call to gextract, gscreen, or gsummary.

You are not limited to raw track names. You can pass full R expressions, for example log(dense_track + 1), dense_track / (chip.sum + 1e-6), or pmin(dense_track, 2). Here chip.sum is a virtual track, which must be created before use (see Virtual Track below).

All examples below assume the bundled examples database:

library(misha)
gdb.init_examples()

Four Concepts You Need First

1) Track

A track is a genomic signal stored along genome coordinates.

Useful starter commands:

gtrack.ls() # list tracks in the examples DB
gtrack.info("dense_track") # inspect type/metadata
gtrack.info("sparse_track")

For intuition, you can think of dense_track as a ChIP-seq-like coverage signal.

2) Intervals

An interval set defines genomic regions (chrom, start, end) where you want to work.

regions <- gintervals(1, c(0, 250000), c(100000, 260000))
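
To make the shape of an interval set concrete, here is a base-R sketch that builds a stand-in for regions by hand and computes how many base pairs it covers. It assumes that gintervals() returns a data frame with chrom, start and end columns, that chromosome names carry a "chr" prefix, and that coordinates are half-open [start, end). The name regions_df is illustrative only.

```r
# Hand-built stand-in for `regions` (assumption: gintervals() returns a data
# frame with chrom/start/end columns and "chr"-prefixed chromosome names).
regions_df <- data.frame(
    chrom = c("chr1", "chr1"),
    start = c(0, 250000),
    end   = c(100000, 260000),
    stringsAsFactors = FALSE
)

# Assuming half-open [start, end) coordinates, width is simply end - start.
regions_df$width <- regions_df$end - regions_df$start
total_bp <- sum(regions_df$width)

print(regions_df)
print(total_bp)
```

Under these assumptions the two regions span 100000 bp and 10000 bp.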

3) Iterator

The iterator is the stepping policy inside the scope.

Think of it as: scope says where, iterator says in what chunks.

out <- gextract("dense_track", regions, iterator = 100)
log_out <- gextract("log(dense_track + 1)", regions, iterator = 100)
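
The "in what chunks" idea can be sketched in base R without misha. The helper below, fixed_bins, is hypothetical and is not misha's implementation. It assumes that a numeric iterator cuts each scope interval into bins aligned to multiples of the bin size and clipped to the scope boundaries.

```r
# Hypothetical helper (not part of misha): split one half-open [start, end)
# interval into fixed-size bins aligned to multiples of `binsize`,
# clipped to the interval boundaries. Assumes start < end.
fixed_bins <- function(start, end, binsize) {
    first <- floor(start / binsize) * binsize
    bin_starts <- seq(first, end - 1, by = binsize)
    data.frame(
        start = pmax(bin_starts, start),
        end   = pmin(bin_starts + binsize, end)
    )
}

# A scope that is not aligned to the bin size: the edge bins get clipped.
bins <- fixed_bins(250, 620, 100)
print(bins)

# The two example regions are aligned to 100 bp, so no clipping occurs.
n_rows <- nrow(fixed_bins(0, 100000, 100)) + nrow(fixed_bins(250000, 260000, 100))
print(n_rows)
```

For the regions used above, both intervals start and end on multiples of 100, so the alignment assumption does not affect the bin count.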

Create and use an intervals set as an iterator:

gintervals.save("my_intervals_set", regions) # first argument: name of the new set, second: the intervals
out2 <- gextract("dense_track", gintervals.all(), iterator = "my_intervals_set")

4) Virtual Track

A virtual track is a named on-the-fly transformation, not stored as a physical track file.

Examples:

gvtrack.create("chip.sum", "dense_track", "sum")
out <- gextract("chip.sum", regions, iterator = 200)

You can also shift the iterator window used by the virtual track:

gvtrack.create("chip.shifted", "dense_track", "sum")
gvtrack.iterator("chip.shifted", sshift = -100, eshift = 100)
out <- gextract("chip.shifted", regions, iterator = 200)

Here, each iterator interval is expanded by 100 bp on both sides before evaluating dense_track.
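
The shift arithmetic can be checked by hand in base R. This sketch uses three illustrative 200 bp iterator intervals and applies the same sshift and eshift values. Clamping the window start at 0 is an assumption of the sketch about how chromosome boundaries are handled, not a documented guarantee.

```r
# Illustrative iterator intervals: a 200 bp iterator over the first 600 bp.
iter <- data.frame(start = c(0, 200, 400), end = c(200, 400, 600))

sshift <- -100  # moves the window start 100 bp upstream
eshift <- 100   # moves the window end 100 bp downstream

# Window assumed to be read by the virtual track for each iterator interval.
# Clamping the start at 0 is an assumption of this sketch.
window <- data.frame(
    start = pmax(iter$start + sshift, 0),
    end   = iter$end + eshift
)
window$width <- window$end - window$start
print(window)
```

The expanded windows of neighbouring iterator intervals overlap, so the same bases can contribute to the sum for more than one output row.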

Virtual tracks exist only in the current R session. List them with gvtrack.ls() and delete them with gvtrack.rm().

Minimal Workflow

library(misha)
gdb.init_examples()

# 1) pick scope
regions <- gintervals(1, 0, 50000)

# 2) inspect available tracks
print(gtrack.ls())

# 3) extract signal with a chosen iterator
chip <- gextract("dense_track", regions, iterator = 100)

# 4) screen high-signal bins (as a simple peak-like filter)
hi_chip <- gscreen("dense_track > 0.6", regions, iterator = 100)

# 5) summarize distribution/coverage
stats <- gsummary("dense_track", regions, iterator = 100)

PWM in One Minute

A PWM/PSSM is a motif model over A/C/G/T. In misha, a common pattern is:

  1. Extract sequence from intervals.
  2. Score those sequences with a PWM.

regions <- gintervals(1, c(1000, 2000), c(1020, 2020))
seqs <- gseq.extract(regions)

pssm <- matrix(c(
    0.80, 0.05, 0.10, 0.05,
    0.10, 0.10, 0.70, 0.10,
    0.05, 0.80, 0.05, 0.10,
    0.10, 0.10, 0.10, 0.70
), ncol = 4, byrow = TRUE)
colnames(pssm) <- c("A", "C", "G", "T")

scores <- gseq.pwm(seqs, pssm, mode = "lse")
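
For intuition about what a log-sum-exp score is, the base-R sketch below scores one sequence against the same matrix. The helper pwm_lse is hypothetical and is not misha's code. It assumes forward-strand scoring only, no prior or pseudocount, and rows of the matrix as motif positions, so its numbers need not match gseq.pwm exactly.

```r
# Same motif matrix as above: rows are motif positions, columns are bases.
pssm <- matrix(c(
    0.80, 0.05, 0.10, 0.05,
    0.10, 0.10, 0.70, 0.10,
    0.05, 0.80, 0.05, 0.10,
    0.10, 0.10, 0.10, 0.70
), ncol = 4, byrow = TRUE)
colnames(pssm) <- c("A", "C", "G", "T")

# Hypothetical helper (not part of misha): log-sum-exp PWM score of one
# sequence, forward strand only, no prior. Returns NA for sequences that are
# shorter than the motif or contain characters other than A/C/G/T.
pwm_lse <- function(dna, pssm) {
    bases <- strsplit(toupper(dna), "")[[1]]
    w <- nrow(pssm)
    n_windows <- length(bases) - w + 1
    if (n_windows < 1) return(NA_real_)
    log_p <- log(pssm)
    # Log-likelihood of the motif at each possible start position.
    ll <- vapply(seq_len(n_windows), function(i) {
        window <- bases[i:(i + w - 1)]
        sum(log_p[cbind(seq_len(w), match(window, colnames(pssm)))])
    }, numeric(1))
    # Numerically stable log(sum(exp(ll))).
    max(ll) + log(sum(exp(ll - max(ll))))
}

score_exact <- pwm_lse("AGCT", pssm)   # one window, the best match
score_longer <- pwm_lse("AGCTA", pssm) # two windows combined
print(c(score_exact, score_longer))
```

Because log-sum-exp adds up the likelihood over every start position, a longer sequence containing the same best match scores slightly higher than the match alone.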

If your database has motif files under pssms/, you can create a genome-wide PWM-energy track with gtrack.create_pwm_energy(...).