Getting Started with ankiR

ankiR provides a tidy interface for reading Anki flashcard databases in R. This vignette shows common workflows for analyzing your Anki learning data.

Installation

# From CRAN
install.packages("ankiR")

# Or from GitHub for the development version
remotes::install_github("chrislongros/ankiR")

Opening a Collection

ankiR can automatically detect your Anki installation:

library(ankiR)

# Auto-detect (uses first profile found)
col <- anki_collection()

# Specify a profile
col <- anki_collection(profile = "User 1")

# Or provide a path directly
col <- anki_collection(path = "/path/to/collection.anki2")

The collection object provides methods to access different data:

notes <- col$notes()
cards <- col$cards()
reviews <- col$revlog()
decks <- col$decks()
models <- col$models()

# Always close when done
col$close()
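
If the code between opening and closing can fail, the close call may never run. One way to guard against that is a small wrapper built on `on.exit()`. The sketch below is illustrative only: `with_collection()` is a hypothetical helper, not part of ankiR, and `mock_open()` is a stand-in so the pattern can run without an Anki installation.

```r
# Hypothetical helper (not part of ankiR): run `f` on an open collection
# and close the collection afterwards, even if `f` throws an error.
with_collection <- function(open, f) {
  col <- open()
  on.exit(col$close(), add = TRUE)
  f(col)
}

# Stand-in for anki_collection(), used only to demonstrate the pattern
closed <- FALSE
mock_open <- function() {
  list(
    notes = function() data.frame(nid = 1:2),
    close = function() closed <<- TRUE
  )
}

result <- with_collection(mock_open, function(col) col$notes())
closed
#> TRUE
```

With a real collection the call would look like `with_collection(anki_collection, function(col) col$notes())`.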

Convenience Functions

For one-off queries, use the standalone functions. They handle connection cleanup automatically:

# Each call opens the collection, runs one query, and closes it again
notes <- anki_notes()
cards <- anki_cards()
reviews <- anki_revlog()
decks <- anki_decks()
models <- anki_models()

Understanding the Data

Notes

Notes contain the actual content of your flashcards:

notes <- anki_notes()
# nid: Note ID
# mid: Model (note type) ID
# tags: Space-separated tags
# flds: Fields separated by \x1f character
# sfld: Sort field (usually the front)
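
Because `flds` and `tags` pack several values into one string, they usually need splitting before analysis. The sketch below uses a small hand-made data frame (`notes_demo`, a hypothetical stand-in for the output of `anki_notes()`) and base R only.

```r
# Hypothetical stand-in for anki_notes(); real output has more columns
notes_demo <- data.frame(
  nid  = c(1, 2),
  tags = c(" anatomy heart ", ""),
  flds = c("Front text\x1fBack text", "Question\x1fAnswer\x1fExtra"),
  stringsAsFactors = FALSE
)

# Split fields on the \x1f (unit separator) character
fields <- strsplit(notes_demo$flds, "\x1f", fixed = TRUE)

# First field of every note
front <- vapply(fields, `[`, character(1), 1)

# Split tags on whitespace; an untagged note yields character(0)
tags <- strsplit(trimws(notes_demo$tags), "[[:space:]]+")

front
#> "Front text" "Question"
```

Note that `strsplit()` drops a trailing empty field, so a note whose last field is blank returns one element fewer than its note type defines.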

Cards

Cards are generated from notes. One note can produce multiple cards:

cards <- anki_cards()
# cid: Card ID
# nid: Note ID (links to notes table)
# did: Deck ID
# type: 0=new, 1=learning, 2=review, 3=relearning
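# queue: -1=suspended, 0=new, 1=learning, 2=review (also -3/-2=buried, 3=day learning)
# due: Position for new cards, day number for review cards, Unix timestamp (seconds) for learning cards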
# ivl: Current interval in days
# reps: Number of reviews
# lapses: Number of times forgotten
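
The integer codes in `type` are easier to read as labels. A minimal sketch, assuming the four codes listed above and using a hypothetical `cards_demo` data frame in place of real `anki_cards()` output:

```r
# Hypothetical stand-in for anki_cards()
cards_demo <- data.frame(
  cid  = c(1, 2, 3, 4),
  type = c(0L, 2L, 2L, 3L),
  ivl  = c(0L, 5L, 40L, 1L)
)

# Codes start at 0, R indexing starts at 1
type_labels <- c("new", "learning", "review", "relearning")
cards_demo$type_label <- factor(
  type_labels[cards_demo$type + 1L],
  levels = type_labels
)

# Count cards per state, including states with no cards
table(cards_demo$type_label)
```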

Decks

decks <- anki_decks()
# did: Deck ID
# name: Deck name (includes parent::child hierarchy)
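
Since the hierarchy is encoded in the name, splitting on `::` recovers the top-level deck and the nesting depth. The deck names below are made up for illustration.

```r
# Hypothetical deck names in parent::child form
deck_names <- c("Medicine", "Medicine::Cardiology", "Medicine::Cardiology::ECG")

parts <- strsplit(deck_names, "::", fixed = TRUE)

top_level <- vapply(parts, function(p) p[1], character(1))
leaf      <- vapply(parts, function(p) p[length(p)], character(1))
depth     <- lengths(parts)

data.frame(name = deck_names, top_level, leaf, depth)
```

Grouping by `top_level` is a simple way to aggregate statistics over a deck and all of its subdecks.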

Review Log

Every review is recorded:

reviews <- anki_revlog()
# rid: Review ID (timestamp in milliseconds)
# cid: Card ID
# ease: Button pressed (1=Again, 2=Hard, 3=Good, 4=Easy)
# ivl: Interval after review
# time: Time taken in milliseconds
# review_date: Date of review
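
If you need the exact time of a review rather than just its date, the millisecond timestamp in `rid` can be converted directly. The sketch below uses a hypothetical `revlog_demo` data frame with invented values and assumes UTC; substitute your own time zone as needed.

```r
# Hypothetical stand-in for anki_revlog()
revlog_demo <- data.frame(
  rid  = c(1700000000000, 1700000060000),
  ease = c(3L, 1L),
  time = c(4200L, 15000L)
)

# rid is milliseconds since the Unix epoch
revlog_demo$reviewed_at <- as.POSIXct(
  revlog_demo$rid / 1000,
  origin = "1970-01-01", tz = "UTC"
)

# Answer time in seconds, and the button as a label
revlog_demo$seconds <- revlog_demo$time / 1000
revlog_demo$button  <- c("Again", "Hard", "Good", "Easy")[revlog_demo$ease]

format(revlog_demo$reviewed_at[1], "%Y-%m-%d")
#> "2023-11-14"
```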

Working with FSRS

If you use FSRS (Free Spaced Repetition Scheduler), ankiR can extract the memory state parameters:

cards_fsrs <- anki_cards_fsrs()

# Additional columns:
# stability: Time in days for recall probability to drop to 90%
# difficulty: How hard the card is (1-10)
# desired_retention: Target recall probability
# decay: FSRS-6 decay parameter (w20)

Calculating Retrievability

Retrievability is the probability you’ll recall a card right now:

# For a card with 30-day stability, reviewed 15 days ago
fsrs_retrievability(stability = 30, days_elapsed = 15)
#> 0.946

# Using the per-card decay from FSRS-6
fsrs_retrievability(stability = 30, days_elapsed = 15, decay = 0.3)
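
These values follow the FSRS power forgetting curve. The sketch below is a standalone re-implementation for illustration, not ankiR's source code. The name `retrievability_sketch` is hypothetical, and the default `decay = 0.5` (the fixed value in FSRS-4.5 and FSRS-5) is an assumption, chosen because it reproduces the 0.946 shown above.

```r
# Illustrative sketch of the FSRS forgetting curve (not ankiR's code).
# `decay` is given as a positive number; `factor` is chosen so that
# retrievability is exactly 0.9 when days_elapsed equals stability.
retrievability_sketch <- function(stability, days_elapsed, decay = 0.5) {
  factor <- 0.9^(-1 / decay) - 1
  (1 + factor * days_elapsed / stability)^(-decay)
}

round(retrievability_sketch(30, 15), 3)
#> 0.946

# At days_elapsed == stability the result is 0.9 for any decay
retrievability_sketch(30, 30, decay = 0.3)
```

A smaller decay flattens the tail of the curve, so cards reviewed long after their stability has elapsed are predicted to be recalled more often.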

Calculating Optimal Intervals

# When should I review for 90% retention?
fsrs_interval(stability = 30, desired_retention = 0.9)
#> 30

# For 85% retention (longer interval, fewer reviews, more forgetting)
fsrs_interval(stability = 30, desired_retention = 0.85)
#> 49.1
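
The interval is the forgetting curve solved for time. Assuming the FSRS curve R = (1 + f * t / S)^(-d) with f = 0.9^(-1/d) - 1, the sketch below inverts it. The name `interval_sketch` is hypothetical and the default `decay = 0.5` is an assumption; this is not ankiR's implementation.

```r
# Illustrative inverse of the FSRS forgetting curve (not ankiR's code)
interval_sketch <- function(stability, desired_retention, decay = 0.5) {
  factor <- 0.9^(-1 / decay) - 1
  stability / factor * (desired_retention^(-1 / decay) - 1)
}

# At 90% retention the interval equals the stability
interval_sketch(30, 0.90)
#> 30

# A lower target lets the card wait longer under this assumed curve
round(interval_sketch(30, 0.85), 1)
#> 49.1
```

Under this curve, lowering the target retention always lengthens the interval, which trades fewer reviews for a higher chance of forgetting.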

Example Analysis: Review Patterns

library(ankiR)
library(dplyr)
library(ggplot2)

# Get data
reviews <- anki_revlog()
cards <- anki_cards()
decks <- anki_decks()

# Daily review count
daily_reviews <- reviews |>
  count(review_date, name = "reviews")

ggplot(daily_reviews, aes(review_date, reviews)) +
  geom_col(fill = "steelblue") +
  labs(title = "Daily Reviews", x = NULL, y = "Reviews") +
  theme_minimal()

# Card maturity by deck
cards |>
  left_join(decks, by = "did") |>
  filter(type == 2) |>  # Review cards only
  group_by(name) |>
  summarise(
    cards = n(),
    avg_interval = mean(ivl),
    mature = sum(ivl >= 21),  # Cards with 21+ day interval
    .groups = "drop"
  ) |>
  arrange(desc(cards))

Example: FSRS Memory Analysis

library(ankiR)
library(dplyr)
library(ggplot2)

cards_fsrs <- anki_cards_fsrs()

# Distribution of stability values
cards_fsrs |>
  filter(!is.na(stability), stability > 0) |>
  ggplot(aes(stability)) +
  geom_histogram(bins = 50, fill = "steelblue") +
  scale_x_log10() +
  labs(
    title = "Distribution of Card Stability",
    x = "Stability (days, log scale)",
    y = "Count"
  ) +
  theme_minimal()

# Difficulty vs Stability
cards_fsrs |>
  filter(!is.na(stability), !is.na(difficulty)) |>
  ggplot(aes(difficulty, stability)) +
  geom_point(alpha = 0.3) +
  scale_y_log10() +
  labs(
    title = "Card Difficulty vs Stability",
    x = "Difficulty (1-10)",
    y = "Stability (days, log scale)"
  ) +
  theme_minimal()

Tips

  1. Close connections: Always call col$close() when using anki_collection() directly, or use the convenience functions which handle this automatically.

  2. Anki must be closed: The database is locked while Anki is running. Close Anki before reading the database.

  3. Back up first: While ankiR only reads data (never writes), it’s good practice to back up your collection before any analysis.

  4. Large collections: For very large collections, consider using SQL queries directly via DBI::dbGetQuery(col$con, "SELECT ...") for better performance.