| Title: | Dump 'R' Package Source, Documentation, and Vignettes into One File |
| Version: | 0.3.0 |
| Description: | Dump source code, documentation and vignettes of an 'R' package into a single file. Supports installed packages, tar.gz archives, and package source directories. If the package is not installed, only its source is automatically downloaded from CRAN for processing. The output is a single plain text file or a character vector, which is useful to ingest complete package documentation and source into a large language model (LLM) or pass it further to other tools, such as 'ragnar' https://github.com/tidyverse/ragnar to create a Retrieval-Augmented Generation (RAG) workflow. |
| License: | MIT + file LICENSE |
| URL: | https://github.com/e-kotov/rdocdump, https://www.ekotov.pro/rdocdump/ |
| BugReports: | https://github.com/e-kotov/rdocdump/issues |
| Suggests: | curl, pak, quarto, testthat (≥ 3.0.0), withr |
| VignetteBuilder: | quarto |
| Config/testthat/edition: | 3 |
| Encoding: | UTF-8 |
| Language: | en |
| RoxygenNote: | 7.3.3 |
| NeedsCompilation: | no |
| Packaged: | 2026-05-30 16:57:41 UTC; ek |
| Author: | Egor Kotov |
| Maintainer: | Egor Kotov <kotov.egor@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2026-05-30 17:10:02 UTC |
Throw an error if the package is a pre-built binary
Description
Throw an error if the package is a pre-built binary
Usage
check_if_binary(pkg_dir)
Arguments
pkg_dir |
Path to the package directory. |
Cleanup Temporary Files
Description
Clean up temporary package archive and extracted files according to a keep_files policy.
Usage
cleanup_files(pkg_info, keep_files)
Arguments
pkg_info |
A list returned by |
keep_files |
A
|
Value
Invisibly returns NULL. If any file cannot be deleted, a warning is
issued.
Combine Rd files into a single character vector.
Description
This function reads the Rd files from a package source directory or an installed package and combines them into a single string.
Usage
combine_rd(pkg_path, is_installed = FALSE, pkg_name = NULL)
Arguments
pkg_path |
Path to the package source directory or the installed package. |
is_installed |
Logical indicating whether the package is installed
( |
pkg_name |
Optional package name if the package is installed. |
Value
A single string containing the combined Rd documentation.
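For an installed package, combining Rd files into one string can be approximated with base R's tools package. The sketch below is a hypothetical helper (`combine_rd_installed_sketch()` is not part of rdocdump, and rdocdump's actual conversion may differ). It reads the package's Rd database with `tools::Rd_db()` and renders each topic to plain text with `tools::Rd2txt()`.

```r
# Hypothetical sketch, not rdocdump's implementation: render every Rd topic
# of an installed package to plain text and join the results.
combine_rd_installed_sketch <- function(pkg_name) {
  # Named list of parsed Rd objects, one per help topic
  db <- tools::Rd_db(pkg_name)
  pieces <- vapply(names(db), function(topic) {
    # Rd2txt() writes to stdout when out = ""; capture it as lines
    lines <- utils::capture.output(
      tools::Rd2txt(db[[topic]], options = list(underline_titles = FALSE))
    )
    paste(lines, collapse = "\n")
  }, character(1))
  paste(pieces, collapse = "\n\n")
}

txt <- combine_rd_installed_sketch("splines")
cat(substr(txt, 1, 500))
```

Setting `underline_titles = FALSE` avoids backspace overstrike characters in section titles, which would otherwise clutter text meant for an LLM.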
Helper function to combine package vignettes
Description
Helper function to combine package vignettes
Usage
combine_vignettes(pkg_path)
Arguments
pkg_path |
Path to the package source directory. |
Value
A single string containing the combined vignettes from the package.
Extract code from an installed package using its namespace.
Description
This function retrieves all functions from the package namespace and
deparses them to get their source code. Note that extracting from an
installed package silently skips S4 classes, R6 classes, environment
objects, and datasets, because it keeps only objects for which
is.function() returns TRUE. For more complete code extraction, prefer
extracting from source packages.
Usage
extract_code_installed(pkg_name)
Arguments
pkg_name |
The name of the installed package. |
Value
A single string containing the source code of all functions in the package.
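The namespace-based approach described above can be sketched in base R. `extract_functions_sketch()` is a hypothetical helper, not rdocdump's code. It lists every object in the namespace, keeps those for which `is.function()` is TRUE, and deparses each one, so non-function objects such as environments are dropped exactly as the description warns.

```r
# Hypothetical sketch of namespace-based extraction.
extract_functions_sketch <- function(pkg_name) {
  ns <- asNamespace(pkg_name)
  pieces <- character(0)
  for (nm in ls(ns, all.names = TRUE)) {
    obj <- get(nm, envir = ns)
    # Non-functions (environments, data, metadata) are silently skipped
    if (is.function(obj)) {
      src <- paste(deparse(obj), collapse = "\n")
      pieces <- c(pieces, paste0(nm, " <- ", src))
    }
  }
  paste(pieces, collapse = "\n\n")
}

code <- extract_functions_sketch("splines")
cat(substr(code, 1, 300))
```

Deparsed code loses original comments and formatting, which is one reason source packages give more complete output.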
Helper function to extract code from package source files.
Description
This function reads all .R files in the R directory and optionally
includes files from the tests directory. By default it drops roxygen2
documentation comments; set include_roxygen = TRUE to keep them.
Usage
extract_code_source(pkg_path, include_tests = FALSE, include_roxygen = FALSE)
Arguments
pkg_path |
Path to the package source directory. |
include_tests |
|
include_roxygen |
|
Value
A single string containing the source code from the package's R files.
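A minimal sketch of reading source files this way is shown below. `read_r_sources_sketch()` is hypothetical, and two details are assumptions rather than documented rdocdump behaviour: lines starting with optional whitespace followed by `#'` are treated as roxygen2 comments, and each file is prefixed with a `# File:` header.

```r
# Hypothetical sketch, not rdocdump's implementation.
read_r_sources_sketch <- function(pkg_path, include_tests = FALSE,
                                  include_roxygen = FALSE) {
  dirs <- file.path(pkg_path, "R")
  if (include_tests) dirs <- c(dirs, file.path(pkg_path, "tests"))
  dirs <- dirs[dir.exists(dirs)]
  # Recursive so that e.g. tests/testthat/*.R is also picked up
  files <- unlist(lapply(dirs, list.files, pattern = "\\.[Rr]$",
                         full.names = TRUE, recursive = TRUE))
  if (length(files) == 0) return("")
  chunks <- vapply(files, function(f) {
    lines <- readLines(f, warn = FALSE)
    if (!include_roxygen) {
      # Assumption: roxygen2 comments are lines starting with #'
      lines <- lines[!grepl("^[[:space:]]*#'", lines)]
    }
    paste(c(paste0("# File: ", basename(f)), lines), collapse = "\n")
  }, character(1))
  paste(chunks, collapse = "\n\n")
}

# Build a tiny throwaway package to try it on
pkg <- file.path(tempdir(), "demo_pkg")
dir.create(file.path(pkg, "R"), recursive = TRUE, showWarnings = FALSE)
writeLines(
  c("#' Add one", "#' @export", "add_one <- function(x) x + 1"),
  file.path(pkg, "R", "add.R")
)
code <- read_r_sources_sketch(pkg)
code_rox <- read_r_sources_sketch(pkg, include_roxygen = TRUE)
cat(code)
```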
Find Package Directory Within Extracted Bundle
Description
Find Package Directory Within Extracted Bundle
Usage
find_pkg_dir(extract_dir, subdir = NULL)
Arguments
extract_dir |
Directory where bundle was extracted |
subdir |
Optional subdirectory path |
Value
Path to package directory
Get Cache Directory for Remote Package
Description
Get Cache Directory for Remote Package
Usage
get_remote_cache_dir(parsed, cache_path)
Arguments
parsed |
Parsed reference |
cache_path |
Base cache path |
Value
Path to cache directory
Check if a directory contains a pre-built binary package
Description
Check if a directory contains a pre-built binary package
Usage
is_binary_pkg(pkg_dir)
Arguments
pkg_dir |
Path to the package directory. |
Value
Logical indicating if it is a binary package.
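One plausible way to detect a pre-built package is to look for markers that `R CMD INSTALL` leaves in an installed (or binary) package tree: a `Built` field in DESCRIPTION and a `Meta/` directory. This heuristic, and the helper name `is_binary_pkg_sketch()`, are assumptions; rdocdump's own check may use different criteria.

```r
# Hypothetical sketch: heuristic markers of an installed/binary package tree.
is_binary_pkg_sketch <- function(pkg_dir) {
  desc_file <- file.path(pkg_dir, "DESCRIPTION")
  if (!file.exists(desc_file)) return(FALSE)
  desc <- read.dcf(desc_file)
  # R CMD INSTALL records a "Built" field and creates a Meta/ directory
  "Built" %in% colnames(desc) || dir.exists(file.path(pkg_dir, "Meta"))
}

# An installed base package looks "binary"...
is_binary_pkg_sketch(system.file(package = "splines"))
# ...while a bare source directory does not.
src <- file.path(tempdir(), "demo_src_pkg")
dir.create(src, showWarnings = FALSE)
writeLines(c("Package: demo", "Version: 0.1.0"), file.path(src, "DESCRIPTION"))
is_binary_pkg_sketch(src)
```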
Check if String is a Remote Package Reference
Description
Check if String is a Remote Package Reference
Usage
is_remote_reference(pkg)
Arguments
pkg |
Character string to check |
Value
TRUE if it looks like a remote reference
Parse Remote Reference String
Description
Supports any format supported by pak. See ?pak::pak_package_sources
for details.
Usage
parse_remote_ref(ref)
Arguments
ref |
Character string reference |
Details
Ambiguous web URLs (e.g. branch names like feat/foo/pkg) are best
supplied as user/repo@ref/subdir; the URL heuristic is intentionally
limited.
Value
List with components: type, user, repo, ref, subdir
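A simplified parser for the shorthand forms used in this manual (`user/repo`, `user/repo@ref`, `user/repo/subdir`, `user/repo@ref/subdir`, and `gitlab::user/repo`) might look like the following. It is a hypothetical sketch (`parse_ref_sketch()` is not rdocdump's function): it assumes GitHub when no `host::` prefix is given and treats the first segment after `@` as the ref, so branch names containing `/` are not handled. parse_remote_ref() itself is documented to accept every pak format.

```r
# Hypothetical sketch of parsing remote shorthand into its components.
parse_ref_sketch <- function(ref) {
  type <- "github"  # assumption: bare "user/repo" means GitHub
  host_split <- strsplit(ref, "::", fixed = TRUE)[[1]]
  if (length(host_split) == 2) {
    type <- host_split[1]
    ref <- host_split[2]
  }
  at_split <- strsplit(ref, "@", fixed = TRUE)[[1]]
  path_segs <- strsplit(at_split[1], "/", fixed = TRUE)[[1]]
  git_ref <- NA_character_
  # Segments after user/repo (before any @) form a subdirectory
  subdir_segs <- path_segs[-(1:2)]
  if (length(at_split) > 1) {
    ref_segs <- strsplit(at_split[2], "/", fixed = TRUE)[[1]]
    git_ref <- ref_segs[1]
    # user/repo@ref/subdir: anything after the ref is also a subdirectory
    subdir_segs <- c(subdir_segs, ref_segs[-1])
  }
  list(
    type = type,
    user = path_segs[1],
    repo = path_segs[2],
    ref = git_ref,
    subdir = if (length(subdir_segs)) paste(subdir_segs, collapse = "/") else NA_character_
  )
}

str(parse_ref_sketch("r-lib/rlang@v1.1.0"))
str(parse_ref_sketch("ipeaGIT/r5r/r-package"))
```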
Parse GitHub or GitLab Web URL
Description
Parse GitHub or GitLab Web URL
Usage
parse_remote_url(url)
Arguments
url |
The full URL string |
Value
List with components: type, user, repo, ref, subdir
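A comparably limited heuristic for web URLs such as `https://github.com/user/repo/tree/ref/subdir` or GitLab's `https://gitlab.com/user/repo/-/tree/ref` can split the URL path into segments. This is a hypothetical sketch (`parse_url_sketch()`); it assumes the `/tree/` (or `/blob/`) URL layout and does not handle GitLab subgroups or branch names containing `/`.

```r
# Hypothetical sketch of a segment-based URL heuristic.
parse_url_sketch <- function(url) {
  # Drop trailing slashes, then the scheme
  stripped <- sub("^https?://", "", sub("/+$", "", url))
  segs <- strsplit(stripped, "/", fixed = TRUE)[[1]]
  type <- if (grepl("gitlab", segs[1], fixed = TRUE)) "gitlab" else "github"
  user <- segs[2]
  repo <- sub("\\.git$", "", segs[3])
  rest <- segs[-(1:3)]
  # GitLab inserts a "-" segment before tree/blob
  if (length(rest) && rest[1] == "-") rest <- rest[-1]
  ref <- NA_character_
  subdir <- NA_character_
  if (length(rest) >= 2 && rest[1] %in% c("tree", "blob")) {
    ref <- rest[2]
    if (length(rest) > 2) subdir <- paste(rest[-(1:2)], collapse = "/")
  }
  list(type = type, user = user, repo = repo, ref = ref, subdir = subdir)
}

str(parse_url_sketch("https://github.com/ipeaGIT/r5r/tree/master/r-package"))
```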
Extract R Source Code from a Package
Description
This function extracts the R source code from a package. For installed
packages, it retrieves the package namespace and deparses all functions found
in the package. For package source directories or archives (non-installed
packages), it reads all .R files from the R directory and, optionally,
from the tests directory. Optionally, it can include roxygen2 documentation
from these files.
Usage
rdd_extract_code(
  pkg,
  file = NULL,
  include_tests = FALSE,
  include_roxygen = FALSE,
  force_fetch = FALSE,
  version = NULL,
  cache_path = getOption("rdocdump.cache_path"),
  keep_files = "none",
  repos = getOption("rdocdump.repos", getOption("repos"))
)
Arguments
pkg |
A
|
file |
Optional. Save path for the output text file. If set, the
function will return the path to the file instead of the combined text.
Defaults to |
include_tests |
|
include_roxygen |
|
force_fetch |
|
version |
Optional. A |
cache_path |
A |
keep_files |
A
|
repos |
A |
Details
For remote repositories, rdocdump uses pak for resolution. If pak
cannot find an R package at the root or the specified subdirectory, the
function will automatically fall back to downloading the full repository
and searching for the shallowest directory containing a DESCRIPTION file.
Value
A single string containing the combined R source code (and, optionally, roxygen2 documentation) from the package.
Examples
# Extract only R source code (excluding roxygen2 documentation) from an
# installed package.
code <- rdd_extract_code("splines")
cat(substr(code, 1, 1000))
# Extract R source code, including roxygen2 documentation, from the
# package source fetched from CRAN.
# set cache directory for `rdocdump`
rdd_set_cache_path(paste0(tempdir(), "/rdocdump_cache"))
local({
  code_with_roxygen <- rdd_extract_code(
    "ini",
    include_roxygen = TRUE,
    force_fetch = TRUE,
    repos = c("CRAN" = "https://cran.r-project.org")
  )
  cat(substr(code_with_roxygen, 1, 1000))
})
# Extract R source code from the package source, including test files
# but excluding roxygen2 docs.
local({
  code_with_tests <- rdd_extract_code(
    "ini",
    include_roxygen = FALSE,
    include_tests = TRUE,
    force_fetch = TRUE,
    repos = c("CRAN" = "https://cran.r-project.org")
  )
  cat(substr(code_with_tests, 1, 1000))
})
# clean cache directory
unlink(getOption("rdocdump.cache_path"), recursive = TRUE, force = TRUE)
Get Current rdocdump Repository Options
Description
This function returns the current repository URLs used by rdocdump. The
default is set to the CRAN repository at "https://cloud.r-project.org". This
does not affect the repositories used by install.packages() in your current
R session and/or project. To set repository options, use
rdd_set_repos.
Usage
rdd_get_repos()
Value
A character vector of repository URLs.
Examples
# Get current rdocdump repository options
rdd_get_repos()
Set rdocdump Cache Path in the Current R Session
Description
This function sets the cache path used by rdocdump to store temporary
files (downloaded tar.gz archives and/or extracted directories) for the
current R session. The cache path is stored in the option
"rdocdump.cache_path", which can be checked with
getOption("rdocdump.cache_path"). The path is created if it does not
exist.
Usage
rdd_set_cache_path(path)
Arguments
path |
A |
Value
Invisibly returns the new cache path.
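The behaviour described above (create the directory if needed, store it in the `rdocdump.cache_path` option, return it invisibly) can be mimicked in a few lines of base R. `set_cache_path_sketch()` is a hypothetical stand-in for illustration only; in practice, call rdd_set_cache_path().

```r
# Hypothetical stand-in for rdd_set_cache_path(), based only on the
# behaviour described in this help page.
set_cache_path_sketch <- function(path) {
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)
  }
  options(rdocdump.cache_path = path)
  invisible(path)
}

cache_dir <- file.path(tempdir(), "rdocdump_cache")
set_cache_path_sketch(cache_dir)
getOption("rdocdump.cache_path")
```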
Examples
# set cache directory for `rdocdump`
rdd_set_cache_path(paste0(tempdir(), "/rdocdump_cache"))
# clean cache directory
unlink(getOption("rdocdump.cache_path"), recursive = TRUE)
Set rdocdump Repository Options
Description
This function sets the package repository URLs used by rdocdump when
fetching package sources. May be useful for setting custom repositories or
mirrors. This does not affect the repositories used by install.packages()
in your current R session and/or project.
Usage
rdd_set_repos(repos)
Arguments
repos |
A character vector of repository URLs. |
Value
Invisibly returns the new repository URLs.
Examples
# Set rdocdump repository options
rdd_set_repos(c("CRAN" = "https://cloud.r-project.org"))
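The default `repos` argument of rdd_extract_code() and rdd_to_txt() is `getOption("rdocdump.repos", getOption("repos"))`, so an rdocdump-specific setting takes precedence over the session-wide `repos` option. The sketch below only demonstrates that lookup order with `options()`; it assumes, without the documentation stating it explicitly, that rdd_set_repos() stores its value in the `rdocdump.repos` option.

```r
# Same lookup as the documented default argument
current_repos <- function() getOption("rdocdump.repos", getOption("repos"))

old <- options(
  rdocdump.repos = NULL,
  repos = c(CRAN = "https://cloud.r-project.org")
)
fallback <- current_repos()  # rdocdump.repos unset: falls back to repos

options(rdocdump.repos = c(CRAN = "https://cran.r-project.org"))
override <- current_repos()  # rdocdump.repos takes precedence

options(old)  # restore the session's previous options
```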
Dump Package Source, Documentation and Vignettes into Plain Text
Description
This function produces a single text output for an R package by processing its documentation (Rd files from the package source or the documentation from already installed packages), vignettes, and/or R source code.
Usage
rdd_to_txt(
  pkg,
  file = NULL,
  content = "all",
  force_fetch = FALSE,
  version = NULL,
  keep_files = "none",
  cache_path = getOption("rdocdump.cache_path"),
  repos = getOption("rdocdump.repos", getOption("repos"))
)
Arguments
pkg |
A
|
file |
Optional. Save path for the output text file. If set, the
function will return the path to the file instead of the combined text.
Defaults to |
content |
A character vector specifying which components to include in the output. Possible values are:
You can specify multiple options (e.g., |
force_fetch |
|
version |
Optional. A |
keep_files |
A
|
cache_path |
A |
repos |
A |
Value
A single string containing the combined package documentation,
vignettes, and/or code as specified by the content argument. If the
file argument is set, returns the path to the file.
Examples
# Extract the default content (`content = "all"`) for the built-in
# `splines` package.
docs <- rdd_to_txt("splines")
cat(substr(docs, 1, 500))
## Not run:
# Extract from GitHub repository
docs <- rdd_to_txt("r-lib/rlang")
# Extract specific version from GitHub
docs <- rdd_to_txt("r-lib/rlang@v1.1.0")
# Extract from GitLab
docs <- rdd_to_txt("gitlab::user/repo")
# Auto-discovery of packages in subdirectories (e.g., if repo root is not the pkg)
docs <- rdd_to_txt("ipeaGIT/r5r")
# Manual subdirectory specification (useful for disambiguation)
docs <- rdd_to_txt("ipeaGIT/r5r/r-package")
## End(Not run)
## Not run:
# set cache directory for `rdocdump`
rdd_set_cache_path(paste0(tempdir(), "/rdocdump_cache"))
# Extract only documentation for rJavaEnv by downloading its source from CRAN
docs <- rdd_to_txt(
  "rJavaEnv",
  force_fetch = TRUE,
  content = "docs",
  repos = c("CRAN" = "https://cran.r-project.org")
)
lines <- unlist(strsplit(docs, "\n"))
# Print the first 3 lines
cat(head(lines, 3), sep = "\n")
# Print the last 3 lines
cat(tail(lines, 3), sep = "\n")
# clean cache directory
unlink(getOption("rdocdump.cache_path"), recursive = TRUE, force = TRUE)
## End(Not run)
Resolve the path to a package directory or tarball
Description
This function resolves the path to a package directory or tarball, handling both installed packages and source packages from CRAN.
Usage
resolve_pkg_path(
  pkg,
  cache_path = NULL,
  force_fetch = FALSE,
  version = NULL,
  repos = getOption("rdocdump.repos", getOption("repos"))
)
Arguments
pkg |
A
|
cache_path |
A |
force_fetch |
|
version |
Optional. A |
repos |
A |
Value
A list containing:
- pkg_path: Path to the package directory or tarball.
- extracted_path: Path to the extracted package directory (if applicable).
- tar_path: Path to the tarball if it was downloaded.
- is_installed: Logical indicating if the package is installed.
- pkg_name: Package name (always populated when known).
- pkg_version: Package version (NULL if not known).
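For the installed-package case, a list with these fields could be assembled from base R lookups. `resolve_installed_sketch()` is hypothetical: it mirrors only the field names listed above, leaves the download-related fields NULL, and returns NULL where a real resolver would go on to fetch the source.

```r
# Hypothetical sketch covering only already-installed packages.
resolve_installed_sketch <- function(pkg) {
  path <- find.package(pkg, quiet = TRUE)
  if (length(path) == 0) {
    return(NULL)  # not installed; a real resolver would fetch the source
  }
  list(
    pkg_path = path[1],
    extracted_path = NULL,
    tar_path = NULL,
    is_installed = TRUE,
    pkg_name = pkg,
    pkg_version = as.character(utils::packageVersion(pkg))
  )
}

str(resolve_installed_sketch("splines"))
```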
Resolve Remote Package References (GitHub, GitLab, Bioconductor, etc.)
Description
Downloads package source from remote repositories without installing the
package. Uses the pak package for downloading.
If pak fails to resolve the reference (e.g., because the R package is in a
subdirectory and no subdir was provided), the function automatically
falls back to downloading the full repository and scanning for the
shallowest directory containing a DESCRIPTION file.
Usage
resolve_remote_pkg(pkg_ref, cache_path = NULL)
Arguments
pkg_ref |
A character string specifying the remote package reference.
Supports any format supported by
|
cache_path |
Optional path to cache directory. If NULL, uses a temporary directory. |
Details
The auto-discovery mechanism uses two fallback tiers if pak resolution
fails:
- Archive Download: Attempts to download a .tar.gz archive of the repository for known hosts (GitHub, GitLab, Bitbucket).
- Git Clone: Uses git clone --depth 1 for arbitrary Git URLs or if the archive download fails (requires system git).
Once downloaded, it recursively searches for DESCRIPTION files and selects
the one closest to the repository root.
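The "closest to the repository root" rule amounts to picking the DESCRIPTION file with the fewest path segments. `find_shallowest_pkg_dir()` is a hypothetical sketch of that selection step only (no downloading); ties are broken by the alphabetical order in which `list.files()` returns paths.

```r
# Hypothetical sketch of selecting the shallowest package directory.
find_shallowest_pkg_dir <- function(root) {
  # Relative paths such as "r-package/DESCRIPTION"
  rel <- list.files(root, pattern = "^DESCRIPTION$", recursive = TRUE)
  if (length(rel) == 0) stop("No DESCRIPTION file found under ", root)
  depth <- lengths(strsplit(rel, "/", fixed = TRUE))
  best <- dirname(rel[which.min(depth)])
  if (best == ".") root else file.path(root, best)
}

# Simulated repository whose package lives in a subdirectory
root <- file.path(tempdir(), "demo_repo")
nested <- file.path(root, "r-package", "inst", "extdata", "other")
dir.create(nested, recursive = TRUE, showWarnings = FALSE)
writeLines("Package: demo", file.path(root, "r-package", "DESCRIPTION"))
writeLines("Package: nested", file.path(nested, "DESCRIPTION"))
pkg_dir <- find_shallowest_pkg_dir(root)
pkg_dir
```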
Value
A list containing:
- pkg_path: Path to the package directory.
- extracted_path: Path to the extracted bundle.
- tar_path: Path to the downloaded tarball.
- is_installed: FALSE (always FALSE for remote packages).
- pkg_name: Package name reported by pak (or repo slug fallback).
- pkg_version: Package version reported by pak, if available.
- remote_info: Parsed remote reference information.