openalexSnapshot

openalexSnapshot converts the OpenAlex bulk snapshot from gzipped newline-delimited JSON to Parquet, builds fast ID-lookup indexes, and extracts individual records by OpenAlex ID. The heavy lifting is done by a compiled Rust library (statically linked via extendr), with no external binary dependency. For API-based access to OpenAlex, see openalexPro.

Installation

Install from r-universe (precompiled binaries available for macOS and Linux — no Rust toolchain required):

install.packages(
  "openalexSnapshot",
  repos = c("https://rkrug.r-universe.dev", "https://cloud.r-project.org")
)

Install the development version from GitHub:

# install.packages("pak")
pak::pak("openalexPro/openalexSnapshot")

Hardware Requirements

Resource	Minimum	Recommended
Disk space	2.5 TB	3+ TB
RAM	16 GB	32+ GB
CPU	2 cores	4+ cores
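
The `workers` argument used in the Quick Start can be sized against the CPU row above. The following is a minimal sketch, not part of the package: `pick_workers()` is a hypothetical helper, and the cap of 4 simply mirrors the value used in the Quick Start example.

```r
# Hypothetical helper (not exported by openalexSnapshot): choose a worker
# count from the number of detected cores, capped at `cap`.
pick_workers <- function(cores = parallel::detectCores(), cap = 4L) {
  # detectCores() can return NA on some platforms; fall back to one worker.
  if (length(cores) != 1L || is.na(cores)) {
    return(1L)
  }
  as.integer(max(1L, min(cap, cores)))
}

workers <- pick_workers()
```

The result can then be passed as `workers = workers` to the conversion and indexing calls instead of a hard-coded value.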

Quick Start

library(openalexSnapshot)

root <- "/Volumes/openalex"

# 1. Convert the snapshot to Parquet
snapshot_to_parquet(
  root_dir     = root,
  workers      = 4,
  memory_limit = 15000   # MB
)

# 2. Build ID indexes
build_corpus_index(
  root_dir  = root,
  data_sets = "works",
  workers   = 4
)

# 3. Look up specific records by OpenAlex ID
out_dir <- file.path(root, "my_extract")
lookup_by_id(
  index_file = file.path(root, "parquet", "works_id_idx.parquet"),
  ids        = c("W2741809807", "W2100837269"),
  output_dir = out_dir
)

# 4. Read results
library(arrow)
works <- open_dataset(out_dir) |> collect()
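
IDs copied from the OpenAlex website or API often arrive as full URLs or in mixed case, while the lookup example above uses the bare `W…` form. The sketch below shows one way to clean a vector of work IDs before passing it as `ids`. `normalize_work_ids()` is a hypothetical helper, not a package function, and whether `lookup_by_id()` already accepts URL-style IDs is not stated here, so treat the normalisation as a defensive assumption.

```r
# Hypothetical helper (not part of openalexSnapshot): reduce work IDs to the
# bare "W<digits>" form and drop anything that does not match.
normalize_work_ids <- function(ids) {
  ids <- trimws(as.character(ids))
  # Strip an optional https://openalex.org/ prefix.
  ids <- sub("^https?://openalex\\.org/", "", ids, ignore.case = TRUE)
  ids <- toupper(ids)
  ok <- grepl("^W[0-9]+$", ids)
  if (any(!ok)) {
    warning("Dropping malformed IDs: ", paste(ids[!ok], collapse = ", "))
  }
  # Remove duplicates, keeping first-occurrence order.
  unique(ids[ok])
}

ids <- normalize_work_ids(c(
  "https://openalex.org/W2741809807",
  "w2100837269",
  "W2741809807"
))
```

The cleaned `ids` vector can then be supplied to the lookup call in step 3.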

Documentation

Full documentation and articles are available at https://openalexpro.github.io/openalexSnapshot.

Working with the OpenAlex Bulk Snapshot — download, convert, index, and query the full snapshot
Snapshot Conversion: From JSON to Parquet — detailed function reference

Related packages

openalexPro — API access, tidy data frames, and advanced OpenAlex workflows
