The *fastverse* is a suite of complementary high-performance
packages for statistical computing and data manipulation in R. Developed
independently by various people, *fastverse* packages jointly
contribute to the objectives of:

- Speeding up R through heavy use of compiled code (C, C++, Fortran)
- Enabling more complex statistical and data manipulation operations in R
- Reducing the number of dependencies required for advanced computing in R

The `fastverse` package integrates these packages and provides utilities for their easy installation, loading and management. It is an extensible framework that allows users to (permanently) add or remove packages to create a ‘verse’ of packages suiting their general needs. Separate ‘verses’ can also be created.

*fastverse* packages are jointly attached with `library(fastverse)`, and several functions starting with `fastverse_` help manage dependencies, detect namespace conflicts, add/remove packages from the *fastverse* and update packages.
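A minimal session sketch, assuming *fastverse* and its core packages are installed. `fastverse_packages()` and `fastverse_conflicts()` are used with their default arguments; the two functions left commented out may contact a package repository, so check `?fastverse` for their exact behaviour before running them.

```r
library(fastverse)   # attaches the core packages and reports their versions

# Which packages make up the current fastverse?
pkgs <- fastverse_packages()
print(pkgs)

# Report functions masked between fastverse packages and other attached packages
fastverse_conflicts()

# These may query a package repository, so they are left commented out:
# fastverse_deps()     # list dependencies and their versions
# fastverse_update()   # update outdated fastverse packages
```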

The *fastverse* consists of 6 core packages (7 dependencies in total) which provide broad C/C++ based statistical and data manipulation functionality and have carefully managed APIs. These packages are installed and attached along with the `fastverse` package.

- **data.table**: Enhanced data frame class with concise data manipulation framework offering powerful aggregation, extremely flexible split-apply-combine computing, reshaping, joins, rolling statistics, set operations on tables, fast csv read/write, and various utilities such as transposition of data.
- **collapse**: Fast grouped & weighted statistical computations, time series and panel data transformations, list-processing, data manipulation functions, summary statistics and various utilities such as support for variable labels. Class-agnostic framework designed to work with vectors, matrices, data frames, lists and related classes including *xts*, *data.table*, *tibble*, *pdata.frame* and *sf*.
- **matrixStats**: Efficient row- and column-wise (weighted) statistics on matrices and vectors, including computations on subsets of rows and columns.
- **kit**: Fast vectorized and nested switches, some parallel (row-wise) statistics, and some utilities such as efficient partial sorting and unique values.
- **magrittr**: Efficient pipe operators for enhanced programming and code unnesting.
- **fst**: A compressed data file format that is very fast to read and write. Full random access in both rows and columns allows reading subsets from a ‘.fst’ file.

*Additional dependency*: Package *Rcpp* is imported by *collapse* and *fst*.
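To make the division of labour concrete, here is a short sketch that touches several core packages on the built-in `mtcars` data. It assumes only that the *fastverse* is installed; the chosen functions (`qDT`, `fgroup_by`, `fsummarise`, `fmean`, `colMedians`, `iif`) are illustrative picks, not a recommended workflow.

```r
library(fastverse)   # attaches data.table, collapse, matrixStats, kit, magrittr and fst

dt <- qDT(mtcars)    # collapse: quick conversion of a data.frame to a data.table

# data.table: mean miles per gallon by number of cylinders
dt[, .(mean_mpg = mean(mpg)), by = cyl]

# collapse: the same aggregation with fast grouped functions, chained with the magrittr pipe
res <- dt %>% fgroup_by(cyl) %>% fsummarise(mean_mpg = fmean(mpg))
print(res)

# matrixStats: column medians of a numeric matrix
colMedians(as.matrix(mtcars[, c("mpg", "hp")]))

# kit: fast vectorised if-else
head(iif(mtcars$mpg > 20, "high", "low"))
```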

Currently, there are 2 different versions of the *fastverse*: one on CRAN and one on GitHub/R-universe. The GitHub/R-universe version is recommended if you want *matrixStats* to consistently preserve the attributes of your matrices: it modifies functions in the *matrixStats* namespace so that they preserve attributes consistently (and by default) whenever the *fastverse* is attached. This version was rejected by CRAN because it requires a call to `unlockBinding`. The CRAN version takes *matrixStats* as it is, which means most functions do not preserve attributes such as dimension names in computations.

```
# Install the CRAN version
install.packages("fastverse")
# Install (Windows/Mac binaries) from R-universe
install.packages("fastverse", repos = "https://fastverse.r-universe.dev")
# Install from GitHub (requires compilation)
remotes::install_github("fastverse/fastverse")
```

*Note* that the GitHub/R-universe version is not a development version; development takes place in the ‘development’ branch. *matrixStats* is slowly evolving towards greater consistency, but because of its large number of reverse dependencies it might take more than half a year until dimension names are handled consistently by default. Until then, the CRAN and GitHub/R-universe versions of the *fastverse* are released together.

In addition, users have the option (via the `fastverse_extend()` function) to freely attach extension packages offering more specific functionality. The *fastverse* can be extended by any R package, either just for the current session or permanently.
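A sketch of both modes, assuming *xts* and *roll* are installed (they serve only as example extensions). It uses the `permanent` and `install` arguments of `fastverse_extend()` and `fastverse_reset()` to undo permanent changes. Note that a permanent extension writes to the *fastverse* configuration, so the library location must be writable.

```r
library(fastverse)

# Attach xts and roll for the current session only
fastverse_extend(xts, roll)

# Make the extension permanent: library(fastverse) will also attach xts and roll in future sessions
fastverse_extend(xts, roll, permanent = TRUE)

# Missing packages can be installed on the fly (commented out: requires network access)
# fastverse_extend(dygraphs, install = TRUE)

# Undo all permanent modifications and return to the default fastverse
fastverse_reset()
```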

In addition to global customization, separate *fastverse*s can be created for individual projects by adding a `.fastverse` config file to the project directory and listing packages there. Only these packages will then be loaded and managed with `library(fastverse)` in the project.
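A sketch of a project-level setup, run from the project directory. It assumes the config file simply lists package names separated by commas; check the *fastverse* documentation for the exact file format and any further options it supports.

```r
# Write a .fastverse config listing only the packages this project needs
# (assumed format: comma-separated package names)
writeLines("data.table, collapse, kit", ".fastverse")

# library(fastverse) now attaches only the packages listed in .fastverse
library(fastverse)
```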

High-performing packages for different data manipulation and statistical computing topics are suggested below. Each topic has a 2-character topic-id, which can be used to quickly attach all available packages with `fastverse_extend(topics = c(..id's..))`, and to install missing packages by adding the argument `install = TRUE`. The majority of these packages provide compiled code and have few dependencies. The total (recursive) dependency count is indicated for each package.

- **xts** and **zoo**: Fast and reliable matrix-based time series classes providing fully identified ordered observations and various utilities for plotting and computations (1 dependency).
- **roll**: Very fast rolling and expanding window functions for vectors and matrices (3 dependencies).

*Notes*: *xts*/*zoo* objects are preserved by *roll* functions and by *collapse*’s time series and data transformation functions¹. As *xts*/*zoo* objects are matrices, all *matrixStats* functions apply to them as well. *xts* objects can also easily be converted to and from *data.table*.
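The interplay described in the notes can be sketched as follows, assuming *xts*, *roll* and *data.table* are installed. The series is made up for illustration; `as.data.table()` on an *xts* object stores the time index in a column named `index`.

```r
library(xts)
library(roll)
library(data.table)

# A small daily series with two columns
dates <- seq(as.Date("2024-01-01"), by = "day", length.out = 10)
x <- xts(cbind(a = 1:10, b = (1:10)^2), order.by = dates)

# roll: 3-day rolling means; the result keeps the xts class and dates
rm3 <- roll_mean(x, width = 3)
head(rm3)

# Convert to a data.table (the time index becomes a column) and back to xts
dt <- as.data.table(x)
x2 <- as.xts(dt)
```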

- **lubridate**: Facilitates ‘POSIX-’ and ‘Date’ based computations (2 dependencies).
- **anytime**: Anything to ‘POSIXct’ or ‘Date’ converter (2 dependencies).
- **fasttime**: Fast parsing of strings to ‘POSIXct’ (0 dependencies).
- **nanotime**: Provides a coherent set of temporal types and functions with nanosecond precision, based on the ‘integer64’ class (7 dependencies).
- **clock**: Comprehensive library for date-time manipulations using a new family of orthogonal date-time classes (durations, time points, zoned-times, and calendars) (6 dependencies).
- **timechange**: Efficient manipulation of date-times accounting for time zones and daylight saving times (1 dependency).

*Notes*: Date and time variables are preserved in many *data.table* and *collapse* operations. *data.table* additionally offers an efficient integer-based date class ‘IDate’ with some supporting functionality. *xts* and *zoo* also provide various functions to transform dates, and *zoo* provides classes ‘yearmon’ and ‘yearqtr’ for convenient computation with monthly and quarterly data. Package *mondate* also provides a class ‘mondate’ for monthly data.
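A few of these date tools side by side, assuming *lubridate*, *data.table*, *zoo* and *fasttime* are installed. Calls are namespaced because *lubridate* and *data.table* both export functions such as `month()`.

```r
d <- lubridate::ymd("2024-03-15")          # lubridate: parse a 'Date' from year-month-day
lubridate::month(d)                        # 3
lubridate::floor_date(d, unit = "month")   # first day of the month

idate <- data.table::as.IDate("2024-03-15")   # data.table: integer-based 'IDate'
format(zoo::as.yearqtr(d))                    # zoo: "2024 Q1"

# fasttime: fast parsing of timestamp strings to 'POSIXct'
stamp <- fasttime::fastPOSIXct("2024-03-15 12:30:00", tz = "UTC")
```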

- **stringi**: Main R package for fast, correct, consistent, and convenient string/text manipulation (backend to *stringr* and *snakecase*) (0 dependencies).
- **stringr**: Simple, consistent wrappers for common string operations, based on *stringi* (3 dependencies).
- **snakecase**: Convert strings into any case, based on *stringi* and *stringr* (4 dependencies).
- **stringfish**: Fast computation of common (base R) string operations using the ALTREP system (2 dependencies).
- **stringdist**: Fast computation of string distance metrics, matrices, and fuzzy matching (0 dependencies).
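A quick sketch of typical calls, assuming *stringi*, *stringr*, *snakecase* and *stringdist* are installed. `stringdist()` uses its default optimal string alignment metric; `amatch()` needs an explicit `maxDist` because its default tolerance is very small.

```r
s <- c("fastVerse Packages", "data manipulation in R")

stringi::stri_trans_toupper("fastverse")    # "FASTVERSE"
stringr::str_detect(s, "Verse")             # TRUE FALSE
snakecase::to_snake_case("fastVerse Packages")

# stringdist: edit distance and fuzzy matching against a lookup table
stringdist::stringdist("fastverse", "fastvrse")                          # 1
stringdist::amatch("fastvrse", c("tidyverse", "fastverse"), maxDist = 2)  # 2
```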

- **Rfast** and **Rfast2**: Heterogeneous sets of fast functions for statistics, estimation and data manipulation operating on vectors and matrices. Missing values and object attributes are not (consistently) supported (4-5 dependencies).
- **parallelDist**: Multi-threaded distance matrix computation (3 dependencies).
- **coop**: Fast implementations of the covariance, correlation, and cosine similarity (0 dependencies).
- **rsparse**: Implements many algorithms for statistical learning on sparse matrices: matrix factorizations, matrix completion, elastic net regressions, factorization machines (8 dependencies). See also package **MatrixExtra**.
- **rrapply**: The `rrapply()` function extends base `rapply()` by including a condition or predicate function for the application of functions and diverse options to prune or aggregate the result (0 dependencies).
- **dqrng**: Fast uniform, normal or exponential random numbers and random sampling (i.e. faster `runif`, `rnorm`, `rexp`, `sample` and `sample.int` functions) (3 dependencies).
- **fastmap**: Fast implementation of data structures based on C++, including a key-value store (`fastmap`), stack (`faststack`), and queue (`fastqueue`) (0 dependencies).
- **fastmatch**: A faster `match()` function (drop-in replacement for `base::match` and `base::%in%`) that keeps the hash table in memory for much faster repeated lookups (0 dependencies).

*Notes*: *Rfast* has a number of functions with the same names as *matrixStats* functions. These are simpler but typically faster and support multi-threading. Some highly efficient statistical functions can also be found scattered across various other packages; notable examples are *Hmisc* (60 dependencies) and *DescTools* (17 dependencies). Package *vctrs* also provides some quite efficient functions to manipulate vectors and data frames (4 dependencies).
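A sketch exercising a few of these utilities, assuming *fastmatch*, *fastmap*, *rrapply* and *dqrng* are installed. The data are toy values chosen only to show what each call returns.

```r
library(fastmatch)
library(fastmap)
library(rrapply)
library(dqrng)

# fastmatch: fmatch() works like match() but caches the hash table of 'tab'
tab <- c("a", "b", "c")
fmatch(c("b", "z"), tab)     # 2 NA
c("b", "z") %fin% tab        # TRUE FALSE

# fastmap: key-value store and stack
m <- fastmap()
m$set("x", 1)
m$get("x")                   # 1
s <- faststack()
s$push(10)
s$pop()                      # 10

# rrapply: apply a function only to the numeric leaves of a nested list
l <- list(a = 1, b = list(c = "text", d = 2))
l10 <- rrapply(l, condition = is.numeric, f = function(x) x * 10, how = "replace")

# dqrng: fast random numbers with their own seeding
dqset.seed(42)
r <- dqrnorm(3)
```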

- **sf**: Leading framework for geospatial computing and manipulation in R, offering a simple and flexible spatial data frame and supporting functionality (13 dependencies).
- **geos**: Provides an R API to the Open Source Geometry Engine (GEOS) C library and a vector format with which to efficiently store ‘GEOS’ geometries, functions to extract information from, calculate relationships between, and transform geometries, and facilities to import/export geometry vectors to other spatial formats (2 dependencies).
- **stars**: Spatiotemporal data (raster and vector) in the form of dense arrays, with space and time being array dimensions (17 dependencies).
- **terra**: Methods for spatial data analysis with raster and vector data. Processing of very large (out of memory) files is supported (4 dependencies).

*Notes*: *collapse* can be used for efficient manipulation and computations on *sf* data frames. *sf* also offers tight integration with *dplyr*.
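A minimal *sf* sketch, assuming *sf* is installed; the two city coordinates are approximate and only illustrative. With longitude/latitude data (CRS 4326), `st_distance()` returns distances in metres.

```r
library(sf)

# Two points in longitude/latitude (WGS 84), stored as an sf data frame
pts <- st_as_sf(
  data.frame(name = c("Berlin", "Paris"), lon = c(13.40, 2.35), lat = c(52.52, 48.86)),
  coords = c("lon", "lat"), crs = 4326
)

d <- st_distance(pts)   # 2 x 2 matrix of pairwise distances (in metres)
st_bbox(pts)            # bounding box of both points
```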

- **dygraphs**: Interface to the ‘Dygraphs’ interactive time series charting library (11 dependencies).
- **lattice**: Trellis graphics for R (0 dependencies).
- **grid**: The grid graphics package (0 dependencies).
- **ggplot2**: Create elegant data visualizations using the Grammar of Graphics (30 dependencies).
- **scales**: Scale functions for visualizations (10 dependencies).

*Notes*: *latticeExtra* provides extra graphical utilities based on *lattice*. *gridExtra* provides miscellaneous functions for *grid* graphics (and consequently for *ggplot2*, which is based on *grid*). *gridtext* provides improved text rendering support for *grid* graphics. Many packages offer *ggplot2* extensions (typically with names starting with ‘gg’), such as *ggExtra*, *ggalt*, *ggforce*, *ggmap*, *ggtext*, *ggthemes*, *ggrepel*, *ggridges*, *ggfortify*, *ggstatsplot*, *ggeffects*, *ggsignif*, *GGally*, *ggcorrplot*, *ggdendro*, etc.
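A small sketch combining *ggplot2* with a *scales* labelling function, assuming both packages are installed. `label_number(suffix = ...)` is just one example of the label helpers *scales* provides.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  scale_y_continuous(labels = scales::label_number(suffix = " mpg")) +
  labs(x = "Weight (1000 lbs)", colour = "Cylinders")
print(p)
```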

- **tidytable**: A tidy interface to *data.table* that is *rlang* compatible. Quite comprehensive implementation of *dplyr*, *tidyr* and *purrr* functions. *tidyverse* function names are appended with a `.`, e.g. `mutate.()`. The package uses a class *tidytable* that inherits from *data.table*. The `dt()` function makes *data.table* syntax pipeable (14 total dependencies).
- **tidyfast**: Fast tidying of data. Covers *tidyr* functionality, uses a `dt_` prefix, and preserves *data.table* objects. Some unnecessary deep copies (2 dependencies).
- **tidyfst**: Tidy verbs for fast data manipulation. Covers *dplyr* and some *tidyr* functionality. Functions have a `_dt` suffix and preserve *data.table* objects. A cheatsheet is provided (7 dependencies).
- **tidyft**: Tidy verbs for fast data operations by reference. Best for big data manipulation on out of memory data using facilities provided by *fst* (7 dependencies).
- **maditr**: Fast data aggregation, modification, and filtering with pipes and *data.table*. Minimal implementation with functions `let()` and `take()` for most common data manipulation tasks. Also provides Excel-like lookup functions (2 dependencies).

*Notes*: One could also mention RStudio’s *dtplyr* and the *table.express* package here, but these packages import *dplyr* and thus have around 20 dependencies.

- **qs** provides a lightning-fast and complete replacement for the `saveRDS` and `readRDS` functions in R. It supports general R objects with attributes and references, at similar speeds to *fst*, but does not provide on-disk random access to data subsets like *fst* (4 dependencies).
- **arrow** provides both a low-level interface to the Apache Arrow C++ library (a multi-language toolbox for accelerated data interchange and in-memory processing) and some higher-level, R-flavored tools for working with it, including fast reading/writing of delimited files and sharing data between R and Python (12 dependencies).

*Notes*: Package *vroom* offers fast reading and writing of delimited files, but with 24 dependencies it is not really a *fastverse* candidate.

Feel free to notify me of any other packages you think should be included here. Such packages should be well designed, top-performing, low-dependency, and, with few exceptions, provide their own compiled code. Please note that the *fastverse* focuses on general-purpose statistical computing and data manipulation, so I won’t include fast packages to estimate specific kinds of models here (of which R also has a great many).

¹ *collapse* functions can also handle irregular time series, but this requires passing an integer time variable to the `t` argument, with consecutive integer steps for the regular parts of the time series and non-consecutive integers for the irregular parts.
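A minimal illustration of the footnote, assuming *collapse* is installed: period 4 is missing from the integer time variable, so the one-period lag at period 5 has no predecessor and becomes `NA`.

```r
library(collapse)

x <- c(10, 20, 30, 40, 50)
time <- c(1L, 2L, 3L, 5L, 6L)   # period 4 is missing: the series is irregular

lagged <- flag(x, n = 1, t = time)
print(lagged)   # NA 10 20 NA 40
```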