Rforce

Rforce implements the methodology described in "Rforce: Random Forests for Composite Endpoints", which models composite endpoints made up of non-fatal events and terminal events.

The method builds random forests using generalized estimating equations (GEE) and handles dependent censoring caused by terminal events using the concept of pseudo-at-risk duration.

This work received the 2024 Student Paper Competition Award from the American Statistical Association (ASA), awarded jointly by the Section on Statistical Computing and the Section on Statistical Graphics.

The paper is published in Statistics in Medicine:

PMID: 41640374
DOI: 10.1002/sim.70413

The software provides both:

R API
C API

Key features include:

High computational and memory efficiency
Parallel computation using OpenMP
Reproducible results (see the reproducibility example)

Installation

Dependencies

cmake >= 3.16.0 – build system for the C API
OpenMP – parallel computing
R >= 4.3.3 – R interface
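
The minimum cmake version listed above can be checked before building. The following is a minimal sketch and not part of Rforce: `version_ge` is a hypothetical helper, and it assumes a `sort` that supports `-V` (GNU coreutils or a recent BSD).

```shell
# Hypothetical helper: succeeds when version $1 is at least version $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required_cmake="3.16.0"
if command -v cmake >/dev/null 2>&1; then
  # `cmake --version` prints "cmake version X.Y.Z" on its first line.
  found_cmake=$(cmake --version | head -n 1 | awk '{print $3}')
  if version_ge "$found_cmake" "$required_cmake"; then
    echo "cmake $found_cmake satisfies >= $required_cmake"
  else
    echo "cmake $found_cmake is older than $required_cmake"
  fi
else
  echo "cmake not found on PATH"
fi
```

The same helper could be reused for the R version, for example by passing it the version string reported by the local R installation.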

Install R API

# install.packages("devtools")
devtools::install_github("yuw444/Rforce")

C API

git clone https://github.com/yuw444/Rforce.git
cd Rforce
mkdir build
cd build
cmake ..
make

A CMakeLists.txt file is provided in the repository.

Usage

R Examples

Examples: Get Started.

Shell Scripts

./Rforce [subcommands] <options>

Available Subcommands

train — Train a composite endpoint forest
predict — Predict using a trained composite endpoint forest and new observations
-h, --help — Show help message and exit

C API Subcommands

Train

Train a composite endpoint forest model.

Rforce train <options>

Options:

Option	Description	Required/Optional	Default
`-d, --designMatrixY=<str>`	Path to design matrix	Required
`-a, --auxiliary=<str>`	Path to auxiliary features	Required
`-u, --unitsOfCPIU=<str>`	Path to unitsOfCPIU file	Required
`-o, --out=<str>`	Path to output directory	Optional	Current working directory
`-v, --verbose=<int>`	Verbosity level (0–3)	Optional	0
`-m, --maxDepth=<int>`	Maximum tree depth	Optional	10
`-n, --minNodeSize=<int>`	Minimum node size	Optional	2 × len(unitsOfCPIU) - 1
`-g, --gain=<float>`	Minimum gain for split	Optional	0.0 (likelihood-based) or 1.3 (GEE-based)
`-t, --mtry=<int>`	Number of variables to try during splitting	Optional	√(number of variables)
`-s, --nsplits=<int>`	Number of splits to try per variable	Optional	10
`-r, --nTrees=<int>`	Number of trees	Optional	200
`-e, --seed=<int>`	Random seed	Optional	926
`-p, --nPerms=<int>`	Number of permutations for variable importance	Optional	10
`-u, --nVars=<int>`	Number of variables in the design matrix	Optional	Number of columns
`-i, --pathVarIds=<str>`	Variable IDs (categorical variables supported via repeated IDs)	Optional
`-x, --iDot`	Output tree DOT files	Optional	False
`-k, --k=<int>`	Bayesian estimator parameter for leaf output	Optional	4
`-L, --long`	Use multiple rows per patient (RF-SLAM style)	Optional
`-N, --nopseudo`	Do not estimate pseudo risk time	Optional
`-P, --pseudorisk1`	Use original pseudo-risk time (population level)	Optional
`-B, --pseudorisk2`	Recalculate pseudo-risk time at each tree (default)	Optional
`-D, --dynamicrisk`	Dynamically estimate pseudo-risk time at each split	Optional
`-F, --nophi`	Fix φ = 1, do not estimate φ	Optional
`-P, --phi1`	Estimate φ at population level	Optional
`-H, --phi2`	Estimate φ at tree level (default)	Optional
`-Y, --dynamicphi`	Dynamically estimate φ at each split	Optional
`-G, --gee`	Use GEE approach	Optional
`-A, --padjust=<str>`	p-value adjustment method (`bonferroni`, `holm`, `hochberg`, `hommel`, `BH`, `BY`, `none`)	Optional	`BH`
`-I, --interaction`	Add interaction terms for GEE	Optional	NULL
`-S, --asym`	Use asymptotic approach	Optional
`-T, --threads=<int>`	Number of parallel computing threads	Optional	8
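
Two of the defaults in the table are computed from the data. As a worked example under stated assumptions: the counts `n_units` and `n_vars` are hypothetical, the `minNodeSize` formula is read as (2 × length) - 1, and the square root for `mtry` is rounded down, which the table does not specify.

```shell
n_units=5   # hypothetical length of unitsOfCPIU
n_vars=20   # hypothetical number of design-matrix variables

# minNodeSize default, read as (2 * len(unitsOfCPIU)) - 1.
min_node_size=$((2 * n_units - 1))

# mtry default, sqrt(number of variables); rounding down is an assumption.
mtry=$(awk -v p="$n_vars" 'BEGIN { printf "%d", sqrt(p) }')

echo "minNodeSize=$min_node_size mtry=$mtry"
```

With these inputs the script reports a minimum node size of 9 and an `mtry` of 4. Passing `-n` or `-t` explicitly overrides the corresponding default.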

Predict

Predict using a trained model and test data.

Rforce predict <options>

Options:

Option	Description	Required/Optional	Default
`-m, --model=<str>`	Path to trained model	Required
`-t, --test=<str>`	Path to test data	Required
`-o, --out=<str>`	Path to output directory	Optional	Current working directory

Examples

Train a model:

Rforce train -d design_matrix.csv -a auxiliary_features.csv -u unitsOfCPIU.txt -o output_folder -v 1
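
The options table also allows a GEE-based run. The sketch below only combines flags listed under Train; the file names are the placeholders from the command above, and the specific values (500 trees, `holm`, 4 threads) are illustrative rather than recommended. It prints the command and runs it only if a built `./Rforce` binary is present in the current directory.

```shell
# Build the argument list from flags documented in the Train options table.
set -- train \
  --designMatrixY=design_matrix.csv \
  --auxiliary=auxiliary_features.csv \
  --unitsOfCPIU=unitsOfCPIU.txt \
  --out=output_folder \
  --nTrees=500 \
  --seed=926 \
  --gee \
  --padjust=holm \
  --threads=4

# Show the command for inspection.
cmd_line="./Rforce $*"
echo "$cmd_line"

# Run it only when the binary exists in the current directory.
if [ -x ./Rforce ]; then
  ./Rforce "$@"
fi
```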

Predict with a trained model:

Rforce predict -m output_folder/model.rforce -t test_data.csv -o prediction_results/
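
A small guard around the predict step can avoid a confusing failure when training did not produce a model. This is a sketch: the model path `output_folder/model.rforce` is taken from the example above, and whether Rforce creates the output directory itself is not stated here, so the script creates it up front.

```shell
model="output_folder/model.rforce"
test_data="test_data.csv"
out_dir="prediction_results"

# Create the output directory in case the tool does not (an assumption).
mkdir -p "$out_dir"

if [ -x ./Rforce ] && [ -f "$model" ] && [ -f "$test_data" ]; then
  ./Rforce predict --model="$model" --test="$test_data" --out="$out_dir"
  status="ran"
else
  echo "Skipping predict: binary, model, or test data not found" >&2
  status="skipped"
fi
echo "predict step: $status"
```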

Notes

By default, pseudo-risk time and φ (phi) are re-estimated for each tree.
The dynamic options (--dynamicrisk, --dynamicphi) re-estimate them at each split instead.
Parallel computation is supported via the --threads option.
GEE-based splitting with p-value adjustment is available.
An R API is under active development and includes:
- Classical survival data generation
- Composite endpoint data generation
- An implementation of the Wcompo methodology
- An R interface to Rforce
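
For the --threads note above, one way to pick a value is to match the number of online CPUs. This is a sketch and not Rforce behavior: it relies on `getconf _NPROCESSORS_ONLN`, which is available on Linux and macOS, and falls back to 8, the documented default, when detection fails.

```shell
# Detect online CPUs; fall back to the documented default of 8.
threads=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 8)

# Guard against empty or non-numeric output from getconf.
case "$threads" in
  ''|*[!0-9]*) threads=8 ;;
esac

echo "./Rforce train <other options> --threads=$threads"
```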
