Rforce

Rforce implements the methodology described in Rforce: Random Forests for Composite Endpoints, which models composite endpoints consisting of non-fatal events and terminal events.

The method builds random forests using generalized estimating equations (GEE) and handles dependent censoring caused by terminal events using the concept of pseudo-at-risk duration.

This work received the 2024 Student Paper Competition Award from the American Statistical Association (ASA), awarded jointly by the Section on Statistical Computing and the Section on Statistical Graphics.

The paper is published in Statistics in Medicine.

The software provides both:

  • R API
  • C API

Key features include:

  • High computational and memory efficiency
  • Parallel computation using OpenMP
  • Reproducible results (the repository provides a reproducibility example)

Installation

Dependencies

  • cmake >= 3.16.0 – build system for the C API
  • OpenMP – parallel computing
  • R >= 4.3.3 – R interface
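The minimum versions above can be checked before building. The following is a minimal sketch and not part of Rforce: `version_ge`, `check_tool`, and `first_version` are hypothetical helper names, and the parsing assumes that `cmake --version` and `R --version` print the version number on their first line. OpenMP is not checked, because detecting it depends on the compiler in use.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with Rforce) for checking minimum versions.

# version_ge A B: succeed when dot-separated numeric version A >= B.
version_ge() {
    v1=$1
    v2=$2
    while [ -n "$v1" ] || [ -n "$v2" ]; do
        f1=${v1%%.*}
        f2=${v2%%.*}
        # Drop the leading field; clear the string once the last field is used.
        if [ "$v1" = "$f1" ]; then v1=; else v1=${v1#*.}; fi
        if [ "$v2" = "$f2" ]; then v2=; else v2=${v2#*.}; fi
        f1=${f1:-0}
        f2=${f2:-0}
        if [ "$f1" -gt "$f2" ]; then return 0; fi
        if [ "$f1" -lt "$f2" ]; then return 1; fi
    done
    return 0
}

# check_tool NAME MINIMUM FOUND: report whether FOUND satisfies MINIMUM.
check_tool() {
    if [ -z "$3" ]; then
        echo "$1: not found"
        return 1
    fi
    if version_ge "$3" "$2"; then
        echo "$1 $3: ok (>= $2)"
    else
        echo "$1 $3: too old (need >= $2)"
        return 1
    fi
}

# Extract the first version-like number from the first line of input.
first_version() {
    sed -n '1s/[^0-9]*\([0-9][0-9.]*\).*/\1/p'
}

cmake_ver=$(cmake --version 2>/dev/null | first_version) || true
r_ver=$(R --version 2>/dev/null | first_version) || true
check_tool cmake 3.16.0 "$cmake_ver" || true
check_tool R 4.3.3 "$r_ver" || true
```

The comparison handles purely numeric fields only; a version string with a suffix such as `-rc1` would need extra handling.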

Install R API

# install.packages("devtools")
devtools::install_github("yuw444/Rforce")

C API

git clone https://github.com/yuw444/Rforce.git
cd Rforce
mkdir build
cd build
cmake ..
make

A CMakeLists.txt file is provided in the repository.


Usage

R Examples


Shell Scripts

./Rforce [subcommands] <options>

Available Subcommands

  • train — Train a composite endpoint forest
  • predict — Predict using a trained composite endpoint forest and new observations
  • -h, --help — Show help message and exit

C API Subcommands

Train

Train a composite endpoint forest model.

Rforce train <options>

Options:

| Option | Description | Required/Optional | Default |
|---|---|---|---|
| -d, --designMatrixY=<str> | Path to design matrix | Required | |
| -a, --auxiliary=<str> | Path to auxiliary features | Required | |
| -u, --unitsOfCPIU=<str> | Path to unitsOfCPIU file | Required | |
| -o, --out=<str> | Path to output directory | Optional | Current working directory |
| -v, --verbose=<int> | Verbosity level (0–3) | Optional | 0 |
| -m, --maxDepth=<int> | Maximum tree depth | Optional | 10 |
| -n, --minNodeSize=<int> | Minimum node size | Optional | 2 × len(unitsOfCPIU) - 1 |
| -g, --gain=<float> | Minimum gain for split | Optional | 0.0 (likelihood-based) or 1.3 (GEE-based) |
| -t, --mtry=<int> | Number of variables to try during splitting | Optional | √(number of variables) |
| -s, --nsplits=<int> | Number of splits to try per variable | Optional | 10 |
| -r, --nTrees=<int> | Number of trees | Optional | 200 |
| -e, --seed=<int> | Random seed | Optional | 926 |
| -p, --nPerms=<int> | Number of permutations for variable importance | Optional | 10 |
| -u, --nVars=<int> | Number of variables in the design matrix | Optional | Number of columns |
| -i, --pathVarIds=<str> | Variable IDs (categorical variables supported via repeated IDs) | Optional | |
| -x, --iDot | Output tree DOT files | Optional | False |
| -k, --k=<int> | Bayesian estimator parameter for leaf output | Optional | 4 |
| -L, --long | Use multiple rows per patient (RF-SLAM style) | Optional | |
| -N, --nopseudo | Do not estimate pseudo risk time | Optional | |
| -P, --pseudorisk1 | Use original pseudo-risk time (population level) | Optional | |
| -B, --pseudorisk2 | Recalculate pseudo-risk time at each tree (default) | Optional | |
| -D, --dynamicrisk | Dynamically estimate pseudo-risk time at each split | Optional | |
| -F, --nophi | Fix φ = 1, do not estimate φ | Optional | |
| -P, --phi1 | Estimate φ at population level | Optional | |
| -H, --phi2 | Estimate φ at tree level (default) | Optional | |
| -Y, --dynamicphi | Dynamically estimate φ at each split | Optional | |
| -G, --gee | Use GEE approach | Optional | |
| -A, --padjust=<str> | p-value adjustment method (bonferroni, holm, hochberg, hommel, BH, BY, none) | Optional | BH |
| -I, --interaction | Add interaction terms for GEE | Optional | NULL |
| -S, --asym | Use asymptotic approach | Optional | |
| -T, --threads=<int> | Number of parallel computing threads | Optional | 8 |
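Several of the options above are tuning parameters. As a hedged sketch, the loop below prints one `Rforce train` command per combination of `--maxDepth` and `--nTrees`, each with its own output directory and the same `--seed`. The input file names are borrowed from the Examples section, the grid values and the `sweep_` directory naming are assumptions, and `print_sweep` is a hypothetical helper. Nothing is executed: the commands are only printed so they can be reviewed first.

```shell
#!/bin/sh
# Print (do not run) one training command per hyperparameter combination.
# Directory names like sweep_d5_t200 are a hypothetical convention.
print_sweep() {
    seed=$1
    for depth in 5 10 15; do
        for trees in 200 500; do
            out="sweep_d${depth}_t${trees}"
            echo "Rforce train" \
                "--designMatrixY=design_matrix.csv" \
                "--auxiliary=auxiliary_features.csv" \
                "--unitsOfCPIU=unitsOfCPIU.txt" \
                "--maxDepth=${depth} --nTrees=${trees}" \
                "--seed=${seed} --out=${out}"
        done
    done
}

print_sweep 926
```

Keeping the seed fixed across the grid means differences between runs come from the tuning parameters rather than from the random number stream.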

Predict

Predict using a trained model and test data.

Rforce predict <options>

Options:

| Option | Description | Required/Optional | Default |
|---|---|---|---|
| -m, --model=<str> | Path to trained model | Required | |
| -t, --test=<str> | Path to test data | Required | |
| -o, --out=<str> | Path to output directory | Optional | Current working directory |

Examples

Train a model:

Rforce train -d design_matrix.csv -a auxiliary_features.csv -u unitsOfCPIU.txt -o output_folder -v 1
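Before launching a long training run, it can help to verify the inputs. The sketch below is an illustration rather than part of Rforce: `train_cmd` is a hypothetical helper that checks that the three required files exist and that the verbosity lies within 0–3 (the range given in the options table), then prints the command using the long option names. It does not invoke Rforce itself.

```shell
#!/bin/sh
# train_cmd DESIGN AUX UNITS OUTDIR VERBOSE
# Hypothetical helper: validate arguments, then print the training command.
train_cmd() {
    # All three input files are listed as required options.
    for f in "$1" "$2" "$3"; do
        if [ ! -f "$f" ]; then
            echo "missing input file: $f" >&2
            return 1
        fi
    done
    # Verbosity is documented as an integer from 0 to 3.
    case $5 in
        0|1|2|3) ;;
        *) echo "verbose must be 0-3, got: $5" >&2; return 1 ;;
    esac
    echo "Rforce train --designMatrixY=$1 --auxiliary=$2 --unitsOfCPIU=$3 --out=$4 --verbose=$5"
}
```

For example, `train_cmd design_matrix.csv auxiliary_features.csv unitsOfCPIU.txt output_folder 1` prints the long-option equivalent of the command above when the three files exist. Paths containing spaces are not quoted in the printed command and would need extra care.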

Predict with a trained model:

Rforce predict -m output_folder/model.rforce -t test_data.csv -o prediction_results/
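Since training takes a fixed `--seed` (default 926), one way to probe reproducibility is to train twice with identical inputs into two directories and compare the results. The sketch below assumes that identical runs write byte-identical files and that no timestamps or logs are stored in the output directory, neither of which is confirmed here. `same_outputs` is a hypothetical helper built on the standard `diff -r`, and the directory names `run1` and `run2` are placeholders.

```shell
#!/bin/sh
# same_outputs DIR1 DIR2: succeed when both directories hold identical files.
same_outputs() {
    if [ ! -d "$1" ] || [ ! -d "$2" ]; then
        echo "both arguments must be existing directories" >&2
        return 2
    fi
    # diff -r compares the two trees recursively; exit status 0 means no differences.
    if diff -r "$1" "$2" >/dev/null; then
        echo "identical"
    else
        echo "different"
        return 1
    fi
}

# Intended use (commented out because it needs the Rforce binary and data):
# Rforce train --designMatrixY=design_matrix.csv --auxiliary=auxiliary_features.csv \
#     --unitsOfCPIU=unitsOfCPIU.txt --seed=926 --out=run1
# Rforce train --designMatrixY=design_matrix.csv --auxiliary=auxiliary_features.csv \
#     --unitsOfCPIU=unitsOfCPIU.txt --seed=926 --out=run2
# same_outputs run1 run2
```

If the comparison reports a difference, running `diff -r` directly on the two directories shows which files differ.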

Notes

  • By default, pseudo-risk time and φ (phi) are re-estimated at the tree level, that is, once per tree.
  • The dynamic options (--dynamicrisk, --dynamicphi) re-estimate them at each split instead.
  • Parallel computation is supported via the --threads option.
  • GEE-based splitting with p-value adjustment is available.
  • The short flags -u and -P each appear twice in the train options (-u for --unitsOfCPIU and --nVars, -P for --pseudorisk1 and --phi1), so the long option names are the unambiguous way to select these settings.
  • An R API is under active development and includes:
    • Classical survival data generation
    • Composite endpoint data generation
    • An implementation of the Wcompo methodology
    • An R interface to Rforce
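The `--threads` default of 8 may exceed the number of cores on a smaller machine. The following is a minimal sketch for choosing a value, assuming `getconf _NPROCESSORS_ONLN` is available (it is on common Linux and macOS systems, but this is not guaranteed everywhere). `pick_threads` is a hypothetical helper, and whether capping at the core count is the best choice for Rforce is not established here.

```shell
#!/bin/sh
# pick_threads REQUESTED AVAILABLE: print the smaller of the two, at least 1.
pick_threads() {
    n=$1
    if [ "$2" -lt "$n" ]; then n=$2; fi
    if [ "$n" -lt 1 ]; then n=1; fi
    echo "$n"
}

# Number of online cores; fall back to 1 when it cannot be determined.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || cores=1
case $cores in
    ''|*[!0-9]*) cores=1 ;;
esac

# Print the option to append to an Rforce train command line.
echo "--threads=$(pick_threads 8 "$cores")"
```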

License

MIT (LICENSE.md). The repository also contains a LICENSE file whose license type is not identified.
