nmathopencl is an OpenCL port of R's Mathlib (nmath) --- the C library
that powers the statistical and mathematical functions in R. Its primary
purpose is to serve as a reusable backend library for developers who want
to write GPU-accelerated R code that calls nmath functions from within their
own custom OpenCL kernels.
Suppose you are writing an R package that benefits from GPU computation. Your algorithm is embarrassingly parallel --- many independent evaluations, no data dependency between them --- and you want to dispatch that work to an OpenCL device. So far, so good.
The problem arises when the computation you need to parallelize is not just
arithmetic, but statistical math: log-likelihoods that call lgamma or
lbeta, sampling routines that call rgamma or rnorm, acceptance criteria
that evaluate pbeta or pnorm. These functions exist in R and in R's C
library, but they are designed for sequential host execution. They are not
available inside an OpenCL kernel. A GPU kernel cannot call stats::dgamma.
Before nmathopencl, a developer wanting GPU-accelerated statistical math had
two options: find a third-party GPU math library and translate their algorithm
into that library's API, or port the required nmath functions themselves. Both
options are substantial engineering work, and neither produces something that
other R developers can reuse.
nmathopencl solves this by providing the ported sources as a distributable
R package. Install nmathopencl, and the ported OpenCL C files are available
on disk at system.file("cl", package = "nmathopencl"). Any R package that
lists nmathopencl as a dependency can find those files at runtime and include
them in its own OpenCL program builds.
Downstream developers typically port or rewrite kernels from legacy CPU-oriented
sources, keep *.cl launchers inside their own package, annotate them with structured
dependency tags (@depends_nmath, @all_depends_nmath, broader @depends/@provides
where applicable), then:
-
Validate ---
load_library_for_kernel(..., depends_tag = "all_depends_nmath")concatenates the inferred minimal set of portedinst/cl/nmath/shards for a launcher from transitive@all_depends_nmathannotations (optionalwarning()hints mirrorinst/extdata/opencl_known_failures.jsonfor curated fragile subgraphs). -
Optional materialization ---
extract_library_subset()writes that subset (and index companions) next to kernels you intend to vendor; regeneratekernel_dependency_index.rdsalongside the library withwrite_kernel_dependency_index()after substantive port edits. -
Integrate --- either (a) remain linked to
nmathopenclviasystem.file(..., package = "nmathopencl")and concatenate prelude + shim layers +nmath+ your kernels yourself, or (b) ship only the extracted shards.glmbayesshows the reference runner + caches + host plumbing atop that stack; most authors still own compilation lifetimes and R exports until richer automation arrives for trivial launchers.
Cheap probes before assembling large sources or diagnosing workstation issues:
| Concern | R entry points |
|---|---|
| nmathopencl compile-time OpenCL / device selection | nmathopencl_has_opencl(), nmathopencl_opencl_fp64_available(), nmathopencl_opencl_device_info(), nmathopencl_opencl_reset_device_selection() |
| opencltools compute-unit probe | opencltools::get_opencl_core_count() |
| Combined host/runtime report | opencltools::diagnose_glmbayes() |
Host and workstation inventory --- GPU vendor detection, driver/ICD/PATH probes,
verify_opencl_runtime(), PATH helpers, and related Tier 3 tooling --- live in
opencltools (opencltools::…).
They are not re-exported from nmathopencl.
The R helpers above let you sanity-check @all_depends_nmath closures locally before mirroring the
full-layer OpenCL compilation pattern below (load_kernel_library / load_kernel_source stacking prelude,
shims, and nmath).
The most direct illustration of how nmathopencl is meant to be used is
glmbayes. Bayesian GLM sampling via accept-reject methods requires evaluating
likelihood envelope functions for every posterior draw. This construction is
embarrassingly parallel --- each candidate can be evaluated independently --- but
the inner computation involves distribution functions and special functions
from nmath: lgamma, lbeta, pgamma, dnorm, and related routines.
To accelerate this step on a GPU, glmbayes builds an OpenCL program at
runtime that includes the relevant ported nmath sources from nmathopencl and
adds its own kernel logic on top. The program compilation happens once; all
subsequent envelope evaluations within a session are dispatched to the GPU.
The speedup grows with model dimension, because a larger parameter space means
more independent envelope evaluations per draw.
EnvelopeEval() in glmbayes is the function that evaluates the negative
log-likelihood and its gradients across a full grid of parameter values --- the
step that feeds the rejection sampler. For a Bayesian binomial logit model
with, say, fourteen predictors, this grid can contain thousands of points, and
each point requires evaluating the binomial log-likelihood and its gradient
vector using nmath routines.
When use_opencl = TRUE, EnvelopeEval dispatches to its GPU backend
(f2_f3_opencl). That function assembles a complete OpenCL program at
runtime by concatenating source layers in a fixed dependency order:
- A global configuration header (
OPENCL.cl) that enables double-precision arithmetic and defines IEEE constants. - The full
nmathopencllayer ---libR_shims,R_ext_types,R_shims,R_ext_runtime,R_ext_internals,System, andnmath--- loaded in that order viaload_kernel_library(..., package = "nmathopencl"). This makes functions likelgamma,lbeta,dbinom, anddpoisavailable as device-side functions. - The model- and link-specific kernel (e.g.
f2_f3_binomial_logit.cl), which contains the actual__kernelentry point and calls freely into the nmath layer above it.
This assembled source is compiled once by the OpenCL driver and then
dispatched as a single kernel invocation. All grid points are evaluated
simultaneously on the GPU; the results come back as the NegLL vector and
cbars gradient matrix that the rejection sampler consumes.
Without nmathopencl, writing f2_f3_binomial_logit.cl would require either
re-implementing lgamma, lbeta, and dbinom from scratch inside the kernel
or finding a compatible GPU math library --- a significant engineering effort
that would also be package-specific and non-reusable. With nmathopencl, the
nmath layer is already there, annotated for dependency resolution, and
available to any package that lists nmathopencl as a dependency.
For a detailed walkthrough of the EnvelopeEval workflow --- including the
full program assembly sequence, the CPU and GPU backends, and the role of the
computed values in the rejection sampler --- see:
?EnvelopeEvalinglmbayesexample("EnvelopeEval")for a runnable isolated demonstration- Chapter A10 --- Accelerated EnvelopeBuild Implementation using OpenCL
- Chapter 12 --- Large Models: GPU Acceleration using OpenCL
This pattern --- borrow the nmath layer from nmathopencl, write your own
kernel logic that calls into it --- is exactly what this package is designed to
enable. glmbayes provides the reference implementation, but the same approach
applies to any package that needs statistical math inside an OpenCL kernel.
The assembly of an OpenCL program is done in C++ using load_kernel_source()
and load_kernel_library(). glmbayes ships reference call sites (f2_f3_opencl, etc.);
Kernel loading uses opencltools::load_kernel_library() / load_kernel_source() with package = "nmathopencl"
so sibling packages can Import nmathopencl directly while glmbayes gradually
drops duplicate loaders. Together they resolve file discovery under system.file("cl/", package = "nmathopencl"),
parse @provides / @depends, and concatenate each requested subtree in dependency order ---
you nominate a subdirectory name (nmath**, System, …), not every filename.
The full load order that glmbayes uses (reflected in f2_f3_opencl) is:
// 1. Global OpenCL configuration header (extensions, IEEE constants, macros)
std::string OPENCL_source = load_kernel_source("OPENCL.cl");
// 2-7. The nmathopencl layer, loaded in dependency order from nmathopencl's inst/cl/
std::string libr_shims_source = load_kernel_library("libR_shims", "nmathopencl", false);
std::string r_ext_types_source = load_kernel_library("R_ext_types", "nmathopencl", false);
std::string r_shims_source = load_kernel_library("R_shims", "nmathopencl", false);
std::string r_ext_runtime_source = load_kernel_library("R_ext_runtime", "nmathopencl", false);
std::string r_ext_internals_source = load_kernel_library("R_ext_internals","nmathopencl", false);
std::string system_source = load_kernel_library("System", "nmathopencl", false);
std::string nmath_source = load_kernel_library("nmath", "nmathopencl", false);
// 8. Your model-specific kernel source
std::string ksrc = load_kernel_source("src/my_kernel.cl");
// Concatenate into a single program string and compile
std::string all_src = OPENCL_source
+ "\n" + libr_shims_source
+ "\n" + r_ext_types_source
+ "\n" + r_shims_source
+ "\n" + r_ext_runtime_source
+ "\n" + r_ext_internals_source
+ "\n" + system_source
+ "\n" + nmath_source
+ "\n" + ksrc;The key point is steps 2-7: these are the layers from nmathopencl, loaded
via the package = "nmathopencl" argument so that load_kernel_library
finds them at system.file("cl/<subdir>", package = "nmathopencl"). Together
they satisfy all of nmath's type, macro, and runtime dependencies.
Step 8 is entirely your own code. Inside my_kernel.cl you can call any
nmath function --- lgamma, lbeta, pbeta, dnorm, rgamma, and so on ---
exactly as you would in regular C, because the nmath layer above has already
defined them as inline OpenCL device functions.
load_kernel_library performs a topological sort based on @provides and
@depends annotations in each .cl file, so the files within each
subdirectory are concatenated in the correct dependency order. You do not need
to enumerate individual files.
The result is that you can write kernel logic that looks essentially identical to what you would write in R --- using the same functions, the same parameter conventions, the same numeric behavior --- but executes on the GPU.
The inst/cl directory of nmathopencl is the distributable library tree.
It is organized into dependency layers that mirror how R itself structures
these components:
inst/cl/
R_ext_types/ --- type definitions (SEXP, Rboolean, etc.)
R_ext_runtime/ --- memory, error, and I/O interface shims
R_ext_internals/ --- internal R extension definitions
libR_shims/ --- host runtime compatibility shims (R_pow, R_pow_di, etc.)
R_shims/ --- additional R API shims
System/ --- system-level OpenCL prelude
nmath/ --- the ported Mathlib sources (~137 .cl files)
src/ --- kernel entry points for the included R wrappers
The nmath/ directory contains the translated sources for the following
function families:
| Category | Functions |
|---|---|
| Normal | dnorm, pnorm, qnorm, rnorm |
| Uniform | dunif, punif, qunif, runif |
| Gamma | dgamma, pgamma, qgamma, rgamma |
| Beta | dbeta, pbeta, qbeta, rbeta, lbeta |
| Log-Normal | dlnorm, plnorm, qlnorm, rlnorm |
| Chi-squared | dchisq, pchisq, qchisq, rchisq |
| Non-central Chi-squared | dnchisq, pnchisq, qnchisq, rnchisq |
| F | df, pf, qf |
| Non-central F | dnf, pnf, qnf |
| Student t | dt, pt, qt, rt |
| Non-central t | dnt, pnt, qnt |
| Binomial | dbinom, pbinom, qbinom, rbinom, dbinom_raw |
| Negative Binomial | dnbinom, pnbinom, qnbinom, rnbinom (and _mu variants) |
| Poisson | dpois, ppois, qpois, rpois, dpois_raw |
| Exponential | dexp, pexp, qexp, rexp |
| Weibull | dweibull, pweibull, qweibull, rweibull |
| Logistic | dlogis, plogis, qlogis, rlogis |
| Cauchy | dcauchy, pcauchy, qcauchy, rcauchy |
| Geometric | dgeom, pgeom, qgeom, rgeom |
| Hypergeometric | dhyper, phyper, qhyper, rhyper |
| Non-central Beta | dnbeta, pnbeta, qnbeta |
| Studentized Range | ptukey, qtukey |
| Wilcoxon Rank Sum | dwilcox, pwilcox, qwilcox, rwilcox |
| Wilcoxon Signed Rank | dsignrank, psignrank, qsignrank, rsignrank |
| Multinomial | rmultinom |
| Gamma/special | gammafn, lgammafn, psigamma, digamma, trigamma, tetragamma, pentagamma |
| Beta/choose | beta, lbeta, choose, lchoose |
| Bessel | bessel_i, bessel_j, bessel_k, bessel_y (and _ex variants) |
| Math support | fmax2, fmin2, imax2, imin2, sign, fprec, fround, fsign, ftrunc |
| Runtime math | log1pmx, log1pexp, log1mexp, lgamma1p, pow1p, logspace_add, logspace_sub, logspace_sum |
| RNG core | exp_rand, norm_rand, unif_rand |
Each .cl file is the translated equivalent of the corresponding nmath .c
source. Where the upstream source depends on host-only R runtime behaviors
(macros, inline utility functions, type aliases), those dependencies are
satisfied by the layered shim files in the other subdirectories.
Grouped by typical need for someone following § Audience and workflow. Generic
OpenCL plumbing (host diagnostics, kernel-library tagging, subset loaders) lives in
opencltools; nmathopencl
re-exports Tier A (subset loaders) and Tier D (annotation plumbing) helpers below and keeps
package-specific compile-time and device-selection probes (Tier C). glmbayes still ships parallel helpers
for transitional compatibility --- target Imports: nmathopencl (and
opencltools where needed) rather than sustaining duplicate loaders.
Re-exported from opencltools (same function objects as opencltools::…):
| Function | Purpose |
|---|---|
load_library_for_kernel() |
Minimal concatenated text for one launcher respecting @all_depends_nmath, assembler notes, curated warnings. |
extract_library_subset() |
Materialize shards + indexes into another directory for vendoring subset trees. |
write_kernel_dependency_index() |
Regenerate RDS/TSV index beside inst/cl/nmath/ after substantive port tooling runs. |
Implemented in nmathopencl (C++ kernel loaders in this package):
| Function | Purpose |
|---|---|
load_kernel_library() |
Recursive dependency-aware concatenation of an inst/cl/<subdir>/ subtree. |
load_kernel_source() |
Load a standalone header/device file (often OPENCL.cl-style prelude). |
| Function | Purpose |
|---|---|
nmathopencl_has_opencl(), nmathopencl_opencl_fp64_available(), nmathopencl_opencl_device_info(), nmathopencl_opencl_reset_device_selection() |
Compile-time flag and fp64/device probing tied to kernels shipped here. |
opencltools::get_opencl_core_count() |
OpenCL compute units on the default device (opencltools selection). |
Re-exported from opencltools; less common once bundles are annotated.
| Function | Purpose |
|---|---|
stage_kernel_dependency_sort() |
Offline ordering passes for regenerated trees. |
attach_kernel_call_tags(), attach_kernel_dependency_tags(), attach_cross_library_tags() |
Batch annotation tooling. |
Aside from authoring custom kernels (§ Audience and workflow), nmathopencl exposes
many distribution_opencl()-style helpers that enqueue pre-built GPU launchers wrapping
individual Mathlib calls. Treat them foremost as numeric regression harnesses verifying
parity with stats/base before you trust inlined device code.
library(nmathopencl)
# Gamma PDF sanity check on device
dgamma_opencl(n = 1e5, x = 2.5, shape = 3, scale = 1)
# RNG smoke on device
rnorm_opencl(n = 1e6, mean = 0, sd = 1)
# Poisson log-probability grid
dpois_opencl(n = 5e4, x = 3, lambda = 2.5, log = TRUE)They double as turnkey batched calculators when GPU dispatch already matches your bottleneck, yet the curated workflow remains assemble your kernel + stitched nmath subgraph.
Each wrapper honors fallback, default FALSE --- GPU faults surface loudly.
fallback = TRUE masks recoverable failures with CPU stats analogues whenever OpenCL
appears available (see nmathopencl_has_opencl()). Machines without OpenCL support always follow
CPU paths regardless of fallback. Begin runtime probing with § Diagnostics and runtime checks.
Porting R's Mathlib to OpenCL C is not a mechanical find-and-replace. The upstream nmath code was written for a C compiler with access to a full host runtime, POSIX headers, and R's own infrastructure. OpenCL device code has none of these. Making the port work required:
-
Dependency isolation. Each
.clfile specifies the other.clfiles it depends on. The shim layers must be loaded in an order that satisfies all declarations before any definitions reference them. The package includes tooling for managing and validating this ordering. -
Macro and type hygiene. R's headers define macros like
ML_ERR_return_NAN,MATHLIB_ERROR,R_FINITE, and many others. These assume a host C environment. The ported shim files replace them with OpenCL-compatible equivalents. -
Runtime symbol replacement. Host-only R runtime symbols ---
R_pow,R_pow_di,R_CheckUserInterrupt, and others --- are replaced by inline device implementations in thelibR_shimslayer. -
Linkage model differences. OpenCL C's compilation model differs from standard C in how it handles
static inlineversus external function definitions. Some upstream nmath functions required linkage adjustments to avoid "unresolved extern" failures during GPU program build.
When an OpenCL program is first built --- whether from nmathopencl's own
wrappers or from a downstream package like glmbayes --- the driver performs a
JIT compilation of the program source for your specific GPU hardware. This step
can take several seconds for programs that include substantial portions of the
nmath library. It happens once per program, per session.
On subsequent calls, the compiled kernel is cached by the driver. The overhead drops to kernel dispatch and buffer transfer costs, which for large batches are small relative to the computation.
The practical takeaway: build your OpenCL programs once and reuse them across
calls. Packages like glmbayes do this by caching the compiled program state
for the duration of a session. If you build your own package on nmathopencl,
the same strategy applies.
The port is active and expanding. Not every ported function runs on every GPU stack without caveats. The two main limitation classes are:
Some nmath routines maintain dynamic allocation caches that assume a host C runtime. These cannot be trivially shimmed for GPU device code:
-
Wilcoxon rank sum / signed rank (
wilcox.cl,signrank.cl): both routines allocate and cache coefficient arrays usingcalloc/freeorR_chk_calloc. These patterns have no direct GPU equivalent. The.clsources are included in the library tree but result in link failures when used directly as GPU kernels without a device-side allocator solution. CPU fallback works correctly. -
Bessel functions (
bessel_i.cletc.): upstream code usesR_allocfor temporary workspace. A shim returning a null pointer is in place but insufficient for production use; the runtime fails when the workspace is dereferenced.
Several noncentral quantile and CDF inversion routines involve deep iterative loops with nested distribution calls. On some GPU stacks, these hit driver watchdog limits or register pressure:
qf,qnbeta,qnchisq,qnt,pnfin some parameter regions
These functions work correctly via the CPU fallback path. Whether they run successfully as GPU kernels depends on the hardware and driver.
The current kernel runner assembles a conservative superset of source files for each kernel call. A planned dependency-analysis layer will allow building only the minimal set of source fragments actually needed by a given kernel, which will reduce first-call compilation time.
Browsing the package dashboard on R-universe lists binaries, DESCRIPTION, exported objects, rendered manual pages, vignettes, and dependency graphs---useful both for users and for ongoing development visibility.
Pick your mirror order (prefer R-universe first if you rely on nightly builds):
opts <- options(
repos = c(
knygren = "https://knygren.r-universe.dev",
CRAN = "https://cloud.r-project.org"
)
)
install.packages("nmathopencl")
options(opts) # restore prior repos if desiredFor a minimal one-off install combining both:
install.packages(
"nmathopencl",
repos = c("https://knygren.r-universe.dev",
"https://cloud.r-project.org")
)Maintainers: registering the GitHub source repo in your universe and what to expect from automated builds are summarized in R-UNIVERSE.md at the package root.
OpenCL support requires a GPU with an installed OpenCL runtime (NVIDIA CUDA,
AMD ROCm/OpenCL, Intel OpenCL, or Apple Metal via OpenCL compatibility). The
package installs and functions without a GPU; the ported .cl files are
available regardless, and all R wrappers fall back to CPU computation
automatically if OpenCL is unavailable.
- Loader consolidation. Host diagnostics and generic kernel-library tooling live in
opencltools;nmathopenclre-exports Tier A/D helpers and ownsload_kernel_*plus compile-time device probes. Longer term, C++openclPorthelpers may still migrate to a dedicated package. - Minimal program assembly. Dependency analysis to include only the source files actually required for each kernel, reducing JIT compilation cost on first call.
- Vectorized parameter API. Allow passing a vector of distinct parameter values to a single kernel dispatch, enabling more general parallelism patterns in the R wrappers.
- Rescue of allocation-dependent families. Device-side allocator shims or algorithmic rework for Wilcoxon and Bessel paths.
- Broader numeric validation. Systematic accuracy testing across distributions and parameter regions.
- Additional nmath coverage. Remaining nmath functions not yet ported.
The C++ OpenCL infrastructure shared between nmathopencl and opencltools
currently exists as duplicate source in both packages. Seven .cpp files
(opencl_detect.cpp, OpenCL_helper.cpp, configure_OpenCL.cpp,
glmbayes_getRegisteredNamespace.cpp, opencl_device_selection.cpp,
kernel_loader.cpp, and the infrastructure portion of export_wrappers.cpp)
are either byte-for-byte identical or differ only by a few nmathopencl-specific
additions. Consolidating them into opencltools as the single authoritative
source will eliminate the maintenance burden of applying fixes in two places.
opencltools already has the mechanism in place:
inst/include/opencltools/openclPort.his the installed public header (note at the top of the file: "Installed copy for LinkingTo: opencltools")- bundled
include/CLheaders were removed to avoid downstream header conflicts; downstream packages must supply OpenCL SDK/system include paths viaconfigureorsrc/Makevars* nmathopenclalready carriesLinkingTo: opencltoolsin itsDESCRIPTION(added in preparation for this consolidation)
Two small additions to inst/include/opencltools/openclPort.h
(and kept in sync with src/openclPort.h):
-
opencl_bind_selected_fp64_device_or_throw()declaration inside the existing#ifdef USE_OPENCLblock. This function binds the cached fp64 device into a caller-suppliedcl_platform_id/cl_device_idpair and is currently used bynmathopencl's kernel runners. -
Three inline error-handling helpers (
opencl_status_name,opencl_status_hint,opencl_make_context_error) that are alreadyinlineinnmathopencl/src/openclPort.h— moving them to the published header makes them available to any downstream package viaLinkingTo. -
An
LdFlags()R function (or equivalent) that returns the correct-llinker flag for linking againstopencltools' compiled library. This is the standard mechanism R packages use to expose their compiled symbols to downstreamLinkingToconsumers (analogous toRcppArmadillo::RcppArmadillo.package.skeleton()orBH::BH.package.skeleton()).
-
src/Makevars/src/Makevars.win/configure— add the linker flag returned byopencltools::LdFlags()toPKG_LIBSso that theopenclPortsymbols are resolved fromopencltools.dll/libopencltools.soat load time. -
src/openclPort.h— replace the full standalone header with a thin bridge:#include <opencltools/openclPort.h> // nmathopencl-specific additions only: namespace openclPort { std::string build_rmath_opencl_program(...); // nmath-specific assembly #ifdef USE_OPENCL void opencl_dbl_scalar_kernel_runner(...); void opencl_pq_tail_kernel_runner(...); void opencl_d_givelog_kernel_runner(...); void opencl_numeric_cols_kernel_runner(...); void opencl_pnorm_kernel_runner(...); #endif } // namespace openclPort
-
Delete from
src/—opencl_detect.cpp,OpenCL_helper.cpp,configure_OpenCL.cpp,glmbayes_getRegisteredNamespace.cpp, and the shared portions ofopencl_device_selection.cppandkernel_loader.cpp. The linker will resolve those symbols fromopencltoolsinstead.
build_rmath_opencl_program()— hard-codes thenmathopencl-specific layer assembly order (OPENCL.cl,libR_shims,R_ext_types, …,nmath) and has no meaning outside this package.- All kernel runner infrastructure (
opencl_kernel_runners.cpp,kernel_runners.cpp,kernel_wrappers.cpp) — these implement the distribution-function GPU backends. - The
ex_glmbayesexample suite.
The f2_f3_*.cl kernels are the core of the glmbayes GPU backend. Each one
evaluates the negative log-posterior and its gradient for a specific GLM family
and link function. Understanding exactly which nmath functions they require
is useful for auditing the dependency graph, validating the port, and reasoning
about the minimal OpenCL program needed for any given model.
| Kernel | nmath function(s) called |
|---|---|
f2_f3_binomial_logit.cl |
dbinom_raw |
f2_f3_binomial_probit.cl |
dbinom_raw, pnorm5, dnorm4 |
f2_f3_binomial_cloglog.cl |
dbinom_raw |
f2_f3_gamma.cl |
dgamma |
f2_f3_gaussian.cl |
dnorm4 |
f2_f3_poisson.cl |
lgamma (OpenCL built-in --- no nmath file required) |
The Poisson kernel is the simplest case: it evaluates the log-likelihood
entirely inline as -mu + y*log(mu) - lgamma(y+1), where lgamma is a
standard OpenCL double-precision built-in. It pulls in no ported nmath files.
The @all_depends metadata embedded in each .cl file records the full
transitive closure of its dependencies. Taking the union across dbinom.cl,
pnorm.cl, dnorm.cl, and dgamma.cl yields the minimal set required by
all six kernels.
CPU source (.c) |
GPU port (.cl) |
Provides / Description | In glmbayes/src/nmath/? |
|---|---|---|---|
dbinom.c |
dbinom.cl |
dbinom_raw, dbinom --- binomial density |
? |
pnorm.c |
pnorm.cl |
pnorm5, pnorm_both --- normal CDF |
? |
dnorm.c |
dnorm.cl |
dnorm4 --- normal density |
? |
dgamma.c |
dgamma.cl |
dgamma --- gamma density |
? |
dpois.c |
dpois.cl |
dpois_raw, dpois --- Poisson density (called by dgamma) |
? |
bd0.c |
bd0.cl |
bd0, ebd0 --- binomial deviance (called by dbinom, dpois) |
? |
stirlerr.c |
stirlerr.cl |
stirlerr --- Stirling error term; dispatches to the two fragments below |
? |
stirlerr.c (split) |
stirlerr_cycle_free.cl |
stirlerr_cycle_free --- table-lookup path for small arguments |
? split artifact |
stirlerr.c (split) |
stirlerr_cycle_dependent.cl |
stirlerr_cycle_dependent --- series path for large arguments |
? split artifact |
pgamma.c (extracted) |
pgamma_utils.cl |
log1pmx, lgamma1p --- utilities called by bd0 |
? split artifact |
lgamma.c |
lgamma.cl |
lgammafn_sign, lgammafn --- log-gamma function |
? |
gamma.c |
gamma.cl |
gammafn --- gamma function |
? |
lgammacor.c |
lgammacor.cl |
lgammacor --- series correction for large arguments |
? |
chebyshev.c |
chebyshev.cl |
chebyshev_init, chebyshev_eval --- called by lgammacor |
? |
cospi.c |
cospi.cl |
cospi, sinpi, tanpi --- sinpi called by gammafn for the negative-argument reflection formula |
? |
fmax2.c |
fmax2.cl |
fmax2 --- max of two doubles, called by gammalims |
? |
gammalims.c |
gammalims.cl |
gammalims --- gamma function overflow/underflow bounds |
? |
refactored.h |
refactored.cl |
Forward declarations for cycle-broken functions | N/A (header) |
pnorm.cl and dnorm.cl are self-contained. Their only dependencies are
the nmath.cl infrastructure shim and dpq-style macros. No additional math
function files are pulled in --- the algorithms (Cody's rational approximation for
pnorm, the standard Gaussian density formula for dnorm) close entirely on
primitive arithmetic.
dbinom.cl and dgamma.cl share almost the entire gamma function stack.
Both require lgamma, gamma, lgammacor, chebyshev, cospi, gammalims,
fmax2, pgamma_utils, stirlerr, and bd0. The only addition from
dgamma.cl is dpois.cl, because dgamma delegates to dpois_raw when the
shape parameter is less than 1.
Three .cl files have no direct .c counterpart in R's nmath.
stirlerr_cycle_free.cl, stirlerr_cycle_dependent.cl, and pgamma_utils.cl
are fragments split out of stirlerr.c and pgamma.c respectively. The split
is required to break mutual call cycles: OpenCL's single-translation-unit
compilation model requires that every symbol be defined before any reference to
it. Cycles that a standard C linker resolves at link time must instead be broken
structurally in OpenCL C.
The Poisson kernel requires no nmath .cl files. It expresses the entire
log-likelihood using OpenCL built-ins (exp, log, lgamma). This is both
the simplest kernel and the one with zero nmath dependency footprint.
Nygren, K.N. and Nygren, A. (2006), Likelihood Subgradient Densities. Journal of the American Statistical Association, 101, 1144-1156. DOI: 10.1198/016214506000000357