Description
Problem
Within LightGBM, num_leaves is capped at 2 ^ max_depth. For example, if num_leaves is set to 1000 and max_depth is set to 5, then LightGBM will likely end up creating a full-depth tree with 32 (2 ^ 5) leaves per iteration.
{bonsai} / {parsnip} have no knowledge of the relationship between these parameters. As a result, Bayesian optimization and other cross-validation search methods will spend a significant amount of time exploring redundant hyperparameter space where num_leaves > 2 ^ max_depth. This leads to longer tuning times, especially for large models with many parameters.
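To illustrate the scale of the problem, here is a minimal R sketch (base R only; the grid values are made up for illustration) showing how much of a typical tuning grid falls in the redundant region where num_leaves exceeds 2 ^ max_depth:

```r
# Hypothetical tuning grid over num_leaves and max_depth (illustration only)
grid <- expand.grid(
  num_leaves = c(15, 31, 63, 127, 255, 511, 1023),
  max_depth  = 3:10
)

# LightGBM effectively caps the number of leaves at 2 ^ max_depth,
# so any combination with num_leaves > 2 ^ max_depth is redundant.
grid$redundant <- grid$num_leaves > 2 ^ grid$max_depth

# Proportion of the grid that a CV search would spend on equivalent models
mean(grid$redundant)
```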
Idea
One potential solution is to explicitly link num_leaves and max_depth specifically for the LightGBM model spec. I implemented this link in my treesnip fork by essentially adding two engine arguments:
- link_max_depth - Boolean. When FALSE, max_depth is equal to whatever is passed via the engine/model argument. When TRUE, max_depth is equal to floor(log2(num_leaves)) + link_max_depth_add (see the sketch after this list).
- link_max_depth_add - Integer. Value added to max_depth. For example, if link_max_depth is TRUE, num_leaves is 1000, and link_max_depth_add is 2, then max_depth = floor(log2(1000)) + 2, or 11.
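A minimal sketch of that linking rule in R (the function name and defaults are illustrative, not the actual treesnip implementation):

```r
# Derive max_depth from num_leaves when the link is enabled.
# link_max_depth_add shifts the derived depth upward by a fixed amount.
link_depth <- function(num_leaves, max_depth,
                       link_max_depth = FALSE,
                       link_max_depth_add = 0L) {
  if (!link_max_depth) {
    return(max_depth)  # use whatever was passed via the engine/model arg
  }
  floor(log2(num_leaves)) + link_max_depth_add
}

link_depth(num_leaves = 1000, max_depth = 5,
           link_max_depth = TRUE, link_max_depth_add = 2)
#> [1] 11
```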
This would improve cross-validation times by restricting the hyperparameter space that needs to be explored, while leaving the default behavior untouched. Ideally, it could even be generalized (within {parsnip}) to other model types that have intrinsically linked hyperparameters. However, I'm not sure whether this fits with the tidymodels way of doing things. If it's totally out of scope, then feel free to close this issue.
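If adopted, the user-facing call might look something like the sketch below. Note that link_max_depth and link_max_depth_add are the proposed engine arguments from my fork and do not currently exist in {bonsai}:

```r
library(parsnip)
library(bonsai)

# Proposed usage: num_leaves is tuned, and max_depth is derived from it
# rather than being explored independently.
spec <- boost_tree(trees = 500) |>
  set_engine(
    "lightgbm",
    num_leaves         = tune(),
    link_max_depth     = TRUE,  # proposed: derive max_depth from num_leaves
    link_max_depth_add = 2L     # proposed: extra depth beyond floor(log2(num_leaves))
  ) |>
  set_mode("regression")
```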