Feature idea - Linking hyperparameters during CV #49

@dfsnow

Description

Problem

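Within LightGBM, the effective number of leaves per tree is capped at 2 ^ max_depth, because a binary tree of depth max_depth has at most 2 ^ max_depth leaves. For example, if num_leaves is set to 1000 and max_depth is set to 5, LightGBM can grow at most 32 (2 ^ 5) leaves per tree and will likely end up building full-depth trees with 32 leaves at each iteration.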

{bonsai} / {parsnip} have no knowledge of the relationship between these parameters. As a result, Bayesian optimization and other search methods used during cross-validation will spend a significant amount of time exploring redundant regions of the hyperparameter space where num_leaves > 2 ^ max_depth, since raising num_leaves above 2 ^ max_depth has no further effect. This lengthens CV, especially for large models with many parameters.
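To make the size of the redundant region concrete, here is a small Python sketch (Python is used only for illustration; the issue itself concerns R packages). The helper `effective_max_leaves` and the grid values are hypothetical choices, not part of {bonsai}, {parsnip}, or LightGBM. It relies on the documented LightGBM convention that `max_depth <= 0` means no depth limit.

```python
import itertools


def effective_max_leaves(num_leaves: int, max_depth: int) -> int:
    """Upper bound on leaves per tree under both constraints (hypothetical helper).

    A binary tree of depth d has at most 2**d leaves. In LightGBM,
    max_depth <= 0 means no depth limit, so only num_leaves applies.
    """
    if max_depth <= 0:
        return num_leaves
    return min(num_leaves, 2 ** max_depth)


# Hypothetical tuning grid, chosen only for illustration.
num_leaves_grid = [8, 16, 32, 64, 128, 256, 512, 1024, 2048]
max_depth_grid = list(range(3, 13))

pairs = list(itertools.product(num_leaves_grid, max_depth_grid))
# Points where num_leaves exceeds what the depth limit allows.
redundant = [(nl, md) for nl, md in pairs if nl > 2 ** md]
# Distinct (effective leaf cap, max_depth) settings the search actually covers.
distinct = {(effective_max_leaves(nl, md), md) for nl, md in pairs}

print(f"{len(pairs)} grid points, {len(redundant)} with num_leaves > 2^max_depth")
print(f"{len(distinct)} distinct (effective leaves, max_depth) settings")
```

On this illustrative grid, 36 of the 90 points have num_leaves > 2 ^ max_depth, leaving 54 distinct effective settings. The exact share depends entirely on the chosen ranges.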

Idea

One potential solution is to explicitly link num_leaves and max_depth in the LightGBM model spec. I implemented this link in my treesnip fork by adding two engine arguments:

  1. link_max_depth - Boolean. When FALSE, max_depth is equal to whatever value is passed as an engine or model argument. When TRUE, max_depth is equal to floor(log2(num_leaves)) + link_max_depth_add.
  2. link_max_depth_add - Integer. Value added to max_depth. For example, if link_max_depth is TRUE, num_leaves is 1000, and link_max_depth_add is 2, then max_depth = floor(log2(1000)) + 2, or 11.
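The rule described by these two arguments can be sketched as follows. This is a minimal Python illustration under the stated definitions, not the code in the treesnip fork; the function name `linked_max_depth` is hypothetical. It computes floor(log2(n)) as `n.bit_length() - 1`, which is exact for positive integers and avoids floating-point rounding near powers of two.

```python
def linked_max_depth(num_leaves: int, max_depth: int,
                     link_max_depth: bool, link_max_depth_add: int = 0) -> int:
    """Sketch of the proposed linking rule (hypothetical helper).

    When link_max_depth is False, the user-supplied max_depth is returned.
    When True, max_depth = floor(log2(num_leaves)) + link_max_depth_add.
    """
    if not link_max_depth:
        return max_depth
    if num_leaves < 1:
        raise ValueError("num_leaves must be a positive integer")
    # For a positive integer n, floor(log2(n)) == n.bit_length() - 1.
    return (num_leaves.bit_length() - 1) + link_max_depth_add


print(linked_max_depth(1000, max_depth=5, link_max_depth=True, link_max_depth_add=2))  # 11
print(linked_max_depth(1000, max_depth=5, link_max_depth=False))  # 5

# With link_max_depth_add >= 1, 2**max_depth always exceeds num_leaves,
# so num_leaves (not the depth limit) becomes the binding constraint.
for n in (7, 31, 32, 1000, 4096):
    d = linked_max_depth(n, max_depth=-1, link_max_depth=True, link_max_depth_add=1)
    assert n < 2 ** d
```

One consequence of the floor: with link_max_depth_add = 0, 2 ^ max_depth is at most num_leaves, so the depth limit still binds whenever num_leaves is not a power of two. Any positive offset lets num_leaves drive tree size.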

This would reduce cross-validation time by restricting the hyperparameter space that needs to be explored, while leaving the default options untouched. Ideally, it could even be generalized (within {parsnip}) to other model types that have intrinsically linked hyperparameters. However, I'm not sure whether this fits the tidymodels way of doing things. If it's totally out of scope, feel free to close this issue.
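A generalized version might let a model spec declare "links" that derive one hyperparameter from others before each candidate is fitted. The sketch below is purely hypothetical: `apply_links`, `Links`, and `lgb_links` are illustrative names, not an existing {parsnip} or {tune} API. It shows how a search over only num_leaves can still yield a fully specified (num_leaves, max_depth) pair.

```python
from typing import Callable, Dict, List

Params = Dict[str, int]
# A "link" derives one hyperparameter from the others (hypothetical design).
Links = Dict[str, Callable[[Params], int]]


def apply_links(candidate: Params, links: Links) -> Params:
    """Return a copy of candidate with each linked parameter recomputed.

    Links are applied in insertion order, so a later link can read a value
    produced by an earlier one. The input dict is not modified.
    """
    linked = dict(candidate)
    for name, rule in links.items():
        linked[name] = rule(linked)
    return linked


# Example link for the LightGBM case: max_depth = floor(log2(num_leaves)) + 2.
lgb_links: Links = {
    "max_depth": lambda p: (p["num_leaves"].bit_length() - 1) + 2,
}

# Only num_leaves is searched; max_depth is derived rather than tuned.
candidates: List[Params] = [{"num_leaves": n} for n in (31, 127, 1000)]
for c in candidates:
    print(apply_links(c, lgb_links))
```

The design choice here is to shrink the search from two dimensions to one, rather than sampling both parameters and rejecting invalid pairs afterwards.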

Metadata

Assignees: No one assigned
Labels: feature (a feature request or enhancement)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests