Open
Labels: feature (a feature request or enhancement)
Description
Thanks for creating this excellent package. I created a similar fork of treesnip but am planning to replace it with {bonsai} in all our production models.
One feature that I think would be incredibly useful in {bonsai} is the ability to provide custom validation sets during early stopping (instead of using a random split of the training data). This would have a few potential benefits:
- More training data. In many cases, you're already going to have a validation set set aside from a classic `train`, `validate`, `test` split. Currently, {bonsai} will further split the `train` data into a training subset and a validation set used specifically for early stopping. Instead, it would be ideal to be able to pass the `validate` set directly. This would mean all of `train` would be used for training.
- Ability to do more complex cross-validation. Certain cross-validation techniques (rolling origin, spatial, etc.) don't rely on a random sample of the training data and instead use some sort of partitioning (time or geographic). Allowing custom validation data would let users use the "correct" validation set for early stopping when using these more complex methods.
- Better integration with tidymodels. Tidymodels supports k-fold and other types of cross-validation. Using the validation set created for each fold rather than splitting a separate validation set specifically for early stopping would be much simpler.
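
For context, here is a minimal sketch of the current behavior as I understand it. It assumes that `stop_iter` and the `validation` engine argument are what control early stopping for the lightgbm engine; the simulated data and its column names are made up purely for illustration.

```r
library(parsnip)
library(bonsai)

# Simulated data, purely illustrative (not from any real model)
set.seed(1)
dat <- data.frame(x1 = rnorm(500), x2 = rnorm(500))
dat$y <- 2 * dat$x1 - dat$x2 + rnorm(500, sd = 0.5)

# stop_iter turns on early stopping; validation = 0.2 asks the engine to
# hold out a random 20% of the rows passed to fit() to monitor it
spec <- boost_tree(trees = 500, stop_iter = 10) |>
  set_engine("lightgbm", validation = 0.2) |>
  set_mode("regression")

# Only about 80% of `dat` ends up being used to grow trees here
fit_obj <- fit(spec, y ~ x1 + x2, data = dat)
```

The request is for a way to hand the held-out rows to the engine directly, rather than only a proportion to sample at random from the training data.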
Let me know if this is out-of-scope for this project. If not, I'm happy to contribute if needed.