
Explore latent thought training for data efficiency #55

@rlrs

Description


Ref: https://arxiv.org/abs/2503.18866

Why
We lack data but have plenty of compute at the current scale of models we're interested in. Generating more data in a smart way may be the key to actually improving models, and BoLT seems like a promising way to do that.

Approach
Implement the method and run experiments at e.g. 7B-scale CPT with Dynaword or similarly sized data. The method can soak up a lot of compute, so it will be important to cap compute usage for experiments, maybe at 20k MI250X hours, to show positive results.
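At a high level, the linked paper's bootstrapping loop alternates between sampling latent "thoughts" for each document and retraining on the thought-augmented data, keeping thoughts that best explain the observed text. A minimal pure-Python sketch of the selection step, where `generate_thoughts` and `log_likelihood` are hypothetical stand-ins for real model calls (not part of any actual implementation):

```python
import random

def generate_thoughts(model, doc, k):
    # Hypothetical stand-in for sampling k latent thoughts from the model.
    return [f"thought-{i} about {doc}" for i in range(k)]

def log_likelihood(model, doc, thought):
    # Hypothetical stand-in for log p(doc | thought) under the model;
    # seeded so the toy example is deterministic.
    random.seed(hash((doc, thought)) % 2**32)
    return -random.random()

def bolt_select(model, corpus, k=4):
    """For each document, keep the sampled thought that best explains it:
    a greedy stand-in for the paper's importance-weighted E-step."""
    augmented = []
    for doc in corpus:
        thoughts = generate_thoughts(model, doc, k)
        best = max(thoughts, key=lambda z: log_likelihood(model, doc, z))
        augmented.append((best, doc))
    return augmented

# Toy usage: the resulting (thought, doc) pairs would feed the next
# training round in the actual bootstrapping loop.
pairs = bolt_select(model=None, corpus=["doc A", "doc B"], k=3)
print(len(pairs))
```

The point of the sketch is the compute trade-off noted above: every document costs k extra generations per bootstrap round, which is where the experiment's compute budget would go.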

Labels: experiment (description of an experiment to be performed)
