LeanDojoWebsite/leanprogress.html at main · lean-dojo/LeanDojoWebsite · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title>LeanProgress: Guiding Search for Neural Theorem Proving via Proof Progress Prediction</title>

    <link rel="apple-touch-icon" sizes="180x180" href="/apple-touch-icon.png">
    <link rel="icon" type="image/png" sizes="32x32" href="/favicon-32x32.png">
    <link rel="icon" type="image/png" sizes="16x16" href="/favicon-16x16.png">
    <link rel="manifest" href="/site.webmanifest">
    <meta name="msapplication-TileColor" content="#da532c">
    <meta name="theme-color" content="#ffffff">

    <script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
    <script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>

    <link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro"
          rel="stylesheet">

    <link rel="stylesheet" href="./static/css/bulma.min.css">
    <link rel="stylesheet" href="./static/css/bulma-carousel.min.css">
    <link rel="stylesheet" href="./static/css/bulma-slider.min.css">
    <link rel="stylesheet" href="./static/css/fontawesome.all.min.css">
    <link rel="stylesheet" href="./static/css/academicons.min.css">
    <link rel="stylesheet"
          href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
    <link rel="stylesheet" href="./static/css/index.css">

    <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
    <script defer src="./static/js/fontawesome.all.min.js"></script>
    <script>
        document.addEventListener('DOMContentLoaded', () => {
            const $navbarBurgers = Array.prototype.slice.call(document.querySelectorAll('.navbar-burger'), 0);
            if ($navbarBurgers.length > 0) {
                $navbarBurgers.forEach(el => {
                    el.addEventListener('click', () => {
                        const target = el.dataset.target;
                        const $target = document.getElementById(target);
                        el.classList.toggle('is-active');
                        $target.classList.toggle('is-active');
                    });
                });
            }
        });
    </script>
</head>
<body>

<!-- Navigation -->
<nav class="navbar is-light" role="navigation" aria-label="main navigation">
    <div class="navbar-brand">
        <a class="navbar-item" href="index.html">
            <strong>LeanDojo</strong>
        </a>

        <a role="button" class="navbar-burger" aria-label="menu" aria-expanded="false" data-target="navbarBasicExample">
            <span aria-hidden="true"></span>
            <span aria-hidden="true"></span>
            <span aria-hidden="true"></span>
        </a>
    </div>

    <div id="navbarBasicExample" class="navbar-menu">
        <div class="navbar-start">
            <a class="navbar-item" href="index.html">
                Home
            </a>
            <a class="navbar-item" href="leandojo.html">
                LeanDojo
            </a>
            <a class="navbar-item" href="leanagent.html">
                LeanAgent
            </a>
            <a class="navbar-item" href="leancopilot.html">
                LeanCopilot
            </a>
            <a class="navbar-item is-active" href="leanprogress.html">
                LeanProgress
            </a>
            <a class="navbar-item" href="leanide.html">
                LeanIDE
            </a>
            <a class="navbar-item" href="torchlean.html">
                TorchLean
            </a>
            <a class="navbar-item" href="bridge.html">
                BRIDGE
            </a>
        </div>
    </div>
</nav>

<section class="hero">
    <div class="hero-body">
        <div class="container is-max-desktop">
            <div class="columns is-centered">
                <div class="column has-text-centered">
                    <h1 class="title is-1 publication-title">LeanProgress: Guiding Search for Neural Theorem Proving via Proof Progress Prediction</h1>
                    <div class="is-size-5 publication-authors">
                        <span class="author-block">
                            <a href="https://www.robertj1.com/">Robert Joseph George</a> <a href="mailto:rgeorge@caltech.edu"><i class="fas fa-envelope"></i></a><sup>1</sup>,
                            <a href="https://hsz0403.github.io/">Suozhi Huang</a><sup>2</sup>,
                            <a href="https://peiyang-song.github.io/">Peiyang Song</a><sup>1</sup>,
                            <a href="https://tensorlab.cms.caltech.edu/users/anima/">Anima Anandkumar</a><sup>1</sup>
                        </span>
                    </div>

                    <div class="is-size-5 publication-authors">
                        <span class="author-block"><sup>1</sup>California Institute of Technology</span><br>
                        <span class="author-block"><sup>2</sup>Princeton University</span>
                    </div>

                    <div class="is-size-5 publication-authors">
                        <span class="author-block"><strong>Published:</strong> Transactions on Machine Learning Research (TMLR) 2025</span>
                    </div>

                    <div class="column has-text-centered">
                        <div class="publication-links">
                            <span class="link-block">
                                <a href="https://arxiv.org/abs/2502.17925" class="external-link button is-normal is-rounded is-dark">
                                <span class="icon">
                                    <i class="fas fa-file-pdf"></i>
                                </span>
                                <span>arXiv</span>
                                </a>
                            </span>
                            <span class="link-block">
                                <a target="_blank" href="https://github.com/lean-dojo/LeanDojo-v2"
                                class="external-link button is-normal is-rounded is-dark">
                                <span class="icon">
                                    <i class="fab fa-github"></i>
                                </span>
                                <span>Code</span>
                                </a>
                            </span>
                            <span class="link-block">
                                <a target="_blank" href="https://openreview.net/pdf?id=eTmOwvvRu9"
                                class="external-link button is-normal is-rounded is-dark">
                                <span class="icon">
                                    <i class="fas fa-file-pdf"></i>
                                </span>
                                <span>Paper</span>
                                </a>
                            </span>
                        </div>
                    </div>
                </div>
            </div>
        </div>
    </div>
</section>

<section class="section">
    <div class="container is-max-widescreen">
        <div class="rows">
            <div class="rows is-centered">
                <div class="row is-full-width">
                    <h2 class="title is-3"><span class="lean-infer">LeanProgress Visualization</span></h2>
                    <img src="images/leanprogress_overview.png" class="interpolation-image" alt="LeanProgress Visualization" style="display: block; margin-left: auto; margin-right: auto; width: 90%;"/>
                    <br />
                    <p style="font-size: 125%">LeanProgress predicts how many steps remain to complete a proof. We collect proof trees, balance the data distribution, and train a model to estimate remaining steps. The model takes the current proof state as input and outputs the number of remaining steps, which we use to guide search. We also built a <code>predict_steps_with_suggestion</code> tactic for LeanCopilot that shows both tactic suggestions and progress estimates.</p>
                    <br />
                    <h2 class="title is-3"><span class="lean-infer">Overview</span></h2>
                    <p style="font-size: 125%">LLMs struggle with mathematical reasoning because they hallucinate. Formal proof assistants like Lean can verify everything, but LLMs still have trouble with long proofs and complex formalizations.</p>
                    <br />
                    <p style="font-size: 125%">Existing tools help retrieve lemmas, suggest tactics, or generate complete proofs. But they don't tell you how much progress you've made or how many steps are left. This is a problem for large formalization projects where you need to know if you're on the right track.</p>
                    <br />
                    <p style="font-size: 125%">LeanProgress predicts the number of remaining steps in a proof. When we integrate this into proof search, it helps prioritize promising paths and avoid dead ends. We show a 3.8% improvement on Mathlib4 (from 41.4% to 45.2%).</p>
                </div>
            </div>
        </div>
    </div>
</section>

<section class="section">
    <div class="container is-max-widescreen">
        <div class="rows">
            <div class="rows is-centered">
                <div class="row is-full-width">
                    <h2 class="title is-3"><span class="dvima">How It Works</span></h2>
                    <div class="content">
                        <p style="font-size: 125%">We fine-tune a model to predict remaining steps from proof states. Our DeepSeek Coder 1.3B model achieves <strong>75.8% accuracy</strong> with <strong>MAE of 3.15</strong>. When we use these predictions to guide search, we see a <strong>3.8% improvement on Mathlib4</strong> (from 41.4% to 45.2%).</p>
                        <br />
                        <div class="columns">
                            <div class="column">
                                <h3 class="title is-4">Step Prediction</h3>
                                <p style="font-size: 125%">We fine-tune DeepSeek Coder 1.3B to predict the exact number of remaining steps from intermediate proof states. The model takes the current state and optional proof history as input.</p>
                            </div>
                            <div class="column">
                                <h3 class="title is-4">Progress-Guided Search</h3>
                                <p style="font-size: 125%">We combine step predictions with tactic log probabilities in a best-first search. This helps prioritize paths that are likely to complete faster and avoid unproductive branches.</p>
                            </div>
                            <div class="column">
                                <h3 class="title is-4">LeanCopilot Integration</h3>
                                <p style="font-size: 125%">We built <code>predict_steps_with_suggestion</code> into LeanCopilot. It shows tactic suggestions along with how many steps each one would leave, making it easier to choose tactics.</p>
                            </div>
                        </div>
                    </div>
                </div>
            </div>
        </div>
    </div>
</section>

<section class="section">
    <div class="container is-max-widescreen">
        <div class="rows">
            <div class="rows is-centered">
                <div class="row is-full-width">
                    <h2 class="title is-3"><span class="lean-infer">Dataset and Training</span></h2>
                    <div class="content">
                        <h3 class="title is-4">Data Collection</h3>
                        <p style="font-size: 125%">We collected about <strong>80,000 proof trajectories</strong> from Lean Workbook Plus and Mathlib4 using best-first search with Reprover. The original data was heavily skewed toward short proofs (average 2.47 steps). We balanced it using different sampling ratios based on proof length:</p>
                        <ul style="font-size: 125%; margin-left: 2em;">
                            <li><strong>1-5 steps:</strong> 0.01 sampling ratio (most proofs are short)</li>
                            <li><strong>6-10 steps:</strong> 0.3 sampling ratio</li>
                            <li><strong>11-15 steps:</strong> 0.5 sampling ratio</li>
                            <li><strong>16-20 steps:</strong> 0.7 sampling ratio</li>
                            <li><strong>21+ steps:</strong> 1.0 sampling ratio (use all long proofs)</li>
                        </ul>
                        <p style="font-size: 125%">After balancing, the average proof length is 10.1 steps, which better represents longer proofs.</p>

                        <h3 class="title is-4">Model Training</h3>
                        <p style="font-size: 125%">We fine-tuned DeepSeek Coder 1.3B using MSE loss and AdamW optimizer. The model takes the current proof state and optional proof history, then predicts remaining steps. Training uses batch size 4, learning rate 1e-5, and weight decay 0.01.</p>

                        <h3 class="title is-4">Using Proof History</h3>
                        <p style="font-size: 125%">Including proof history helps a lot. Models with history get 75.8% accuracy (MAE 3.15) vs 61.8% accuracy (MAE 5.22) without it. The history tells the model where you've been, which helps predict where you're going.</p>
                    </div>
                </div>
            </div>
        </div>
    </div>
</section>

<section class="section">
    <div class="container is-max-widescreen">
        <div class="rows">
            <div class="rows is-centered">
                <div class="row is-full-width">
                    <h2 class="title is-3"><span class="dvima">Results</span></h2>
                    <div class="content">
                        <div class="columns">
                            <div class="column">
                                <h4 class="title is-5">Prediction Accuracy</h4>
                                <p style="font-size: 125%">Our model gets <strong>75.8% accuracy</strong> at predicting remaining steps (MAE 3.15). It works especially well on longer proofs.</p>
                            </div>
                            <div class="column">
                                <h4 class="title is-5">Proof Search</h4>
                                <p style="font-size: 125%">When we use step predictions to guide search, we get <strong>45.2% pass rate</strong> on Mathlib4 vs <strong>41.4%</strong> baseline—a 3.8% improvement. This works by combining step predictions with tactic log probabilities.</p>
                            </div>
                            <div class="column">
                                <h4 class="title is-5">Tool Integration</h4>
                                <p style="font-size: 125%">The <code>predict_steps_with_suggestion</code> tactic shows both tactic suggestions and how many steps remain after each one. This makes it easier to pick tactics during interactive proving.</p>
                            </div>
                        </div>
                        <br />
                        <h3 class="title is-4">Why This Helps</h3>
                        <p style="font-size: 125%">Existing tools suggest the next tactic, but they don't tell you if you're making progress. LeanProgress gives you a global view: instead of just knowing what to try next, you know how far you are from the goal. This helps you avoid dead ends and focus on promising paths.</p>

                        <h3 class="title is-4">Future Work</h3>
                        <p style="font-size: 125%">Step predictions could work as reward signals for reinforcement learning. Instead of only getting feedback at the end (success or failure), you'd get continuous feedback throughout the proof. This might enable better RL training for theorem proving.</p>
                    </div>
                </div>
            </div>
        </div>
    </div>
</section>

<section class="section">
    <div class="container is-max-widescreen">
        <div class="rows">
            <div class="rows is-centered">
                <div class="row is-full-width">
                    <h2 class="title is-3"><span class="lean-infer">Main Contributions</span></h2>
                    <div class="content">
                        <div style="font-size: 125%;">
                            <p><strong>1. Balanced dataset:</strong> We built a dataset of ~80k proof trajectories from Lean Workbook Plus and Mathlib4. Since most proofs are short, we used different sampling ratios to balance the data so longer proofs are well represented.</p>
                            <br />
                            <p><strong>2. Step prediction model:</strong> We fine-tuned DeepSeek Coder 1.3B to predict remaining steps from proof states. It achieves 75.8% accuracy (MAE 3.15). Unlike tactic suggestion tools that only look at the next step, this gives a global view of proof progress.</p>
                            <br />
                            <p><strong>3. Progress-guided search:</strong> We combine step predictions with tactic log probabilities in best-first search. This improves Mathlib4 pass rate from 41.4% to 45.2%.</p>
                            <br />
                            <p><strong>4. Practical tool:</strong> We integrated step prediction into LeanCopilot as the <code>predict_steps_with_suggestion</code> tactic. It shows both tactic suggestions and remaining steps, helping users choose better tactics during interactive proving.</p>
                            <br />
                            <p><strong>5. Foundation for RL:</strong> Step predictions provide continuous reward signals throughout the proof. This could enable better reinforcement learning for theorem proving, since you'd get feedback at every step rather than just at the end.</p>
                        </div>
                    </div>
                </div>
            </div>
        </div>
    </div>
</section>

<footer class="footer">
    <div class="container">
        <div class="columns is-centered">
            <div class="column">
                <div class="content has-text-centered">
                    <p>
                        Website template borrowed from <a
                            href="https://vimalabs.github.io/">VIMA</a>.
                    </p>
                </div>
            </div>
        </div>
    </div>
</footer>

</body>
</html>