Hi,
As per title.
In
description: "Here are some example questions from experts. Answer the final question yourself, following the format of the previous questions exactly.\n"
doc_to_text: "Question: {{Question}}\nChoices:\n(A) {{choice1}}\n(B) {{choice2}}\n(C) {{choice3}}\n(D) {{choice4}}\nLet's think step by step: "
doc_to_target: answer
filter_list:
  - name: "strict-match"
    filter:
      - function: "regex"
        regex_pattern: "(?<=The answer is )(.*)(?=.)"
      - function: "take_first"
lm-evaluation-harness/lm_eval/tasks/gpqa/generative/_gpqa_generative_n_shot_yaml
Lines 9 to 17 in 68b0365
description: "Here are some example questions from experts. Answer the final question yourself, following the format of the previous questions exactly.\n"
doc_to_text: "Question: {{Question}}\nChoices:\n(A) {{choice1}}\n(B) {{choice2}}\n(C) {{choice3}}\n(D) {{choice4}}\nAnswer:"
doc_to_target: answer
filter_list:
  - name: "strict-match"
    filter:
      - function: "regex"
        regex_pattern: "(?<=The answer is )(.*)(?=.)"
      - function: "take_first"
the description asks the model to "Answer the final question yourself, following the format of the previous questions exactly", while the strict-match filter expects output matching regex_pattern: "(?<=The answer is )(.*)(?=.)".
However, when using e.g. --num_fewshot 5, the answers in the prompt are formatted as follows:
This is not the format the regex expects, as the "The answer is " prefix is missing. As a result, the few-shot examples do not incentivize the model to use the regex format, and the strict-match score ends up being 0.
Only the flexible-extract score is decent.
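To illustrate the mismatch, here is a minimal standalone sketch that applies the same pattern with Python's re module. It is not the harness's own filter code: the helper name strict_extract, the "[invalid]" fallback string, and the sample responses are assumptions made for illustration only.

```python
import re

# Pattern copied verbatim from the strict-match filter in the task YAML.
STRICT_MATCH = re.compile(r"(?<=The answer is )(.*)(?=.)")


def strict_extract(response: str, fallback: str = "[invalid]") -> str:
    """Return the first capture of the strict-match pattern.

    Hypothetical helper: the fallback value is an assumption, used here
    only to mark responses the pattern cannot match.
    """
    match = STRICT_MATCH.search(response)
    return match.group(1) if match else fallback


# A response that follows the format the regex expects.
print(strict_extract("The answer is (B)."))  # (B)

# A response that mimics a bare few-shot answer: the prefix is missing,
# so the lookbehind never succeeds and nothing is extracted.
print(strict_extract("(B)"))  # [invalid]

# Without a trailing character, the lookahead (?=.) forces the capture
# to give up its last character.
print(strict_extract("The answer is (B)"))  # (B
```

If the model copies the few-shot answer style, its output looks like the second case, which would explain a strict-match score of 0 even when the chosen option is correct.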
One can reproduce with:
CUDA_VISIBLE_DEVICES=0 nohup lm_eval \
--model hf \
--model_args '{"pretrained":"openai/gpt-oss-20b","dtype":"auto","chat_template_args":{"reasoning_effort":"low"},"enable_thinking": true}' \
--device "cuda" \
--gen_kwargs max_gen_toks=4048 \
--tasks gpqa_diamond_generative_n_shot \
--apply_chat_template \
--fewshot_as_multiturn \
--limit 1 \
--num_fewshot 5 \
--batch_size 1
Thank you!