GPQA `strict-match` regex pattern does not match the fewshot response template

Hi,

As per title.

In https://github.com/EleutherAI/lm-evaluation-harness/blob/68b03658ace40fa93221518b30096485a2387c58/lm_eval/tasks/gpqa/cot_n_shot/_gpqa_cot_n_shot_yaml#L9-L17 and https://github.com/EleutherAI/lm-evaluation-harness/blob/68b03658ace40fa93221518b30096485a2387c58/lm_eval/tasks/gpqa/generative/_gpqa_generative_n_shot_yaml#L9-L17 the task description gives the instruction `Answer the final question yourself, following the format of the previous questions exactly`, while the `strict-match` filter extracts the answer with `regex_pattern: "(?<=The answer is )(.*)(?=.)"`.

However, when using e.g. `--num_fewshot 5`, the answers in the prompt are formatted as follows:

<img width="1178" height="1170" alt="Image" src="https://github.com/user-attachments/assets/d248ea56-57f4-4f45-8dcd-196fa5350b0e" />

This is not the format the regex expects, because `The answer is` is missing. As a result, the few-shot examples do not steer the model toward the format the regex requires, and `strict-match` ends up at 0.

Only `flexible-extract` is decent.
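
To make the mismatch concrete, here is a minimal sketch that applies the `strict-match` pattern to two example responses. The pattern is copied from the task YAML; the helper `strict_extract` and both sample strings are hypothetical, and the bare-letter response is only an assumption about what a model imitating the few-shot answers might produce. It uses a plain `re.search`, which is a simplification and not the harness's own filter code.

```python
import re
from typing import Optional

# Pattern copied verbatim from the `strict-match` filter in the task YAML.
STRICT_MATCH = re.compile(r"(?<=The answer is )(.*)(?=.)")


def strict_extract(response: str) -> Optional[str]:
    """Return the text captured by the strict-match pattern, or None.

    Hypothetical helper for illustration; not part of lm-evaluation-harness.
    """
    match = STRICT_MATCH.search(response)
    return match.group(1) if match else None


# Hypothetical model outputs, for illustration only.
fewshot_style = "(B)"                 # assumed: model copies a bare-letter few-shot answer
regex_style = "The answer is (B)."    # the format the regex is written for

print(strict_extract(fewshot_style))  # None: the lookbehind never matches
print(strict_extract(regex_style))    # (B)
```

A response that follows the few-shot examples has no `The answer is ` prefix, so the lookbehind cannot match and nothing is extracted, which would be consistent with a `strict-match` score of 0.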

One can reproduce with:

```bash
CUDA_VISIBLE_DEVICES=0 nohup lm_eval \
  --model hf \
  --model_args '{"pretrained":"openai/gpt-oss-20b","dtype":"auto","chat_template_args":{"reasoning_effort":"low"},"enable_thinking": true}' \
  --device "cuda" \
  --gen_kwargs max_gen_toks=4048 \
  --tasks gpqa_diamond_generative_n_shot \
  --apply_chat_template \
  --fewshot_as_multiturn \
  --limit 1 \
  --num_fewshot 5 \
  --batch_size 1
```

Thank you!

	description: "Here are some example questions from experts. Answer the final question yourself, following the format of the previous questions exactly.\n"
	doc_to_text: "Question: {{Question}}\nChoices:\n(A) {{choice1}}\n(B) {{choice2}}\n(C) {{choice3}}\n(D) {{choice4}}\nLet's think step by step: "
	doc_to_target: answer
	filter_list:
	  - name: "strict-match"
	    filter:
	      - function: "regex"
	        regex_pattern: "(?<=The answer is )(.*)(?=.)"
	      - function: "take_first"

	description: "Here are some example questions from experts. Answer the final question yourself, following the format of the previous questions exactly.\n"
	doc_to_text: "Question: {{Question}}\nChoices:\n(A) {{choice1}}\n(B) {{choice2}}\n(C) {{choice3}}\n(D) {{choice4}}\nAnswer:"
	doc_to_target: answer
	filter_list:
	  - name: "strict-match"
	    filter:
	      - function: "regex"
	        regex_pattern: "(?<=The answer is )(.*)(?=.)"
	      - function: "take_first"

GPQA `strict-match` regex pattern does not match the fewshot response template #3404
