swe-bench #280
...ounts__pyroworks__models__swe-1-mtp#accounts__pyroworks__deployments__r5dfiiwp.eval-run.json
…we-1-mtp#accounts__pyroworks__deployments__r5dfiiwp.eval-run.json
xzrderek left a comment
same as other pr, let dylan / benny take a look before merging
    remote_base_url="http://127.0.0.1:3000",
    model_base_url="https://tracing.fireworks.ai",
    timeout_seconds=1800,
    output_data_loader=default_fireworks_output_data_loader,
why do you need to specify this? I thought there is a default one
examples/swebench/tracing_model.py
    @@ -0,0 +1,159 @@
    """
    TracingFireworksModel - Routes through tracing using OpenAI SDK.
Can you give a more comprehensive explanation for this file? Currently this is confusing for somebody who doesn't know SWE-bench.
- Why do we need it?
- What is it doing?
LGTM besides some other nits! Do you mind sharing a screenshot of the local UI running in this PR if it's easy to collect? I am just curious what the output looks like.
    max_dataset_rows=2,
    rollout_processor=RemoteRolloutProcessor(
        remote_base_url="http://127.0.0.1:3000",
        model_base_url="https://tracing.fireworks.ai",
don't think you need this
    if message.startswith("EVAL_RESULT:"):
        result_json = message.replace("EVAL_RESULT:", "")
        row.evaluation_result = EvaluateResult.model_validate_json(result_json)
        break
hmmm i don't quite get this logic here. i thought we should be reading from tracing.fireworks.ai; check out default_fireworks_output_data_loader.
maybe i am misunderstanding?
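For readers following this thread, here is a minimal, self-contained sketch of the prefix-parsing idea in the snippet under discussion. It is not the PR's code: `SketchResult` is a hypothetical stand-in for `EvaluateResult`, the `score` and `reason` fields are assumed for illustration, and `extract_eval_result` is an invented helper name. The sketch slices the prefix off rather than calling `replace`, so a payload that happens to contain the text "EVAL_RESULT:" is left intact.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Optional

PREFIX = "EVAL_RESULT:"


@dataclass
class SketchResult:
    # Hypothetical stand-in for EvaluateResult; field names are assumptions.
    score: float
    reason: str = ""


def extract_eval_result(messages: Iterable[str]) -> Optional[SketchResult]:
    """Return the first result found in a message tagged with PREFIX, else None."""
    for message in messages:
        if message.startswith(PREFIX):
            # Slice off only the leading prefix; the JSON payload stays untouched.
            payload = json.loads(message[len(PREFIX):])
            return SketchResult(
                score=float(payload["score"]),
                reason=str(payload.get("reason", "")),
            )
    return None
```

Returning `None` when no tagged message exists makes the "no result was reported" case explicit for the caller, which may be relevant to the question of whether results should instead be read from the tracing service.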
SWE-bench integration into eval-protocol: run SWE-bench locally and use our RemoteRolloutProcessor to interact with server.py.