Evaluate WebWand on the WebArena dataset #154

@lingjiefeng

Description

Evaluate WebWand using the WebArena benchmark.

  1. Set up the standalone WebArena environment.
  2. Configure the URLs for each website.
  3. Generate a config file for each test example and obtain the auto-login cookies for all websites.
  4. Write a script, based on WebArena's run.py, that runs WebWand inside WebArena's environment.
  5. Save the task execution results and evaluate them.
  6. Analyze the evaluation results.
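
For steps 5 and 6, a minimal sketch of how saved results could be aggregated into per-site success rates. The record layout (`site` and `score` keys, with a score of 1.0 meaning the task passed) and the function name `summarize` are assumptions made for illustration; they are not WebArena's or WebWand's actual output format and would need to be adapted to whatever the run script saves.

```python
from collections import defaultdict


def summarize(results):
    """Aggregate task results into per-site and overall success rates.

    `results` is an iterable of dicts with the hypothetical keys
    'site' (str) and 'score' (float, 1.0 = task passed).
    """
    per_site = defaultdict(lambda: [0, 0])  # site -> [passed, total]
    for record in results:
        stats = per_site[record["site"]]
        if record["score"] >= 1.0:
            stats[0] += 1
        stats[1] += 1

    summary = {
        site: {"passed": passed, "total": total, "success_rate": passed / total}
        for site, (passed, total) in per_site.items()
    }

    # Overall numbers across all sites; guard against an empty result set.
    all_passed = sum(passed for passed, _ in per_site.values())
    all_total = sum(total for _, total in per_site.values())
    summary["overall"] = {
        "passed": all_passed,
        "total": all_total,
        "success_rate": all_passed / all_total if all_total else 0.0,
    }
    return summary
```

Breaking the numbers down by site makes it easier to see whether failures cluster on one website (for example, because of a login or URL configuration problem) rather than reflecting the agent's general behaviour.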
