Issue replicating Vlaser-8B VLM on EmbodiedBench EB-ALFRED #5

@estherhan1

Dear Authors,

I hope you are doing well.

I am currently trying to reproduce the reported results for Vlaser-8B. So far, my EB-Habitat result is relatively close to the reported number, but my EB-ALFRED performance is much lower than expected.

For reference, the reported and reproduced results are:

Expected (reported)

  • EB-ALFRED average success rate: 0.50
  • EB-Habitat success rate: 0.40

Mine (reproduced)

  • EB-ALFRED average success rate: 0.10
  • EB-Habitat success rate: 0.42

My setup is as follows:

  • Cluster: SLURM + Singularity (H100)
  • Model: OpenGVLab/Vlaser-8B
  • Code: official repository, with only launcher/runtime adaptations for SLURM/Singularity (mainly display, environment, and path handling)

For EB-ALFRED, the detailed task_success results are:

  • base: 0.16
  • common_sense: 0.14
  • complex_instruction: 0.12
  • visual_appearance: 0.10
  • spatial: 0.08
  • long_horizon: 0.00
  • mean over the 6 subsets: 0.10
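
For transparency, the mean above is the unweighted average of the six per-subset values. A minimal sketch of that arithmetic follows; the variable names are illustrative and not taken from the EmbodiedBench code, and the reported value of 0.50 is copied from the comparison earlier in this issue:

```python
# Per-subset task_success values from my EB-ALFRED run (copied from the list above).
task_success = {
    "base": 0.16,
    "common_sense": 0.14,
    "complex_instruction": 0.12,
    "visual_appearance": 0.10,
    "spatial": 0.08,
    "long_horizon": 0.00,
}

# Unweighted mean over the six subsets.
mean_success = sum(task_success.values()) / len(task_success)

# Gap to the reported EB-ALFRED average (0.50, as stated earlier in this issue).
reported_mean = 0.50
gap = reported_mean - mean_success

print(f"mean task_success: {mean_success:.2f}")
print(f"gap to reported:   {gap:.2f}")
```

This assumes each subset contributes equally to the average; if the published number weights subsets by episode count, the comparison would need to be adjusted.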

Other aggregate statistics (mean over 6 subsets) are:

  • task_progress: 0.1678
  • num_invalid_actions: 8.68
  • planner_output_error: 0.46

Would you be able to share the evaluation setup used for the published Vlaser-8B EB-ALFRED result?

Thank you!
