Run Fluid with EDL #35

@typhoonzero

Tasks

  • Full fault-tolerant training.
  • Dynamic trainer count on the pserver side, so that gradients can be averaged according to the current trainer count.
  • Upgrade the EDL controller to CRD so that we can support Kubernetes versions later than v1.8.
  • A tutorial on running the distributed lookup sparse table with EDL.
  • Update the experiment report: https://github.com/PaddlePaddle/cloud/tree/develop/doc/edl/experiment
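
To illustrate the dynamic trainer count item, here is a minimal sketch of averaging gradients by however many trainers are currently registered. It is not Fluid's or the pserver's actual API: the class `GradientAverager` and its methods `join`, `leave`, `push`, `ready`, and `average` are hypothetical names chosen for this example, and gradients are plain Python lists rather than tensors.

```python
from typing import Dict, List, Set


class GradientAverager:
    """Hypothetical pserver-side helper (illustration only, not Fluid's API)."""

    def __init__(self) -> None:
        self._trainers: Set[str] = set()             # trainers currently alive
        self._pending: Dict[str, List[float]] = {}   # gradients for this step

    def join(self, trainer_id: str) -> None:
        """Register a trainer that was scaled up."""
        self._trainers.add(trainer_id)

    def leave(self, trainer_id: str) -> None:
        """Remove a trainer that was scaled down or failed."""
        self._trainers.discard(trainer_id)
        # Drop any gradient it pushed so it does not skew the average.
        self._pending.pop(trainer_id, None)

    def push(self, trainer_id: str, grad: List[float]) -> None:
        """Store one trainer's gradient for the current step."""
        if trainer_id not in self._trainers:
            raise KeyError(f"unknown trainer: {trainer_id}")
        self._pending[trainer_id] = list(grad)

    def ready(self) -> bool:
        """True once every currently registered trainer has pushed."""
        return bool(self._trainers) and set(self._pending) == self._trainers

    def average(self) -> List[float]:
        """Average pending gradients over the current trainer count."""
        if not self.ready():
            raise RuntimeError("still waiting for gradients")
        count = len(self._trainers)
        summed = [sum(vals) for vals in zip(*self._pending.values())]
        self._pending.clear()
        return [s / count for s in summed]


if __name__ == "__main__":
    ps = GradientAverager()
    for tid in ("t0", "t1", "t2"):
        ps.join(tid)
    ps.push("t0", [3.0, 6.0])
    ps.push("t1", [6.0, 9.0])
    ps.push("t2", [0.0, 0.0])
    print(ps.average())   # divides by 3 trainers

    ps.leave("t2")        # scale down: the divisor becomes 2
    ps.push("t0", [2.0, 4.0])
    ps.push("t1", [4.0, 8.0])
    print(ps.average())
```

The point of the sketch is that the divisor is read from the live trainer set at averaging time instead of being fixed when the job starts, which is what allows trainers to be added or removed mid-training.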
