Running into "Internal error occurred for the current attempt" problem #387

I am using CloudTuner in a TFX project, but I keep getting an `Internal error occurred for the current attempt` error, and it doesn't show what the actual underlying problem is.

Below is the JSON passed to CloudTuner, and this is the relevant code in my [repository](https://github.com/deep-diver/complete-mlops-system-workflow/blob/3281f6b8aaf2a1eb016c6dc9465c96835c005e31/training_pipeline/models/model.py#L185).

For `imageUri`, I passed the TFX Docker image.

```json
{
  "scaleTier": "CUSTOM",
  "masterType": "standard",
  "workerType": "standard",
  "workerCount": "2",
  "region": "us-central1",
  "masterConfig": {
    "imageUri": "gcr.io/gcp-ml-172005/img-classification",
    "containerCommand": [
      "python",
      "-m",
      "tfx.scripts.run_executor",
      "--executor_class_path",
      "tfx.extensions.google_cloud_ai_platform.tuner.executor._WorkerExecutor",
      "--inputs",
      "{\"examples\": [{\"artifact\": {\"id\": \"302652664909979029\", \"uri\": \"gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/img-classification/874401645461/img-classification-20220725145617/Transform_-7372794461505454080/transformed_examples\", \"properties\": {\"split_names\": {\"string_value\": \"[\\\"train\\\", \\\"eval\\\"]\"}}, \"custom_properties\": {\"tfx_version\": {\"struct_value\": {\"__value__\": \"1.9.0\"}}}}, \"artifact_type\": {\"name\": \"Examples\", \"properties\": {\"span\": \"INT\", \"version\": \"INT\", \"split_names\": \"STRING\"}, \"base_type\": \"DATASET\"}, \"__artifact_class_module__\": \"tfx.types.standard_artifacts\", \"__artifact_class_name__\": \"Examples\"}], \"transform_graph\": [{\"artifact\": {\"id\": \"7122557137885461129\", \"uri\": \"gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/img-classification/874401645461/img-classification-20220725145617/Transform_-7372794461505454080/transform_graph\", \"custom_properties\": {\"tfx_version\": {\"struct_value\": {\"__value__\": \"1.9.0\"}}}}, \"artifact_type\": {\"name\": \"TransformGraph\"}, \"__artifact_class_module__\": \"tfx.types.standard_artifacts\", \"__artifact_class_name__\": \"TransformGraph\"}]}",
      "--outputs",
      "{\"best_hyperparameters\": [{\"artifact\": {\"id\": \"6837211415839241726\", \"uri\": \"gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/img-classification/874401645461/img-classification-20220725145617/Tuner_6462263593776709632/best_hyperparameters\"}, \"artifact_type\": {\"name\": \"HyperParameters\"}, \"__artifact_class_module__\": \"tfx.types.standard_artifacts\", \"__artifact_class_name__\": \"HyperParameters\"}]}",
      "--exec-properties",
      "{\"custom_config\": \"{\\\"ai_platform_tuning_args\\\": {\\\"masterConfig\\\": {\\\"imageUri\\\": \\\"gcr.io/gcp-ml-172005/img-classification\\\"}, \\\"project\\\": \\\"gcp-ml-172005\\\", \\\"region\\\": \\\"us-central1\\\", \\\"scaleTier\\\": \\\"STANDARD_1\\\"}, \\\"masterConfig\\\": {\\\"imageUri\\\": \\\"gcr.io/gcp-ml-172005/img-classification\\\"}, \\\"project\\\": \\\"gcp-ml-172005\\\", \\\"region\\\": \\\"us-central1\\\", \\\"remote_trials_working_dir\\\": \\\"gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/img-classification/trials\\\", \\\"scaleTier\\\": \\\"STANDARD_1\\\"}\", \"eval_args\": \"{\\n  \\\"num_steps\\\": 4\\n}\", \"train_args\": \"{\\n  \\\"num_steps\\\": 160\\n}\", \"tune_args\": \"{\\n  \\\"num_parallel_trials\\\": 3\\n}\", \"tuner_fn\": \"models.model.cloud_tuner_fn\"}"
    ]
  }
}
```
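The `--exec-properties` value in the job spec above is JSON nested inside JSON strings, which makes it hard to read. The following is a minimal sketch for unpacking it, using a shortened copy of the values above. `decode_exec_properties` is a hypothetical helper written for illustration and is not part of TFX or CloudTuner.

```python
import json

# Shortened copy of the --exec-properties argument from the job spec above.
# Each value is itself a JSON-encoded string, as in the original.
exec_properties_arg = json.dumps({
    "custom_config": json.dumps({
        "ai_platform_tuning_args": {
            "masterConfig": {"imageUri": "gcr.io/gcp-ml-172005/img-classification"},
            "project": "gcp-ml-172005",
            "region": "us-central1",
            "scaleTier": "STANDARD_1",
        },
        "remote_trials_working_dir": (
            "gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/"
            "img-classification/trials"
        ),
    }),
    "tune_args": json.dumps({"num_parallel_trials": 3}),
    "tuner_fn": "models.model.cloud_tuner_fn",
})


def decode_exec_properties(raw: str) -> dict:
    """Decode the outer JSON, then any value that is itself a JSON object."""
    decoded = {}
    for key, value in json.loads(raw).items():
        try:
            inner = json.loads(value)
        except (TypeError, ValueError):
            # Plain strings such as the tuner_fn path are not JSON; keep as-is.
            inner = value
        decoded[key] = inner if isinstance(inner, dict) else value
    return decoded


props = decode_exec_properties(exec_properties_arg)
tuning_args = props["custom_config"]["ai_platform_tuning_args"]

print("scaleTier in custom_config:", tuning_args["scaleTier"])
print("num_parallel_trials:", props["tune_args"]["num_parallel_trials"])
print("tuner_fn:", props["tuner_fn"])
```

Decoded this way, the nested `ai_platform_tuning_args` show `scaleTier` as `STANDARD_1`, while the top level of the submitted job spec shows `CUSTOM` with `masterType` and `workerType` set to `standard`.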


