ERROR:ssh: Could not resolve hostname deephealth_compss-worker_2: Name or service not known #2

@kinow

Hi,

I followed the instructions in the README.md file and got the Docker Compose cluster working after #1.

However, running the example fails with the error below.

kinow@ranma:/tmp/docker-compss-runtime$ docker-compose exec compss-master bash
(eddl_onnx_last) root@c6e014895899:~# cd pyeddl/third_party/compss_runtime/
(eddl_onnx_last) root@c6e014895899:~/pyeddl/third_party/compss_runtime# runcompss --lang=python --python_interpreter=python3 --project=linux-based/project.xml --resources=linux-based/resources.xml eddl_train_batch_compss.py
[  INFO] Using default execution type: compss

----------------- Executing eddl_train_batch_compss.py --------------------------

WARNING: COMPSs Properties file is null. Setting default values
[(778)    API]  -  Starting COMPSs Runtime v2.6.rc2003 (build 20200408-1126.rcbac84bafe556637e165de38764868ac68a8a75e)
Sleeping 30 seconds...
E:  uname_result(system='Linux', node='c6e014895899', release='5.4.0-120-generic', version='#136-Ubuntu SMP Fri Jun 10 13:40:48 UTC 2022', machine='x86_64', processor='x86_64')
Generating Random Table
---------------------------------------------
---------------------------------------------

None
CS with low memory setup
Model training...
Number of epochs:  1
Number of epochs for parameter syncronization:  1
Training epochs [ 1  -  1 ] ...
Num workers:  4
Num images per worker:  15000
Workers batch size:  250
[ERRMGR]  -  WARNING: There was an exception when initiating worker deephealth_compss-worker_4.
[ERRMGR]  -  WARNING: There was an exception when initiating worker deephealth_compss-worker_2.
                      Stack trace:
                      Stack trace:
                      es.bsc.compss.exceptions.InitNodeException: [START_CMD_ERROR]: Could not start the NIO worker in resource deephealth_compss-worker_4 through user .
                      es.bsc.compss.exceptions.InitNodeException: [START_CMD_ERROR]: Could not start the NIO worker in resource deephealth_compss-worker_2 through user .
                      OUTPUT:
                      OUTPUT:
                      ERROR:ssh: Could not resolve hostname deephealth_compss-worker_2: Name or service not known
                      
                      	at es.bsc.compss.nio.master.starters.WorkerStarter.startWorker(WorkerStarter.java:90)
                      	at es.bsc.compss.nio.master.starters.WorkerStarter.startWorker(WorkerStarter.java:142)
                      	at es.bsc.compss.nio.master.NIOWorkerNode.start(NIOWorkerNode.java:153)
                      	at es.bsc.compss.types.resources.ResourceImpl.start(ResourceImpl.java:119)
                      	at es.bsc.compss.scheduler.types.allocatableactions.StartWorkerAction$1.run(StartWorkerAction.java:109)
[ERRMGR]  -  ERROR:   [START_CMD_ERROR]: Could not start the NIO worker in resource deephealth_compss-worker_2 through user .
                      OUTPUT:
                      ERROR:ssh: Could not resolve hostname deephealth_compss-worker_2: Name or service not known
[ERRMGR]  -  Shutting down COMPSs...
                      ERROR:ssh: Could not resolve hostname deephealth_compss-worker_4: Name or service not known
                      
                      	at es.bsc.compss.nio.master.starters.WorkerStarter.startWorker(WorkerStarter.java:90)
                      	at es.bsc.compss.nio.master.starters.WorkerStarter.startWorker(WorkerStarter.java:142)
                      	at es.bsc.compss.nio.master.NIOWorkerNode.start(NIOWorkerNode.java:153)
                      	at es.bsc.compss.types.resources.ResourceImpl.start(ResourceImpl.java:119)
                      	at es.bsc.compss.scheduler.types.allocatableactions.StartWorkerAction$1.run(StartWorkerAction.java:109)
[(163161)    API]  -  Execution Finished
Shutting down the running process

Error running application

(eddl_onnx_last) root@c6e014895899:~/pyeddl/third_party/compss_runtime#
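One way to reproduce the resolution failure outside COMPSs is to run a small Python check from inside the master container. This is a sketch under stated assumptions: it assumes the worker hostnames are the `Name` attributes of `<ComputeNode>` elements in the resources file passed to `runcompss` (here `linux-based/resources.xml`) and that the file uses no XML namespace. The script name `check_hosts.py` and the helper names are hypothetical. It only asks the container's system resolver (the same one `ssh` relies on) whether each name resolves; it does not try to connect.

```python
"""Report which ComputeNode hostnames in a COMPSs resources file resolve.

Sketch only. It assumes the hostnames are the Name attributes of
<ComputeNode> elements and that the file has no XML namespace.
"""
import socket
import sys
import xml.etree.ElementTree as ET


def compute_node_names(xml_text):
    """Return the Name attribute of every <ComputeNode> element (assumed layout)."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so nesting depth does not matter.
    return [node.get("Name") for node in root.iter("ComputeNode") if node.get("Name")]


def resolves(hostname):
    """Return True if the system resolver (DNS, /etc/hosts, ...) knows the hostname."""
    try:
        socket.getaddrinfo(hostname, None)
    except (socket.gaierror, UnicodeError):
        return False
    return True


def main(path):
    # Read bytes so an XML declaration with an encoding is handled by the parser.
    with open(path, "rb") as fh:
        names = compute_node_names(fh.read())
    status = {name: resolves(name) for name in names}
    for name, ok in status.items():
        print(f"{name}: {'ok' if ok else 'NOT RESOLVABLE'}")
    # Non-zero exit status if any hostname failed to resolve.
    return 0 if all(status.values()) else 1


if __name__ == "__main__" and len(sys.argv) > 1:
    # Run only when a resources file path is given on the command line.
    sys.exit(main(sys.argv[1]))
```

Run it inside the `compss-master` container (for example after `docker-compose exec compss-master bash`) as `python3 check_hosts.py linux-based/resources.xml`. If `deephealth_compss-worker_2` and `deephealth_compss-worker_4` show up as `NOT RESOLVABLE` there as well, the problem is name resolution inside the container network rather than COMPSs itself.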

Thanks!
-Bruno
