An experiment in running KnowEnG platform, pipelines, and IDE under Kubernetes.
- Docker (preferably 1.10.x - 1.13.x)
- Mount propagation must be enabled
NOTE: On my Ubuntu AWS VM, I had to run the following command (or else the kubelet container would fail to start):
```bash
sudo mount --make-shared /
```

Clone this repository and enter it:
```bash
git clone https://github.com/nds-org/ndslabs-knoweng
cd ndslabs-knoweng
```

To quickly build all of the pipeline images:
```bash
./compose.sh build
```
NOTE: If your Docker version differs, you may need to adjust the version in ./compose.sh
To push all images to DockerHub (required for a multi-node cluster):
```bash
./compose.sh push
```
NOTE: You will need to docker login (and probably change the image/tags in docker-compose.yml) before you can push.
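For reference, the push prerequisites look roughly like this (the image namespace below is an illustrative placeholder, not necessarily what docker-compose.yml contains):
```bash
docker login                      # authenticate against DockerHub first
# Point the image: fields in docker-compose.yml at a namespace you own, e.g.
#   image: yourdockerhubuser/nest:latest
./compose.sh push
```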
For an introduction to Kubernetes terminology, see these slides: https://docs.google.com/presentation/d/1VDYrSlwLY_Efucq_n75m9Rf_euJIOIACh27BfOmh-ps/edit?usp=sharing
Several options exist for deploying a full Kubernetes cluster:
- Vagrant: https://github.com/kubernetes-incubator/kubespray/blob/master/docs/vagrant.md
- Terraform / Kubespray: https://github.com/BenGalewsky/kubespray
- ndslabs-deploy-tools (deprecated): https://github.com/nds-org/ndslabs-deploy-tools
To run a development Kubernetes cluster (via Docker):
```bash
./kube.sh
```
This will start a series of Docker containers that allow you to run Kubernetes on a single machine!
NOTE: You'll need to manually add the path to the kubectl binary to your $PATH.
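For example (the directory below is an illustrative placeholder; use wherever the binary actually landed on your machine):
```bash
export PATH="$PATH:/path/to/dir-containing-kubectl"
kubectl version    # verify that kubectl is now resolvable
```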
See also:
- https://github.com/kubernetes/minikube
- https://github.com/kubernetes-incubator/kubespray/blob/master/docs/vagrant.md
To run the KnowEnG platform, you need only kubectl access to a Kubernetes cluster.
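A quick way to verify that access before proceeding (standard kubectl commands):
```bash
kubectl cluster-info    # should print the address of the Kubernetes master
kubectl get nodes       # should list at least one node in Ready state
```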
To start up the platform in dev mode, with a Cloud9 IDE:
```bash
./knoweng.sh
```
For an example of the running platform, see: knoweng.org/analyze
The ./knoweng.sh helper script does several things (a manual sketch of the certificate and secret steps follows the list):
- Ensures that the user has a basic-auth secret set up for Cloud9 to consume
- Generates self-signed SSL certs if they are not found for the given domain
- Ensures that certificates have been imported as Kubernetes secrets
- Creates ingress rules (referencing our TLS secrets) that will route /ide.html to Cloud9, and route everything else to the KnowEnG Dashboard
- Starts up a Pod running the Kubernetes NGINX Ingress Controller and its default-backend
- Starts up a Pod running the 4 containers comprising KnowEnG Dashboard
- Starts up a Pod running the Cloud9 IDE
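For reference, the certificate and secret steps are roughly equivalent to the manual commands below (a sketch only; the secret names, domain, and credentials are illustrative, not necessarily what knoweng.sh actually uses):
```bash
# Generate a self-signed cert for the target domain (illustrative CN)
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -keyout tls.key -out tls.crt -subj "/CN=knoweng.example.com"

# Import the cert/key pair as a Kubernetes TLS secret
kubectl create secret tls knoweng-tls --key tls.key --cert tls.crt

# Create a basic-auth secret for Cloud9 (htpasswd comes from apache2-utils)
htpasswd -bc auth admin changeme
kubectl create secret generic basic-auth --from-file=auth
```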
To view the logs of an individual pod (where your work items are being executed):
```bash
root@knowdev2:/home/ubuntu/ndslabs-knoweng# kubectl get pods
NAME                         READY     STATUS    RESTARTS   AGE
cloud9-984f9                 1/1       Running   0          4h
default-http-backend-blrqv   1/1       Running   0          4h
nest-1442377537-3hhbf        4/4       Running   0          4h
nginx-ilb-rc-02fzw           1/1       Running   0          4h
```
```bash
# View the logs of a single-container Pod
kubectl logs -f nginx-ilb-rc-02fzw

# For multi-container Pods, you must specify a container with -c
kubectl logs -f nest-1442377537-3hhbf -c flask
```

To run the Data Cleanup pipeline:
```bash
kubectl create -f pipelines/dc.job.yaml
```
To run the Gene Prioritization pipeline:
```bash
kubectl create -f pipelines/gp.job.yaml
```
To run the Gene Set Characterization pipeline:
```bash
kubectl create -f pipelines/gsc.job.yaml
```
To run the Samples Clustering pipeline:
```bash
kubectl create -f pipelines/sc.job.yaml
```
This will create the Job objects on your Kubernetes cluster. Job objects themselves don't execute anything (and therefore don't keep logs), but they will spawn Pods (groups of containers) to execute the desired work item(s).
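To illustrate the shape of such a Job definition, here is a minimal sketch in the style of pipelines/gp.job.yaml (the image, command, and mount paths below are illustrative placeholders, not the actual file contents):
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: gp-test
spec:
  template:
    spec:
      containers:
      - name: gene-prioritization
        image: yourdockerhubuser/gene-prioritization:latest   # placeholder image
        command: ["python3", "/home/src/gene_prioritization.py"]
        volumeMounts:
        - name: shared
          mountPath: /home/run_dir          # pipeline reads inputs / writes results here
      restartPolicy: Never                  # Jobs require Never or OnFailure
      volumes:
      - name: shared
        hostPath:
          path: /var/glfs/global/jobs       # the shared storage shown below
```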
To view the status of your jobs and their pods:
```bash
core@nds842-master1 ~/ndslabs-knoweng $ kubectl get jobs,pods -a
NAME           DESIRED   SUCCESSFUL   AGE
jobs/gp-test   1         1            1m

NAME               READY     STATUS      RESTARTS   AGE
po/gp-test-hzcq8   0/1       Completed   0          1m
```
This will list all of your jobs and their respective pods.
NOTE: The -a flag tells Kubernetes to include pods that have Completed in the returned list.
Use the pod name to check the log output:
```bash
# View the logs of a Pod spawned by a Job
kubectl logs -f gp-test-hzcq8
```
Check the contents of your shared storage to view output files:
```bash
core@nds842-node1 ~ $ du -h -a /var/glfs/global/jobs
512     /var/glfs/global/jobs/data-cleanup/results/TEST_1_gene_expression_UNMAPPED.tsv
0       /var/glfs/global/jobs/data-cleanup/results/TEST_1_gene_expression_MAP.tsv
1.0K    /var/glfs/global/jobs/data-cleanup/results/log_gene_prioritization_pipeline.yml
5.5K    /var/glfs/global/jobs/data-cleanup/results
9.5K    /var/glfs/global/jobs/data-cleanup
512     /var/glfs/global/jobs/gene-prioritization/results/drug_A_bootstrap_net_correlation_pearson_Mon_03_Jul_2017_23_37_33.461636543_viz.tsv
512     /var/glfs/global/jobs/gene-prioritization/results/drug_B_bootstrap_net_correlation_pearson_Mon_03_Jul_2017_23_37_33.502016782_viz.tsv
512     /var/glfs/global/jobs/gene-prioritization/results/drug_C_bootstrap_net_correlation_pearson_Mon_03_Jul_2017_23_37_33.491072654_viz.tsv
512     /var/glfs/global/jobs/gene-prioritization/results/drug_E_bootstrap_net_correlation_pearson_Mon_03_Jul_2017_23_37_33.497779369_viz.tsv
512     /var/glfs/global/jobs/gene-prioritization/results/drug_D_bootstrap_net_correlation_pearson_Mon_03_Jul_2017_23_37_33.502823591_viz.tsv
512     /var/glfs/global/jobs/gene-prioritization/results/ranked_genes_per_phenotype_bootstrap_net_correlation_pearson_Mon_03_Jul_2017_23_37_33.781181335_download.tsv
512     /var/glfs/global/jobs/gene-prioritization/results/top_genes_per_phenotype_bootstrap_net_correlation_pearson_Mon_03_Jul_2017_23_37_33.786688327_download.tsv
7.5K    /var/glfs/global/jobs/gene-prioritization/results
12K     /var/glfs/global/jobs/gene-prioritization
512     /var/glfs/global/jobs/gene-set-characterization/results/net_path_ranked_by_property_Mon_03_Jul_2017_23_37_33.879306793.df
512     /var/glfs/global/jobs/gene-set-characterization/results/net_path_sorted_by_property_score_Mon_03_Jul_2017_23_37_33.885757923.df
512     /var/glfs/global/jobs/gene-set-characterization/results/net_path_droplist_Mon_03_Jul_2017_23_37_33.891902446.tsv
5.5K    /var/glfs/global/jobs/gene-set-characterization/results
2.5K    /var/glfs/global/jobs/gene-set-characterization/job-parameters.yml
12K     /var/glfs/global/jobs/gene-set-characterization
365K    /var/glfs/global/jobs/samples-clustering/results/consensus_matrix_cc_net_nmf_Mon_03_Jul_2017_23_39_04.368379831_viz.tsv
512     /var/glfs/global/jobs/samples-clustering/results/silhouette_average_cc_net_nmf_Mon_03_Jul_2017_23_39_04.496203184_viz.tsv
4.0K    /var/glfs/global/jobs/samples-clustering/results/samples_label_by_cluster_cc_net_nmf_Mon_03_Jul_2017_23_39_04.502247810_viz.tsv
1.5K    /var/glfs/global/jobs/samples-clustering/results/clustering_evaluation_result_Mon_03_Jul_2017_23_39_05.487591743.tsv
56M     /var/glfs/global/jobs/samples-clustering/results/genes_by_samples_heatmap_cc_net_nmf_Mon_03_Jul_2017_23_39_09.115032196_viz.tsv
594K    /var/glfs/global/jobs/samples-clustering/results/genes_averages_by_cluster_cc_net_nmf_Mon_03_Jul_2017_23_39_17.276435136_viz.tsv
404K    /var/glfs/global/jobs/samples-clustering/results/genes_variance_cc_net_nmf_Mon_03_Jul_2017_23_39_17.396065711_viz.tsv
305K    /var/glfs/global/jobs/samples-clustering/results/top_genes_by_cluster_cc_net_nmf_Mon_03_Jul_2017_23_39_17.440115690_download.tsv
58M     /var/glfs/global/jobs/samples-clustering/results
4.5K    /var/glfs/global/jobs/samples-clustering/job-parameters.yml
58M     /var/glfs/global/jobs/samples-clustering
58M     /var/glfs/global/jobs
```
We can see that no matter which node ran our jobs, all of our output files are available in one place.
NOTE: On multi-node clusters, you will need to SSH to a compute node or start a container mounting the shared storage. The shared storage is not currently mounted on the master.
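For example, from a compute node you could inspect the files with a throwaway container (a sketch; assumes the shared GlusterFS volume is mounted at /var/glfs on that node, as in the listing above):
```bash
# Run a disposable busybox container that mounts the shared storage read-only
docker run --rm -it -v /var/glfs:/var/glfs:ro busybox \
  sh -c "ls -lR /var/glfs/global/jobs"
```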
Kubernetes leaves it up to the user to delete their own Job objects, which stick around indefinitely to ease debugging.
To delete a job (and trigger clean up of its corresponding pods):
```bash
kubectl delete jobs/gp-test
```
To delete all jobs:
```bash
kubectl delete jobs --all
```
NOTE: Supposedly, Kubernetes can be configured to clean up completed/failed jobs automatically, but I have not yet experimented with this feature.
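One such mechanism, which I have not tried, is the TTL-after-finished field on the Job spec, available on newer Kubernetes versions; a minimal sketch (placeholder image):
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: gp-test
spec:
  ttlSecondsAfterFinished: 3600   # garbage-collect the Job and its Pods 1h after it finishes
  template:
    spec:
      containers:
      - name: gene-prioritization
        image: yourdockerhubuser/gene-prioritization:latest   # placeholder image
      restartPolicy: Never
```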