In order to work as intended, the docker-compose stack requires some setup:

- A Docker network named `www`. Use the following command to create it:

  ```bash
  docker network create www
  ```

- A Traefik service running on the `www` network. Traefik is a service capable of routing requests for web sub-domains to services built with Docker. We use it just for this purpose, although it can also perform other tasks. To create this service, check the file `extra/docker-compose.traefik.yaml`.

- A `.env` file, which needs to be created first. This file is not included in the repository since it is server-dependent. Its content is the following:

  ```
  DOMAIN=<domain of the machine (used only for traefik labels)>
  CELERY_BROKER_URL=pyamqp://rabbitmq/
  CELERY_BACKEND_URL=redis://redis/
  CELERY_QUEUE=
  DATABASE_SCHEMA=mlpdb
  DATABASE_USER=mlp
  DATABASE_PASS=mlp
  DATABASE_HOST=database
  DATABASE_URL=postgresql://${DATABASE_USER}:${DATABASE_PASS}@${DATABASE_HOST}/${DATABASE_SCHEMA}
  GRAFANA_ADMIN_PASS=grafana
  ```

  Remember that these passwords are stored unencrypted. This is not a safe solution.
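As a quick sanity check, the network and the Traefik service can be brought up and inspected from the shell. This is a minimal sketch; it assumes Traefik is started directly from the compose file mentioned above and that no other stack has already created the `www` network:

```bash
# Create the shared network once (the command fails harmlessly if it already exists).
docker network create www

# Start Traefik from the provided compose file.
docker-compose -f extra/docker-compose.traefik.yaml up -d

# List the containers attached to the network to confirm Traefik joined it.
docker network inspect www --format '{{range .Containers}}{{.Name}} {{end}}'
```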
Then launch the stack through Docker Compose by executing the following command from the root directory of this repository:

```bash
docker-compose up -d
```
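Once the stack is up, its state can be checked and it can be torn down with the usual Docker Compose commands. The service name in the `logs` example is an assumption, taken from the hosts referenced in the `.env` file above:

```bash
docker-compose ps                 # check that all services are running
docker-compose logs -f rabbitmq   # follow the logs of one service (name assumed)
docker-compose down               # stop and remove the whole stack
```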
This proof-of-concept software uses synthetic data generated by sampling some distributions. To generate these data, just run the following command; it will populate the `/dataset` folder with TSV (Tab Separated Value) files:

```bash
python dataset_generator.py
```
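After the script completes, the generated files can be inspected directly. Assuming the folder is relative to the repository root:

```bash
ls dataset/               # list the generated TSV files
head -n 5 dataset/*.tsv   # peek at the first rows of each file
```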
In order to simulate the use of the application by external users, the script `traffic_generator.py` can be used. The basic command, which runs with default parameters, is:

```bash
python traffic_generator.py
```
Some parameters can be used to control the behavior of the simulated users; a combined example follows the list:
- `--config <path>` is a path to a configuration file. A configuration file is a `.tsv` (Tab Separated Value) file that contains all the parameters for the `UserData` and `UserLabeller` behavior. See the files `config/user.tsv` and `config/user_noise.tsv` for some examples.
- `-p` is the number of parallel threads to run. Each thread will contact the application independently.
- `-d` is the probability of having a response. If set to 1.0, there will always be a response; if set to 0.0, the user will never send a response.
- `-tmin` and `-tmax` control the waiting time, expressed in seconds (for less than a second use decimals, e.g. 100 ms is written as 0.1). `-tmin` is the minimum amount of time to wait after a request to the application, `-tmax` the maximum. The wait is chosen at random between the `-tmin` and `-tmax` values. Higher values mean a slower generation of new data; the bigger the difference between these two parameters, the higher the variance in the waiting time.
To develop this application, a Python virtual environment is highly recommended. If a development machine with Docker is not available, it is possible to use the three requirements files to create a fully working environment:

- `requirements.api.txt` contains all the packages for the API service,
- `requirements.worker.txt` contains all the packages for the Celery worker service,
- `requirements.txt` contains extra packages and utilities required by the scripts or for development.
To create a virtual environment using the `venv` module, use the following command:

```bash
python -m venv MLPenv
```
Then remember to activate the environment before launching the scripts:

```bash
source ./MLPenv/bin/activate
```
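Once the environment is active, the requirements files listed above can be installed with pip, for example:

```bash
pip install -r requirements.txt          # development extras and script utilities
pip install -r requirements.api.txt     # only needed to run the API service locally
pip install -r requirements.worker.txt  # only needed to run the Celery worker locally
```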
This software was built as a proof-of-concept and as support material for the course Machine Learning in Production.
It is not intended to be used in a real production system, although some state-of-the-art best practices have been followed in its implementation.