
Development Environment

Theo edited this page Jul 24, 2020 · 23 revisions

This page aims to familiarize new contributors with important details about how the project has been developed so far. Various platforms, services, and tools were used, including Docker, Airflow, and AWS. This page focuses on setting up and using Airflow via Docker containerization.

Note: The production environment uses an Airflow cluster with numerous nodes, each with different responsibilities, whereas the development environments do not. This is an important discrepancy to keep in mind.

Directory Tree

The production directory tree:

usr
    local
        airflow
            dags
                efs
                    admintools
                        dags
                        scripts
                        sql
                        resources
                    redb
                        dags
                        scripts
                        sql
                        resources
                    211dashboard
                        dags
                        scripts
                        sql
                        resources
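For reference, the production tree above can be recreated locally with a few lines of shell. This is only a sketch: the project and subdirectory names come from the tree above, while the local root is arbitrary.

```shell
# Sketch: recreate the production directory layout under a local root.
# ROOT here is a relative path for illustration; production uses /usr/local/airflow/dags/efs.
ROOT=usr/local/airflow/dags/efs
for project in admintools redb 211dashboard; do
    mkdir -p "$ROOT/$project"/{dags,scripts,sql,resources}
done
```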

Project files on the local Windows machine:
c:/Users/tim0416/Projects/Regional-Data-Exchange/REDB-Workflows

Project files inside the Ubuntu container or on EC2:
/usr/local/airflow/dags/efs/redb

Below, you will find further details on how Docker and Airflow have been set up and used.

Docker / Airflow

The link below will take you to a tutorial on Airflow in Docker. Our setup follows these steps closely:
Getting Started with Airflow using Docker

In summary, running the command below creates an Airflow container in your Docker environment in which all of the contents of your local directory (where you ran the command) are reflected inside the container (at the path specified after the colon):

docker run -d -p 8080:8080 -u root -v {local directory}:{directory inside container} --name webserver puckel/docker-airflow:latest webserver

Check out Docker's "volume" documentation for more details: https://docs.docker.com/storage/volumes/

Note: We were using Docker Engine v19.03.8 on a Windows 10 Pro machine.
Note: If running your Docker commands from GitBash, you may experience path issues with the above command, because GitBash interprets Windows paths in its own way. The workaround we found is to reference your local machine in the above Docker command with a path in the following format: c:/Users/Username/Directory/Regional-Data-Exchange/REDB (the key points being the lowercase "c", the ":" immediately after the drive letter, and forward slashes as opposed to backslashes).
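The path rewrite described in the note above can be sketched in shell. The `win_path` value here is a hypothetical example, not an actual project path:

```shell
# Hypothetical Windows path (single quotes keep the backslashes literal)
win_path='C:\Users\Username\Projects\Regional-Data-Exchange\REDB'

# Lowercase the drive letter, then turn every backslash into a forward slash
drive=$(printf '%s' "${win_path:0:1}" | tr 'A-Z' 'a-z')
unix_path="$drive${win_path:1}"
unix_path=${unix_path//\\//}

echo "$unix_path"   # c:/Users/Username/Projects/Regional-Data-Exchange/REDB
```

The resulting path can then be used as the `{local directory}` half of the `-v` flag in the docker run command above.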

Relaunching Airflow after shutdown

A new fernet key is generated each time your Airflow container is restarted. This renders anything that was encrypted using the previous fernet key impossible to decrypt. To remedy this, you'll have to recreate anything that used the previous fernet key, a prime example being any Airflow "connections".
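One way to sidestep the problem (a sketch, not part of the original setup) is to generate a fernet key once and pass it into the container via the FERNET_KEY environment variable, which the puckel image reads at startup instead of generating a fresh key; verify this against your image version. A fernet key is simply 32 random bytes, URL-safe base64-encoded, so the Python standard library is enough to produce one:

```shell
# Generate a fernet key once: 32 random bytes, URL-safe base64-encoded
# (pure Python stdlib, no extra packages needed).
FERNET_KEY=$(python3 -c "import os, base64; print(base64.urlsafe_b64encode(os.urandom(32)).decode())")
echo "$FERNET_KEY"

# Reuse the same key on every (re)launch so previously encrypted values,
# such as Airflow connections, remain readable. For example:
#   docker run -d -p 8080:8080 -u root \
#       -e FERNET_KEY="$FERNET_KEY" \
#       -v {local directory}:{directory inside container} \
#       --name webserver puckel/docker-airflow:latest webserver
```

Store the generated key somewhere safe (it is a secret), and supply the same value to every environment that must read the same encrypted connections.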
