How to install Apache Airflow

Apache Airflow

What is Airflow

Apache Airflow is an open-source, server-side workflow management platform for data engineering pipelines. It was started at Airbnb and is written in Python, and workflows are defined as Python scripts, following the principle of configuration as code. Airflow represents each workflow as a Directed Acyclic Graph (DAG) of tasks and uses it to orchestrate the order in which those tasks run.
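To make the DAG idea concrete, here is a minimal sketch in plain Python. It does not use Airflow's API. Each task lists the tasks it depends on, and a topological sort produces an order in which every task runs after its upstream tasks. The task names are hypothetical, and the sketch relies on the standard library's `graphlib` module, which requires Python 3.9 or later.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each key is a task, each value is the set of
# upstream tasks that must finish before that task can start.
pipeline = {
    "extract": set(),
    "transform": {"extract"},
    "validate": {"extract"},
    "load": {"transform", "validate"},
}


def execution_order(dag: dict[str, set[str]]) -> list[str]:
    """Return one valid run order; raises CycleError if the graph has a cycle."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(pipeline)
print(order)  # "extract" comes first and "load" comes last

# A graph with a cycle has no valid run order, which is why workflows must be acyclic.
try:
    execution_order({"a": {"b"}, "b": {"a"}})
    cycle_rejected = False
except CycleError:
    cycle_rejected = True
print("cycle rejected:", cycle_rejected)
```

The scheduler in Airflow does much more than this, including retries, schedules, and parallel execution. The ordering constraint it has to respect is the same one shown here.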

Steps to Install Apache Airflow

This tutorial is divided into 3 parts: installing VirtualBox, setting up the virtual machine, and installing Airflow.

Part 1: Download and Install VirtualBox

If you already have VirtualBox installed, skip to Part 2.

Go to the VirtualBox downloads page and download the package for your operating system: https://www.virtualbox.org/wiki/Downloads

Double-click the downloaded package and follow the installer's instructions.

Once it is installed, open VirtualBox. You should see the following window.

VirtualBox is ready

Part 2: Setting up the Virtual Machine

Download the VM file, AirflowVM.ova, from here.

The file is about 2.3 GB, so the download may take a while.

Double-click the downloaded file to import it into VirtualBox.

Import the Airflow VM in VirtualBox

If you get an error related to 'medium' after importing, you may need to uncheck the Import hard drives as VDI option and import the VM again.

Click Start and wait for the VM to boot until you see the following output.

Airflow VM has started

If you get a kernel driver error like the one below when starting the VM on macOS, follow these steps:

Open System Preferences > Security & Privacy (General tab).

Under Allow apps downloaded from, make sure App Store and identified developers is selected. Then restart the laptop, and the error should go away.

Kernel Driver Error -1908

Part 3: Install Apache Airflow

Download and install the VS Code editor. From the Extensions tab, install the Remote - SSH extension and use it to connect to the VM.

Open a terminal in VS Code and create a Python virtual environment. This keeps Airflow's packages isolated from the system-wide Python packages, so the installation cannot interfere with other Python software on the VM.

airflow@airflowvm: python3 -m venv sandbox

This creates a Python virtual environment named sandbox.

airflow@airflowvm: source sandbox/bin/activate

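This activates the sandbox. From now on, the prompt is prefixed with (sandbox), and python and pip refer to the copies inside the sandbox.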
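To confirm that the interpreter you are running comes from the sandbox and not from the system Python, you can compare `sys.prefix` with `sys.base_prefix`. Inside a venv, `sys.prefix` points at the venv directory and `sys.base_prefix` points at the base installation, so the two values differ. The sketch below uses only the standard library. The helper name `in_virtualenv` is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a venv such as sandbox."""
    # sys.base_prefix is the base Python installation. sys.prefix is the
    # venv directory when the interpreter was started from a venv.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("inside venv:", in_virtualenv())
    print("prefix:", sys.prefix)
```

After activation, running this with `python` should print `True` and a prefix ending in `sandbox`. Note that the check reports which interpreter is running, not whether the shell was activated.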

(sandbox) airflow@airflowvm: pip install wheel

This installs the wheel package, which pip uses to build wheels for packages that are distributed only as source.

Now install Apache Airflow (this tutorial uses version 2.1.0):

(sandbox) airflow@airflowvm:~$ pip install apache-airflow==2.1.0 --constraint https://gist.githubusercontent.com/marclamberti/742efaef5b2d94f44666b0aec020be7c/raw/21c88601337250b6fd93f1adceb55282fb07b7ed/constraint.txt

Note that the --constraint option is important. It takes the location of a constraints file that pins the versions of Airflow's Python dependencies. Without it, pip installs the latest releases of those dependencies, which may not be compatible with Airflow 2.1.0.
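A constraints file is a plain list of `name==version` pins. pip still decides which packages Airflow needs, but it will not pick a version other than the pinned one. The sketch below includes a hypothetical helper that parses such pins. It also builds the URL of the official constraints file that the Airflow installation docs publish for each Airflow and Python version, which is an alternative to the gist used in the command above. The URL pattern comes from those docs, so check it there before relying on it. The sample pins are illustrative placeholders and are not the contents of any real constraints file.

```python
import sys
from typing import Dict, Optional


def official_constraint_url(airflow_version: str, python_version: Optional[str] = None) -> str:
    """Build the constraints URL following the pattern in Airflow's install docs."""
    if python_version is None:
        # Default to the major.minor version of the interpreter running this code.
        python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )


def parse_pins(text: str) -> Dict[str, str]:
    """Hypothetical helper: map package name to pinned version for 'name==version' lines."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins


# Illustrative placeholder pins, not taken from a real constraints file.
sample = """
# comment lines are ignored
package-a==1.2.3
package-b==4.5.6
"""

print(official_constraint_url("2.1.0", "3.8"))
# https://raw.githubusercontent.com/apache/airflow/constraints-2.1.0/constraints-3.8.txt
print(parse_pins(sample))
# {'package-a': '1.2.3', 'package-b': '4.5.6'}
```

On the VM, `python3 --version` tells you which Python version to pass as the second argument.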

If everything goes well, the installation finishes without errors and you are returned to the (sandbox) airflow@airflowvm: prompt.

This step may take about 10 minutes.
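Once pip finishes, you can confirm which Airflow version ended up in the sandbox with the standard library's `importlib.metadata`, which is available in Python 3.8 and later. Run the sketch below with the sandbox's `python`. The helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("apache-airflow")
    # With the pip command above, this is expected to report 2.1.0.
    print(found if found else "apache-airflow is not installed in this environment")
```

The `airflow version` command reports the same information from the shell.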

(sandbox) airflow@airflowvm: airflow db init

This initialises Airflow's metadata database and creates the files Airflow needs to run. If everything goes well, you should see an Initialization done message.
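By default, Airflow keeps its files in the directory named by the `AIRFLOW_HOME` environment variable and falls back to `~/airflow` when that variable is not set. With default settings, `airflow db init` creates the `airflow.cfg` configuration file and the `airflow.db` SQLite database there. The sketch below uses hypothetical helpers to report which of those files are missing. Treat the file list as an assumption about Airflow 2.1's defaults.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional

# Files assumed to be created by `airflow db init` with default settings.
EXPECTED_FILES = ("airflow.cfg", "airflow.db")


def airflow_home(env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve AIRFLOW_HOME from the environment, falling back to ~/airflow."""
    env = os.environ if env is None else env
    return Path(env.get("AIRFLOW_HOME", "~/airflow")).expanduser()


def missing_init_files(home: Path) -> List[str]:
    """List the expected files that are not present under `home`."""
    return [name for name in EXPECTED_FILES if not (home / name).is_file()]


if __name__ == "__main__":
    home = airflow_home()
    missing = missing_init_files(home)
    print(f"AIRFLOW_HOME = {home}")
    print("all expected files present" if not missing else f"missing: {missing}")
```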

Now start the Airflow web interface:

(sandbox) airflow@airflowvm: airflow webserver

Airflow sign-in page
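The webserver listens on port 8080 by default. From inside the VM, you can check that it is up by requesting its `/health` endpoint, which Airflow 2 exposes and which returns a small JSON status. The sketch below uses only the standard library. The host, port, and helper name `webserver_health` are assumptions, so adjust them if you changed the defaults.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Optional


def webserver_health(base_url: str = "http://localhost:8080", timeout: float = 5.0) -> Optional[Any]:
    """Return the parsed /health JSON, or None if the webserver cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # URLError/OSError: refused, unreachable, or timed out.
        # ValueError: the response body was not valid JSON.
        return None


if __name__ == "__main__":
    status = webserver_health()
    print(status if status is not None else "webserver not reachable on port 8080")
```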

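At this stage, you have successfully installed Airflow and are ready to create DAG workflows. To sign in, you first need a user account, which you can create with the airflow users create command.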


Engineer and Water Color Artist @toashishagarwal
