Guide to Installing and Configuring Apache Airflow 3.2.0 with PostgreSQL and Running Your First DAG
Introduction
As a data engineer, you may have recently learned what Apache Airflow is and how it orchestrates and automates data workflows.
The next step is gaining hands‑on experience by setting it up in your own environment.
This article provides a step‑by‑step guide to:
- Installing and configuring Apache Airflow
- Connecting it to PostgreSQL
- Running your first DAG
By the end, you will have a fully functional Airflow environment ready for building and managing data pipelines.
We will follow the official installation guide.
Prerequisites
- A Linux environment (e.g., a Linux VPS)
- Python 3 installed
sudo apt install python-is-python3 # makes `python` point to Python 3
1. Set the Airflow home directory
Airflow uses ~/airflow by default, but you can choose another location.
Set the environment variable before installing Airflow:
export AIRFLOW_HOME=~/airflow
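To make this setting persist across shell sessions, you can also add it to your shell profile (a minimal sketch, assuming bash):
echo 'export AIRFLOW_HOME=~/airflow' >> ~/.bashrc  # persist AIRFLOW_HOME for future shells
source ~/.bashrc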
2. Create a project folder and virtual environment
cd ~ # go to your home directory
mkdir airflow && cd airflow
python -m venv airflow_venv
source airflow_venv/bin/activate
Upgrade pip:
pip install --upgrade pip
3. Install Apache Airflow
Specify the Airflow version you want (example: 3.2.0) and the constraints file that matches your Python version (check with python --version; the example below assumes Python 3.12):
pip install "apache-airflow[celery]==3.2.0" \
--constraint https://raw.githubusercontent.com/apache/airflow/constraints-3.2.0/constraints-3.12.txt
Wait for the installation to complete, then verify:
airflow version
4. Starting Airflow
4.1 Using Airflow Standalone (quick start)
airflow standalone
This command starts all Airflow components in a single process, but it stays attached to the terminal and streams its logs there.
Run it in the background and redirect logs:
nohup airflow standalone > airflow.log 2>&1 &
Check the processes:
ps aux | grep airflow
Open the web UI at http://<your-ip>:8080 (e.g., http://102.209.32.65:8080).
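The standalone command creates an admin user and prints the generated credentials in its startup output. Since we redirected that output to airflow.log, you can look them up there:
grep -i password airflow.log  # the generated admin password appears in the standalone startup output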
4.2 Running components manually
If you prefer to start each service yourself:
airflow db migrate
airflow users create \
--username admin \
--firstname Peter \
--lastname Parker \
--role Admin \
--email spiderman@superhero.org
airflow api-server --port 8080
airflow scheduler
airflow dag-processor
airflow triggerer
Note: In Airflow 3+ the airflow users create command is only available when the Flask‑AppBuilder (FAB) auth manager is enabled.
Enable FAB auth manager
Edit airflow.cfg (default location: $AIRFLOW_HOME/airflow.cfg):
nano $AIRFLOW_HOME/airflow.cfg
# in the [core] section, add or ensure:
auth_manager = airflow.providers.fab.auth_manager.fab_auth_manager.FABAuthManager
If you encounter
ModuleNotFoundError: No module named 'airflow.providers.fab'
install the missing provider:
pip install apache-airflow-providers-fab
Run the migration again:
airflow db migrate
Create the admin user (if not already done) and start the services in the background:
nohup airflow api-server --port 8080 > api-server.log 2>&1 &
nohup airflow scheduler > scheduler.log 2>&1 &
nohup airflow dag-processor > dag-processor.log 2>&1 &
nohup airflow triggerer > triggerer.log 2>&1 &
Airflow is now reachable via the browser UI.
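To double-check from the shell that the API server is answering on port 8080 (assuming curl is installed):
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080  # a 200 or 302 response means the UI is being served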
5. Adjusting Airflow configuration
Before editing, stop any running Airflow processes:
pkill -9 airflow
Open the configuration file:
nano $AIRFLOW_HOME/airflow.cfg
Common changes (optional)
| Setting | Desired value | Comment |
|---|---|---|
| dags_folder | /root/workflows | Location where you will store your DAG files |
| default_timezone | your/local/timezone (e.g., Europe/Paris) | Align timestamps with your region |
| executor | LocalExecutor | Use when running locally and you need parallel tasks |
| sql_alchemy_conn | postgresql+psycopg2://user:password@localhost:5432/airflowdb | Point to an external PostgreSQL instance |
| load_examples | False | Disable the example DAGs that ship with Airflow |
Save (Ctrl+S) and exit (Ctrl+X).
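The sql_alchemy_conn setting lives under the [database] section of airflow.cfg and assumes the target database and role already exist. A minimal sketch of creating them with psql, assuming a local PostgreSQL server is installed and running (airflow_user, airflow_pass, and airflowdb are placeholder names; replace them with your own and mirror them in the connection string):
sudo -u postgres psql -c "CREATE USER airflow_user WITH PASSWORD 'airflow_pass';"  # role Airflow will connect as
sudo -u postgres psql -c "CREATE DATABASE airflowdb OWNER airflow_user;"           # metadata database owned by that role
# the connection string then becomes:
# sql_alchemy_conn = postgresql+psycopg2://airflow_user:airflow_pass@localhost:5432/airflowdb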
6. Install database drivers
Inside the activated virtual environment:
pip install psycopg2-binary # PostgreSQL driver
pip install asyncpg # Async PostgreSQL driver (optional but recommended)
Run the migration to create Airflow tables in the new database:
airflow db migrate
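You can then confirm that Airflow actually reaches the new database before starting any services (airflow db check simply tests the connection configured in sql_alchemy_conn):
airflow db check  # logs a confirmation if the database is reachable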
7. Add your first DAG
Create the directory you pointed dags_folder to (e.g., /root/workflows) and add a simple DAG:
mkdir -p /root/workflows
cd /root/workflows
nano simple.py
Paste the following Python code and save the file:
from airflow import DAG
from datetime import datetime, timedelta
from airflow.providers.standard.operators.python import PythonOperator


def say_hello():
    print("Hello from Airflow!")


default_args = {
    "owner": "airflow",
    "depends_on_past": False,
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="hello_world",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 3 uses `schedule` instead of the removed `schedule_interval`
    default_args=default_args,
    catchup=False,
) as dag:
    hello_task = PythonOperator(
        task_id="say_hello",
        python_callable=say_hello,
    )
After saving, refresh the Airflow UI – the hello_world DAG should appear and be ready to run.
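If you prefer the command line, you can list, unpause, and trigger the DAG with the standard Airflow CLI (new DAGs are paused by default):
airflow dags list                 # confirm hello_world was picked up
airflow dags unpause hello_world  # new DAGs start out paused
airflow dags trigger hello_world  # queue a manual run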
🎉 You now have a fully functional Apache Airflow installation, connected to PostgreSQL, with your first DAG up and running! 🎉
Feel free to explore more complex DAGs, integrate additional providers, and scale your executor as needed. Happy data engineering!
Simple Airflow DAG Example
from datetime import datetime, timedelta
from airflow import DAG
from airflow.providers.standard.operators.python import PythonOperator


def say_hello():
    print("Hello from Airflow!")


def say_goodbye():
    print("Goodbye from Airflow!")


with DAG(
    dag_id="simple_dag",
    start_date=datetime(2026, 1, 1),  # runs begin only after this date has passed
    schedule=timedelta(minutes=5),  # `schedule` replaces `schedule_interval` in Airflow 3
    catchup=False,
) as dag:
    hello_task = PythonOperator(
        task_id="hi",
        python_callable=say_hello,
    )
    goodbye_task = PythonOperator(
        task_id="bye",
        python_callable=say_goodbye,
    )

    hello_task >> goodbye_task
Note: Airflow automatically picks up the DAG and loads it. DAGs are listed in the DAGs section of the UI.
Viewing the DAG
- Click on the DAG name to see its details, including run history and success/failure status.
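For quick debugging you can also run a single task from the shell without involving the scheduler; airflow tasks test executes the task in-process and prints its logs (the final argument is a logical date of your choosing):
airflow tasks test simple_dag hi 2026-01-01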
Recap
In this article you have:
- Successfully installed and configured Apache Airflow 3.2.0.
- Connected Airflow to a PostgreSQL backend.
- Explored two ways of launching Airflow:
- The simplified standalone approach.
- A production‑style setup with manually created users and individual Airflow services.
- Made essential configuration changes in the airflow.cfg file.
- Deployed your first DAG into the Airflow environment.
With this foundation you now have a functional orchestration platform capable of scheduling, monitoring, and managing data workflows. As you continue learning Airflow, you can dive into more advanced topics such as:
- Complex task dependencies
- Advanced scheduling strategies
- Integrations with cloud platforms
- Building production‑grade ETL and data‑engineering pipelines
Happy orchestrating!