Airflow - Daemonic processes are not allowed to have children
I am using the `docker-compose.yml` provided by Airflow to run Airflow in Docker. It is a fairly complete setup with the CeleryExecutor, so you are not limited to one DAG run at a time.
Most of our data pipeline scripts use Python's `multiprocessing` to take advantage of the billion cores we have on our servers. I realized the `docker-compose.yml` provided by Airflow doesn't really work with the `ProcessPoolExecutor` I am using from `concurrent.futures` (which is really just a wrapper around the `multiprocessing` library). This is because the default celery worker uses processes in the same way as `ProcessPoolExecutor` does.
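To make the failure concrete, here is a small, self-contained simulation. It is not code from our pipelines, just a sketch with made-up names (`transform`, `pipeline_task`) that mimics what happens when a task using `ProcessPoolExecutor` runs inside a daemonized worker process:

```python
from concurrent.futures import ProcessPoolExecutor
import multiprocessing

def transform(row):
    # Stand-in for the CPU-bound work a real pipeline would do per record.
    return row * 2

def pipeline_task():
    # Under a prefork celery worker, this code already runs inside a daemonized
    # child process, so spawning more processes here fails with
    # "AssertionError: daemonic processes are not allowed to have children".
    with ProcessPoolExecutor(max_workers=4) as pool:
        return sum(pool.map(transform, range(100)))

if __name__ == "__main__":
    # Simulate the worker by running the task inside a daemonized process.
    worker = multiprocessing.Process(target=pipeline_task, daemon=True)
    worker.start()
    worker.join()
```

Run directly (without the daemonized wrapper), `pipeline_task()` works fine; the error only shows up once the code runs inside the worker's child process.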
If you Google the problem, you will see two things:
- People suggesting to swap `multiprocessing` with `billiard`: https://github.com/apache/airflow/issues/14896#issuecomment-908516288
- And people trying the celery worker's `-P threads` option: https://github.com/apache/airflow/issues/14896#issuecomment-866768004
I like the second one more, since the first one means depending on a package I don't know.
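For completeness, the first suggestion boils down to something like the sketch below. This is my reading of the linked comment, not code from it: `billiard` is Celery's fork of `multiprocessing` with essentially the same `Pool` API, and it reportedly drops the daemonic-children restriction. The names are illustrative.

```python
from billiard import Pool  # Celery's fork of multiprocessing

def transform(row):
    return row * 2

def pipeline_task():
    # Same parallel map as before, but through billiard's Pool, which is meant
    # to work even when called from a daemonized celery child process.
    pool = Pool(processes=4)
    try:
        return sum(pool.map(transform, range(100)))
    finally:
        pool.close()
        pool.join()
```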
In order to set the `-P threads` flag for your Airflow celery workers, add the following to your `docker-compose.yml`:

`AIRFLOW__CELERY__POOL: threads`
Do not try to add it to the `command:` of the `airflow-worker` service as I did, since that command becomes `airflow celery worker -P threads`, which is not valid.
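For reference, here is roughly where the variable ends up. In the official `docker-compose.yml` all Airflow services share one `environment:` block (the `&airflow-common-env` anchor), so that is the natural place for it; the excerpt below is abbreviated and only meant to show placement.

```yaml
x-airflow-common:
  &airflow-common
  # ...image, volumes, etc. unchanged...
  environment:
    &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    # Run celery workers with the threads pool instead of the default prefork
    # processes, so tasks are free to spawn their own child processes.
    AIRFLOW__CELERY__POOL: threads
```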
Problem solved!