Exploring and creating workflows in Apache Airflow
Apache Airflow is a Python-based workflow management system originally developed at Airbnb.
Workflows can be used to automate data pipelines and ETL processes. Airflow uses Directed Acyclic Graphs (DAGs) to define workflows.
A brief understanding of Airflow DAGs.
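A DAG is a collection of tasks plus the dependencies between them; the scheduler runs each task once its upstream tasks have finished. As an illustration, a minimal DAG could look like the sketch below. The DAG id, schedule, and task callables are made-up placeholders, and the import path shown is the older airflow.operators.python_operator location, so adjust for your Airflow version.

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def extract():
    # Placeholder task logic, for illustration only.
    print("extracting data")

def load():
    print("loading data")

default_args = {
    "owner": "airflow",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

# A DAG groups tasks and the dependencies between them.
dag = DAG(
    dag_id="example_etl",
    default_args=default_args,
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
)

extract_task = PythonOperator(task_id="extract", python_callable=extract, dag=dag)
load_task = PythonOperator(task_id="load", python_callable=load, dag=dag)

# "extract" must finish successfully before "load" runs.
extract_task >> load_task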
Note:
Further reading can be done here -
[1]: https://airflow.apache.org/docs/stable/index.html
[2]: Airflow use case from Lyft
[3]: Airflow Operators and Hooks
[4]: Snowflake connector
[5]: Connecting to Snowflake using Airflow
[6]: Airflow SubDAGs
[7]: Slack integration
Airflow can send metrics to StatsD.
StatsD can forward the data to a backend service for further visualisation and analysis (e.g. Datadog). StatsD is composed of three components - client, server and backend.
By default it sends metrics in UDP packets; if the metrics are critical, a TCP connection/client (recently added to StatsD) can be used instead so they are not silently dropped.
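As a rough sketch, enabling StatsD in Airflow comes down to a few settings in airflow.cfg. They are shown here under [scheduler], where they lived in older releases; newer releases move them to a [metrics] section, and the host, port and prefix values below are only example values.

[scheduler]
# Emit Airflow metrics to a StatsD server
statsd_on = True
statsd_host = localhost
statsd_port = 8125
statsd_prefix = airflow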
Useful commands:
To listen for StatsD traffic on port 8125 (add the -u flag to nc if the metrics are sent over UDP, the StatsD default):
while true; do nc -l localhost 8125; done
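On the sending side, a client can be as small as the sketch below, which uses the Python "statsd" package; the package choice, host, port and metric names are assumptions for illustration. The default client is the fire-and-forget UDP one, while TCPStatsClient covers the TCP case mentioned above.

import statsd

# Default client: metrics go out as UDP packets and may be dropped.
udp_client = statsd.StatsClient(host="localhost", port=8125, prefix="airflow")
udp_client.incr("task_success")              # counter
udp_client.timing("dag_run_duration", 320)   # timer, in milliseconds

# For metrics that must not be lost, use the TCP client instead.
tcp_client = statsd.TCPStatsClient(host="localhost", port=8125, prefix="airflow")
tcp_client.incr("critical_metric")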
Integrating Datadog with Airflow:
Datadog is a monitoring service. The Datadog Agent's DogStatsD daemon receives the StatsD metrics emitted by Airflow and forwards them to the Datadog cloud backend.
We can then use Datadog to visualise the metric data and run richer queries over it.
Setup -
Config and mapping files:
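The original config and mapping files are not reproduced here, but as a sketch of the idea: point Airflow's statsd_host/statsd_port at the Datadog Agent's DogStatsD port, and define a mapper profile in datadog.yaml so the flat StatsD metric names become tagged Datadog metrics. The profile below is illustrative only; the metric names and tags need to match your Airflow version.

# datadog.yaml (Datadog Agent) - illustrative fragment, not a complete config
use_dogstatsd: true
dogstatsd_port: 8125

dogstatsd_mapper_profiles:
  - name: airflow
    prefix: "airflow."
    mappings:
      # Turn e.g. airflow.dag_processing.last_runtime.my_dag_file into
      # airflow.dag_processing.last_runtime tagged with dag_file:my_dag_file
      - match: "airflow.dag_processing.last_runtime.*"
        name: "airflow.dag_processing.last_runtime"
        tags:
          dag_file: "$1"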
Further reading on StatsD -
[1]: Setup Metrics for Airflow using StatsD
[2]: https://thenewstack.io/collecting-metrics-using-statsd-a-standard-for-real-time-monitoring/
[3]: Python StatsD documentation
[4]: https://sysdig.com/blog/monitoring-statsd-metrics/
[5]: https://www.scalyr.com/blog/statsd-measure-anything-in-your-system/