Results
We provided a turnkey solution based on a combination of cloud products. Using Apache Airflow, we enabled arbitrary modularity in the logic of each data/ML pipeline. The platform provides several additional benefits when it comes to scalability, resource allocation and scheduling of tasks.
The entire solution runs on a managed Kubernetes cluster. Apart from the fact that this is a great way to host Apache Airflow, it provides additional benefits. During the execution of some specialised machine learning workflows we use dedicated PODs that are essentially fully customised containers fine-tune for executing specific algorithms.
Finally we provided a CI/CD solution for streamlining the delivery of new versions of the algorithms via a combination of Git webhooks and automated building and deployment of new versions of the docker images that are used by the PODs.