Data Engineering/Machine Learning Pipelines for Structural Health Monitoring
API / Business Logic / Cloud / HPC / Software Development
Our client needed to implement complex data engineering flows and machine-learning-based predictions. Because the underlying product is bleeding-edge technology in its field, it was important to provide a streamlined way to update and modify the deployed workflows on a regular basis while they execute around the clock on a large pool of data.
We provided a turnkey solution based on a combination of cloud products. Using Apache Airflow, we enabled arbitrary modularity in the logic of each data/ML pipeline. The platform also brings additional benefits in scalability, resource allocation, and task scheduling.
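As an illustration of the modularity idea (a sketch of the concept, not Airflow's own API, with hypothetical task names), each pipeline can be seen as a dependency graph of small, swappable steps that the scheduler resolves into an execution order:

```python
from graphlib import TopologicalSorter  # Python 3.9+ standard library

# Hypothetical tasks for a monitoring pipeline; each value lists
# the tasks that must complete before it can run.
pipeline = {
    "ingest_sensor_data": set(),
    "clean_and_resample": {"ingest_sensor_data"},
    "extract_features": {"clean_and_resample"},
    "run_ml_prediction": {"extract_features"},
    "publish_report": {"run_ml_prediction"},
}

# Resolve a valid execution order that respects the dependencies,
# analogous to how Airflow schedules the tasks inside a DAG.
order = list(TopologicalSorter(pipeline).static_order())
print(order)
```

Because each step only declares what it depends on, a single stage can be replaced or extended without touching the rest of the pipeline, which is what made frequent updates practical.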
The entire solution runs on a managed Kubernetes cluster. Apart from being a great way to host Apache Airflow, this brings additional benefits: during the execution of some specialised machine learning workflows we use dedicated pods, essentially fully customised containers fine-tuned for executing specific algorithms. Finally, we provided a CI/CD solution that streamlines the delivery of new algorithm versions via a combination of Git webhooks and automated building and deployment of the Docker images used by the pods.
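A minimal sketch (with a hypothetical shared secret) of the first step in such a webhook-driven flow: before a push event is allowed to trigger a new image build, the receiving service verifies the GitHub-style HMAC signature to confirm the event really came from the Git host.

```python
import hashlib
import hmac

# Hypothetical secret, configured both in the Git host's webhook
# settings and in the build service receiving the events.
SECRET = b"example-webhook-secret"

def signature_is_valid(payload: bytes, signature_header: str) -> bool:
    """Check a GitHub-style 'sha256=<hexdigest>' webhook signature."""
    expected = "sha256=" + hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, signature_header)

payload = b'{"ref": "refs/heads/main"}'
good = "sha256=" + hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
print(signature_is_valid(payload, good))          # True
print(signature_is_valid(payload, "sha256=bad"))  # False
```

Only after this check passes would the service kick off the automated Docker image build and roll the new version out to the cluster.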