Talk: Industrial Machine Learning

Building scalable distributed machine learning pipelines with Python

# Industrial Machine Learning
This talk will provide a practical understanding of how to build industry-ready machine learning models in Python using distributed, horizontally scalable architectures. It will also go into detail on the motivations for such architectures, the technologies required, best practices, caveats and practical use cases in industry. We will walk through a practical implementation of a distributed machine learning pipeline that produces predictions for the most popular cryptocurrencies. It uses Celery (with RabbitMQ) for the distributed processing, and Docker and Kubernetes to manage the scalable infrastructure on AWS.

# Why
Industry-ready machine learning systems have to be bullet-proof. Some of the biggest challenges in machine learning include heavy RAM usage, a varied library of machine learning models, heavy computation, security, DevOps complexity, scalability and deployment, among many others. Most large-scale projects, startups and companies run into these challenges as they develop and expand their machine learning capabilities. It is important to understand the key challenges, along with the best practices, reliable frameworks and tips and tricks that address them.

# How
There are multiple ways to address the challenges that a fast-growing project, startup or company will face on its journey. Luckily, several open source technologies address them. Python, of course, comes with a massive library of machine learning toolboxes. These let us benefit from the most brilliant minds contributing top-performing algorithms: SciPy, scikit-learn, TensorFlow, NumPy and pandas are but a few key tools in your machine learning toolbox. For distributed computing, Celery is a strong contender. It makes it easy to build a manager/worker architecture, with RabbitMQ acting as the message broker. Docker lets us containerise our applications so they can be deployed in a consistent environment. Finally, Kubernetes lets us manage our DevOps infrastructure, with much of the hard work handled automatically.
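
The manager/worker pattern described above can be sketched without a running broker. This minimal sketch assumes nothing beyond the Python standard library. A `queue.Queue` stands in for the RabbitMQ queue, and a few threads stand in for Celery workers. In a real deployment, each worker would be a separate Celery process, possibly on another machine. The task `moving_average_forecast`, the `window` parameter and the sample prices are all hypothetical.

```python
import queue
import threading
from typing import Dict, List


# Hypothetical task: forecast the next price as the mean of the last `window` prices.
def moving_average_forecast(prices: List[float], window: int = 3) -> float:
    recent = prices[-window:]
    return sum(recent) / len(recent)


def worker(tasks: queue.Queue, results: Dict[str, float], lock: threading.Lock) -> None:
    # Each worker pulls tasks until it receives the None sentinel,
    # much as a Celery worker consumes messages from a broker queue.
    while True:
        item = tasks.get()
        if item is None:
            break
        coin, prices = item
        forecast = moving_average_forecast(prices)
        with lock:  # protect the shared "result backend"
            results[coin] = forecast


def run_pipeline(data: Dict[str, List[float]], n_workers: int = 4) -> Dict[str, float]:
    tasks: queue.Queue = queue.Queue()
    results: Dict[str, float] = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(tasks, results, lock))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    # The "manager" publishes one task per coin.
    for coin, prices in data.items():
        tasks.put((coin, prices))
    # One sentinel per worker tells it to shut down once the queue drains.
    for _ in threads:
        tasks.put(None)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Made-up sample prices, for illustration only.
    sample = {"BTC": [100.0, 102.0, 104.0, 106.0], "ETH": [10.0, 11.0, 12.0]}
    print(sorted(run_pipeline(sample).items()))  # [('BTC', 104.0), ('ETH', 11.0)]
```

Swapping the queue for RabbitMQ and the threads for Celery worker processes keeps the same shape: the manager only publishes tasks, and the workers can be scaled out independently.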

# Crypto
In this talk I will go into detail on an example that uses the price data of the 100 most-traded cryptocurrencies. After covering the theoretical reasoning behind the importance of the tools above, I will give a hands-on deep dive into building a surprisingly simple but impactful program. The program performs ML predictions for these cryptocurrencies with multiple models, across a distributed architecture spanning multiple AWS nodes managed by Kubernetes.
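
Running several models per coin in parallel can be sketched with the standard library's `concurrent.futures`. This is only a local stand-in. A `ThreadPoolExecutor` plays the role of the distributed workers that Celery would run on Kubernetes-managed AWS nodes. The three toy forecasters (`last_value`, `mean_forecast`, `linear_trend`) and the `forecast_all` helper are hypothetical placeholders for real models.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Model = Callable[[List[float]], float]


def last_value(prices: List[float]) -> float:
    # Naive forecast: the next price equals the latest one.
    return prices[-1]


def mean_forecast(prices: List[float]) -> float:
    # Forecast the historical mean.
    return sum(prices) / len(prices)


def linear_trend(prices: List[float]) -> float:
    # Fit y = a + b*t by ordinary least squares and extrapolate one step ahead.
    n = len(prices)
    if n < 2:
        return prices[-1]
    ts = range(n)
    t_mean = (n - 1) / 2
    y_mean = sum(prices) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, prices))
    var = sum((t - t_mean) ** 2 for t in ts)
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return intercept + slope * n


MODELS: Dict[str, Model] = {
    "last_value": last_value,
    "mean": mean_forecast,
    "linear_trend": linear_trend,
}


def forecast_all(
    data: Dict[str, List[float]], max_workers: int = 4
) -> Dict[Tuple[str, str], float]:
    # Submit one job per (coin, model) pair, mirroring one Celery task per pair,
    # then gather all results (fan-out / fan-in).
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            (coin, name): pool.submit(model, prices)
            for coin, prices in data.items()
            for name, model in MODELS.items()
        }
        return {key: fut.result() for key, fut in futures.items()}


if __name__ == "__main__":
    # Made-up sample prices, for illustration only.
    results = forecast_all({"BTC": [100.0, 102.0, 104.0, 106.0]})
    for (coin, model), value in sorted(results.items()):
        print(coin, model, value)
    # BTC last_value 106.0
    # BTC linear_trend 108.0
    # BTC mean 103.0
```

Each model is a plain function of a price list. That keeps it independent of the execution backend, so the same function could later be wrapped as a Celery task without changing its logic.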