Lecture: How To: Big Data Pipeline
Using Open Source Software and Exclusively European Servers
Big Data is not only a buzzword; it is also a reality for many companies nowadays, from predicting customer behaviour to analysing log events. In practice, Big Data usually means having the right pipeline, and that pipeline tends to involve costly subscriptions to tools and cloud services where the data resides on US servers. This talk shows an example architecture for a Big Data pipeline that runs on open source tools and European-only servers.
A Big Data pipeline recipe requires the following ingredients:
- actual Big Data
- a data warehouse
- ETL tools
- analysis & visualisation tools
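
As a rough illustration of how these four ingredients fit together, here is a minimal sketch in Python. It is not the architecture presented in the talk. The sample log lines, the `events` table, and all function names are hypothetical, and an in-memory SQLite database stands in for a real data warehouse purely so the example runs without any infrastructure.

```python
import sqlite3

# Hypothetical raw data: a few log events standing in for "actual Big Data".
RAW_EVENTS = [
    "2024-01-01T10:00:00 INFO user=1 login",
    "2024-01-01T10:00:05 ERROR user=2 payment_failed",
    "2024-01-01T10:01:00 INFO user=2 login",
]


def extract():
    """Extract: read raw events from the source (here, a hard-coded list)."""
    return list(RAW_EVENTS)


def transform(lines):
    """Transform: parse each raw line into a (ts, level, user_id, action) row."""
    rows = []
    for line in lines:
        ts, level, user, action = line.split(" ", 3)
        rows.append((ts, level, int(user.split("=")[1]), action))
    return rows


def load(conn, rows):
    """Load: write the parsed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events "
        "(ts TEXT, level TEXT, user_id INTEGER, action TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def analyse(conn):
    """Analyse: count events per log level, ready for a chart or report."""
    cur = conn.execute(
        "SELECT level, COUNT(*) FROM events GROUP BY level ORDER BY level"
    )
    return cur.fetchall()


def run_pipeline():
    # SQLite in memory is an assumption made for this sketch only.
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract()))
    return analyse(conn)


if __name__ == "__main__":
    print(run_pipeline())
```

In a real setup each function would be replaced by a dedicated tool, but the order of the stages (extract, transform, load, analyse) stays the same.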
I will bring examples from projects I have worked on in recent years, as well as tips on how to design the data models and infrastructure needed. The example architecture features exclusively open source software running on either home-grown clouds or European cloud services instead of AWS and Google Cloud, including options for dealing with (and in some parts embracing) their limitations.