Lecture: Reproducible data science using virtual environments
Günther Doppelbauer, Clemens Zauchner (Data Scientists @ The unbelievable Machine Company)
One of the pillars of science is reproducibility. When a paper is published, peers need to be able to review and reproduce the research, and they should come to the same conclusions. Data science can be defined as the application of the scientific method to business data. Regardless of the circumstances, if you answer the same question using the same techniques and data, your answer shouldn’t change over time. There are a number of ways to achieve reproducibility: some are organisational, others are more technical. In this talk, we present why Python’s virtual environments are useful to that end, how to set them up, and when to use the various containerisation and environment tools to foster reproducibility.
The talk is aimed at data scientists, data engineers, or developers who know data science workflows.
Type of talk
Blend of theory and hands-on.
The talk will give an overview of the topics and include a hands-on demonstration of working with virtual environments.
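To give a rough idea of what setting up a virtual environment can look like, here is a minimal sketch that assumes Python 3 and the standard-library venv module. The helper names create_env and env_prefix are hypothetical and chosen only for illustration; this is not necessarily the workflow demonstrated in the talk.

```python
import os
import subprocess
import tempfile
import venv
from pathlib import Path


def create_env(env_dir: Path) -> Path:
    """Create a virtual environment in env_dir and return its interpreter path.

    Hypothetical helper for illustration only.
    """
    # with_pip=False keeps the sketch fast and offline; a real project would
    # usually enable pip so that pinned dependencies can be installed.
    builder = venv.EnvBuilder(with_pip=False, clear=True, symlinks=(os.name != "nt"))
    builder.create(env_dir)
    # The interpreter lives in "Scripts" on Windows and in "bin" elsewhere.
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def env_prefix(python: Path) -> str:
    """Ask the given interpreter which environment it considers itself part of."""
    result = subprocess.run(
        [str(python), "-c", "import sys; print(sys.prefix)"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "demo-env"
        python = create_env(env_dir)
        # The interpreter inside the environment reports the environment
        # directory as its prefix, not the system-wide installation.
        print("environment prefix:", env_prefix(python))
```

Because the environment is an ordinary directory, it can be deleted and recreated at any time. What makes an analysis reproducible is recording the exact package versions installed into it, for example in a requirements file kept under version control.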