Lecture: Teaching AI about Human Knowledge

Supervised learning is great — it’s data collection that’s broken

Most AI systems today rely on supervised learning: you provide labelled pairs of inputs and outputs, and get a program that can perform the analogous computation on new data. This enables an approach to software engineering that Andrej Karpathy has termed "Software 2.0": programming by example data. This is the machine learning revolution that is already here, and it needs to be carefully distinguished from more futuristic visions such as Artificial General Intelligence. If "Software 2.0" is driven by example data, how is that example data created, and how can we make that process better?
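As a rough illustration of what "programming by example data" means, here is a minimal sketch in plain Python. It is not taken from the talk, spaCy or Prodigy: the example texts, the labels and the word-count scoring are all assumptions chosen to keep the idea visible. The behaviour of `predict` is determined entirely by the labelled pairs passed to `train`, not by hand-written rules.

```python
from collections import Counter

# Hypothetical labelled examples: (input text, output label) pairs.
EXAMPLES = [
    ("great talk really enjoyed it", "positive"),
    ("loved the clear examples", "positive"),
    ("boring and far too long", "negative"),
    ("confusing slides and bad audio", "negative"),
]


def train(examples):
    """Count how often each word appears under each label."""
    counts = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(text.lower().split())
    return counts


def predict(counts, text):
    """Return the label whose training words overlap most with the text."""
    words = text.lower().split()
    scores = {label: sum(c[w] for w in words) for label, c in counts.items()}
    return max(scores, key=scores.get)


model = train(EXAMPLES)
prediction = predict(model, "enjoyed the examples")
print(prediction)
```

Changing what this toy program does means changing `EXAMPLES` rather than editing `predict`, which is why the quality of the labelled data matters so much.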

The talk is suitable for everyone with an interest in AI technology, from industry professionals to AI-curious Python developers with little or no previous machine learning experience. The ideas in this talk were developed while working on spaCy, our open-source library for Natural Language Processing in Python, and on Prodigy, our new data annotation tool.