Lecture: Analyzing GitHub, how developers change programming languages over time
Have you ever struggled with yet another obscure project, thinking: “I could do the job with this language, but why not switch to another one that would be more enjoyable to work with?” Based on GitHub repositories, it is possible to build a transition matrix between languages by solving a flow optimization problem. The results reflect the history of programming language competition in the open source world.
This project starts with Erik Bernhardsson's blog post, The eigenvector of "Why we moved from language X to language Y", where the data scientist generated an N×N contingency table of all Google queries related to changing languages. However, what proportion of people actually switched? It is possible to take this idea further and see how the popularity of languages changes among GitHub users.
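To make the "eigenvector of a contingency table" idea concrete, here is a minimal sketch in plain Python. It assumes one common reading of the approach: each row of the table is normalized into switching probabilities, and the principal eigenvector (the stationary distribution) is found by power iteration. The language names, the counts, and the function name `stationary_distribution` are made up for illustration and are not taken from the blog post or from the GitHub dataset.

```python
def stationary_distribution(counts, iterations=200):
    """Approximate the stationary distribution of a row-normalized
    contingency table using power iteration."""
    n = len(counts)
    # Row-normalize: P[i][j] is the share of "moved from i" entries that went to j.
    P = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("every language needs at least one outgoing transition")
        P.append([c / total for c in row])
    # Start from a uniform distribution and repeatedly apply v <- v P.
    v = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
    return v


# Hypothetical data: counts[i][j] = number of "moved from i to j" mentions.
languages = ["A", "B", "C"]
counts = [
    [0, 8, 2],
    [1, 0, 4],
    [3, 3, 0],
]

scores = stationary_distribution(counts)
for name, score in sorted(zip(languages, scores), key=lambda pair: -pair[1]):
    print(f"{name}: {score:.3f}")
```

A higher score means that, under this toy model, more of the long-run "flow" of switchers ends up at that language. Power iteration is used here only to keep the sketch dependency-free; a linear algebra library would compute the same eigenvector directly.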
The open dataset used for this project includes the metadata of:
- 4.5 million GitHub users
- 393 different languages
- 10 TB of source code in total