Version 1.0

Lecture: Unicode

Or why py3k was necessary


Unicode is important. We want to enable people to use characters of all scripts within a single string of text. This is a tough challenge, and 30 years later we still run into issues. Python made a backwards-incompatible change to rework its Unicode model. In this talk, I want to discuss this model and raise awareness of Unicode issues.

Unicode was introduced to encode all scripts in modern use in a single 16-bit character model. However, the 16-bit limit soon proved too small, and today Unicode includes far more characters than those in modern use. UTF-8 is now the most popular Unicode character encoding and is used on the majority of websites.
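
To illustrate why 16 bits are not enough, here is a minimal Python 3 sketch. The snake character (U+1F40D) is only an example chosen for this illustration; any code point above U+FFFF would behave the same way.

```python
# A code point above U+FFFF cannot be stored in one 16-bit unit.
snake = "\U0001F40D"  # U+1F40D SNAKE, picked only as an example

code_point = ord(snake)            # the Unicode code point as an integer
utf8 = snake.encode("utf-8")       # variable-length encoding, 1 to 4 bytes
utf16 = snake.encode("utf-16-be")  # 16-bit units; needs a surrogate pair here

print(hex(code_point))  # 0x1f40d
print(len(snake))       # 1 (one code point in a Python 3 str)
print(utf8)             # b'\xf0\x9f\x90\x8d'
print(len(utf16))       # 4 (two 16-bit units)
```

UTF-8 encodes this character in four bytes, while plain ASCII text still takes one byte per character.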

With Python 2, Guido van Rossum recognized the importance of Unicode for including programmers of all cultures and scripts. However, because the API had to remain backwards compatible, the standard library was often troublesome to use with Unicode. Python 3 was a backwards-incompatible change in which the Unicode model was completely refactored. Today, Python can be proud of its very good Unicode support.
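
One visible result of the refactored model is that Python 3 keeps text (str) and encoded data (bytes) strictly apart. The following is a small sketch of that separation, not a summary of the talk itself; the sample word is an arbitrary choice.

```python
# In Python 3, str holds code points and bytes holds encoded data.
text = "Gr\u00fc\u00dfe"     # "Grüße", an arbitrary sample word
data = text.encode("utf-8")  # explicit step from text to bytes

print(len(text))  # 5 code points
print(len(data))  # 7 bytes, since ü and ß take two bytes each in UTF-8

# Mixing the two types is an error instead of an implicit conversion.
try:
    text + data
    mixed_ok = True
except TypeError:
    mixed_ok = False

round_trip = data.decode("utf-8")  # explicit step from bytes back to text
print(mixed_ok, round_trip == text)
```

Encoding and decoding have to be requested explicitly, so the place where bytes become text is always visible in the code.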

In this talk, I want to summarize Python's development and issues in the context of Unicode. I want to show common issues and possible solutions.


Day: 2018-05-05
Start time: 11:30
Duration: 00:30
Room: F0.01
Track: PyDays
Language: en


