Version 1.1

Workshop: Free Stuff For Devs

Use Images, Text, Webarchive and Catalogue Data from the Austrian National Library in Jupyter Notebooks


Do you want to analyse historical newspapers with Python? Does training your CNN on historical postcard images sound nifty to you? Do you want to search the Austrian Webarchive from the comfort of your home? We've got you covered!


We use prepared (and pre-shared) Jupyter Notebooks to illustrate:



  • The data the Austrian National Library has to offer (for free)

  • Which Python libraries make accessing and processing these data easier

  • Some example applications using these data within Jupyter


Participants can either follow the guided tour through some of the shared Notebooks together with the group, or work through the provided material at their own pace and ask questions as they arise.


We'll publish a requirements.txt and the selected Notebooks one week before the workshop, and the slides one day before the workshop, here:
https://labs.onb.ac.at/gitlab/labs-team/pydays19

Preliminary Rough Outline



  • Overview Workshop

  • Metadata & Catalogue

    • Overview data formats, container formats, protocols

    • Example SRU

    • Example data harvesting OAI-PMH

    • Example SPARQL



  • Images & Text

    • Overview IIIF

    • Overview OCR formats

    • Example download OCR text

    • Example download pre-resized images for machine learning

    • Example create IIIF collection from SPARQL query result



  • Webarchive

    • Overview Webarchive, API and content

    • Example Wayback search via API

    • Example full text search via API
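The catalogue and image parts of the outline rely on open standards (SRU, OAI-PMH, IIIF), so the shape of a request can be sketched without knowing the library's actual endpoints. The following is a minimal sketch, not the workshop material: the three base URLs are hypothetical placeholders (example.org), and the helper names are made up for illustration. Only the query parameters and the IIIF Image API path layout come from the respective specifications.

```python
from urllib.parse import quote, urlencode

# Hypothetical placeholder endpoints, NOT real Austrian National Library URLs.
SRU_ENDPOINT = "https://example.org/sru"
OAI_ENDPOINT = "https://example.org/oai"
IIIF_IMAGE_BASE = "https://example.org/iiif/image"


def sru_search_url(query, maximum_records=10, start_record=1,
                   endpoint=SRU_ENDPOINT):
    """Build an SRU 1.2 searchRetrieve request URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": query,
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    return endpoint + "?" + urlencode(params)


def oai_list_records_url(metadata_prefix="oai_dc", set_spec=None,
                         resumption_token=None, endpoint=OAI_ENDPOINT):
    """Build an OAI-PMH ListRecords request URL for harvesting."""
    if resumption_token is not None:
        # In OAI-PMH, resumptionToken is an exclusive argument:
        # it must not be combined with metadataPrefix or set.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return endpoint + "?" + urlencode(params)


def iiif_image_url(identifier, width=256, region="full", rotation=0,
                   quality="default", fmt="jpg", base=IIIF_IMAGE_BASE):
    """Build a IIIF Image API 2.x URL for an image scaled to a given width.

    Path layout: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    size = f"{width},"  # "w," scales to this width and keeps the aspect ratio
    return (f"{base}/{quote(identifier, safe='')}/{region}/{size}/"
            f"{rotation}/{quality}.{fmt}")


if __name__ == "__main__":
    print(sru_search_url("Wien", maximum_records=5))
    print(oai_list_records_url())
    print(iiif_image_url("id/1", width=128))
```

Requesting pre-resized images through the IIIF size parameter keeps downloads small, which is convenient when assembling a training set. The URLs above could then be fetched with any HTTP client once the placeholder endpoints are replaced by real ones.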



Requirements for Participants



  • Laptop

  • Connectivity

  • Python 3

  • Working Jupyter Notebook installation
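The software requirements above can be checked ahead of time with a few lines of Python. This is only a sketch: the function name is made up for illustration, and it assumes that a Jupyter installation is importable as either the `notebook` or the `jupyterlab` package.

```python
import importlib.util
import sys


def check_environment():
    """Return a dict saying whether the basic workshop requirements are met."""
    return {
        # True when the running interpreter is Python 3 or newer.
        "python3": sys.version_info >= (3,),
        # Assumption: Jupyter is installed as "notebook" or "jupyterlab".
        "jupyter": any(
            importlib.util.find_spec(name) is not None
            for name in ("notebook", "jupyterlab")
        ),
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```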

Material


We'll publish a requirements.txt and the selected Notebooks one week before the workshop, and the slides one day before the workshop, here:
https://labs.onb.ac.at/gitlab/labs-team/pydays2019

Language


Slides and Notebooks are in English. The workshop is held in English (or in German, if all participants prefer that).

Presenters



Georg Petz is the senior software developer of the Austrian National Library's R&D department. Stefan Karner is the software developer of the ONB Labs project.

Links


https://labs.onb.ac.at

Info

Date: 03.05.2019
Start: 10:00
Duration: 02:00
Room: F4.07
Track: PyDays Workshops
Language: en
