The ArchiMob Corpus

The ArchiMob corpus represents German linguistic varieties spoken within the territory of Switzerland. This corpus is the first electronic resource containing long samples of transcribed text in Swiss German, intended for studying the spatial distribution of morphosyntactic features and for natural language processing.

This corpus is available under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Release 2 (2019)

The second release of the ArchiMob corpus is now available, featuring:

  • Newly transcribed documents (9 more than in the first release)
  • Speech-to-text alignment at the level of utterance (4-10 seconds)
  • Improved normalisation
  • Improved part-of-speech tagging

 

You can find more information on new features of the corpus in the Release 2 notes.

Access 

 

Publications

Scherrer, Y., T. Samardžić, E. Glaser (2019). "Digitising Swiss German -- How to process and study a polycentric spoken language". Language Resources and Evaluation. (First online) 

Scherrer, Y., T. Samardžić, E. Glaser (2019). "ArchiMob: Ein multidialektales Korpus schweizerdeutscher Spontansprache". Linguistik Online 98(5), 425-454. https://doi.org/10.13092/lo.98.5947

 

Release 1 (2016)

Details of the corpus composition, formatting, and annotation can be found in the ArchiMob Release 1 Documentation.

Access

Publications

Samardžić, T., Y. Scherrer, E. Glaser (2016). "ArchiMob - A Corpus of Spoken Swiss German". In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). Portorož, Slovenia.

Samardžić, T., Y. Scherrer, E. Glaser (2015). "Normalising orthographic and dialectal variants for the automatic processing of Swiss German". In Proceedings of the 7th Language and Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics. Poznan, Poland.

DOI: https://doi.org/10.5281/zenodo.1158572

Map by Yves Scherrer

Further information

Kaldi for Swiss German

Trained by Iuliia Nigmatulina and Tannon Kew on ArchiMob.

 

Iuliia's GitHub 

Tannon's GitHub

 

ArchiMob @ VarDial 2017-2019

Part of the ArchiMob Corpus was used for the German Dialect Identification (GDI) task of the VarDial evaluation campaigns from 2017 to 2019.

 

GDI data set download

 

VarDial workshops

 

Evaluation results:

 •   2019

 •   2018

 •   2017

Getting started with ANNIS and Sketch Engine

A short tutorial on searching the ArchiMob corpus online with corpus query engines

1. riliis vom archimob korpus ("First release of the ArchiMob corpus")

Read about the ArchiMob corpus in Swiss German.
 

ArchiMob Corpus in UZH News

UZH News article about the ArchiMob Corpus (17.2.2017)

ArchiMob on Mundartforum

The ArchiMob project on Mundartforum.ch

Digitised dialect maps

On this website you can find digitised maps of the Sprachatlas der deutschen Schweiz (SDS).