URPP Language and Space: Language and Space Lab

Swisscom Dictionary of spoken and written Swiss German

The dictionary described here maps Standard German words to Swiss German pronunciations and spontaneous writings. It includes a total of 11,248 Standard German words and their representations in six Swiss regional varieties, among them Zurich, Basel, Bern, Visp, and Stans. Each regional variety is represented in two ways: a) as it is pronounced (SAMPA annotation) and b) as it is typically written in a non-standard, spontaneous fashion. The non-standard writings were produced partly manually by native speakers and partly automatically (using character-level sequence-to-sequence methods).
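The two-way representation described above can be pictured as a nested record: one Standard German headword, several regional varieties, and for each variety a SAMPA pronunciation plus one or more spontaneous spellings, each either written by a native speaker or generated automatically. The sketch below is a hypothetical in-memory model only. The class and field names (`Entry`, `RegionalForm`, `Writing`, `source`) are assumptions and do not reflect the release's actual file format, which is documented in the two project reports. All values are placeholders, not real entries from the dictionary.

```python
from dataclasses import dataclass, field
from typing import Literal

# Hypothetical provenance label for a spelling (assumption, not the release format).
Source = Literal["manual", "generated"]


@dataclass
class Writing:
    text: str       # a non-standard, spontaneous spelling
    source: Source  # "manual" (native speaker) or "generated" (seq2seq model)


@dataclass
class RegionalForm:
    sampa: str  # pronunciation of this variety in SAMPA notation
    writings: list[Writing] = field(default_factory=list)


@dataclass
class Entry:
    standard_german: str  # the Standard German headword
    forms: dict[str, RegionalForm] = field(default_factory=dict)  # keyed by region name

    def add(self, region: str, sampa: str, *writings: Writing) -> None:
        # Register one regional variety with its pronunciation and spellings.
        self.forms[region] = RegionalForm(sampa, list(writings))

    def manual_writings(self, region: str) -> list[str]:
        # Keep only the spellings written by native speakers.
        return [w.text for w in self.forms[region].writings if w.source == "manual"]


# Placeholder values only; no real dictionary content is shown.
entry = Entry("<headword>")
entry.add(
    "Zurich",
    "<sampa-zh>",
    Writing("<spelling-1>", "manual"),
    Writing("<spelling-2>", "generated"),
)
print(entry.manual_writings("Zurich"))  # ['<spelling-1>']
```

Keeping provenance per spelling, rather than per variety, is a design choice made here so that manually written and automatically generated writings of the same word can be told apart; whether the released data records this distinction at that granularity is an assumption.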


This dictionary was compiled as part of a research service provided by the University Research Priority Program (URPP) 'Language and Space' to Swisscom AG. The contributors from the University were three students, Raphael Tandler, Alina Mächler, and Larissa Schmidt, supervised by Dr. Tanja Samardžić (Language and Space Lab, Text Group Leader). The collaborators from Swisscom AG were Lucy Linder, Sandra Djambazovska, and Alexandros Lazaridis, supervised by Dr. Claudiu Musat (Director of Research, Data, Analytics & AI).

This dictionary is available upon request under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Release 1 (2020)

Details of the dictionary's composition, formatting, and annotation can be found in the following two reports:

1st Report, Oct. 2018 – Jan. 2019: Mapping Standard German to Swiss German Pronunciations (PDF, 3 MB)
2nd Report, Feb. – July 2019: Mapping Swiss German pronunciations with spontaneous writings (PDF, 208 KB)

Access

The data set is distributed by Swisscom AG (see the contact details below).

Publications

Schmidt, Larissa, Linder, Lucy, Djambazovska, Sandra, Lazaridis, Alexandros, Samardžić, Tanja, and Musat, Claudiu (forthcoming): "A Swiss German Dictionary: Variation in Speech and Writing". In Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020). Marseille, France.

Map by Larissa Schmidt and Yves Scherrer

Further information

Contact for data distribution

Dan Tomozei

Swisscom AG
Director of Research for Data, Analytics & AI

Dan-Cristian.Tomozei@swisscom.com

Contact for methods and technical issues

Tanja Samardžić

URPP Language and Space
Text Group Leader
tanja.samardzic@uzh.ch