Description | Contact between speakers of two or more languages can leave traces in the linguistic record and reveal geographic areas of past human interaction. The sBayes algorithm finds these areas in cultural data. |
Publication | https://doi.org/10.1098/rsif.2020.1031 |
Code & Data |
Python package and case study: https://github.com/NicoNeureiter/sBayes |
Description | In this study, peoples with similar genes are found to also share similar grammar, though not necessarily similar words or sounds, suggesting that grammar may serve as a cultural marker of population connections beyond recent contact or descent. |
Publication | https://www.science.org/doi/10.1126/sciadv.abd9223 |
Code & Data |
case study: https://github.com/derpetermann/music_languages_genes |
Media |
UZH: https://www.news.uzh.ch/en/articles/2021/Grammar.html Video: https://youtu.be/bcE3-Xm9CIY |
Description | Phylogenetic trees show how languages have diversified over time but ignore a central aspect of language evolution: contact. The contacTrees model infers both a phylogenetic tree and contact events in which one language borrowed linguistic traits from another. |
Publication | https://doi.org/10.1057/s41599-022-01211-7 |
Code & Data |
BEAST 2 package and case study: https://github.com/NicoNeureiter/contacTrees |
Description | In this paper, a Bayesian clustering method from evolutionary biology was applied to Swiss German dialect data, revealing five distinct morphosyntactic populations that align with traditional dialect regions and supporting a gradual dialect continuum in Swiss German. |
Publication | https://doi.org/10.1017/jlg.2021.12 |
Media | Department of Geography, UZH https://www.geo.uzh.ch/en/news/papers/2022/2022-07-schweizerdeutsche-grammatik.html |
Description | Glottography is a geodata platform for mapping the world’s languages. Glottography represents the geographic locations of languages as polygons, along with relevant metadata, including Glottocodes that uniquely identify each language. |
Publication | Not yet published |
Code & Data | Data and web mapping service: https://github.com/Glottography |
Description | This study shows that Bayesian phylogeography struggles to reconstruct human migrations, such as relocations due to conflict, but effectively captures gradual language expansions, which produce distinct phylogenetic patterns that are absent when speakers migrate. |
Publication | https://doi.org/10.1098/rsos.201079 |
Code & Data | Case study, https://github.com/NicoNeureiter/drifting_into_nowhere/ and https://zenodo.org/records/4279082 |
Description | This study presents a Bayesian model to reconstruct the historical spread of phenomena with known distributions at two points in time, applied here to the spread of Indo-European languages in South America. The model infers possible evolutionary histories, offering a general approach for analysing diffusion processes from incomplete data. |
Publication | https://doi.org/10.4230/LIPIcs.GIScience.2023.71 |
Code & Data | R and C++ package and case study, https://github.com/takuya-tkhs/sBread |
Description | This study combines qualitative and quantitative methods to study linguistic area formation, showing that languages in Britain and Ireland exhibit significant linguistic similarity, regardless of ancestry, across space, time, and sociocultural settings. |
Publication | https://muse.jhu.edu/article/733280 |
Description | Why are near things more similar than distant ones? And why can this be both a blessing and a nuisance? Three videos explain the First Law of Geography. |
Media |
Video 1: https://youtu.be/6T1A4l0pcWE?si=b4_XLuANMNZV9bHX |
Description | The contacTrees model is a Bayesian phylogenetic method that incorporates language contact, addressing the limitations of traditional phylogenetic methods that assume languages evolve independently. By accounting for horizontal transfer, it improves the accuracy of reconstructing language family trees and contact events, offering a more nuanced approach to studying language and cultural evolution. |
Publication | https://www.nature.com/articles/s41599-022-01211-7 |
Code & Data |
BEAST 2 package: https://github.com/NicoNeureiter/contacTrees
Case study: https://github.com/NicoNeureiter/contacTrees-IndoEuropean
Simulation study: https://github.com/NicoNeureiter/contacTrees-SimulationStudy |
Description | Today, over 7,000 languages are spoken worldwide, cataloged and partially described in linguistic resources like Glottolog and WALS. This study develops a quantifiable and reproducible method for describing languages using text data, offering new insights into mapping linguistic diversity. |
Publications |
https://aclanthology.org/2021.eacl-main.302/ https://aclanthology.org/2022.lrec-1.123/ https://aclanthology.org/2022.conll-1.18/ |
Code & Data |
TeDDi tools: https://github.com/MorphDiv/TeDDi_sample
Analysis of language spaces: https://github.com/MorphDiv/transfer-lang
Information theory measures over BPE merges |
Description | This project enhances neural sequence-to-sequence models for NLP preprocessing tasks by incorporating structural signals from multiple text layers, improving performance in tasks like machine translation and speech recognition. |
Publication | |
Code & Data |
Subword segmentation with synchronised decoding: https://github.com/tatyana-ruzsics/uzh-corpuslab-morphological-segmentation
Interpretable reinflection: https://github.com/tatyana-ruzsics/interpretable-inflection |
Description | The ArchiMob corpus represents the German varieties spoken in Switzerland. It is the first electronic resource containing long samples of transcribed Swiss German speech, intended for studying the spatial distribution of morphosyntactic features and for natural language processing. |
Publication |
https://link.springer.com/article/10.1007/s10579-019-09457-5 |
Code & Data |
Code:
https://github.com/Christof93/archimob_tools
https://github.com/yunigma/Kaldi-for-ASR-of-Swiss-German
https://github.com/tannonk/two-headed-master
https://github.com/tatyana-ruzsics/uzh-corpuslab-normalization
https://github.com/tatyana-ruzsics/uzh-corpuslab-pos-normalization
Data: |
Description | Resources created and shared through transnational cooperation started by the ReLDI institutional partnership (Link to SNSF data portal: https://data.snf.ch/grants/grant/160501). |
Publication | |
Code & Data |
Code:
https://github.com/clarinsi/classla
https://github.com/clarinsi/tweetcat
Data: |
Description | The course, "Revisiting research training in linguistics: theory, logic, method," is part of a pilot program funded by Movetia and offered by the Universities of Zurich, Geneva, and Belgrade. It aims to enhance scientific and research skills, particularly for BA and MA students in linguistics and language-related fields, though it may also benefit a wider audience interested in these areas. |
Open EdX |
https://apps.elearn.mnf.uzh.ch/learning/course/course-v1:PHIL+Movetia101+2022/home |
Description |
This learning block is a guide to acquiring the core machine learning notions needed by students of language and linguists who plan to work with engineers and scientists, or by anyone with a similar background and interests. It is intended for supervised study, in which a student learns actively under the supervision of a teacher. |
Access link |
Description | A gentle introduction to corpus analysis. It covers which South Slavic corpora are available in the CLARIN.SI repository and how to find comparable corpora; how to explore corpora with the noSketchEngine and KonText concordancers; how to query them using CQL (Corpus Query Language) syntax; and how to analyse gender marking in each South Slavic corpus and interpret the results with respect to gender bias in society. |
Materials | https://github.com/clarinsi/workshop_reg_mark |