Data-driven Approaches to Ancient Languages (DAAL), Ghent University, June 27, 2024
Premodern or historically attested languages are invaluable resources both for the study of diachronic linguistics and for the study of the cultures contemporary to them. Although these languages may belong to different language families or use different scripts, researchers face common challenges: illegible or lost passages, non-existent gold standards and, especially pressing these days, data scarcity. Fortunately, more and more texts are becoming available, but the language of those texts may differ so much from its modern counterpart (should one exist) that it considerably impacts the performance of existing tools. This workshop aims to provide a platform for the broad range of researchers engaged in digital approaches to premodern languages.
Relevant topics for the workshop include, but are not limited to:
- Transcription
- Transliteration
- Unicode
- Tokenisation
- Morphological analysis & Part-of-Speech Tagging
- Syntactic or semantic analysis of textual data
- Lemmatisation
- Data-driven approaches to ancient languages
- Uncertain readings & ambiguity
- Language variation (diachronic, synchronic, …)
- Data quality & data veracity
- Data science approaches for text analysis
- Multilingual code-switching
- Creation and annotation of digital resources
- Machine learning for lower-resourced languages
- Machine learning approaches for text analysis
- Evaluation of NLP tools
- Editorial choices (word and sentence splitting)
- Inclusion of the critical apparatus
- Incomplete or damaged texts
We welcome two types of submissions:
- Long papers: description of original and unpublished work in any topic area of the workshop. Long papers are limited to 8 pages of content, with an unlimited number of pages for references.
- Short papers: description of either work in progress or a focussed contribution (a point that can be made in a few pages with a sufficient level of detail). Short papers can also take the form of a position paper that surveys and critiques existing literature. Submissions of this type are limited to 4 pages of content, with an unlimited number of pages for references.