Felső-Víziváros and Matthias Church seen from the Danube · Thaler Tamas, Wikimedia Commons, CC BY-SA 3.0

13th Web-as-Corpus Workshop

Workshop @ EMNLP 2026 · Budapest, Hungary · 24–29 October 2026

The 13th edition of the Web-as-Corpus (WaC-13) workshop will be co-located with EMNLP 2026 in Budapest, Hungary. The workshop is endorsed by SIGWAC, the ACL Special Interest Group on Web as Corpus, and by Common Crawl.

Call for Papers

Budapest, Hungary · Wilfredor, Wikimedia Commons, CC BY-SA 3.0

The World Wide Web has evolved from a resource for building linguistic corpora into the central data infrastructure powering modern natural language processing and Large Language Models (LLMs). As web-scale data increasingly shapes AI systems' knowledge and capabilities, understanding its quality, representativeness, and ethical implications has become critical.

At the same time, the "more is better" paradigm is being challenged by issues such as machine-generated content, data toxicity, limited metadata, and the under-representation of many languages and domains. These challenges call for a shift toward Data-Centric AI, focusing on the curation, analysis, and responsible use of web-derived data.

WaC-13 provides a multidisciplinary forum for research addressing the full lifecycle of web data. We invite submissions on methods, resources, and applications related to web corpora, with special emphasis on multilingual data and less-resourced languages.

Topics of Interest

Submissions are invited on (but not limited to):

Important Dates

Submission

Note: All deadlines are 11:59 PM Anywhere on Earth (AoE).

In line with ACL and ARR policies, there is no anonymity period. Submissions must follow all ARR submission requirements.

Papers can be submitted either by committing an ARR-reviewed paper or directly through openreview.net (link to be shared soon).

Workshop Organizers

Nikola Ljubešić
Jožef Stefan Institute, Slovenia
Yves Scherrer
University of Oslo, Norway
Laurie Burchell
Common Crawl
Veronika Laippala
University of Turku, Finland
Pedro Ortiz Suarez
Common Crawl
Jen English
Common Crawl
Vuk Dinić
Jožef Stefan Institute, Slovenia