The second national Swe-Clarin workshop: Research collaborations for the digital age

2016-11-16, 10:30 to 16:45

NB: Program change

The schedule has been changed to accommodate participants flying into Umeå in the morning.

The workshop now starts with coffee at 10.30 and goes on until 16.45.


The workshop will be held in Sliperiet.


Please register for the workshop using the SLTC registration form.

Workshop program

10.30–11.00 Coffee/Tea
11.00–11.15 Opening
11.15–12.00 Keynote presentation 1 —

BiographyNet: Text mining for enhancing historical research on biographical data / Antske Fokkens (VU Amsterdam)

BiographyNet is an interdisciplinary project that aims at enhancing the potential of biographical data for historical research. This is done by applying deep semantic analysis to a collection of biographical dictionaries, representing the outcome of this analysis in RDF and providing it to historians through a user interface. In this talk, I will focus on the methodological aspects of the project and, in particular, the importance of registering provenance.

12.00–13.30 Lunch
13.30–14.15 Keynote presentation 2 —

Language technology in the service of the humanities / Eetu Mäkelä (Aalto University)

Scholarship in the humanities is a particular beast, providing interesting constraints to anyone seeking to apply their tools there. In this talk, I will go through multiple projects where I’ve integrated language technology (ranging from NER through dialectal morphological analysis to OCR error handling) as part of technological support for a larger humanities end goal. I will particularly focus on the demands these scenarios have placed on the tools used, and the often surprising implications this has for what makes a tool good.

14.15–15.00 Poster session 1 —

Anonymization of personal stories / Lars Ahrenberg, Niklas Blomstrand, Marinette Fogde and Andreas Nilsson / abstract

Investigating public discourse with Swe-Clarin / Lars Borin, Markus Forsberg, Richard Johansson, Tomas Kosiński and Jon Viklund / abstract

“Reuse” of biblical quotes in Swedish 19th century fiction / Dimitrios Kokkinakis and Mats Malm / abstract

The Uppsala corpus of student writings – corpus creation, annotation, and analysis / Beáta Megyesi, Jesper Näsman and Anne Palmér / abstract

Instant Swedish dialect maps / Robert Östling and Mats Wirén / abstract

15.00–15.30 Coffee/Tea
15.30–16.15 Poster session 2 —

The project TillTal: Making spoken cultural heritage accessible for research / Johanna Berg, Rickard Domeij, Jens Edlund, Gunnar Eriksson, David House, Zofia Malisz, Susanne Nylund Skog, Jenny Öqvist / abstract

Introducing SAPIS – an API service for text analysis and simplification / Daniel Fahlborg and Evelina Rennes / abstract

Swe-Clarin research collaborations at the Humanities Lab, Lund University / Johan Frid / abstract

Constructing a corpus of August Strindberg’s collected works / Mats Wirén, Kristina Nilsson Björkenstam, Gintarė Grigonytė and Sofia Gustafson Capková / abstract

16.15–16.45 General discussion and closing of the workshop


Background and aims of the workshop

CLARIN (Common Language Resources and Technology Infrastructure) was established by the European Commission as an ERIC (European Research Infrastructure Consortium) in 2012. Sweden joined the CLARIN ERIC in October 2014, as the first new member since the establishment of the ERIC, which now has 20 members (countries and one NGO). The Swe-Clarin consortium has a broad national membership – the 9 institutions represented in the workshop program committee – providing wide expertise to this e-science infrastructure.

CLARIN aims to make language-based material available as primary research data to the humanities and social sciences (HSS) research communities, with the help of the sophisticated language and speech processing tools that have been developed over many years through research in language technology (LT). It takes advantage of the fact that increasing amounts of text and speech material – including historical material – are available in digital form, which allows unprecedented volumes of text and speech data to be used in HSS research. The expectation is that this LT-based e-HSS paradigm will lead to completely new kinds of research as well as to new ways of addressing old research questions.

Following the first successful workshop held at SLTC in Uppsala in 2014, the second national Swe-Clarin workshop aims to present current projects in which LT and HSS research unite in producing interesting results through novel methods, to bring together research in search of LT solutions with LT solutions in search of research questions, and to offer space for discussing how the research challenges of HSS in the digital age can be tackled within the e-HSS paradigm.

This full-day workshop will feature two presentations by invited speakers, two poster/demo sessions, and a concluding general discussion.

Note that on the day preceding the workshop (15th November), Swe-Clarin will organize a user day under the umbrella Swe-Clarin on tour, focusing on the Swedish Government Official Reports (Statens offentliga utredningar, SOU), in the version digitized by the National Library of Sweden, comprising more than 400 million words covering the years 1922–1998. If you are interested in participating in this event, please inquire at <info ä>.

Submission of proposals

We invite proposals for poster presentations and/or demos. Submissions should be in the form of an abstract (not anonymous; up to 1 page of text, plus an optional second page for references and/or illustrations). The abstracts must be submitted through the workshop EasyChair submission page.

We welcome submissions describing completed, ongoing, and planned e-HSS work – crucially involving the use of LT and language resources – on (but not limited to):

  • specific e-HSS projects
  • digitization efforts involving intangible cultural heritage
  • concrete efforts as well as general methodology development aiming at adaptation of LT and language resources to, e.g., historical or non-standard language varieties and genres (e.g., social media)
  • multilingual aspects of LT-based e-HSS
  • efforts to create workflows and effective user interfaces
  • research questions relevant to the LT-based e-HSS paradigm
  • how to reconcile large-scale quantitative and "close-reading" qualitative research methods

Important dates

  • First call for papers/Submission opens: 17th September 2016
  • Early submission deadline: 16th October 2016
  • Notification of acceptance of early submissions: 19th October 2016
  • Late submission deadline: 31st October 2016
  • Notification of acceptance of late submissions: 3rd November 2016
  • Workshop: 16th November 2016, 10.30–16.45

Proposals submitted no later than Sunday 16th October will receive notification by 19th October, two days before the SLTC early-bird registration deadline. We will continue to accept submissions until the second and final deadline (Monday 31st October). Note that participation in the Swe-Clarin workshop is free of charge but requires registration. Please use the SLTC registration page to register.

Invited speakers

  • Antske Fokkens, VU University Amsterdam
  • Eetu Mäkelä, Aalto University

Workshop organizers

  • Lars Borin, Språkbanken, University of Gothenburg (contact person): <lars.borin ä>
  • Nina Tahmasebi, Språkbanken, University of Gothenburg
  • Elena Volodina, Språkbanken, University of Gothenburg

Program committee

  • Lars Ahrenberg, Linköping University
  • Johanna Berg, Digisam
  • Lars Borin, Språkbanken, University of Gothenburg
  • Rickard Domeij, the Swedish Language Council
  • Marianne Gullberg, Lund University
  • David House, KTH
  • Daniel Knezević, SND, University of Gothenburg
  • Joakim Nivre, Uppsala University
  • Mats Wirén, Stockholm University