The Swe-Clarin centre at Linköping University (LiU) is hosted by NLPLab, Department of Computer and Information Science. In this bulletin we would like to tell you something about two Swe-Clarin-related activities which are aimed at the creation of new infrastructure for e-HSS research.
Digitization of personal stories
Arbetets Museum (The Museum of Work) in Norrköping has more than 2600 personal stories about work and everyday life in its archive. The archive holds documentation of working life, depictions of life in industrialized societies, accounts of housework in the 1950s and 1960s, and professional memories of miners, social workers, retail employees, and members of many other professions. Efforts to collect stories are ongoing and take place in various socially relevant areas.
The main target group is researchers, but a selection of stories is presented in anthologies, in exhibitions and on the museum's website. In these cases the informants have given their consent to publication. Many older stories in the archive lack such consent, which has so far made it impossible to make them public. The most appropriate channel for making the stories public would be the Internet. This would also make them searchable in a much more flexible manner than has been possible so far. But publication of the personal stories via the Internet must ensure that the stories are sufficiently anonymized, so that the authors and the people referenced in them cannot be identified.
The LiU Swe-Clarin centre and the Museum of Work met in 2016 to discuss the possibilities of developing a computerized system for the anonymization of the stories. We believe that Swe-Clarin's tools and various known techniques for name recognition would be key components in such a system. The system must obviously be interactive and support a natural workflow at the museum. Ultimately, it is the museum that guarantees that publication does not violate Swedish law.
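To give a rough idea of what name-based anonymization with human review might involve, here is a minimal Python sketch. It is not the planned system and does not use Swe-Clarin's tools: the function `mask_names`, the placeholder `[NAME]` and the hand-made name list are all assumptions made for illustration, and a real system would rely on a trained name recognizer rather than a fixed list.

```python
import re

def mask_names(text, known_names, placeholder="[NAME]"):
    """Replace every listed name with a placeholder.

    Returns the masked text together with the matched names, so that
    a museum employee can review each replacement before publication.
    `known_names` is a hypothetical, hand-made list; it stands in for
    the output of a real name recognizer.
    """
    if not known_names:
        return text, []
    # Longest names first, so "Anna Karlsson" wins over "Anna".
    names = sorted(known_names, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b"
    )
    hits = [m.group() for m in pattern.finditer(text)]
    return pattern.sub(placeholder, text), hits

story = "Anna Karlsson arbetade på bruket med Erik."
masked, hits = mask_names(story, {"Anna", "Anna Karlsson", "Erik"})
print(masked)
print(hits)
```

Returning the list of matches alongside the masked text reflects the interactive requirement: the final decision on what is published stays with the museum staff.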
The ultimate goal of the collaboration is to develop a complete workflow for publishing the museum's personal stories that could also be adapted to the needs of other archival institutions. The reports that are now stored as separate files in different formats will be published on the Internet and made searchable. A concrete result of the work so far is an application to an external funding agency for the development of a workflow support system for anonymization, markup and digitization of personal stories. Such stories are part of the Swedish cultural heritage. They reflect their times in many ways: historically, linguistically and culturally. They are therefore of interest to researchers with different specializations: ethnologists, historians, and linguists.
Infrastructure for readability and understandability
In several projects we have worked on issues relating to digital inclusion, the right of everyone to gain access to digitized data. In connection with this research we have collected texts from various sources, primarily from the Web, and evaluated their readability with both automatic metrics and reader reviews. The automatic metrics can be based on simple features such as word and sentence length, or on syntactic and lexical analysis of the text. They are implemented in a system that quickly provides data on the characteristics of a text that contribute to its complexity (see the figure).
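As an illustration of the simpler kind of metric, the following Python sketch computes average sentence length, average word length and the LIX index (words per sentence plus the percentage of words longer than six letters). It is not the system described above; the function name `surface_metrics` and the naive sentence splitting on ".", "!" and "?" are assumptions made for this example.

```python
import re

def surface_metrics(text):
    """Compute simple surface-based readability figures for a text."""
    # Naive sentence split; a real system would use a proper tokenizer.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # A word is a run of letters (Unicode-aware, so å, ä, ö are included).
    words = re.findall(r"[^\W\d_]+", text)
    if not sentences or not words:
        raise ValueError("text contains no sentences or no words")
    long_words = sum(1 for w in words if len(w) > 6)
    words_per_sentence = len(words) / len(sentences)
    return {
        "words_per_sentence": words_per_sentence,
        "letters_per_word": sum(len(w) for w in words) / len(words),
        # LIX: average sentence length plus percentage of long words.
        "lix": words_per_sentence + 100 * long_words / len(words),
    }

sample = "Detta är en mening. Myndigheten informerar medborgarna."
for name, value in surface_metrics(sample).items():
    print(f"{name}: {value:.1f}")
```

Higher values suggest a more demanding text, but figures of this kind capture only surface properties, which is why metrics based on syntactic and lexical analysis are used as well.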
The collected texts, often in pairs of comparable versions on the same topic, contribute to a research infrastructure for readability. The need is greater than that, however. Properties such as readability and comprehensibility are elusive, and the availability of empirical data is limited. For this reason we are organizing a workshop together with the Swedish Language Council focusing on infrastructures for the study of understandability. It will be held in Linköping on November 30, 2016. There we will take stock of ongoing research and infrastructure initiatives in this area, with a focus on Swedish and on the communication of government authorities with citizens. The purpose of the workshop is to map out a way forward for creating the infrastructure needed for future research and education in this area. Information about the workshop is available here (in Swedish).