One of Swe-Clarin's centres is located at the Language Council at the Institute for Language and Folklore (ISOF). The Institute's collections contain tens of thousands of hours of recorded speech. Currently, the large amounts of recorded speech available at ISOF and other Swedish memory institutions are rarely used because effective methods for handling archival audio material are lacking.
In the project Tilltal, we examine, in collaboration with KTH and Digisam, how speech technology methods can make the speech recordings more accessible to researchers. The project uses speech recordings in the Institute's archives and explores how speech technology methods and tools can be adapted and developed to process large amounts of historical voice recordings.
The oldest recordings in the Institute's collections are from the 1890s, and recording technology has changed several times since then. This photo documents one of the many interviews conducted in the 1960s with Swedish immigrants in Chicago (archive record ULMA En-Am 1962: 1: 77).
The Swedish Foundation for Humanities and Social Sciences has decided to provide SEK 9.7 million in grants for the project from 2017 to 2020. The project is a collaboration between ISOF, the Royal Institute of Technology (KTH) and Digisam. From the Institute, both language technologists and researchers in dialects and folklore participate in the project.
Three studies in different research areas
The project includes three sub-studies and a user study. The three sub-studies examine how speech technology can be used to investigate research questions in various disciplines (ethnology, linguistics and conversation research). The user study uses activity theory to examine research activities surrounding the archival materials. Considering the needs of the researchers, we will propose language technology solutions, and their usefulness in practice will be assessed by means of use cases.
From experience stories of cultural heritage
The first sub-study examines Karl Gösta Gilstring’s collection at the Archives for Dialect and Folklore Uppsala. The collection consists of tens of thousands of letters and records, as well as 250 hours of recorded interviews. Language technology methods will be used to handle this large and varied material. The hope is that the various categories of material can be combined so that, for example, an interview can be linked to the places where the same subject or story is mentioned in a letter or other written material.
Karl Gösta Gilstring recording an interview with Judith Johansson.
Linguistic variation in time and space
The aim of this study is to develop new approaches to language variation and change in speech materials. In the study, linguistic variation will be investigated at different levels: phonetic/phonological, prosodic and syntactic. Previous studies in this area are important starting points, and applying speech technology methods to the same type of research will give us an idea of what these methods can contribute. One example is using speech technology methods to automatically find and collect all the instances of a particular sound. The methods will be adapted and developed to analyze spoken language with dialectal features.
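As a rough illustration of what collecting all instances of a particular sound could look like in practice, the sketch below assumes that the recordings have already been segmented into time-aligned sound labels, for instance by an automatic aligner. The `Segment` structure, the `find_sound` function, the recording names and the labels are all hypothetical and are not taken from the project's actual tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One time-aligned sound in a recording (hypothetical structure)."""
    recording: str  # identifier of the recording
    label: str      # label of the sound, e.g. "r"
    start: float    # start time in seconds
    end: float      # end time in seconds


def find_sound(segments, label, context=0.0):
    """Return (recording, start, end) for every segment with the given label.

    `context` widens each hit by that many seconds on both sides, so that
    the sound can be listened to together with its surroundings.
    """
    hits = []
    for seg in segments:
        if seg.label == label:
            hits.append((seg.recording,
                         max(0.0, seg.start - context),
                         seg.end + context))
    return hits


# Invented example data: two short recordings with aligned sound labels.
segments = [
    Segment("rec_a", "s", 0.0, 0.5),
    Segment("rec_a", "r", 0.5, 0.75),
    Segment("rec_b", "r", 0.0, 0.25),
    Segment("rec_b", "a", 0.25, 0.5),
]

# Collect every "r" with a quarter of a second of context on each side.
for recording, start, end in find_sound(segments, "r", context=0.25):
    print(recording, start, end)
```

The resulting list of time spans could then be used to cut out the corresponding audio passages for listening or acoustic measurement. How reliable the time-aligned labels are for dialectal speech is exactly the kind of question the study sets out to examine.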
Interaction patterns in space and time
Speech technology has produced models for answering research questions in conversation research: how speakers take turns, how feedback works, how support or questioning is expressed, and how common understanding is reached. The third sub-study aims to develop such an interaction model for material that consists of various types of conversations and of comparable materials from different times. This will result in a description of the similarities and differences between conversation types (synchronic comparisons) and across time (diachronic comparisons; language development) from an interaction perspective.