KTH SPEECH, MUSIC AND HEARING; CLARIN-SPEECH; and Swe-Clarin’s working group for speech
The Department of Speech, Music and Hearing at KTH hosts the Clarin knowledge center CLARIN SPEECH, which focuses on disseminating information and research on speech, spoken language, and speech technology, an umbrella term for the various methods used to analyze, understand, and produce speech automatically or semi-automatically. Founded in 1951 by the pioneering phonetician Gunnar Fant, the department is one of the oldest active speech research centers in the world. Fant remained an active researcher in the department until his death in 2009. Within the Swedish part of Clarin (Swe-Clarin), KTH Speech, Music and Hearing, ISOF (the Institute for Language and Folklore), and the National Archives make up the working group for speech, which works jointly to promote speech research.
THE TILLTAL PROJECT (2017-2020)
RE-SPEAKING to FACILITATE TRANSCRIPTION of DIFFICULT SPEECH and for TRANSLITERATION of TEXT
Within speech research, as in many other areas, large quantities of data are becoming increasingly important. In related areas, such as language technology (which deals mainly with text), large and relatively well-organized resources are already available. Speech data is inherently much harder to structure and organize in a way that makes it usable as an object of research. Among other things, speech is more varied in itself and, unlike text, is also affected by everything in its environment. From a purely scientific perspective, text data is much more predictable than speech data. As a rough estimate, collecting a given number of words costs between 100 and 1,000 times as much in speech as in text. Fortunately, this can be changed. The more representative data we obtain, the better the tools we can build to handle that data. These tools will in part simply help us carry out speech and speech technology research, but they will also improve and simplify the management and analysis of further speech data.
During 2015 and 2016, KTH Speech, Music and Hearing spent a considerable amount of time looking for ways to improve the availability of Swedish speech data. These efforts have taken place partly within Swe-Clarin and partly in collaboration with VINNOVA (Sweden’s innovation agency) and with PTS (the Swedish Post and Telecom Authority), which was given a government commission to promote Swedish speech technology for accessibility and for industry. The result is a series of projects and proposals, all of which include an element of data collection and technology for accessibility. To reduce the initial costs, one objective has been to prioritize projects that deliver benefits beyond the creation and accessibility of speech data alone.
One of the projects we are working to get started uses spoken input as an aid in the transcription of speech (e.g. speech of poor recording quality) and the transliteration of text (such as handwriting). The basic idea is that the transcriber first reads the text aloud, or repeats the speech, using a high-quality microphone. This new recording is then passed to a speech recognition system that is familiar with the speaker's voice and the recording environment, which gives recognition results with a relatively high percentage of correct words. Finally, the automatically recognized text is corrected manually.
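The "percentage of correct words" in the re-speaking workflow can be made concrete by comparing the automatically recognized text with the manually corrected version. The following is a minimal sketch, not the project's actual tooling: it assumes plain-text transcripts, whitespace tokenization, and the common word error rate measure (word-level edit distance divided by the number of words in the corrected text). The function names and the example sentences are hypothetical.

```python
def word_edit_distance(reference, hypothesis):
    """Minimum number of word substitutions, insertions and deletions
    needed to turn the hypothesis word list into the reference word list."""
    # previous[j] holds the distance between the reference words seen so far
    # and the first j hypothesis words.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,                      # reference word missing from hypothesis
                current[j - 1] + 1,                   # extra word in hypothesis
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def word_error_rate(corrected_text, recognized_text):
    """Word error rate of the recognizer output, using the manually
    corrected text as the reference (assumption: whitespace tokenization,
    case-insensitive comparison)."""
    reference = corrected_text.lower().split()
    hypothesis = recognized_text.lower().split()
    if not reference:
        raise ValueError("corrected_text must contain at least one word")
    return word_edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical example: one misrecognized word out of seven.
    corrected = "the old letter was written in 1850"
    recognized = "the old letter was ridden in 1850"
    print(f"WER: {word_error_rate(corrected, recognized):.3f}")
```

In this example the recognizer output differs from the corrected text by one substituted word out of seven, so the error rate is 1/7. A lower value means less manual correction is needed in the final step.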
Initial tests indicate that this method is also an improvement for the transcriber: it is both faster and more ergonomic. In other words, it can improve both the working conditions and the efficiency of those who currently perform this type of work. From a speech technology perspective, the method will also produce a large number of recordings of known text (because the automatic speech recognition output is manually corrected), which can constitute a very important basic resource in the area.
At the time of writing, we are in discussions with PTS, the National Swedish Archives, the National Library of Sweden, ISOF, Uppsala University Library, and developers of speech technology applications about how best to proceed.