A number of scientists in the field of speech technology have envisioned the development of a procedure and infrastructure that can support historians, linguists and social scientists in creating transcripts of their interviews. With the support of the CLARIN funding program (<https://www.clarin.eu/content/factsheet-clarin-plus>) we have organized a number of workshops to explore how to realize the envisioned resource. You have been identified as a scholar who might be interested in collaborating with us, and you are therefore receiving this invitation to a first workshop, to be held in Arezzo, Italy, on 10-12 May 2017 (Department of Education, Human Sciences and Intercultural Communication – Siena University, Campus ‘Il Pionta’). We would be very pleased if you were willing to contribute to the development of such a resource. CLARIN will, of course, pay for your travel and accommodation costs.
The approach that we envision is to start with three languages (English, Italian and Dutch) and to split the entire process into three building blocks. The first entails the conversion of analog recordings into digital ones (digitization); the second consists of using automatic speech recognition to create automatic transcriptions, which will inevitably contain errors; the third deals with organizing the manual correction of these errors via crowdsourcing platforms.
At present we are developing modules to elaborate these three building blocks, and we would very much welcome feedback and comments on the feasibility of our approach. This means that some preparatory work has to be done prior to the workshop: trying out some tools and reviewing a number of documents that we will send you. We estimate that this will take 5-10 hours of your time. Given your expertise in Oral History, we would like to ask for your input on the module ‘Transcription Guidelines’ (under the responsibility of Stef Scagliola). No specific technical know-how is required to work with the modules. We will give instructions on how to use them, and refine the instructions based on your feedback if needed.
An outline of the provisional agenda of the workshop is shown below. It also shows the involvement we expect from our participants during and before the workshop:
Oral History & Technology workshop, Arezzo, 10-12 May 2017
Day 1: Information exchange, exploring the options and alternatives
Morning: Travel time
14:00 Welcome
14:15
* Overview of the workshop
* General overview of the envisaged transcription building blocks
* Presentations about suitable AD-conversion (tools)
* Presentations about ASR (tools) (Full recognition & Alignment)
* Participants are asked to share their experiences with using the tools
17:00 Hands-on experience: “bring your own data” and together we use the software (ASR, Alignment, and AD/DD-conversion)
19:00 Dinner

Day 2: Information exchange, settling the building blocks
9:15
* Summary of day 1 and overview of day 2
* Presentations about manual transcription correction services
* Presentations about crowdsourcing strategies and platforms
* Transcription guidelines
* Participants are asked to share their experiences with using the tools
12:00 Hands-on experience
13:00 Lunch
14:00
* Data management and metadata
* Presentation on data management in NL, UK, IT ((persistent) archiving options)
17:00 Hands-on experience
19:00 Dinner

Day 3: Proposal preparation
9:15
* Wrapping up:
  * Which improvements are needed for the documentation on the various topics?
  * Which software improvements are needed and should be included in the implementation plan?
* Plenary: concluding actions for finalising the implementation proposal
* Setup of the time schedules for the next months
* Plan for a publication
13:00 Lunch
14:00 Adjourn
We will pay for accommodation, dinners and lunches, and we can reimburse your travel costs (up to €100).
We look forward to your response and hope you will join us in making this project a success. Once you accept our invitation, we will send you further information on the exact venue, travel and accommodation, and on the further preparations for the workshop.
The organizers,
Henk van den Heuvel; CLST Radboud University
Arjan van Hessen; Utrecht University / University of Twente
Silvia Calamai; DSFUCI, University of Siena
Louise Corti; UK Data Archive, University of Essex
Stef Scagliola; University of Luxembourg
Martin Wynne; IT Services, University of Oxford