Named Speaker Identification

Named speaker identification tries to answer a very simple question: who spoke when?

Such a simple question, yet it took more than 3 years of work. Below you'll find all my papers on the subject (in French and English) and the final version of my PhD thesis (in French).

Abstract

Automatic speech processing is a field that covers a wide range of work: speaker recognition, named entity detection, and transcription of the audio signal into words. Its techniques can extract a great deal of information from audio documents (meetings, shows, etc.), such as the transcription, annotations (the type of show, the places mentioned, etc.), and information about the speakers (speaker changes, speaker gender). All this information can be exploited by automatic indexing techniques, making it possible to index large document collections.

The work presented in this thesis concerns the automatic indexing of speakers in French audio documents. More specifically, we try to identify each speaker's contributions and label them with the speaker's first and last name. This process is known as named speaker identification. The particularity of this work lies in the joint use of the audio signal and its transcript to name the speakers of a document. The first and last name of each speaker are extracted from the document itself (from its rich transcription, to be precise) before being assigned to one of the speakers of the document.

We start by describing the context and previous work on named speaker identification before presenting Milesin, the system developed during this thesis. The contribution of this work lies first in the use of an automatic named entity detector (LIA_NE) to extract first name / last name pairs from the transcript. Second, it relies on the theory of belief functions to assign these names to the speakers of the document, which makes it possible to take into account the various conflicts that may arise. Finally, an optimal assignment algorithm is proposed.
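To make the two ideas above more concrete, here is a minimal Python sketch. It is not the Milesin code: it assumes Dempster's rule, the classical combination rule of belief function theory, to merge pieces of evidence about a speaker, and it uses an exhaustive search as a stand-in for the optimal assignment step. The speaker labels, names, and mass values are hypothetical.

```python
from itertools import permutations

def combine(m1, m2):
    """Dempster's rule: combine two mass functions.

    A mass function maps a frozenset of candidate names to a mass,
    and its masses sum to 1. (Assumed rule, not necessarily Milesin's.)
    """
    combined, conflict = {}, 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + wa * wb
            else:
                conflict += wa * wb  # mass given to contradictory hypotheses
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalise so that the remaining masses sum to 1.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

def best_assignment(beliefs, names):
    """Exhaustive one-to-one assignment of names to speakers.

    Maximises the sum of the masses placed on each chosen single name.
    Assumes there are at least as many names as speakers.
    """
    speakers = list(beliefs)
    best, best_score = None, -1.0
    for perm in permutations(names, len(speakers)):
        score = sum(beliefs[s].get(frozenset([n]), 0.0)
                    for s, n in zip(speakers, perm))
        if score > best_score:
            best, best_score = dict(zip(speakers, perm)), score
    return best

# Hypothetical data: two candidate names detected in a transcript.
ALICE, BOB = "Alice Martin", "Bob Durand"
FRAME = frozenset([ALICE, BOB])  # "any of the names" (ignorance)

# Two clues about speaker S1 both point to Alice.
s1 = combine({frozenset([ALICE]): 0.6, FRAME: 0.4},
             {frozenset([ALICE]): 0.5, FRAME: 0.5})
# One ambiguous clue about speaker S2.
s2 = {frozenset([ALICE]): 0.5, frozenset([BOB]): 0.3, FRAME: 0.2}

assignment = best_assignment({"S1": s1, "S2": s2}, [ALICE, BOB])
print(assignment)
```

Taken alone, each speaker's most likely name is Alice Martin. Forcing a one-to-one assignment resolves that conflict by giving Alice Martin to S1 and Bob Durand to S2. The exhaustive search is only practical for a handful of speakers; larger problems are usually handled with a polynomial-time method such as the Hungarian algorithm.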

This system gives an error rate between 12% and 20% on reference (manually produced) transcripts, depending on the corpus used. We then present the advances and limitations highlighted by this work, and propose an initial study of the impact of fully automatic transcriptions on Milesin.

Bibliography - 10 articles

Jousse V. Identification nommée du locuteur : exploitation conjointe du signal sonore et de sa transcription. PhD thesis. 4 May 2011. Source code available on GitHub. Download as PDF
Estève Y, Deléglise P, Meignier S, Petitrenaud S, Schwenk H, Barrault L, Bougares F, Dufour R, Jousse V, Laurent A, Rousseau A. Some recent research work at LIUM based on the use of CMU Sphinx. CMU SPUD Workshop, Dallas (Texas), 13 March 2010. Short article, 6 pages. Download as PDF
Petitrenaud S, Jousse V, Meignier S, Estève Y. Speaker identification using belief functions. Information Processing and Management of Uncertainty (IPMU'10), Dortmund (Germany), 28 June - 2 July 2010. Short article, Part I, pp. 179-188. Download as PDF
Petitrenaud S, Jousse V, Meignier S, Estève Y. Reconnaissance Automatique de Locuteurs à l'aide de Fonctions de Croyance. 17e congrès francophone Reconnaissance des Formes et Intelligence Artificielle (RFIA'10), Caen (France), 20-22 January 2010. Short article, 7 pages. Download as PDF
Dufour R, Jousse V, Estève Y, Béchet F, Linarès G. Spontaneous Speech Characterization and Detection in Large Audio Database. 13th International Conference on Speech and Computer (SPECOM 2009), St Petersburg (Russia), 21-25 June 2009. Short article, 6 pages. Download as PDF
Jousse V, Petitrenaud S, Meignier S, Estève Y, Jacquin C. Automatic named identification of speakers using diarization and ASR systems. ICASSP 2009, Taipei (Taiwan), 19-24 April 2009. Short article, pp. 4557-4560. Download as PDF
Jousse V, Meignier S, Jacquin C, Petitrenaud S, Estève Y, Daille B. Analyse conjointe du signal sonore et de sa transcription pour l'identification nommée de locuteur. In Traitement Automatique des Langues, 50(1), edited by ATALA: Association pour le Traitement Automatique des Langues, 2009. Long article, pp. 201-225. Download as PDF
Jousse V, Estève Y, Béchet F, Bazillon T, Linarès G. Caractérisation et détection de parole spontanée dans de larges collections de documents audio. JEP/TALN 2008, Avignon (France), 9-13 June 2008. Short article, 4 pages. Download as PDF
Jousse V, Jacquin C, Meignier S, Estève Y, Daille B. Etude pour l'amélioration d'un système d'identification nommée du locuteur. JEP/TALN 2008, Avignon (France), 9-13 June 2008. Short article, 4 pages. Download as PDF
Bazillon T, Jousse V, Béchet F, Estève Y, Linarès G, Luzzati D. La parole spontanée : transcription et traitement. In Traitement Automatique des Langues, 49(3), edited by ATALA: Association pour le Traitement Automatique des Langues, 200. Long article, pp. 47-76. Download as PDF