WO2007110551A1

WO2007110551A1 - System for hearing-impaired people

Info

Publication number: WO2007110551A1
Application number: PCT/FR2007/051031
Authority: WO
Inventors: Panagiotis Pavlopoulos; Samuel Deberles; Konstantin-Léo PAVLOPOULOS
Original assignee: Panagiotis Pavlopoulos; Samuel Deberles; Pavlopoulos Konstantin-Leo
Priority date: 2006-03-28
Filing date: 2007-03-28
Publication date: 2007-10-04
Also published as: FR2899097A1; FR2899097B1; EP1998729A1

Abstract

The present invention relates to a system (1) for helping hearing-impaired people, comprising: a device (30) for capturing sounds emitted by a speaker addressing the person; a head-up display device (10); and a processing system (20) for analyzing in real time the sound data transmitted by the acquisition device and for transmitting to the display device an at least partial phonetic transcription of this sound data, to be displayed in the field of vision of the person, so that the person can observe both the movement of the lips and/or the gestures of the speaker and the phonetic transcription.

Description

System for hearing-impaired people

The present invention relates to aid systems for persons with hearing loss, and especially those which can advantageously supplement the medical aids already provided to these persons (prostheses, implants), which sometimes do not allow complete reception of speech.

It is established that the perception of facial expressions greatly increases the comprehension and the learning of the oral language.

In patients with cochlear implants, the understanding of speech with lip reading increases from 45% to 85% in one month of rehabilitation and reaches almost 100% after one year, according to the publication S. Lagleyre, ENT Department, Hospital Purpan, "Role of the visuo-auditory integration in speech understanding in deaf subjects with cochlear implants", 6th Annual Meeting of the International Multisensory Research Forum, June 5-8, 2005, University of Trento, Department of Cognitive Sciences and Education, Trento, Italy.

For the deaf, lip reading is not enough to obtain a visual representation of the whole phonological system of French, because there are 36 sounds to which only 12 labial images correspond. For example, {pa}, {ba} and {ma} have the same lip image.

Adults who have become deaf know the language and with their auditory memory can mentally supplement the uncertainties of receiving the speech of their interlocutor. For young deaf children who have not acquired language and who do not have a phonological model, lip reading is a very random exercise.

To address this problem, the Langue française Parlée Complétée (LPC), the French adaptation of Cued Speech, associates five hand positions around the face to distinguish vowels and eight finger configurations to discriminate consonants. These gestures make it possible to eliminate the ambiguities due to labial look-alikes and provide a means for the deaf, and especially children, to apprehend the French language by sight, as a hearing person receives it by hearing. For example, the three labial look-alikes {pa}, {ba} and {ma} correspond to three different keys of the LPC. Like sign language, the Assisted Kinemes Alphabet and Signed French, the LPC requires a learning effort that is not trivial, both for the deaf person and for those around him in private, professional or administrative settings. The need for such learning risks isolating the hearing impaired from the hearing population.

According to M. Molander, "Experiment with asynchrony in multimodal speech communication", Master thesis, Department of Speech, Music and Hearing (TMH) at the Royal Institute of Technology (KTH), Stockholm, Sweden, June 2003, the gap between the perception of visual and auditory information should not exceed 100 ms, at the risk of impairing the understanding of speech and producing sensory illusions.

US Pat. No. 5,029,216 discloses an aid system comprising spectacles intended to be worn by the hearing-impaired person, provided with microphones and a processing system making it possible to indicate to the person the direction from which the captured sounds emanate. The processing system is also arranged to indicate the intensity of the sounds emitted.

US Patent 6,975,991 discloses a help system for enabling a hearing-impaired person to receive information regarding the location of a speaker in an assembly.

The patent application US 2002/0101537 discloses an assistance system comprising glasses intended to be worn by the hearing-impaired person, and a processing system for displaying on these glasses subtitles corresponding, for example, to a television program watched by the person with hearing loss.

Patent Application JP 08-160366 discloses a similar aid system.

The use in these last two applications of a speech recognition or dictation system to "subtitle" the audio-visual scene has the disadvantage of relying on the context of the complete sentence to reconstruct its grammar and syntax, resulting in a significant and variable delay of the written text relative to the audio-visual scene. The written sentence may also include errors, in which case the reader will have to go back to a phonetic representation of the sentence to understand it. As a result, such a system is unsuited to the use of lip reading for receiving the message, and therefore to the acquisition of the oral and then the written language. In addition, such a system requires the person to be able to read, which is not the case for young children.

In addition, the aid system according to the patent application US 2002/0101537 is not intended to help a hearing-impaired person face any interlocutor, possibly in a relatively noisy environment, and initiate a spontaneous dialogue.

US Pat. No. 4,972,486 discloses a device for transcribing sound information into visual symbols, in which each symbol is associated with a particular group of consonants.

Finally, none of these systems provides for signaling a danger or an alert to the person with hearing loss, for example when a fire alarm sounds or a baby cries.

It was proposed in the article "Accessibility of the deaf to the means of audio-visual communication by Spoken Complete Language, accompanied by subtitling" to control a virtual hand coding in LPC, superimposed either on a video image or on a 3D synthetic head. This solution requires the learning of LPC coding and involves an image overlay or a display on a video screen large enough to be visible to students in a classroom, for example.

There is a need to further improve aid systems for the hearing impaired and, in particular, to enable them to benefit from a lip-reading assistance system aimed at the acquisition and transmission of the oral language and at the understanding of any interlocutor, where applicable in an environment that can be relatively noisy.

There is a need to improve the reception of the oral message by the deaf child through help with lip reading, and the acquisition and transmission of the oral language, so that young deaf people can more easily use lip reading and exercise mental supplementation.

There is still a need to help a hearing person communicate with the deaf person without requiring the hearing person to develop specific skills to communicate with the deaf person.

There is also a need for a help system that can help people with hearing loss in their daily lives to warn them of hazards, for example.

The invention aims to meet all or part of these needs.

The subject of the invention is a help system for a hearing-impaired person, comprising:

a device for acquiring sounds emitted by an interlocutor of the hearing-impaired person, with or without integrated preprocessing that takes the sound environment into account in order to reduce background noise,

a head-up display device,

a processing system for analyzing, in real time, sound data transmitted by the acquisition device and for transmitting to the display device an at least partial phonetic transcription of these sound data, to be displayed in the person's field of vision, in such a way as to allow him to observe both the movement of the lips and / or the gestures of the interlocutor, in particular facial gestures, and the phonetic transcription.

"Head-up display device" means a device having at least one transparent surface through which the hearing-impaired person can observe, and on or in the vicinity of which information may be displayed, to enable the person to observe both the displayed information and the scene behind the transparent surface. The latter can be defined by a mineral or organic glass, possibly corrective or tinted, fixed or worn by the user.

In addition, since the display device can be worn by the hearing-impaired person in certain embodiments of the invention, the help system can be used very easily in many situations.

The phonetic transcription of the sound data can be complete. The phonetic transcription, partial or complete, of the sound data can be performed in phonemes. A phoneme is a sound element of a given language, determined by the relationships it has with other sounds of that language. For example, the French word "cou" ("neck") is composed of the phonemes "keu" and "ou". French has 36 phonemes, including 16 vowels and 20 consonants.

In an exemplary implementation of the invention, the help system allows the user to disable the display of phonetic transcription based on, for example, user preference or the quality of speech recognition, which may be dependent on the sound environment.

The display of an at least partial phonetic transcription does not require a grammatical analysis of a complete sentence by the processing system and makes it possible to gain in speed, which allows the phonetic transcription to be displayed almost simultaneously with the movement of the lips. Thus, the hearing-impaired person does not suffer from any excessive sensory shift with respect to the observed scene, since the phonetic transcription can be displayed with a delay relative to the emission of the sound data that can be less than 100 ms.

The phonetic transcription may be carried out with various signs, which may include images, pictograms, photographs or representations of hand gestures and / or facial expressions, including those of a virtual hand and / or face, alphanumeric or special characters, phonemes, graphemes, or even personalized signs whose appearance is decided in advance by the user. The phonetic transcription may involve no grammar and contain no alphanumeric characters, so as to be easily understood by a child who cannot read.

The signs that can be displayed can be selected in a database of images stored by the processing system. In the case where a hand, which can be virtual, is displayed, it can take a configuration selected from those of the LPC.

The hard-of-hearing person may interpret the phonetic transcription himself to reconstruct the word and sentence, and possibly correct for himself phonetic transcriptions that would be erroneous depending on the context. The displayed signs make it possible to remove the ambiguity existing between several phonemes corresponding to the same movement of the lips. It may be advantageous if the signs displayed for the phonetic transcription are international and independent of a particular language.
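By way of illustration only, the way a displayed sign removes the ambiguity between labial look-alikes can be sketched as follows. The lip-image label and the sign identifiers (`sign-1`, `sign-2`, `sign-3`) are placeholders chosen for this example; they are not the actual LPC keys and are not taken from the description.

```python
# Hypothetical lip-image label shared by the three look-alike syllables
# {pa}, {ba}, {ma} mentioned in the description.
LIP_IMAGE = {
    "pa": "lips-closed-then-open",
    "ba": "lips-closed-then-open",
    "ma": "lips-closed-then-open",
}

# Placeholder sign identifiers (NOT the real LPC keys): one distinct sign
# per syllable, as a displayed phonetic sign would provide.
SIGN = {"pa": "sign-1", "ba": "sign-2", "ma": "sign-3"}


def candidates_from_lips(lip_image):
    """Syllables compatible with what the lips alone show."""
    return sorted(s for s, image in LIP_IMAGE.items() if image == lip_image)


def resolve(lip_image, sign):
    """Syllables compatible with both the lip image and the displayed sign."""
    return [s for s in candidates_from_lips(lip_image) if SIGN[s] == sign]


ambiguous = candidates_from_lips("lips-closed-then-open")
unambiguous = resolve("lips-closed-then-open", "sign-2")
```

With the lips alone, three syllables remain possible; adding the displayed sign leaves a single candidate.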

The invention may facilitate the integration of the deaf into hearing classes and / or the intervention of untrained teachers in classes with deaf pupils, and avoid the presence of "coders".

The deaf student can then have lip reading accompanied by the equivalent of the keys of the LPC without any specific learning on his part or on that of his teacher, and without the joint intervention of the teacher and the coder, such an intervention requiring substantial preparation and joint planning.

The processing system can be arranged to parameterize the speech signal of the speaker, segment the sound data into elementary linguistic segments, and identify them.

The parameterization can consist in obtaining a characteristic "imprint" of the sound by successively applying to its electrical signal a mathematical treatment based on the frequency decomposition of the signal, for example the Fourier transform, without prior knowledge of its fine structure. This characteristic imprint of the sound can be represented by a "spectrogram", i.e. a graph showing amplitude and frequency as a function of time.

The acoustic-phonetic decoding implemented by the processing system can make it possible to describe the acoustic signal in terms of discrete linguistic units and aims at segmenting the signal into elementary segments. If these linguistic units are long, such as syllables, words or a sentence, the recognition itself will be facilitated, but their identification is difficult. If short linguistic units are chosen, such as "phones", localization would be easier, but the hearing-impaired person would need to make a greater effort to exploit them. "Phonemes" can be a good compromise, as their number is limited.
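As a minimal sketch of the parameterization described above, the following computes a magnitude spectrogram by applying a discrete Fourier transform to successive windowed frames of the signal. The frame length, hop size, Hann window and 8000 Hz sampling rate are assumptions made for this example; the description does not specify them.

```python
import cmath
import math


def spectrogram(samples, frame_len=64, hop=32):
    """Return one list of spectral magnitudes per frame (time-frequency image)."""
    # A Hann window limits spectral leakage at the frame edges (assumed choice).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        magnitudes = []
        # Naive DFT over the non-negative frequency bins 0 .. frame_len / 2.
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames


# Usage: a 1000 Hz tone sampled at 8000 Hz. Each bin is 8000 / 64 = 125 Hz
# wide, so the energy is expected around bin 1000 / 125 = 8.
RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * t / RATE) for t in range(256)]
spec = spectrogram(tone)
peak_bins = [max(range(len(frame)), key=frame.__getitem__) for frame in spec]
```

A practical system would use a fast Fourier transform rather than this naive transform, which is kept here only for readability.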

The sound data, after parameterization, can be compared to reference data in terms of time-frequency acoustic images. A database containing average phonetic fingerprints may be used to allow the recognition of multiple voices independently of the speaker and to make the help system "multi-speaker".

The processing system can then accept and understand different voices, accents, etc., and be robust to possible noise. In an exemplary implementation of the invention, the processing system may require no prior training on the voice of the interlocutor, which facilitates the use of the help system.
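The comparison with averaged reference fingerprints can be illustrated by a nearest-reference classifier, shown below as a sketch only. The Euclidean distance, the three-value fingerprints and the labels are assumptions for this example; the description leaves the comparison method open.

```python
import math


def average_fingerprint(fingerprints):
    """Average several speakers' fingerprints of the same sound, value by value."""
    count = len(fingerprints)
    return [sum(values) / count for values in zip(*fingerprints)]


def distance(a, b):
    """Euclidean distance between two fingerprints of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(fingerprint, references):
    """Return the label of the closest reference fingerprint."""
    return min(references, key=lambda label: distance(fingerprint, references[label]))


# Made-up reference data: each entry stands for a fingerprint averaged
# over several speakers (hypothetical numbers, for illustration only).
REFERENCES = {
    "a": average_fingerprint([[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]),
    "i": [0.1, 0.2, 0.9],
    "u": [0.2, 0.9, 0.1],
}

label = classify([0.8, 0.2, 0.1], REFERENCES)
```

Averaging over several speakers is what lets a single reference stand in for many voices; a real system would use far richer fingerprints than three values.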

The models of the words to be recognized can be built by assembling previously built phoneme models rather than from numerous recordings of the words. Their identification can be done according to articulatory and phonetic data. The assembly instructions may involve the phonemization of the words to be recognized. Stochastic modeling, in the form of Markov models and / or neural network models, can be used to choose the most closely matching sound, regardless of the durations and rhythms of pronunciation. Examples of modeling include:

- Y. Laprie and Ch. Cerisara, "Towards Success in Speech Recognition", Project SPEECH, INRIA Lorraine / LORIA,

- L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition", Proc. IEEE, vol. 77, No. 2, 1989, p. 257,

- Presentations at the XXIIIth Study Days on Speech, Aussois, 19-23 June 2000, for example: M. Adda-Decker and L. Lamel, "Systems of automatic alignment & studies of pronunciation variants" and others,

- V. Luba and A. Younes, "Multimedia Project: Hidden Markov Models. Recognition of the word", Polytechnic Faculty of Mons, 2005,

- B. Jacob, "A computer tool for managing hidden Markov models: experiments in automatic speech recognition", University P. Sabatier, Toulouse, 1995,

- The Hidden Markov Model Toolkit (HTK): http://htk.eng.cam.ac.uk/,

- H. Schwenk and J.-L. Gauvain, "Using Continuous Space Language Models for Conversational Speech Recognition", IEEE Workshop on Spontaneous Speech Recognition, 2003, and

- J. L. Gauvain, L. Lamel, and G. Adda, "The LIMSI Broadcast News Transcription System", Speech Communication, 37 (1-2): 89-108, 2002, the contents of which are incorporated by reference.
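To illustrate the Markov modeling referred to above, the following is a minimal sketch of Viterbi decoding, a standard way of finding the most likely sequence of hidden states (here, phonemes) for a sequence of observations. The two-state model, the observation symbols and all probabilities are invented for this example and do not come from the description or from the cited references.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for the observations (log domain)."""
    # Each column maps a state to (best log-probability, previous state).
    columns = [{s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), None)
                for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prev, score = max(
                ((p, columns[-1][p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            column[s] = (score + math.log(emit_p[s][obs]), prev)
        columns.append(column)
    # Backtrack from the best final state.
    state = max(states, key=lambda s: columns[-1][s][0])
    path = [state]
    for column in reversed(columns[1:]):
        state = column[state][1]
        path.append(state)
    path.reverse()
    return path


# Toy model (all values hypothetical): a plosive "p" followed by a vowel "a",
# observed through two coarse acoustic symbols.
STATES = ["p", "a"]
START = {"p": 0.6, "a": 0.4}
TRANS = {"p": {"p": 0.5, "a": 0.5}, "a": {"p": 0.2, "a": 0.8}}
EMIT = {"p": {"burst": 0.9, "voiced": 0.1}, "a": {"burst": 0.2, "voiced": 0.8}}

decoded = viterbi(["burst", "burst", "voiced", "voiced"], STATES, START, TRANS, EMIT)
```

Because the path is chosen state by state rather than by matching fixed-length templates, the same model accommodates sounds held for different durations, which is the property the paragraph on stochastic modeling relies on.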

Modeling can use acoustic models of phones and other types of segments, such as breaths, hesitations, and various environmental noise frequently observed.

A better prediction of the models of phones can be obtained by distinguishing, for a given phone, different models according to the phonemic context.

A phonemic decision tree can share the same number of Gaussians between a large number of contexts and thus reduce the number of hypotheses to be evaluated and the overall cost of decoding in computing time. Reference may usefully be made to the publication G. Linares, P. Nocera and D. Matrouf, "Dynamic Partitioning of distributions for the calculation of emissions in a Markovian acoustic-phonetic decoder", XXIIIth Study Days on Speech, Aussois, 19-23 June 2000, the contents of which are incorporated by reference.

Thus, small mobile processing systems such as PDAs may be sufficient to achieve a high performance and robust level of phonetic transcription.

According to another of its aspects, independently of or in combination with the foregoing, the subject of the invention is a help system for a hearing-impaired person, comprising: a device, with or without integrated preprocessing, for acquiring the sounds emitted by an interlocutor of the person, comprising:

at least one microphone arranged to be worn by the interlocutor, or

at least one directional microphone directed towards the interlocutor,

a head-up display device, preferably integrating the microphone or microphones,

a processing system for analyzing, in real time, sound data transmitted by the acquisition device and transmitting to the display device an at least partial phonetic transcription of this sound data, to be displayed in the person's field of vision so as to allow him to observe simultaneously, that is to say without perceptible sensory shift, both the movement of the lips and / or the gestures of the speaker and the phonetic transcription.

The presence of at least one directional microphone, integrated or not in the head-up display device, or of a microphone worn by the interlocutor, increases the signal-to-noise ratio and facilitates speech recognition even in a relatively noisy sound environment.

According to another of its aspects, independently or in combination with the foregoing, the invention further relates to a hearing aid system comprising:

a device for acquiring noise emitted in the sound environment of the person,

a head-up display device,

a processing system for analyzing, in real time, sound data transmitted by the acquisition device, arranged to recognize noises other than speech, and to transmit to the display device an at least partial phonetic transcription of these noises for display in the field of vision of the person with hearing loss.

Thus, the hearing-impaired person can be informed about the sound environment and be warned of the presence of a danger, for example.

The noises recognized by the processing system may include: horns, alarms, traffic noise, children's cries, animal cries, the ringing of a telephone, etc.

If necessary, the processing system can be arranged to allow the user himself to program the recognition of a particular noise, for example the ringing of a given device, and the display of corresponding information, which can, where applicable, take the form of a graphic defined by the user. This can allow a personalized adaptation of the help system to a particular sound environment, or even to a particular danger.
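The association between recognized noises and displayed images, including noises programmed by the user, can be sketched as a simple registry. The class name, noise labels and image file names below are hypothetical and serve only to illustrate the idea.

```python
class NoiseDisplayRegistry:
    """Maps a recognized noise label to the image shown on the display."""

    def __init__(self):
        self._images = {}

    def register(self, noise_label, image_name):
        """Add or replace the image associated with a noise label."""
        self._images[noise_label] = image_name

    def image_for(self, noise_label):
        """Return the image for a recognized noise, or None if none is defined."""
        return self._images.get(noise_label)


registry = NoiseDisplayRegistry()
# Predefined noises (hypothetical image names).
registry.register("horn", "horn.png")
registry.register("alarm", "alarm.png")
# A noise programmed by the user, with a user-defined graphic.
registry.register("my_doorbell", "doorbell_drawing.png")
```

Returning `None` for an unknown label leaves the display unchanged when a noise has no associated image, which is one possible design choice among others.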

According to other non-limiting aspects of the invention, the acquisition device may comprise a microphone integrated in an earpiece, or a tie-clip microphone.

When the microphone is integrated in an earpiece, it may allow, especially for reasons of privacy or in the case of rehabilitation, the interlocutor of the hearing-impaired person to receive the information broadcast by a loudspeaker of the earpiece, allowing the transmission not only of the original sounds but also of the displayed information, converted into sound by the computing unit. For this, the computing unit can use speech synthesis software of known type, converting the displayed text into speech (Text-To-Speech). The interlocutors of the hearing-impaired person can thus check that the projected information is correct and repeat it if necessary.

The processing system can be arranged to recognize both phonetic units and predefined noises.

The phonetic transcription of the noises emitted in the person's sound environment may involve no grammar and no alphanumeric characters, in order to be easily understood by a child who cannot read. The noises emitted in the environment can be displayed in the field of vision of the person wearing the device with a delay relative to their emission of less than 100 ms.

The processing system can use general acoustic models, trained on large corpora including several speakers with a statistical distribution of age, sex and geographic region (accents), to represent an average spoken-language situation and guarantee excellent performance in the most common situations.

However, when there is some peculiarity of a speaker deviating from standard speech, an adaptation of the acoustic models may be necessary to improve performance.

Thus, in an exemplary implementation of the invention, the processing system is arranged to perform phonetic learning allowing the individualization of words that are not stored but frequently used, and the integration of unanticipated pronunciation variants of the stored words. This can be particularly useful in the case of non-native-speaking interlocutors or regional accents.

Still in an exemplary implementation, the processing system can be arranged to adapt the acoustic models with the voice data collected in the field during use, thus improving the accuracy of acoustico-phonetic modeling.
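The description does not state how the acoustic models are adapted. As one possible sketch, the mean of a Gaussian acoustic model can be shifted toward field data with a maximum a posteriori (MAP) style update, in which a prior weight controls how quickly the general model yields to the new speaker. The function name and the prior weight of 10 are assumptions for this example.

```python
def adapt_mean(prior_mean, field_samples, prior_weight=10.0):
    """MAP-style update of a Gaussian mean vector with feature vectors from use.

    new_mean = (prior_weight * prior_mean + sum(samples)) / (prior_weight + n)
    """
    count = len(field_samples)
    dim = len(prior_mean)
    sums = [sum(sample[d] for sample in field_samples) for d in range(dim)]
    return [(prior_weight * prior_mean[d] + sums[d]) / (prior_weight + count)
            for d in range(dim)]


# Usage: ten field observations pull the general model halfway toward them
# when the prior weight is also ten.
general_mean = [0.0, 1.0]
observed = [[2.0, 1.0]] * 10
adapted = adapt_mean(general_mean, observed)
```

With few field samples the adapted mean stays close to the general model, and it moves toward the speaker's data as more is collected.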

The adaptation of the acoustic models may allow adaptation to the speaker, including a way of speaking or a language or accent, one or more input channels and / or a particular sound environment.

The processing system may be arranged to receive an update of the program (s) and / or files used to analyze the sound data, including acoustic models.

The processing system can also be arranged to download programs and / or files, including acoustic models, according to criteria selected by the user, for example the language of the interlocutor, the geographical area, the signs to be displayed for the phonetic transcription, or the sounds to be recognized.

The display device may be arranged to receive data to be displayed over a wireless link. The display device may include glasses. Alternatively, the display device can be integrated with a helmet, a lectern or a desk, among others. The display device may comprise a monochrome or color projection device.

The help system may comprise at least one directional microphone and at least one omnidirectional microphone, the processing system being arranged to eliminate ambient noise not useful for speech understanding by differential processing of the signals received from the omnidirectional and directional microphones.
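The differential processing of the directional and omnidirectional microphone signals is not detailed in the description. A deliberately simplified sketch is given below: the omnidirectional signal is treated as a reference for the ambient noise, its gain in the directional signal is estimated by least squares, and the scaled reference is subtracted. The assumption that the omnidirectional signal contains only ambient noise is made for this example and would not hold exactly in practice.

```python
import math


def remove_ambient(directional, omni):
    """Subtract the least-squares-scaled ambient reference from the directional signal."""
    energy = sum(o * o for o in omni)
    if energy == 0.0:
        return list(directional)  # no ambient reference: nothing to remove
    gain = sum(d * o for d, o in zip(directional, omni)) / energy
    return [d - gain * o for d, o in zip(directional, omni)]


# Synthetic usage: "speech" and "noise" are two sinusoids of different
# frequencies, so they are uncorrelated over the whole window.
N = 200
speech = [math.sin(2 * math.pi * 5 * n / N) for n in range(N)]
noise = [math.sin(2 * math.pi * 23 * n / N) for n in range(N)]
directional = [s + 0.5 * v for s, v in zip(speech, noise)]
cleaned = remove_ambient(directional, noise)
residual = max(abs(c - s) for c, s in zip(cleaned, speech))
```

Real systems typically work per frequency band and adapt the gain over time; the single global gain used here only shows the principle.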

The invention further relates to a method for displaying a visual transcription of words uttered by the interlocutor of a person with hearing loss, comprising the steps of:

- picking up the words spoken by the interlocutor, preferably by means of at least one microphone worn by the interlocutor, or a directional microphone directed towards the interlocutor and preferably integrated in the head-up display device,

- segmenting and analyzing these words in real time in order to recognize phonetic units and generate an at least partial phonetic transcription of these words, for example in the form of a sequence of phonetic signs,

- displaying the phonetic transcription in the field of vision of the person, by means of a head-up display device, so as to allow him to see simultaneously the movement of the lips and / or the gestures of the speaker and the phonetic transcription.

In a variant, the phonetic transcription of the words can be complete. This phonetic transcription can be done in phonemes.

The invention further relates, in another of its aspects, to a method of displaying a visual transcription of noises present in the sound environment of a hearing-impaired person, comprising the steps of:

- capturing, preferably by means of an omnidirectional microphone, integrated or not in the head-up display device, the noises in the sound environment of the hearing-impaired person, in particular noises from machinery or apparatus, or cries of children or animals,

- analyzing these noises in real time and generating an at least partial visual transcription of them when they are recognized,

- displaying a visual transcription of these noises in the person's field of vision, by means of a head-up display device.

The invention will be better understood on reading the following detailed description of non-limiting examples of its implementation, and on examining the appended drawing, in which:

FIGS. 1 to 4 show diagrammatically different examples of aid systems according to the invention,

FIG. 5 is a block diagram illustrating an example of a method according to the invention,

FIG. 6 represents examples of images that can be used as visual transcription for various corresponding noises,

FIG. 7 represents hand and finger positions of the LPC,

FIG. 8 is an exemplary image that can be displayed, and

FIG. 9 is a spectrogram of the word "computer".

FIG. 1 shows an example of a help system 1 produced according to the invention, comprising a head-up display device 10 for the hearing-impaired person, a processing system 20 arranged to send information to the display device 10 and an acquisition device 30 for transmitting sound data to the processing system 20. In the example under consideration, the display device comprises a pair of spectacles provided with means for displaying in the field of vision of the hearing-impaired person.

The display device is for example a pair of glasses commercially available from the company THE MICRO OPTICAL CORPORATION. An example of such glasses is described in WO 99/23524.

The glasses may include a miniature projection device, monochrome or color, carried by a temple arm, for example, to display information on at least one of the lenses.

The projection device may comprise a transparent liquid crystal screen and / or light emitting diodes. It may also include a laser, for example.

The glasses can also incorporate at least one microphone and a power source.

The information displayed by the display device may be in a region of the glasses allowing the hearing-impaired person to perceive simultaneously this information and the lips and / or gestures of an interlocutor, including facial expressions. The display region is for example located in a central area of the user's field of view, particularly in the case of a monocular display of the micro-display type, or offset to a peripheral area for greater comfort.

The display area may cover all or part of the user's field of view when the display is transparent and allows the audio-visual scene to be perceived together with the presented information, in particular in the case of a display projecting images on a transparent glass, which is for example the case of the head-up glasses from THE MICRO OPTICAL CORPORATION.

The acquisition device 30 is for example intended to be worn by the interlocutor of the hearing-impaired person, and may be in the form of an earpiece with an integrated microphone.

The earpiece is for example an earpiece as conventionally used in association with a mobile phone in order to leave the hands of the user free.

The processing system 20 can exchange data with the display device 10 and the acquisition device 30 by wired or wireless links, for example by radio frequency or infrared links, the data transmission taking place for example according to the Bluetooth®, Wi-Fi®, 802.11b or other protocols.

The processing system 20 comprises for example at least one microprocessor and at least one memory, and is configured to execute a computer program for processing the sound data received from the acquisition device 30. The processing system 20 is for example a personal microcomputer provided with appropriate interfaces to receive the sound data from the acquisition device 30 and to send the data to be displayed to the display device 10 substantially in real time.

In the variant of Figure 2, the processing system 20 comprises a local processing unit 21 that can communicate with a remote processing unit 22, the latter performing all or part of the signal processing. The local processing unit 21 is, for example, a personal digital assistant provided with appropriate interfaces for receiving sound data from the acquisition device 30 and transmitting the data to be displayed to the display device 10. The local processing unit 21 can exchange information with the remote processing unit 22 by a wired or wireless link, in particular a radio frequency or infrared link.

The remote processing unit 22 is for example a microcomputer or a server accessible via a computer network, in particular the Internet. In a variant not shown, the local processing unit 21 is a wireless telephone and the remote processing unit 22 is for example a server communicating via the telephone network with the local processing unit 21.

In the variant of Figure 3, the processing system 20 is integrated with the display device 10, the latter directly receiving the sound data from the acquisition device 30 by a wired or wireless link, in particular a radio frequency link.

In the variant of Figure 4, the acquisition device 30 is no longer in the form of an earpiece but in the form of one or more microphones 31 that are not worn by the interlocutor of the hearing-impaired person. The microphones 31 are for example stand-mounted microphones, and can be connected to the processing system 20 by a wired connection as in the example shown, or alternatively by a wireless connection.

In a variant that is not illustrated, the acquisition device 30 is a tie-clip microphone that can be worn by the interlocutor of the hearing-impaired person. In another variant not shown, the acquisition device 30 is a microphone that can be integrated into the head-up display device and directed towards the interlocutor.

When the acquisition device 30 comprises at least one microphone worn by the interlocutor of the hearing-impaired person, which is for example the case of the earpiece illustrated in FIGS. 1 to 3 or the tie-clip microphone of the aforementioned variant, the proximity of the microphone to the sound source reduces the influence of the ambient sound environment on the subsequent processing of the sound data and facilitates speech recognition.

The help system may have no additional microphone.

In an implementation variant, the help system comprises, on the one hand, at least one directional microphone to best capture the sounds emitted by the interlocutor of the hearing-impaired person, or a microphone placed close to the interlocutor, which may or may not be directional, as is the case of a microphone integrated in an earpiece or of a tie-clip microphone, and, on the other hand, at least one additional microphone to capture the ambient sound environment. Such an additional microphone is advantageously an omnidirectional microphone.

The help system can function as shown in Figure 5.

The acquisition of the sound data can be done with one or more microphones as mentioned above. In the case of a multi-microphone acquisition, differential data processing may be performed to separate the data from the sound source to be analyzed, namely the hearing-impaired person's interlocutor, from the data corresponding to the sound environment, which is not useful for understanding speech.

The processing of the data leads to the display in the field of vision of the hearing-impaired person of a phonetic transcription 40 which is adapted to a rapid apprehension by the hearing-impaired person.

The processing system 20 is preferably sufficiently fast to allow the display of a phonetic transcription of a sound uttered by the interlocutor within less than 100 ms. The amount of information displayed can be chosen to be compatible with the simultaneous display and to ensure the apprehension of the information displayed by the person with hearing loss.
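The 100 ms constraint can be pictured with the following sketch, which keeps for display only the phonetic signs whose delay since capture is within the budget. Discarding late signs is an assumption made for this example; the description only states that the display is preferably performed within 100 ms. The class and function names are hypothetical.

```python
from dataclasses import dataclass

MAX_DELAY_S = 0.100  # 100 ms budget stated in the description


@dataclass
class PhoneticSign:
    symbol: str         # sign to display, e.g. a phoneme label
    captured_at: float  # time, in seconds, at which the sound was captured


def signs_to_display(pending, now, max_delay=MAX_DELAY_S):
    """Keep only the signs that can still be shown within the delay budget."""
    return [sign for sign in pending if now - sign.captured_at <= max_delay]


pending = [PhoneticSign("k", 1.00), PhoneticSign("ou", 1.05)]
shown = signs_to_display(pending, now=1.12)
```

Here the first sign is already 120 ms old at display time and is dropped, while the second, 70 ms old, is kept.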

It can thus be advantageous for the displayed information to be limited to a sequence of phonetic signs for each sound pronounced, displayed sound after sound, without any sensory shift with the sound.

It can be for example various signs like characters representing phonemes, syllables, graphemes, or images, for example pictograms, and for the indications of the sound environment of two- or three-dimensional curves.

LPC finger or hand positions, as illustrated in FIG. 7, may also be shown, alone or in addition to another phonetic sign, for example a phoneme, as shown in the drawings. A deaf child who is learning LPC can thus learn French at the same time and will have less difficulty making the transition to the written language.

Alternatively or additionally, the sound data may be analyzed so that the display device can display information associated with noises other than speech, for example a horn, an alarm, a child's cry or traffic noise.

Each of these noises can, for example, be detected and cause a corresponding image to be displayed on the display device. By way of example, FIG. 6 shows various images that can be displayed to signal the presence of the sound of a horn, the engine noise of a bus or a car, a baby's cry, a rooster's crow or a bark.

If necessary, the processing system can be arranged to allow customization of the recognized noises, through a learning phase of the system or by downloading data selected by the user.

In all the illustrated examples, the recognition of sounds or noises can be carried out by a process based on a decomposition of the signal in time, frequency and energy, for example by Fourier transform, followed by a classification, according to stochastic models, into predefined linguistic units, for example phonemes or words.

During processing, the electrical signal delivered by the microphone when the interlocutor speaks is sampled to generate digital data.

The processing system is arranged to analyze and parameterize these data and perform a mathematical treatment thereof.

This processing may include the description of the speech acoustic signal in terms of discrete linguistic units.

The most commonly used units are phonemes, syllables and words. A phoneme is a sound element of a given language, determined by the relationships it has with the other sounds of that language. For example, the French word "cou" (neck) is composed of the phonemes "keu" and "ou".
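The recognition chain described above (sampling, frequency decomposition, then stochastic classification into predefined units) can be illustrated by the following minimal sketch. It is not the implementation of the invention: the sampling rate, the frame size, the number of frequency bands, the use of one independent Gaussian model per class and the two synthetic classes "horn" and "alarm" are all assumptions chosen to keep the example short.

```python
import cmath
import math
import random

SAMPLE_RATE = 8000  # Hz, assumed for this sketch
FRAME_SIZE = 128    # samples per analysis frame (16 ms at 8 kHz)
N_BANDS = 8         # number of frequency bands kept as features


def band_energies(frame):
    """Return the log energy of N_BANDS equal-width frequency bands."""
    n = len(frame)
    half = n // 2
    # Naive discrete Fourier transform (an FFT would be used in practice).
    power = []
    for k in range(half):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        power.append(abs(acc) ** 2)
    width = half // N_BANDS
    return [math.log(sum(power[b * width:(b + 1) * width]) + 1e-9)
            for b in range(N_BANDS)]


def fit_gaussian(vectors):
    """Fit one independent Gaussian per feature (a simple stochastic model)."""
    count = len(vectors)
    dims = len(vectors[0])
    means = [sum(v[d] for v in vectors) / count for d in range(dims)]
    variances = [sum((v[d] - means[d]) ** 2 for v in vectors) / count + 1e-3
                 for d in range(dims)]
    return means, variances


def log_likelihood(features, model):
    """Log-probability of a feature vector under a diagonal Gaussian model."""
    means, variances = model
    return sum(-0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
               for x, mu, var in zip(features, means, variances))


def classify(frame, models):
    """Return the label whose model best explains the frame."""
    features = band_energies(frame)
    return max(models, key=lambda label: log_likelihood(features, models[label]))


def noisy_tone(freq_hz, rng):
    """Synthetic stand-in for a recorded sound: a tone plus a little noise."""
    return [math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE) + rng.gauss(0.0, 0.05)
            for t in range(FRAME_SIZE)]


rng = random.Random(0)
# "Learning phase": ten example frames per (hypothetical) class.
models = {
    "horn": fit_gaussian([band_energies(noisy_tone(440, rng)) for _ in range(10)]),
    "alarm": fit_gaussian([band_energies(noisy_tone(2000, rng)) for _ in range(10)]),
}
result = classify(noisy_tone(2000, rng), models)
print(result)
```

In a real system the classes would be phonemes, syllables, words or predefined noises, and the stochastic models would typically be richer than a single Gaussian per class.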

The processing can segment the signal into elementary segments; once the segmentation has been performed, the different segments can be identified according to phonetic and/or linguistic constraints.
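As a rough illustration of the segmentation step, the sketch below cuts a sampled signal into segments wherever the short-time energy exceeds a threshold. This energy criterion is a simple stand-in chosen for the example, not the segmentation method of the invention; the function name, frame size and threshold are hypothetical.

```python
import math


def segment_by_energy(samples, frame_size=80, threshold=0.01):
    """Return (start, end) sample indices of runs of frames whose
    mean energy exceeds the threshold."""
    segments = []
    start = None
    n_frames = len(samples) // frame_size
    for i in range(n_frames):
        frame = samples[i * frame_size:(i + 1) * frame_size]
        energy = sum(x * x for x in frame) / frame_size
        if energy > threshold and start is None:
            start = i * frame_size                    # a segment begins
        elif energy <= threshold and start is not None:
            segments.append((start, i * frame_size))  # the segment ends
            start = None
    if start is not None:
        # The signal ended while a segment was still open.
        segments.append((start, n_frames * frame_size))
    return segments


def tone(n):
    """Synthetic 400 Hz tone sampled at 8 kHz, standing in for an utterance."""
    return [0.5 * math.sin(2 * math.pi * 400 * t / 8000) for t in range(n)]


# Two "utterances" separated by silences.
signal = [0.0] * 160 + tone(240) + [0.0] * 160 + tone(160)
segments = segment_by_energy(signal)
print(segments)  # [(160, 400), (560, 720)]
```

Each segment found in this way could then be passed to the identification stage.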

The signal, following this processing, depends on three parameters: time, frequency and intensity, and can be represented in the form of a spectrogram.

By way of example, FIG. 9 is a spectrogram of the word "computer". The vertical axis represents frequencies up to 8000 Hz, the horizontal axis represents time, increasing to the right, and the colors represent the intensity of the most prominent acoustic peaks in a given time slot, red representing the highest energies.
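A spectrogram of this kind can be computed by a short-time Fourier analysis, as in the minimal sketch below. The 16 kHz sampling rate is an assumption, chosen so that the frequency axis extends to 8000 Hz as in FIG. 9; the frame size, hop size and Hann window are likewise illustrative choices.

```python
import cmath
import math


def spectrogram(samples, sample_rate=16000, frame_size=64, hop=32):
    """Return (times, freqs, power) for a short-time Fourier analysis.

    power[i][k] is the energy at time times[i] (s) and frequency freqs[k] (Hz).
    """
    half = frame_size // 2
    # Hann window to limit spectral leakage between neighbouring bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / frame_size)
              for t in range(frame_size)]
    # Precomputed DFT kernel (naive; an FFT would be used in practice).
    kernel = [[cmath.exp(-2j * math.pi * k * t / frame_size)
               for t in range(frame_size)] for k in range(half + 1)]
    power = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + t] * window[t] for t in range(frame_size)]
        power.append([abs(sum(x * w for x, w in zip(frame, row))) ** 2
                      for row in kernel])
    freqs = [k * sample_rate / frame_size for k in range(half + 1)]
    times = [i * hop / sample_rate for i in range(len(power))]
    return times, freqs, power


# Usage: a pure 1000 Hz tone concentrates its energy at 1000 Hz in every frame.
tone = [math.sin(2 * math.pi * 1000 * t / 16000) for t in range(320)]
times, freqs, power = spectrogram(tone)
peak_freqs = [freqs[row.index(max(row))] for row in power]
print(freqs[-1], peak_freqs[0])  # 8000.0 1000.0
```

Mapping each value of `power` to a color, with red for the highest values, would give an image comparable to the one described.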

An analytic approach, which takes advantage of the linguistic structure of words, can be used to detect and identify their elementary components (phonemes and syllables). This approach is more general and suited to recognizing large vocabularies, because it is enough to store in the memory of the processing system the main characteristics of the basic units. Words need not be memorized in their entirety, but can be treated as a sequence of phonemes.

The speaker signal is obtained by subtracting the spectral density of the background noise from the spectral density of the signal detected by the microphone. The background noise spectrum can be approximated by the average of the noise spectra measured during the silences separating the utterances. When an additional omnidirectional microphone is present, the background noise spectrum can be estimated by means of this microphone.

The invention is not limited to the examples which have just been described.
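The background-noise subtraction described above can be sketched as follows. The function names are hypothetical, the spectra are represented as plain lists of power values per frequency bin, and clamping negative results to a floor of zero is an assumption added for the example (the text does not say how negative differences are handled).

```python
def average_spectrum(spectra):
    """Average several power spectra bin by bin, e.g. those measured
    during the silences separating the utterances."""
    count = len(spectra)
    return [sum(s[k] for s in spectra) / count for k in range(len(spectra[0]))]


def subtract_noise(signal_spectrum, noise_spectrum, floor=0.0):
    """Subtract the noise power from the signal power in each bin,
    never going below the floor (assumed behaviour)."""
    return [max(s - n, floor) for s, n in zip(signal_spectrum, noise_spectrum)]


# Toy power spectra with three frequency bins each.
silence_spectra = [[1.0, 2.0, 1.0], [3.0, 2.0, 1.0]]
noise = average_spectrum(silence_spectra)       # [2.0, 2.0, 1.0]
cleaned = subtract_noise([10.0, 2.5, 0.5], noise)
print(cleaned)  # [8.0, 0.5, 0.0]
```

When an additional omnidirectional microphone is available, its spectrum could be passed as `noise_spectrum` in place of the average measured during silences.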

The display device 10 can be of any type suitable for head-up vision, not being limited to glasses.

The display device 10 may include in particular a transparent screen placed in the field of view of the hearing-impaired person, not worn by the latter, so as to enable him to observe both the information displayed on this screen and the interlocutor.

The display device 10 may also include a table accessory, for example of the modern table-clock type, with a transparent liquid crystal display (LCD) or light-emitting diode (LED) screen, placed in the field of view of the hearing-impaired person, not worn by the person, so as to enable him or her to observe both the information displayed synchronously and the movements of the interlocutor.

The display device 10 may also comprise, for example, a window on which the information is projected, for example by means of at least one laser. This window can be present at a counter or a desk. It can also be the visor of a helmet.

If necessary, the information displayed in the field of vision of the hearing-impaired person may also be displayed on a subsidiary screen, or even on another pair of glasses, in order, for example, to enable a rehabilitation therapist to check the information viewed by the hearing-impaired person.

The expression "having one" shall be understood as being synonymous with "having at least one", unless the opposite is specified.

Claims

1. A help system (1) for a hearing-impaired person, comprising:

a device (30) for acquiring sounds emitted by the person's interlocutor,

a head-up display device (10),

a processing system (20) for analyzing, in real time, sound data transmitted by the acquisition device and transmitting to the display device an at least partial phonetic transcription of these sound data in phonemes, to be displayed in the field of view of the person, so as to enable him to observe both the movement of the lips and/or the gestures of the interlocutor and the phonetic transcription.

2. A help system for a hearing-impaired person, comprising:

a device for acquiring noise emitted in the sound environment of the hearing-impaired person,

a head-up display device,

a processing system for analyzing, in real time, sound data transmitted by the acquisition device, arranged to recognize noises other than speech, and to transmit to the display device an at least partial visual transcription of these noises for display in the field of vision of the hearing-impaired person.

3. System according to claim 1 or 2, the sound acquisition device comprising:

at least one microphone arranged to be carried by the interlocutor, or at least one directional microphone directed towards the interlocutor and preferably integrated in the head-up display device.

4. System according to claim 2, the noises being selected from among the following noises: horn, alarm, traffic noise, cries of a child or children.

5. System according to claim 3, the acquisition device comprising a microphone integrated in a headset or a tie-clip (lavalier) microphone.

6. A help system according to claim 2, the processing system being arranged to recognize phonetic units.

7. A help system according to claim 1, the processing system being arranged to recognize predefined noises.

8. A help system according to any one of the preceding claims, the processing system being arranged to decompose the sound data into parameterized units and select phonemes by stochastic modeling.

9. A help system as claimed in any one of the preceding claims, the display device being arranged to receive data to be displayed over a wireless link.

10. A help system as claimed in any one of the preceding claims, the display device comprising spectacles.

11. A help system according to claim 3, comprising at least one directional microphone and at least one omnidirectional microphone, the processing system being arranged to eliminate ambient noise not useful for understanding speech by differential processing of the signals received from the omnidirectional and directional microphones.

12. System according to claim 1, being arranged to transmit to the display device a complete phonetic transcription of sound data in phonemes.

13. System according to any one of the preceding claims, the display of noises and/or sounds in the person's field of vision taking place with a delay of less than 100 ms.

14. System according to claim 2, the transcription having no alphanumeric character.

15. System according to claim 1, the phonetic transcription comprising the representation of hand gestures.

16. A method of displaying a phonetic transcription of words uttered by the interlocutor of a hearing-impaired person, comprising the steps of:

capturing, in particular by means of at least one microphone carried by the interlocutor or directional and directed towards the interlocutor, the words uttered by the interlocutor,

analyzing these words in real time to recognize phonetic units and generating an at least partial phonetic transcription of these words in phonemes,

displaying the phonetic transcription in the person's field of vision, using a head-up display device, so as to enable the person to see simultaneously both the movement of the interlocutor's lips and/or the interlocutor's gestures and the phonetic transcription.

17. A method of displaying a visual transcription of sounds present in the sound environment of a hearing-impaired person, including machine or apparatus noises or cries from children or animals, comprising the steps of:

- picking up the noises,

- analyzing these noises in real time and generating an at least partial visual transcription of these noises when they are recognized,

- displaying the visual transcription in the person's field of vision, by means of a head-up display device.