EP0953969A1 - Method for rendering speech with silence regulation - Google Patents


Publication number
EP0953969A1
Authority
EP
European Patent Office
Prior art keywords
silence
sound
restitution
speech
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
EP99400873A
Other languages
German (de)
French (fr)
Other versions
EP0953969B1 (en)
Inventor
Philippe Charbonnier
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sagem SA
Original Assignee
Sagem SA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sagem SA
Publication of EP0953969A1
Application granted
Publication of EP0953969B1
Anticipated expiration
Expired - Lifetime

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/005: Correction of errors induced by the transmission channel, if related to the coding algorithm
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/04: Time compression or expansion

Abstract

Silences are detected in the received time slices of speech, and their reproduction duration is regulated so that the speech signals other than the silences are reproduced in one piece.

Description

In the switched telephone network, speech is transmitted as a continuous stream of code words representing the instantaneous amplitude of the voice signal, which is periodically sampled and digitized for this purpose. A call permanently holds a transmission channel, or circuit, to carry this continuous stream.

By contrast, on a computer network of the Internet type, or on certain packet-mode radio networks, the code words representing the voice are transmitted in packets, over a channel whose bit rate is high enough for the channel to be time-shared among several terminals.

Packet transmission across a network does, however, raise problems.

The network transit time is inherently variable, because each packet may follow a different path, and packets cannot be played as soon as they are received: a packet may arrive before the sound reproduction of the previous one has finished or, conversely, after it. In the first case, playing it immediately would overwrite the previous packet; in the second case, there would be a dead time.

A buffer memory is therefore interposed, in which the packets are stored at the random rate of their arrivals and from which they are read, for sound reproduction, at the fixed rate of their transmitter, with a delay relative to their transmission instants that is fixed and large enough, compared with the fluctuations in transmission time, for the packets to have been received.

Packets may even be received in an order different from that of their transmission, so they must be numbered at transmission in order to restore the intended order on reproduction.
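
The reordering described above can be illustrated with a minimal Python sketch. It assumes that each packet carries an integer sequence number and that the first expected number is known; the class name `ReorderBuffer` and its methods are hypothetical and do not come from the patent.

```python
import heapq


class ReorderBuffer:
    """Holds packets and releases them in sequence-number order (illustrative)."""

    def __init__(self, first_seq=0):
        self._heap = []             # min-heap of (seq, payload) pairs
        self._next_seq = first_seq  # next sequence number due for playback

    def push(self, seq, payload):
        # A packet older than the playback point arrives too late to be used.
        if seq >= self._next_seq:
            heapq.heappush(self._heap, (seq, payload))

    def pop_ready(self):
        """Return the payloads that can be played now, in transmission order."""
        ready = []
        while self._heap and self._heap[0][0] <= self._next_seq:
            seq, payload = heapq.heappop(self._heap)
            if seq == self._next_seq:  # anything lower is a stale duplicate
                ready.append(payload)
                self._next_seq += 1
        return ready
```

A packet that arrives ahead of its predecessor simply waits in the heap until the gap before it is filled.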

All of this requires a relatively large buffer memory. However, the resulting delay in sound reproduction becomes perceptible and hampers the dialogue between the two parties.

The present invention aims to remedy these drawbacks.

To this end, the invention relates to a method for the sound reproduction of speech signals that are received in successive packets representing successive time slices of speech and stored temporarily before being reproduced as sound, the method being characterized in that the presence of silences is detected in the received slices and their reproduction duration is regulated so that the speech signals other than the silences are reproduced in one piece.

Time is thus modulated at the reproduction stage in order, in practice, to compensate for the fluctuation in packet arrival times. The parts of an original sound sequence that were separated physically, and therefore temporally, by being placed in different packets are thereby reunited in time. Since the time modulation is applied to the silences, it has in practice no adverse effect on the intelligibility of the speech.

Advantageously, the reproduction of any silence that is followed by a complete sound sequence to be reproduced is shortened.

The invention will be better understood from the following description of a preferred embodiment of the method of the invention, with reference to the appended drawing, in which:

  • Figure 1 is a block diagram illustrating a device for implementing the method of the invention,
  • Figure 2, made up of Figures 2A, 2B, 2C, 2D and 2E, illustrates the division of a speech signal into packets as a function of time t, and
  • Figure 3 is a flow diagram illustrating the method.

The device of Figure 1 comprises a microprocessor-based controller 1 that controls the operation of a chain for receiving speech signals and reproducing them as sound. The chain, whose input is connected to a line 2 of a packet communication network, has at its input a circuit 3 for reading the received packets, which locates the sound segments and the silences and reconstructs their position in time. Depending on the type of coding used to represent the voice, explicit markers may exist that distinguish the segments and indicate their dates; if no such markers exist, circuit 3 reconstructs them from the sequence numbers of the packets and from their content, decoded into a voice signal whose waveform it analyzes. Circuit 3 is followed by a buffer memory 4 for storing the packets, which puts them back in their order of transmission and passes them to a sound reproduction circuit 5 driving an earpiece 6.

The operation of the above device will now be explained.

Conventionally, a transmitting terminal, in communication with that of Figure 1, analyzes the signal from its microphone with a vocoder in order to encode it in compressed form.

Figure 2A shows the amplitude S of the voice signal as a function of time t.

In an explicit variant, a vocoder seeks to delimit segments containing a voice signal that corresponds to a standardized sound held in a library, such as a vowel, a consonant or a near-silence. For the transmission of information representing the signal S, the signal is then replaced by a series of code words representing the sounds that were recognized in it. The volume of data is thereby greatly reduced. On reception, consulting a similar library allows the original signal S to be reproduced.

In an implicit variant, the signal is not analyzed so finely for coding, and it is circuit 3 of the receiver that analyzes the reconstructed signal to locate the silence segments. In all cases, the coded information is transmitted in packets P1, P2, P3, P4, each carrying a longer or shorter slice of the speech.
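
The text does not say how circuit 3 analyzes the reconstructed signal in the implicit variant. One common approach, used here purely as an assumption, is to compare the RMS amplitude of short frames with a threshold. The function names, frame length and threshold value below are hypothetical.

```python
import math


def label_frames(samples, frame_len=4, threshold=0.1):
    """Label each frame 'sound' or 'silence' from its RMS amplitude.

    Energy thresholding is an assumption of this sketch, not a technique
    named in the source.
    """
    labels = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        labels.append("sound" if rms > threshold else "silence")
    return labels


def group_segments(labels):
    """Merge consecutive identical labels into (label, frame_count) segments."""
    segments = []
    for label in labels:
        if segments and segments[-1][0] == label:
            segments[-1] = (label, segments[-1][1] + 1)
        else:
            segments.append((label, 1))
    return segments
```

The grouped segments play the role of the sound blocks and silences (11, S1, 12 and so on) that the later steps operate on.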

In the example of Figure 2A, the signal S comprises, in packet P1, the end S0 of a silence, a block 11 of energetic speech signal followed by a silence S1, and another block 12. In packet P2, block 12 continues, under reference 13, and is followed by two blocks 14 and 15 with silences S2 and S3 interposed. Packet P3 comprises the end of block 15, referenced 16, a silence S4, a block 17, a silence S5 and the beginning 18 of a block that is followed, in packet P4, by the end 19 of that block and then by the beginning S6 of a silence.

Blocks 11, 12-13, 14, 15-16, 17, 18 here represent the six respective sequences: "and" "the invention" "is" "new" "and" "inventive" (Figure 2B).

On reception, block 12-13 (like 15-16 and 18-19), which is spread over two packets, P1 and P2, is at risk of being split in two in time when it is reproduced. The aim here is to avoid this.

The instants t0, t1, t2, t3 (Fig. 2A) delimit the slices of the initial signal assigned to the successive packets P1, P2, P3, P4. The references t'0, t'1, t'2, t'3 (Fig. 2C), uniformly shifted relative to the respective instants t0, t1, t2, t3, mark the corresponding theoretical reproduction dates in earpiece 6. Because of the fluctuation in transmission delay, packets may arrive early or late. In this example, packet P2 arrives after a dead time, or delay, R following the instant t'1 at which reproduction of packet P1 ends. Packet P3, by contrast, although it arrives after t'2, arrives before the reproduction of packet P2 has finished.
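
The two failure modes of playing each packet on arrival (a dead time such as R, or an overlap with the previous packet) can be computed with a short sketch. The arrival times and durations used in the usage note are invented for illustration, and the function name is hypothetical.

```python
def immediate_playback(arrivals, durations):
    """Classify each packet if it were played the moment it arrives.

    arrivals and durations are in the same time unit (e.g. milliseconds).
    Returns one (kind, amount) pair per packet, where kind is 'ok',
    'dead_time' (gap before the packet) or 'overlap' (packet arrives while
    the previous one is still playing).
    """
    events = []
    prev_end = None
    for arrival, duration in zip(arrivals, durations):
        if prev_end is None or arrival == prev_end:
            events.append(("ok", 0))
        elif arrival > prev_end:
            events.append(("dead_time", arrival - prev_end))
        else:
            events.append(("overlap", prev_end - arrival))
        prev_end = arrival + duration
    return events
```

With three 100 ms packets arriving at 0, 130 and 220 ms, the second packet follows a 30 ms dead time and the third overlaps the second by 10 ms, which mirrors the situation of P2 and P3 described above.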

Figure 2D illustrates what immediate reproduction would give. The sentence:

  • "and...the invention...is...new...and...inventive",

in which "..." represents a natural silence such as S1, becomes:

  • "and...the inven□ □ □ tion...is...ne/w...and...inventive"

where "□ □ □" represents a spurious silence (of arbitrary duration) that intrudes and cuts the word "invention", and "/" represents an additive superposition, or overwriting, between the beginning and the end (which arrives too early) of the word "new", the word "inventive" being distorted in the same way.

Figure 2E illustrates the method of the invention. In it,

  • "." represents an original silence that has been shortened, and
  • "....." represents an original silence that has been lengthened.

It can be seen first of all that the spurious silence □ □ □ of Figure 2D has disappeared, as has the superposition "/".

When packet P1 arrives, the whole sound block 11 ("and") is here reproduced immediately (it is assumed that the start of silence S0, transmitted in the preceding packet, was being reproduced). Sequence 12 ("the inven") of block 12-13, on the other hand, is not reproduced at that moment, because it is only a portion of a sequence. Silence S1 is lengthened (.....) until packet P2 is received, whereupon the complete sequences 12-13 and 14, with silence S2, are reproduced. When packet P3 arrives, relatively early with respect to packet P2, silence S3 is shortened (.) so that the sequence "new.and." is reproduced without appreciable delay. Silences S2 and S4 are here shortened (.) in order to empty buffer memory 4 as quickly as possible and thus better tolerate early arrivals of subsequent packets. This is of interest above all when the reproduction duration of the sound sequence or sequences triggered by the arrival of a packet is longer than the theoretical period. An arriving packet may indeed complete an earlier sound block representing an uninterrupted stretch of signal extending over several time slices, which is then reproduced.
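
The rule that a sound block is held back until its end has arrived can be sketched as follows. The segment representation `(kind, text, complete)` and both function names are assumptions made for this illustration; the patent does not prescribe a data structure.

```python
def split_playable(segments):
    """Split buffered segments into (playable, held_back).

    Each segment is (kind, text, complete). A trailing sound block whose
    end has not arrived yet (complete=False) is held back.
    """
    if segments and segments[-1][0] == "sound" and not segments[-1][2]:
        return segments[:-1], segments[-1:]
    return segments, []


def merge_packet(held_back, packet):
    """Join a held-back partial block with the start of the next packet."""
    if held_back and packet and packet[0][0] == "sound":
        kind, text, _ = held_back[0]
        joined = (kind, text + packet[0][1], packet[0][2])
        return [joined] + packet[1:]
    return held_back + packet
```

Applied to packet P1, the sketch plays S0, "and" and S1 and holds back "the inven"; once P2 arrives, the merged block "the invention" becomes playable in one piece. The lengthening of S1 in the meantime is not modelled here.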

Figure 3 illustrates how the sound reproduction of the received packets is managed for this purpose.

The starting point here is a state 21 in which circuit 5 reproduces a sound sequence such as 12-13 or 14. When the end of that sequence arrives, and hence the beginning of a silence detected by circuit 3, a step 22 tests whether buffer memory 4 contains the continuous description of the signal up to at least the beginning of the next sound segment or block. If so, a step 23 tests whether the delay exceeds a high threshold A. Delay here means the total duration of signal awaiting reproduction in the buffer; the buffer fill level can serve as a convenient approximation of it. A threshold A of zero may be chosen, in which case step 22 leads directly to a step 24, described below, which returns to state 21.

If the test at step 23 is positive, step 24 shortens the current silence, or even removes it, and the sound sequence mentioned above is reproduced by moving to state 21. If the test at step 22 or at step 23 is negative, the method moves to a state 26 of silence reproduction.
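
The decision taken at the end of a sound sequence (steps 22, 23 and 24) can be written as a small function. This is a sketch under the assumption that the buffer test and the delay are already available as a boolean and a number; the function name and the returned labels are hypothetical.

```python
def at_end_of_sound(next_block_buffered, delay, threshold_a=0):
    """Decide what follows state 21 when a silence begins.

    next_block_buffered: result of the step 22 test (signal described
        continuously up to the start of the next sound block).
    delay: total duration of signal waiting in the buffer.
    threshold_a: high threshold A; zero makes step 22 lead straight to step 24.
    """
    if not next_block_buffered:   # step 22 negative
        return "state_26"         # reproduce the silence
    if delay > threshold_a:       # step 23 positive
        return "step_24"          # shorten the silence, back to state 21
    return "state_26"             # step 23 negative
```

With the default threshold of zero, any buffered next block has a positive delay, so the silence is always shortened, as the text describes.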

State 26 can also be reached from state 21, with an additional silence inserted, if a step 25 detects a negative overflow (underflow) of memory 4, that is, no information is available in memory about the next segment to be reproduced.

In state 26, it is likewise detected whether memory 4 lacks the next segment to be reproduced; in that case, step 27 extends the silence beyond its normal duration, that is, an additional silence is inserted.

Conversely, at step 28, if a contiguous sound sequence such as 12-13 is present and therefore available (packets P1 and P2 received) and represents a duration exceeding a threshold B, the silence in progress in state 26 (S1) is shortened. In this example, the same is done if a block (12) of a sound sequence has a reproduction delay exceeding threshold A, even though the end (13) of the block has not been received. If, on the other hand, threshold B is not reached, an additional silence is inserted.
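
The choices made in state 26 (steps 27 and 28) can be sketched as one function. How the two step 28 conditions combine is an interpretation of the text, and the function name, parameters and return values are hypothetical.

```python
def silence_action(buffered_duration, block_complete, delay,
                   threshold_a, threshold_b):
    """Choose between shortening and extending the silence in state 26.

    buffered_duration: duration of the next sound block held in the buffer
        (0 when nothing about it has been received).
    block_complete: True when the whole block is available.
    delay: reproduction delay of that block.
    """
    if buffered_duration == 0:
        return "extend"    # step 27: memory lacks the next segment
    if block_complete and buffered_duration > threshold_b:
        return "shorten"   # step 28: complete block longer than threshold B
    if delay > threshold_a:
        return "shorten"   # step 28: block overdue, even if incomplete
    return "extend"        # threshold B not reached: additional silence
```

For example, with A = 200 and B = 150 (arbitrary units), a complete block of 300 shortens the silence, while a complete block of 100 with a delay of 50 extends it.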

    State 26 is left when all the silence that was to be reproduced has been reproduced and the next sound sequence must be played. Playback is nevertheless delayed by passing, transiently or lastingly, through a state 29 in which a background or ambient noise with a vocal sound or tone, of the "uhh" kind, is emitted. This noise signals that the speaker is about to speak again and so prevents the listener from interrupting; it replaces a prolonged or additional silence. The two conditions of step 28 (thresholds B and A) are also tested in state 29 and, if either is met, the method moves to state 21 of sound reproduction.
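
    The transitions between states 21, 26 and 29 can be pictured as a small state machine. The sketch below is a simplified reading under stated assumptions: the state numbers come from the description, but the function name, its parameters, the millisecond units, and the choice to route every exit from state 26 through state 29 are hypothetical.

```python
from enum import Enum


class State(Enum):
    SOUND = 21    # sound reproduction
    SILENCE = 26  # silence reproduction
    FILLER = 29   # vocal-tone background noise ("uhh")


def next_state(state, silence_remaining_ms, buffered_speech_ms,
               playback_delay_ms, threshold_a_ms, threshold_b_ms):
    """Return the next playback state (hypothetical helper)."""
    # The two conditions of step 28: threshold B on the buffered
    # duration, or threshold A on the delay of a waiting block.
    ready = (buffered_speech_ms > threshold_b_ms
             or (buffered_speech_ms > 0
                 and playback_delay_ms > threshold_a_ms))

    if state is State.SILENCE:
        # Leave when the silence is fully played or has been cut short;
        # the exit passes, at least transiently, through state 29.
        if ready or silence_remaining_ms <= 0:
            return State.FILLER
        return State.SILENCE

    if state is State.FILLER:
        # Same test as step 28: resume speech once it is ready.
        return State.SOUND if ready else State.FILLER

    # State 21: an underflow of the memory (step 25) forces a silence.
    if buffered_speech_ms <= 0:
        return State.SILENCE
    return State.SOUND
```

    Calling `next_state` once per playback tick keeps the filler noise going for as long as neither threshold is met, which corresponds to the lasting passage through state 29.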

    In state 21, it is tested (step 20) whether the playback delay exceeds a threshold C, for example 1.5 times the value of threshold A, which indicates an overflow of the memory 4. In that case, the oldest silences, which are to be played first, are reduced and possibly almost eliminated. Provision may also be made to delete the oldest sound sequences, or simply time slices of them, which amounts to modulating, here accelerating, the playback speed of the sound signal.
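
    The overflow handling of step 20 can be illustrated as follows. This is a minimal sketch, assuming the memory is modelled as a list of `(kind, duration_ms)` tuples ordered oldest first and that each millisecond removed from a silence reduces the playback delay by one millisecond. The function name, the data layout and the `min_silence_ms` floor are hypothetical; only the factor 1.5 comes from the example in the description.

```python
def trim_oldest_silences(segments, playback_delay_ms, threshold_a_ms,
                         c_factor=1.5, min_silence_ms=0):
    """Shrink the oldest silences while the delay exceeds threshold C.

    segments: list of (kind, duration_ms), oldest first, where kind is
        "speech" or "silence" (hypothetical representation).
    Returns a new list; speech segments are left untouched here.
    """
    threshold_c_ms = c_factor * threshold_a_ms
    excess_ms = playback_delay_ms - threshold_c_ms
    trimmed = []
    for kind, duration_ms in segments:
        if excess_ms > 0 and kind == "silence":
            # Remove as much of this silence as is needed and allowed.
            cut_ms = max(min(duration_ms - min_silence_ms, excess_ms), 0)
            duration_ms -= cut_ms
            excess_ms -= cut_ms
        trimmed.append((kind, duration_ms))
    return trimmed
```

    The variant mentioned in the description, in which old sound sequences or slices of them are dropped as well, is not covered by this sketch.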

    Claims (8)

    1. Method for the sound reproduction of speech signals (11, S1, 12, 13, S2) received in successive packets (P1, P2, P3) representing successive time slices of speech and stored temporarily (4) before being reproduced as sound (5, 6), characterized in that the presence of silences (S1, S2) is detected in the received slices, and their reproduction duration is regulated so as to reproduce, in one piece, the speech signals (11, 12) other than the silences.

    2. Method according to claim 1, in which the reproduction of any silence (S1) for which at least the beginning of the following sound sequence is available is shortened (24, 28).

    3. Method according to claim 1, in which the reproduction of the silence (S1) is shortened (23, 28) if the delay in reproducing the speech signals exceeds a threshold (A).

    4. Method according to one of claims 1 to 3, in which an additional silence is inserted (25, 27) when no following sequence (11) is available in memory.

    5. Method according to one of claims 1 to 4, in which, when a silence (S1) is not followed in memory by a sound sequence of duration greater than a threshold (B), an additional silence is inserted.

    6. Method according to one of claims 4 and 5, in which the additional silence is replaced by a background noise (29).

    7. Method according to claim 6, in which the background noise has a vocal tone.

    8. Method according to one of claims 1 to 7, in which the oldest stored signals are eliminated (20) from the reproduction when their reproduction delay exceeds a threshold (C).
    EP19990400873 1998-04-27 1999-04-09 Method for rendering speech with silence regulation Expired - Lifetime EP0953969B1 (en)

    Applications Claiming Priority (2)

    Application Number Priority Date Filing Date Title
    FR9805227A FR2778011B1 (en) 1998-04-27 1998-04-27 METHOD OF RESTORING SPEECH WITH SILENCE CONTROL
    FR9805227 1998-04-27

    Publications (2)

    Publication Number Publication Date
    EP0953969A1 (en) 1999-11-03
    EP0953969B1 (en) 2003-10-01

    Family

    ID=9525692

    Family Applications (1)

    Application Number Title Priority Date Filing Date
    EP19990400873 Expired - Lifetime EP0953969B1 (en) 1998-04-27 1999-04-09 Method for rendering speech with silence regulation

    Country Status (3)

    Country Link
    EP (1) EP0953969B1 (en)
    DE (1) DE69911685T2 (en)
    FR (1) FR2778011B1 (en)

    Citations (3)

    * Cited by examiner, † Cited by third party
    Publication number Priority date Publication date Assignee Title
    US5526353A (en) * 1994-12-20 1996-06-11 Henley; Arthur System and method for communication of audio data over a packet-based network
    EP0756267A1 (en) * 1995-07-24 1997-01-29 International Business Machines Corporation Method and system for silence removal in voice communication
    US5682384A (en) * 1995-10-31 1997-10-28 Panagiotis N. Zarros Apparatus and methods achieving multiparty synchronization for real-time network application

    Non-Patent Citations (2)

    * Cited by examiner, † Cited by third party
    Title
    DEMPSEY ET AL.: "Destination Buffering for Low-Bandwidth Audio Transmission using Redundancy-Based Error Control", PROCEEDINGS OF THE 21ST IEEE CONFERENCE ON LOCAL COMPUTER NETWORKS, 13 October 1996 (1996-10-13) - 16 October 1996 (1996-10-16), MINNEAPOLIS, pages 345 - 354, XP002088688 *
    RAMACHANDRAN RAMJEE ET AL: "ADAPTIVE PLAYOUT MECHANISMS FOR PACKETIZED AUDIO APPLICATIONS IN WIDE-AREA NETWORKS", PROCEEDINGS OF THE CONFERENCE ON COMPUTER COMMUNICATIONS (INFOCOM), TORONTO, JUNE 12 - 16, 1994, vol. 2, 12 June 1994 (1994-06-12), INSTITUTE OF ELECTRICAL AND ELECTRONICS ENGINEERS, pages 680 - 688, XP000496524 *

    Also Published As

    Publication number Publication date
    DE69911685T2 (en) 2004-07-29
    FR2778011B1 (en) 2000-06-09
    EP0953969B1 (en) 2003-10-01
    DE69911685D1 (en) 2003-11-06
    FR2778011A1 (en) 1999-10-29

    Similar Documents

    Publication Publication Date Title
    US8937963B1 (en) Integrated adaptive jitter buffer
    US6885987B2 (en) Method and apparatus for encoding and decoding pause information
    EP0082077B1 (en) Method of teledistributing recorded information, particularly pieces of music, and system for carrying it out
    US7162418B2 (en) Presentation-quality buffering process for real-time audio
    US7319703B2 (en) Method and apparatus for reducing synchronization delay in packet-based voice terminals by resynchronizing during talk spurts
    US6873954B1 (en) Method and apparatus in a telecommunications system
    JP4485690B2 (en) Transmission system for transmitting multimedia signals
    WO2000067414A2 (en) A method and apparatus for providing continuous playback of audio and audio-visual streamed multimedia having non-deterministic delays
    EP1372289A2 (en) Generation of a frame descriptor of silence for generation of comfort noise
    CH653165A5 (en) METHOD AND APPARATUS FOR MOUNTING DIGITAL SIGNALS RECORDED ON A RECORDING MEDIUM.
    FR2488434A1 (en) CODED SIGNAL REPRODUCTION SYSTEM
    CN101518001B (en) Network jitter smoothing with reduced delay
    EP0953969B1 (en) Method for rendering speech with silence regulation
    EP0251854A1 (en) Method and device for the transmission of digital signals via higher rate data channels
    AU2007202608A1 (en) Method and device for the production and distribution of messages directed at a multitude of recipients in a communications network
    WO2002028015A2 (en) Apparatus, and an associated method, for compensating for variable delay of packet data in a packet data communication system
    CA2106118C (en) Method and devices for transmitting two heterochronous binary signals simultaneously over the same medium
    FR2848049A1 (en) Data packet processing procedure, involves fulfilling conditions pertaining to size of burst in packets, detecting memory burst content based on delay in reception of packet, and controlling content by fulfilling condition of size
    CN109155680B (en) Method and apparatus for resuming current audio/video reproduction after current audio/video reproduction is overwritten by interruption
    JP5330183B2 (en) Packet insertion / deletion method and call system
    WO2001001727A1 (en) Method for decoding and retrieving a sound signal in an asynchronous transmission system
    EP0632922B1 (en) High fidelity sound reproduction device for cinematographic films
    EP0762639B1 (en) Sound volume control signal for a receiver for block coded speech signals
    JP2000092122A5 (en)
    EP1245099A1 (en) Packet reception device

    Legal Events

    Date Code Title Description
    PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

    Free format text: ORIGINAL CODE: 0009012

    AK Designated contracting states

    Kind code of ref document: A1

    Designated state(s): DE GB

    AX Request for extension of the european patent

    Free format text: AL;LT;LV;MK;RO;SI

    17P Request for examination filed

    Effective date: 20000422

    AKX Designation fees paid

    Free format text: DE GB

    GRAH Despatch of communication of intention to grant a patent

    Free format text: ORIGINAL CODE: EPIDOS IGRA

    RIC1 Information provided on ipc code assigned before grant

    Ipc: 7G 10L 21/04 B

    Ipc: 7G 10L 19/00 A

    GRAS Grant fee paid

    Free format text: ORIGINAL CODE: EPIDOSNIGR3

    GRAA (expected) grant

    Free format text: ORIGINAL CODE: 0009210

    AK Designated contracting states

    Kind code of ref document: B1

    Designated state(s): DE GB

    REG Reference to a national code

    Ref country code: GB

    Ref legal event code: FG4D

    Free format text: NOT ENGLISH

    REF Corresponds to:

    Ref document number: 69911685

    Country of ref document: DE

    Date of ref document: 20031106

    Kind code of ref document: P

    GBT Gb: translation of ep patent filed (gb section 77(6)(a)/1977)

    Effective date: 20040120

    PLBE No opposition filed within time limit

    Free format text: ORIGINAL CODE: 0009261

    STAA Information on the status of an ep patent application or granted ep patent

    Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

    26N No opposition filed

    Effective date: 20040702

    REG Reference to a national code

    Ref country code: GB

    Ref legal event code: 732E

    Free format text: REGISTERED BETWEEN 20090319 AND 20090325

    REG Reference to a national code

    Ref country code: GB

    Ref legal event code: 732E

    Free format text: REGISTERED BETWEEN 20090326 AND 20090401

    REG Reference to a national code

    Ref country code: DE

    Ref legal event code: R082

    Ref document number: 69911685

    Country of ref document: DE

    Representative=s name: ,

    Ref country code: DE

    Ref legal event code: R082

    Ref document number: 69911685

    Country of ref document: DE

    REG Reference to a national code

    Ref country code: DE

    Ref legal event code: R081

    Ref document number: 69911685

    Country of ref document: DE

    Owner name: SAGEMCOM SAS, FR

    Free format text: FORMER OWNER: SAGEM S.A., PARIS, FR

    Effective date: 20120120

    REG Reference to a national code

    Ref country code: DE

    Ref legal event code: R081

    Ref document number: 69911685

    Country of ref document: DE

    Owner name: SAGEMCOM SAS, FR

    Free format text: FORMER OWNER: SAGEM TELECOMMUNICATIONS S. A., PARIS, FR

    Effective date: 20130129

    PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

    Ref country code: GB

    Payment date: 20160324

    Year of fee payment: 18

    PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

    Ref country code: DE

    Payment date: 20160321

    Year of fee payment: 18

    REG Reference to a national code

    Ref country code: DE

    Ref legal event code: R119

    Ref document number: 69911685

    Country of ref document: DE

    GBPC Gb: european patent ceased through non-payment of renewal fee

    Effective date: 20170409

    PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

    Ref country code: DE

    Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

    Effective date: 20171103

    PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

    Ref country code: GB

    Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

    Effective date: 20170409