WO2001042875A2 - Language translation voice telephony

Info

Publication number: WO2001042875A2
Application number: PCT/US2000/042472
Authority: WO
Inventors: Ralph Samuel Hoefelmeyer; James Patrick Brechtel
Original assignee: Mci Worldcom, Inc.
Priority date: 1999-12-02
Filing date: 2000-12-01
Publication date: 2001-06-14
Also published as: AU4512601A; WO2001042875A3

Abstract

To allow speakers of different languages to communicate with each other, either verbally or in written words, the present invention takes the incoming voice stream, which could be either analog or digital, and breaks it into appropriate segments. The segments are then converted from a voice stream into a text stream and stored in a text buffer. The texts, and particularly the words in the text, are classified according to their vocabulary, grammar, and semantics classes to generate a source language object (Figure 1). Once that is done and the text stream is parsed into a tree format, the texts are mapped onto a standard reference language to generate a standard reference language object.

Description

Title: Language Translation Voice Telephony

Field of the Invention

The present invention relates to telecommunications and more particularly to the conversion of a voice message of one language to a corresponding voice or text message of a second language in a grammatically correct manner.

Background of the Invention

In the present global economy, peoples who speak different languages often have to communicate with each other. This communication between persons who speak different languages is often done either by voice or by texts such as emails. To communicate effectively, persons who speak different languages often have to settle on a common language. Yet due to the idiomatic and grammatical usage peculiar to each language, such communication may become confusing and be misunderstood. Moreover, the translation of one language into another language is often done manually by a human translator, therefore incurring a time delay as well as the cost of the translator.

In a telephony environment where the conversation between two persons is held in real time and the exchange of texts such as emails between the parties is held in substantially real time, the use of translators would at best be awkward and time consuming, and at worst not work. Thus, if two parties who speak different languages are to communicate effectively without requiring the services of translators, a method of readily converting a voice stream in one language into either spoken or written texts in another language in substantially real time is needed.

Summary of the Invention

The translation system of the present invention provides an abstracted, or generalized, capability for effecting language translation among different human languages, both written and spoken. This method is context-centric and focuses specifically on an object-oriented technique.

In particular, in a telephony environment, the incoming speech signal, either analog or digital, is first stored in a voice buffer. The speech stream that is stored in the buffer is broken down into phoneme sets. This could be done to an analog input signal by sensing the distinct changes in the state of the carrier signal. In the case of a digital signal such as a voice packet stream, the silence symbol of the digitized voice stream can be used for separating the voice stream into the phoneme sets.
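The separation of a digitized voice stream at its silence symbols can be sketched as follows. This is a minimal illustration only: the representation of the stream as integer frames, the silence value `0`, the `min_gap` parameter and the function name `split_on_silence` are all assumptions made for the example and are not specified in this disclosure.

```python
from typing import List, Sequence

SILENCE = 0  # hypothetical silence symbol; the actual value depends on the codec


def split_on_silence(frames: Sequence[int],
                     silence: int = SILENCE,
                     min_gap: int = 2) -> List[List[int]]:
    """Split a frame sequence wherever at least `min_gap` silence frames occur in a row."""
    segments: List[List[int]] = []
    current: List[int] = []
    gap = 0
    for frame in frames:
        if frame == silence:
            gap += 1
            # A long enough run of silence closes the segment collected so far.
            if gap >= min_gap and current:
                segments.append(current)
                current = []
        else:
            # Shorter silences are treated as part of the same segment and dropped.
            gap = 0
            current.append(frame)
    if current:
        segments.append(current)
    return segments
```

With `min_gap=2`, a single silence frame between two voiced frames does not split the stream, whereas two or more consecutive silence frames do.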

Once given the various phoneme sets, the speech is converted into texts. The texts, most likely in ASCII format, are stored in a text buffer.

The individual text words in the buffer are then tokenized so as to generate a token that is unique to each word. The tokenized words are used as keys for effecting a fast retrieval or look-up in a tokenized dictionary of a given standard reference language. A pattern matching mechanism is then used to validate the tokenized texts against the tokenized dictionary of the standard reference language. Thereafter, the tokenized texts in the text buffer are translated by means of a translation mechanism that is specific to the targeted language. This is done by applying grammatical rules using a grammar engine that parses the tokenized words in the buffer and applies the grammatical rules of the target language thereto. Once the text words have been translated into the target language, they are compressed and packetized. Thereafter, the packetized packets could be transmitted.
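The tokenizing and dictionary validation described above can be sketched as follows. The sketch assumes that a token is simply a sequential integer assigned to each distinct lower-cased word; the class `TokenTable` and the functions `build_dictionary` and `validate` are hypothetical names introduced for illustration and do not describe any particular tokenizing product.

```python
from typing import Dict, List, Optional, Tuple


class TokenTable:
    """Assigns one unique integer token to each distinct word (case-insensitive)."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}

    def token_for(self, word: str) -> int:
        key = word.lower()
        if key not in self._ids:
            self._ids[key] = len(self._ids)  # next unused integer
        return self._ids[key]

    def lookup(self, word: str) -> Optional[int]:
        """Return the token of a known word, or None without creating a token."""
        return self._ids.get(word.lower())


def build_dictionary(table: TokenTable, entries: Dict[str, str]) -> Dict[int, str]:
    """Key the reference-language dictionary by token for fast retrieval."""
    return {table.token_for(word): meaning for word, meaning in entries.items()}


def validate(table: TokenTable, dictionary: Dict[int, str],
             text: str) -> Tuple[List[int], List[str]]:
    """Split the words of `text` into validated tokens and unrecognized words."""
    known: List[int] = []
    unknown: List[str] = []
    for word in text.split():
        token = table.lookup(word)
        if token is not None and token in dictionary:
            known.append(token)
        else:
            unknown.append(word)
    return known, unknown
```

In this sketch the pattern matching step is reduced to an exact key lookup; words that have no entry in the tokenized dictionary are returned separately so that a later stage can decide how to handle them.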

In the case of a communication in a textual format, the packetized text words could be transmitted directly, for example in the form of an email or fax. On the other hand, if it is a voice communication, then the packetized words are fed to a voice synthesizer for conversion, so that the voice output is in the target language. If there is compression of the tokenized texts, decompression is used for decompressing the tokenized texts.
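The compression, packetizing and later decompression of translated text can be sketched as follows. The use of the zlib algorithm, UTF-8 encoding, the fixed payload size and the names `packetize` and `depacketize` are assumptions chosen for the example; no particular compression scheme or packet format is prescribed here.

```python
import zlib
from typing import List


def packetize(text: str, payload_size: int = 32) -> List[bytes]:
    """Compress the text and cut the result into payloads of at most `payload_size` bytes."""
    data = zlib.compress(text.encode("utf-8"))
    return [data[i:i + payload_size] for i in range(0, len(data), payload_size)]


def depacketize(packets: List[bytes]) -> str:
    """Reassemble the payloads in order and decompress them back into text."""
    return zlib.decompress(b"".join(packets)).decode("utf-8")
```

The sketch assumes that the packets arrive complete and in order; a practical transmission would additionally carry sequence information in each packet.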

It is therefore an objective of the present invention to provide a method of translating in substantially real time an incoming voice message of one language into a communicative output of a different language.

It is another objective of the present invention to translate an input voice stream of one language into either an output voice stream or an output text stream of another language. It is yet another objective of the present invention to allow persons who speak or understand different languages to communicate directly without the need for translators.

Brief Description of the Figures

The above-mentioned objectives and advantages of the present invention will become apparent and the invention itself will be best understood by reference to the following description of an embodiment of the invention taken in conjunction with the accompanying drawings, wherein:

Fig. 1 is a high level block diagram of the architecture of the system of the present invention; and

Fig. 2 is a flow chart illustrating the method in which an input voice stream is translated into a target language data stream.

Detailed Description of the Invention

With reference to Fig. 1, a general translation system (GTS) architecture of the present invention for converting an input voice stream of one language into an output voice or text stream of another language is shown.

In particular, an input voice stream 2, from an input transmission medium such as for example either a landline or wireless phone, is received by a conventional voice signal receiver 4. The voice stream is then routed to a speech to text converter 6 that converts the voice stream into text. Converter 6 could be a hardwired converter or a processor that runs any one of a number of conventional speech to text conversion programs such as for example Dragon Naturally Speaking by the Dragon System Inc. of Newton, Massachusetts that enables the conversion of spoken words into texts. Moreover, the technology for converting speech to texts is well known and two such systems are described in U.S. patent 5,031,113 assigned to the U.S. Phillips Corporation and U.S. patent 5,754,978 assigned to the Speech Systems of Colorado Company. The disclosures of the '113 and '978 patents are incorporated by reference herein.

The incoming voice stream could be either an analog voice stream or a stream of digitized voice packets. In any event, the voice stream is chunked into phoneme sets for storage in a text buffer, or a series of text buffers 8. The partition of the voice stream may be done by the pauses that are inherent in the spoken words uttered by a speaker. And there are a number of conventional systems available for identifying the phoneme sound types that are contained in an audio speech stream. One such system for recognizing speech and dividing the speech into phoneme sets is described in U.S. patent 5,646,490, assigned to the Fonix Corporation. The disclosure of the '490 patent is incorporated by reference herein.
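The partition of a voice stream at the speaker's pauses can be sketched as follows for a sampled signal. This is a simple amplitude-threshold illustration, not the phoneme recognition technique of the '490 patent; the threshold value, the minimum pause length and the function name `split_at_pauses` are assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def split_at_pauses(samples: Sequence[float],
                    threshold: float = 0.05,
                    min_pause: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of voiced segments, end exclusive.

    A segment ends once `min_pause` consecutive samples fall below `threshold`.
    """
    segments: List[Tuple[int, int]] = []
    start = None  # index where the current voiced segment began
    quiet = 0     # number of consecutive quiet samples seen inside a segment
    for i, sample in enumerate(samples):
        if abs(sample) >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_pause:
                # Close the segment just after its last voiced sample.
                segments.append((start, i - quiet + 1))
                start = None
                quiet = 0
    if start is not None:
        segments.append((start, len(samples) - quiet))
    return segments
```

Each returned pair can be used to slice the original samples, so that every chunk can be passed separately to the speech to text conversion.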

The texts in the text buffer 8, which are in the language of the input voice stream, i.e., the source language, are then parsed. Each word of the text stream is then tokenized by a tokenizing algorithm so as to build a tree of tokenized words, which are tied to a standard reference language. This could be done by utilizing the NeoData program by the NeoCore Corporation of Colorado Springs. In essence, the NeoData program generates a grammar object based on the grammar of the source language, and then expresses the source grammar object in terms of the grammar object of a standard reference language. The NeoData program therefore provides a transform generator that, in receipt of input transforms and data strings, would convert them into new transform outputs. The technology behind the NeoData program is disclosed in U.S. patent 5,942,002 assigned to the NeoCore Corporation. The disclosure of the '002 patent is incorporated by reference herein.

The source language in addition to having a particular grammar also has a given vocabulary. By means of the tokenizing technique as disclosed in the '002 patent, the grammar and the vocabulary of the source language are combined to generate a semantics object that is expressed in terms of the standard reference language. The standard reference language can be any language, such as for example English, that has a sufficiently rich vocabulary and grammar to act as a standard against which other languages can be compared and to and from which they can be translated.

The vocabulary class is basically an object that points to a given place in a dictionary of the particular language for definition. So, too, any language has its own grammar object classes, such as for example nouns, subjects, objects, predicates, possessives, interrogatives, etc. that are common in a language such as for example the English language. In other words, the grammar object class encapsulates state models and language meaning modifiers for the language. And once the actual state models and language meaning modifiers are defined for a language such as for example the standard reference language, specific grammar classes for the language could be derived.
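The derivation of a language specific grammar class from a general grammar object class can be sketched as follows. The sketch assumes that a state model is a small table of transitions between grammatical roles and that meaning modifiers are a mapping from markers to labels; the class names `GrammarClass` and `SimpleEnglishGrammar` and the toy rules they contain are hypothetical and far simpler than any real grammar.

```python
from typing import Dict, FrozenSet, Sequence


class GrammarClass:
    """General grammar object class: holds a state model and meaning modifiers."""

    start: str = "START"
    transitions: Dict[str, Dict[str, str]] = {}  # state -> {role: next state}
    accepting: FrozenSet[str] = frozenset()      # states in which a sentence may end
    modifiers: Dict[str, str] = {}               # marker -> meaning modifier

    def accepts(self, roles: Sequence[str]) -> bool:
        """Check whether a sequence of grammatical roles is valid under the state model."""
        state = self.start
        for role in roles:
            next_state = self.transitions.get(state, {}).get(role)
            if next_state is None:
                return False
            state = next_state
        return state in self.accepting


class SimpleEnglishGrammar(GrammarClass):
    """Language specific class in which the actual state model and modifiers are defined."""

    transitions = {
        "START": {"subject": "HAS_SUBJECT"},
        "HAS_SUBJECT": {"predicate": "HAS_PREDICATE"},
        "HAS_PREDICATE": {"object": "DONE"},
    }
    accepting = frozenset({"HAS_PREDICATE", "DONE"})
    modifiers = {"?": "interrogative", "'s": "possessive"}
```

The base class defines only the mechanism, while each derived class supplies the data for one language, so that a further language could be added by deriving another class without changing `accepts`.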

The standard reference language further has a semantics object that in essence provides a study of the meanings, significance and changes in the various words and phrases of the language, and the linguistic development of the meaning and relationship of the various words of the language. In essence, with the vocabulary object and the grammar object being combined to generate the semantics object, the source language, and also the standard reference language, would each have objects that comprise the vocabulary class 10, the grammar class 12 and the semantics class 14, as shown in the source language module 16 of Fig. 1.
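The combination of a vocabulary object and a grammar object into a semantics object can be sketched as follows. The field names, the representation of the dictionary as an ordered sequence of words and the helper `make_semantics` are assumptions made for illustration; they are not taken from the '002 patent.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class VocabularyObject:
    word: str
    dictionary_index: int  # points to the word's place in the dictionary


@dataclass(frozen=True)
class GrammarObject:
    role: str                        # e.g. "subject", "predicate"
    modifiers: Tuple[str, ...] = ()  # e.g. ("possessive",)


@dataclass(frozen=True)
class SemanticsObject:
    """Combines what a word is (vocabulary) with how it is used (grammar)."""
    vocabulary: VocabularyObject
    grammar: GrammarObject


def make_semantics(word: str, dictionary: Sequence[str], role: str,
                   modifiers: Sequence[str] = ()) -> SemanticsObject:
    """Build a semantics object; raises ValueError if the word is not in the dictionary."""
    index = dictionary.index(word)
    return SemanticsObject(VocabularyObject(word, index),
                           GrammarObject(role, tuple(modifiers)))
```

Because the objects are immutable value types, two words with the same dictionary entry and the same grammatical use yield equal semantics objects, which makes them convenient to compare during mapping.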

Source language module 16 is further shown to be connected to a standard reference language module 18. The interconnection between source language module 16 and standard reference language module 18 allows the matching of the various classes between the source language and the standard reference language. This may be referred to as a pattern matching for looking up text, words, or phrases in the standard reference language that correspond to the transformed texts, words, or phrases in the source language. By thus patterning or correlating the source language with the standard reference language, a semantics object is created with the standard reference language that is a combination of the dictionary, grammar and semantics classes of the source language. Therefore, a meaningful translation of the voice stream in a standard reference language text could be generated. The texts generated from the mapping of the source language texts with the standard reference language texts could be done using the method and architecture as described in U.S. patent 5,677,835 assigned to Caterpillar Inc. In brief, the '835 patent discloses that source texts may be converted into target texts by using a constraints source language analyzer and a machine translation generator. The disclosure of the '835 patent is incorporated by reference herein.
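The pattern matching of source language words and phrases against the standard reference language can be sketched as a longest-phrase lookup. This is a deliberately simple stand-in and is not the method of the '835 patent; the phrase table, the `max_len` limit and the function name `match_phrases` are assumptions made for the example.

```python
from typing import Dict, List, Sequence, Tuple


def match_phrases(words: Sequence[str],
                  phrase_table: Dict[Tuple[str, ...], str],
                  max_len: int = 3) -> List[str]:
    """Replace the longest matching source phrases with reference-language text.

    Words with no entry in the table are passed through unchanged.
    """
    output: List[str] = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first so that idioms win over single words.
        for n in range(min(max_len, len(words) - i), 0, -1):
            key = tuple(words[i:i + n])
            if key in phrase_table:
                output.append(phrase_table[key])
                i += n
                break
        else:
            output.append(words[i])
            i += 1
    return output
```

Matching whole phrases before single words is one way to keep an idiomatic expression from being translated word by word.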

The thus translated or derived source language text stream is stored in a source reference language mapped text store 20. Depending on the language to be targeted, a selector 22 would select from among a number of target language modules 24a-24n for mapping therewith the translated voice stream texts based on the standard reference language. Another mapping process such as that taught in the '835 patent whereby the translated voice stream based on the standard reference language is mapped to the target language is effected so that a text stream that now is based on the targeted language is sent to and stored in a target language text buffer 26. From there, the translated target language text stream could be output in a number of ways via a transmission output medium such as for example landline or wireless telephony.
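The selection of one target language module from among several can be sketched as a lookup in a registry of modules. In this sketch each module is reduced to a word-for-word substitution table, which is only a placeholder for a real mapping process; the language codes, the tiny word tables and the names `TARGET_MODULES` and `select_and_map` are hypothetical.

```python
from typing import Callable, Dict, List

TargetModule = Callable[[List[str]], List[str]]


def make_word_mapper(table: Dict[str, str]) -> TargetModule:
    """Build a placeholder target module that substitutes words one by one."""
    def module(reference_words: List[str]) -> List[str]:
        return [table.get(word, word) for word in reference_words]
    return module


# Hypothetical registry playing the role of the target language modules.
TARGET_MODULES: Dict[str, TargetModule] = {
    "fr": make_word_mapper({"hello": "bonjour", "world": "monde"}),
    "es": make_word_mapper({"hello": "hola", "world": "mundo"}),
}


def select_and_map(reference_words: List[str], target: str) -> List[str]:
    """Pick the module for `target` and map reference-language words with it."""
    try:
        module = TARGET_MODULES[target]
    except KeyError:
        raise ValueError(f"no target language module for {target!r}") from None
    return module(reference_words)
```

Because every module accepts text in the standard reference language, supporting a further target language only requires registering one more module rather than one module per pair of languages.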

For example, if it is a voice-to-voice communication between two speakers, the translated voice stream now based on the target language is output to a voice synthesizer 28 so that a voice stream based on the target language is output to the listener, who presumably is a speaker of the target language. Such translated speech may be output as voice packets in a telephony environment with insignificant lag time. Of course, the reverse process takes place in a two-way voice communication, as the listener then becomes the speaker and the same process as described above, but in the reverse order, will take place and will continue until the conversation is terminated.

In the event that the input voice stream is to be output as texts such as for example an email or fax to the receiving party, or a braille message if the receiving party happens to be blind, then the translated texts stored in the target language text buffer 26 are output as a text stream.

Note that the various modules as noted in Fig. 1 could be program applications that reside and run in a computer or processor means such as for example a Pentium based personal computer. Furthermore, the various buffers may be high speed and high capacity IC memory chips built onto a board inserted into any one of the available slots of the personal computer.

Fig. 2 provides a flow chart for illustrating the method of how the voice stream in the source language is translated to a standard reference language, and the subsequent translation of the standard reference language text stream to a target language text stream.

In particular, when a voice stream is received at the processor, it is converted into texts and chunked into segments using some natural division or pauses in the voice stream. The chunked voice text segments are then stored into a text buffer such as buffer 8. This is illustrated in step 30. Thereafter, in step 32, the stored texts are processed and mapped onto a source language object. This is done by matching each word to the vocabulary and placing each word in its grammatical context, and then matching the word with the semantics object of the language so as to place the word into its proper context as being used in the input voice stream. Having done that, the word could then be mapped with the standard reference language, as illustrated in process step 34. And with the vocabulary, grammar and semantics classes or objects having been established for each word, the now mapped standard reference language word is next related to the target language, by means of the different objects of the target language, per step 36. Once that is done and a target language object is generated, a target voice stream is created and provided to a buffer of the target language, per step 38. Thereafter, the now converted voice stream, in the form of a text stream, is output from the voice buffer per step 40. As was mentioned previously, the target language texts could be output in either a speech or a text format.
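The sequence of steps 30 through 40 can be sketched end to end as follows. The sketch assumes that speech to text conversion has already produced one string per chunk, and it replaces the vocabulary, grammar and semantics processing with two toy word tables; the tables and the function name `translate_segments` are hypothetical and serve only to show the order of the steps.

```python
from typing import Dict, List

# Toy word tables standing in for the language objects (illustrative only).
SOURCE_TO_REFERENCE: Dict[str, str] = {"hola": "hello", "mundo": "world"}
REFERENCE_TO_TARGET: Dict[str, str] = {"hello": "bonjour", "world": "monde"}


def translate_segments(recognized_segments: List[str]) -> List[str]:
    """Run already-recognized text chunks through the steps of the flow chart."""
    # Step 30: store the chunked text segments in a text buffer.
    text_buffer = [segment.lower().split() for segment in recognized_segments]

    target_buffer: List[str] = []
    for words in text_buffer:
        # Step 32: map each word onto a (much simplified) source language object.
        source_objects = [{"word": w, "position": i} for i, w in enumerate(words)]
        # Step 34: map the source objects onto the standard reference language.
        reference = [SOURCE_TO_REFERENCE.get(o["word"], o["word"]) for o in source_objects]
        # Step 36: relate the reference-language words to the target language.
        target = [REFERENCE_TO_TARGET.get(w, w) for w in reference]
        # Step 38: place the result in the target language buffer.
        target_buffer.append(" ".join(target))

    # Step 40: output the buffered target text (to a synthesizer or as text).
    return target_buffer
```

Words that are missing from either table are passed through unchanged in this sketch, which makes gaps in the toy vocabulary visible in the output.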

Inasmuch as the present invention is subject to many variations, modifications and changes in detail, it is intended that all matter described throughout this specification and shown in the accompanying drawings be interpreted as illustrative only and not in a limiting sense. For example, even though the present invention discussed so far relates to the conversion of an incoming voice message, in practice, an incoming text message in one language could be converted just as well. Such translation of an input text message may occur, for example, in the guise of a received email of one language that requires translation to another language, or for that matter, to an output voice message. In such an input text scenario, there is no need for any speech to text conversion process. Accordingly, it is intended that the invention be limited only by the spirit and scope of the hereto attached claims.

Claims

1. A method of translating a source language into a target language, comprising the steps of: breaking an incoming voice stream of said source language into phoneme sets; converting said phoneme sets of said incoming voice stream into a text stream of a standard reference language; storing said text stream into a text buffer; translating said text words in said text buffer into text words of said standard reference language; and converting said translated text words of said standard reference language into words of said target language.

2. Method of claim 1, wherein said translating step comprises the steps of: parsing each word of said text stream; correlating said each word with the vocabulary of said standard reference language to obtain a vocabulary object; defining a grammar object of said each word based on the grammar of said source language; and deriving from a combination of said vocabulary and grammar objects a semantics object for said source language.

3. Method of claim 2, further comprising the steps of: expressing said semantics object for said source language in said standard reference language; mapping said semantics object of said standard reference language with said target language; and outputting a target language object patterned from said standard reference language.

4. Method of claim 2, further comprising the step of: tokenizing the words of said text stream in said text buffer before said correlating step.

5. Method of claim 1, wherein said words converted from said source language into said target language can be either spoken or written texts.

6. Method of claim 1, wherein said translating step comprises the steps of: defining a grammar object class that encapsulates state models and language meaning modifiers for said standard reference language; deriving language specific grammar classes wherein the actual state models and language meaning modifiers are defined for said standard reference language; and using said derived language specific grammar classes to generate a stream of words in said target language that corresponds semantically to said source language.

7. Method of claim 1, wherein said incoming voice stream comprises voice packets.

8. Method of claim 1, further comprising the step of: outputting said converted words of said target language as voice packets.

9. A method of converting an input voice stream of one language into an output of an other language, comprising the steps of: receiving from an input transmission medium said input voice stream; breaking said input voice stream into phoneme sets; converting said phoneme sets of said input voice stream into a text stream of a standard reference language; storing said text stream into a text buffer; translating the words of said text stream in said text buffer into text words of said standard reference language; converting said translated text words of said standard reference language into words of said other language; combining the words of said other language into an output stream; and outputting said output stream onto an output transmission medium.

10. Method of claim 9, wherein said translating step comprises the steps of: parsing each word of said text stream; correlating said each word with the vocabulary of said standard reference language to obtain a vocabulary object; defining a grammar object of said each word based on the grammar of said one language; and deriving from a combination of said vocabulary and grammar objects a semantics object for said other language.

11. Method of claim 10, further comprising the steps of: expressing said semantics object for said one language in said standard reference language; mapping said semantics object of said standard reference language with said other language; and outputting an object of said other language patterned from said standard reference language.

12. Method of claim 9, further comprising the steps of: tokenizing the text words of said text stream in said text buffer so that said words are readily retrievable after said breaking step; and packetizing said output stream into voice packets before outputting said voice packets onto said output transmission medium.

13. Method of claim 9, wherein said words converted from said one language into said other language can be either spoken or written texts.

14. Apparatus for translating a source language into a target language, comprising: means for breaking an incoming voice stream of said source language into phoneme sets; means for converting said phoneme sets of said incoming voice stream into a text stream of a standard reference language; means for storing said text stream into a text buffer; means for translating said text words in said text buffer into text words of said standard reference language; and means for converting said translated text words of said standard reference language into words of said target language.

15. Apparatus of claim 14, further comprising: means for parsing each word of said text stream; means for correlating said each word with the vocabulary of said standard reference language to obtain a vocabulary object; means for defining a grammar object of said each word based on the grammar of said source language; and means for deriving from a combination of said vocabulary and grammar objects a semantics object for said source language.

16. Apparatus of claim 15, further comprising: means for expressing said semantics object for said source language in said standard reference language; and means for mapping said semantics object of said standard reference language with said target language; and means for outputting said target language mapped from said standard reference language.

17. Apparatus of claim 14, further comprising: means for tokenizing the text words of said text stream in said text buffer; and means for packetizing said tokenized text words into output packets for said target language; and means for outputting said target language packets onto an output transmission medium.

18. Apparatus of claim 14, wherein said words converted from said source language into said target language can be either spoken or written texts.

19. A system for converting an input voice stream of one language into an output of an other language, comprising: processor means; receiver means workingly connected to said processor means for receiving from an input transmission medium said input voice stream, said input voice stream being routed to said processor means; said processor means including module means for breaking said input voice stream into phoneme sets; module means for converting said phoneme sets of said input voice stream into a text stream of a standard reference language; store means electrically connected to said processor means for storing said text stream into a text buffer; said processor means further including module means for translating the words of said text stream in said text buffer into text words of said standard reference language; module means for converting said translated text words of said standard reference language into words of said other language; module means for combining the words of said other language into an output stream; and transmitting means electrically connected to an output medium for outputting said output stream onto said output transmission medium.

20. System of claim 19, wherein said translating module means further performs the operations of: parsing each word of said text stream; correlating said each word with the vocabulary of said standard reference language to obtain a vocabulary object; defining a grammar object of said each word based on the grammar of said one language; and deriving from a combination of said vocabulary and grammar objects a semantics object for said other language.

21. System of claim 20, wherein said translating module means further performs the operations of: expressing said semantics object for said one language in said standard reference language; and mapping said semantics object of said standard reference language with said other language; and outputting as the other language mapped from said standard reference language.