WO1996012271A1 - Speech synthesis apparatus and method for synthesizing a finite set of sentences and numbers using one program - Google Patents
Speech synthesis apparatus and method for synthesizing a finite set of sentences and numbers using one program
- Publication number
- WO1996012271A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sentence
- word
- data
- synthesized
- database
- Prior art date
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/027—Concept to speech synthesisers; Generation of natural phrases from machine-based concepts
Definitions
- The present invention relates to techniques for synthesizing speech for use in data processing systems, telephone answering machines, and other devices, and more specifically, to an apparatus and method capable of synthesizing speech in multiple languages using a single application program.
- Synthesized speech is used in many electronic devices as part of the user interface to enable a user to interact with or obtain information from the device.
- Such devices typically contain a speech synthesizer chip which consists of a processor having speech synthesis capability.
- The synthesized speech may be output through any one of several media, e.g., audio voice synthesis, Morse code, message display, etc.
- The speech synthesizer chip may be separate from the other functional units of the device, or it may be incorporated with additional functions such as memory, digital signal processing, timers, etc.
- As shown in Fig. 1, a typical speech synthesis chip 1 contains a system controller 10 which is linked to a word synthesizer 12 by means of a communication link 14.
- Word synthesizer 12 accesses vocabulary database 16 in order to retrieve word data needed to construct sentences in response to instructions issued by controller 10.
- Vocabulary database 16 stores the words or groups of words used to synthesize the sentences requested by controller 10 in a non-volatile memory.
- Controller 10 typically contains an application program stored in a read-only memory (ROM), with the program being designed for the specific application for which the synthesized words are required.
- The application program includes routines written for each sentence which the speech synthesis chip 1 is expected to produce for the desired application.
- Each routine generates a desired sentence by causing controller 10 to issue a set of commands to word synthesizer 12, where each command causes a word or group of words in that sentence to be synthesized.
- The grammar rules, word order structure, and rules for constructing numbers (among other characteristics) specific to a particular language are embedded in the application program and are reflected in the order and types of commands which the program causes controller 10 to issue.
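The conventional arrangement described above can be sketched roughly as follows. This is an illustration only, not code from the patent: the function names `synthesize_word` and `announce_messages_english` and the vocabulary are hypothetical, and the word synthesizer is stubbed to collect text instead of producing audio. The sketch shows the plural rule and the word order living inside the application routine, which is why each language would need its own routine.

```python
# Hypothetical sketch of the prior-art design: one routine per sentence,
# one command per word, with the grammar rules hard-coded in the routine.

spoken = []  # stands in for the audio output of word synthesizer 12


def synthesize_word(word):
    """Stub for a single per-word command sent over the communication link."""
    spoken.append(word)


def announce_messages_english(count):
    """Application routine for one sentence in one language (English only)."""
    synthesize_word("you")
    synthesize_word("have")
    synthesize_word(str(count))  # number construction omitted for brevity
    # The English plural rule is embedded in the program itself.
    synthesize_word("message" if count == 1 else "messages")


announce_messages_english(2)
print(" ".join(spoken))
```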
- The present invention is directed to an apparatus and method for synthesizing a finite set of sentences and numbers in one of several languages using an application program which is independent of the language being synthesized.
- The invention includes a system controller which communicates with a sentence and word synthesizer by means of a communication link.
- The sentence and word synthesizer responds to instructions from the controller by accessing a vocabulary and sentence database which contains all of the language specific information usually found in a controller resident application program in standard implementations of speech synthesizers.
- The language specific information is encoded in a language independent format in the database. Therefore, the application program can be written in a form which is independent of the language to be synthesized.
- The database contains all of the language specific information, and its contents are retrieved by an indexing system which assigns an index number to each sentence.
- The application program causes the controller to issue a command to retrieve a desired sentence by using its index number, where the command includes information regarding the specific data needed.
- Variables are typically numbers.
- The control terms act to control the operation of the sentence synthesizer and determine the structure of the sentence being synthesized.
- They may determine whether the singular or plural form of a word is appropriate, or act to produce the proper pronunciation of a number depending upon its context.
- The controller issues a command instructing the sentence synthesizer to produce a sentence having a prescribed index number.
- The command includes the values of any variables needed to complete the sentence.
- The sentence synthesizer retrieves the sentence content from the database and then implements the sentence according to the words, control terms, and variables contained in it.
- Each data word in the sentence is read by a word decoder which determines if the data word is a word, variable, or control term.
- For each word to be synthesized, the sentence synthesizer instructs a word synthesizer to retrieve that word from the database and produce it in spoken form.
- The sentence synthesizer points to a data table which contains the spoken word equivalents of the number or numbers to be produced by the speech synthesizer. The data table points to the entries in the word database corresponding to the words needed to produce the spoken number.
- The words are then retrieved and produced as speech by the action of the word synthesizer. The control terms are interpreted by the sentence synthesizer as commands to carry out operations which implement the grammar rules, contextual checking, etc. of the language and thereby determine the final sentence structure.
- Fig. 1 is a block diagram of a typical speech synthesis chip.
- Fig. 2 is a block diagram of a speech synthesis chip constructed according to the present invention.
- Fig. 3 is a flowchart showing the operation of the sentence synthesizer module of the present invention.
- Fig. 4 shows how a simple sentence is constructed and synthesized by the speech synthesis chip of the present invention.
- Fig. 5 is a block diagram of a telephone answering machine which incorporates the speech synthesizer chip of the present invention.
- DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
- Fig. 2 is a block diagram of a speech synthesis chip 100 constructed according to the present invention.
- Speech chip 100 includes a system controller 102 which communicates with a sentence and word synthesizer 104 via a communication link 103.
- System controller 102 can take the form of a separate processor which interacts with synthesizer 104 via communication link 103 in a master/slave type of architecture, or controller 102 can be a separate software module running on the same processor as synthesizer 104. In the latter situation, communication between controller 102 and synthesizer 104 occurs via the internal registers of the processor or by means of a variable in memory.
- Synthesizer 104 accesses vocabulary and sentence database 108 in order to construct synthesized speech sentences in response to commands issued by controller 102.
- Database 108 is typically separated into two sections, a vocabulary or word section 109 and a sentence section 110.
- Database 108 contains the words, grammar rules, numbers, and contextual information needed for synthesizer 104 to synthesize sentences in response to commands from controller 102.
- Synthesizer 104 typically contains two modules: a sentence synthesizer 105 and a word synthesizer 106.
- Sentence synthesizer 105 acts to control the production of a desired sentence by interpreting the data retrieved from database 108 in response to a command from controller 102 to synthesize a particular sentence. Word synthesizer 106 acts to synthesize specific words in response to commands from sentence synthesizer 105.
- Database 108 contains all of the language specific information needed to synthesize any of the set of sentences which system 100 is capable of synthesizing. This is accomplished by use of a data structure which includes the language specific information in the definition of the sentence. Thus, as will be described in greater detail later, when controller 102 issues a command to synthesize a particular sentence by providing its index, sentence synthesizer 105 retrieves that sentence structure from sentence section 110 of database 108, where the sentence structure contains all of the grammar and contextual rules of the language being synthesized. This significantly reduces the complexity of the application program which is resident in controller 102, and makes the speech synthesis system more flexible and capable of being used to synthesize multiple languages.
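One way to picture the two-section layout of database 108 is the small model below. It is a sketch under stated assumptions: the class name `Database`, the function `retrieve_sentence`, the tuple tags (`"WORD"`, `"VARIABLE"`, `"PLURAL_CHECK"`, `"END"`), and all index values are hypothetical and are not the patent's actual encoding.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    """Hypothetical model of database 108 for a single language."""
    word_section: dict = field(default_factory=dict)      # word index -> stored word
    sentence_section: dict = field(default_factory=dict)  # sentence index -> data words


def retrieve_sentence(db, n):
    """Return the stored definition of sentence n from the sentence section."""
    return db.sentence_section[n]


english = Database(
    word_section={0: "you have", 1: "message", 2: "messages"},
    # Data words are illustrative tuples: a fixed word, a variable slot,
    # a control term, and an end marker.
    sentence_section={1: [("WORD", 0), ("VARIABLE", 0), ("PLURAL_CHECK", 1), ("END",)]},
)

print(retrieve_sentence(english, 1))
```

Because the sentence definition is plain data, the controller only needs to know the index 1; everything language specific stays inside the `Database` instance.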
- Fig. 3 is a flowchart showing the operation of the sentence synthesizer 105 module of the present invention.
- Sentence synthesizer 105 receives an instruction from controller 102 to synthesize sentence (n).
- The value n represents the index of the sentence to be produced (box 200).
- A pointer is set to the sentence with index (n) (box 210) in the sentence section 110 of database 108.
- The sentence content is retrieved from database 108 and the data is read one data word at a time by a word decoder contained in sentence synthesizer 105 (box 220).
- A test is then performed to determine if the data word which has been read by the decoder is an end marker, signifying the end of the sentence data (box 230). If the data word is an end marker, the program ends (box 250). If the data word is not an end marker, the character of the data word determines whether a number is
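The read loop of Fig. 3, as far as it is described above, might look like the sketch below. The sentinel value `END_MARKER`, the table `SENTENCE_SECTION`, and the function `read_sentence` are assumptions made for illustration; the patent does not give a concrete end-marker value, and the per-type handling of each data word is left out here.

```python
END_MARKER = 0xFF  # assumed sentinel value; the patent does not give one

# Hypothetical sentence section: sentence index -> line of data words.
SENTENCE_SECTION = {
    3: [0x01, 0x02, 0x05, END_MARKER],
}


def read_sentence(n):
    """Walk sentence n one data word at a time until the end marker."""
    data_words = SENTENCE_SECTION[n]      # box 210: point at sentence n
    position = 0
    decoded = []
    while True:
        data_word = data_words[position]  # box 220: read the next data word
        if data_word == END_MARKER:       # box 230: end-of-sentence test
            break                         # box 250: done
        decoded.append(data_word)         # handling by data word type omitted
        position += 1
    return decoded


print(read_sentence(3))
```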
- Controller 102 issues a command to synthesizer 104 via communication link 103.
- The command is of the form "synthesize sentence (n, x1, x2, x3, ...)", where n is a number corresponding to the sentence index, and x1, x2, x3, etc. represent values of the arguments or variables to be inserted into the sentence structure.
- Synthesizer 104 accesses database 108, using a pointer to retrieve the sentence corresponding to index (n) from the sentence database portion 110 of database 108.
- A sentence contained in database 108 is composed of data words representing three types of objects: words, variables, and control terms.
- The words are fixed entries ("You have", etc. in the example sentence) for the invariant parts of the sentence.
- The sentence structure in database 108 contains pointers for the words to be synthesized which direct the word synthesizer portion 106 of synthesizer 104 to retrieve those word(s) from the word section 109 of database 108 and then synthesize them.
- The variables or arguments correspond to portions of the sentence which change with the situation in which the sentence is being synthesized. They are usually numerals, and the sentence structure contains a pointer to a numeral decoder or table 300 which translates the number (in this case 21) to its corresponding words ("twenty-one").
- The control terms are instructions which cause the synthesizer to check for a particular condition, such as the existence of a plural argument. If the condition is satisfied the index to the next word to be synthesized
- After retrieval of the appropriate sentence, a word decoder reads each data word from the sentence, where the data words correspond to the words, variables, and control terms previously described. If the data word corresponds to a word or word group, that word or word group is retrieved from the vocabulary or word section 109 of database 108 and then is synthesized by word synthesizer 106. Sentence synthesizer 105 then reads the next data word, which in the case of the example of Fig. 4 is an instruction to go to table 1 to retrieve a number. The instruction to go to table 1 can, if necessary, be followed by a logic step which determines the context in which the number is being used in the sentence so that the appropriate spoken form of the number will be synthesized. This logic step is important in languages such as German in which the form of a number (the actual words used to express that number) depends
- This context determining logic is represented by a context selector (box 310). The word following the instruction to go to table 1 is read next and provides the argument for the variable in the sentence, in this case the number of messages. Based on this argument and the results of the context selector logic, the appropriate entry in table 1 or another data table is located. A pointer or pointers from that entry indicates the words in word section 109 of database 108 which correspond to the argument needed for the sentence. This is followed by an instruction to word synthesizer 106 to synthesize those words.
- Sentence synthesizer 105 then reads the next data word, which in this case is a control term or instruction to check if the argument is singular or plural. If the argument is singular, the word "message" is retrieved from the word section 109 of database 108 and is then spoken by word synthesizer 106. If the argument is plural, then sentence synthesizer 105 increments the word index by one, thereby causing the word "messages" to be retrieved and synthesized.
- Sentence synthesizer 105 of the present invention performs the processing steps necessary to retrieve the sentence to be synthesized, parse through the data words which comprise the content of that sentence, and control the synthesizing of each of the words or variables in that sentence. In this way the complete sentence is synthesized by a sequence of logic steps and instructions to retrieve words from the word section 109 of database 108 and then synthesize those words.
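The "You have twenty-one messages" walk-through can be reproduced with a small interpreter. This is a minimal sketch, not the patent's implementation: the tags `"WORD"`, `"NUMBER"`, `"IF_PLURAL"` and `"END"`, the index values, and the function name `synthesize_sentence` are hypothetical; the number table holds only a few entries; the context selector is omitted; and text strings stand in for stored speech samples.

```python
# Word section 109 (hypothetical indices). "messages" sits directly after
# "message" so the plural can be reached by incrementing the index by one.
WORDS = {0: "you have", 1: "message", 2: "messages",
         10: "one", 11: "two", 12: "twenty", 13: "twenty-one"}

# Table 1: number -> indices of the words that speak it (illustrative subset).
NUMBER_TABLE_1 = {1: [10], 2: [11], 20: [12], 21: [13]}

# Sentence section 110: data words tagged by type (the tags are an assumption).
SENTENCES = {
    1: [("WORD", 0), ("NUMBER", 0), ("IF_PLURAL", 0, 1), ("END",)],
}


def synthesize_sentence(n, *args):
    """Interpret sentence n, substituting args for its variables."""
    out = []
    for data_word in SENTENCES[n]:
        kind = data_word[0]
        if kind == "END":
            break
        if kind == "WORD":
            out.append(WORDS[data_word[1]])
        elif kind == "NUMBER":
            value = args[data_word[1]]
            out.extend(WORDS[i] for i in NUMBER_TABLE_1[value])
        elif kind == "IF_PLURAL":
            value = args[data_word[1]]
            index = data_word[2]
            if value != 1:
                index += 1  # plural: step to the next word entry
            out.append(WORDS[index])
    return " ".join(out)


print(synthesize_sentence(1, 21))
print(synthesize_sentence(1, 1))
```

The caller supplies only the sentence index and the argument, mirroring the "synthesize sentence (n, x1, ...)" command form; the plural decision is made by the data, not by the caller.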
- The present invention solves the problems inherent in the prior art approach of synthesizing a sentence word by word under the control of a controller by incorporating the language specific information in the same database as the word information.
- The application program being run by the master controller need only issue a command to synthesize a desired sentence which is identified by its index number.
- Control is then passed to the sentence synthesizer, which retrieves the sentence content data and parses through it to carry out the process of synthesizing the sentence.
- Because the language specific information is expressed in terms of the sentence content, i.e., as data, it can be stored in a standard memory device instead of being expressed as code which runs on the controller. This reduces the complexity of the application program while also reducing system costs and increasing the flexibility of the system.
- Because the controller need issue only a single command in order to synthesize an entire sentence (as opposed to currently available systems in which the controller must issue a command for each word to be synthesized), the controller can be used to perform or monitor other system functions during the synthesis process.
- Controller 102 can issue a command to sentence synthesizer 105 to select a different portion of database 108 to use when retrieving sentence and word data, or database 108 may be replaced by a different memory device which contains the sentence content and words needed for the new language. Because the sentence content for the new language contains all of the language specific information required when synthesizing a sentence in that language, the application program being executed by controller 102 does not have to be changed.
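The language switch described above can be illustrated with two interchangeable databases behind one unchanged application routine. Everything named here is hypothetical: the `DATABASES` layout, the `"NOUN"` tag, `select_language`, and `application_program` are illustrative, and the German entries are a simplified example that ignores the context-dependent number forms mentioned earlier.

```python
# Two hypothetical portions of database 108, one per language.
DATABASES = {
    "en": {
        "words": {0: "you have", 1: "message", 2: "messages", 10: "one", 11: "two"},
        "numbers": {1: 10, 2: 11},
        "sentences": {1: [("WORD", 0), ("NUMBER",), ("NOUN", 1)]},
    },
    "de": {
        "words": {0: "Sie haben", 1: "Nachricht", 2: "Nachrichten", 10: "eine", 11: "zwei"},
        "numbers": {1: 10, 2: 11},
        "sentences": {1: [("WORD", 0), ("NUMBER",), ("NOUN", 1)]},
    },
}

active = DATABASES["en"]


def select_language(code):
    """Stand-in for the command that selects a different database portion."""
    global active
    active = DATABASES[code]


def synthesize_sentence(n, count):
    """Build sentence n from whichever database is currently selected."""
    out = []
    for data_word in active["sentences"][n]:
        kind = data_word[0]
        if kind == "WORD":
            out.append(active["words"][data_word[1]])
        elif kind == "NUMBER":
            out.append(active["words"][active["numbers"][count]])
        elif kind == "NOUN":
            # Plural form is assumed to be stored one entry after the singular.
            offset = 0 if count == 1 else 1
            out.append(active["words"][data_word[1] + offset])
    return " ".join(out)


def application_program():
    """Language-independent application code: only an index and an argument."""
    return synthesize_sentence(1, 2)


print(application_program())
select_language("de")
print(application_program())
```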
- Fig. 5 is a block diagram of a telephone answering machine which incorporates the speech synthesizer chip of the present invention.
- Fig. 5 represents a typical application of the present invention, and is only one example of many environments in which the present invention may be utilized to provide efficient multi-language speech synthesis capabilities.
- In order to retrieve messages from a telephone answering machine, a user depresses a key on keypad 401. System controller 102 decodes which key has been depressed and translates the keystroke into an action to be implemented by the speech synthesizer. If, for example, the action is to announce the current time, system controller 102 will issue a command to module interface 402 to synthesize a particular sentence from sentence database 110, which is part of language database 108. The sentence to be synthesized is identified by its index number, n. Module interface 402 sends the sentence index to sentence and word synthesizer 104.
- As previously described with reference to Figs. 3-4, sentence synthesizer module 105 of synthesizer 104 will retrieve the sentence definition corresponding to the sentence with index n from sentence section 110 of database 108, decode its content, and convert it to a series of words to be synthesized or control terms, where the words may include the steps of converting numbers into the equivalent words.
- The decoded words which are to be spoken are passed to the word synthesizer 106 module of synthesizer 104, along with instructions to synthesize those words. Word synthesizer 106 retrieves the desired words from the word section 109 of database 108.
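The controller side of this flow, from keystroke to a single sentence command, could be sketched as below. The key assignments, the action names, the sentence indices, and the function `handle_keystroke` are all assumptions for illustration; the patent does not specify them. The returned tuple follows the "(n, x1, x2, ...)" command form.

```python
from datetime import time

# Hypothetical assignments; the patent does not specify keys or indices.
KEY_ACTIONS = {"1": "announce_message_count", "2": "announce_time"}
SENTENCE_INDEX = {"announce_message_count": 1, "announce_time": 2}


def handle_keystroke(key, message_count, now):
    """Translate a key press into a (sentence index, arguments...) command."""
    action = KEY_ACTIONS.get(key)
    if action is None:
        return None  # key not assigned to a spoken response
    n = SENTENCE_INDEX[action]
    if action == "announce_time":
        return (n, now.hour, now.minute)
    return (n, message_count)


print(handle_keystroke("2", 3, time(14, 5)))
```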
- Codec module 403 is under the control of module interface 402 and is responsible for performing the digital-to-analog and analog-to-digital conversion functions required by the system. Codec module 403 converts the decompressed digital samples to analog signals which are then produced as audible speech by means of a loudspeaker 408. If desired, the sentence can also be displayed visually by means of a display 409.
- If a request to the answering machine is provided by means of a signal transmitted over a telephone line instead of keypad 401, that request enters the system via telephone line interface 404.
- The incoming signal is passed by interface 404 to analog multiplexer 405, which controls the input, output, and processing of analog signals.
- Analog multiplexer 405 sends the signal to codec module 403, which reads the signal and converts it to digital form.
- A digital signal processing (DSP) and systems function module 406 decodes the signal read by codec module 403 and determines if the decoded signal corresponds to an instruction the system is designed to recognize. If so, module interface 402 informs system controller 102 what the instruction or digit represented by the incoming signal is, and controller 102 then implements that instruction as previously described.
- DSP and systems functions module 406 can also perform other functions such as voice compression and decompression, tone generation and detection, real-time clock generation, memory management, etc.
- Most telephone answering systems include a microphone 407 for recording messages, and a loudspeaker 408 for playing back messages and the synthesized speech.
- Analog multiplexer 405 controls the input/output functions which cause analog signals to enter the other system modules or cause analog signals to be produced by the system.
- A display 409 may also be included to visually display system information or messages to the user.
- The sentence synthesizer module 105 can be modified depending upon the application by altering the set of variables which are recognized automatically and the set of grammar rules.
- The set of variables may be expanded to account for a larger set of numbers, while the grammar rules, which are usually encoded as control terms in a sentence, can be changed to permit the synthesis of other languages or of additional aspects of the same language.
- Such modifications allow the speech synthesizer system of the present invention to more efficiently adapt to new uses or markets in which a product incorporating the system will be sold.
- The structure of the sentence database data table can be expressed as two columns: a first column containing an index number for the line of sentence data words contained in the second column, and a corresponding line of data words which define the contents of the sentence.
- The line of data words can be expressed as a sequence of numbers in a fixed-size binary representation. Each number corresponds to an index for an entry representing the sampled spoken form of a particular word stored in word section 109 of database 108.
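A fixed-size binary line of data words might be stored as shown below. The 16-bit little-endian width, the `END_MARKER` value, and the helper names `pack_line` and `unpack_line` are assumptions chosen for the sketch; the patent states only that the representation is fixed-size.

```python
import struct

END_MARKER = 0xFFFF  # assumed sentinel; not specified in the patent


def pack_line(data_words):
    """Pack a line of data words as consecutive 16-bit little-endian numbers."""
    line = list(data_words) + [END_MARKER]
    return struct.pack("<%dH" % len(line), *line)


def unpack_line(blob):
    """Recover the data words from a packed line, stopping at the end marker."""
    words = []
    for (value,) in struct.iter_unpack("<H", blob):
        if value == END_MARKER:
            break
        words.append(value)
    return words


# Sentence table: the key is the index column, the value the packed line.
sentence_table = {1: pack_line([0, 13, 2])}

print(sentence_table[1].hex())
print(unpack_line(sentence_table[1]))
```

A fixed width keeps the decoder trivial, since every data word is read with the same stride, which suits storage in a ROM.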
- As sentence synthesizer 105 reads each data word in the line of data words retrieved from sentence section 110, it either instructs word synthesizer 106 to retrieve a particular word from word section 109 of database 108 and produce that word as spoken speech, implements a conditional test or other instruction defined by a control word, or points to a designated number table containing the word indices for the spoken word equivalents of a variable in the sentence. If a number is spoken differently depending upon the context (as in the number one being spoken as "one" or "first"), a different number table should be constructed for each context.
- The entries in a number table represent the index numbers for the words contained in word section 109 which correspond to the spoken words for the number to be synthesized.
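The idea of one number table per context can be sketched as follows. The context names `"cardinal"` and `"ordinal"`, the word indices, and the function `speak_number` are hypothetical; text strings again stand in for stored speech samples.

```python
# Word section 109 (hypothetical indices).
WORDS = {0: "one", 1: "two", 2: "three", 3: "first", 4: "second", 5: "third"}

# One number table per context; each entry lists word-section indices.
NUMBER_TABLES = {
    "cardinal": {1: [0], 2: [1], 3: [2]},
    "ordinal": {1: [3], 2: [4], 3: [5]},
}


def speak_number(value, context):
    """Look up the spoken form of value in the table for the given context."""
    indices = NUMBER_TABLES[context][value]
    return " ".join(WORDS[i] for i in indices)


print(speak_number(1, "cardinal"))
print(speak_number(1, "ordinal"))
```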
- Control words can be used to determine whether the word "AM" or "PM" should be used in a time announcement, whether a singular or plural term should be used (and point to the appropriate word in the word section of the database), select the proper day of the week to announce, etc.
- The option codes can be used to select the appropriate number table, determine whether the time is announced in 12 or 24 hour format, or perform other functions which involve synthesizing words for numbers.
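As an illustration of the AM/PM control word and the 12/24 hour option code, the sketch below builds the elements of a time announcement. The function `time_words` and its `use_24_hour` flag are assumptions, and digits are left as strings rather than being routed through a number table, to keep the example short.

```python
def time_words(hour, minute, use_24_hour):
    """Return the spoken elements of a time announcement as a list."""
    if use_24_hour:                        # option code: 24 hour format
        return [str(hour), "%02d" % minute]
    suffix = "AM" if hour < 12 else "PM"   # control word: AM/PM selection
    hour_12 = hour % 12 or 12              # 0 -> 12, 13 -> 1
    return [str(hour_12), "%02d" % minute, suffix]


print(time_words(14, 5, False))
print(time_words(14, 5, True))
```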
- The data representing the various indices and the digital data representing the spoken words is burned into a ROM.
- A memory device can store both the data representing the word samples for each word to be spoken, and the various links which allow sentence synthesizer 105 to control how a sentence is produced. If it is desired to have the synthesizer be able to produce speech in more than one language, a different ROM or section of an existing ROM should be used to store those words.
- The database structure and design of the sentence synthesizer of the present invention permit multiple languages to be produced by a controller running a single application program.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1019960703143A KR960706671A (en) | 1994-10-14 | 1995-10-16 | SPEECH SYNTHESIS APPARATUS AND METHOD FOR SYNTHESIZING A FINITE SET OF SENTENCES AND NUMBERS USING ONE PROGRAM |
EP95937434A EP0734568A1 (en) | 1994-10-14 | 1995-10-16 | Speech synthesis apparatus and method for synthesizing a finite set of sentences and numbers using one program |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US32313694A | 1994-10-14 | 1994-10-14 | |
US08/323,136 | 1994-10-14 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO1996012271A1 (en) | 1996-04-25 |
WO1996012271A9 (en) | 1996-07-04 |
Family
ID=23257867
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US1995/013134 WO1996012271A1 (en) | 1994-10-14 | 1995-10-16 | Speech synthesis apparatus and method for synthesizing a finite set of sentences and numbers using one program |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP0734568A1 (en) |
KR (1) | KR960706671A (en) |
WO (1) | WO1996012271A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1997007499A2 (en) * | 1995-08-14 | 1997-02-27 | Philips Electronics N.V. | A method and device for preparing and using diphones for multilingual text-to-speech generating |
WO2001006489A1 (en) * | 1999-07-21 | 2001-01-25 | Lucent Technologies Inc. | Improved text to speech conversion |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0251296A2 (en) * | 1986-06-30 | 1988-01-07 | Wang Laboratories Inc. | Portable communication terminal for remote data query |
EP0450533A2 (en) * | 1990-03-31 | 1991-10-09 | Gold Star Co. Ltd | Speech synthesis by segmentation on linear formant transition region |
EP0606520A2 (en) * | 1993-01-15 | 1994-07-20 | ALCATEL ITALIA S.p.A. | Method of implementing intonation curves for vocal messages, and speech synthesis method and system using the same |
-
1995
- 1995-10-16 KR KR1019960703143A patent/KR960706671A/en not_active Application Discontinuation
- 1995-10-16 EP EP95937434A patent/EP0734568A1/en not_active Withdrawn
- 1995-10-16 WO PCT/US1995/013134 patent/WO1996012271A1/en not_active Application Discontinuation
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1997007499A2 (en) * | 1995-08-14 | 1997-02-27 | Philips Electronics N.V. | A method and device for preparing and using diphones for multilingual text-to-speech generating |
WO1997007499A3 (en) * | 1995-08-14 | 1997-04-03 | Philips Electronics Nv | A method and device for preparing and using diphones for multilingual text-to-speech generating |
WO2001006489A1 (en) * | 1999-07-21 | 2001-01-25 | Lucent Technologies Inc. | Improved text to speech conversion |
Also Published As
Publication number | Publication date |
---|---|
EP0734568A1 (en) | 1996-10-02 |
KR960706671A (en) | 1996-12-09 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AK | Designated states | Kind code of ref document: A1. Designated state(s): DE KR |
| | AL | Designated countries for regional patents | Kind code of ref document: A1. Designated state(s): AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE |
| | WWE | Wipo information: entry into national phase | Ref document number: 1995937434. Country of ref document: EP |
| | COP | Corrected version of pamphlet | Free format text: PAGES 1-7, DESCRIPTION, REPLACED BY NEW PAGES 1-8; PAGES 8 AND 9, CLAIMS, REPLACED BY NEW PAGES 9 AND 10; PAGES 1/5-5/5, DRAWINGS, REPLACED BY NEW PAGES 1/4-4/4; DUE TO LATE TRANSMITTAL BY THE RECEIVING OFFICE |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
| | WWP | Wipo information: published in national office | Ref document number: 1995937434. Country of ref document: EP |
| | REG | Reference to national code | Ref country code: DE. Ref legal event code: 8642 |
| | WWW | Wipo information: withdrawn in national office | Ref document number: 1995937434. Country of ref document: EP |