CA2423144A1 - Automatic segmentation in speech synthesis - Google Patents
Automatic segmentation in speech synthesis
- Publication number
- CA2423144A1
- Authority
- CA
- Canada
- Prior art keywords
- phone
- labels
- hmms
- corrected
- speech synthesis
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/06—Elementary speech units used in speech synthesisers; Concatenation rules
Abstract
Systems and methods for automatically segmenting speech inventories. A set of Hidden Markov Models (HMMs) is initialized using bootstrap data. The HMMs are then re-estimated and aligned to produce phone labels. The boundaries of these phone labels are then corrected using spectral boundary correction. Optionally, the process is repeated iteratively, with the spectral-boundary-corrected phone labels replacing the bootstrap data as input, in order to further reduce mismatches between manual labels and the phone labels assigned by the HMM approach.
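The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not say how spectral change is measured, so the Euclidean distance between adjacent frames, the search window, and every function name here are assumptions made for illustration. HMM re-estimation and alignment are left to a caller-supplied placeholder, `train_and_align`, which is also hypothetical.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]  # one spectral feature vector per analysis frame


def spectral_change(frames: Sequence[Frame], t: int) -> float:
    """Euclidean distance between frames t-1 and t (an assumed stand-in metric)."""
    return sum((a - b) ** 2 for a, b in zip(frames[t - 1], frames[t])) ** 0.5


def correct_boundary(frames: Sequence[Frame], boundary: int, window: int = 3) -> int:
    """Move a boundary to the largest spectral change within +/- window frames.

    A boundary t sits between frame t-1 and frame t. Requires len(frames) >= 2.
    """
    boundary = min(max(boundary, 1), len(frames) - 1)  # clamp to a valid position
    lo = max(1, boundary - window)
    hi = min(len(frames) - 1, boundary + window)
    return max(range(lo, hi + 1), key=lambda t: spectral_change(frames, t))


def correct_boundaries(
    frames: Sequence[Frame], boundaries: Sequence[int], window: int = 3
) -> List[int]:
    """Correct each boundary independently (ordering is not enforced here)."""
    return [correct_boundary(frames, b, window) for b in boundaries]


def iterative_segmentation(
    frames: Sequence[Frame],
    bootstrap_boundaries: Sequence[int],
    train_and_align: Callable[[Sequence[Frame], List[int]], List[int]],
    iterations: int = 2,
    window: int = 3,
) -> List[int]:
    """Alternate HMM alignment and spectral correction, feeding labels back in."""
    labels = list(bootstrap_boundaries)
    for _ in range(iterations):
        # Placeholder for: initialize/re-estimate HMMs from labels, then align.
        hmm_boundaries = train_and_align(frames, labels)
        # Corrected labels become the input of the next iteration.
        labels = correct_boundaries(frames, hmm_boundaries, window)
    return labels


# Toy usage: a single spectral jump between frames 4 and 5, and a fake
# aligner that always places the boundary two frames too late.
frames = [[0.0]] * 5 + [[1.0]] * 5
late_aligner = lambda fr, labels: [b + 2 for b in labels]
result = iterative_segmentation(frames, [5], late_aligner)
```

In this toy setup the fake aligner's systematic two-frame offset is removed on every pass, because the correction step snaps the boundary back to the only spectral jump inside the search window.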
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US36904302P | 2002-03-29 | 2002-03-29 | |
US60/369,043 | 2002-03-29 | | |
US10/341,869 | 2003-01-14 | | |
US10/341,869 (US7266497B2) | 2002-03-29 | 2003-01-14 | Automatic segmentation in speech synthesis |
Publications (2)
Publication Number | Publication Date |
---|---|
CA2423144A1 (en) | 2003-09-29 |
CA2423144C (en) | 2009-06-23 |
Family
ID=28457009
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA002423144A (CA2423144C, Expired - Lifetime) | 2002-03-29 | 2003-03-21 | Automatic segmentation in speech synthesis |
Country Status (4)
Country | Link |
---|---|
US (3) | US7266497B2 (en) |
EP (1) | EP1394769B1 (en) |
CA (1) | CA2423144C (en) |
DE (1) | DE60336102D1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114547551A (en) * | 2022-02-23 | 2022-05-27 | 阿波罗智能技术(北京)有限公司 | Pavement data acquisition method based on vehicle reported data and cloud server |
Families Citing this family (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7369994B1 (en) | 1999-04-30 | 2008-05-06 | At&T Corp. | Methods and apparatus for rapid acoustic unit selection from a large speech corpus |
US6684187B1 (en) * | 2000-06-30 | 2004-01-27 | At&T Corp. | Method and system for preselection of suitable units for concatenative speech |
US6505158B1 (en) * | 2000-07-05 | 2003-01-07 | At&T Corp. | Synthesis-based pre-selection of suitable units for concatenative speech |
US7266497B2 (en) * | 2002-03-29 | 2007-09-04 | At&T Corp. | Automatic segmentation in speech synthesis |
JP4150645B2 (en) * | 2003-08-27 | 2008-09-17 | 株式会社ケンウッド | Audio labeling error detection device, audio labeling error detection method and program |
TWI220511B (en) * | 2003-09-12 | 2004-08-21 | Ind Tech Res Inst | An automatic speech segmentation and verification system and its method |
US7496512B2 (en) * | 2004-04-13 | 2009-02-24 | Microsoft Corporation | Refining of segmental boundaries in speech waveforms using contextual-dependent models |
US20070203706A1 (en) * | 2005-12-30 | 2007-08-30 | Inci Ozkaragoz | Voice analysis tool for creating database used in text to speech synthesis system |
JP4246790B2 (en) * | 2006-06-05 | 2009-04-02 | パナソニック株式会社 | Speech synthesizer |
US9620117B1 (en) * | 2006-06-27 | 2017-04-11 | At&T Intellectual Property Ii, L.P. | Learning from interactions for a spoken dialog system |
US20080027725A1 (en) * | 2006-07-26 | 2008-01-31 | Microsoft Corporation | Automatic Accent Detection With Limited Manually Labeled Data |
US20080077407A1 (en) * | 2006-09-26 | 2008-03-27 | At&T Corp. | Phonetically enriched labeling in unit selection speech synthesis |
US8321222B2 (en) * | 2007-08-14 | 2012-11-27 | Nuance Communications, Inc. | Synthesis by generation and concatenation of multi-form segments |
CA2657087A1 (en) * | 2008-03-06 | 2009-09-06 | David N. Fernandes | Normative database system and method |
US8095365B2 (en) * | 2008-12-04 | 2012-01-10 | At&T Intellectual Property I, L.P. | System and method for increasing recognition rates of in-vocabulary words by improving pronunciation modeling |
JP5457706B2 (en) * | 2009-03-30 | 2014-04-02 | 株式会社東芝 | Speech model generation device, speech synthesis device, speech model generation program, speech synthesis program, speech model generation method, and speech synthesis method |
US8457965B2 (en) * | 2009-10-06 | 2013-06-04 | Rothenberg Enterprises | Method for the correction of measured values of vowel nasalance |
US8630971B2 (en) * | 2009-11-20 | 2014-01-14 | Indian Institute Of Science | System and method of using Multi Pattern Viterbi Algorithm for joint decoding of multiple patterns |
US20140074465A1 (en) * | 2012-09-11 | 2014-03-13 | Delphi Technologies, Inc. | System and method to generate a narrator specific acoustic database without a predefined script |
US20140244240A1 (en) * | 2013-02-27 | 2014-08-28 | Hewlett-Packard Development Company, L.P. | Determining Explanatoriness of a Segment |
US9646613B2 (en) * | 2013-11-29 | 2017-05-09 | Daon Holdings Limited | Methods and systems for splitting a digital signal |
US9240178B1 (en) * | 2014-06-26 | 2016-01-19 | Amazon Technologies, Inc. | Text-to-speech processing using pre-stored results |
US9972300B2 (en) * | 2015-06-11 | 2018-05-15 | Genesys Telecommunications Laboratories, Inc. | System and method for outlier identification to remove poor alignments in speech synthesis |
CN105513597B (en) * | 2015-12-30 | 2018-07-10 | 百度在线网络技术(北京)有限公司 | Voiceprint processing method and processing device |
CN108053828A (en) * | 2017-12-25 | 2018-05-18 | 无锡小天鹅股份有限公司 | Determine the method, apparatus and household electrical appliance of control instruction |
CN110136691B (en) * | 2019-05-28 | 2021-09-28 | 广州多益网络股份有限公司 | Speech synthesis model training method and device, electronic equipment and storage medium |
Family Cites Families (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5390278A (en) * | 1991-10-08 | 1995-02-14 | Bell Canada | Phoneme based speech recognition |
DE69322894T2 (en) * | 1992-03-02 | 1999-07-29 | At & T Corp | Learning method and device for speech recognition |
US5317673A (en) * | 1992-06-22 | 1994-05-31 | Sri International | Method and apparatus for context-dependent estimation of multiple probability distributions of phonetic classes with multilayer perceptrons in a speech recognition system |
JP3272842B2 (en) * | 1992-12-17 | 2002-04-08 | ゼロックス・コーポレーション | Processor-based decision method |
US5623609A (en) * | 1993-06-14 | 1997-04-22 | Hal Trust, L.L.C. | Computer system and computer-implemented process for phonology-based automatic speech recognition |
JP3450411B2 (en) * | 1994-03-22 | 2003-09-22 | キヤノン株式会社 | Voice information processing method and apparatus |
US5655058A (en) * | 1994-04-12 | 1997-08-05 | Xerox Corporation | Segmentation of audio data for indexing of conversational speech for real-time or postprocessing applications |
US5625749A (en) * | 1994-08-22 | 1997-04-29 | Massachusetts Institute Of Technology | Segment-based apparatus and method for speech recognition by analyzing multiple speech unit frames and modeling both temporal and spatial correlation |
US5687287A (en) * | 1995-05-22 | 1997-11-11 | Lucent Technologies Inc. | Speaker verification method and apparatus using mixture decomposition discrimination |
JP3453456B2 (en) * | 1995-06-19 | 2003-10-06 | キヤノン株式会社 | State sharing model design method and apparatus, and speech recognition method and apparatus using the state sharing model |
JP2871561B2 (en) * | 1995-11-30 | 1999-03-17 | 株式会社エイ・ティ・アール音声翻訳通信研究所 | Unspecified speaker model generation device and speech recognition device |
WO1997032299A1 (en) * | 1996-02-27 | 1997-09-04 | Philips Electronics N.V. | Method and apparatus for automatic speech segmentation into phoneme-like units |
US5913193A (en) * | 1996-04-30 | 1999-06-15 | Microsoft Corporation | Method and system of runtime acoustic unit selection for speech synthesis |
US6076057A (en) * | 1997-05-21 | 2000-06-13 | At&T Corp | Unsupervised HMM adaptation based on speech-silence discrimination |
US5913192A (en) * | 1997-08-22 | 1999-06-15 | At&T Corp | Speaker identification with user-selected password phrases |
US6317716B1 (en) * | 1997-09-19 | 2001-11-13 | Massachusetts Institute Of Technology | Automatic cueing of speech |
US6163769A (en) * | 1997-10-02 | 2000-12-19 | Microsoft Corporation | Text-to-speech using clustered context-dependent phoneme-based units |
US6202047B1 (en) * | 1998-03-30 | 2001-03-13 | At&T Corp. | Method and apparatus for speech recognition using second order statistics and linear estimation of cepstral coefficients |
US6292778B1 (en) * | 1998-10-30 | 2001-09-18 | Lucent Technologies Inc. | Task-independent utterance verification with subword-based minimum verification error training |
JP2002530703A (en) * | 1998-11-13 | 2002-09-17 | ルノー・アンド・オスピー・スピーチ・プロダクツ・ナームローゼ・ベンノートシャープ | Speech synthesis using concatenation of speech waveforms |
WO2000054254A1 (en) * | 1999-03-08 | 2000-09-14 | Siemens Aktiengesellschaft | Method and array for determining a representative phoneme |
US6202049B1 (en) | 1999-03-09 | 2001-03-13 | Matsushita Electric Industrial Co., Ltd. | Identification of unit overlap regions for concatenative speech synthesis system |
US6539354B1 (en) * | 2000-03-24 | 2003-03-25 | Fluent Speech Technologies, Inc. | Methods and devices for producing and using synthetic visual speech based on natural coarticulation |
US7120575B2 (en) * | 2000-04-08 | 2006-10-10 | International Business Machines Corporation | Method and system for the automatic segmentation of an audio stream into semantic or syntactic units |
US7165030B2 (en) * | 2001-09-17 | 2007-01-16 | Massachusetts Institute Of Technology | Concatenative speech synthesis using a finite-state transducer |
US6965861B1 (en) * | 2001-11-20 | 2005-11-15 | Burning Glass Technologies, Llc | Method for improving results in an HMM-based segmentation system by incorporating external knowledge |
US7266497B2 (en) * | 2002-03-29 | 2007-09-04 | At&T Corp. | Automatic segmentation in speech synthesis |
US6928407B2 (en) * | 2002-03-29 | 2005-08-09 | International Business Machines Corporation | System and method for the automatic discovery of salient segments in speech transcripts |
US7089185B2 (en) * | 2002-06-27 | 2006-08-08 | Intel Corporation | Embedded multi-layer coupled hidden Markov model |
KR100486735B1 (en) * | 2003-02-28 | 2005-05-03 | 삼성전자주식회사 | Method of establishing optimum-partitioned classifed neural network and apparatus and method and apparatus for automatic labeling using optimum-partitioned classifed neural network |
US7664642B2 (en) * | 2004-03-17 | 2010-02-16 | University Of Maryland | System and method for automatic speech recognition from phonetic features and acoustic landmarks |
US7496512B2 (en) * | 2004-04-13 | 2009-02-24 | Microsoft Corporation | Refining of segmental boundaries in speech waveforms using contextual-dependent models |
2003
- 2003-01-14: US application US10/341,869, patent US7266497B2, Active
- 2003-03-21: CA application CA002423144A, patent CA2423144C, Expired - Lifetime
- 2003-03-27: EP application EP03100795A, patent EP1394769B1, Expired - Lifetime
- 2003-03-27: DE application DE60336102T, patent DE60336102D1, Expired - Lifetime
2007
- 2007-08-01: US application US11/832,262, patent US7587320B2, Expired - Lifetime
2009
- 2009-08-20: US application US12/544,576, patent US8131547B2, Expired - Fee Related
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114547551A (en) * | 2022-02-23 | 2022-05-27 | 阿波罗智能技术(北京)有限公司 | Pavement data acquisition method based on vehicle reported data and cloud server |
CN114547551B (en) * | 2022-02-23 | 2023-08-29 | 阿波罗智能技术(北京)有限公司 | Road surface data acquisition method based on vehicle report data and cloud server |
Also Published As
Publication number | Publication date |
---|---|
EP1394769B1 (en) | 2011-02-23 |
US20090313025A1 (en) | 2009-12-17 |
EP1394769A3 (en) | 2004-06-09 |
US7266497B2 (en) | 2007-09-04 |
US20070271100A1 (en) | 2007-11-22 |
US7587320B2 (en) | 2009-09-08 |
EP1394769A2 (en) | 2004-03-03 |
US8131547B2 (en) | 2012-03-06 |
CA2423144C (en) | 2009-06-23 |
US20030187647A1 (en) | 2003-10-02 |
DE60336102D1 (en) | 2011-04-07 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CA2423144A1 (en) | Automatic segmentation in speech synthesis | |
AU2003217013A1 (en) | System for estimating parameters of a gaussian mixture model | |
EP1050872A3 (en) | Method and system for selecting recognized words when correcting recognized speech | |
WO2007005098A3 (en) | Method and apparatus for generating and updating a voice tag | |
EP1569202A3 (en) | System and method for augmenting spoken language understanding by correcting common errors in linguistic performance | |
WO2004090866A3 (en) | Phonetically based speech recognition system and method | |
WO2000039788A3 (en) | Knowledge-based strategies applied to n-best lists in automatic speech recognition systems | |
AU2003296981A1 (en) | Techniques for disambiguating speech input using multimodal interfaces | |
GB2431492A (en) | Methods and apparatus for modifying process control data | |
WO2006060443A3 (en) | A system and method for improving recognition accuracy in speech recognition applications | |
AU7830300A (en) | Lpc-harmonic vocoder with superframe structure | |
CA2363561A1 (en) | Automated transcription system and method using two speech converting instances and computer-assisted correction | |
WO2007047587A3 (en) | Method and device for recognizing human intent | |
WO2004049305A3 (en) | Discriminative training of hidden markov models for continuous speech recognition | |
EP1465153A3 (en) | Method and apparatus for formant tracking using a residual model | |
EP1553560A4 (en) | Transmission device, transmission method, reception device, reception method, transmission/reception device, communication device, communication method, recording medium, and program | |
WO2004003697A3 (en) | Swine genetics business system | |
AU2002248398A1 (en) | Versioning method for business process models | |
Rodríguez et al. | Computer assisted transcription of speech | |
DE60219030D1 (en) | Method for multilingual speech recognition | |
US20050075143A1 (en) | Mobile communication terminal having voice recognition function, and phoneme modeling method and voice recognition method for the same | |
WO2007067837A3 (en) | Voice quality control for high quality speech reconstruction | |
NZ331430A (en) | Automatic speech recognition | |
Hori et al. | Language model adaptation using WFST-based speaking-style translation | |
CN202795705U (en) | Remote controller and intelligent control system |
Legal Events
Code | Title | Description |
---|---|---|
EEER | Examination request | |
MKEX | Expiry | Effective date: 2023-03-21 |