WO2007103520A3 - Codebook-less speech conversion method and system - Google Patents
- Publication number
- WO2007103520A3 (PCT/US2007/005962)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- target
- source
- speaker
- utterance
- frames
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/033—Voice editing, e.g. manipulating the voice of the synthesiser
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G10L21/013—Adapting to target pitch
- G10L2021/0135—Voice conversion or morphing
Abstract
The conversion of speech can be used to transform an utterance by a source speaker to match the speech characteristics of a target speaker, for applications such as dubbing a motion picture. During a training phase, utterances of the same sentences by both the target speaker and the source speaker are force-aligned according to the phonemes within the sentences. A transformation, or mapping, is trained so that each frame of the source utterances is mapped to a corresponding frame of the target utterances. After the training phase is complete, a source utterance is divided into frames, which are transformed into target frames. Once all target frames have been created from the sequence of source frames, they form a target utterance that carries the speech of the source speaker but the vocal characteristics of the target speaker.
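The abstract does not say how frames are paired within a phoneme or what form the trained transformation takes, so the following is only a minimal sketch under stated assumptions, not the patented method. It assumes that each frame is already a feature vector (for example spectral envelope coefficients), that a forced aligner has produced matching phoneme segments for both speakers, that frames inside a phoneme are paired by linear time warping, and that a per-dimension least-squares affine regression stands in for the trained mapping. The names `pair_frames`, `train` and `convert` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumed representations (not specified in the patent abstract):
Frame = List[float]             # one feature vector per analysis frame
Segment = Tuple[str, int, int]  # (phoneme, first_frame, end_frame_exclusive)
Affine = Tuple[float, float]    # (gain, offset) for one feature dimension


def pair_frames(src_segs: Sequence[Segment],
                tgt_segs: Sequence[Segment]) -> List[Tuple[int, int]]:
    """Pair each source frame index with a target frame index, phoneme by phoneme."""
    if [p for p, _, _ in src_segs] != [p for p, _, _ in tgt_segs]:
        raise ValueError("utterances must be force-aligned to the same phonemes")
    pairs = []
    for (_, s0, s1), (_, t0, t1) in zip(src_segs, tgt_segs):
        n_src, n_tgt = s1 - s0, t1 - t0
        if n_src <= 0 or n_tgt <= 0:
            raise ValueError("phoneme segments must contain at least one frame")
        for k in range(n_src):
            # Linear time warp: the k-th source frame of this phoneme maps to the
            # proportionally placed frame of the same phoneme in the target.
            pairs.append((s0 + k, t0 + (k * n_tgt) // n_src))
    return pairs


def train(corpus) -> List[Affine]:
    """Fit y = gain * x + offset per dimension by least squares over aligned frames.

    corpus: iterable of (src_frames, src_segs, tgt_frames, tgt_segs), one entry
    per sentence spoken by both speakers.
    """
    aligned = []
    for src_frames, src_segs, tgt_frames, tgt_segs in corpus:
        aligned += [(src_frames[i], tgt_frames[j])
                    for i, j in pair_frames(src_segs, tgt_segs)]
    if not aligned:
        raise ValueError("no aligned frames to train on")
    params: List[Affine] = []
    for d in range(len(aligned[0][0])):
        xs = [x[d] for x, _ in aligned]
        ys = [y[d] for _, y in aligned]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        var = sum((x - mx) ** 2 for x in xs)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        gain = cov / var if var > 0 else 0.0  # constant input: predict the target mean
        params.append((gain, my - gain * mx))
    return params


def convert(src_frames: Sequence[Frame], params: Sequence[Affine]) -> List[Frame]:
    """Transform every source frame into a target frame, keeping the source timing."""
    return [[g * x + b for x, (g, b) in zip(frame, params)] for frame in src_frames]


if __name__ == "__main__":
    src = [[0.0], [1.0], [2.0], [3.0]]
    tgt = [[1.0], [3.0], [5.0], [7.0]]
    segs = [("a", 0, 2), ("b", 2, 4)]
    params = train([(src, segs, tgt, segs)])
    print(params)                    # [(2.0, 1.0)]
    print(convert([[4.0]], params))  # [[9.0]]
```

Because conversion applies a trained function directly to each source frame, nothing is looked up in a table of quantized spectral classes, which is one plausible reading of "codebook-less". A working system would also need feature extraction, pitch handling and waveform resynthesis, none of which is shown here.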
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/370,682 (US20070213987A1) | 2006-03-08 | 2006-03-08 | Codebook-less speech conversion method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2007103520A2 (en) | 2007-09-13 |
WO2007103520A3 (en) | 2008-03-27 |
Family ID: 38475569
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2007/005962 (WO2007103520A2) | 2006-03-08 | 2007-03-07 | Codebook-less speech conversion method and system |
Country Status (2)
Country | Link |
---|---|
US (1) | US20070213987A1 (en) |
WO (1) | WO2007103520A2 (en) |
Families Citing this family (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7809145B2 (en) * | 2006-05-04 | 2010-10-05 | Sony Computer Entertainment Inc. | Ultra small microphone array |
US7783061B2 (en) | 2003-08-27 | 2010-08-24 | Sony Computer Entertainment Inc. | Methods and apparatus for the targeted sound detection |
US8073157B2 (en) * | 2003-08-27 | 2011-12-06 | Sony Computer Entertainment Inc. | Methods and apparatus for targeted sound detection and characterization |
US8947347B2 (en) | 2003-08-27 | 2015-02-03 | Sony Computer Entertainment Inc. | Controlling actions in a video game unit |
US8139793B2 (en) | 2003-08-27 | 2012-03-20 | Sony Computer Entertainment Inc. | Methods and apparatus for capturing audio signals based on a visual image |
US8160269B2 (en) | 2003-08-27 | 2012-04-17 | Sony Computer Entertainment Inc. | Methods and apparatuses for adjusting a listening area for capturing sounds |
US8233642B2 (en) | 2003-08-27 | 2012-07-31 | Sony Computer Entertainment Inc. | Methods and apparatuses for capturing an audio signal based on a location of the signal |
US7803050B2 (en) | 2002-07-27 | 2010-09-28 | Sony Computer Entertainment Inc. | Tracking device with sound emitter for use in obtaining information for controlling game program execution |
US9174119B2 (en) | 2002-07-27 | 2015-11-03 | Sony Computer Entertainement America, LLC | Controller for providing inputs to control execution of a program when inputs are combined |
US20080082320A1 (en) * | 2006-09-29 | 2008-04-03 | Nokia Corporation | Apparatus, method and computer program product for advanced voice conversion |
US20080120115A1 (en) * | 2006-11-16 | 2008-05-22 | Xiao Dong Mao | Methods and apparatuses for dynamically adjusting an audio signal based on a parameter |
US8131549B2 (en) * | 2007-05-24 | 2012-03-06 | Microsoft Corporation | Personality-based device |
DE102009013020A1 (en) * | 2009-03-16 | 2010-09-23 | Hayo Becks | Apparatus and method for adapting sound images |
US8340965B2 (en) * | 2009-09-02 | 2012-12-25 | Microsoft Corporation | Rich context modeling for text-to-speech engines |
CN102063899B (en) * | 2010-10-27 | 2012-05-23 | 南京邮电大学 | Method for voice conversion under unparallel text condition |
US8594993B2 (en) | 2011-04-04 | 2013-11-26 | Microsoft Corporation | Frame mapping approach for cross-lingual voice transformation |
CN103280224B (en) * | 2013-04-24 | 2015-09-16 | 东南大学 | Based on the phonetics transfer method under the asymmetric corpus condition of adaptive algorithm |
US9640185B2 (en) * | 2013-12-12 | 2017-05-02 | Motorola Solutions, Inc. | Method and apparatus for enhancing the modulation index of speech sounds passed through a digital vocoder |
US10127916B2 (en) * | 2014-04-24 | 2018-11-13 | Motorola Solutions, Inc. | Method and apparatus for enhancing alveolar trill |
US9659564B2 (en) * | 2014-10-24 | 2017-05-23 | Sestek Ses Ve Iletisim Bilgisayar Teknolojileri Sanayi Ticaret Anonim Sirketi | Speaker verification based on acoustic behavioral characteristics of the speaker |
US10176819B2 (en) * | 2016-07-11 | 2019-01-08 | The Chinese University Of Hong Kong | Phonetic posteriorgrams for many-to-one voice conversion |
WO2018090356A1 (en) * | 2016-11-21 | 2018-05-24 | Microsoft Technology Licensing, Llc | Automatic dubbing method and apparatus |
US11195507B2 (en) * | 2018-10-04 | 2021-12-07 | Rovi Guides, Inc. | Translating between spoken languages with emotion in audio and video media streams |
WO2020188101A1 (en) * | 2019-03-20 | 2020-09-24 | Piksel, Inc | A method and system for content internationalization & localisation |
US11238888B2 (en) * | 2019-12-31 | 2022-02-01 | Netflix, Inc. | System and methods for automatically mixing audio for acoustic scenes |
CN112750446A (en) * | 2020-12-30 | 2021-05-04 | 标贝(北京)科技有限公司 | Voice conversion method, device and system and storage medium |
CN116798405B (en) * | 2023-08-28 | 2023-10-24 | 世优(北京)科技有限公司 | Speech synthesis method, device, storage medium and electronic equipment |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5230037A (en) * | 1990-10-16 | 1993-07-20 | International Business Machines Corporation | Phonetic hidden markov model speech synthesizer |
US5327521A (en) * | 1992-03-02 | 1994-07-05 | The Walt Disney Company | Speech transformation system |
US5642466A (en) * | 1993-01-21 | 1997-06-24 | Apple Computer, Inc. | Intonation adjustment in text-to-speech systems |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0970466B1 (en) * | 1997-01-27 | 2004-09-22 | Microsoft Corporation | Voice conversion |
US6336092B1 (en) * | 1997-04-28 | 2002-01-01 | Ivl Technologies Ltd | Targeted vocal transformation |
US6836761B1 (en) * | 1999-10-21 | 2004-12-28 | Yamaha Corporation | Voice converter for assimilation by frame synthesis with temporal alignment |
US6463412B1 (en) * | 1999-12-16 | 2002-10-08 | International Business Machines Corporation | High performance voice transformation apparatus and method |
FR2868587A1 (en) * | 2004-03-31 | 2005-10-07 | France Telecom | METHOD AND SYSTEM FOR RAPID CONVERSION OF A VOICE SIGNAL |
- 2006-03-08: US11/370,682 (US20070213987A1), status: not active (abandoned)
- 2007-03-07: PCT/US2007/005962 (WO2007103520A2), status: active application filing
Also Published As
Publication number | Publication date |
---|---|
WO2007103520A2 (en) | 2007-09-13 |
US20070213987A1 (en) | 2007-09-13 |
Similar Documents
Publication | Title |
---|---|
WO2007103520A3 (en) | Codebook-less speech conversion method and system | |
WO2006053256A3 (en) | Speech conversion system and method | |
WO2008038082A3 (en) | Prosody conversion | |
EP3739477A4 (en) | Speech translation method and system using multilingual text-to-speech synthesis model | |
WO2008142836A1 (en) | Voice tone converting device and voice tone converting method | |
EP4318463A3 (en) | Multi-modal input on an electronic device | |
WO2009006081A3 (en) | Pronunciation correction of text-to-speech systems between different spoken languages | |
WO2011133766A3 (en) | Methods and systems for training dictation-based speech-to-text systems using recorded samples | |
WO2007129156A3 (en) | Soft alignment in gaussian mixture model based transformation | |
WO2006023631A3 (en) | Document transcription system training | |
EP4016526A4 (en) | Sound conversion system and training method for same | |
TW200601263A (en) | Apparatus and method for synthesized audible response to an utterance in speaker-independent voice recognition | |
WO2015009586A3 (en) | Performing an operation relative to tabular data based upon voice input | |
WO2008118195A3 (en) | System and method for a cooperative conversational voice user interface | |
WO2012036424A3 (en) | Method and apparatus for performing microphone beamforming | |
EP1291848A3 (en) | Multilingual pronunciations for speech recognition | |
WO2006122161A3 (en) | Comprephension instruction system and method | |
AU2003217013A1 (en) | System for estimating parameters of a gaussian mixture model | |
WO2010041131A8 (en) | Associating source information with phonetic indices | |
WO2006070373A3 (en) | A system and a method for representing unrecognized words in speech to text conversions as syllables | |
WO2007140047A3 (en) | Grammar adaptation through cooperative client and server based speech recognition | |
WO2005099414A8 (en) | Comprehensive spoken language learning system | |
WO2007120418A3 (en) | Electronic multilingual numeric and language learning tool | |
WO2009114499A3 (en) | Methods and devices for language skill development | |
WO2006076280A3 (en) | Method and system for assessing pronunciation difficulties of non-native speakers |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
NENP | Non-entry into the national phase | Ref country code: DE |
DPE1 | Request for preliminary examination filed after expiration of 19th month from priority date (PCT application filed from 20040101) | |
122 | EP: PCT application non-entry in European phase | Ref document number: 07752646; Country of ref document: EP; Kind code of ref document: A2 |