WO2007103520A3 - Codebook-less speech conversion method and system - Google Patents

Codebook-less speech conversion method and system

Publication number
WO2007103520A3
Authority
WO
WIPO (PCT)
Prior art keywords
target
source
speaker
utterance
frames
Prior art date
Application number
PCT/US2007/005962
Other languages
French (fr)
Other versions
WO2007103520A2 (en)
Inventor
Oytun Turk
Levent Arslan
Fred Deutsch
Original Assignee
Voxonic Inc
Oytun Turk
Levent Arslan
Fred Deutsch
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Voxonic Inc, Oytun Turk, Levent Arslan, Fred Deutsch
Publication of WO2007103520A2
Publication of WO2007103520A3

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/033 Voice editing, e.g. manipulating the voice of the synthesiser
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003 Changing voice quality, e.g. pitch or formants
    • G10L21/007 Changing voice quality, e.g. pitch or formants characterised by the process used
    • G10L21/013 Adapting to target pitch
    • G10L2021/0135 Voice conversion or morphing

Abstract

Speech conversion transforms an utterance by a source speaker so that it matches the speech characteristics of a target speaker, for applications such as dubbing a motion picture. During a training phase, utterances of the same sentences by both the target speaker and the source speaker are force-aligned according to the phonemes within the sentences. A transformation or mapping is trained so that each frame of the source utterances is mapped to a corresponding frame of the target utterances. After the training phase is complete, a source utterance is divided into frames, which are transformed into target frames. Once all target frames have been created from the sequence of source frames, a target utterance is produced that carries the speech of the source speaker but has the vocal characteristics of the target speaker.
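The training and conversion steps in the abstract can be illustrated with a minimal sketch. It is not the patented method: the abstract does not say how frames are paired inside a phoneme or what form the trained mapping takes. The sketch assumes that phoneme boundaries from forced alignment are already available as hypothetical `(label, start, end)` segments, pairs frames by their proportional position inside each phoneme, and uses a per-dimension least-squares affine map as a stand-in for the trained transformation. All function names are illustrative.

```python
from typing import List, Tuple

Frame = List[float]             # one feature vector per frame (assumed representation)
Segment = Tuple[str, int, int]  # (phoneme label, first frame index, end index exclusive)


def pair_frames(src_segs: List[Segment], tgt_segs: List[Segment]) -> List[Tuple[int, int]]:
    """Pair every source frame with a target frame inside the same phoneme."""
    pairs = []
    for (s_ph, s0, s1), (t_ph, t0, t1) in zip(src_segs, tgt_segs):
        if s_ph != t_ph:
            raise ValueError(f"phoneme mismatch: {s_ph} vs {t_ph}")
        s_len, t_len = s1 - s0, t1 - t0
        if s_len <= 0 or t_len <= 0:
            raise ValueError("empty phoneme segment")
        for k in range(s_len):
            # Assumption: proportional position within the phoneme, clamped to its last frame.
            j = min(t_len - 1, (k * t_len) // s_len)
            pairs.append((s0 + k, t0 + j))
    return pairs


def fit_affine(src: List[Frame], tgt: List[Frame],
               pairs: List[Tuple[int, int]]) -> List[Tuple[float, float]]:
    """Fit y = a * x + b per feature dimension by least squares over the paired frames."""
    n = len(pairs)
    params = []
    for d in range(len(src[0])):
        xs = [src[i][d] for i, _ in pairs]
        ys = [tgt[j][d] for _, j in pairs]
        mx, my = sum(xs) / n, sum(ys) / n
        var = sum((x - mx) ** 2 for x in xs)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        a = cov / var if var > 0 else 1.0  # fall back to identity gain for constant input
        params.append((a, my - a * mx))
    return params


def convert(frames: List[Frame], params: List[Tuple[float, float]]) -> List[Frame]:
    """Transform each source frame into a target frame with the fitted map."""
    return [[a * x + b for x, (a, b) in zip(frame, params)] for frame in frames]
```

In use, `pair_frames` would run once per parallel training sentence, `fit_affine` once over all collected pairs, and `convert` on the frames of each new source utterance. A real system would also need feature extraction and waveform synthesis, which are outside this sketch.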

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US11/370,682 US20070213987A1 (en) 2006-03-08 2006-03-08 Codebook-less speech conversion method and system
US11/370,682 2006-03-08

Publications (2)

Publication Number Publication Date
WO2007103520A2 (en) 2007-09-13
WO2007103520A3 (en) 2008-03-27

Family

ID=38475569

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2007/005962 WO2007103520A2 (en) 2006-03-08 2007-03-07 Codebook-less speech conversion method and system

Country Status (2)

Country Link
US (1) US20070213987A1 (en)
WO (1) WO2007103520A2 (en)

Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7809145B2 (en) * 2006-05-04 2010-10-05 Sony Computer Entertainment Inc. Ultra small microphone array
US7783061B2 (en) 2003-08-27 2010-08-24 Sony Computer Entertainment Inc. Methods and apparatus for the targeted sound detection
US8073157B2 (en) * 2003-08-27 2011-12-06 Sony Computer Entertainment Inc. Methods and apparatus for targeted sound detection and characterization
US8947347B2 (en) 2003-08-27 2015-02-03 Sony Computer Entertainment Inc. Controlling actions in a video game unit
US8139793B2 (en) 2003-08-27 2012-03-20 Sony Computer Entertainment Inc. Methods and apparatus for capturing audio signals based on a visual image
US8160269B2 (en) 2003-08-27 2012-04-17 Sony Computer Entertainment Inc. Methods and apparatuses for adjusting a listening area for capturing sounds
US8233642B2 (en) 2003-08-27 2012-07-31 Sony Computer Entertainment Inc. Methods and apparatuses for capturing an audio signal based on a location of the signal
US7803050B2 (en) 2002-07-27 2010-09-28 Sony Computer Entertainment Inc. Tracking device with sound emitter for use in obtaining information for controlling game program execution
US9174119B2 (en) 2002-07-27 2015-11-03 Sony Computer Entertainment America, LLC Controller for providing inputs to control execution of a program when inputs are combined
US20080082320A1 (en) * 2006-09-29 2008-04-03 Nokia Corporation Apparatus, method and computer program product for advanced voice conversion
US20080120115A1 (en) * 2006-11-16 2008-05-22 Xiao Dong Mao Methods and apparatuses for dynamically adjusting an audio signal based on a parameter
US8131549B2 (en) * 2007-05-24 2012-03-06 Microsoft Corporation Personality-based device
DE102009013020A1 (en) * 2009-03-16 2010-09-23 Hayo Becks Apparatus and method for adapting sound images
US8340965B2 (en) * 2009-09-02 2012-12-25 Microsoft Corporation Rich context modeling for text-to-speech engines
CN102063899B (en) * 2010-10-27 2012-05-23 南京邮电大学 Method for voice conversion under unparallel text condition
US8594993B2 (en) 2011-04-04 2013-11-26 Microsoft Corporation Frame mapping approach for cross-lingual voice transformation
CN103280224B (en) * 2013-04-24 2015-09-16 东南大学 Based on the phonetics transfer method under the asymmetric corpus condition of adaptive algorithm
US9640185B2 (en) * 2013-12-12 2017-05-02 Motorola Solutions, Inc. Method and apparatus for enhancing the modulation index of speech sounds passed through a digital vocoder
US10127916B2 (en) * 2014-04-24 2018-11-13 Motorola Solutions, Inc. Method and apparatus for enhancing alveolar trill
US9659564B2 (en) * 2014-10-24 2017-05-23 Sestek Ses Ve Iletisim Bilgisayar Teknolojileri Sanayi Ticaret Anonim Sirketi Speaker verification based on acoustic behavioral characteristics of the speaker
US10176819B2 (en) * 2016-07-11 2019-01-08 The Chinese University Of Hong Kong Phonetic posteriorgrams for many-to-one voice conversion
WO2018090356A1 (en) * 2016-11-21 2018-05-24 Microsoft Technology Licensing, Llc Automatic dubbing method and apparatus
US11195507B2 (en) * 2018-10-04 2021-12-07 Rovi Guides, Inc. Translating between spoken languages with emotion in audio and video media streams
WO2020188101A1 (en) * 2019-03-20 2020-09-24 Piksel, Inc A method and system for content internationalization & localisation
US11238888B2 (en) * 2019-12-31 2022-02-01 Netflix, Inc. System and methods for automatically mixing audio for acoustic scenes
CN112750446A (en) * 2020-12-30 2021-05-04 标贝(北京)科技有限公司 Voice conversion method, device and system and storage medium
CN116798405B (en) * 2023-08-28 2023-10-24 世优(北京)科技有限公司 Speech synthesis method, device, storage medium and electronic equipment

Citations (3)

Publication number Priority date Publication date Assignee Title
US5230037A (en) * 1990-10-16 1993-07-20 International Business Machines Corporation Phonetic hidden markov model speech synthesizer
US5327521A (en) * 1992-03-02 1994-07-05 The Walt Disney Company Speech transformation system
US5642466A (en) * 1993-01-21 1997-06-24 Apple Computer, Inc. Intonation adjustment in text-to-speech systems

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
EP0970466B1 (en) * 1997-01-27 2004-09-22 Microsoft Corporation Voice conversion
US6336092B1 (en) * 1997-04-28 2002-01-01 Ivl Technologies Ltd Targeted vocal transformation
US6836761B1 (en) * 1999-10-21 2004-12-28 Yamaha Corporation Voice converter for assimilation by frame synthesis with temporal alignment
US6463412B1 (en) * 1999-12-16 2002-10-08 International Business Machines Corporation High performance voice transformation apparatus and method
FR2868587A1 (en) * 2004-03-31 2005-10-07 France Telecom METHOD AND SYSTEM FOR RAPID CONVERSION OF A VOICE SIGNAL

Also Published As

Publication number Publication date
WO2007103520A2 (en) 2007-09-13
US20070213987A1 (en) 2007-09-13

Similar Documents

Publication Publication Date Title
WO2007103520A3 (en) Codebook-less speech conversion method and system
WO2006053256A3 (en) Speech conversion system and method
WO2008038082A3 (en) Prosody conversion
EP3739477A4 (en) Speech translation method and system using multilingual text-to-speech synthesis model
WO2008142836A1 (en) Voice tone converting device and voice tone converting method
EP4318463A3 (en) Multi-modal input on an electronic device
WO2009006081A3 (en) Pronunciation correction of text-to-speech systems between different spoken languages
WO2011133766A3 (en) Methods and systems for training dictation-based speech-to-text systems using recorded samples
WO2007129156A3 (en) Soft alignment in gaussian mixture model based transformation
WO2006023631A3 (en) Document transcription system training
EP4016526A4 (en) Sound conversion system and training method for same
TW200601263A (en) Apparatus and method for synthesized audible response to an utterance in speaker-independent voice recognition
WO2015009586A3 (en) Performing an operation relative to tabular data based upon voice input
WO2008118195A3 (en) System and method for a cooperative conversational voice user interface
WO2012036424A3 (en) Method and apparatus for performing microphone beamforming
EP1291848A3 (en) Multilingual pronunciations for speech recognition
WO2006122161A3 (en) Comprehension instruction system and method
AU2003217013A1 (en) System for estimating parameters of a gaussian mixture model
WO2010041131A8 (en) Associating source information with phonetic indices
WO2006070373A3 (en) A system and a method for representing unrecognized words in speech to text conversions as syllables
WO2007140047A3 (en) Grammar adaptation through cooperative client and server based speech recognition
WO2005099414A8 (en) Comprehensive spoken language learning system
WO2007120418A3 (en) Electronic multilingual numeric and language learning tool
WO2009114499A3 (en) Methods and devices for language skill development
WO2006076280A3 (en) Method and system for assessing pronunciation difficulties of non-native speakers

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)
122 Ep: pct application non-entry in european phase

Ref document number: 07752646

Country of ref document: EP

Kind code of ref document: A2