US20160232893A1 - Operating method for voice function and electronic device supporting the same - Google Patents

Operating method for voice function and electronic device supporting the same

Info

Publication number
US20160232893A1
US20160232893A1 (application US15/017,957)
Authority
US
United States
Prior art keywords
information
function
voice
speech information
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US15/017,957
Inventor
Chakladar SUBHOJIT
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics Co Ltd filed Critical Samsung Electronics Co Ltd
Assigned to SAMSUNG ELECTRONICS CO., LTD. reassignment SAMSUNG ELECTRONICS CO., LTD. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: SUBHOJIT, CHAKLADAR
Publication of US20160232893A1
Assigned to SAMSUNG ELECTRONICS CO., LTD. reassignment SAMSUNG ELECTRONICS CO., LTD. CORRECTIVE ASSIGNMENT TO CORRECT THE ASSIGNEE'S COUNTRY PREVIOUSLY RECORDED AT REEL: 037686 FRAME: 0727. ASSIGNOR(S) HEREBY CONFIRMS THE ASSIGNMENT. Assignors: SUBHOJIT, CHAKLADAR
Priority claimed by US15/998,997 (US10733978B2)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L2015/025 Phonemes, fenemes or fenones being the recognition units
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L15/065 Adaptation
    • G10L15/07 Adaptation to the speaker
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/183 Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/187 Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • G10L15/28 Constructional details of speech recognition systems
    • G10L17/00 Speaker identification or verification
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G10L17/22 Interactive procedures; Man-machine interfaces

Definitions

  • the present disclosure relates to operation of a voice function in an electronic device.
  • An electronic device which includes a microphone or the like provides a function of collecting and recognizing a user's voice.
  • recent electronic devices provide a function of recognizing a user's voice and outputting information corresponding to a recognized voice.
  • a voice function providing method of a typical electronic device may provide a specific function regardless of a person who inputs a voice.
  • an aspect of the present disclosure is to provide a voice function operating method for supporting a voice function of an electronic device so that the voice function is operated in a user (i.e., speaker)-dependent manner, and an electronic device supporting the same.
  • Another aspect of the present disclosure is to provide a voice function operating method for selectively providing a voice function based on the type of an input audio signal, and an electronic device supporting the same.
  • an electronic device may include a memory for storing at least a portion of a plurality of pieces of speech information used for voice recognition, and a control module (or a processor) configured to generate voice recognition information based on at least a portion of the plurality of pieces of speech information, wherein the control module may be configured to select speaker speech information from at least a portion of the plurality of pieces of speech information based on mutual similarity, and may be configured to generate the voice recognition information to be registered as personalized voice information based on the speaker speech information.
  • a voice function operating method may include storing at least a portion of a plurality of pieces of speech information used for voice recognition, selecting speaker speech information from at least a portion of the plurality of pieces of speech information based on mutual similarity, and generating voice recognition information to be registered as personalized voice information based on the speaker speech information selected.
  • FIG. 1 is a diagram illustrating an example personalized voice function providing environment according to various example embodiments of the present disclosure.
  • FIG. 2 is a block diagram illustrating an example of an electronic device supporting a voice function according to various example embodiments of the present disclosure.
  • FIG. 3 is a block diagram illustrating an example of a control module according to various example embodiments of the present disclosure.
  • FIG. 4 is a diagram illustrating an example candidate group handling method related to speaker-dependent setting according to various example embodiments of the present disclosure.
  • FIG. 5 is a diagram illustrating an example personalized voice information update according to various example embodiments of the present disclosure.
  • FIG. 6 is a flowchart illustrating an example personalized voice method during operation of a voice function according to various example embodiments of the present disclosure.
  • FIG. 7 is a flowchart illustrating an example personalized voice information update method according to various example embodiments of the present disclosure.
  • FIG. 8 is a diagram illustrating an example of a screen interface related to execution of a personalized voice function according to various example embodiments of the present disclosure.
  • FIG. 9 is a diagram illustrating an example of a screen interface related to setting of personalized voice information according to various example embodiments of the present disclosure.
  • FIG. 10 is a block diagram illustrating an example of an electronic device according to various example embodiments of the present disclosure.
  • FIG. 11 is a block diagram illustrating another example of an electronic device according to various example embodiments of the present disclosure.
  • the term “A or B”, “at least one of A and/or B”, or “one or more of A and/or B” may include all possible combinations of items listed together.
  • for example, the term “A or B”, “at least one of A and B”, or “at least one of A or B” may indicate all the cases of (1) including at least one A, (2) including at least one B, and (3) including at least one A and at least one B.
  • terms such as “first” and “second” may modify various elements regardless of the order and/or priority thereof, but do not limit the elements.
  • “a first user device” and “a second user device” may indicate different user devices regardless of the order or priority.
  • a first element may be referred to as a second element and vice versa.
  • it will be understood that when a certain element (e.g., a first element) is referred to as being “coupled” or “connected” to another element (e.g., a second element), the certain element may be coupled to the other element directly or via another element (e.g., a third element).
  • in contrast, when a certain element (e.g., a first element) is referred to as being “directly coupled” or “directly connected” to another element (e.g., a second element), there may be no intervening element (e.g., a third element) between the element and the other element.
  • the term “configured (or set) to” may be interchangeably used with the term, for example, “suitable for”, “having the capacity to”, “designed to”, “adapted to”, “made to”, or “capable of”.
  • the term “configured (or set) to” may not necessarily have the meaning of “specifically designed to”.
  • the term “device configured to” may indicate that the device “may perform” together with other devices or components.
  • the term “processor configured (or set) to perform A, B, and C” may represent a dedicated processor (e.g., an embedded processor) for performing a corresponding operation, or processing circuitry or a general-purpose processor (e.g., a CPU or an application processor) for executing at least one software program stored in a memory device to perform a corresponding operation.
  • the term “user” used herein may refer to a person who uses an electronic device or may refer to a device (e.g., an artificial intelligence electronic device) that uses an electronic device.
  • FIG. 1 is a diagram illustrating an example personalized voice function providing environment according to various example embodiments of the present disclosure.
  • the personalized voice function providing environment may provide a first-state voice function module 10 s of an electronic device for receiving audio signals input by a plurality of speakers 10 a to 10 c in relation to a speaker-independent setting.
  • the first-state voice function module 10 s may include, for example, at least one of a hardware module comprising hardware circuitry, a firmware module comprising firmware, or a software module related to provision of a voice function prior to application of a personalized voice function.
  • At least one of the speakers 10 a to 10 c may input a voice (or speech information) using the first-state voice function module 10 s.
  • the first-state voice function module 10 s may perform a voice command function (e.g., a function of recognizing a collected voice, analyzing a voice command based on a result of recognition, and outputting information or performing an available function by an electronic device based on a result of analysis) based on a voice (or speech information) input by the speakers 10 a to 10 c.
  • the speakers 10 a to 10 c may, for example, input a voice (or a speech or speech information) using at least one microphone included in the first-state voice function module 10 s.
  • the first-state voice function module 10 s may collect candidate data (including, for example, speaker speech information or speech information of each speaker) on the speakers 10 a to 10 c without performing speaker identification in a state in which a personalized voice function (e.g., a function of restricting use of functions of an electronic device differentially specified for each speaker) is not applied.
  • a candidate data collecting operation may be automatically performed based on a specified condition. For example, the candidate data collecting operation may be automatically performed while a voice function is performed. Furthermore, the candidate data collecting operation may be automatically performed while a microphone activating operation is performed. According to various example embodiments of the present disclosure, the candidate data collecting operation may be performed for data obtained through successful voice recognition.
  • the first-state voice function module 10 s may collect first candidate data 11 a related to the first speaker 10 a. Furthermore, the first-state voice function module 10 s may collect second candidate data 11 b related to the second speaker 10 b and third candidate data 11 c related to the third speaker 10 c. The first-state voice function module 10 s may perform voice function personalization processing (or voice recognition function personalization processing) if at least a specified number of candidate data are collected or collection of candidate data is completed for a specified time.
  • the first-state voice function module 10 s may analyze a plurality of candidate data and may register, as personalized voice information, a speaker recognition model (including, for example, voice recognition information or voice recognition model information) including the first candidate data 11 a related to the first speaker 10 a. Accordingly, the first-state voice function module 10 s may be operated as (or changed into) a second-state voice function module 10 p.
  • the first-state voice function module 10 s may store collected candidate data locally (e.g., in a memory thereof).
  • the first-state voice function module 10 s may, for example, provide the collected candidate data to a specified server device. In the example where the collected candidate data are transmitted to the server device, recognition model training for candidate data may, for example, also be performed in the server device.
  • if speech information is input, the second-state voice function module 10 p may analyze the collected speech information and may compare an analysis result with the registered personalized voice information. If it is determined, as a result of the comparison, that the speech information corresponds to a speaker recognition model registered as the personalized voice information, the second-state voice function module 10 p may handle execution of a function corresponding to the analysis result of the input speech information.
  • if the speech information does not correspond to the registered speaker recognition model, the second-state voice function module 10 p may not perform a function corresponding to the speech information or may perform a limited function based on a specified policy.
  • the second-state voice function module 10 p may output a function execution unavailability message or a limited function execution message.
  • the personalized voice function providing environment may handle execution of a function of an electronic device in a speaker-dependent manner (e.g., only a voice (or speech information) of a specific speaker is handled as valid information, or another speaker's voice (or speech information) is restrictively handled) based on registration of the personalized voice information.
  • FIG. 2 is a block diagram illustrating an example of an electronic device supporting a voice function according to various example embodiments of the present disclosure.
  • an electronic device 100 may include, for example, a communication interface (e.g., including communication circuitry) 110 , a memory 130 , a microphone module (e.g., including a microphone or microphone circuitry) 140 , a display (e.g., including a display panel and/or display processing circuitry) 150 , and a control module (e.g., including a processor including processing circuitry) 160 .
  • the electronic device 100 may collect candidate data using the microphone module 140 and may operate the control module 160 , so as to process the candidate data, register personalized voice information (e.g., a specific speaker recognition model), and/or apply the personalized voice information. Based on this process, the electronic device 100 may handle a personalized voice function for supporting a speaker-dependent function.
  • the communication interface 110 may handle a communication function of the electronic device 100 .
  • the communication interface 110 may establish a communication channel to a server device or the like in relation to a call function, a video call function, or the like of the electronic device 100 .
  • the communication interface 110 may include at least one communication module or communication chip/circuitry for supporting various communication standards such as 2G, 3G, 4G, LTE, 5G, etc.
  • the communication interface 110 may include at least one antenna covering a single frequency band or a multi-frequency band.
  • the communication interface 110 may establish a short-range communication channel to another electronic device in relation to a data transfer function or a call function of the electronic device 100 .
  • the communication interface 110 may be operated in association with a voice function.
  • the communication interface 110 may establish a communication channel in relation to the voice function such as a call function or a voice-recognition-based message sending/receiving function.
  • the communication interface 110 may establish a communication channel to a server device for analyzing a voice (or speech information) and providing information based on a result of analysis.
  • the communication interface 110 may be restrictively operated in relation to application of a personalized voice function.
  • the communication interface 110 may be enabled based on a speech information input corresponding to a speaker recognition model registered as personalized voice information.
  • the communication interface 110 may establish a communication channel to a specified server device (e.g., a web server device for management of financial information, stock information, or specific information) in response to a speech information input from a specific recognized speaker.
  • the memory 130 may store various information related to operation of the electronic device 100 .
  • the memory 130 may store an operating system required for operating the electronic device 100 , at least one program related to support for a user function, etc.
  • the memory 130 may store a personalized voice program to support a personalized voice function.
  • the memory 130 may store voice data information 131 and personalized voice information 133 related to operation of the personalized voice program.
  • the voice data information 131 may include a voice signal (e.g., speech information) input from at least one speaker or an audio signal collected when the microphone module 140 is enabled.
  • pieces of speech information from which noise, or bands other than the human voice band, have been removed may be stored as candidate data of the voice data information 131 .
  • the voice data information 131 may include pieces of speech information, of which a speech interval has a length of at least a specified time, as a plurality of candidate data.
  • the voice data information 131 may include a specified number of pieces of speech information as candidate data or may include pieces of speech information collected for a specified time as candidate data.
  • a function of collecting the voice data information 131 may, for example, be automatically performed when the microphone module 140 is enabled in relation to execution of a voice function. Furthermore, this function may be automatically ended on completion of collecting the voice data information 131 . According to various example embodiments of the present disclosure, the function of collecting the voice data information 131 may be automatically performed if specified voice recognition is successful, and may be automatically ended immediately after the collection is completed or after elapse of a specified time.
  • the personalized voice information 133 may be related to candidate data selected by applying a specified algorithm or process to the voice data information 131 .
  • the personalized voice information 133 may be a speaker recognition model generated from candidate data related to a specific speaker (e.g., candidate data having a relatively large population in the voice data information 131 ) from among the plurality of candidate data included in the voice data information 131 .
  • the personalized voice information 133 may be candidate models obtained by modeling the candidate data related to the specific speaker.
  • the personalized voice information 133 may be any one of the candidate data of the specific speaker, or information obtained by combining audio features detected from each candidate data, or a speaker recognition model including the audio features.
  • the personalized voice information 133 may include at least one phonemic model (e.g., a signal or information obtained by dividing speech information by phoneme, such as h, ai, g, ae, l, ax, k, s, iy) constituting speech information obtained when a specific speaker speaks speech reference information (e.g., readable specified information such as characters or numbers, for example, ‘high galaxy’).
  • different phonemic models of various forms (e.g., phonemic signals or pieces of information with different pitches, tones, or timbres for the same phoneme, such as ‘ha’) may be grouped together with respect to the same reference phoneme (e.g., information obtained by dividing speech reference information by phoneme, for example, hi, ga, lax, sy, etc.).
  • for example, “h-a” or “h-ai” may be collected as a phonemic model corresponding to a reference phoneme “hi”.
  • the personalized voice information 133 may include at least one phonemic model included in speech information obtained by speaking specified speech reference information (e.g., at least one specified word, phrase, clause, or sentence), so that one reference phoneme may be associated with one or more phonemic models, one for each speaking situation.
  • the microphone module 140 may include at least one microphone. In the case where one microphone is disposed, the microphone module 140 may enable the microphone in response to control by the control module 160 , and may transfer a collected audio signal to the control module 160 through the enabled microphone. Alternatively, the microphone module 140 may remain in a turned on state and may collect an audio signal while the electronic device 100 is supplied with power or the control module 160 is operated, in response to control by the control module 160 . According to various example embodiments of the present disclosure, the microphone module 140 may include a plurality of microphones. The microphone module 140 may be automatically enabled, for example, when candidate data corresponding to the voice data information 131 are collected.
  • the electronic device 100 may collect speech information corresponding to candidate data by automatically enabling the microphone module 140 for a specified time or until a specified number of candidate data is satisfied in order to collect candidate data.
  • the electronic device 100 may determine whether it is required to collect candidate data so as to automatically collect speech information.
  • the display 150 may output various screens related to operation of the electronic device 100 .
  • the display 150 may output a lock screen, a menu screen, a home screen, a screen on which at least one icon is disposed, a screen to which a background image is output, a specific function execution screen, or the like.
  • the display 150 may output a screen related to execution of a voice function.
  • the display 150 may output a screen related to execution of a voice command function, a screen related to execution of a voice recording function, a screen related to execution of a voice call function, a screen related to execution of a voice recognition function, or the like in response to execution of a corresponding application.
  • the display 150 may output at least one piece of information (e.g., a text, an image, or the like) related to operation of a personalized voice function.
  • the display 150 may output at least one of an icon, a menu, an indicator, or a guide text related to setting of the personalized voice function.
  • the display 150 may output a message, a text, an indicator, or the like for notifying application of the personalized voice function.
  • the display 150 may output a personalized voice function setting screen in response to control by a user input.
  • the electronic device 100 may further include various information output units such as a speaker, a vibration module, a lamp, etc.
  • the information output units may output various information related to operation of the personalized voice function using audio, at least one specified vibration pattern, or at least one specified flickering pattern.
  • the control module 160 may be configured to perform signal flow control, signal processing control, and information processing in relation to operation of the electronic device 100 .
  • the control module 160 may be configured to control setting of the personalized voice function (e.g., setting for collecting the voice data information 131 for registering the personalized voice information 133 ).
  • the control module 160 may be configured to handle extraction and registration of the personalized voice information 133 on completion of collecting the voice data information 131 .
  • the control module 160 may be configured to handle application of the personalized voice function based on the registered personalized voice information 133 .
  • the control module 160 may be configured to allow a specified voice function to be applied in response to speech information input from a specific speaker, or to limit a voice function (e.g., allow access to only a part of the function or prevent the function from being executed) in response to speech information input from a non-specific speaker.
  • FIG. 3 is a block diagram illustrating an example of a control module according to various example embodiments of the present disclosure.
  • the control module 160 may include a microphone control module 161 , a voice data collecting module 163 , an information processing module 165 , and an information updating module 167 .
  • each of the foregoing modules may, for example, be embodied by a processor including processing circuitry configured to perform the operations of the various modules.
  • the microphone control module 161 may be configured to control enablement and audio signal collection of the microphone module 140 . For example, if the electronic device 100 is in a turned-on state, the microphone control module 161 may maintain a turned-on state (e.g., an always-turned-on state) of the microphone module 140 based on a setting. In the case where a plurality of microphones is included in the microphone module 140 , the microphone control module 161 may control operation of the microphones.
  • the microphone control module 161 may transfer the collected audio signal to the voice data collecting module 163 .
  • the microphone control module 161 may, for example, transfer the collected audio signal to the voice data collecting module 163 if the collected audio signal is a signal (or speech information) of a frequency band of a voice of a human being, or may treat (or ignore) the collected audio signal as a noise if, for example, the collected audio signal has a frequency outside the voice frequency band.
  • the microphone control module 161 may transfer the collected audio signal to the voice data collecting module 163 regardless of a frequency band of the collected audio signal.
  • the microphone control module 161 may transfer, to the voice data collecting module 163 , only data from which a voice has been successfully recognized.
  • the microphone control module 161 may be configured so that collection of candidate data related to setting of the personalized voice function is performed automatically when the microphone module 140 is enabled. For example, if the microphone module 140 is enabled in order to execute a voice call function, a voice command function, a voice recognition function, a voice recording function, or the like, the microphone control module 161 may determine whether the personalized voice information 133 is registered. If the personalized voice information 133 is not registered, the microphone control module 161 may automatically collect pieces of speech information to be used as the voice data information 131 and may transfer the speech information to the voice data collecting module 163 . If the personalized voice information 133 is registered, the microphone control module 161 may automatically terminate collection of the speech information to be used as the voice data information 131 , as sketched below.
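  • A minimal sketch of this gating logic, assuming a simple spectral-energy test for the voice band (the patent does not specify one); the class name, method names, threshold, and collector interface below are illustrative, not the patent's implementation:

```python
import numpy as np

VOICE_BAND_HZ = (80.0, 4000.0)  # rough human voice band; values are assumptions

def is_voice_band(audio: np.ndarray, sample_rate: int = 16000) -> bool:
    """Return True if most of the signal's spectral energy lies in the voice band."""
    spectrum = np.abs(np.fft.rfft(audio)) ** 2
    freqs = np.fft.rfftfreq(len(audio), d=1.0 / sample_rate)
    in_band = (freqs >= VOICE_BAND_HZ[0]) & (freqs <= VOICE_BAND_HZ[1])
    return spectrum[in_band].sum() / (spectrum.sum() + 1e-12) > 0.7

class MicrophoneControlModule:
    """Hypothetical stand-in for module 161: gates candidate collection."""

    def __init__(self, collector):
        self.collector = collector
        self.personalized_info = None  # set once a speaker model is registered

    def on_microphone_enabled(self) -> None:
        # Candidate collection runs only while no speaker model is registered.
        if self.personalized_info is None:
            self.collector.start()
        else:
            self.collector.stop()

    def on_audio(self, audio: np.ndarray, sample_rate: int = 16000) -> None:
        # Forward speech-band audio; treat out-of-band signals as noise.
        if is_voice_band(audio, sample_rate):
            self.collector.add_utterance(audio, sample_rate)
```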
  • the voice data collecting module 163 may, for example, analyze whether a collected audio signal was generated from human speech. Furthermore, the voice data collecting module 163 may collect pieces of speech information corresponding to a voice frequency band as preliminary candidate group information. In the example where the microphone control module 161 is configured to transfer only speech information, the speech information classifying operation of the voice data collecting module 163 may be skipped.
  • the voice data collecting module 163 may be configured to classify preliminary candidate data in the preliminary candidate group which satisfy a specified condition as candidate data of the voice data information 131 .
  • the voice data collecting module 163 may classify only preliminary candidate data of which lengths (e.g., speech time) are at least a specified length as the candidate data of the voice data information 131 .
  • the voice data collecting module 163 may, for example, classify only preliminary candidate data related to specified speech reference information as the candidate data.
  • the voice data collecting module 163 may specify the number of candidate data or a time in relation to collection of the voice data information 131 .
  • the voice data collecting module 163 may be configured to collect the voice data information 131 for a specified time after a specific event occurs (e.g., after the electronic device 100 is assigned specified personal information (e.g., a personal telephone number provided by a service provider) or after the electronic device 100 firstly accesses a specified base station).
  • the voice data collecting module 163 may be configured to collect the voice data information 131 for a specified time.
  • the voice data collecting module 163 may be configured to collect the voice data information 131 until a specified number of candidate data are collected after setting of the personalized voice function is started.
  • the number of candidate data may be changed based on a setting of a personalized voice function policy or may be changed by a user setting.
  • the voice data collecting module 163 may provide, to the information processing module 165 , the voice data information 131 including the specified number of candidate data or candidate data collected for a specified time.
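  • The collection policy above (minimum speech length, a specified count, or a specified time window) might be sketched as follows; the concrete limits are assumptions rather than values from the patent:

```python
import time

MIN_UTTERANCE_SEC = 1.0                 # "specified length" (assumed)
MAX_CANDIDATES = 20                     # "specified number" of candidate data (assumed)
COLLECTION_WINDOW_SEC = 7 * 24 * 3600   # "specified time", e.g. one week (assumed)

class VoiceDataCollector:
    """Hypothetical stand-in for module 163: qualifies and stores candidate data."""

    def __init__(self):
        self.candidates = []
        self.started_at = time.time()
        self.active = False

    def start(self) -> None:
        self.active = True

    def stop(self) -> None:
        self.active = False

    def done(self) -> bool:
        # Collection ends after enough candidates or after the time window.
        return (len(self.candidates) >= MAX_CANDIDATES
                or time.time() - self.started_at >= COLLECTION_WINDOW_SEC)

    def add_utterance(self, audio, sample_rate: int = 16000) -> None:
        # Only utterances of at least the specified length become candidates.
        if self.active and not self.done() and len(audio) / sample_rate >= MIN_UTTERANCE_SEC:
            self.candidates.append(audio)
```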
  • the information processing module 165 may be configured to select the personalized voice information 133 from the voice data information 131 .
  • the information processing module 165 may select arbitrary candidate data from the voice data information 131 and may perform voice feature (e.g., a unique voice feature of each speaker, such as a timbre) comparison between the selected candidate data and another candidate data.
  • the information processing module 165 may classify (e.g., by clustering) candidate data by performing the feature comparison. For example, an unsupervised learning method such as vector quantization may be used.
  • the information processing module 165 may select candidate data, the number of which is relatively large, from among classified candidate data.
  • the arbitrary candidate data may be selected from among, for example, initially collected candidate data, lastly collected candidate data, or candidate data collected in a specified time slot.
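  • As a sketch of this unsupervised selection step: cluster per-utterance feature vectors (feature extraction is not shown and is an assumption) and keep the largest cluster as the presumed main speaker; k-means is used here as a simple stand-in for vector quantization:

```python
import numpy as np
from sklearn.cluster import KMeans

def select_main_speaker(features: np.ndarray, n_clusters: int = 3) -> np.ndarray:
    """features: (n_utterances, dim) array of per-utterance feature vectors.

    Clusters candidate data by mutual similarity and returns the indices of
    the largest cluster, taken as the main speaker's utterances.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(features)
    labels, counts = np.unique(km.labels_, return_counts=True)
    largest = labels[np.argmax(counts)]
    return np.where(km.labels_ == largest)[0]
```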
  • the information processing module 165 may be configured to register selected candidate data as the personalized voice information 133 .
  • the information processing module 165 may provide a guide on whether to register the personalized voice information 133 , and may, for example, request user approval.
  • the information processing module 165 may provide a popup window providing a query on whether to register specified candidate data as the personalized voice information 133 , and may handle registration of the personalized voice information 133 based on a user confirmation.
  • the information processing module 165 may be configured to output, together with the candidate data, time information about when the candidate data were collected or voice recognition information for the candidate data, in order to differentiate the candidate data.
  • the information processing module 165 may be configured to perform speaker identification based on collected speech information and the registered personalized voice information 133 .
  • the information processing module 165 may be configured to differentiate a function to be performed based on a result of speaker identification. For example, in the case where speech information of a speaker registered in the personalized voice information 133 is collected, the information processing module 165 may perform a function to be performed in response to speech information recognition. Alternatively, in the case where speech information of a speaker not registered in the personalized voice information 133 is collected, the information processing module 165 may notify that information output or function execution corresponding to speech information is unable to be performed.
  • the information processing module 165 may be configured to perform multi-condition training while performing modeling based on data included in the voice data information 131 .
  • the information processing module 165 may handle various effects for the data included in the voice data information 131 .
  • the information processing module 165 may provide a specified sound effect to the data included in the voice data information 131 and may generate candidate data based on the sound effect, or may generate candidate data with which a specified noise is combined.
  • the information processing module 165 may extract a speaker model to be registered as the personalized voice information 133 , by applying multi-condition-trained candidate data (e.g., data to which a specified sound effect is added or data to which a noise is added) together with data included in other voice data information 131 .
  • the information processing module 165 may generate multi-condition training models in relation to candidate data included in a cluster having a relatively large number of candidate data after, for example, clustering candidate data included in the voice data information 131 . Furthermore, the information processing module 165 may be configured so that multi-condition training models generated based on candidate data included, for example, in a cluster of a specific speaker are used for determining a speaker recognition model.
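  • A sketch of generating such multi-condition training data, assuming simple additive noise at target signal-to-noise ratios (the patent speaks only of specified sound effects and noise generically):

```python
import numpy as np

def add_noise(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix noise into a candidate utterance at a target SNR."""
    noise = np.resize(noise, speech.shape)
    p_speech = np.mean(speech ** 2) + 1e-12
    p_noise = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + scale * noise

def multi_condition_variants(speech, noises, snrs_db=(20.0, 10.0, 5.0)):
    # One clean copy plus one noisy copy per noise type and SNR level.
    variants = [speech]
    for noise in noises:
        for snr in snrs_db:
            variants.append(add_noise(speech, noise, snr))
    return variants
```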
  • the information processing module 165 may use a universal background model (UBM) during a speaker modeling process for candidate data included in the voice data information 131 .
  • UBM information may include a statistical model generated based on features of speech information of various persons.
  • the UBM information may be generated based on non-speaker data during a process of calculating a speaker recognition model of a speaker specified in the voice data information 131 .
  • the non-speaker data may, for example, be differentiated from speaker data based on the above-mentioned clustering method.
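  • One common realization of a UBM, shown purely as an illustration (the patent does not commit to a model family), is a Gaussian mixture trained on frames pooled from many speakers, with the speaker decision taken as a likelihood ratio against the target-speaker model:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_ubm(pooled_frames: np.ndarray, n_components: int = 64) -> GaussianMixture:
    """pooled_frames: (n_frames, dim) features pooled from many (non-target) speakers."""
    return GaussianMixture(n_components=n_components,
                           covariance_type="diag",
                           random_state=0).fit(pooled_frames)

def log_likelihood_ratio(frames: np.ndarray,
                         speaker_gmm: GaussianMixture,
                         ubm: GaussianMixture) -> float:
    # score() returns the mean per-frame log-likelihood; a positive ratio
    # favors the target-speaker model over the background model.
    return speaker_gmm.score(frames) - ubm.score(frames)
```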
  • the information updating module 167 may be configured to handle modification, adaptation or enhancement of the personalized voice information 133 .
  • the information updating module 167 may request and receive, from the microphone control module 161 , an audio signal collected by the microphone module 140 , and may extract information to which the personalized voice information 133 is to be adapted.
  • the information updating module 167 may check whether the collected audio signal includes user's speech information (including at least one of a wakeup audio signal related to a voice function or a voice command audio signal).
  • the information updating module 167 may check whether phonemes corresponding to phonemic models included in the specified personalized voice information are included in the collected speech information.
  • the information updating module 167 may collect new phonemic samples corresponding to the phonemic models included in the personalized voice information 133 by performing voice recognition on the collected speech information, and may perform phonemic model training based on the collected phonemic samples. Furthermore, the information updating module 167 may perform enhancement (or adaption or the like) of the phonemic models of the personalized voice information 133 according to the phonemic model training.
  • the information updating module 167 may check an adaptation ratio (or an adaptation degree or an enhancement ratio) of the personalized voice information 133 adapted using the collected speech information. For example, the information updating module 167 may determine whether a frequency of information update of the personalized voice information 133 by newly collected speech information is equal to or higher than a specified value. If the newly collected speech information is already obtained speech information, additional update may not occur. The information updating module 167 may determine that the adaptation ratio is high if the update frequency is high (e.g., the number of pieces of speech information used for update from among a certain number of collected pieces of speech information is at least a specified value), or may determine that the adaptation ratio is low if the update frequency is low and may terminate adaptation of the personalized voice information 133 .
  • the information updating module 167 may automatically collect speech information when the microphone module 140 is enabled in relation to adaptation of the personalized voice information 133 . If a function of adapting the personalized voice information 133 is ended (e.g., the adaptation ratio is equal to or lower than a specified condition), the information updating module 167 may automatically end collection of speech information related to adaptation of the personalized voice information 133 .
  • the information updating module 167 may be configured so that specified information is output through the display 150 in relation to starting or automatic ending of adaptation-related speech information collection.
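  • The stopping rule described above (end adaptation once the update frequency falls below a specified value) could look like the following; the window size and ratio are assumptions:

```python
class AdaptationTracker:
    """Tracks how often newly collected utterances actually update the model."""

    def __init__(self, window: int = 50, min_ratio: float = 0.1):
        self.window = window        # utterances to observe before deciding (assumed)
        self.min_ratio = min_ratio  # "specified value" for the adaptation ratio (assumed)
        self.seen = 0
        self.used = 0

    def record(self, caused_update: bool) -> None:
        self.seen += 1
        self.used += int(caused_update)

    def should_stop(self) -> bool:
        # Keep adapting until enough samples are observed; stop once few of
        # them contribute new information (i.e., the adaptation ratio is low).
        if self.seen < self.window:
            return False
        return self.used / self.seen < self.min_ratio
```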
  • FIG. 4 is a diagram illustrating an example candidate group handling method related to speaker-dependent setting according to various example embodiments of the present disclosure.
  • the electronic device 100 may collect a specified number of pieces of the voice data information 131 or may collect the voice data information 131 for a specified time.
  • the collected voice data information 131 may include, for example, pieces of speech information 401 a to 401 c corresponding to candidate data spoken by three speakers. If collection of the pieces of speech information 401 a to 401 c is completed, the electronic device 100 may classify the pieces of speech information 401 a to 401 c.
  • the electronic device 100 may select any one arbitrary piece of speech information 401 from among the collected pieces of speech information 401 a to 401 c based on a specified condition. If the arbitrary speech information 401 is selected, the electronic device 100 may convert the arbitrary speech information 401 into a first temporary model 460 a. If the first temporary model 460 a is generated, the electronic device 100 may compare the first temporary model 460 a with the pieces of speech information 401 a to 401 c, and may assign a score to each of the pieces of speech information 401 a to 401 c.
  • the electronic device 100 may assign a low score to speech information similar to the first temporary model 460 a, and may assign a high score to speech information having no similarity with the first temporary model 460 a.
  • the electronic device 100 may sort the pieces of speech information 401 a to 401 c in order of score.
  • the electronic device 100 may cluster the pieces of speech information 401 a to 401 c in order of score as illustrated in the center of FIG. 4 .
  • three pieces of data from among the first speech information 401 a spoken by a first speaker, together with one piece of data from among the second speech information 401 b spoken by a second speaker, may be clustered as one group.
  • one piece of the first speech information 401 a spoken by the first speaker, the second speech information 401 b , and the third speech information 401 c may be clustered as separate groups respectively.
  • the electronic device 100 may detect a second temporary model 460 b using pieces of information 403 clustered with pieces of speech information having low scores. Furthermore, the electronic device 100 may compare the pieces of speech information 401 a to 401 c with the second temporary model 460 b generated based on the clustered pieces of information 403 . Accordingly, as illustrated in FIG. 4 , the first speech information 401 a obtains the lowest scores (or scores equal to or lower than a specified threshold), and the second speech information 401 b and the third speech information 401 c obtain relatively high scores (or scores equal to or higher than the specified threshold).
  • the electronic device 100 may re-perform clustering based on the scores, thereby obtaining a cluster including pieces of the first speech information 401 a, a cluster including pieces of the second speech information 401 b, and a cluster including the third speech information 401 c, as illustrated in FIG. 4 . Based on the above result, the electronic device 100 may register the cluster including the pieces of the first speech information 401 a as the personalized voice information 133 .
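  • Read as an algorithm, the FIG. 4 walkthrough might be sketched like this: build a temporary model from an arbitrary utterance, score every utterance against it (a low score meaning similar), keep the low-scoring cluster, and rebuild the model from it. The embedding representation and cosine-based scoring are assumptions:

```python
import numpy as np

def dissimilarity(model: np.ndarray, utterance: np.ndarray) -> float:
    # Low score = similar to the temporary model, matching the text above.
    cos = np.dot(model, utterance) / (
        np.linalg.norm(model) * np.linalg.norm(utterance) + 1e-12)
    return 1.0 - cos

def refine_speaker_cluster(embeddings: np.ndarray,
                           rounds: int = 2,
                           keep_ratio: float = 0.5) -> np.ndarray:
    """embeddings: (n_utterances, dim). Returns indices attributed to the main speaker."""
    model = embeddings[0]                       # arbitrary initial utterance
    kept = np.arange(len(embeddings))
    for _ in range(rounds):
        scores = np.array([dissimilarity(model, e) for e in embeddings[kept]])
        order = np.argsort(scores)              # most similar (lowest) first
        kept = kept[order[: max(1, int(len(order) * keep_ratio))]]
        model = embeddings[kept].mean(axis=0)   # second temporary model, and so on
    return kept
```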
  • FIG. 5 is a diagram illustrating an example personalized voice information update according to various example embodiments of the present disclosure.
  • the personalized voice information 133 of a specific speaker may be audio information corresponding to speech reference information “Hi Galaxy”.
  • the personalized voice information 133 may include phonemic models for each of “h-ai-g-ae-l-ax-k-s-iy” as illustrated in FIG. 5 .
  • the personalized voice information 133 may include, for example, a “ha” registration phonemic model 501 , as a phonemic model.
  • the personalized voice information 133 may include a registration frequency model 510 related to the corresponding registration phonemic model 501 when the speaker speaks “Hi Galaxy”.
  • the electronic device 100 may enable the microphone module 140 based on a specified condition. As illustrated in FIG. 5 , the microphone module 140 may collect audio information obtained when a specific speaker speaks speech reference information such as “How's the weather?”. In this example, the electronic device 100 may extract phonemic models “h-aw-s-th-ax-w-eh-th-er” for the speech reference information. The electronic device 100 may collect a new phonemic model 503 of the same “ha” from among the extracted phonemic models. Furthermore, the electronic device 100 may collect a new frequency model 530 corresponding to the new phonemic model 503 .
  • the electronic device 100 may store the new phonemic model 503 and the new frequency model 530 in association with the registration phonemic model 501 and the registration frequency model 510 , or may integrate and store the foregoing models and frequencies as one phonemic model group.
  • the electronic device 100 may extract a phonemic model and a frequency model from speech information spoken by a specific speaker so as to extend a model group of the registered personalized voice information 133 . Based on this extended model group, the electronic device 100 may more accurately recognize specified speech reference information registered as the personalized voice information 133 even if a speaker speaks the speech reference information in various situations.
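  • A sketch of the model-group bookkeeping described for FIG. 5, with string stand-ins for the phonemic and frequency models (the actual model representations are not specified here):

```python
from collections import defaultdict

class PersonalizedVoiceInfo:
    """Groups phonemic/frequency model variants per reference phoneme."""

    def __init__(self):
        self.groups = defaultdict(list)  # reference phoneme -> list of variants

    def add_variant(self, reference_phoneme, phonemic_model, frequency_model):
        self.groups[reference_phoneme].append((phonemic_model, frequency_model))

info = PersonalizedVoiceInfo()
# Registration from "Hi Galaxy" contributes the initial 'ha' variant ...
info.add_variant("ha", "registration phonemic model 501", "registration frequency model 510")
# ... and later speech such as "How's the weather?" extends the same group.
info.add_variant("ha", "new phonemic model 503", "new frequency model 530")
```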
  • an electronic device may include a memory for storing at least a portion of a plurality of pieces of speech information used for voice recognition, and a control module for generating voice recognition information based on at least a portion of the plurality of pieces of speech information, wherein the control module may select speaker speech information from at least a portion of the plurality of pieces of speech information based on mutual similarity, and may generate the voice recognition information to be registered as personalized voice information based on the speaker speech information.
  • the control module may be configured so that a message for applying the voice recognition information to the voice recognition is output.
  • the control module may be configured so that the pieces of speech information are collected for a specified time or until a specified number of the pieces of speech information is satisfied.
  • the control module may be configured to generate multi-condition training models of the plurality of pieces of speech information, and may use the multi-condition training models to determine the voice recognition information to be registered as the personalized voice information.
  • the control module may be configured to generate multi-condition training models of pieces of the speaker speech information, and may use the multi-condition training models to determine the voice recognition information to be registered as the personalized voice information.
  • the control module may be configured so that other speech information input from a specific speaker corresponding to the personalized voice information is collected and a model of the personalized voice information is adapted.
  • the control module may be configured so that a phonemic sample corresponding to a registered phonemic model included in the personalized voice information is extracted from the speech information input from the specific speaker and is used to adapt the registered phonemic model.
  • the control module may be configured so that, if new speech information that does not correspond to the personalized voice information is input, a message of unavailability of function execution based on the new speech information is output, or may selectively control the function execution based on the type of a function requested by the new speech information.
  • the control module may be configured so that the function is not performed if the function is a specified secure function, or the function is performed if the function is a non-secure function that is not specified.
  • the control module may be configured so that a setting screen is output for setting at least one function item to be executed based on a voice function in response to speech information input from a speaker specified based on the personalized voice information.
  • an electronic device may include a memory for storing voice data information including pieces of speech information as candidate data, and a control module configured to select one piece of speaker-related information from the candidate data, wherein the control module may be configured so that the candidate data are clustered based on mutual similarity, and personalized voice information is registered based on the cluster containing the relatively largest number of similar candidate data, to be used to restrict execution of a function based on whether specified speech information is input.
  • FIG. 6 is a flowchart illustrating an example method of personalized voice during operation of a voice function according to various example embodiments of the present disclosure.
  • if an event occurs, the control module 160 of the electronic device 100 may be configured to determine whether the event is related to setting of a personalized voice function. For example, the control module 160 may be configured to determine whether the event is for executing a specified function for personalized voice, or is related to automatic execution of a personalized voice function, or is for executing a specified function such as a voice recognition function.
  • if the event is not related to the personalized voice function, the control module 160 may be configured to control execution of a function based on the type of the event that has occurred, in operation 603 .
  • the control module 160 may check the type of the event, and may handle playback of a music file, transfer of a specified file, execution of a call function, or execution of a web access function based on the type of the event.
  • the control module 160 may collect candidate data as the voice data information 131 in operation 605 . In relation to this operation, the control module 160 may enable the microphone 140 if the electronic device 100 is in a turned-on state or at a specified time. The control module 160 may collect a specified number of candidate data at a specified period, or in real time, or when an audio signal having a specified intensity or higher occurs. According to an example embodiment of the present disclosure, the control module 160 may be configured to perform a candidate group collecting operation until the number of candidate data becomes a specified number.
  • the control module 160 may be configured to automatically enable the microphone module 140 for a specified time (e.g., one hour, one day, one week, one month, or the like) after the electronic device 100 is purchased, so as to collect candidate data.
  • the control module 160 may be configured to collect candidate data until a specified number of candidate data are collected, or for a specified time, when a voice function (e.g., a call function, a voice recognition function, a recording function, a voice command function, or the like) is operated.
  • the control module 160 may be configured to process the voice data information 131 and may extract the personalized voice information 133 .
  • the control module 160 may be configured to extract clusters including candidate data spoken by the same speaker by comparing collected pieces of the voice data information 131 with a temporary model and clustering the collected pieces accordingly.
  • the control module 160 may be configured to compare data of the extracted clusters so as to extract candidate data of a cluster having a largest number of data and register the extracted candidate data as the personalized voice information 133 .
  • the control module 160 may be configured to handle application of personalized voice information. If the personalized voice information 133 is registered, the control module 160 may be configured to compare speaker speech information input thereafter with data of the personalized voice information 133 to check similarity therebetween. Furthermore, if the similarity satisfies a specified condition (e.g., a similarity degree is equal to or higher than a specified value), the control module 160 may recognize the input speech information as speech information of a specific speaker. If it is determined that the input speech information is the speech information of the specific speaker, the control module 160 may be configured to control a voice function for the speech information. For example, the control module 160 may perform voice recognition on the speech information, and may control execution of a specified function based on a voice recognition result. Alternatively, the control module 160 may support at least one of retrieval and output of internal information of the electronic device 100 with respect to the voice recognition result or retrieval and output of information using an external server device in relation to the voice recognition result.
  • the control module 160 may be configured to output a guide text for notifying that a speaker of the input speech information is not the specific speaker, or may support execution of a specified function according to a user's setting or a set policy. For example, the control module 160 may perform retrieval and output of information related to the result of voice recognition from the speech information using an external server device. Alternatively, in the case where the speaker of the input speech information is not the specific speaker, the control module 160 may be configured to check the type of information or the type of a function to be performed by the speech information based on the user's setting or policy, and may restrictively or selectively perform function execution or information output.
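  • The runtime decision in the operations above could be sketched as follows; the similarity threshold, the secure-function set, and the helper names are assumptions standing in for the user's setting or set policy:

```python
SIMILARITY_THRESHOLD = 0.7                # "specified value" (assumed)
SECURE_FUNCTIONS = {"payment", "unlock"}  # illustrative policy only

def execute(command: str) -> str:
    return f"executing: {command}"

def handle_speech(similarity: float, command: str) -> str:
    """similarity: score between input speech and the registered speaker model."""
    if similarity >= SIMILARITY_THRESHOLD:
        return execute(command)           # specific speaker: full voice function
    if command in SECURE_FUNCTIONS:
        return "function execution unavailable for this speaker"
    return execute(command)               # non-secure functions remain available
```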
  • FIG. 7 is a flowchart illustrating an example personalized voice information update method according to various example embodiments of the present disclosure.
  • the control module 160 may be configured to determine whether a personalized voice function is currently executed or an event that has occurred is related to execution of the personalized voice function. If the personalized voice function is not currently executed or there is no occurrence of the related event, the control module 160 may support execution of a specified function or control of a specified state in operation 703 . For example, the control module 160 may support a camera function or a music playback function according to the type of the event. Alternatively, the control module 160 may maintain a sleep mode.
  • control module 160 may be configured to collect adaptation (or enhancement) information in operation 705 .
  • control module 160 may be configured to enable the microphone module 140 and may collect speech information having a specified length or longer or speech information corresponding to specified speech reference information.
  • control module 160 may be configured to perform personalized voice information adaptation.
  • the control module 160 may be configured to collect phonemic models from various information spoken by a specific speaker, and may store or integrate the collected models in association with phonemic models having the same reference phonemes as those of phonemic models registered as the personalized voice information 133 .
  • the control module 160 may be configured to collect only speech information corresponding to the specified speech reference information, and may manage phonemic models corresponding to the same reference phonemes in the collected speech information by integrating the phonemic models into one model group.
  • the control module 160 may determine whether an adaptation ratio (or an adaptation degree or an enhancement ratio) satisfies a specified condition. For example, the control module 160 may be configured to check the degree of similarity between the phonemic models in the collected speech information and the phonemic models being managed, along with an information update ratio based on that degree of similarity, and may specify the adaptation ratio based on the update ratio or update frequency. If the adaptation ratio does not satisfy the specified condition, the process may return to operation 701 so that the control module 160 may re-perform operation 701 and the following operations. If the adaptation ratio satisfies the specified condition, the control module 160 may end the personalized voice information adaptation function.
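  • The adaptation loop of FIG. 7 (collect speech, integrate it into the model, re-check the adaptation ratio) could be sketched as follows; the convergence metric and the target value are invented for illustration and are not defined by the disclosure.

```python
ADAPTATION_TARGET = 0.95  # hypothetical "specified condition"

def adaptation_ratio(update_count: int, sample_count: int) -> float:
    """Toy adaptation metric: fraction of recent samples that no longer
    change the model (i.e., samples the model has already absorbed)."""
    if sample_count == 0:
        return 0.0
    return 1.0 - (update_count / sample_count)

def adapt_until_converged(collect_speech, update_model):
    """Repeat collect/adapt until the adaptation ratio satisfies the
    specified condition, mirroring the loop of FIG. 7 (a sketch only;
    a real loop would also bound iterations)."""
    updates, samples = 0, 0
    while True:
        speech = collect_speech()        # adaptation info collection
        changed = update_model(speech)   # phonemic model integration
        samples += 1
        updates += int(changed)
        if adaptation_ratio(updates, samples) >= ADAPTATION_TARGET:
            break                        # end the adaptation function
```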
  • FIG. 8 is a diagram illustrating an example of a screen interface related to execution of a personalized voice function according to various example embodiments of the present disclosure.
  • the control module 160 of the electronic device 100 may be configured to output, to the display 150 , a screen corresponding to activation of a voice function (e.g., a voice command function) as illustrated in a screen 801 .
  • the control module 160 may output a guide message 811 for providing a notification that the personalization function is being set.
  • the guide message 811 may include at least one of a text or an image for notifying that candidate group information is being collected in relation to setting of the personalized voice function. Output of the guide message 811 may be skipped based on a setting or a user input.
  • the control module 160 may output, to a specified area (e.g., an indicator area), a first indicator 810 for notifying that the personalization function is being set.
  • the control module 160 may be configured to determine whether a collected audio signal is speech information corresponding to a voice by checking a frequency band of the audio signal. If the audio signal is the speech information, the control module 160 may collect it as the voice data information 131 . Alternatively, even if the audio signal is the speech information, the control module 160 may determine whether a specified condition (e.g., speech information having at least a certain length or speech information corresponding to specified speech reference information) is satisfied. The control module 160 may be configured to collect pieces of the speech information satisfying the specified condition as the voice data information 131 .
  • the control module 160 may collect an audio signal as the voice data information 131 or may collect an audio signal of which a signal existence state is maintained for at least a certain length as the voice data information 131 . Furthermore, if the voice data information 131 is collected by as much as a specified amount or for a specified time, the control module 160 may evaluate the collected voice data information 131 with respect to division of speech information or correspondence to speech reference information.
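  • A crude version of the frequency-band and length checks described above might look like the following; the voice-band limits, energy ratio, and minimum duration are hypothetical tuning values, not figures taken from the patent.

```python
import numpy as np

VOICE_BAND_HZ = (80.0, 4000.0)   # hypothetical human-voice band
MIN_BAND_RATIO = 0.6             # hypothetical share of energy in band
MIN_DURATION_S = 1.0             # hypothetical "certain length"

def looks_like_speech(samples: np.ndarray, rate: int) -> bool:
    """Crude band-energy test: treat the signal as speech information
    when it is long enough and most of its spectral energy lies in the
    voice frequency band."""
    if len(samples) / rate < MIN_DURATION_S:
        return False
    spectrum = np.abs(np.fft.rfft(samples)) ** 2
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)
    band = (freqs >= VOICE_BAND_HZ[0]) & (freqs <= VOICE_BAND_HZ[1])
    total = spectrum.sum()
    return total > 0 and spectrum[band].sum() / total >= MIN_BAND_RATIO
```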
  • the control module 160 may output a guide message 831 for notifying that the personalization function is being applied, as illustrated in a screen 803 .
  • the guide message 831 may include at least one of a text or an image indicating that the personalized voice function is being applied. Output of the guide message 831 may be skipped based on a setting or a user control input. Alternatively, the control module 160 may output, to a specified area (e.g., an indicator area), a second indicator 830 for notifying that the personalized voice function is being applied.
  • the control module 160 may be configured to perform training for voice modeling after sufficient voice samples are obtained, for example, after samples are collected for a specified time or a specified number or amount of samples are collected. If, for example, it is determined that sufficient speaker recognition performance is obtained because the training result reaches a specified level (e.g., equal to or larger than a specified sample number or specified reliability), the control module 160 may provide, to a user, a recommendation or selection message inducing the user to use a personalized voice recognition function. In this operation, the control module 160 may request the user's approval (e.g., confirmation according to a popup message output) for updating a model.
  • the control module 160 may analyze the input audio signal. Based on a result of audio signal analysis, the control module 160 may support function execution or restrictive function execution. For example, if a first voice command 820 is collected, the control module 160 may analyze the first voice command 820 and may classify it as a request for non-secure function execution. According to an example embodiment of the present disclosure, in the case where the analyzed first voice command 820 includes a non-specified word (e.g., weather, news, bus information, etc.), the control module 160 may classify the first voice command 820 as a request for non-secure function execution.
  • in the case where the analyzed first voice command 820 includes a specified word (e.g., cost, card, mail, message, call history, etc.), the control module 160 may classify the first voice command 820 as a request for secure function execution. Alternatively, the control module 160 may determine whether the type of an application to be executed by the first voice command 820 corresponds to a secure function or a non-secure function. In relation to this operation, the electronic device 100 may include classification information on secure functions or non-secure functions for each application type.
  • the control module 160 may collect and output information for the first voice command 820 . For example, as illustrated in a screen 805 , the control module 160 may output first voice recognition information 851 corresponding to the first voice command 820 , and may output first execution information 853 as a result of performing a function or retrieval corresponding to the first voice recognition information 851 .
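  • The secure/non-secure classification by specified words could be approximated as below; the keyword lists merely echo the examples given above (weather, news, bus information versus cost, card, mail, message, call history), and the function name is a hypothetical stand-in.

```python
# Hypothetical keyword lists; the patent only gives examples of each class.
SECURE_KEYWORDS = {"cost", "card", "mail", "message", "call history"}
NON_SECURE_KEYWORDS = {"weather", "news", "bus information"}

def classify_command(recognized_text: str) -> str:
    """Classify a recognized voice command as a secure or non-secure request."""
    text = recognized_text.lower()
    if any(keyword in text for keyword in SECURE_KEYWORDS):
        return "secure"
    return "non-secure"
```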
  • the control module 160 may perform speaker analysis (e.g., comparison with the personalized voice information 133 ) on the second voice command 840 , and may process the second voice command 840 only if analyzed speaker information indicates a registered speaker. For example, if it is determined that a speaker indicated as a result of analysis is not a registered speaker, the control module 160 may output a message related to unavailability of processing the second voice command 840 .
  • the control module 160 may evaluate the collected second voice command 840 , and may determine whether the second voice command 840 is related to a secure function or a function specified as a speaker-dependent function. If the second voice command 840 is related to a non-secure function or a function not specified as a speaker-dependent function, the control module 160 may handle execution of a function based on the second voice command 840 without additionally checking the personalized voice information 133 . Alternatively, if the second voice command 840 is related to a secure function or a speaker-dependent function, the control module 160 may identify a speaker of the second voice command 840 using the personalized voice information 133 .
  • if the second voice command 840 is speech information input from a specific speaker, the control module 160 may execute a function corresponding to the second voice command 840. If the second voice command 840 is not speech information input from a specific speaker, the control module 160 may output, in response to the second voice command 840, a restrictive message 873 indicating failure of user identification or unavailability of function execution. For example, the control module 160 may selectively output second voice recognition information 871 for the second voice command 840.
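  • Putting the command classification and the speaker check together, the gating logic for the second voice command 840 might be sketched like this; the flags and callbacks are illustrative assumptions rather than the patent's implementation.

```python
def gate_voice_command(command_text: str,
                       is_secure_request: bool,
                       speaker_is_registered: bool,
                       run, refuse) -> None:
    """Non-secure, non-speaker-dependent commands run without a
    personalized check; secure ones require the registered speaker."""
    if not is_secure_request:
        run(command_text)        # e.g., the first voice command 820
    elif speaker_is_registered:
        run(command_text)        # secure, but the verified specific speaker
    else:
        refuse(command_text)     # restrictive message 873
```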
  • FIG. 9 is a diagram illustrating an example of a screen interface related to setting of personalized voice information according to various example embodiments of the present disclosure.
  • the control module 160 of the electronic device 100 may output, to the display 150 , a setting screen as illustrated in a screen 901 .
  • the setting screen may include items related to voice function setting, such as an external server use item, a personalization function operation item, and a voice output item.
  • a virtual reset button 911 may be assigned to the personalization function operation item in relation to personalization function setting or application.
  • the control module 160 may support resetting of the voice data information 131 or the personalized voice information 133 obtained in relation to personalization function setting or application. In relation to this operation, the control module 160 may output, to the display 150 , a popup window 931 related to initialization as illustrated in a screen 903 .
  • the popup window 931 may include, for example, a message for providing a guide on initialization and an authentication information input area for user authentication.
  • the control module 160 may output a menu screen related to personalization function operation as illustrated in a screen 905 .
  • the menu screen may include, for example, items for selecting at least one application to which a personalized voice function is to be applied.
  • the menu screen may include an entire function item 951 , a password-set function item 953 , and a user customized item 955 .
  • the entire function item 951 may be a restrictive item for allowing only a specific speaker to use, through a voice function, all functions supported by applications installed in the electronic device 100 .
  • the electronic device 100 may operate a voice function based on speech information of various users without specifying a speaker.
  • the password-set function item 953 may be a restrictive item for allowing function items related to a secure function to be used based on a voice function and speech information of a specific speaker. According to an example embodiment of the present disclosure, when the password-set function item 953 is selected, the electronic device 100 may provide, from among provided applications, items of functions that require password authentication when operated according to user designation, or items of functions that require password authentication on an application operating schedule. A specific function may be excluded from the password-set function item 953 when a password set in an application is released.
  • the user customized item 955 may enable a user to specify an application item to be used based on a voice function and speech information of a specific speaker. If the user customized item 955 is selected, the electronic device 100 may output a list of applications supported by the electronic device 100 . Here, the electronic device 100 may automatically remove the password-set function item 953 from a list related to the user customized item 955 to display the list.
  • a voice function operating method may include storing at least a portion of a plurality of pieces of speech information used for voice recognition, selecting speaker speech information from at least a portion of the plurality of pieces of speech information based on mutual similarity, and generating voice recognition information to be registered as personalized voice information based on the speaker speech information selected.
  • the method may further include at least one of collecting the speech information for a specified time or collecting the speech information until a specified number of candidate data is collected.
  • the method may further include outputting a message for applying the voice recognition information to the voice recognition.
  • the method may further include generating multi-condition training models of the plurality of pieces of speech information, and applying the multi-condition training models to determine the voice recognition information to be registered as the personalized voice information.
  • the generating may include generating multi-condition training models of pieces of the speaker speech information, and applying the multi-condition training models to determine the voice recognition information to be registered as the personalized voice information.
  • the method may further include collecting other speech information input from a specific speaker corresponding to the personalized voice information, and adapting a model of the personalized voice information using the other speech information of the specific speaker.
  • the adapting may include extracting a phonemic sample corresponding to a registered phonemic model included in the personalized voice information from the speech information input from the specific speaker to use the phonemic sample in adapting the registered phonemic model.
  • the method may further include outputting, if newly input speech information is not speech of the specific speaker corresponding to the personalized voice information, a message indicating unavailability of execution of a function according to the new speech information, and selectively executing the function according to the type of the function requested by the new speech information.
  • the executing the function may include not performing the function if the function is a specified secure function and performing the function if the function is a non-secure function and is not specified.
  • the method may further include outputting a setting screen for setting at least one function item to be executed based on a voice function in response to a speech information input from a speaker specified based on the personalized voice information.
  • a voice function operating method may include collecting pieces of speech information as candidate data, clustering the candidate data based on mutual similarity, and registering, based on the candidate data of the cluster having a relatively large number of data with the same similarity, specified personalized voice information to be used to restrict execution of a function based on whether specified speech information is input.
  • FIG. 10 is a block diagram illustrating an example of an electronic device according to various example embodiments of the present disclosure.
  • the electronic device 100 may include a control module (e.g., including a processor including processing circuitry) 1060 and a microphone module (e.g., including at least one microphone) 1040 .
  • the microphone module 1040 may include, for example, first to Nth microphones 40_1 to 40_N.
  • the first to Nth microphones 40_1 to 40_N may be connected to, for example, the control module 1060.
  • the first to Nth microphones 40_1 to 40_N may be arranged at one side of the electronic device 100 so as to be spaced apart from each other by a certain distance.
  • the control module 1060 may control at least one of the microphones included in the microphone module 1040.
  • the control module 1060 may enable the first microphone 40_1 and may analyze an audio signal collected by the first microphone 40_1.
  • the control module 1060 may use audio signals collected through the first microphone 40_1 as the voice data information 131.
  • the control module 1060 may also collect pieces of speech information corresponding to the voice data information 131 using the first to Nth microphones 40_1 to 40_N.
  • the control module 1060 may use the first microphone 40_1 alone to collect the voice data information 131, and may use the first to Nth microphones 40_1 to 40_N to adapt (or enhance) the personalized voice information 133.
  • the electronic device 100 may enable the first microphone 40_1 and may check whether speech information corresponding to specified speech reference information (e.g., "hi galaxy") is collected.
  • the electronic device 100 may use, for adapting the personalized voice information 133 , additional speech information collected in a state in which the other microphones are enabled after the speech information corresponding to the speech reference information is collected.
  • the electronic device 100 may support execution of a voice function according to the speech information collected by the microphones 40_1 to 40_N.
  • the control module 1060 may support a voice function using the first microphone 40_1 alone. Furthermore, in a state in which the personalized voice function is applied, the control module 1060 may detect speech information corresponding to the speech reference information using the first microphone 40_1, and may collect additional speech information using the microphones 40_1 to 40_N.
  • the control module 1060 may collect speech information and may perform analysis on whether the collected speech information corresponds to the speech reference information using the first microphone 40_1 alone. In the state in which the personalized voice function is applied, the control module 1060 may detect speech information corresponding to the speech reference information using a plurality of microphones (e.g., the first and second microphones 40_1 and 40_2). Furthermore, in the state in which the personalized voice function is applied, the control module 1060 may enable the first to Nth microphones 40_1 to 40_N to control collection of additional speech information, if speech information corresponding to the speech reference information is collected.
  • the electronic device 100 may control operation of the microphones 40_1 to 40_N in consideration of efficient use of power or in order to collect clearer speech information.
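  • The staged microphone strategy of FIG. 10 (one always-on microphone for detection, all microphones for higher-quality collection) could be modeled as a small state holder like the one below; the class and method names are hypothetical.

```python
class MicrophoneArray:
    """Toy controller mirroring the staged strategy of FIG. 10: keep only
    the first microphone enabled for keyword spotting, then enable the
    rest for additional speech collection once the keyword is heard."""

    def __init__(self, n_mics: int):
        self.enabled = [False] * n_mics
        self.enabled[0] = True            # first microphone 40_1 only

    def on_keyword_detected(self):
        self.enabled = [True] * len(self.enabled)   # 40_1 .. 40_N

    def on_collection_finished(self):
        self.enabled = [False] * len(self.enabled)
        self.enabled[0] = True            # back to low-power listening
```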
  • FIG. 11 is a block diagram illustrating another example of an electronic device according to various example embodiments of the present disclosure.
  • the electronic device 100 may include a control module (e.g., including a processor including processing circuitry) 1160 and a microphone module (e.g., including at least one microphone) 1040 .
  • the microphone module 1040 may include first to Nth microphones 40_1 to 40_N in a similar manner to that described above with reference to FIG. 10.
  • the plurality of microphones 40_1 to 40_N may be connected to the control module 1160.
  • the first microphone 40_1 from among the plurality of microphones 40_1 to 40_N may be connected to a low-power processing module 1163.
  • the Nth microphone 40_N from among the plurality of microphones 40_1 to 40_N may be connected to a main control module 1161.
  • the second to Nth microphones 40_2 to 40_N may be connected to both the low-power processing module 1163 and the main control module 1161.
  • the first microphone 40_1 may be connected to not only the low-power processing module 1163 but also the main control module 1161. Accordingly, the first microphone 40_1 may transfer a collected audio signal to the low-power processing module 1163, or, if the main control module 1161 is in a woken state, the first microphone 40_1 may transfer the collected audio signal to the main control module 1161 or to both the low-power processing module 1163 and the main control module 1161.
  • the control module 1160 may include the main control module 1161 and the low-power processing module 1163 .
  • the low-power processing module 1163 may, for example, be a processor (e.g., including processing circuitry) driven with relatively low power compared to the main control module 1161 .
  • the low-power processing module 1163 may be a chip dedicated to audio signal processing, a sensor hub, or a chip dedicated to speech information processing.
  • the low-power processing module 1163 may be independently driven while the main control module 1161 is in a sleep mode, so as to control driving of the first microphone 40_1 included in the microphone module 1040 and analyze an audio signal collected by the first microphone 40_1.
  • the low-power processing module 1163 may analyze whether the audio signal collected by the first microphone 40_1 is speech information corresponding to a voice, speech information corresponding to specified speech reference information, or speech information spoken by a specific speaker. If the speech information satisfies a specified condition, the low-power processing module 1163 may wake the main control module 1161. In this operation, the low-power processing module 1163 may perform control so that the second to Nth microphones 40_2 to 40_N, which are in a disabled state, are enabled.
  • the main control module 1161 may be woken by the low-power processing module 1163 after remaining in a sleep mode in consideration of efficient use of power.
  • the main control module 1161 may enable the second to Nth microphones 40_2 to 40_N, and may collect and analyze additional speech information.
  • the main control module 1161 may control collection of the voice data information 131 for collected pieces of speech information, registration of the personalized voice information 133, and restrictive execution of a voice function according to application of a personalized voice function, as described above with respect to the control module 160.
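  • The division of labor in FIG. 11 might be pictured as follows: a low-power front end screens every audio frame and wakes the main control module only when the specified condition is met. The callback names and structure here are illustrative assumptions, not the patent's implementation.

```python
class LowPowerVoiceFrontEnd:
    """Sketch of the FIG. 11 split: a low-power module screens audio from
    the always-on first microphone and wakes the main module only when
    the specified condition (voice / keyword / registered speaker) holds."""

    def __init__(self, matches_condition, wake_main, enable_other_mics):
        self.matches_condition = matches_condition
        self.wake_main = wake_main
        self.enable_other_mics = enable_other_mics

    def on_audio(self, frame) -> None:
        if self.matches_condition(frame):   # e.g., "hi galaxy" detected
            self.enable_other_mics()        # enable microphones 40_2 .. 40_N
            self.wake_main(frame)           # hand off to main control module
```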
  • The term "module" used herein may represent, for example, a unit including one of, or a combination of, hardware (including hardware circuitry), software, and firmware.
  • the term “module” may be interchangeably used with the terms “unit”, “logic”, “logical block”, “component” and “circuit”.
  • the “module” may be a minimum unit of an integrated component or may be a part thereof.
  • the “module” may be a minimum unit for performing one or more functions or a part thereof.
  • the “module” may be implemented mechanically or electronically.
  • the “module” may include at least one of processing circuitry, hardware circuitry, firmware, an application-specific integrated circuit (ASIC) chip, a field-programmable gate array (FPGA), and a programmable-logic device for performing some operations, which are known or will be developed.
  • At least a part of devices (e.g., modules or functions thereof) or methods (e.g., operations) according to various example embodiments of the present disclosure may be implemented as instructions stored in a computer-readable storage medium in the form of a program module.
  • the module or program module according to various example embodiments of the present disclosure may include at least one of the above-mentioned elements, or some elements may be omitted or other additional elements may be added. Operations performed by the module, the program module or other elements according to various example embodiments of the present disclosure may be performed in a sequential, parallel, iterative or heuristic way. Furthermore, some operations may be performed in another order or may be omitted, or other operations may be added.
  • the type of voice function that may be operated for each speaker or the type of an application executable by voice recognition may be handled in a speaker-dependent manner.
  • the security related to a voice function of an electronic device may be enhanced.

Abstract

An electronic device is provided. The electronic device includes a memory configured to store at least a portion of a plurality of pieces of speech information used for voice recognition, and a processor operatively connected to the memory, wherein the processor selects speaker speech information from at least a portion of the plurality of pieces of speech information based on mutual similarity, and generates voice recognition information to be registered as personalized voice information based on the speaker speech information.

Description

    CROSS-REFERENCE TO RELATED APPLICATION
  • This application is based on and claims priority under 35 U.S.C. §119 to a Korean patent application filed on Feb. 11, 2015 in the Korean Intellectual Property Office and assigned Serial number 10-2015-0020786, the disclosure of which is incorporated by reference herein in its entirety.
  • TECHNICAL FIELD
  • The present disclosure relates to operation of a voice function in an electronic device.
  • BACKGROUND
  • An electronic device which includes a microphone or the like provides a function of collecting and recognizing a user's voice. For example, recent electronic devices provide a function of recognizing a user's voice and outputting information corresponding to a recognized voice.
  • Meanwhile, according to a typical voice function, only the contents of a collected voice are recognized and a service corresponding thereto is provided. Therefore, a voice function providing method of a typical electronic device may provide a specific function regardless of a person who inputs a voice.
  • SUMMARY
  • Accordingly, an aspect of the present disclosure is to provide a voice function operating method for supporting a voice function of an electronic device so that the voice function is operated in a user (i.e., speaker)-dependent manner, and an electronic device supporting the same.
  • Another aspect of the present disclosure is to provide a voice function operating method for selectively providing a voice function based on the type of an input audio signal, and an electronic device supporting the same.
  • In accordance with an aspect of the present disclosure, an electronic device is provided. The electronic device may include a memory for storing at least a portion of a plurality of pieces of speech information used for voice recognition, and a control module (or a processor) configured to generate voice recognition information based on at least a portion of the plurality of pieces of speech information, wherein the control module may be configured to select speaker speech information from at least a portion of the plurality of pieces of speech information based on mutual similarity, and may be configured to generate the voice recognition information to be registered as personalized voice information based on the speaker speech information.
  • In accordance with another aspect of the present disclosure, a voice function operating method is provided. The voice function operating method may include storing at least a portion of a plurality of pieces of speech information used for voice recognition, selecting speaker speech information from at least a portion of the plurality of pieces of speech information based on mutual similarity, and generating voice recognition information to be registered as personalized voice information based on the speaker speech information selected.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • These and other aspects and advantages of the disclosure will become more apparent and readily appreciated from the following detailed description, taken in conjunction with the accompanying drawings, in which like reference numerals refer to like elements, and wherein:
  • FIG. 1 is a diagram illustrating an example personalized voice function providing environment according to various example embodiments of the present disclosure;
  • FIG. 2 is a block diagram illustrating an example of an electronic device supporting a voice function according to various example embodiments of the present disclosure;
  • FIG. 3 is a block diagram illustrating an example of a control module according to various example embodiments of the present disclosure;
  • FIG. 4 is a diagram illustrating an example candidate group handling method related to speaker-dependent setting according to various example embodiments of the present disclosure;
  • FIG. 5 is a diagram illustrating an example personalized voice information update according to various example embodiments of the present disclosure;
  • FIG. 6 is a flowchart illustrating an example personalized voice setting method during operation of a voice function according to various example embodiments of the present disclosure;
  • FIG. 7 is a flowchart illustrating an example personalized voice information update method according to various example embodiments of the present disclosure;
  • FIG. 8 is a diagram illustrating an example of a screen interface related to execution of a personalized voice function according to various example embodiments of the present disclosure;
  • FIG. 9 is a diagram illustrating an example of a screen interface related to setting of personalized voice information according to various example embodiments of the present disclosure;
  • FIG. 10 is a block diagram illustrating an example of an electronic device according to various example embodiments of the present disclosure; and
  • FIG. 11 is a block diagram illustrating another example of an electronic device according to various example embodiments of the present disclosure.
  • DETAILED DESCRIPTION
  • Hereinafter, various example embodiments of the present disclosure will be described in greater detail with reference to the accompanying drawings. However, it should be understood that the present disclosure is not limited to specific example embodiments, but rather includes various modifications, equivalents and/or alternatives of various example embodiments of the present disclosure. Regarding description of the drawings, like reference numerals may refer to like elements.
  • The term “have”, “may have”, “include”, “may include”, “comprise”, or the like used herein indicates the existence of a corresponding feature (e.g., a number, a function, an operation, or an element) and does not exclude the existence of an additional feature.
  • The term “A or B”, “at least one of A and/or B”, or “one or more of A and/or B” may include all possible combinations of items listed together. For example, the term “A or B”, “at least one of A and B”, or “at least one of A or B” may indicate all the cases of (1) including at least one A, (2) including at least one B, and (3) including at least one A and at least one B.
  • The term “first”, “second” or the like used herein may modify various elements regardless of the order and/or priority thereof, but does not limit the elements. For example, “a first user device” and “a second user device” may indicate different user devices regardless of the order or priority. For example, without departing from the scope of the present disclosure, a first element may be referred to as a second element and vice versa.
  • It will be understood that when a certain element (e.g., a first element) is referred to as being “operatively or communicatively coupled with/to” or “connected to” another element (e.g., a second element), the certain element may be coupled to the other element directly or via another element (e.g., a third element). However, when a certain element (e.g., a first element) is referred to as being “directly coupled” or “directly connected” to another element (e.g., a second element), there may be no intervening element (e.g., a third element) between the element and the other element.
  • The term “configured (or set) to” may be interchangeably used with the term, for example, “suitable for”, “having the capacity to”, “designed to”, “adapted to”, “made to”, or “capable of”. The term “configured (or set) to” may not necessarily have the meaning of “specifically designed to”. In some examples, the term “device configured to” may indicate that the device “may perform” together with other devices or components. For example, the term “processor configured (or set) to perform A, B, and C” may represent a dedicated processor (e.g., an embedded processor) for performing a corresponding operation, processing circuitry or a general-purpose processor (e.g., a CPU or an application processor) for executing at least one software program stored in a memory device to perform a corresponding operation.
  • The terminology used herein is only used for describing example embodiments and is not intended to limit the scope of other embodiments. The terms of a singular form may include plural forms unless otherwise specified. The terms used herein, including technical or scientific terms, have the same meanings as understood by those skilled in the art. Commonly-used terms defined in a dictionary may be interpreted as having meanings that are the same as or similar to contextual meanings defined in the related art, and should not be interpreted in an idealized or overly formal sense unless explicitly defined otherwise. The terms defined herein should not be interpreted so as to exclude the various example embodiments of the present disclosure.
  • Hereinafter, an electronic device according to various example embodiments of the present disclosure will be described with reference to the accompanying drawings. The term “user” used herein may refer to a person who uses an electronic device or may refer to a device (e.g., an artificial electronic device) that uses an electronic device.
  • FIG. 1 is a diagram illustrating an example personalized voice function providing environment according to various example embodiments of the present disclosure.
  • Referring to FIG. 1, the personalized voice function providing environment may provide a first-state voice function module 10s of an electronic device for receiving audio signals input by a plurality of speakers 10a to 10c in relation to a speaker-independent setting. The first-state voice function module 10s may include, for example, at least one of a hardware module comprising hardware circuitry, a firmware module comprising firmware, or a software module related to provision of a voice function prior to application of a personalized voice function. At least one of the speakers 10a to 10c may input a voice (or speech information) using the first-state voice function module 10s.
  • According to various example embodiments of the present disclosure, the first-state voice function module 10s may perform a voice command function (e.g., a function of recognizing a collected voice, analyzing a voice command based on a result of recognition, and outputting information or performing an available function of an electronic device based on a result of analysis) based on a voice (or speech information) input by the speakers 10a to 10c. In relation to this operation, the speakers 10a to 10c may, for example, input a voice (or a speech or speech information) using at least one microphone included in the first-state voice function module 10s.
  • The first-state voice function module 10s may collect candidate data (including, for example, speaker speech information or speech information of each speaker) on the speakers 10a to 10c without performing speaker identification in a state in which a personalized voice function (e.g., a function of restricting use of functions of an electronic device differentially specified for each speaker) is not applied. A candidate data collecting operation may be automatically performed based on a specified condition. For example, the candidate data collecting operation may be automatically performed while a voice function is performed. Furthermore, the candidate data collecting operation may be automatically performed while a microphone activating operation is performed. According to various example embodiments of the present disclosure, the candidate data collecting operation may be performed for data obtained through successful voice recognition.
  • According to an example embodiment of the present disclosure, the first-state voice function module 10s may collect first candidate data 11a related to the first speaker 10a. Furthermore, the first-state voice function module 10s may collect second candidate data 11b related to the second speaker 10b and third candidate data 11c related to the third speaker 10c. The first-state voice function module 10s may perform voice function personalization processing (or voice recognition function personalization processing) if at least a specified number of candidate data are collected or collection of candidate data is completed for a specified time. For example, the first-state voice function module 10s may analyze a plurality of candidate data and may register, as personalized voice information, a speaker recognition model (including, for example, voice recognition information or voice recognition model information) including the first candidate data 11a related to the first speaker 10a. Accordingly, the first-state voice function module 10s may be operated as (or changed into) a second-state voice function module 10p. The first-state voice function module 10s may store collected candidate data locally (e.g., in a memory thereof). Alternatively, the first-state voice function module 10s may, for example, provide the collected candidate data to a specified server device. In the example where the collected candidate data are transmitted to the server device, recognition model training for candidate data may, for example, also be performed in the server device.
  • If the speech information of speakers is collected while a voice recognition function is performed, the second-state voice function module 10p may analyze the collected speech information and may compare an analysis result with the registered personalized voice information. If it is determined, as a result of the comparison, that the speech information corresponds to a speaker recognition model registered as the personalized voice information, the second-state voice function module 10p may handle execution of a function corresponding to the analysis result of the input speech information. If the result of the comparison indicates, for example, that the input speech information is speech information of the second speaker 10b or the third speaker 10c different from the speaker recognition model registered as the personalized voice information (e.g., the speech information of the first speaker 10a), the second-state voice function module 10p may not perform a function corresponding to the speech information or may perform a limited function based on a specified policy. When performing the limited function, the second-state voice function module 10p may output a function execution unavailability message or a limited function execution message. As described above, the personalized voice function providing environment according to various example embodiments of the present disclosure may handle execution of a function of an electronic device in a speaker-dependent manner (e.g., only a voice (or speech information) of a specific speaker is handled as valid information, or another speaker's voice (or speech information) is restrictively handled) based on registration of the personalized voice information.
  • FIG. 2 is a block diagram illustrating an example of an electronic device supporting a voice function according to various example embodiments of the present disclosure.
  • Referring to FIG. 2, an electronic device 100 may include, for example, a communication interface (e.g., including communication circuitry) 110, a memory 130, a microphone module (e.g., including a microphone or microphone circuitry) 140, a display (e.g., including a display panel and/or display processing circuitry) 150, and a control module (e.g., including a processor including processing circuitry) 160.
  • The electronic device 100 may collect candidate data using the microphone module 140 and may operate the control module 160, so as to process the candidate data, register personalized voice information (e.g., a specific speaker recognition model), and/or apply the personalized voice information. Based on this process, the electronic device 100 may handle a personalized voice function for supporting a speaker-dependent function.
  • The communication interface 110 may handle a communication function of the electronic device 100. For example, the communication interface 110 may establish a communication channel to a server device or the like in relation to a call function, a video call function, or the like of the electronic device 100. To this end, the communication interface 110 may include at least one communication module or communication chip/circuitry for supporting various communication standards such as 2G, 3G, 4G, LTE, 5G, etc. Furthermore, the communication interface 110 may include at least one antenna covering a single frequency band or a multi-frequency band. According to various example embodiments of the present disclosure, the communication interface 110 may establish a short-range communication channel to another electronic device in relation to a data transfer function or a call function of the electronic device 100.
  • According to an example embodiment of the present disclosure, the communication interface 110 may be operated in association with a voice function. For example, the communication interface 110 may establish a communication channel in relation to the voice function such as a call function or a voice-recognition-based message sending/receiving function. Furthermore, in relation to a voice command function, the communication interface 110 may establish a communication channel to a server device for analyzing a voice (or speech information) and providing information based on a result of analysis.
  • According to various example embodiments of the present disclosure, the communication interface 110 may be restrictively operated in relation to application of a personalized voice function. For example, the communication interface 110 may be enabled based on a speech information input corresponding to a speaker recognition model registered as personalized voice information. Alternatively, the communication interface 110 may establish a communication channel to a specified server device (e.g., a web server device for management of financial information, stock information, or specific information) in response to a speech information input from a specific recognized speaker.
  • The memory 130 may store various information related to operation of the electronic device 100. For example, the memory 130 may store an operating system required for operating the electronic device 100, at least one program related to support for a user function, etc. According to an example embodiment of the present disclosure, the memory 130 may store a personalized voice program to support a personalized voice function. Furthermore, the memory 130 may store voice data information 131 and personalized voice information 133 related to operation of the personalized voice program.
  • The voice data information 131 may include a voice signal (e.g., speech information) input from at least one speaker or an audio signal collected when the microphone module 140 is enabled. According to an example embodiment of the present disclosure, pieces of speech information from which a noise or a band other than a human voice band has been removed may be stored as candidate data of the voice data information 131. According to an example embodiment of the present disclosure, the voice data information 131 may include pieces of speech information, of which a speech interval has a length of at least a specified time, as a plurality of candidate data. Furthermore, the voice data information 131 may include a specified number of pieces of speech information as candidate data or may include pieces of speech information collected for a specified time as candidate data. A function of collecting the voice data information 131 may, for example, be automatically performed when the microphone module 140 is enabled in relation to execution of a voice function. Furthermore, this function may be automatically ended on completion of collecting the voice data information 131. According to various example embodiments of the present disclosure, the function of collecting the voice data information 131 may be automatically performed if specified voice recognition is successful, and may be automatically ended immediately after the collection is completed or after elapse of a specified time.
  • The personalized voice information 133 may be related to candidate data selected by applying a specified algorithm or process to the voice data information 131. For example, the personalized voice information 133 may be a speaker recognition model generated from candidate data related to a specific speaker (e.g., candidate data having a relatively large population in the voice data information 131) from among the plurality of candidate data included in the voice data information 131. Alternatively, the personalized voice information 133 may be candidate models obtained by modeling the candidate data related to the specific speaker. Alternatively, the personalized voice information 133 may be any one of the candidate data of the specific speaker, or information obtained by combining audio features detected from each candidate data, or a speaker recognition model including the audio features.
  • According to an example embodiment of the present disclosure, the personalized voice information 133 may include at least one phonemic model (e.g., a signal or information obtained by dividing speech information by phoneme such as h, ai, g, ae, l, ax, k, s, iy) constituting speech information (e.g., a signal or information obtained by speaking speech reference information such as, for example, ‘high galaxy’ by a specific speaker) obtained by speaking speech reference information (e.g., readable specified information such as characters or numbers, for example, ‘high galaxy’) by a specific speaker. Furthermore, even if a speaker speaks the same speech reference information, different phonemic models of various forms (e.g., phonemic signals or pieces of information with different pitches, tones, or timbres with respect to the same phonemic model such as ‘ha’) may be obtained with respect to the same reference phoneme (e.g., information obtained by dividing speech reference information by phoneme, for example, hi, ga, lax, sy, etc.), depending on a throat state of the speaker or an environment. For example, “h-a” or “h-ai” may be collected as a phonemic model corresponding to a reference phoneme “hi”. Here, “h-a” or “h-ai” may be collected as different phonemic models with various pitches, tones, or timbres for each situation. As described above, the personalized voice information 133 may include at least one phonemic model included in speech information obtained by speaking specified speech reference information (e.g., at least one specified word, phrase, clause, sentence, etc.), so that, with respect to one reference phoneme, one or more phonemic models for each situation may be associated or one reference phoneme may be indicated.
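  • The relationship described above (one reference phoneme associated with several situation-dependent phonemic models) suggests a simple one-to-many mapping, sketched below with invented feature values and context labels; none of these names or values come from the disclosure.

```python
from collections import defaultdict

# Hypothetical container: one reference phoneme maps to the several
# situation-dependent phonemic models observed for it (e.g., "h-a" and
# "h-ai" both filed under the reference phoneme "hi").
phonemic_models = defaultdict(list)

def register_phonemic_model(reference_phoneme: str,
                            features: list[float],
                            context: str) -> None:
    """Associate one more observed phonemic model with its reference phoneme."""
    phonemic_models[reference_phoneme].append(
        {"features": features, "context": context})

# Illustrative entries only: the same reference phoneme collects variants
# with different pitches, tones, or timbres for each situation.
register_phonemic_model("hi", [0.12, 0.80, 0.33], context="quiet room")
register_phonemic_model("hi", [0.10, 0.75, 0.41], context="hoarse voice")
```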
  • The microphone module 140 may include at least one microphone. In the case where one microphone is disposed, the microphone module 140 may enable the microphone in response to control by the control module 160, and may transfer a collected audio signal to the control module 160 through the enabled microphone. Alternatively, the microphone module 140 may remain in a turned on state and may collect an audio signal while the electronic device 100 is supplied with power or the control module 160 is operated, in response to control by the control module 160. According to various example embodiments of the present disclosure, the microphone module 140 may include a plurality of microphones. The microphone module 140 may be automatically enabled, for example, when candidate data corresponding to the voice data information 131 are collected. For example, if the electronic device 100 is in a turned on state, the electronic device 100 may collect speech information corresponding to candidate data by automatically enabling the microphone module 140 for a specified time or until a specified number of candidate data is satisfied in order to collect candidate data. Alternatively, if the microphone module 140 is enabled (e.g., enabled as a voice function is performed), the electronic device 100 may determine whether it is required to collect candidate data so as to automatically collect speech information.
  • The display 150 may output various screens related to operation of the electronic device 100. For example, the display 150 may output a lock screen, a menu screen, a home screen, a screen on which at least one icon is disposed, a screen to which a background image is output, a specific function execution screen, or the like. According to an example embodiment of the present disclosure, the display 150 may output a screen related to execution of a voice function. For example, the display 150 may output a screen related to execution of a voice command function, a screen related to execution of a voice recording function, a screen related to execution of a voice call function, a screen related to execution of a voice recognition function, or the like in response to execution of a corresponding application.
  • Furthermore, the display 150 may output at least one information (e.g., a text, an image, or the like) related to operation of a personalized voice function. For example, the display 150 may output at least one of an icon, a menu, an indicator, or a guide text related to setting of the personalized voice function. Furthermore, the display 150 may output a message, a text, an indicator, or the like for notifying application of the personalized voice function. Moreover, the display 150 may output a personalized voice function setting screen in response to control by a user input. Additionally, or alternatively, the electronic device 100 may further include various information output units such as a speaker, a vibration module, a lamp, etc. The information output units may output various information related to operation of the personalized voice function using an audio, at least one specified vibration pattern, or at least one specified flickering pattern.
  • The control module 160 may be configured to perform signal flow control, signal processing control, and information processing in relation to operation of the electronic device 100. For example, the control module 160 may be configured to control setting of the personalized voice function (e.g., setting for collecting the voice data information 131 for registering the personalized voice information 133). The control module 160 may be configured to handle extraction and registration of the personalized voice information 133 on completion of collecting the voice data information 131. The control module 160 may be configured to handle application of the personalized voice function based on the registered personalized voice information 133. Based on the above-mentioned control, the control module 160 may be configured to allow a specified voice function to be applied in response to speech information input from a specific speaker or may limit a voice function (e.g., allow access to only a part of the function or prevent the function from being executed) in response to speech information input from a non-specific speaker.
  • FIG. 3 is a block diagram illustrating an example of a control module according to various example embodiments of the present disclosure.
  • Referring to FIG. 3, the control module 160 may include a microphone control module 161, a voice data collecting module 163, an information processing module 165, and an information updating module 167. Each of the foregoing modules may, for example, be embodied by a processor including processing circuitry configured to perform the operations of the various modules.
  • The microphone control module 161 may be configured to control enablement and audio signal collection of the microphone 140. For example, if the electronic device 100 is in a turned-on state, the microphone control module 161 may maintain a turned-on state (e.g., always turned-on state) of the microphone module 140 based on a setting. In the case where a plurality of microphones is included in the microphone module 140, the microphone control module 161 may control operation of the microphones.
  • According to an example embodiment of the present disclosure, if an audio signal is collected from the microphone module 140, the microphone control module 161 may transfer the collected audio signal to the voice data collecting module 163. In this operation, the microphone control module 161 may, for example, transfer the collected audio signal to the voice data collecting module 163 if the collected audio signal is a signal (or speech information) of a frequency band of a voice of a human being, or may treat (or ignore) the collected audio signal as a noise if, for example, the collected audio signal has a frequency outside the voice frequency band. Alternatively, the microphone control module 161 may transfer the collected audio signal to the voice data collecting module 163 regardless of a frequency band of the collected audio signal. According to various example embodiments of the present disclosure, the microphone control module 161 may transfer, to the voice data collecting module 163, only data from which a voice has been successfully recognized.
  • The microphone control module 161 may be configured to control operations so that collection of candidate data related to setting of the personalized voice function is automatically performed when the microphone module 140 is enabled. For example, if the microphone module 140 is enabled in order to execute a voice call function, a voice command function, a voice recognition function, a voice recording function, or the like, the microphone control module 161 may determine whether the personalized voice information 133 is registered. If the personalized voice information 133 is not registered, the microphone control module 161 may automatically collect pieces of speech information to be used as the voice data information 131 and may transfer the speech information to the voice data collecting module 163. If the personalized voice information 133 is registered, the microphone control module 161 may be configured to terminate collection of the speech information to be used as the voice data information 131 automatically.
  • In the example where the microphone control module 161 provides an audio signal regardless of a frequency band thereof, the voice data collecting module 163 may, for example, analyze whether the audio signal has been generated from a human speech. Furthermore, the voice data collecting module 163 may collect pieces of speech information corresponding to a voice frequency band as preliminary candidate group information. In the example where the microphone control module 161 is configured to transmit speech information, a speech information classifying operation of the voice data collecting module 163 may be skipped.
  • The voice data collecting module 163 may be configured to classify preliminary candidate data in the preliminary candidate group which satisfy a specified condition as candidate data of the voice data information 131. For example, the voice data collecting module 163 may classify only preliminary candidate data of which lengths (e.g., speech time) are at least a specified length as the candidate data of the voice data information 131. Furthermore, the voice data collecting module 163 may, for example, classify only preliminary candidate data related to specified speech reference information as the candidate data.
• According to various example embodiments of the present disclosure, the voice data collecting module 163 may specify the number of candidate data or a time in relation to collection of the voice data information 131. For example, the voice data collecting module 163 may be configured to collect the voice data information 131 for a specified time after a specific event occurs (e.g., after the electronic device 100 is assigned specified personal information (e.g., a personal telephone number provided by a service provider) or after the electronic device 100 first accesses a specified base station). Alternatively, if the voice data collecting module 163 is turned on after being turned off for a specified time, the voice data collecting module 163 may be configured to collect the voice data information 131 for a specified time. Alternatively, the voice data collecting module 163 may be configured to collect the voice data information 131 until a specified number of candidate data are collected after setting of the personalized voice function is started. The number of candidate data may be changed based on a setting of a personalized voice function policy or may be changed by a user setting. The voice data collecting module 163 may provide, to the information processing module 165, the voice data information 131 including the specified number of candidate data or the candidate data collected for a specified time.
• The information processing module 165 may be configured to select the personalized voice information 133 from the voice data information 131. For example, the information processing module 165 may select arbitrary candidate data from the voice data information 131 and may perform voice feature (e.g., a unique voice feature of each speaker, such as a timbre) comparison between the selected candidate data and other candidate data. The information processing module 165 may classify (e.g., by clustering) candidate data by performing the feature comparison. For example, an unsupervised learning method such as vector quantization may be used. The information processing module 165 may select candidate data, the number of which is relatively large (i.e., the largest cluster), from among the classified candidate data. The arbitrary candidate data may be selected from among, for example, initially collected candidate data, lastly collected candidate data, and candidate data collected in a specified time slot.
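• A minimal sketch of this selection step follows. It assumes per-utterance voice-feature vectors (e.g., averaged spectral features) have already been extracted, and it uses k-means clustering as a stand-in for the vector-quantization-style unsupervised learning mentioned above; the feature dimensions and cluster count are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

def select_dominant_speaker(features: np.ndarray, n_clusters: int = 3) -> np.ndarray:
    """Cluster per-utterance feature vectors and return indices of the largest
    cluster, treated here as the device owner's candidate data."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(features)
    counts = np.bincount(labels, minlength=n_clusters)
    return np.flatnonzero(labels == counts.argmax())

# Toy data: 6 utterances from "speaker A", 2 from "speaker B", 1 from "speaker C".
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (6, 8)),
               rng.normal(3, 0.1, (2, 8)),
               rng.normal(-3, 0.1, (1, 8))])
print(select_dominant_speaker(X))  # indices 0..5 -- the largest cluster
```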
• The information processing module 165 may be configured to register selected candidate data as the personalized voice information 133. In this operation, the information processing module 165 may provide a guide on whether to register the personalized voice information 133, and may, for example, request user approval. For example, the information processing module 165 may provide a popup window with a query on whether to register specified candidate data as the personalized voice information 133, and may handle registration of the personalized voice information 133 based on a user confirmation. The information processing module 165 may be configured to output, together with the candidate data, time information about when the candidate data were collected or voice recognition information of the candidate data, in order to differentiate the candidate data.
• When a specified voice function such as a voice command function is performed, the information processing module 165 may be configured to perform speaker identification based on collected speech information and the registered personalized voice information 133. The information processing module 165 may be configured to differentiate a function to be performed based on a result of the speaker identification. For example, in the case where speech information of a speaker registered in the personalized voice information 133 is collected, the information processing module 165 may perform a function to be performed in response to speech information recognition. Alternatively, in the case where speech information of a speaker not registered in the personalized voice information 133 is collected, the information processing module 165 may provide a notification that information output or function execution corresponding to the speech information cannot be performed.
• The information processing module 165 may be configured to perform multi-condition training while performing modeling based on data included in the voice data information 131. In relation to this operation, the information processing module 165 may apply various effects to the data included in the voice data information 131. For example, the information processing module 165 may apply a specified sound effect to the data included in the voice data information 131 and generate candidate data based on the sound effect, or may generate candidate data with which a specified noise is combined. The information processing module 165 may extract a speaker model to be registered as the personalized voice information 133 by applying multi-condition-trained candidate data (e.g., data to which a specified sound effect is added or data to which a noise is added) together with data included in other voice data information 131. According to various example embodiments of the present disclosure, the information processing module 165 may generate multi-condition training models in relation to candidate data included in a cluster having a relatively large number of candidate data after, for example, clustering the candidate data included in the voice data information 131. Furthermore, the information processing module 165 may be configured so that multi-condition training models generated based on candidate data included, for example, in a cluster of a specific speaker are used for determining a speaker recognition model.
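• Multi-condition training of this kind is commonly realized by mixing noise into clean candidate data at several signal-to-noise ratios. The sketch below is a generic illustration under that assumption, not the disclosed implementation; the SNR values and signals are arbitrary:

```python
import numpy as np

def add_noise_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix noise into a clean utterance at the requested SNR (in dB)."""
    noise = np.resize(noise, clean.shape)        # loop/trim noise to match length
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10.0)))
    return clean + scale * noise

# One clean candidate becomes several noisy training variants.
sr = 16000
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 440 * t)
babble = np.random.default_rng(1).normal(0, 1, sr)
variants = [add_noise_at_snr(clean, babble, snr) for snr in (20, 10, 5)]
```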
  • The information processing module 165 may use a universal background model (UBM) during a speaker modeling process for candidate data included in the voice data information 131. UBM information may include a statistical model generated based on features of speech information of various persons. The UBM information may be generated based on non-speaker data during a process of calculating a speaker recognition model of a speaker specified in the voice data information 131. The non-speaker data may, for example, be differentiated from speaker data based on the above-mentioned clustering method.
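• As a rough illustration of the UBM idea, the following sketch trains a background Gaussian mixture model on pooled non-speaker frames and scores frames with a log-likelihood ratio. In practice the speaker model would typically be MAP-adapted from the UBM; here it is trained directly for brevity, and all data are synthetic:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
background = rng.normal(0, 1, (2000, 8))  # pooled frames from many non-target speakers
speaker = rng.normal(0.8, 1, (300, 8))    # frames from the target speaker's candidates

ubm = GaussianMixture(n_components=4, covariance_type="diag", random_state=0)
ubm.fit(background)

spk = GaussianMixture(n_components=4, covariance_type="diag", random_state=0)
spk.fit(speaker)  # simplification: trained directly instead of MAP-adapted from the UBM

def llr(frames: np.ndarray) -> float:
    """Average log-likelihood ratio: positive values favor the target speaker."""
    return spk.score(frames) - ubm.score(frames)

# Typically positive for speaker frames, negative for background frames.
print(llr(speaker[:50]), llr(background[:50]))
```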
• The information updating module 167 may be configured to handle modification, adaptation or enhancement of the personalized voice information 133. In relation to this operation, the information updating module 167 may request and receive, from the microphone control module 161, an audio signal collected by the microphone module 140, and may extract information to which the personalized voice information 133 is to be adapted. For example, the information updating module 167 may check whether the collected audio signal includes the user's speech information (including at least one of a wakeup audio signal related to a voice function or a voice command audio signal). In the example where the speech information is included in the collected audio signal, the information updating module 167 may check whether phonemes corresponding to phonemic models included in the specified personalized voice information are included in the collected speech information. In this operation, the information updating module 167 may collect new phonemic samples corresponding to the phonemic models included in the personalized voice information 133 by performing voice recognition on the collected speech information, and may perform phonemic model training based on the collected phonemic samples. Furthermore, the information updating module 167 may perform enhancement (or adaptation or the like) of the phonemic models of the personalized voice information 133 according to the phonemic model training.
• The information updating module 167 may check an adaptation ratio (or an adaptation degree or an enhancement ratio) of the personalized voice information 133 adapted using the collected speech information. For example, the information updating module 167 may determine whether a frequency of information update of the personalized voice information 133 by newly collected speech information is equal to or higher than a specified value. If the newly collected speech information merely duplicates previously obtained speech information, no additional update may occur. The information updating module 167 may determine that the adaptation ratio is high if the update frequency is high (e.g., the number of pieces of speech information used for update from among a certain number of collected pieces of speech information is at least a specified value), or may determine that the adaptation ratio is low if the update frequency is low and may terminate adaptation of the personalized voice information 133.
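• The update-frequency check described above might be tracked as in the sketch below, where the window size and floor are illustrative values standing in for the "specified value" of the disclosure:

```python
from collections import deque

class AdaptationTracker:
    """Track how often newly collected utterances actually update the model;
    stop adapting once the update frequency falls below a floor."""
    def __init__(self, window: int = 50, floor: float = 0.1):
        self.history = deque(maxlen=window)  # True = utterance caused an update
        self.floor = floor

    def record(self, caused_update: bool) -> None:
        self.history.append(caused_update)

    def should_continue(self) -> bool:
        if len(self.history) < self.history.maxlen:
            return True  # not enough evidence yet; keep collecting
        ratio = sum(self.history) / len(self.history)
        return ratio >= self.floor  # low ratio => model has converged, stop
```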
  • The information updating module 167 may automatically collect speech information when the microphone module 140 is enabled in relation to adaptation of the personalized voice information 133. If a function of adapting the personalized voice information 133 is ended (e.g., the adaptation ratio is equal to or lower than a specified condition), the information updating module 167 may automatically end collection of speech information related to adaptation of the personalized voice information 133. The information updating module 167 may be configured so that specified information is output through the display 150 in relation to starting or automatic ending of adaptation-related speech information collection.
  • FIG. 4 is a diagram illustrating an example candidate group handling method related to speaker-dependent setting according to various example embodiments of the present disclosure.
  • Referring to FIG. 4, the electronic device 100 may collect a specified number of pieces of the voice data information 131 or may collect the voice data information 131 for a specified time. The collected voice data information 131 may include, for example, pieces of speech information 401 a to 401 c corresponding to candidate data spoken by three speakers. If collection of the pieces of speech information 401 a to 401 c is completed, the electronic device 100 may classify the pieces of speech information 401 a to 401 c.
• In relation to this operation, the electronic device 100 may select an arbitrary piece of speech information 401 from among the collected pieces of speech information 401 a to 401 c based on a specified condition. If the arbitrary speech information 401 is selected, the electronic device 100 may convert the arbitrary speech information 401 into a first temporary model 460 a. If the first temporary model 460 a is generated, the electronic device 100 may compare the first temporary model 460 a with the pieces of speech information 401 a to 401 c, and may assign a score to each of the pieces of speech information 401 a to 401 c. For example, the electronic device 100 may assign a low score to speech information similar to the first temporary model 460 a, and may assign a high score to speech information having no similarity with the first temporary model 460 a. The electronic device 100 may sort the pieces of speech information 401 a to 401 c in order of score.
• Furthermore, the electronic device 100 may cluster the pieces of speech information 401 a to 401 c in order of score as illustrated in the center of FIG. 4. As illustrated in FIG. 4, three pieces of the first speech information 401 a spoken by a first speaker and one piece of the second speech information 401 b spoken by a second speaker may be clustered as one group. Furthermore, one piece of the first speech information 401 a spoken by the first speaker, the remaining second speech information 401 b, and the third speech information 401 c may be clustered as separate groups, respectively.
• The electronic device 100 may detect a second temporary model 460 b using pieces of information 403 clustered with pieces of speech information having low scores. Furthermore, the electronic device 100 may compare the pieces of speech information 401 a to 401 c with the second temporary model 460 b generated based on the clustered pieces of information 403. Accordingly, as illustrated in FIG. 4, the first speech information 401 a obtains the lowest scores (or scores equal to or higher than a specified threshold), and the second speech information 401 b and the third speech information 401 c obtain relatively high scores (or scores equal to or lower than the specified threshold). The electronic device 100 may re-perform clustering based on the scores, thereby obtaining a cluster including pieces of the first speech information 401 a, a cluster including pieces of the second speech information 401 b, and a cluster including the third speech information 401 c, as illustrated in FIG. 4. Based on the above result, the electronic device 100 may register the cluster including the pieces of the first speech information 401 a as the personalized voice information 133.
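• The two-pass procedure of FIG. 4 can be approximated as follows. The sketch seeds a temporary model from one arbitrary utterance, scores the remaining utterances by distance (low score = similar, mirroring the scoring convention above), and rebuilds the model from the resulting cluster; the Euclidean metric, threshold, and feature vectors are assumptions for illustration:

```python
import numpy as np

def iterative_owner_cluster(utterances: np.ndarray, threshold: float, rounds: int = 2):
    """FIG. 4-style sketch: seed a temporary model from one utterance, keep the
    low-score (similar) utterances, and rebuild the model from that cluster."""
    model = utterances[0]  # arbitrary seed (e.g., first collected utterance)
    for _ in range(rounds):
        scores = np.linalg.norm(utterances - model, axis=1)  # low = similar
        cluster = utterances[scores < threshold]
        if len(cluster) == 0:
            break
        model = cluster.mean(axis=0)  # the second temporary model, and so on
    members = np.flatnonzero(np.linalg.norm(utterances - model, axis=1) < threshold)
    return model, members

rng = np.random.default_rng(3)
data = np.vstack([rng.normal(0, 0.2, (4, 8)),    # speaker 1 (majority)
                  rng.normal(2, 0.2, (2, 8)),    # speaker 2
                  rng.normal(-2, 0.2, (1, 8))])  # speaker 3
model, members = iterative_owner_cluster(data, threshold=1.2)
print(members)  # typically indices 0-3, the majority speaker's cluster
```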
  • FIG. 5 is a diagram illustrating an example personalized voice information update according to various example embodiments of the present disclosure.
  • Referring to FIG. 5, the personalized voice information 133 of a specific speaker may be audio information corresponding to speech reference information “Hi Galaxy”. In this example, as described above, the personalized voice information 133 may include phonemic models for each of “h-ai-g-ae-l-ax-k-s-iy” as illustrated in FIG. 5. According to an example embodiment of the present disclosure, the personalized voice information 133 may include, for example, a “ha” registration phonemic model 501, as a phonemic model. Furthermore, the personalized voice information 133 may include a registration frequency model 510 related to the corresponding registration phonemic model 501 when the speaker speaks “Hi Galaxy”.
• The electronic device 100 may enable the microphone module 140 based on a specified condition. As illustrated in FIG. 5, the microphone module 140 may collect audio information obtained when a specific speaker speaks speech reference information such as "How's the weather?". In this example, the electronic device 100 may extract phonemic models "h-aw-s-th-ax-w-eh-th-er" for the speech reference information. The electronic device 100 may collect a new phonemic model 503 of the same "ha" from among the extracted phonemic models. Furthermore, the electronic device 100 may collect a new frequency model 530 corresponding to the new phonemic model 503.
  • In response to the same phonemic model “ha”, the electronic device 100 may store the new phonemic model 503 and the new frequency model 530 in association with the registration phonemic model 501 and the registration frequency model 510, or may integrate and store the foregoing models and frequencies as one phonemic model group. As described above, the electronic device 100 may extract a phonemic model and a frequency model from speech information spoken by a specific speaker so as to extend a model group of the registered personalized voice information 133. Based on this extended model group, the electronic device 100 may more accurately recognize specified speech reference information registered as the personalized voice information 133 even if a speaker speaks the speech reference information in various situations.
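• A model group of this kind can be pictured as a mapping from phoneme labels to accumulated frequency models, as in the sketch below; the dictionary-based representation and the f0 field are illustrative, not the disclosed data structure:

```python
from collections import defaultdict

class PhonemeModelGroup:
    """Sketch of the model-group extension in FIG. 5: new samples of an
    already-registered phoneme (e.g., 'ha') are stored alongside its model."""
    def __init__(self):
        self.groups = defaultdict(list)  # phoneme label -> list of frequency models

    def register(self, phoneme: str, freq_model) -> None:
        self.groups[phoneme].append(freq_model)

    def extend_from_utterance(self, phonemes, freq_models) -> None:
        for ph, fm in zip(phonemes, freq_models):
            if ph in self.groups:  # only extend already-registered phonemes
                self.groups[ph].append(fm)

# "Hi Galaxy" registers 'ha'; "How's the weather?" later contributes another 'ha'.
g = PhonemeModelGroup()
g.register("ha", {"f0": 120.0})  # illustrative frequency model
g.extend_from_utterance(["ha", "w", "eh"], [{"f0": 115.0}, {"f0": 0}, {"f0": 0}])
print(len(g.groups["ha"]))  # 2 -- the group now covers more speaking conditions
```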
  • As described above, according to various example embodiments of the present disclosure, an electronic device according to an example embodiment of the present disclosure may include a memory for storing at least a portion of a plurality of pieces of speech information used for voice recognition, and a control module for generating voice recognition information based on at least a portion of the plurality of pieces of speech information, wherein the control module may select speaker speech information from at least a portion of the plurality of pieces of speech information based on mutual similarity, and may generate the voice recognition information to be registered as personalized voice information based on the speaker speech information.
  • According to various example embodiments of the present disclosure, the control module may be configured so that a message for applying the voice recognition information to the voice recognition is output.
  • According to various example embodiments of the present disclosure, the control module may be configured so that the pieces of speech information are collected for a specified time or until a specified number of the pieces of speech information is satisfied.
  • According to various example embodiments of the present disclosure, the control module may be configured to generate multi-condition training models of the plurality of pieces of speech information, and may use the multi-condition training models to determine the voice recognition information to be registered as the personalized voice information.
  • According to various example embodiments of the present disclosure, the control module may be configured to generate multi-condition training models of pieces of the speaker speech information, and may use the multi-condition training models to determine the voice recognition information to be registered as the personalized voice information.
  • According to various example embodiments of the present disclosure, the control module may be configured so that other speech information input from a specific speaker corresponding to the personalized voice information is collected and a model of the personalized voice information is adapted.
  • According to various example embodiments of the present disclosure, the control module may be configured so that a phonemic sample corresponding to a registered phonemic model included in the personalized voice information is extracted from the speech information input from the specific speaker and is used to adapt the registered phonemic model.
  • According to various example embodiments of the present disclosure, in the example where new speech information newly input is not a speech of the specific speaker corresponding to the personalized voice information, the control module may be configured so that a message of unavailability of function execution based on the new speech information is output or may selectively control the function execution based on the type of a function requested by the new speech information.
• According to various example embodiments of the present disclosure, the control module may be configured so that the function is not performed if the function is a specified secure function, or the function is performed if the function is a non-secure function that is not specified.
  • According to various example embodiments of the present disclosure, the control module may be configured so that a setting screen is output for setting at least one function item to be executed based on a voice function in response to a speech information input from a speaker specified based on the personalized voice information.
  • As described above, according to various example embodiments of the present disclosure, an electronic device according to an example embodiment of the present disclosure may include a memory for storing voice data information including pieces of speech information as candidate data, and a control module configured so that one piece of speaker-related information is selected from the candidate data, wherein the control module may be configured so that the candidate data are clustered based on mutual similarity, and specified personalized voice information is registered to be used to restrict execution of a function based on whether specified speech information is input, based on candidate data with the same similarity, the number of which is relatively large.
• FIG. 6 is a flowchart illustrating an example method of setting and applying a personalized voice function during operation of a voice function according to various example embodiments of the present disclosure.
• Referring to FIG. 6, in operation 601, if an event occurs, the control module 160 of the electronic device 100 may be configured to determine whether the event is related to setting of a personalized voice function. For example, the control module 160 may be configured to determine whether the event is for executing a specified function for personalized voice, is related to automatic execution of a personalized voice function, or is for executing a specified function such as a voice recognition function.
  • If the event is not related to setting of the personalized voice function, the control module 160 may be configured to control execution of a function based on the type of the event that has occurred in operation 603. For example, the control module 160 may check the type of the event, and may handle playback of a music file, transfer of a specified file, execution of a call function, or execution of a web access function based on the type of the event.
• If the event is related to setting of the personalized voice function, the control module 160 may collect candidate data as the voice data information 131 in operation 605. In relation to this operation, the control module 160 may enable the microphone module 140 if the electronic device 100 is in a turned-on state or at a specified time. The control module 160 may collect a specified number of candidate data at a specified period, or in real time, or when an audio signal having a specified intensity or higher occurs. According to an example embodiment of the present disclosure, the control module 160 may be configured to perform a candidate group collecting operation until the number of candidate data becomes a specified number. According to an example embodiment of the present disclosure, the control module 160 may be configured to automatically enable the microphone module 140 for a specified time (e.g., one hour, one day, one week, one month, or the like) after the electronic device 100 is purchased, so as to collect candidate data. Alternatively, the control module 160 may be configured to collect candidate data until a specified number of candidate data are collected or for a specified time, when a voice function (e.g., a call function, a voice recognition function, a recording function, a voice command function, or the like) is operated.
  • In operation 607, the control module 160 may be configured to process the voice data information 131 and may extract the personalized voice information 133. For example, the control module 160 may be configured to extract clusters including candidate data spoken by the same speaker by performing comparison between collected pieces of the voice data information 131 with a temporary model and performing clustering of the collected pieces of the voice data information 131. The control module 160 may be configured to compare data of the extracted clusters so as to extract candidate data of a cluster having a largest number of data and register the extracted candidate data as the personalized voice information 133.
  • In operation 609, the control module 160 may be configured to handle application of personalized voice information. If the personalized voice information 133 is registered, the control module 160 may be configured to compare speaker speech information input thereafter with data of the personalized voice information 133 to check similarity therebetween. Furthermore, if the similarity satisfies a specified condition (e.g., a similarity degree is equal to or higher than a specified value), the control module 160 may recognize the input speech information as speech information of a specific speaker. If it is determined that the input speech information is the speech information of the specific speaker, the control module 160 may be configured to control a voice function for the speech information. For example, the control module 160 may perform voice recognition on the speech information, and may control execution of a specified function based on a voice recognition result. Alternatively, the control module 160 may support at least one of retrieval and output of internal information of the electronic device 100 with respect to the voice recognition result or retrieval and output of information using an external server device in relation to the voice recognition result.
  • If the input speech information is not the speech information of the specific speaker, the control module 160 may be configured to output a guide text for notifying that a speaker of the input speech information is not the specific speaker, or may support execution of a specified function according to a user's setting or a set policy. For example, the control module 160 may perform retrieval and output of information related to the result of voice recognition from the speech information using an external server device. Alternatively, in the case where the speaker of the input speech information is not the specific speaker, the control module 160 may be configured to check the type of information or the type of a function to be performed by the speech information based on the user's setting or policy, and may restrictively or selectively perform function execution or information output.
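• Operation 609 and the restrictive handling above reduce to a similarity test against the registered model. The sketch below assumes fixed-length speaker embeddings and cosine similarity, neither of which is mandated by the disclosure; the threshold value is illustrative:

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.8  # illustrative; the disclosure only says "specified value"

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def handle_command(embedding: np.ndarray, owner_model: np.ndarray, command: str) -> str:
    """Run the command for the registered speaker; otherwise restrict it."""
    if cosine_similarity(embedding, owner_model) >= SIMILARITY_THRESHOLD:
        return f"executing: {command}"
    return "speaker not recognized: function execution is restricted"

owner = np.array([0.1, 0.9, 0.3])
print(handle_command(np.array([0.12, 0.88, 0.31]), owner, "read my schedule"))  # executes
print(handle_command(np.array([0.9, -0.2, 0.1]), owner, "read my schedule"))    # restricted
```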
  • FIG. 7 is a flowchart illustrating an example personalized voice information update method according to various example embodiments of the present disclosure.
  • Referring to FIG. 7, in operation 701, the control module 160 may be configured to determine whether a personalized voice function is currently executed or an event that has occurred is related to execution of the personalized voice function. If the personalized voice function is not currently executed or there is no occurrence of the related event, the control module 160 may support execution of a specified function or control of a specified state in operation 703. For example, the control module 160 may support a camera function or a music playback function according to the type of the event. Alternatively, the control module 160 may maintain a sleep mode.
  • If there is a setting (e.g., a setting for automatically supporting an always-on state) related to execution of the personalized voice information or an event (e.g., an event of requesting enablement of the microphone module 140 in relation to execution of the personalized voice function) occurs, the control module 160 may be configured to collect adaptation (or enhancement) information in operation 705. For example, the control module 160 may be configured to enable the microphone module 140 and may collect speech information having a specified length or longer or speech information corresponding to specified speech reference information.
  • In operation 707, the control module 160 may be configured to perform personalized voice information adaptation. According to an example embodiment of the present disclosure, the control module 160 may be configured to collect phonemic models from various information spoken by a specific speaker, and may store or integrate the collected models in association with phonemic models having the same reference phonemes as those of phonemic models registered as the personalized voice information 133. Alternatively, the control module 160 may be configured to collect only speech information corresponding to the specified speech reference information, and may manage phonemic models corresponding to the same reference phonemes in the collected speech information by integrating the phonemic models into one model group.
• In operation 709, the control module 160 may determine whether an adaptation ratio (or an adaptation degree or an enhancement ratio) satisfies a specified condition. For example, the control module 160 may be configured to check the degree of similarity between the phonemic models in the collected speech information and phonemic models being managed and an information update ratio based on the degree of similarity, and may specify the adaptation ratio based on the update ratio or update frequency. If the adaptation ratio does not satisfy the specified condition, the process may return to operation 701 so that the control module 160 may re-perform operation 701 and the following operations. If the adaptation ratio satisfies the specified condition, the control module 160 may end a personalized voice information adaptation function.
  • FIG. 8 is a diagram illustrating an example of a screen interface related to execution of a personalized voice function according to various example embodiments of the present disclosure.
  • Referring to FIG. 8, the control module 160 of the electronic device 100 may be configured to output, to the display 150, a screen corresponding to activation of a voice function (e.g., a voice command function) as illustrated in a screen 801. In this operation, if a personalization function is not currently applied, the control module 160 may output a guide message 811 for providing a notification that the personalization function is being set. The guide message 811 may include at least one of a text or an image for notifying that candidate group information is being collected in relation to setting of the personalized voice function. Output of the guide message 811 may be skipped based on a setting or a user input. Alternatively, as illustrated in FIG. 8, the control module 160 may output, to a specified area (e.g., an indicator area), a first indicator 810 for notifying that the personalization function is being set.
• If a specific audio signal is input while the personalization function is being set, the control module 160 may be configured to determine whether a collected audio signal is speech information corresponding to a voice by checking a frequency band of the audio signal. If the audio signal is the speech information, the control module 160 may collect it as the voice data information 131. Alternatively, even if the audio signal is the speech information, the control module 160 may determine whether a specified condition (e.g., speech information having at least a certain length or speech information corresponding to specified speech reference information) is satisfied. The control module 160 may be configured to collect pieces of the speech information satisfying the specified condition as the voice data information 131. According to various example embodiments of the present disclosure, the control module 160 may collect an audio signal as the voice data information 131, or may collect, as the voice data information 131, only an audio signal whose signal existence state is maintained for at least a certain length. Furthermore, if the voice data information 131 is collected in a specified amount or for a specified time, the control module 160 may evaluate the collected voice data information 131 with respect to division of speech information or correspondence to speech reference information.
  • If the personalized voice information 133 is registered since collecting and processing of the voice data information 131 for executing a personalized voice function are completed, the control module 160 may output a guide message 831 for notifying that the personalization function is being applied, as illustrated in a screen 803. The guide message 831 may include at least one of a text or an image indicating that the personalized voice function is being applied. Output of the guide message 831 may be skipped based on a setting or a user control input. Alternatively, the control module 160 may output, to a specified area (e.g., an indicator area), a second indicator 830 for notifying that the personalized voice function is being applied.
• The control module 160 may be configured to perform training for voice modeling after sufficient voice samples are obtained for a specified time, or after a specified number or a specified amount of voice samples are obtained. If, for example, it is determined that sufficient speaker recognition performance has been obtained because the training result reaches a specified level (e.g., at least a specified sample number or specified reliability), the control module 160 may provide, to a user, a recommendation or selection message inducing the user to use a personalized voice recognition function. In this operation, the control module 160 may request the user's approval (e.g., confirmation via a popup message output) for updating a model.
• If an audio signal is input while the personalization function is applied, the control module 160 may analyze the input audio signal. Based on a result of audio signal analysis, the control module 160 may support function execution or restrictive function execution. For example, if a first voice command 820 is collected, the control module 160 may analyze the first voice command 820 and may classify it as a request for non-secure function execution. According to an example embodiment of the present disclosure, in the case where the analyzed first voice command 820 includes a non-specified word (e.g., weather, news, bus information, etc.), the control module 160 may classify the first voice command 820 as a request for non-secure function execution. Alternatively, in the case where a voice command includes a specified word (e.g., cost, card, mail, message, call history, etc.), the control module 160 may classify the voice command as a request for secure function execution. Alternatively, the control module 160 may determine whether the type of an application to be executed by the first voice command 820 corresponds to a secure function or a non-secure function. In relation to this operation, the electronic device 100 may include classification information on a secure function or a non-secure function for each application type.
  • If the first voice command 820 for a non-secure function or a function not specified by a user as a speaker-dependent function is collected, the control module 160 may collect and output information for the first voice command 820. For example, as illustrated in a screen 805, the control module 160 may output first voice recognition information 851 corresponding to the first voice command 820, and may output first execution information 853 as a result of performing a function or retrieval corresponding to the first voice recognition information 851.
  • If a second voice command 840 is collected while the personalized voice function is applied, the control module 160 may perform speaker analysis (e.g., comparison with the personalized voice information 133) on the second voice command 840, and may process the second voice command 840 only if analyzed speaker information indicates a registered speaker. For example, if it is determined that a speaker indicated as a result of analysis is not a registered speaker, the control module 160 may output a message related to unavailability of processing the second voice command 840.
• According to various example embodiments of the present disclosure, the control module 160 may evaluate the collected second voice command 840, and may determine whether the second voice command 840 is related to a secure function or a function specified as a speaker-dependent function. If the second voice command 840 is related to a non-secure function or a function not specified as a speaker-dependent function, the control module 160 may handle execution of a function based on the second voice command 840 without additionally checking the personalized voice information 133. Alternatively, if the second voice command 840 is related to a secure function or a speaker-dependent function, the control module 160 may identify a speaker of the second voice command 840 using the personalized voice information 133. Furthermore, if it is determined that the speaker of the second voice command 840 is a specific speaker, the control module 160 may execute a function corresponding to the second voice command 840. If the second voice command 840 is not speech information input from a specific speaker, the control module 160 may output, in response to the second voice command 840, a restrictive message 873 indicating a user identification failure or unavailability of function execution. For example, the control module 160 may selectively output second voice recognition information 871 for the second voice command 840.
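• The secure/non-secure gating described for the first and second voice commands might look like the following sketch. The keyword list reuses the examples above but is otherwise hypothetical, as is the policy of verifying the speaker only for secure functions:

```python
# Illustrative word list only; the disclosure gives examples but no fixed vocabulary.
SECURE_WORDS = {"cost", "card", "mail", "message", "call history"}

def classify_command(text: str) -> str:
    """Keyword-based stand-in for the secure/non-secure classification."""
    return "secure" if any(w in text.lower() for w in SECURE_WORDS) else "non-secure"

def dispatch(text: str, speaker_is_registered: bool) -> str:
    """Verify the speaker only when the command touches a secure function."""
    if classify_command(text) == "secure" and not speaker_is_registered:
        return "restricted: secure functions require the registered speaker"
    return f"executing: {text}"

print(dispatch("What's the weather?", speaker_is_registered=False))  # executes
print(dispatch("Read my mail", speaker_is_registered=False))         # restricted
```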
  • FIG. 9 is a diagram illustrating an example of a screen interface related to setting of personalized voice information according to various example embodiments of the present disclosure.
  • Referring to FIG. 9, if an event related to voice function setting occurs, the control module 160 of the electronic device 100 may output, to the display 150, a setting screen as illustrated in a screen 901. The setting screen may include items related to voice function setting, such as an external server use item, a personalization function operation item, and a voice output item. As illustrated in FIG. 9, a virtual reset button 911 may be assigned to the personalization function operation item in relation to personalization function setting or application.
  • If the virtual reset button 911 is selected, the control module 160 may support resetting of the voice data information 131 or the personalized voice information 133 obtained in relation to personalization function setting or application. In relation to this operation, the control module 160 may output, to the display 150, a popup window 931 related to initialization as illustrated in a screen 903. The popup window 931 may include, for example, a message for providing a guide on initialization and an authentication information input area for user authentication.
  • In screen 901, if a menu item 913 is selected in relation to personalization function operation, the control module 160 may output a menu screen related to personalization function operation as illustrated in a screen 905. The menu screen may include, for example, items for selecting at least one application to which a personalized voice function is to be applied. For example, the menu screen may include an entire function item 951, a password-set function item 953, and a user customized item 955.
  • The entire function item 951 may be a restrictive item for allowing only a specific speaker to use, through a voice function, all functions supported by applications installed in the electronic device 100. In the case where the entire function item 951 is not set, the electronic device 100 may operate a voice function based on speech information of various users without specifying a speaker.
• The password-set function item 953 may be a restrictive item for allowing function items related to a secure function to be used based on a voice function and speech information of a specific speaker. According to an example embodiment of the present disclosure, when the password-set function item 953 is selected, the electronic device 100 may provide items of functions that, according to user designation, require password authentication when operated, or items of applications that require password authentication to operate, from among the provided applications. A specific function may be excluded from the password-set function item 953 when a password set in an application is released.
• The user customized item 955 may enable a user to specify an application item to be used based on a voice function and speech information of a specific speaker. If the user customized item 955 is selected, the electronic device 100 may output a list of applications supported by the electronic device 100. Here, the electronic device 100 may automatically remove the password-set function item 953 from the list displayed for the user customized item 955.
  • As described above, according to various example embodiments of the present disclosure, a voice function operating method according to an example embodiment of the present disclosure may include storing at least a portion of a plurality of pieces of speech information used for voice recognition, selecting speaker speech information from at least a portion of the plurality of pieces of speech information based on mutual similarity, and generating voice recognition information to be registered as personalized voice information based on the speaker speech information selected.
• According to various example embodiments of the present disclosure, the method may further include at least one of collecting the speech information for a specified time or collecting the speech information until a specified number of candidate data is collected.
  • According to various example embodiments of the present disclosure, the method may further include outputting a message for applying the voice recognition information to the voice recognition.
  • According to various example embodiments of the present disclosure, the method may further include generating multi-condition training models of the plurality of pieces of speech information, and applying the multi-condition training models to determine the voice recognition information to be registered as the personalized voice information.
  • According to various example embodiments of the present disclosure, the generating may include generating multi-condition training models of pieces of the speaker speech information, and applying the multi-condition training models to determine the voice recognition information to be registered as the personalized voice information.
  • According to various example embodiments of the present disclosure, the method may further include collecting other speech information input from a specific speaker corresponding to the personalized voice information, and adapting a model of the personalized voice information using the other speech information of the specific speaker.
  • According to various example embodiments of the present disclosure, the adapting may include extracting a phonemic sample corresponding to a registered phonemic model included in the personalized voice information from the speech information input from the specific speaker to use the phonemic sample in adapting the registered phonemic model.
• According to various example embodiments of the present disclosure, the method may further include outputting, if newly input speech information is not a speech of the specific speaker corresponding to the personalized voice information, a message of unavailability of execution of a function according to the new speech information, and selectively executing the function according to the type of the function requested by the new speech information.
• According to various example embodiments of the present disclosure, the executing the function may include not performing the function if the function is a specified secure function and performing the function if the function is a non-secure function that is not specified.
  • According to various example embodiments of the present disclosure, the method may further include outputting a setting screen for setting at least one function item to be executed based on a voice function in response to a speech information input from a speaker specified based on the personalized voice information.
  • As described above, according to various example embodiments of the present disclosure, a voice function operating method according to an example embodiment of the present disclosure may include collecting pieces of speech information as candidate data, clustering the candidate data based on mutual similarity, and registering specified personalized voice information to be used to restrict execution of a function based on whether specified speech information is input, based on candidate data with the same similarity, the number of which is relatively large.
  • FIG. 10 is a block diagram illustrating an example of an electronic device according to various example embodiments of the present disclosure.
  • Referring to FIG. 10, the electronic device 100 may include a control module (e.g., including a processor including processing circuitry) 1060 and a microphone module (e.g., including at least one microphone) 1040.
  • The microphone module 1040 may include, for example, first to Nth microphones 40_1 to 40_N. The first to Nth microphones 40_1 to 40_N may be connected to, for example, the control module 1060. The first to Nth microphones 40_1 to 40_N may be arranged at one side of the electronic device 100 so as to be spaced apart from each other by a certain distance.
• The control module 1060 may control at least one of the microphones included in the microphone module 1040. For example, at a time of setting a personalized voice function, the control module 1060 may enable the first microphone 40_1 and may analyze an audio signal collected by the first microphone 40_1. Furthermore, the control module 1060 may use audio signals collected through the first microphone 40_1 as the voice data information 131. The control module 1060 may also collect pieces of speech information corresponding to the voice data information 131 using the first to Nth microphones 40_1 to 40_N. Alternatively, the control module 1060 may use the first microphone 40_1 alone to collect the voice data information 131, and may use the first to Nth microphones 40_1 to 40_N to adapt (or enhance) the personalized voice information 133.
  • In the example where the microphone module 1040 is required to be maintained in a turned-on state as an always-on function is executed, the electronic device 100 may enable the first microphone 40_1 and may check whether speech information corresponding to specified speech reference information (e.g., “hi galaxy”) is collected. The electronic device 100 may use, for adapting the personalized voice information 133, additional speech information collected in a state in which the other microphones are enabled after the speech information corresponding to the speech reference information is collected. In this operation, the electronic device 100 may support execution of a voice function according to the speech information collected by the microphones 40_1 to 40_N.
  • In a state in which a personalized voice function is not applied, the control module 1060 may support a voice function using the first microphone 40_1 alone. Furthermore, in a state in which the personalized voice function is applied, the control module 1060 may detect speech information corresponding to the speech reference information using the first microphone 40_1, and may collect additional speech information using the microphones 40_1 to 40_N.
  • Alternatively, in the state in which the personalized voice function is not applied, the control module 1060 may collect speech information and may perform analysis on whether the collected speech information corresponds to the speech reference information using the first microphone 40_1 alone. In the state in which the personalized voice function is applied, the control module 1060 may detect speech information corresponding to the speech reference information using a plurality of microphones (e.g., the first and second microphones 40_1 and 40_2). Furthermore, in the state in which the personalized voice function is applied, the control module 1060 may enable the first to Nth microphones 40_1 to 40_N to control collection of additional speech information, if speech information corresponding to the speech reference information is collected.
• As described above, the electronic device 100 may control operation of the microphones 40_1 to 40_N in consideration of efficient use of power or in order to collect clearer speech information.
  • FIG. 11 is a block diagram illustrating another example of an electronic device according to various example embodiments of the present disclosure.
  • Referring to FIG. 11, the electronic device 100 may include a control module (e.g., including a processor including processing circuitry) 1160 and a microphone module (e.g., including at least one microphone) 1040.
• The microphone module 1040 may include first to Nth microphones 40_1 to 40_N in a similar manner to that described above with reference to FIG. 10. The plurality of microphones 40_1 to 40_N may be connected to the control module 1160. For example, the first microphone 40_1 from among the plurality of microphones 40_1 to 40_N may be connected to a low-power processing module 1163. The Nth microphone 40_N from among the plurality of microphones 40_1 to 40_N may be connected to a main control module 1161. Meanwhile, the second to Nth microphones 40_2 to 40_N may be connected to both the low-power processing module 1163 and the main control module 1161. Furthermore, the first microphone 40_1 may be connected to not only the low-power processing module 1163 but also the main control module 1161. Accordingly, the first microphone 40_1 may transfer a collected audio signal to the low-power processing module 1163, or, if the main control module 1161 is in a woken state, the first microphone 40_1 may transfer the collected audio signal to the main control module 1161 or to both the low-power processing module 1163 and the main control module 1161.
  • The control module 1160 may include the main control module 1161 and the low-power processing module 1163.
• The low-power processing module 1163 may, for example, be a processor (e.g., including processing circuitry) driven with relatively low power compared to the main control module 1161. For example, the low-power processing module 1163 may be a chip dedicated to audio signal processing, a sensor hub, or a chip dedicated to speech information processing. The low-power processing module 1163 may be independently driven while the main control module 1161 is in a sleep mode, so as to control driving of the first microphone 40_1 included in the microphone module 1040 and analyze an audio signal collected by the first microphone 40_1. For example, the low-power processing module 1163 may analyze whether the audio signal collected by the first microphone 40_1 is speech information corresponding to a voice, is speech information corresponding to specified speech reference information, or is speech information spoken by a specific speaker. If the speech information satisfies a specified condition, the low-power processing module 1163 may wake the main control module 1161. In this operation, the low-power processing module 1163 may perform control so that the second to Nth microphones 40_2 to 40_N, which are in a disabled state, are enabled.
• In operation of a voice function, the main control module 1161 may be woken by the low-power processing module 1163 after remaining in a sleep mode in consideration of efficient use of power. In this example, the main control module 1161 may enable the second to Nth microphones 40_2 to 40_N, and may collect and analyze additional speech information. The main control module 1161 may control collection of the voice data information 131 for collected pieces of speech information, registration of the personalized voice information 133, and restrictive execution of a voice function according to application of a personalized voice function, as described above with respect to the control module 160.
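• The division of labor between the low-power processing module and the main control module can be summarized in a small state sketch. The keyword string follows the "hi galaxy" example above; the class and method names are hypothetical, and a transcript stands in for the low-power module's acoustic keyword detector:

```python
class LowPowerVoicePipeline:
    """Sketch of the FIG. 11 arrangement: one always-on microphone feeds a
    low-power keyword check; a match wakes the main module and enables the rest."""
    def __init__(self, keyword: str = "hi galaxy"):
        self.keyword = keyword
        self.main_awake = False          # main control module starts in sleep mode
        self.extra_mics_enabled = False  # second to Nth microphones start disabled

    def on_audio_from_first_mic(self, transcript: str) -> None:
        # Stand-in for the low-power module's keyword detector.
        if self.keyword in transcript.lower() and not self.main_awake:
            self.wake_main_module()

    def wake_main_module(self) -> None:
        self.main_awake = True           # main control module leaves sleep mode
        self.extra_mics_enabled = True   # remaining microphones are enabled

pipeline = LowPowerVoicePipeline()
pipeline.on_audio_from_first_mic("hi galaxy, what's the weather?")
print(pipeline.main_awake, pipeline.extra_mics_enabled)  # True True
```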
  • The term “module” used herein may represent, for example, a unit including one of hardware (including hardware circuitry), software and firmware or a combination thereof. The term “module” may be interchangeably used with the terms “unit”, “logic”, “logical block”, “component” and “circuit”. The “module” may be a minimum unit of an integrated component or may be a part thereof. The “module” may be a minimum unit for performing one or more functions or a part thereof. The “module” may be implemented mechanically or electronically. For example, the “module” may include at least one of processing circuitry, hardware circuitry, firmware, an application-specific integrated circuit (ASIC) chip, a field-programmable gate array (FPGA), and a programmable-logic device for performing some operations, which are known or will be developed.
  • At least a part of devices (e.g., modules or functions thereof) or methods (e.g., operations) according to various example embodiments of the present disclosure may be implemented as instructions stored in a computer-readable storage medium in the form of a program module.
  • The module or program module according to various example embodiments of the present disclosure may include at least one of the above-mentioned elements, or some elements may be omitted or other additional elements may be added. Operations performed by the module, the program module or other elements according to various example embodiments of the present disclosure may be performed in a sequential, parallel, iterative or heuristic way. Furthermore, some operations may be performed in another order or may be omitted, or other operations may be added.
  • According to various example embodiments of the present disclosure, the type of voice function that may be operated for each speaker or the type of an application executable by voice recognition may be handled in a speaker-dependent manner.
• Therefore, according to various example embodiments of the present disclosure, the security of a voice function of an electronic device may be enhanced.
  • The above example embodiments of the present disclosure are illustrative and not limitative. Various alternatives and equivalents are possible. Other additions, subtractions, or modifications will be apparent in view of the present disclosure and are intended to fall within the scope of the appended claims.

Claims (20)

What is claimed is:
1. An electronic device comprising:
a memory configured to store a plurality of pieces of speech information used for voice recognition; and
a processor including processing circuitry, the processor functionally connected to the memory,
wherein the processor is configured to select speaker speech information from at least portions of the plurality of pieces of speech information based on mutual similarity, and to generate voice recognition information to be registered as personalized voice information based on the speaker speech information.
2. The electronic device of claim 1, wherein the processor is configured to output a message providing a notification that an operation of applying the voice recognition information to voice recognition is being performed.
3. The electronic device of claim 1, wherein the processor is configured to collect the pieces of speech information for at least one of a specified time or until a specified number of the pieces of speech information are collected.
4. The electronic device of claim 1, wherein the processor is configured to generate multi-condition training models to which at least one of at least a part of a noise or a specified sound effect is applied to the plurality of pieces of speech information, and to use the multi-condition training models to determine the voice recognition information to be registered as the personalized voice information.
5. The electronic device of claim 1, wherein the processor is configured to generate multi-condition training models to which at least one of a noise or a specified sound effect is applied to pieces of the speaker speech information, and to determine, based on the multi-condition training models, the voice recognition information to be registered as the personalized voice information.
6. The electronic device of claim 1, wherein the processor is configured to collect other speech information input by a specific speaker corresponding to the personalized voice information and to adapt a model of the personalized voice information.
7. The electronic device of claim 6, wherein the processor is configured to extract a phonemic sample corresponding to a registered phonemic model included in the personalized voice information from the speech information input from the specific speaker, and to adapt the registered phonemic model using the phonemic sample.
8. The electronic device of claim 1, wherein, when newly input speech information is not a speech of a specific speaker corresponding to the personalized voice information, the processor is configured to output a message indicating that a function requested by the new speech information cannot be executed, or to selectively execute the function based on a type of the function requested by the new speech information.
9. The electronic device of claim 8, wherein the processor is configured to not perform the function if the function is a specified secure function or to perform the function if the function is a non-secure function.
10. The electronic device of claim 1, wherein the processor is configured to output a setting screen for setting at least one function item to be executed based on a voice function in response to speech information input from a speaker specified based on the personalized voice information.
11. A voice function operating method comprising:
storing a plurality of pieces of speech information used for voice recognition;
selecting speaker speech information from at least portions of the plurality of pieces of speech information based on mutual similarity; and
generating voice recognition information to be registered as personalized voice information based on the speaker speech information selected.
12. The method of claim 11, further comprising at least one of:
collecting the speech information for a specified time; or
collecting the speech information until a specified number of pieces of candidate data are collected.
13. The method of claim 11, further comprising outputting a message providing a notification that an operation of applying the voice recognition information to the voice recognition is being performed.
14. The method of claim 11, further comprising:
generating multi-condition training models by applying at least one of at least a part of a noise or a specified sound effect to the plurality of pieces of speech information; and
applying the multi-condition training models to determine the voice recognition information to be registered as the personalized voice information.
15. The method of claim 11, wherein the generating comprises:
generating multi-condition training models by applying at least a part of a noise or a specified sound effect to pieces of the speaker speech information; and
applying the multi-condition training models to determine the voice recognition information to be registered as the personalized voice information.
16. The method of claim 11, further comprising:
collecting other speech information input by a specific speaker corresponding to the personalized voice information; and
adapting a model of the personalized voice information using the other speech information of the specific speaker.
17. The method of claim 16, wherein the adapting comprises extracting, from the speech information input from the specific speaker, a phonemic sample corresponding to a registered phonemic model included in the personalized voice information, and using the phonemic sample to adapt the registered phonemic model.
18. The method of claim 11, further comprising:
outputting, if new speech information requesting a function is not a speech of the specific speaker corresponding to the personalized voice information, a message indicating that execution of the function requested by the new speech information is unavailable; and
selectively executing the function based on a type of the function requested by the new speech information.
19. The method of claim 18, wherein the selectively executing comprises:
not performing the function if the function is a specified secure function; and
performing the function if the function is a non-secure function that is not so specified.
20. The method of claim 11, further comprising at least one of:
outputting a setting screen for setting at least one function item to be executed based on a voice function in response to speech information input by a speaker specified based on the personalized voice information; or
outputting the generated voice recognition information.
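As an editorial illustration of claims 1 and 11, the selection of speaker speech information "based on mutual similarity" might be realized roughly as follows. This is a minimal sketch assuming each collected utterance has already been reduced to a fixed-length feature vector (e.g., averaged MFCCs); the cosine metric, the mean-similarity score, and the 0.75 threshold are assumptions, not values from the disclosure.

```python
# Hedged sketch: keep the utterances that are mutually most similar,
# treating them as speech from the (presumed) main speaker.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two utterance feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_speaker_speech(utterances: list, threshold: float = 0.75) -> list:
    """Return indices of utterances whose mean similarity to the
    other utterances clears the threshold."""
    n = len(utterances)
    if n < 2:
        return list(range(n))
    sim = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                sim[i, j] = cosine_similarity(utterances[i], utterances[j])
    scores = sim.sum(axis=1) / (n - 1)
    return [i for i in range(n) if scores[i] >= threshold]
```

The retained utterances would then feed the generation of the voice recognition information to be registered as the personalized voice information.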
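Claims 4-5 and 14-15 describe "multi-condition training models" built by applying a noise or a specified sound effect to the collected speech. A hedged sketch of producing such multi-condition training data follows; the SNR levels are arbitrary illustrative choices, and mixing at a target SNR stands in for whatever effects an implementation actually applies.

```python
# Hedged sketch: derive noisy copies of each utterance so that one
# training model can be built per acoustic condition.
import numpy as np

def mix_noise(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix a noise signal into speech at a target signal-to-noise ratio (dB)."""
    noise = np.resize(noise, speech.shape)      # repeat/trim noise to length
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12   # avoid division by zero
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

def multi_condition_set(utterances, noise, snrs_db=(5.0, 10.0, 20.0)):
    """Yield (condition, waveform) pairs: the clean utterance plus one
    noisy copy per SNR level."""
    for u in utterances:
        yield "clean", u
        for snr in snrs_db:
            yield f"noise@{snr:g}dB", mix_noise(u, noise, snr)
```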
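Claims 6-7 and 16-17 recite adapting a registered phonemic model using phonemic samples extracted from the specific speaker's later speech. Below is a rough sketch under stated assumptions: phoneme models are represented as mean feature vectors, the samples are assumed to have been aligned to phonemes by some external step (e.g., forced alignment), and the interpolation weight is illustrative.

```python
# Hedged sketch: fold newly collected phoneme samples into the
# registered phoneme models (a simple MAP-style mean update).
import numpy as np

def adapt_phonemic_model(model_mean: np.ndarray, samples: list,
                         weight: float = 0.1) -> np.ndarray:
    """Shift a registered phoneme mean toward the new samples."""
    if not samples:
        return model_mean
    sample_mean = np.mean(np.stack(samples), axis=0)
    return (1.0 - weight) * model_mean + weight * sample_mean

def adapt_personalized_voice(models: dict, aligned_frames: dict) -> dict:
    """models: phoneme -> registered mean feature vector.
    aligned_frames: phoneme -> feature vectors extracted from the
    specific speaker's new speech (alignment assumed done elsewhere)."""
    return {ph: adapt_phonemic_model(mean, aligned_frames.get(ph, []))
            for ph, mean in models.items()}
```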
US15/017,957 2015-02-11 2016-02-08 Operating method for voice function and electronic device supporting the same Abandoned US20160232893A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/998,997 US10733978B2 (en) 2015-02-11 2018-08-20 Operating method for voice function and electronic device supporting the same

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2015-0020786 2015-02-11
KR1020150020786A KR102371697B1 (en) 2015-02-11 2015-02-11 Operating Method for Voice function and electronic device supporting the same

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/998,997 Continuation US10733978B2 (en) 2015-02-11 2018-08-20 Operating method for voice function and electronic device supporting the same

Publications (1)

Publication Number Publication Date
US20160232893A1 true US20160232893A1 (en) 2016-08-11

Family

ID=55349744

Family Applications (2)

Application Number Title Priority Date Filing Date
US15/017,957 Abandoned US20160232893A1 (en) 2015-02-11 2016-02-08 Operating method for voice function and electronic device supporting the same
US15/998,997 Active US10733978B2 (en) 2015-02-11 2018-08-20 Operating method for voice function and electronic device supporting the same

Family Applications After (1)

Application Number Title Priority Date Filing Date
US15/998,997 Active US10733978B2 (en) 2015-02-11 2018-08-20 Operating method for voice function and electronic device supporting the same

Country Status (5)

Country Link
US (2) US20160232893A1 (en)
EP (1) EP3057093B1 (en)
KR (1) KR102371697B1 (en)
CN (1) CN107210040B (en)
WO (1) WO2016129930A1 (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10271093B1 (en) * 2016-06-27 2019-04-23 Amazon Technologies, Inc. Systems and methods for routing content to an associated output device
CN107147618B (en) 2017-04-10 2020-05-15 易视星空科技无锡有限公司 User registration method and device and electronic equipment
KR101995443B1 (en) * 2017-07-26 2019-07-02 네이버 주식회사 Method for verifying speaker and system for recognizing speech
CN110709924B (en) * 2017-11-22 2024-01-09 谷歌有限责任公司 Audio-visual speech separation
CN108022584A (en) * 2017-11-29 2018-05-11 芜湖星途机器人科技有限公司 Office voice recognition optimization method
KR102629424B1 (en) * 2018-01-25 2024-01-25 삼성전자주식회사 Application processor including low power voice trigger system with security, electronic device including the same and method of operating the same
US10984795B2 (en) * 2018-04-12 2021-04-20 Samsung Electronics Co., Ltd. Electronic apparatus and operation method thereof
WO2020040775A1 (en) * 2018-08-23 2020-02-27 Google Llc Regulating assistant responsiveness according to characteristics of a multi-assistant environment
CN109065023A (en) * 2018-08-23 2018-12-21 广州势必可赢网络科技有限公司 Voice recognition method, apparatus, device, and computer-readable storage medium
KR102623246B1 (en) * 2018-10-12 2024-01-11 삼성전자주식회사 Electronic apparatus, controlling method of electronic apparatus and computer readable medium
US11430448B2 (en) * 2018-11-22 2022-08-30 Samsung Electronics Co., Ltd. Apparatus for classifying speakers using a feature map and method for operating the same
CN110706706A (en) * 2019-11-01 2020-01-17 北京声智科技有限公司 Voice recognition method, device, server and storage medium
KR102392318B1 (en) * 2022-01-17 2022-05-02 주식회사 하이 A technique for identifying a dementia based on mixed tests

Family Cites Families (42)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE3931638A1 (en) * 1989-09-22 1991-04-04 Standard Elektrik Lorenz Ag METHOD FOR SPEAKER ADAPTIVE RECOGNITION OF LANGUAGE
JP3014177B2 (en) * 1991-08-08 2000-02-28 富士通株式会社 Speaker adaptive speech recognition device
JPH07113838B2 (en) * 1991-12-20 1995-12-06 松下電器産業株式会社 Speech recognition method
US5895447A (en) * 1996-02-02 1999-04-20 International Business Machines Corporation Speech recognition using thresholded speaker class model selection or model adaptation
US5842165A (en) * 1996-02-29 1998-11-24 Nynex Science & Technology, Inc. Methods and apparatus for generating and using garbage models for speaker dependent speech recognition purposes
JP2991288B2 (en) * 1997-01-30 1999-12-20 日本電気株式会社 Speaker recognition device
US6014624A (en) * 1997-04-18 2000-01-11 Nynex Science And Technology, Inc. Method and apparatus for transitioning from one voice recognition system to another
US6389393B1 (en) * 1998-04-28 2002-05-14 Texas Instruments Incorporated Method of adapting speech recognition models for speaker, microphone, and noisy environment
US6487530B1 (en) * 1999-03-30 2002-11-26 Nortel Networks Limited Method for recognizing non-standard and standard speech by speaker independent and speaker dependent word models
US6374221B1 (en) * 1999-06-22 2002-04-16 Lucent Technologies Inc. Automatic retraining of a speech recognizer while using reliable transcripts
US6587824B1 (en) * 2000-05-04 2003-07-01 Visteon Global Technologies, Inc. Selective speaker adaptation for an in-vehicle speech recognition system
JP3818063B2 (en) * 2001-01-25 2006-09-06 松下電器産業株式会社 Personal authentication device
US20020143540A1 (en) 2001-03-28 2002-10-03 Narendranath Malayath Voice recognition system using implicit speaker adaptation
FI20010792A (en) 2001-04-17 2002-10-18 Nokia Corp Providing user-independent voice identification
JP2002366187A (en) * 2001-06-08 2002-12-20 Sony Corp Device and method for recognizing voice, program and recording medium
US7209881B2 (en) * 2001-12-20 2007-04-24 Matsushita Electric Industrial Co., Ltd. Preparing acoustic models by sufficient statistics and noise-superimposed speech data
US7353173B2 (en) * 2002-07-11 2008-04-01 Sony Corporation System and method for Mandarin Chinese speech recognition using an optimized phone set
CN1170239C (en) * 2002-09-06 2004-10-06 浙江大学 Palm acoustic-print verifying system
JP4253518B2 (en) * 2003-03-05 2009-04-15 シャープ株式会社 Voice input device, speaker identification device using the same, voice input method and speaker identification method using the same, voice input program, speaker identification program, and program recording medium
DE10313310A1 (en) * 2003-03-25 2004-10-21 Siemens Ag Procedure for speaker-dependent speech recognition and speech recognition system therefor
US7447633B2 (en) * 2004-11-22 2008-11-04 International Business Machines Corporation Method and apparatus for training a text independent speaker recognition system using speech data with text labels
US8255223B2 (en) 2004-12-03 2012-08-28 Microsoft Corporation User authentication by combining speaker verification and reverse turing test
US20060122837A1 (en) * 2004-12-08 2006-06-08 Electronics And Telecommunications Research Institute Voice interface system and speech recognition method
GB0513820D0 (en) * 2005-07-06 2005-08-10 Ibm Distributed voice recognition system and method
JP2007033901A (en) * 2005-07-27 2007-02-08 Nec Corp System, method, and program for speech recognition
CN1932974A (en) * 2005-09-13 2007-03-21 东芝泰格有限公司 Speaker identifying equipment, speaker identifying program and speaker identifying method
US20070156682A1 (en) * 2005-12-28 2007-07-05 Microsoft Corporation Personalized user specific files for object recognition
US7886266B2 (en) * 2006-04-06 2011-02-08 Microsoft Corporation Robust personalization through biased regularization
JP4728972B2 (en) * 2007-01-17 2011-07-20 株式会社東芝 Indexing apparatus, method and program
US20110054894A1 (en) * 2007-03-07 2011-03-03 Phillips Michael S Speech recognition through the collection of contact information in mobile dictation application
JP4812029B2 (en) * 2007-03-16 2011-11-09 富士通株式会社 Speech recognition system and speech recognition program
US7966171B2 (en) * 2007-10-31 2011-06-21 At&T Intellectual Property Ii, L.P. System and method for increasing accuracy of searches based on communities of interest
JP5024154B2 (en) * 2008-03-27 2012-09-12 富士通株式会社 Association apparatus, association method, and computer program
KR101056511B1 (en) 2008-05-28 2011-08-11 (주)파워보이스 Speech Segment Detection and Continuous Speech Recognition System in Noisy Environment Using Real-Time Call Command Recognition
US9418662B2 (en) 2009-01-21 2016-08-16 Nokia Technologies Oy Method, apparatus and computer program product for providing compound models for speech recognition adaptation
US9262612B2 (en) * 2011-03-21 2016-02-16 Apple Inc. Device access using voice authentication
JP6024180B2 (en) * 2012-04-27 2016-11-09 富士通株式会社 Speech recognition apparatus, speech recognition method, and program
KR20130133629A (en) 2012-05-29 2013-12-09 삼성전자주식회사 Method and apparatus for executing voice command in electronic device
US8543834B1 (en) * 2012-09-10 2013-09-24 Google Inc. Voice authentication and command
US9070367B1 (en) * 2012-11-26 2015-06-30 Amazon Technologies, Inc. Local speech recognition of frequent utterances
US9117451B2 (en) * 2013-02-20 2015-08-25 Google Inc. Methods and systems for sharing of adapted voice profiles
KR102185564B1 (en) * 2014-07-09 2020-12-02 엘지전자 주식회사 Mobile terminal and control method for the mobile terminal

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6088669A (en) * 1997-01-28 2000-07-11 International Business Machines, Corporation Speech recognition with attempted speaker recognition for speaker model prefetching or alternative speech modeling
US6272463B1 (en) * 1998-03-03 2001-08-07 Lernout & Hauspie Speech Products N.V. Multi-resolution system and method for speaker verification
US6697779B1 (en) * 2000-09-29 2004-02-24 Apple Computer, Inc. Combined dual spectral and temporal alignment method for user authentication by voice
US20030088414A1 (en) * 2001-05-10 2003-05-08 Chao-Shih Huang Background learning of speaker voices
US20030033143A1 (en) * 2001-08-13 2003-02-13 Hagai Aronowitz Decreasing noise sensitivity in speech processing under adverse conditions
US8639516B2 (en) * 2010-06-04 2014-01-28 Apple Inc. User-specific noise suppression for voice quality improvements
US20140278435A1 (en) * 2013-03-12 2014-09-18 Nuance Communications, Inc. Methods and apparatus for detecting a voice command
US9361885B2 (en) * 2013-03-12 2016-06-07 Nuance Communications, Inc. Methods and apparatus for detecting a voice command
US20150081295A1 (en) * 2013-09-16 2015-03-19 Qualcomm Incorporated Method and apparatus for controlling access to applications
US20160066113A1 (en) * 2014-08-28 2016-03-03 Qualcomm Incorporated Selective enabling of a component by a microphone circuit

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190013039A1 (en) * 2016-03-10 2019-01-10 Brandon David Rumberg Analog voice activity detection
US10535365B2 (en) * 2016-03-10 2020-01-14 Brandon David Rumberg Analog voice activity detection
US10931999B1 (en) * 2016-06-27 2021-02-23 Amazon Technologies, Inc. Systems and methods for routing content to an associated output device
US20180033438A1 (en) * 2016-07-26 2018-02-01 Samsung Electronics Co., Ltd. Electronic device and method of operating the same
US10762904B2 (en) * 2016-07-26 2020-09-01 Samsung Electronics Co., Ltd. Electronic device and method of operating the same
US11404067B2 (en) * 2016-07-26 2022-08-02 Samsung Electronics Co., Ltd. Electronic device and method of operating the same
US20180061412A1 (en) * 2016-08-31 2018-03-01 Samsung Electronics Co., Ltd. Speech recognition method and apparatus based on speaker recognition
WO2018043991A1 (en) * 2016-08-31 2018-03-08 Samsung Electronics Co., Ltd. Speech recognition method and apparatus based on speaker recognition
US10762899B2 (en) * 2016-08-31 2020-09-01 Samsung Electronics Co., Ltd. Speech recognition method and apparatus based on speaker recognition
US11074910B2 (en) * 2017-01-09 2021-07-27 Samsung Electronics Co., Ltd. Electronic device for recognizing speech
US20180197540A1 (en) * 2017-01-09 2018-07-12 Samsung Electronics Co., Ltd. Electronic device for recognizing speech
US20190025878A1 (en) * 2017-07-19 2019-01-24 Samsung Electronics Co., Ltd. Electronic device and system for deciding duration of receiving voice input based on context information
US11048293B2 (en) * 2017-07-19 2021-06-29 Samsung Electronics Co., Ltd. Electronic device and system for deciding duration of receiving voice input based on context information
US20210097158A1 (en) * 2018-01-17 2021-04-01 Samsung Electronics Co., Ltd. Method and electronic device for authenticating user by using voice command
US11960582B2 (en) * 2018-01-17 2024-04-16 Samsung Electronics Co., Ltd. Method and electronic device for authenticating user by using voice command
JP2021529978A (en) * 2018-05-10 2021-11-04 エル ソルー カンパニー, リミテッドLlsollu Co., Ltd. Artificial intelligence service method and equipment for it
EP3779966A4 (en) * 2018-05-10 2021-11-17 Llsollu Co., Ltd. Artificial intelligence service method and device therefor
US11200904B2 (en) 2018-05-25 2021-12-14 Samsung Electronics Co., Ltd. Electronic apparatus, controlling method and computer readable medium
US11804228B2 (en) 2018-09-10 2023-10-31 Samsung Electronics Co., Ltd. Phoneme-based speaker model adaptation method and device
US11631400B2 (en) 2019-02-11 2023-04-18 Samsung Electronics Co., Ltd. Electronic apparatus and controlling method thereof
US20210022199A1 (en) * 2019-07-19 2021-01-21 Jvckenwood Corporation Radio apparatus, radio communication system, and radio communication method
US11770872B2 (en) * 2019-07-19 2023-09-26 Jvckenwood Corporation Radio apparatus, radio communication system, and radio communication method

Also Published As

Publication number Publication date
CN107210040A (en) 2017-09-26
KR20160098771A (en) 2016-08-19
CN107210040B (en) 2021-01-12
WO2016129930A1 (en) 2016-08-18
KR102371697B1 (en) 2022-03-08
US10733978B2 (en) 2020-08-04
EP3057093A2 (en) 2016-08-17
EP3057093A3 (en) 2016-12-07
US20190005944A1 (en) 2019-01-03
EP3057093B1 (en) 2020-08-19

Similar Documents

Publication Publication Date Title
US10733978B2 (en) Operating method for voice function and electronic device supporting the same
JP6857699B2 (en) 2021-04-14 Wake-up method, apparatus, device, storage medium, and program for a voice dialogue device
US9953634B1 (en) Passive training for automatic speech recognition
EP2389672B1 (en) Method, apparatus and computer program product for providing compound models for speech recognition adaptation
US10224030B1 (en) Dynamic gazetteers for personalized entity recognition
CN105556920A (en) Method and apparatus for controlling access to applications
US20160372110A1 (en) Adapting voice input processing based on voice input characteristics
US10916249B2 (en) Method of processing a speech signal for speaker recognition and electronic apparatus implementing same
US11200903B2 (en) Systems and methods for speaker verification using summarized extracted features
CN110223687B (en) Instruction execution method and device, storage medium and electronic equipment
CN109272991A (en) Method, apparatus, equipment and the computer readable storage medium of interactive voice
CN107492153A (en) Attendance checking system, method, work attendance server and attendance record terminal
WO2020098523A1 (en) Voice recognition method and device and computing device
CN111599360B (en) Wake-up control method and device, storage medium and electronic equipment
CN111369992A (en) Instruction execution method and device, storage medium and electronic equipment
CN104281682A (en) File classifying system and method
US11893996B1 (en) Supplemental content output
CN116030817B (en) Voice wakeup method, equipment and storage medium
US11514920B2 (en) Method and system for determining speaker-user of voice-controllable device
CN117809625A (en) Terminal equipment and wake-up method for dual-mode verification
CN116229962A (en) Terminal equipment and voice awakening method
KR20220135398A (en) Speech procssing method and apparatus thereof
CN115691479A (en) Voice detection method and device, electronic equipment and storage medium
CN111477223A (en) Welding machine control method and device, terminal equipment and computer readable storage medium
CN116386627A (en) Display device and hotword recognition method

Legal Events

Date Code Title Description
AS Assignment

Owner name: SAMSUNG ELECTRONICS CO., LTD., KOREA, DEMOCRATIC P

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SUBHOJIT, CHAKLADAR;REEL/FRAME:037686/0727

Effective date: 20160114

AS Assignment

Owner name: SAMSUNG ELECTRONICS CO., LTD., KOREA, REPUBLIC OF

Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE ASSIGNEE'S COUNTRY PREVIOUSLY RECORDED AT REEL: 037686 FRAME: 0727. ASSIGNOR(S) HEREBY CONFIRMS THE ASSIGNMENT;ASSIGNOR:SUBHOJIT, CHAKLADAR;REEL/FRAME:045519/0929

Effective date: 20160114

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION