WO2012144887A1 - Voice immersion smartphone application or headset for reduction of mobile annoyance - Google Patents

Voice immersion smartphone application or headset for reduction of mobile annoyance

Publication number
WO2012144887A1
Authority
WO
WIPO (PCT)
Prior art keywords
intensity
sound
foreground
signal
user
Prior art date
Application number
PCT/NL2012/000026
Other languages
French (fr)
Inventor
Hein FRANKEN
Original Assignee
Franken Hein
Priority date
Filing date
Publication date
Application filed by Franken Hein
Publication of WO2012144887A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/58 Anti-side-tone circuits
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/60 Substation equipment, e.g. for use by subscribers including speech amplifiers
    • H04M1/6025 Substation equipment, e.g. for use by subscribers including speech amplifiers implemented as integrated speech networks

Definitions

  • the invention is in the field of sound processing.
  • the invention relates to a method for providing a dynamic feedback signal closely resembling the speaking behaviour of the user of e.g. a smartphone, or a headset connected thereto, in socially sensitive environments, as well as to a device.
  • US 2009017670 (A1), Apple Inc., discloses systems and methods for altering a cellular phone user's speech so that the speech can be less bothersome to third parties in the surrounding area and so that the user has more privacy. Sound cancellation can be used to cancel, reduce, or modify the user's voice so third parties cannot hear the voice as easily or so that the user's voice cannot be understood.
  • the user device can encourage the user to speak in a lower voice. The user device can accomplish this encouragement by indicating to the user their level of speech. In this manner, the user knows when he may lower his voice and yet still provide an adequate volume of speech for the cellular phone. Additionally, the user device can encourage the user to speak in a lower voice by audibly playing back the user's voice in real time.
  • US 2004242160 (A1), Nokia, discloses a mobile phone with a means of measuring a background sound level of an environment. When a user either initiates or receives a phone call, sound levels before and during the call are compared. Once it is decided that the voice is too loud based on predetermined criteria, the phone gives the user feedback indicating that the voice is too loud and potentially a disruption to other people. Furthermore, that invention provides feedback for a voice adjustment by utilizing a sidetone adaptive signal that filters the user's own speech directly to an earpiece.
  • US2007/0021958 A1, Erik Visser, recites a method for improving the quality of a speech signal extracted from a noisy acoustic environment using multiple channels. When speech is detected a control signal is generated for post processing. What this invention misses is proportional feedback, with minimal latency, to a user who speaks too loudly in crowded places.
  • US2007/017810 A2 Philips recites a method for a headset for a communication device, the headset comprising at least one microphone adapted to detect audio signals, and a speaker adapted to reproduce the audio signals... for transmission to another communication device.
  • What this invention misses is proportional feedback, with minimal latency, to a user who speaks too loudly in crowded places.
  • the communication device further includes a sidetone feedback notifier for producing a notification signal, which may be an audible, a visual or tactile signal.
  • Prior art solutions are further typically optimised in one aspect, for instance in optimizing block diagrams or flow charts illustrating the performance.
  • the present invention is aimed at overcoming one or more of the above-mentioned problems without jeopardizing advantageous effects.
  • the present invention relates in a first aspect to a method for providing a real time feedback signal to a user providing foreground sound input according to claim 1.
  • the method comprises the steps of:
  • At least two audio input means such as a microphone
  • the at least two input means being spaced apart at a mutual distance, such that a distance from a first audio input means to a source of foreground sound is substantially shorter than a distance from a second audio input means to the same source of foreground sound
  • obtaining a total sound signal intensity comprising one or more of foreground sound input intensity and background sound input intensity
  • obtaining a running average of the foreground sound input intensity and of the background sound input intensity such as a running average over 1 msec
  • the present invention solves one or more of the above mentioned problems and provides excellent performance, as is highlighted and detailed below.
  • by "real time" it is meant that the method is carried out within the processing time, typically within 0.1-10 msec, or in other words without a substantial delay. In an example the processing time is less than 2 msec.
  • the feedback signal is provided to a user.
  • the user may be a person using a mobile telephone, a smartphone, etc.
  • the feedback signal provides a user for instance with information on relative sound intensity of his/her voice, relative to a background.
  • the user may subsequently adapt his behaviour, e.g. by lowering his voice.
  • At least two audio input means are provided, such as a microphone, being adapted to determine sound intensity.
  • a ratio in decibels is ten times the logarithm to base 10 of the ratio of two power quantities.
  • a decibel is one tenth of a bel, a seldom-used unit.
  • the decibel is used for a wide variety of measurements in science and engineering, most prominently in acoustics, electronics, and control theory.
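The decibel definition above can be illustrated with a minimal sketch. The function names `power_ratio_to_db` and `db_to_power_ratio` are hypothetical and chosen only for this illustration; they are not part of the patent text.

```python
import math


def power_ratio_to_db(p1: float, p0: float) -> float:
    """Level difference in dB between two power quantities p1 and p0."""
    if p1 <= 0 or p0 <= 0:
        raise ValueError("power quantities must be positive")
    # Ten times the base-10 logarithm of the power ratio.
    return 10.0 * math.log10(p1 / p0)


def db_to_power_ratio(level_db: float) -> float:
    """Inverse conversion: power ratio corresponding to a dB difference."""
    return 10.0 ** (level_db / 10.0)
```

A doubling of power gives `power_ratio_to_db(2.0, 1.0)`, which is roughly 3 dB, the figure that recurs in the thresholds discussed in this description.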
  • three or four microphones are provided, or even more. Therewith foreground sound and background sound can be determined more accurately.
  • the audio input means are spaced apart, such that a distance from a first audio input means to a source of foreground sound, such as the user's voice, is substantially shorter than a distance from a second audio input means to that source.
  • the input means are spaced apart at a distance of at least a few cm, more preferably at least 5 cm, even more preferably at least 10 cm, in order to obtain superior results.
  • a foreground sound intensity received by a first audio input means is substantially different from a foreground sound intensity received by a second audio input means.
  • the difference in intensity is in an example at least 1 dB(A), preferably at least 2 dB(A), more preferably at least 3 dB(A), such as at least 10 dB(A).
  • the speaker relates to an ear speaker, such as one being present in a mobile phone, smartphone, or headset.
  • two or more speakers may be present, such as at least one speaker per ear.
  • the output means may also relate to an optical means, providing an optical signal, such as text, light, etc.
  • the output means may also provide a signal to any other sense, such as taste, smell or touch. As such one or more of a variety of feedback signals may be provided.
  • the present processor is typically a microprocessor or the like, capable of processing analogue and/or digital data.
  • the processor forms part of a device, the device further comprising other elements mentioned above in the present method.
  • the at least one audio input means will in use receive a sound signal.
  • This sound signal is referred to as the "total sound signal", which signal typically has an intensity.
  • the intensity is typically expressed in dB(A), being a logarithmic scale.
  • the sound signal typically comprises various elements, being categorised in foreground elements and background elements.
  • a user of a headset will provide a foreground sound signal when speaking into the at least one audio input means, i.e. microphones.
  • a background sound signal will originate from other people speaking to each other, from information being made public, e.g. by a speaker, from traffic, such as cars and trains, from the wind, etc.
  • a source of background sound signal may be very close to a source of foreground sound signal, such as when a user is speaking into a headset, and a neighbouring person is talking.
  • a foreground sound signal need not be present, such as when a user is quiet.
  • a background sound signal need not be present, or at least be below a certain threshold, such as in a very quiet environment, e.g. in nature.
  • the total sound signal intensity can be split (or divided) into a background sound signal intensity and a foreground sound signal intensity.
  • the foreground sound signal intensity is determined by subtracting the background sound signal intensity from the total sound signal intensity.
  • a running average of the foreground sound input intensity and/or of the background sound input intensity is obtained, such as a running average over 1 msec - 2 sec.
  • peaks in intensity are thereby smoothed out.
  • a median value of the intensity is obtained, in order to correct for peaks possibly being present. It has been found experimentally that a running average over a time frame of 5-250 msec is typically sufficient. In an example an average of 100 msec is used.
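The running average and median described above can be sketched as a fixed-length window over successive intensity readings. This is a minimal sketch under stated assumptions: the class name `RunningLevel`, the choice of a sample-count window (rather than a window in milliseconds), and the decision to average the readings directly are all illustrative and are not specified in the text.

```python
from collections import deque
from statistics import median


class RunningLevel:
    """Keeps the most recent `window` intensity readings (hypothetical helper)."""

    def __init__(self, window: int) -> None:
        if window < 1:
            raise ValueError("window must hold at least one reading")
        # Old readings fall out automatically once the window is full.
        self._readings = deque(maxlen=window)

    def push(self, intensity: float) -> None:
        self._readings.append(intensity)

    def mean(self) -> float:
        """Running average; smooths out short peaks."""
        if not self._readings:
            raise ValueError("no readings yet")
        return sum(self._readings) / len(self._readings)

    def median(self) -> float:
        """Running median; less sensitive to isolated peaks than the mean."""
        if not self._readings:
            raise ValueError("no readings yet")
        return median(self._readings)
```

With a window of three readings and the inputs 1, 2, 3, 100, the window holds 2, 3, 100: the mean is pulled up to 35 by the peak, while the median stays at 3.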
  • the intensities can be compared.
  • the result of this comparison provides relative intensities, e.g. in terms of dB(A).
  • An aim of the invention is, when switched on in a crowded place, to provide a feedback signal to a user, especially when the foreground sound signal intensity is relatively large compared to the background sound signal intensity.
  • a predetermined threshold is provided, which serves as a guide to determine if the foreground sound signal intensity is relatively large enough to provide a feedback signal. If the foreground sound signal intensity is larger than the sum of the background sound signal intensity and the predetermined threshold, a proportional feedback signal is provided.
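The threshold rule above can be written as a one-line comparison. The sketch below assumes all three quantities are levels in dB(A); the function name `feedback_excess_db` and the default threshold of 3 dB(A) are illustrative choices, the latter taken from the lowest threshold value mentioned elsewhere in this description.

```python
def feedback_excess_db(
    foreground_db: float, background_db: float, threshold_db: float = 3.0
) -> float:
    """Amount by which the foreground exceeds background + threshold.

    Returns 0.0 when no feedback is due; a positive value can be used to
    scale a proportional feedback signal.
    """
    return max(0.0, foreground_db - (background_db + threshold_db))
```

For example, a foreground of 70 dB(A) against a background of 60 dB(A) yields an excess of 7 dB, whereas a foreground of 62 dB(A) yields 0 and no feedback.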
  • the feedback signal is in an example chosen to be an audio sidetone signal.
  • an audio signal is provided by adding a foreground sound signal to the output means, at a certain intensity level.
  • the intensity level may depend on various parameters, such as relative difference between foreground and background intensities. At a large difference a stronger, more intense, signal may be provided, and vice versa.
  • the user hears his own voice, real time, i.e. less than a few msec later, at an intensity level as determined a short period earlier, e.g. 100 msec earlier.
  • the intensity of the feedback signal may decrease as a user lowers his voice, and vice versa.
  • the intensity is proportional to the foreground sound level.
  • the intensity is preferably not too low, as it will then possibly not be noticed. Also the intensity is preferably not too high, as it will become annoying and the like. A certain level is loud enough. In an example the upper intensity of the feedback signal is limited to a level far below a level where ear damage could occur.
  • the method may be provided as an application, such as a downloadable application.
  • the application may be provided with a switch, for activating or de-activating the application when the user is in a crowded place.
  • the application may be provided with means for calibration, in order to adapt the application for a specific device being used, such as type of headset, type of phone, brand of phone, brand of headset, varying circumstances, such as voice input intensity, background sound intensity, etc.
  • the present method further provides the first audio input means, the second audio input means and the means for providing the output signal as part of one device, such as a smartphone, telephone, mobile telephone, headset, computer, providing at least the first audio input means with a directional sensitivity, such as with a polar pattern, directing the first audio input means towards a mouth of the user, and aligning the directional sensitivity with a virtual axis that, when seen in top view parallel to a cranial axis of the user, is under an angle of 0-60 degrees with a forward-backward axis of the user.
  • the first audio input means is adapted to pick an audio signal selectively from a spatially limited region and/or direction, such as a direction wherein a sound source, such as a mouth of the user, is located.
  • one of the input means is directed at a source of foreground sound input, such as a mouth of a user.
  • the input means is directed as indicated above.
  • a further advantage hereof is that a second input means, not having the directional sensitivity and/or input, receives a significantly different input intensity. As a consequence hereof the reliability, accuracy, etc. of the device is improved significantly.
  • such a first audio input means is located as mentioned above, in order to receive a sound signal in an optimal way. Thereby also background input is reduced, which improves a quality of a signal transferred to a receiver of spoken information of the user.
  • the intensity of the background sound input is obtained when the running average foreground sound input intensity is less than the running average background sound intensity plus 3 dB(A), corrected for the characteristics of the microphones.
  • the intensity of the background sound input is frozen to the last but one logged value when the running average foreground sound input intensity is higher than the running average background sound intensity plus 3 dB(A), corrected for the characteristics of the microphones, e.g. during the speaking period of the user.
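The log-or-freeze behaviour of the background level can be sketched as a small state machine. This is one possible reading of the two statements above, not the patented implementation: the class name `BackgroundTracker`, the handling of the very first reading, and the omission of the microphone-characteristic correction are assumptions made for illustration.

```python
from typing import Optional


class BackgroundTracker:
    """Logs the background level while quiet, freezes it while speaking."""

    def __init__(self, margin_db: float = 3.0) -> None:
        self.margin_db = margin_db
        self._previous: Optional[float] = None  # last-but-one logged value
        self._latest: Optional[float] = None    # most recent logged value
        self._frozen: Optional[float] = None    # value held during speech

    def update(self, foreground_db: float, background_db: float) -> float:
        """Return the background level to use for the current step."""
        quiet = foreground_db < background_db + self.margin_db
        if self._latest is None or quiet:
            # Not speaking (or first reading): keep logging the background.
            self._previous = (
                self._latest if self._latest is not None else background_db
            )
            self._latest = background_db
            self._frozen = None
            return self._latest
        if self._frozen is None:
            # Speech just started: freeze the last-but-one logged value,
            # which is less likely to contain the onset of the voice.
            self._frozen = self._previous
        return self._frozen
```

While the user is speaking, repeated calls keep returning the same frozen value; logging resumes once the foreground drops back within the margin.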
  • it further comprises the step of filtering the foreground sound input and/or the background sound input, such as by filtering out low frequency and high frequency, thereby obtaining a frequency window.
  • the filtering improves the quality and accuracy of the method significantly. In an example it reduces noise and unwanted frequencies by some 10 dB(A).
  • a low frequency threshold for the frequency window is 200 Hz, preferably 100 Hz, and a high frequency threshold for the frequency window is 1,000 Hz, preferably 2,000 Hz, wherein preferably the frequency window is a weighted window correcting the sound intensity for the sensitivity of the human ear by an A-weighting curve.
  • a low frequency threshold can be set at 200 Hz, preferably at 100 Hz. It has been found experimentally that a high frequency threshold can be set at 1,000 Hz, preferably at 2,000 Hz. Thereby optimal results are obtained, e.g. in terms of noise reduction, accuracy, reliability, reduction of possibly annoying sounds, etc.
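The A-weighting correction and the frequency window mentioned above can be illustrated as follows. The A-weighting expression is the commonly used analytic form of the standard curve, which is normalised to about 0 dB at 1 kHz; the function names and the default window of 100 Hz to 2,000 Hz (the preferred values in the text) are assumptions for this sketch, and a real device would apply the weighting as a filter rather than per frequency.

```python
import math


def a_weighting_db(frequency_hz: float) -> float:
    """A-weighting gain in dB at a given frequency (standard analytic form)."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    f2 = frequency_hz ** 2
    numerator = (12194.0 ** 2) * f2 ** 2
    denominator = (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    # The +2.00 dB offset normalises the curve to about 0 dB at 1 kHz.
    return 20.0 * math.log10(numerator / denominator) + 2.00


def in_frequency_window(
    frequency_hz: float, low_hz: float = 100.0, high_hz: float = 2000.0
) -> bool:
    """True when a frequency lies inside the (assumed) pass window."""
    return low_hz <= frequency_hz <= high_hz
```

The curve attenuates low frequencies strongly: at 100 Hz the weighting is roughly 19 dB below its value at 1 kHz, which reflects the lower sensitivity of the human ear there.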
  • the intensity of the foreground sound input is amplified or reduced in view of the intensity of the background sound input.
  • a receiver of the foreground sound signal, such as a listener on the other end of a line to the voice of the user, can better pick up the signal.
  • a feedback signal, such as a sidetone, is added in the means for output, proportional in intensity to the difference between the intensity of the optionally averaged foreground sound input and the intensity of the optionally averaged background sound input plus the threshold, preferably with a negligible latency, such as less than 20 msec, preferably less than 5 msec, such as less than 2 msec, such as about 0.1 msec.
  • the latency is typically determined by electronic limitations, and is in an example virtually absent.
  • a first input means obtains a first sound intensity
  • a second input means obtains a second sound intensity
  • an optional difference in sound intensity is used to determine the foreground sound input intensity
  • the background sound input intensity is determined when the first and second sound intensities differ by less than 3 dB(A), and preferably are substantially the same.
  • the first and second sound intensities are compared. If these are substantially the same, it is assumed no foreground sound intensity is present, as otherwise one of the input means, e.g. microphones, would detect a larger intensity than another. Thus the intensity obtained is regarded as the background sound input intensity. Typically a difference between input intensities will be negligibly small, as a source of background sound input will be relatively far away, and the at least two input means, e.g. microphones, will detect a similar input intensity, if not the same input intensity. If a difference is determined, the difference is attributed to a foreground sound input intensity, e.g. a user speaking into an input means. A previously, i.e. a few msec earlier, frozen determined background sound input level can now be used to determine the difference between the current
  • the feedback signal intensity comprises a delayed foreground sound input intensity, wherein the delay is preferably smaller than 500 msec, more preferably smaller than 250 msec, even more preferably smaller than 100 msec, such as smaller than 50 msec.
  • the intensities need to be determined, which relates to a process which inevitably involves some time, in the order of msecs. Therefore a result of such a process is always available somewhat later, at a delay in time. Providing the intensity of the feedback signal is therefore delayed, whereas the content of the signal is provided without (substantial) delay, i.e. real time.
  • the part of the foreground sound input intensity being added is proportional to the foreground sound input intensity, such as from 1%-300% thereof, preferably from 5%-200% thereof, more preferably from 30%-125% thereof, such as from 50%-100% thereof. It has been found experimentally that the intensity is preferably not too small and not too large as indicated above. Good results were obtained at the above levels.
  • the feedback signal intensity may be a linear, exponential, logarithmic, step, etc. function of the foreground sound input intensity. Even further the intensity may vary, such as increase or decrease, likewise in time, i.e. become larger or smaller. Even further, the function, variation, time interval, etc. may be adjustable by a user. In a further example the feedback signal may be switched off or on, as desired by the user.
  • the maximum output is limited, e.g. in terms of maximum pressure of an output means, such as a speaker, and/or in view of possible ear damage. In other words, an upper limit is provided.
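The combination of a proportional sidetone and an upper limit can be sketched as a scale-then-clamp step. The function name `sidetone_amplitude`, the use of a normalised amplitude where 1.0 stands for the maximum permitted output, and the default fraction of 0.5 are illustrative assumptions; the permitted fraction range of 1% to 300% follows the widest range given in the text.

```python
def sidetone_amplitude(
    foreground_amplitude: float,
    fraction: float = 0.5,
    max_amplitude: float = 1.0,
) -> float:
    """Scale the foreground signal into a sidetone and cap it at a safe limit."""
    if not 0.01 <= fraction <= 3.0:
        raise ValueError("fraction must lie between 1% and 300%")
    if foreground_amplitude < 0 or max_amplitude <= 0:
        raise ValueError("amplitudes must be non-negative, limit positive")
    # Proportional part first, then the hard upper limit.
    return min(fraction * foreground_amplitude, max_amplitude)
```

A loud input combined with a large fraction is simply held at the limit, so the cap applies regardless of the user-selected feedback characteristic.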
  • the predetermined threshold value is at least 3 dB(A), preferably at least 5 dB(A), more preferably at least 10 dB(A), such as at least 20 dB(A).
  • the threshold is preferably not set too low, as otherwise feedback is provided when a user speaks softly, though somewhat louder than a background sound intensity. It is noted that a level difference of 3 dB(A) corresponds to a factor of two in sound intensity. Guided by the sidetone to keep his voice intensity below twice that of the background noise, the user's voice will immerse in the background and can hardly be heard by people in the vicinity. In crowded places this will increase privacy and efficiency of the call and prevent irritations. At that point no adaptation of voice level seems needed.
  • no feedback signal is added to the output signal when the absolute value of the optional running average foreground sound signal intensity is below the sum of the threshold value and the absolute value of the optional running average of the frozen background sound signal intensity.
  • the present method is aimed at providing a preset default static minimal background sound signal intensity, and preventing a feedback signal if the background sound signal intensity is below the preset default static minimal background sound signal intensity.
  • the dynamic foreground sound signal that is proportionally added to the output signal for the ear speaker cannot be heard by the caller on the other end of the line.
  • when the present method is switched on, the amplification of the noise-cancelled foreground sound microphone signal towards the person on the other end of the line is automatically increased in pre-programmed steps when the running average continuous background signal stays in categorized volume ranges below the normal preset volume range of standard mobile communication.
  • a possible embodiment of the feedback circuitry is an analogue mix circuit. Steered by the digital delayed Delta (x) value, the circuit injects the analogue foreground sound signal from microphone 31 with enhancement factor Delta (x) to the Earplug signal.
  • analogue noise cancelling can be implemented by subtracting the analogue background signal from microphone 32 with factor Delta (x) from the analogue voice signal corrected for the characteristics of microphones 31 and 32.
  • the software of the present method is implemented in a headset positioned at the user's ear, where the microphones and the ear speaker form part of the same headset, wherein the headset is connected via Bluetooth to the smartphone, of which the microphones and ear speaker are switched off, and where the distance between the voice microphone of the headset and the ear speaker is at least 10 centimetres.
  • the present invention relates to a device according to claim 14.
  • the device comprises
  • At least two audio input means such as a microphone, the at least two input means being spaced apart at a mutual distance, such that a distance from a first audio input means to a source of foreground sound is substantially shorter than a distance from a second audio input means to the same source of foreground sound
  • At least one means for providing an output signal such as a speaker
  • the processor being adapted for processing sound input, providing an output signal and providing a feedback signal.
  • the device is selected from the group of smartphone, telephone, mobile telephone, headset, computer and combinations thereof.
  • Figure 1 shows a smartphone containing the Application according to the invention
  • Figure 2A shows a side view of the head of a user of a headset according to the invention
  • Figure 2B shows the opposite side of the headset according to figure 2A, as seen from the head of the user;
  • Figure 3 is a schematic top view of the user of the headset according to figures 2A and 2B.
  • Figure 4 shows the human hearing sensitivity with horizontal the frequency and vertical the intensity in dB.
  • Figure 5 shows the A, B and C-weighting curves.
  • Figure 1 shows a smartphone 30 on which the invention 40 is downloaded as Application.
  • the invention is implemented in the headset 10 as will be explained later.
  • the smartphone 30 typically comprises a housing 34 that carries an interactive display 35, a foreground microphone 31 to pick up the user's voice, a background noise microphone 32 for measuring the background sound signal, and an ear speaker 33 to reproduce the voice from the other end of the line, including the feedback sidetone, in the user's ear.
  • the smartphone 30 is controlled by an inner electronic circuit comprising a microprocessor that can process digital signals in real time, a rechargeable power supply and a wireless Bluetooth transceiver or optionally a connector for a cable, which are connected to the microprocessor, and analogue-digital converters that are connected to the microprocessor to convert a digital output signal from the microprocessor into an analogue electronic signal to power the ear speaker 33 and to provide the microprocessor with digital sound signals from the foreground microphone 31 and the ambient/background sound microphone 32.
  • the foreground sound signal is proportional to an average voice sound pressure at the voice sound microphone 31 for the moments the user is speaking.
  • the background sound signal is proportional to an average background sound pressure at the background sound microphone 32.
  • a part of the background signal from microphone 32 is subtracted from the signal of the foreground sound microphone 31, so-called noise cancelling.
  • the above-mentioned corrections are adjusted for the respective microphone characteristics.
  • the user presses the embodiment "App" 40 on the display.
  • the embodiment can be enriched by adding the input of user settings: e.g. to adjust the threshold value of the feedback as well as feedback volume characteristics e.g. linear, logarithmic, exponential, constant, height of step-in volume etc.
  • the invention takes the following steps to keep the feedback sidetone signal continuously closely resembling reality:
  • Microphone 31 continuously measures the foreground sound intensity.
  • Microphone 32 continuously measures the background sound intensity.
  • the invention decides that the user is speaking.
  • Delta(x) is the maximum of the values zero and the difference of the absolute value of Foreground Signal Intensity FI(x) and the absolute value of Background Signal Intensity BI(x).
  • Delta(x) = MAX(0, ABS(FI(x)) - ABS(BI(x))).
  • Delta(x) = (Delta(x-1) + c * Delta(x)) / (1 + c).
  • the value of c will be larger than the value of the progressing averages of the background and foreground sound signals but must be optimised in the final design.
  • a personalised part of the foreground sound signal is added to the output signal for the ear speaker 12, whereby the user starts hearing his own voice in the ear speaker 12, mixed with the sound of the caller on the other end of the line. It is crucial that any latency of the user's own voice in his ear is kept minimal to prevent irritations. Typical maximum latency value is 2 msec.
  • This feedback functionality motivates the user to stop speaking too loud with respect to people in his vicinity, in other words to inhibit his Lombard reflex.
  • Delta(x-1) is set to zero after the end of each of the user's speaking slots. This means that when the user remembers the invention's feedback during his last, too loud sentence and starts speaking much more softly in the next slot, he is immediately rewarded with the absence of the feedback in his ear. In case he continues his too loud conversation, within seconds the progressing average provides the proportional feedback in his ear.
  • the invention contains a preset default threshold of background sound signal, so that the user remains able to continue speaking softly but normally in case the background sound signal falls to silence. This prevents the user from getting uncontrollable feedback or being forced to whisper.
  • a possible embodiment of the feedback circuitry is an analogue mix circuit.
  • Steered by the digital, slightly delayed Delta(x) value, the circuit injects the analogue foreground voice signal from microphone 31 with enhancement factor Delta(x) into the earplug signal.
  • analogue noise cancelling can be achieved by subtracting the analogue background signal from microphone 32 with factor Delta(x) from the analogue voice signal.
  • the headset accessory 10 that is connected with the smartphone 30 via Bluetooth.
  • the smartphone 30 or optionally a normal cellular phone works with the headset, with the functionality according to the invention running on the headset microprocessor.
  • the ear speaker 33 and the microphones 31 and 32 of the smartphone 30 are not in use during a phone call.
  • the microprocessor of the headset 10 is loaded with software to control the headset 10 according to the invention during a phone call.
  • with the push button 8 on the headset 10 the user can switch the "Voice Immersion" accurate behaviour feedback functionality according to the invention on and off.
  • FIG 2A shows the head 1 of a user of a headset 10 according to the invention.
  • the headset 10 is shaped to fit around the ear 2.
  • the user carries the headset 10 around only one of his ears 2.
  • the headset 10 comprises an elongated, curved housing 11 and an ear speaker 12 that is connected to the housing 11 via a first rod 9.
  • the ear speaker 12 is partly inserted in the ear canal 3 of the user.
  • the ear speaker 12 has a shotgun sound output directionality or polar pattern 13 that is aligned with the ear canal axis A. From the ear speaker 12 a curved second rod 14 extends along the cheek towards the mouth 4 of the user.
  • the carrying rod 14 carries a voice sound microphone 17 at its free end and a background sound microphone 15 situated between the voice sound microphone 17 and the ear speaker 12.
  • the voice sound microphone 17 has a shotgun sensitivity directionality or polar pattern 18 that is directed towards the mouth 4 of the user to optimally pick up his voice.
  • the polar pattern 18 is aligned with an axis D that, when seen in top view parallel to the cranial axis of the user, is at an angle E of 0-60 degrees with the forward-backward axis B of the user. In this top view the forward-backward axis is perpendicular to the ear canal axis A.
  • the distance C between the centre of the ear speaker 12 and the centre of the voice sound microphone 17 is typically larger than 10 centimetres to position the voice sound microphone 17 sufficiently close to the mouth 4 of the user to optimally pick up his voice.
  • the background sound microphone 15 has an omni-directional sensitivity directionality or polar pattern 16 that faces away from the user to pick up the background noise.
  • the headset 10 is further provided with a push button 8 that can be reached behind the ear 2.
  • the housing 11 comprises an inner space 20 wherein an electronic circuit has been enclosed.
  • the electronic circuit comprises a microprocessor, a rechargeable power supply and a wireless Bluetooth transceiver that are connected to the microprocessor, an electronic connection between the push button 8 and the microprocessor, and analogue-digital converters that are connected to the microprocessor to convert a digital output signal from the microprocessor into an analogue electronic signal to power the ear speaker 12 with the voice sound signal of the caller on the other end of the line and to provide the microprocessor with a digital voice sound signal from the voice sound microphone 17 and a digital background sound signal from the background microphone 15.
  • the voice sound signal is proportional to an average voice sound pressure at the voice sound microphone 17 when speaking.
  • the background sound signal is proportional to an average background sound pressure at the background sound microphone 15.
  • the headset's microprocessor is provided with a digital sound signal from the foreground sound microphone 17 and a digital background sound signal from the microphone 15. The microprocessor continuously and repeatedly samples the foreground and background sound signals to detect whether the user is speaking.
  • the absolute value of the running average background intensity is logged. As soon as it is detected that the user is speaking, the last but one value of the background sound intensity is frozen for that speaking slot.
  • the factor Delta(x-1), being the difference between the absolute value of the running average Foreground Signal Intensity FI(x) and the frozen Background Signal Intensity BI(x-1), is set to zero.
  • the microprocessor needs time to acquire reliable background and foreground signal levels.
  • To calculate the first accurate Delta typically 1-10 seconds sample time is required in which the value stays zero.
  • Delta (x) is the maximum of the values zero and the difference of the absolute value of Foreground Signal Intensity (x) and the absolute value of Background Signal Intensity (x) .
  • Delta (x) = MAX (0, ABS(FI(x)) - ABS(BI(x))).
  • Delta (x) = (Delta (x-1) + c * Delta (x)) / (1 + c).
  • the value of c will be larger than the value of the progressing averages of the background and foreground sound signals but must be optimised in the final design.
  • a personalised part of the foreground sound signal is added to the output signal for the ear speaker 12, whereby the user starts hearing his own voice in the ear speaker 12, mixed with the sound of the caller on the other end of the line. It is crucial that any latency of the user's own voice in his ear is kept minimal to prevent irritation. A typical maximum latency value is 2 msec.
  • This feedback functionality motivates the user to stop speaking too loud with respect to people in his vicinity.
  • Delta (x-1) is set to zero after each ending of the user's speaking slot. This means that when the user remembers the invention's feedback during his last too loud sentence and starts speaking much softer in the next slot, he is immediately rewarded with the absence of the feedback in his ear. In case he continues his too loud conversation, within seconds the progressing average provides the proportional feedback in his ear.
  • the invention contains a preset default threshold of background sound signal, so that the user remains able to continue speaking softly but normally in case the background sound signal falls to silence, preventing the user from falling into whispering or uncontrollable feedback.
  • Figure 4 shows the human hearing sensitivity
  • Figure 5 shows a graph of an A-weighting curve, with horizontal the frequency and vertical the intensity in dB.
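The Delta update described in the list above can be sketched in a few lines. This is a minimal illustration, not the applicant's implementation: the function names and the default value of c are assumptions, and the inputs are taken to be running-average intensities on a common scale.

```python
def raw_delta(fi: float, bi: float) -> float:
    """Instantaneous Delta: MAX(0, ABS(FI) - ABS(BI))."""
    return max(0.0, abs(fi) - abs(bi))


def smooth_delta(previous: float, fi: float, bi: float, c: float) -> float:
    """Progressing average: (Delta(x-1) + c * Delta(x)) / (1 + c)."""
    return (previous + c * raw_delta(fi, bi)) / (1.0 + c)


def run_slot(samples, c=0.5):
    """Yield the smoothed Delta over one speaking slot.

    samples is an iterable of (FI, BI) pairs. The value c=0.5 is a
    placeholder; the text says c must be optimised in the final design.
    """
    delta = 0.0  # Delta(x-1) is reset to zero for every new speaking slot
    for fi, bi in samples:
        delta = smooth_delta(delta, fi, bi, c)
        yield delta
```

Because every slot starts from zero, a user who lowers his voice in the next slot gets no feedback at first, which matches the reward behaviour described above.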

Abstract

Today's billions of mobile phones cause a lot of people to talk too loud in congested places. The French scientist Etienne Lombard discovered in 1909 that people have an involuntary reflex to increase the intensity of their voice when speaking in noisy environments. This so-called Lombard reflex has been found in studies to be too strong to be inhibited by providing instructions. Only feedback has shown results. The invention relates to a method for providing a dynamic feedback signal closely resembling the speaking behaviour of the user of a smartphone or a headset that is connected thereto in socially sensitive environments, as well as to a device.

Description

Voice Immersion Smartphone Application or Headset for reduction of mobile annoyance
FIELD OF THE INVENTION
The invention is in the field for processing sound.
BACKGROUND OF THE INVENTION
Today's billions of mobile phones cause a lot of people to talk too loud in congested places. The French scientist Etienne Lombard discovered in 1909 that people have an involuntary reflex to increase the intensity of their voice when speaking in noisy environments. This so-called Lombard reflex has been found in studies to be too strong to be inhibited by providing instructions. Only feedback has shown results.
The invention relates to a method for providing a dynamic feedback signal closely resembling the speaking behaviour of the user of e.g. a smartphone or a headset that is connected thereto in social sensitive environments, as well as to a device.
US 2009017670 (A1) Apple Inc. discloses systems and methods for altering a cellular phone user's speech so that the speech can be less bothersome to third parties in the surrounding area and so that the user has more privacy. Sound cancellation can be used to cancel, reduce, or modify the user's voice so third parties cannot hear the voice as easily or so that the user's voice cannot be understood. Furthermore, the user device can encourage the user to speak in a lower voice. The user device can accomplish this encouragement by indicating to the user their level of speech. In this manner, the user knows when he may lower his voice and yet still provide an adequate volume of speech for the cellular phone. Additionally, the user device can encourage the user to speak in a lower voice by audibly playing back the user's voice in real time.
US 2004242160 (A1) Nokia discloses a mobile phone with a means of measuring a background sound level of an environment. When a user either initiates or receives a phone call, sound levels before and during the call are compared. Once it is decided that the voice is too loud based on a predetermined criteria, the phone gives the user a feedback indicating that the voice is too loud and potentially a disruption to other people. Furthermore, the present invention provides a feedback for a voice adjustment by utilizing a sidetone adaptive signal that filters the user's own speech directly to an earpiece.
Various problems are associated with the two prior art solutions above. For instance, it is difficult to distinguish whether the user is speaking if a second microphone is positioned at the same distance to the mouth as the first and sensitivities of microphones are not specified.
It is not always possible to provide an accurate and reliable feedback signal using only one microphone because of the mixing of foreground and background sound input. Often such a sidetone signal may be provided when not needed, and vice versa be absent when needed.
It is not possible to be understood by a listener when speaking at low voice intensity.
As measurements are not very accurate, it is difficult to adapt the above solutions to boundary conditions, such as an environment.
Background sound levels are not measured accurately. Therefore the application does not function properly.
Also foreground sound levels are not measured accurately. Therefore the application does not function properly.
Further, US2007/0021958 A1 Erik Visser recites a method for improving the quality of a speech signal extracted from a noisy acoustic environment using multiple channels. When speech is detected a control signal is generated for post processing. What this invention misses is proportional feedback, with minimal latency, to a user who speaks too loud in crowded places.
Further, US2007/017810 A2 Philips recites a method for a headset for a communication device, the headset comprising at least one microphone adapted to detect audio signals, and a speaker adapted to reproduce the audio signals... for transmission to another communication device. What this invention misses is proportional feedback, with minimal latency, to a user who speaks too loud in crowded places.
Further, WO2010/009345 A1 Qualcomm recites a method for a communication device including multiple microphones. The communication device further includes a sidetone feedback notifier for producing a notification signal, which may be an audible, a visual or tactile signal. What this invention misses is proportional feedback, with minimal latency, to a user who speaks too loud in crowded places.
The prior art solutions above typically correct for variations in boundary conditions, such as variations in background noise levels. If corrections are provided, these are delayed, and therefore annoying to a user.
Prior art solutions are further typically optimised in one aspect, for instance in optimizing block diagrams or flow charts illustrating the performance.
The present invention is aimed at overcoming one or more of the above mentioned problems without jeopardizing advantageous effects.
SUMMARY OF THE INVENTION
The present invention relates in a first aspect to a method for providing a real time feedback signal to a user providing foreground sound input according to claim 1. The method comprises the steps of:
providing at least two audio input means, such as a microphone, the at least two input means being spaced apart at a mutual distance, such that a distance from a first audio input means to a source of foreground sound is substantially shorter than a distance from a second audio input means to the same source of foreground sound,
providing at least one means for providing an output signal, such as a speaker, and at least one processor,
obtaining a total sound signal intensity comprising one or more of foreground sound input intensity and background sound input intensity,
obtaining the foreground sound input intensity and the background sound input intensity,
obtaining a running average of the foreground sound input intensity and of the background sound input intensity, such as a running average over 1 msec (millisecond) - 2 sec,
comparing the intensity of the averaged foreground sound input and the intensity of the averaged background sound input, and
adding part of the foreground sound input signal as a feedback signal to the output signal in case the foreground sound input intensity is larger than an intensity comprising a predetermined threshold value and the intensity of the averaged frozen background sound input .
The present invention solves one or more of the above mentioned problems and provides excellent performance, as is highlighted and detailed below.
With the term "real time" it is meant that within processing time, typically within 0,1-10 msec, or in other words without a substantial delay, the method is carried out. In an example the processing time is less than 2 msec.
The feedback signal is provided to a user. The user may be a person using a mobile telephone, a smartphone, etc. The feedback signal provides a user for instance with information on relative sound intensity of his/her voice, relative to a background. The user may subsequently adapt his behaviour, e.g. by lowering his voice.
At least two audio input means are provided, such as a microphone, being adapted to determine sound intensity. To quantify a sound intensity decibel (dB) is used, being a logarithmic unit that indicates the ratio of a sound pressure level quantity (usually power or intensity in Watt/m2) relative to a specified or implied reference level. A ratio in decibels is ten times the logarithm to base 10 of the ratio of two power quantities. A decibel is one tenth of a bel, a seldom-used unit. The decibel is used for a wide variety of measurements in science and engineering, most prominently in acoustics, electronics, and control theory. In order to improve quality of the present method and device in terms of accuracy, reliability, etc., preferably three or four microphones are provided, or even more. Therewith foreground sound and background sound can be determined more accurately.
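The decibel definition above can be made concrete with a short worked example. The function names are illustrative, and the reference intensity of 1e-12 W/m2 is the conventional hearing threshold, which is an assumption here since the text only speaks of a "specified or implied reference level".

```python
import math

# Conventional reference intensity in W/m2 (assumption, not stated in the text).
REFERENCE_INTENSITY = 1e-12


def power_ratio_to_db(ratio: float) -> float:
    """Ten times the base-10 logarithm of a ratio of two power quantities."""
    return 10.0 * math.log10(ratio)


def db_to_power_ratio(db: float) -> float:
    """Inverse conversion: decibels back to a power ratio."""
    return 10.0 ** (db / 10.0)


def intensity_level_db(intensity_w_per_m2: float) -> float:
    """Level of an intensity relative to REFERENCE_INTENSITY, in dB."""
    return power_ratio_to_db(intensity_w_per_m2 / REFERENCE_INTENSITY)
```

A doubling of intensity corresponds to `power_ratio_to_db(2.0)`, which is approximately 3 dB; this is the relation behind the 3 dB (A) figures used elsewhere in this description.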
The audio input means are spaced apart, such that a distance from a first audio input means to a source of foreground sound such as the user's voice is substantially shorter than a distance from a second audio input means to that source. In an example the input means are spaced apart at a distance of at least a few cm, more preferably at least 5 cm, even more preferably at least 10 cm, in order to obtain superior results. As a consequence a foreground sound intensity received by a first audio input means is substantially different from a foreground sound intensity received by a second audio input means. The difference in intensity is in an example at least 1 dB (A), preferably at least 2 dB (A), more preferably at least 3 dB (A), such as at least 10 dB (A).
In an example the speaker relates to an ear speaker, such as one being present in a mobile phone, smartphone, or headset. Clearly two or more speakers may be present, such as at least one speaker per ear.
It is noted that the output means may also relate to an optical means, providing an optical signal, such as text, light, etc. The output means may also provide a signal to any other sense, such as taste, smell or touch. As such one or more of a variety of feedback signals may be provided.
The present processor is typically a microprocessor or the like, capable of processing analogue and/or digital data. In an example the processor forms part of a device, the device further comprising other elements mentioned above in the present method.
The at least one audio input means will in use receive a sound signal. This sound signal is referred to as "total sound signal", which signal typically has an intensity. For an audio signal the intensity is typically expressed in dB (A) , being a logarithmic scale.
The sound signal typically comprises various elements, being categorised in foreground elements and background elements. In an example a user of a headset will provide a foreground sound signal when speaking into the at least one audio input means, i.e. microphones. In an example a background sound signal will originate from other people speaking to each other, from information being made public, e.g. by a speaker, from traffic, such as cars and trains, from the wind, etc. Sometimes a source of background sound signal may be very close to a source of foreground sound signal, such as when a user is speaking into a headset, and a neighbouring person is talking. It is noted that a foreground sound signal need not be present, such as when a user is quiet. Likewise, a background sound signal need not be present, or at least be below a certain threshold, such as in a very quiet environment, e.g. in nature.
By determining the background sound signal intensity, specifically when the user is quiet, the total sound signal intensity can be split (or divided) into a background sound signal intensity and a foreground sound signal intensity. In an example the foreground sound signal intensity is determined by subtracting the background sound signal intensity from the total sound signal intensity.
In an example a running average of the foreground sound input intensity and/or of the background sound input intensity, such as a running average over 1 msec - 2 sec is obtained. Thereby peaks in intensity are smoothened out. Preferably a median value of the intensity is obtained, in order to correct for peaks possibly being present. It has been found experimentally that a running average over a time frame of 5-250 msec is typically sufficient. In an example an average of 100 msec is used.
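The smoothing described above could look roughly as follows. This is only a sketch: the class name, the 5 msec sample period and the choice of a fixed-length buffer are assumptions; only the 100 msec window comes from the text.

```python
import statistics
from collections import deque


class RunningLevel:
    """Holds the most recent intensity samples covering one time window."""

    def __init__(self, window_ms: float = 100.0, sample_period_ms: float = 5.0):
        # Number of samples that fit in the window (at least one).
        size = max(1, round(window_ms / sample_period_ms))
        self._samples = deque(maxlen=size)

    def add(self, intensity: float) -> None:
        """Append a sample; the oldest one drops out once the window is full."""
        self._samples.append(intensity)

    def mean(self) -> float:
        """Running average over the window (requires at least one sample)."""
        return sum(self._samples) / len(self._samples)

    def median(self) -> float:
        """Running median, which is less sensitive to short peaks."""
        return statistics.median(self._samples)
```

With a single 90 dB peak among 60 dB samples the mean is pulled upwards while the median stays at 60, which illustrates why a median value is preferred for correcting peaks.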
Once the foreground sound signal intensity and background sound signal intensity have been obtained, optionally averaged over time, the intensities can be compared. The result of this comparison provides relative intensities, e.g. in terms of dB (A). An aim of the invention is, when switched on in a crowded place, to provide a feedback signal to a user, especially when the foreground sound signal intensity is relatively large compared to the background sound signal intensity. Thereto a predetermined threshold is provided, which serves as a guide to determine if the foreground sound signal intensity is relatively large enough to provide a feedback signal. If the foreground sound signal intensity is larger than the sum of the background sound signal intensity and the predetermined threshold, a proportional feedback signal is provided. The feedback signal is in an example chosen to be an audio sidetone signal. In a further example an audio signal is provided by adding a foreground sound signal to the output means, at a certain intensity level. The intensity level may depend on various parameters, such as relative difference between foreground and background intensities. At a large difference a stronger, more intense, signal may be provided, and vice versa. In an example the user hears his own voice, real time, i.e. less than a few msec later, at an intensity level as determined a short period earlier, e.g. 100 msec earlier. Adaptively the intensity of the feedback signal may decrease as a user lowers his voice, and vice versa. In an example the intensity is proportional to the foreground sound level. The intensity is preferably not too low, as it will then possibly not be noticed. Also the intensity is preferably not too high, as it will become annoying and the like. A certain level is loud enough. In an example the upper intensity of the feedback signal is limited to a level far below a level where ear damage could occur.
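The threshold comparison and the limited, proportional feedback level described above can be sketched as below. The parameter names `gain_per_db` and `max_gain` and their default values are hypothetical; the text leaves the exact proportionality and the upper limit to the final design. Levels are assumed to be averaged values in dB (A).

```python
def feedback_gain(fi_db: float, bi_db: float, threshold_db: float = 3.0,
                  gain_per_db: float = 0.1, max_gain: float = 1.0) -> float:
    """Return the sidetone gain; 0.0 means no feedback is added.

    Feedback is given only when the foreground level exceeds the sum of the
    background level and the threshold, and it grows with the excess.
    """
    excess_db = fi_db - (bi_db + threshold_db)
    if excess_db <= 0.0:
        return 0.0
    # Upper limit so that the sidetone never becomes uncomfortably loud.
    return min(max_gain, gain_per_db * excess_db)
```

For example, with a 60 dB (A) background a voice at 62 dB (A) yields no feedback, a voice at 68 dB (A) yields a moderate gain, and a very loud voice is clamped at `max_gain`.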
The method may be provided as an application, such as a downloadable application. The application may be provided with a switch, for activating or de-activating the application when the user is in a crowded place.
The application may be provided with means for calibration, in order to adapt the application for a specific device being used, such as type of headset, type of phone, brand of phone, brand of headset, varying circumstances, such as voice input intensity, background sound intensity, etc.
In an example of the present method it further provides the first audio input means, the second audio input means and the means for providing the output signal as part of one device, such as a smartphone, telephone, mobile telephone, headset, computer, providing at least the first audio input means with a directional sensitivity, such as with a polar pattern, directing the first audio input means towards a mouth of the user, and aligning the directional sensitivity with a virtual axis that, when seen in top view parallel to a cranial axis of the user, is under an angle of 0-60 degrees with a forward-backward axis of the user.
For a user one device offers improved usability. In an example the first audio input means is adapted to pick an audio signal selectively from a spatially limited region and/or direction, such as a direction wherein a sound source, such as a mouth of the user, is located. In an example one of the input means is directed at a source of foreground sound input, such as a mouth of a user. In order to obtain superior results the input means is directed as indicated above. A further advantage hereof is that a second input means, not having the directional sensitivity and/or input, receives a significant different input intensity. As a consequence hereof the reliability, accuracy, etc. of the device is improved significantly.
In a further example such a first audio input means is located as mentioned above, in order to receive a sound signal in an optimal way. Thereby also background input is reduced, which improves a quality of a signal transferred to a receiver of spoken information of the user.
In an example the intensity of the background sound input is obtained when the running average foreground sound input intensity is less than the running average background sound intensity plus 3dB (A) , corrected for the characteristics of the microphones.
In an example the intensity of the background sound input is frozen to the last but one logged value when the running average foreground sound input intensity is higher than the running average background sound intensity plus 3dB(A), corrected for the characteristics of the microphones, e.g. during the speaking period of the user.
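The logging and freezing of the background level described in the two examples above might be organised as follows. This is a sketch under assumptions: the class and attribute names are invented for illustration, and the levels passed in are taken to be running averages in dB (A) already corrected for the microphone characteristics.

```python
from typing import List, Optional


class BackgroundTracker:
    """Logs the background level while quiet and freezes it while speaking."""

    def __init__(self, margin_db: float = 3.0):
        self.margin_db = margin_db
        self._log: List[float] = []            # last two values logged while quiet
        self.frozen: Optional[float] = None    # value held during a speaking slot

    def update(self, fi_db: float, bi_db: float) -> Optional[float]:
        """Return the background level to use for this sample."""
        speaking = fi_db > bi_db + self.margin_db
        if not speaking:
            # Keep only the two most recent quiet-time values.
            self._log = (self._log + [bi_db])[-2:]
            self.frozen = None
            return bi_db
        if self.frozen is None and self._log:
            # Oldest retained entry is the last but one logged value.
            self.frozen = self._log[0]
        return self.frozen
```

Taking the last but one value rather than the latest one presumably avoids using a sample that is already contaminated by the onset of speech; that reading is an interpretation, not something the text states.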
In an example of the present method it further comprises the step of filtering the foreground sound input and/or the background sound input, such as by filtering out low frequency and high frequency, thereby obtaining a frequency window.
The filtering improves the quality and accuracy of the method significantly. In an example it reduces noise and unwanted frequencies by some 10 dB (A) .
In an example of the present method a low frequency threshold for the frequency window is 200 Hz, preferably 100 Hz, and wherein a high frequency threshold for the frequency window is 1.000 Hz, preferably 2.000 Hz, wherein preferably the frequency window is a weighted window correcting the sound intensity for the sensitivity of the human ear by an A-weighting curve.
It has been found experimentally that a low frequency threshold can be set at 200 Hz, preferably at 100 Hz. It has been found experimentally that a high frequency threshold can be set at 1.000 Hz, preferably at 2.000 Hz. Thereby optimal results are obtained, e.g. in terms of noise reduction, accuracy, reliability, reduction of possibly annoying sounds, etc.
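As an illustration of the weighted frequency window, the sketch below uses the commonly quoted analytic approximation of the A-weighting curve together with a simple band check. The text does not give a formula, so the use of this particular approximation is an assumption, as are the function names; the band limits take the preferred values from the text, reading "2.000 Hz" as 2000 Hz.

```python
import math


def a_weighting_db(frequency_hz: float) -> float:
    """A-weighting gain in dB (analytic approximation, about 0 dB at 1 kHz)."""
    f2 = frequency_hz ** 2
    numerator = (12194.0 ** 2) * f2 ** 2
    denominator = ((f2 + 20.6 ** 2)
                   * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
                   * (f2 + 12194.0 ** 2))
    return 20.0 * math.log10(numerator / denominator) + 2.0


def in_window(frequency_hz: float, low_hz: float = 100.0,
              high_hz: float = 2000.0) -> bool:
    """True when the frequency lies inside the frequency window."""
    return low_hz <= frequency_hz <= high_hz


def weighted_level_db(level_db: float, frequency_hz: float) -> float:
    """Apply the A-weighting correction to a level measured at one frequency."""
    return level_db + a_weighting_db(frequency_hz)
```

Low frequencies are attenuated strongly by this curve, which reflects the lower sensitivity of the human ear in that range.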
In an example of the present method the intensity of the foreground sound input is amplified or reduced in view of the intensity of the background sound input. As such a receiver of the foreground sound signal, such as a listener on the other end of the line to the voice of the user, can better pick up the signal.
In an example of the present method a feedback signal, such as a sidetone, is added in the means for output proportional in intensity of a difference between the intensity of the optionally averaged foreground sound input and the intensity of the optionally averaged background sound input plus the threshold, preferably with a negligible latency, such as less than 20 msec, preferably less than 5 msec, such as less than 2 msec, such as about 0,1 msec. The latency is typically determined by electronic limitations, and is in an example virtually absent.
As such the user perceives that the feedback signal is provided real time, not able to detect a delay between his voice provided and the feedback signal. Only an intensity may be different, as is aimed at. Experimentally good results were obtained with a delay of less than 2 msec, such as 1 msec. In an optimal configuration the latency was less than 0,02 msec.
In an example of the present method a first input means obtains a first sound intensity, wherein a second input means obtains a second sound intensity, wherein an optional difference in sound intensity is used to determine the foreground sound input intensity, and wherein the background sound input intensity is determined when the first and second sound intensities differ less than 3 dB (A) , preferably are substantially the same.
In the example the first and second sound intensities are compared. If these are substantially the same, it is assumed no foreground sound intensity is present, as otherwise one of the input means, e.g. microphones, would detect a larger intensity than another. Thus the intensity obtained is regarded as the background sound input intensity. Typically a difference between input intensities will be negligibly small, as a source of background sound input will be relatively far away, and the at least one input means, e.g. microphones, will detect a similar input intensity, if not the same input intensity. If a difference is determined, the difference is attributed to a foreground sound input intensity, e.g. a user speaking into an input means. A previously, i.e. a few msec earlier, frozen determined background sound input level can now be used to determine the difference with the current foreground sound input intensity, by subtracting this frozen background sound input intensity plus the threshold.
In an example of the present method the feedback signal intensity comprises a delayed foreground sound input intensity, wherein the delay is preferably smaller than 500 msec, more preferably smaller than 250 msec, even more preferably smaller than 100 msec, such as smaller than 50 msec.
As indicated above the intensities need to be determined, which relates to a process which inevitably involves some time, in the order of msecs . Therefore a result of such a process is always somewhat later available, at a delay in time. Providing the intensity of the feedback signal therefore is delayed, whereas the content of the signal is provided without (substantial) delay, i.e. real time.
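The distinction made above, a delayed intensity applied to undelayed content, can be illustrated with a block-based mixer. This is a hypothetical sketch: processing in blocks, the function name and the one-block delay are all assumptions chosen for illustration.

```python
def mix_sidetone(caller_blocks, voice_blocks, gains):
    """Mix each voice block into the caller block using the previous gain.

    gains[i] is the feedback gain computed from block i. It only becomes
    effective for block i + 1, so the level lags while the voice content
    itself is passed on without extra delay.
    """
    previous_gain = 0.0  # no level estimate exists before the first block
    output = []
    for caller, voice, gain in zip(caller_blocks, voice_blocks, gains):
        output.append([c + previous_gain * v for c, v in zip(caller, voice)])
        previous_gain = gain
    return output
```

In this arrangement the samples of the user's voice reach the ear speaker in the same block in which they were captured, and only the gain applied to them is based on an earlier measurement.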
In an example of the present method the part of the foreground sound input intensity being added is proportional to the foreground sound input intensity, such as from 1%-300% thereof, preferably from 5% - 200% thereof, more preferably from 30% - 125% thereof, such as from 50%- 100% thereof. It has been found experimentally that the intensity is preferably not too small and not too large as indicated above. Good results were obtained at the above levels.
In an example the feedback signal intensity may be a linear, exponential, logarithmical, step-function, etc. of the foreground sound input intensity. Even further the intensity may vary, such as increase or decrease, likewise in time, i.e. become larger or smaller. Even further, the function, variation, time interval, etc. may be adjustable by a user. In a further example the feedback signal may be switched off or on, as desired by the user. Typically the maximum output is limited, e.g. in terms of maximum pressure of an output means, such as a speaker, and/or in view of possible ear damage. In other words, an upper limit is provided.
In an example of the present method the predetermined threshold value is at least 3 dB (A), preferably at least 5 dB (A), more preferably at least 10 dB (A), such as at least 20 dB (A). The threshold is preferably not set too low, as otherwise feedback is provided when a user speaks softly, though somewhat louder than a background sound intensity. It is noted that a level of 3 dB (A) reflects a factor of two in sound intensity. Guided by the sidetone keeping his voice intensity below twice the volume of the background noise, the user's voice will immerse in the background and can hardly be heard by people in the vicinity. In crowded places this will increase privacy and efficiency of the call and prevent irritations. At that point no adaptation of voice level seems needed.
In an example of the present method no feedback signal is added to the output signal when the absolute value of the optional running average foreground sound signal intensity is below the sum of the threshold value and the absolute value of the optional running average of the frozen background sound signal intensity.
In an example the present method is aimed at providing a preset default static minimal background sound signal intensity, and preventing a feedback signal if the background sound signal intensity is below the preset default static minimal background sound signal intensity.
In an example of the present method, the dynamic foreground sound signal that is proportionally added to the output signal for the ear speaker cannot be heard by the caller on the other end of the line.
In an example of the present method, when switched on, the amplification of the noise-cancelled foreground sound microphone signal towards the person on the other end of the line is automatically increased in pre-programmed steps when the running average continuous background signal stays in categorized volume ranges below the normal preset volume range of standard mobile communication.
In an example of the present method, if the microprocessor has issues meeting the latency demand because of the required real-time processing of the signals, a possible embodiment of the feedback circuitry is an analogue mix circuit. Steered by the digital, delayed Delta (x) value, the circuit injects the analogue foreground sound signal from microphone 31 with enhancement factor Delta (x) into the earplug signal. Optionally analogue noise cancelling can be implemented by subtracting the analogue background signal from microphone 32 with factor Delta (x) from the analogue voice signal corrected for the characteristics of microphones 31 and 32.
In an example of the present method, the software is implemented in a headset positioned at the user's ear, where the microphones and the ear speaker form part of the same headset, wherein the headset is connected via Bluetooth to the smartphone, whose microphones and ear speaker are switched off, and where the distance between the voice microphone of the headset and the ear speaker is at least 10 centimetre.
In a second aspect the present invention relates to a device according to claim 14. The device comprises
at least two audio input means, such as a microphone, the at least two input means being spaced apart at a mutual distance, such that a distance from a first audio input means to a source of foreground sound is substantially shorter than a distance from a second audio input means to the same source of foreground sound,
at least one means for providing an output signal, such as a speaker, and
at least one processor, the processor being adapted for processing sound input, providing an output signal and providing a feedback signal.
In an example according to the invention the device is selected from the group of smartphone, telephone, mobile telephone, headset, computer and combinations thereof .
The various aspects and features described and shown in the specification can be applied, individually, wherever possible. These individual aspects, in particular the aspects and features described in the attached dependent claims, can be made subject of divisional patent applications.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention will be elucidated on the basis of an exemplary embodiment shown in the attached drawings, wherein elements therein are of exemplary nature, in which:
Figure 1 shows a smartphone containing the Application according to the invention;
Figure 2A shows a side view of the head of a user of a headset according to the invention;
Figure 2B shows the opposite side of the headset according to figure 2A, as seen from the head of the user;
Figure 3 is a schematic top view of the user of the headset according to figures 2A and 2B.
Figure 4 shows the human hearing sensitivity with horizontal the frequency and vertical the intensity in dB.
Figure 5 shows the A, B and C-weighting curves.
DETAILED DESCRIPTION OF THE INVENTION
Only accurate feedback to the user with minimal latency will motivate him to inhibit the Lombard reflex. Guided by the sidetone keeping his voice intensity below twice (+3dB(A)) the volume of the background noise, the user's voice will immerse in the background and can hardly be heard by people in the vicinity. In crowded places this will dramatically increase privacy and efficiency of the call and prevent irritation of citizens.
Figure 1 shows a smartphone 30 on which the invention 40 is downloaded as Application. Alternatively, the invention is implemented in the headset 10 as will be explained later.
The smartphone 30 typically comprises a housing 34 that carries an interactive display 35, a foreground microphone 31 to pick up the user's voice and a background noise microphone 32 for measuring the background sound signal, an ear speaker 33 to reproduce the sound of the voice of the other end of the line into the user's ear including the feedback sidetone. The smartphone 30 is controlled by an inner electronic circuit comprising a microprocessor that can process digital signals in real time, a rechargeable power supply and a wireless Bluetooth transceiver or optionally a connector for a cable that are connected to the microprocessor, and analogue-digital converters that are connected to the microprocessor to convert a digital output signal from the microprocessor into an analogue electronic signal to power the ear speaker 33 and to provide the microprocessor with a digital foreground sound signal from the foreground microphone 31 and a digital background sound signal from the ambient/background sound microphone 32. The foreground sound signal is proportional to an average voice sound pressure at the voice sound microphone 31 for the moments the user is speaking. The background sound signal is proportional to an average background sound pressure at the background sound microphone 32. To enhance the voice quality of the user to the other end of the line, a part of the background signal from microphone 32 is subtracted from the signal of the foreground sound microphone 31, so-called noise cancelling. The above-mentioned corrections are adjusted for the respective microphone characteristics.
To activate the invention, the user presses the embodiment "App" 40 on the display. The embodiment can be enriched by adding the input of user settings: e.g. to adjust the threshold value of the feedback as well as feedback volume characteristics e.g. linear, logarithmic, exponential, constant, height of step-in volume etc. By going back to the menu of the smartphone the App can be started and stopped before, during and after the user activates the smartphone telephone application 36 to make a call.
As a first functionality when switched on, the invention takes the following steps to keep the feedback sidetone signal continuously closely resembling reality:
a) Continuous comparison of Foreground and Background sound intensities to determine whether the user is speaking or not
b) Running average Background sound intensity and freeze
c) Determining Delta of Foreground minus frozen Background sound intensity value
a) Continuous comparison of Foreground and Background sounds
Microphone 31 continuously measures the foreground sound intensity. Microphone 32 continuously measures the background sound intensity.
When the absolute value of the running average of the foreground intensity is more than 3 dB(A) above the absolute value of the running average background intensity times m, the invention decides that the user is speaking.
If this is not the case, the user is not speaking.
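The decision rule of step a) can be sketched as follows. This is a minimal illustration, not the implementation of the invention: it assumes that the running averages FI and BI are linear intensities, that the correction factor m (described in the next paragraph) scales the background intensity, and that the 3 dB(A) margin is applied after converting both to decibels. All function and parameter names are hypothetical.

```python
import math

SPEECH_MARGIN_DB = 3.0  # margin taken from the description


def level_db(intensity: float) -> float:
    """Level in dB of a linear intensity (tiny floor avoids log of zero)."""
    return 10.0 * math.log10(max(intensity, 1e-12))


def user_is_speaking(fi_avg: float, bi_avg: float, m: float,
                     margin_db: float = SPEECH_MARGIN_DB) -> bool:
    """Return True when the foreground running average exceeds the
    m-corrected background running average by more than margin_db."""
    foreground_db = level_db(abs(fi_avg))
    corrected_background_db = level_db(abs(bi_avg) * m)
    return foreground_db - corrected_background_db > margin_db


print(user_is_speaking(4.0, 1.0, m=1.0))  # about 6 dB above: True
print(user_is_speaking(1.5, 1.0, m=1.0))  # under 2 dB above: False
```

With m = 2.5 the first case would also be rejected, since the corrected background is then only about 2 dB below the foreground.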
Correction factor m is such that when the foreground sound source, usually the voice of the user, is silent and a remote sound source is loudly available, e.g. a machine at more than 4 meters from the invention, the intensity of the foreground sound microphone equals the intensity of the background sound microphone times m: FI = BI x m.
b) Running average Background sound intensity and freeze
When it is detected that the user is not speaking, the absolute value of the running average background intensity is logged. As soon as it is detected that the user is speaking, the last but one value of the background sound intensity is frozen for that speaking slot.
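Step b) can be illustrated with a small tracker. The sketch assumes an exponential moving average as the "running average" (the description does not specify the averaging method) and a hypothetical smoothing constant alpha; the class and attribute names are illustrative only. Freezing the last but one logged value, rather than the last, presumably avoids a value already contaminated by the onset of speech, but that rationale is an assumption.

```python
class BackgroundTracker:
    """Logs the running-average background level while the user is silent
    and freezes the last but one logged value when speech starts."""

    def __init__(self, alpha: float = 0.1):
        self.alpha = alpha      # hypothetical smoothing constant
        self.average = None     # current running average
        self.history = []       # at most the two most recent logged values
        self.frozen = None      # frozen value for the current speaking slot

    def update(self, bi_sample: float, speaking: bool):
        if not speaking:
            # Silent: update the running average and log it.
            if self.average is None:
                self.average = abs(bi_sample)
            else:
                self.average = ((1.0 - self.alpha) * self.average
                                + self.alpha * abs(bi_sample))
            self.history = (self.history + [self.average])[-2:]
            self.frozen = None  # the speaking slot, if any, has ended
        elif self.frozen is None and self.history:
            # Speech just started: freeze the last but one logged value
            # (or the only logged value if just one exists).
            self.frozen = self.history[0]
        return self.frozen


tracker = BackgroundTracker(alpha=0.5)
for sample in (10.0, 20.0, 40.0):
    tracker.update(sample, speaking=False)
print(tracker.update(99.0, speaking=True))  # prints 15.0
```

In the usage example the logged averages are 10.0, 15.0 and 27.5, so the value frozen at the start of speech is 15.0, and it stays frozen for the rest of the speaking slot.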
c) Determining Delta of Foreground minus Background Signal
When it is detected that the user is speaking, at the start x of the speaking slot the factor Delta(x-1), being the difference between the absolute value of the running average Foreground Signal Intensity FI(x) and the frozen Background Signal Intensity BI(x-1), is set to zero. The microprocessor needs time to acquire reliable background and foreground signal levels. To calculate the first accurate Delta, typically 1-10 seconds sample time is required in which the value stays zero.
While the user is speaking, the foreground signal intensity is calculated, corrected for the final acoustic specifications of the different microphones 31 and 32, and compared to the frozen BI(x-1), so that an accurate Delta(x) is calculated after every sample. To prevent Delta from becoming negative, Delta(x) is the maximum of zero and the difference between the absolute value of Foreground Signal Intensity (x) and the absolute value of Background Signal Intensity (x). In formula: Delta(x) = MAX(0, ABS(FI(x)) - ABS(BI(x))). To prevent irritating stochastic feedback to the user, the value Delta is also smoothed by a running average: Delta(x) = (Delta(x-1) + c * Delta(x)) / (1 + c). The value of c will be larger than the value of the progressing averages of the background and foreground sound signals but must be optimised in the final design.
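The two formulas for Delta can be followed in a short sketch. It assumes that FI and the frozen BI are levels in dB(A) and that Delta starts at zero at the beginning of each speaking slot; the function names and the value c = 1.0 are hypothetical, since the description leaves c to be optimised in the final design.

```python
def raw_delta(fi: float, bi_frozen: float) -> float:
    """Delta before smoothing: MAX(0, ABS(FI) - ABS(BI))."""
    return max(0.0, abs(fi) - abs(bi_frozen))


def smooth_delta(previous: float, raw: float, c: float) -> float:
    """Running average: (Delta(x-1) + c * Delta(x)) / (1 + c)."""
    return (previous + c * raw) / (1.0 + c)


def delta_series(fi_values, bi_frozen: float, c: float):
    """Smoothed Delta for every sample of one speaking slot."""
    delta = 0.0  # Delta(x-1) is set to zero at the start of the slot
    series = []
    for fi in fi_values:
        delta = smooth_delta(delta, raw_delta(fi, bi_frozen), c)
        series.append(delta)
    return series


# Voice steadily 10 dB above a frozen background of 60 dB(A).
print(delta_series([70.0, 70.0, 70.0], bi_frozen=60.0, c=1.0))
# prints [5.0, 7.5, 8.75]
```

The smoothed value approaches the raw difference of 10 dB only gradually, which is the softening effect the description aims for; a larger c makes Delta follow the raw difference more quickly.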
When Delta is more than a threshold value of 3 dB(A), a personalised part of the foreground sound signal is added to the output signal for the ear speaker 12, whereby the user starts hearing his own voice in the ear speaker 12, mixed with the sound of the caller on the other end of the line. It is crucial that any latency of the user's own voice in his ear is kept minimal to prevent irritation. A typical maximum latency value is 2 msec.
The feedback of his own voice gets stronger when his voice is increasingly stronger compared to the average dynamic frozen background sound. The caller on the other end of the line continues clearly hearing the user without hearing the increasing feedback.
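The threshold and the growing feedback can be sketched as a gain applied to the user's own voice before it is mixed into the ear speaker signal. The linear mapping, the slope of 0.1 per dB and the cap at a gain of 1.0 are assumptions for illustration; the description only states that the feedback starts above 3 dB(A) and gets stronger as Delta grows, and mentions linear, logarithmic, exponential and constant characteristics as possible user settings.

```python
THRESHOLD_DB = 3.0  # threshold value from the description


def sidetone_gain(delta_db: float, threshold_db: float = THRESHOLD_DB,
                  slope: float = 0.1, max_gain: float = 1.0) -> float:
    """Linear gain for the own-voice signal; zero at or below threshold."""
    if delta_db <= threshold_db:
        return 0.0
    return min(max_gain, slope * (delta_db - threshold_db))


def mix_output(far_end, own_voice, gain: float):
    """Add the scaled own-voice samples to the far-end samples."""
    return [f + gain * v for f, v in zip(far_end, own_voice)]


gain = sidetone_gain(8.0)  # 5 dB above the threshold, gain of about 0.5
print(mix_output([0.25, 0.5], [1.0, -1.0], 0.5))  # prints [0.75, 0.0]
```

Only the ear speaker signal is modified in this sketch; the signal sent to the other end of the line is left untouched, in line with the statement that the caller does not hear the feedback.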
This feedback functionality motivates the user to stop speaking too loud with respect to people in his vicinity, in other words to inhibit his Lombard reflex.
The value of Delta(x-1) is set to zero after each ending of the user's speaking slot. This means that when the user remembers the invention's feedback during his last too loud sentence and starts speaking much softer in the next slot, he is immediately rewarded with the absence of the feedback in his ear. In case he continues his too loud conversation, within seconds the progressing average provides the proportional feedback in his ear.
As the invention contains a preset default threshold of background sound signal, the user remains able to continue speaking softly but normally in case the background sound signal falls to silence, which prevents the user from getting uncontrollable feedback or being forced to whisper.
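One possible reading of the preset default threshold is a floor under the frozen background level, so that Delta is never measured against near silence. The sketch below follows that reading; the floor of 40 dB(A) and the function names are hypothetical and are not taken from the description.

```python
DEFAULT_BACKGROUND_FLOOR_DB = 40.0  # hypothetical preset, in dB(A)


def effective_background_db(
        frozen_db: float,
        floor_db: float = DEFAULT_BACKGROUND_FLOOR_DB) -> float:
    """Never let the reference background drop below the preset floor."""
    return max(frozen_db, floor_db)


def delta_with_floor(
        foreground_db: float, frozen_db: float,
        floor_db: float = DEFAULT_BACKGROUND_FLOOR_DB) -> float:
    """Delta measured against the floored background level."""
    reference = effective_background_db(frozen_db, floor_db)
    return max(0.0, foreground_db - reference)


# Soft speech (42 dB(A)) in a nearly silent room (20 dB(A)).
print(delta_with_floor(42.0, 20.0))  # prints 2.0
```

Without the floor the same soft speech would give a Delta of 22 dB and strong feedback; with it, Delta stays below the 3 dB(A) threshold and the user can keep speaking softly but normally.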
As a second functionality, if the microprocessor of the invention has trouble meeting the latency demand because of the required processing of the signals, a possible embodiment of the feedback circuitry is an analogue mix circuit. Steered by the digital, slightly delayed Delta(x) value, the circuit injects the analogue foreground voice signal from microphone 31 with enhancement factor Delta(x) into the earplug signal.
Optionally, analogue noise cancelling can be achieved by subtracting the analogue background signal from microphone 32 with factor Delta(x) from the analogue voice signal.
As a third functionality, when switched on the amplification of the noise cancelled voice microphone signal towards the person on the other end of the line is automatically increased with pre-programmed steps when the running average continuous background signal stays in categorized volume ranges below the normal preset volume range of standard mobile communication. This means that the softer speaking user in quiet places remains clearly audible on the other end of the line.
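The stepped amplification of the third functionality can be sketched as a lookup over background level ranges. The range boundaries and gain steps below are invented purely for illustration; the description only says that the amplification increases in pre-programmed steps when the background level stays in ranges below the normal preset range.

```python
# (lower bound of background range in dB(A), uplink gain in dB);
# all values are hypothetical, ordered from loud to quiet.
GAIN_STEPS = [(50.0, 0.0), (40.0, 3.0), (30.0, 6.0)]
LOWEST_RANGE_GAIN_DB = 9.0  # applied below the last lower bound


def uplink_gain_db(background_db: float) -> float:
    """Extra amplification of the voice signal sent to the far end."""
    for lower_bound, gain in GAIN_STEPS:
        if background_db >= lower_bound:
            return gain
    return LOWEST_RANGE_GAIN_DB


for level in (55.0, 45.0, 35.0, 20.0):
    print(level, uplink_gain_db(level))
```

The quieter the surroundings, and hence the softer the user is expected to speak, the more the outgoing voice signal is amplified.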
Above-mentioned functionalities one, two and three of the Application 20 can also be implemented in the headset accessory 10 that is connected with the smartphone 30 via Bluetooth. In this configuration the smartphone 30, or optionally a normal cellular phone, works with the headset, with the functionality according to the invention running on the headset microprocessor. In this wirelessly interconnected mode the ear speaker 43 and the microphones 31 and 32 of the smartphone 30 are not in use during a phone call.
The microprocessor of the headset 10 is loaded with software to control the headset 10 according to the invention during a phone call. With the push button 8 on the headset 10, the user can switch on and off the "Voice Immersion" accurate behaviour feedback functionality according to the invention.
Figure 2A shows the head 1 of a user of a headset 10 according to the invention. The headset 10 is shaped to fit around the ear 2. The user carries the headset 10 around only one of his ears 2. The headset 10 comprises an elongated, curved housing 11 and an ear speaker 12 that is connected to the housing 11 via a first rod 9. The ear speaker 12 is partly inserted in the ear canal 3 of the user. The ear speaker 12 has a shotgun sound output directionality or polar pattern 13 that is aligned with the ear canal axis A. From the ear speaker 12 a curved second rod 14 extends along the cheek towards the mouth 4 of the user.
The carrying rod 14 carries a voice sound microphone 17 at its free end and a background sound microphone 15 situated between the voice sound microphone 17 and the ear speaker 12. The voice sound microphone 17 has a shotgun sensitivity directionality or polar pattern 18 that is directed towards the mouth 4 of the user to optimally pick up his voice. The polar pattern 18 is aligned with an axis D that, when seen in top view parallel to the cranial axis of the user, is under an angle E of 0-60 degrees with the forward-backward axis B of the user. In this top view the forward-backward axis is perpendicular to the ear canal axis A. The distance C between the centre of the ear speaker 12 and the centre of the voice sound microphone 17 is typically larger than 10 centimetres to position the voice sound microphone 17 sufficiently close to the mouth 4 of the user to optimally pick up his voice. The background sound microphone 15 has an omni-directional sensitivity directionality or polar pattern 16 that faces away from the user to pick up the background noise. The headset 10 is further provided with a push button 8 that can be reached behind the ear 2.
The housing 11 comprises an inner space 20 wherein an electronic circuit has been enclosed. The electronic circuit comprises a microprocessor, a rechargeable power supply and a wireless Bluetooth transceiver that are connected to the microprocessor, an electronic connection between the push button 8 and the microprocessor, and analogue-digital converters that are connected to the microprocessor to convert a digital output signal from the microprocessor into an analogue electronic signal to power the ear speaker 12 with the voice sound signal of the caller on the other end of the line and to provide the microprocessor with a digital voice sound signal from the voice sound microphone 17 and a digital background sound signal from the background microphone 15. The voice sound signal is proportional to an average voice sound pressure at the voice sound microphone 17 when speaking. The background sound signal is proportional to an average background sound pressure at the background sound microphone 15.
During a phone call with the attached headset, the headset's microprocessor is provided with a digital sound signal from the foreground sound microphone 17 and a digital background sound signal from the microphone 15. The microprocessor continuously and repeatedly samples the foreground and background sound signals to detect whether the user is speaking.
When it is detected that the user is not speaking, the absolute value of the running average background intensity is logged. As soon as it is detected that the user is speaking, the last but one value of the background sound intensity is frozen for that speaking slot.
When it is detected that the user is speaking, at the start x of the speaking slot the factor Delta(x-1), being the difference between the absolute value of the running average Foreground Signal Intensity FI(x) and the frozen Background Signal Intensity BI(x-1), is set to zero. The microprocessor needs time to acquire reliable background and foreground signal levels. To calculate the first accurate Delta, typically 1-10 seconds sample time is required in which the value stays zero.
While the user is speaking, the foreground signal intensity is calculated, corrected for the final acoustic specifications of the different microphones 15 and 17, and compared to the frozen BI(x-1), so that an accurate Delta(x) is calculated after every sample. To prevent Delta from becoming negative, Delta(x) is the maximum of zero and the difference between the absolute value of Foreground Signal Intensity (x) and the absolute value of Background Signal Intensity (x). In formula: Delta(x) = MAX(0, ABS(FI(x)) - ABS(BI(x))). To prevent irritating stochastic feedback to the user, the value Delta is also smoothed by a running average: Delta(x) = (Delta(x-1) + c * Delta(x)) / (1 + c). The value of c will be larger than the value of the progressing averages of the background and foreground sound signals but must be optimised in the final design.
When Delta is more than a threshold value of 3 dB(A), a personalised part of the foreground sound signal is added to the output signal for the ear speaker 12, whereby the user starts hearing his own voice in the ear speaker 12, mixed with the sound of the caller on the other end of the line. It is crucial that any latency of the user's own voice in his ear is kept minimal to prevent irritation. A typical maximum latency value is 2 msec.
The feedback of his own voice gets stronger when his voice is increasingly stronger compared to the average dynamic background sound. The caller on the other end of the line continues clearly hearing the user without hearing the increasing feedback.
This feedback functionality motivates the user to stop speaking too loud with respect to people in his vicinity.
The value of Delta(x-1) is set to zero after each ending of the user's speaking slot. This means that when the user remembers the invention's feedback during his last too loud sentence and starts speaking much softer in the next slot, he is immediately rewarded with the absence of the feedback in his ear. In case he continues his too loud conversation, within seconds the progressing average provides the proportional feedback in his ear.
As the invention contains a preset default threshold of background sound signal, the user remains able to continue speaking softly but normally in case the background sound signal falls to silence, which prevents the user from falling into whispering or receiving uncontrollable feedback.
Advantages of the headset embodiment over the "Application" embodiment are the superior acoustical characteristics of microphone 17 and the hands-free operation. Drawbacks are a potentially higher price and reduced user comfort.
Figure 4 shows the human hearing sensitivity and Figure 5 shows a graph of an A-weighting curve, with the frequency on the horizontal axis and the intensity in dB on the vertical axis.
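For readers without access to the figures, the A-weighting curve can be reproduced numerically. The sketch below uses the standard analytic expression for A-weighting as defined in IEC 61672; this formula is not given in the description itself, and the function name is illustrative.

```python
import math


def a_weighting_db(frequency_hz: float) -> float:
    """A-weighting in dB at a frequency, per the IEC 61672 expression."""
    if frequency_hz <= 0.0:
        raise ValueError("frequency must be positive")
    f2 = frequency_hz ** 2
    numerator = (12194.0 ** 2) * f2 ** 2
    denominator = ((f2 + 20.6 ** 2)
                   * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
                   * (f2 + 12194.0 ** 2))
    # The +2.00 dB offset normalises the curve to 0 dB at 1 kHz.
    return 20.0 * math.log10(numerator / denominator) + 2.00


for f in (100.0, 1000.0, 2000.0):
    print(f"{f:7.0f} Hz  {a_weighting_db(f):6.1f} dB")
```

The weighting is close to 0 dB at 1 kHz and strongly negative at low frequencies (about -19 dB at 100 Hz), reflecting the lower sensitivity of the human ear there.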
It is to be understood that the above description is included to illustrate the operation of the preferred embodiments and is not meant to limit the scope of the invention. From the above discussion, many variations will be apparent to one skilled in the art that would yet be encompassed by the spirit and scope of the present invention.

Claims

1. Method for providing a real time feedback signal to a user providing foreground sound input comprising the steps of:
providing at least two audio input means, such as a microphone, the at least two input means being spaced apart at a mutual distance, such that a distance from a first audio input means to a source of foreground sound is substantially shorter than a distance from a second audio input means to the same source of foreground sound,
providing at least one means for providing an output signal, such as a speaker, and at least one processor,
obtaining a total sound signal intensity (dB(A)) comprising one or more of foreground sound input intensity (dB(A)) and background sound input intensity (dB(A)),
obtaining the foreground sound input intensity and the background sound input intensity,
comparing the intensity of the averaged foreground sound input and the intensity of the averaged background sound input, and
adding part of the foreground sound input signal as a feedback signal to the output signal in case the foreground sound input intensity is larger than an intensity comprising a predetermined threshold value and the intensity of the averaged frozen background sound input,
wherein a feedback signal, such as a sidetone, is added in the means for output proportional in intensity of a difference between the intensity of the optionally averaged foreground sound input and the intensity of the frozen optionally averaged background sound input, preferably with a negligible latency, such as less than 20 msec, preferably less than 5 msec, such as less than 2 msec, such as about 0.1 msec.
2. Method according to claim 1, further providing the first audio input means, the second audio input means and the means for providing the output signal as part of one device, such as a smartphone, telephone, mobile telephone, headset, computer, providing at least the first audio input means with a directional sensitivity, such as with a polar pattern, directing the first audio input means towards a mouth of the user, and aligning the directional sensitivity with a virtual axis that, when seen in top view parallel to a cranial axis of the user, is under an angle of 0-60 degrees with a forward-backward axis of the user.
3. Method according to claim 1 or 2, wherein the intensity of the background sound input is obtained when the running average foreground sound input intensity is less than the running average background sound intensity plus 3 dB(A), corrected for the characteristics of the microphones.
4. Method according to any of the preceding claims, wherein the intensity of the background sound input is frozen to the last but one logged value when the running average foreground sound input intensity is higher than the running average background sound intensity plus 3 dB(A), corrected for the characteristics of the microphones, e.g. during the speaking period of the user.
5. Method according to any of the preceding claims, further comprising the step of filtering the foreground sound input and/or the background sound input, such as by filtering out low frequency and high frequency, thereby obtaining a frequency window, wherein preferably a low frequency threshold for the frequency window is 200 Hz, preferably 100 Hz, and wherein a high frequency threshold for the frequency window is 1000 Hz, preferably 2000 Hz, wherein preferably the frequency window is a weighted window correcting the sound intensity for the sensitivity of the human ear by an A-weighting curve.
6. Method according to any of the preceding claims, wherein further the intensity of the foreground sound input is amplified or reduced in view of the intensity of the background sound input.
7. Method according to any of the preceding claims, further obtaining a running average of the foreground sound input intensity and/or of the background sound input intensity, such as a running average over 1 msec - 2 sec.
8. Method according to any of the preceding claims, wherein a first input means obtains a first sound intensity, wherein a second input means obtains a second sound intensity, wherein an optional difference in sound intensity is used to determine the foreground sound input intensity, and wherein the background sound input intensity is determined when the first and second sound intensities differ less than 3 dB(A), preferably are substantially the same.
9. Method according to any of the preceding claims, wherein the feedback signal comprises a delayed foreground sound input, wherein the latency is preferably smaller than 100 msec, more preferably smaller than 20 msec, even more preferably smaller than 2 msec, such as smaller than 0.1 msec.
10. Method according to any of the preceding claims, wherein the part of the foreground sound input intensity being added is proportional to the foreground sound input intensity, such as from 1% - 300% thereof, preferably from 5% - 200% thereof, more preferably from 30% - 125% thereof, such as from 50% - 100% thereof.
11. Method according to any of the preceding claims, wherein the predetermined threshold value is at least 3 dB(A), preferably at least 5 dB(A), more preferably at least 10 dB(A), such as at least 20 dB(A).
12. Method according to any one of the preceding claims, wherein no feedback signal is added to the output signal when the absolute value of the optional running average foreground sound signal intensity is below the sum of the threshold value and the absolute value of the optional running average of the frozen background sound signal intensity.
13. Method according to any one of the preceding claims, further providing a preset default static minimal background sound signal intensity, and preventing a feedback signal if the background sound signal intensity is below the preset default static minimal background sound signal intensity.
14. Device comprising at least two audio input means, such as a microphone, the at least two input means being spaced apart at a mutual distance, such that a distance from a first audio input means to a source of foreground sound is substantially shorter than a distance from a second audio input means to a source of foreground sound, at least one means for providing an output signal, such as a speaker, and
at least one processor, the processor being adapted for processing sound input, providing an output signal and providing a feedback signal, wherein the device further comprises software and a means for storing software, wherein the software is adapted to provide a feedback signal, such as a sidetone, to be added in the means for output proportional in intensity (dB(A)) of a difference between the intensity of the optionally averaged foreground sound input (dB(A)) and the intensity of the frozen optionally averaged background sound input (dB(A)), preferably with a negligible latency, such as less than 20 msec, preferably less than 5 msec, such as less than 2 msec, such as about 0.1 msec.
15. Device according to claim 14, selected from the group of smartphone, telephone, mobile telephone, headset, computer and combinations thereof, further providing the first audio input means, the second audio input means and the means for providing the output signal as part of one device, providing at least the first audio input means with a directional sensitivity, such as with a polar pattern, directing the first audio input means towards a mouth of the user, and aligning the directional sensitivity with a virtual axis that, when seen in top view parallel to a cranial axis of the user, is under an angle of 0-60 degrees with a forward-backward axis of the user.
PCT/NL2012/000026 2011-04-19 2012-04-13 Voice immersion smartphone application or headset for reduction of mobile annoyance WO2012144887A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
NLNL1038762 2011-04-19
NL1038762A NL1038762C2 (en) 2011-04-19 2011-04-19 Voice immersion smartphone application or headset for reduction of mobile annoyance.

Publications (1)

Publication Number Publication Date
WO2012144887A1 true WO2012144887A1 (en) 2012-10-26

Family

ID=46319883

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/NL2012/000026 WO2012144887A1 (en) 2011-04-19 2012-04-13 Voice immersion smartphone application or headset for reduction of mobile annoyance

Country Status (2)

Country Link
NL (1) NL1038762C2 (en)
WO (1) WO2012144887A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2538165A (en) * 2015-04-13 2016-11-09 Soundchip Sa Audio communication apparatus
US11804113B1 (en) 2020-08-30 2023-10-31 Apple Inc. Visual indication of audibility

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040242160A1 (en) 2003-05-30 2004-12-02 Nokia Corporation Mobile phone for voice adaptation in socially sensitive environment
US20070021958A1 (en) 2005-07-22 2007-01-25 Erik Visser Robust separation of speech signals in a noisy environment
US20070017810A1 (en) 2005-07-19 2007-01-25 Lee Hun-Joo Microfluidic device for electrochemically regulating pH of fluid therein and method of regulating pH of fluid using the microfluidic device
WO2007017810A2 (en) * 2005-08-11 2007-02-15 Koninklijke Philips Electronics N.V. A headset, a communication device, a communication system, and a method of operating a headset
US20090017670A1 (en) 2007-07-12 2009-01-15 Yamaha Corporation Electronic component and method of forming the same
WO2010009345A1 (en) 2008-07-16 2010-01-21 Qualcomm Incorporated Method and apparatus for providing audible, visual or tactile sidetone feedback notification to a user of a communication device with multiple microphones

Also Published As

Publication number Publication date
NL1038762C2 (en) 2012-10-22

Similar Documents

Publication Publication Date Title
US8897457B2 (en) Method and device for acoustic management control of multiple microphones
US8744091B2 (en) Intelligibility control using ambient noise detection
US8081780B2 (en) Method and device for acoustic management control of multiple microphones
CN101552823B (en) Volume management system and method
EP3777114B1 (en) Dynamically adjustable sidetone generation
US10121491B2 (en) Intelligent volume control interface
US20230328461A1 (en) Hearing aid comprising an adaptive notification unit
NL1038762C2 (en) Voice immersion smartphone application or headset for reduction of mobile annoyance.
JPH07221821A (en) Apparatus and method for attenuation of echo
US20070032259A1 (en) Method and apparatus for voice amplitude feedback in a communications device
JP2643877B2 (en) Telephone
US20220139414A1 (en) Communication device and sidetone volume adjusting method thereof
KR101482420B1 (en) Sound Controller of a Cellular Phone for Deafness and its method
TWI425818B (en) Volume management system and method
JPH09181817A (en) Portable telephone set
CN114446315A (en) Communication device and method for adjusting output side tone
JP4676460B2 (en) Voice communication device
JPH1023114A (en) Telephone set
JPH05110637A (en) Telephone set
JP2006270300A (en) Apparatus for controlling received sound volume
GB2538165A (en) Audio communication apparatus
JPH05235789A (en) Voice communication terminal equipment
JPH0818647A (en) Telephone set
TW201642675A (en) Communication apparatus and volume adjustment method thereof

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12728329

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12728329

Country of ref document: EP

Kind code of ref document: A1