WO2002065456A1 - System and method for voice quality of service measurement - Google Patents


Publication number
WO2002065456A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice signal
preprocessing
pitch
circuit
indication
Prior art date
Application number
PCT/JP2002/000658
Other languages
French (fr)
Inventor
Kambiz Homayounfar
Original Assignee
Genista Corporation
Priority date
Filing date
Publication date
Application filed by Genista Corporation
Publication of WO2002065456A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/69 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for evaluating synthetic or decoded voice signals
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/90 Pitch determination of speech signals

Definitions

  • the present invention relates generally to a system, method and apparatus for measuring the quality of a voice signal in a communications system, and more specifically to a system, method and apparatus for rating the perceptual quality of a voice signal.
  • QoS Quality of service
  • the QoS involves a perceptual measure of quality of user devices such as wireless phones.
  • the QoS relates to, for example, how the user perceives the quality of the phone call, which can include the accessibility of network communication, the retention of the communication, and the integrity of the communication.
  • the accessibility factor includes the ability to access the network and place a phone call, and can be affected by, for example, the number of callers within a cell of a wireless phone network.
  • the retention factor includes any interruptions to the call or loss of signal, resulting in a dropped call.
  • the integrity factor includes the voice quality of the communication, such as the clarity of it and elimination of static noise, and can also include other factors such as general customer service.
  • the present invention overcomes the aforementioned shortcomings of the prior art by providing a means to measure the voice quality of a communications signal without the use of a reference signal.
  • the Mono Voice-Quality (MonoVQ) measurement system that is disclosed is based on the acoustic properties of distorted speech, which may include: pitch statistics, pitch perturbations, loudness measures and speech and spectral measures. These measures strongly relate to the acoustic sound qualities of hearing sensation loudness, sharpness of sound, fluctuation strength and roughness of communication signals.
  • a method consistent with the present invention determines an indication of speech quality for a voice signal.
  • the method includes receiving an electronic voice signal, measuring a perceptual quality of the voice signal without use of a reference signal, and providing an indication of the measured perceptual quality.
  • An apparatus consistent with the present invention determines an indication of speech quality for a voice signal.
  • the apparatus includes a receiving circuit for receiving an electronic voice signal.
  • An analysis circuit, in communication with the receiving circuit, measures a perceptual quality of the voice signal without use of a reference signal and provides an indication of the measured perceptual quality.
  • it is an object of the present invention to set forth an improved system and method that measures the voice quality of a speech signal.
  • FIGURE 1 is a generalized illustration of a communications network utilizing the principles of the present invention
  • FIGURE 2 is a block diagram of a system for voice quality measurement
  • FIGURE 3 is a graph of an exemplary original input signal
  • FIGURE 4 is a graph of an exemplary medium distortion input signal
  • FIGURE 5 is a graph of an exemplary high distortion input signal
  • FIGURE 6 is a graph of an exemplary pitch perturbation using the voice quality measurement for the original input signal of FIGURE 3;
  • FIGURE 7 is a graph of an exemplary pitch perturbation using the voice quality measurement for the medium distortion input signal of FIGURE 4.
  • FIGURE 8 is a graph of an exemplary pitch perturbation using the voice quality measurement for the high distortion input signal of FIGURE 5.
  • Communications network 100 is composed of a base station 110, base station transceiver 120 and mobile transceivers 130.
  • the signal processing and QoS measurement innovations may be implemented at various points throughout the network.
  • the circuitry used to measure the QoS of the signals may be embedded either in mobile transceivers 130, or in the network, e.g., the base station 110, or in both.
  • the QoS measurement innovations are best suited for implementation within digital signal processor (DSP) chips.
  • DSP digital signal processor
  • the real-time voice signal in digital form that is used to calculate the QoS is available within the memory of DSP chips commonly found in phones 130 and in base stations 110.
  • the network element that contains the real-time voice signal is the DSP chip of the Transcoding Unit (TU) 125 as illustrated in Figure 1.
  • TU Transcoding Unit
  • MOS mean opinion score
  • a MOS is the result of a subjective listening test, wherein listeners compare various samples generated from voice streams and assign a quality value of 1 to 5 thereto.
  • the MOS is an arithmetic mean determined from a large number of samples for a particular voice stream.
  • MOS scores have been used to characterize the relative quality of compressed voice streams, although MOS scores can also be used to determine the perceived transmission quality for wireless voice streams, subject to a variety of conditions.
  • land lines transmitting at 16 kbps typically have MOS scores of around 3.6-3.7; whereas, voice signals compressed to 4.75 kbps for wireless transmission have MOS scores around 3.2.
  • the present invention overcomes this shortcoming by providing a means to measure the signal quality in real-time without the use of a reference signal. Applicant has found that the MonoVQ measurement generated by the present invention corresponds to the industry standard MOS and can be used as a means to provide a reliable measure of the signal quality.
  • MonoVQ system 200 includes a source of a speech or voice signal 202, which can be intercepted, for example, at a base station processing a wireless communication.
  • a receiving circuit intercepts the voice signal 202 for processing.
  • the MonoVQ system may include an analog-to-digital converter (ADC) 204 that converts the speech source signal 205 into a corresponding digital signal, X, designated by the reference numeral 206, as illustrated in FIGURE 2.
  • ADC analog-to-digital converter
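  • preprocessing circuits receive the digital voice signal and perform initial processing of the signal before the MonoVQ system performs QoS analysis of the signal.
  • the preprocessing circuits may include a Background Noise Canceller (BNC) 208 and a Voice Activity Detector (VAD) 214 that utilize standard telephony algorithms.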
  • BNC Background Noise Canceller
  • VAD Voice Activity Detector
  • BNC 208 receives the digital voice signal 206 and outputs a voice after noise removal signal x' 210 and a background noise level signal B 212 as illustrated in FIGURE 2.
  • VAD 214 receives the signal x' 210 and outputs a voice after silence removal signal x'' 216 and a voice activity factor signal V 218.
  • Implementation of circuits ADC 204, BNC 208, and VAD 214 is known to those skilled in the art.
  • the preprocessing circuit can also include a plurality of circuits that receive the signal x'' and process it to output signals that are used to analyze the QoS of the original received voice signal 202.
  • a spectra circuit 220 receives the signal x'' 216 and outputs a spectral perturbation factor S 222 that is received by the analysis circuit 232.
  • a pitch circuit 224 receives the signal x'' 216 and outputs a pitch perturbation factor P 226 that is received by the analysis circuit 232, which can include short-term and long-term pitch perturbation factors.
  • a loudness index L 230 is also generated at loudness circuit 228 as illustrated in Figure 2.
  • Table 1 displays examples of mathematical equations that may be used to calculate the perturbation metrics.
  • f_i represents instantaneous pitch frequency
  • τ_i represents the pitch period
  • i represents a frame
  • N is the total number of measurements.
  • the above metrics are applied to the spectra and pitch measurements of the signal and used to determine the MonoVQ for a voice signal.
  • the MonoVQ as discussed above, is closely related to the MOS of a signal and is used as an indicator for the quality of service for a voice signal.
  • the analysis circuit 232 receives the signals S 222, P 226, and L 230 from spectra circuit 220, pitch circuit 224, and loudness circuit 228 respectively, and the signals B 212 and V 218 from BNC 208 and VAD 214, respectively.
  • voice quality signal 234 provides an indication of the perceptual quality of the original received voice signal 202.
  • Voice quality signal 234 provides an indication of various perceptual qualities of the QoS integrity of the received signal such as, for example, a hearing sensation loudness, a sharpness of sound, a fluctuation strength or a roughness.
  • the inputs to analysis 232 are sampled at a frame interval of 20 milliseconds (ms).
  • each frame corresponds to 160 speech samples.
  • the inputs to analysis 232 may be denoted as follows: Pitch Perturbation Index: Pi; Spectral Perturbation Index: Si; Loudness Perturbation Index: Li; Voice Activity Type: Vi; and Background Noise Level: Bi, where the letter i denotes the frame index.
  • the output of analysis 232 is a single number M for the entire speech signal that statistically relates to the MOS of the original signal. As discussed above, M and MOS are highly correlated when considered for a large set of speech utterances (signals).
  • M is a statistical function of an internal variable Ui.
  • the function of Ui is the average, but in other embodiments other functions such as the standard deviation or the kurtosis of Ui may also be applicable.
  • M = g[Ui], where Ui is derived from the input signals Pi, Si, Li, Vi, and Bi, as follows
  • the first step is to classify the speech sample of the sampled frame. Certain frames of a conversation affect the QoS of a signal more than others. Table 2 below shows how varying levels of significance are assigned to different frame samples. The significance of the sample is indicated by Vi .
  • Table 2 Phonetic Classification of Input Speech based on Voice Activity
  • Ai = Vi × (Bs Si + Bp Pi + BL Li).
  • a high Si (spectral perturbation)
  • a high Pi (pitch perturbation)
  • a stationary voiced sound like /a/ as in apple
  • the factors Bp, BL and Bs are for scaling purposes.
  • FIGURES 3-5 of the Drawings there are illustrated therein examples of an original input signal 300, a medium distortion input signal 400, and a high distortion input signal 500.
  • the examples in Figures 3, 4 and 5 are illustrative of the type of speech source signals 202 that are likely to be processed by the invention in actual operation.
  • the types of network conditions that may affect the distortion of the signal 202 include, but are not limited to, handovers from one base station to another base station, movement of the mobile telephone at different speeds (acceleration and deceleration of vehicles), poor radio signal reception, and turning around corners, i.e., knife edge effects that occur when a vehicle or pedestrian makes a right or left turn.
  • the varying levels of distortion within Figures 3, 4 and 5 can be seen in the signal amplitude loss at several locations. The varying levels of distortion cause the signals to differ in shape; i.e., if the three examples are superimposed, the distortion phenomenon is clearly observed.
  • the pitch perturbation of the original input signal 300 includes smooth and regular pitch tracks, which indicate that the user is receiving a signal of only minor distortion.
  • the pitch perturbation of the medium distortion input signal 400 includes irregular pitch tracks. The irregular pitch tracks indicate the signal is experiencing a moderate level of distortion.
  • the pitch perturbation of the high distortion input signal 500 includes both abnormal and irregular pitch tracks. The abnormal and irregular pitch tracks indicate that the user is receiving a signal that is experiencing a high level of distortion.
  • the y-axis is the pitch (fundamental frequency) of the input signal in hertz (Hz).
  • the sound /aaaaaaa/ may give a pitch of 300 Hz.
  • the sound /eeeeee/ will give a different (lower) pitch in Hz. Sounds like /she/ are generally unvoiced so the pitch value would simply be blank.
  • the blank pitch values are indicated by the blank portions of the graph in Figures 6-8. Sharp deviations, as shown in Figure 7 and particularly pronounced in Figure 8, show that something is abnormal with regard to the pitch of the signal.
  • the numerical computation of the pitch distortion via perturbation functions (Equations 1 and 2 from Table 1) provides data on the deterioration of perceptual quality of the signal and the related MOS.
  • the perturbation is computed inside the pitch box 224, and the output is the pitch perturbation index P 226, with P computed every frame of 20 milliseconds.
  • the spectra perturbation measurements are generated in the same manner as the pitch perturbation metrics described above.
  • the numerical computation of the spectra distortion via perturbation functions (Equations 1 and 2 from Table 1) provides data on the deterioration of perceptual quality of the signal and the related MOS.
  • the perturbation is computed inside the spectra box 220, and the output is the spectra perturbation index S 222.
  • the pitch perturbation and spectra perturbation measurements described above are generated at the pitch circuit 224 and spectra circuit 220 respectively and then passed on to the analysis circuit 232.
  • the analysis circuit 232 receives signal data from other circuits, e.g. BNC 208, VAD 214 and loudness circuit 228.
  • the analysis circuit 232 uses the perturbation measurements and other data to determine the voice quality M as discussed in Figure 2.
  • M is directly related to the MOS of the signal.
  • the QoS of the signal is determined based upon measurements of an in-service signal in a non-intrusive manner without the use of a reference signal.
  • the present invention is applicable to communications networks. More specifically, the present invention provides a system, method and apparatus for measuring the quality of a voice signal in a communications network.

Abstract

A method and apparatus for determining an indication of perceptual speech quality for a voice signal. A receiving circuit receives a single electronic voice signal. An analysis circuit, coupled to the receiving circuit, measures a perceptual quality of the voice signal without use of a reference signal and provides an indication of the measured perceptual quality. The measured quality can include an indication of one or more of the following for the received voice signal: a hearing sensation loudness, a sharpness of sound, a fluctuation strength, or a roughness.

Description

DESCRIPTION
SYSTEM AND METHOD FOR VOICE QUALITY OF SERVICE MEASUREMENT
(Technical Field)
The present invention relates generally to a system, method and apparatus for measuring the quality of a voice signal in a communications system, and more specifically to a system, method and apparatus for rating the perceptual quality of a voice signal.
(Background Art)
Quality of service (QoS) involves a perceptual measure of quality of user devices such as wireless phones. The QoS relates to, for example, how the user perceives the quality of the phone call, which can include the accessibility of network communication, the retention of the communication, and the integrity of the communication. The accessibility factor includes the ability to access the network and place a phone call, and can be affected by, for example, the number of callers within a cell of a wireless phone network. The retention factor includes any interruptions to the call or loss of signal, resulting in a dropped call. The integrity factor includes the voice quality of the communication, such as the clarity of it and elimination of static noise, and can also include other factors such as general customer service.
These various QoS measurements can affect revenue and customer relations for wireless service providers. For example, poor accessibility can result in lost potential calls and the corresponding revenue from such calls. Dropped calls, from poor retention, can likewise result in loss of on-line time and the revenue for that time. Poor integrity can result in callers "hanging up" and terminating a call early or can even result in users switching service providers. Therefore, monitoring and measuring QoS is important for service providers.
Prior art algorithms exist for measuring the integrity aspect of QoS. These algorithms include Perceptual Speech Quality Measure (PSQM) and Bark Spectral Distortion (BSD). However, current measuring systems require both the transmitted voice signal and the received voice signal to perform the QoS analysis, which means that the current art only works effectively for intrusive, out-of-service applications. In particular, current voice quality measuring systems can only monitor one channel at a time in a base station, and the channel under test cannot be used for subscriber service. Currently, there does not exist a means to measure the voice quality of a communications signal without the use of both a transmitted voice signal and a received voice signal. Accordingly, a need exists for an improved system and method that provides voice quality measurement in a non-intrusive manner for in-service applications.
(Disclosure of Invention)
The present invention overcomes the aforementioned shortcomings of the prior art by providing a means to measure the voice quality of a communications signal without the use of a reference signal. The Mono Voice-Quality (MonoVQ) measurement system that is disclosed is based on the acoustic properties of distorted speech, which may include: pitch statistics, pitch perturbations, loudness measures and speech and spectral measures. These measures strongly relate to the acoustic sound qualities of hearing sensation loudness, sharpness of sound, fluctuation strength and roughness of communication signals.
A method consistent with the present invention determines an indication of speech quality for a voice signal. The method includes receiving an electronic voice signal, measuring a perceptual quality of the voice signal without use of a reference signal, and providing an indication of the measured perceptual quality.
An apparatus consistent with the present invention determines an indication of speech quality for a voice signal. The apparatus includes a receiving circuit for receiving an electronic voice signal. An analysis circuit, in communication with the receiving circuit, measures a perceptual quality of the voice signal without use of a reference signal and provides an indication of the measured perceptual quality.
It is, accordingly, an object of the present invention to set forth an improved system and method that measures the voice quality of a speech signal.
It is a further object of the invention to measure the voice quality of a speech signal by use of only the received signal for quality measurement.
It is a further object of the invention to measure the voice quality of a speech signal based upon pitch perturbation, loudness distortion, spectral perturbation, voice activity and background noise.
It is a further object of the invention to measure voice quality based upon hearing sensation loudness, sharpness of sound, fluctuation strength and roughness of the signal. Further objects of the invention are apparent from reviewing the Disclosure of the Invention, Best Mode For Carrying Out the Invention and the Claims set forth below.
(Brief Description of Drawings)
The inventions of this application are better understood in conjunction with the following drawings and detailed descriptions of the preferred embodiments. The various hardware and software elements used to carry out the invention are illustrated in the attached drawings in the form of block diagrams, flow charts, and other illustrations, in which:
FIGURE 1 is a generalized illustration of a communications network utilizing the principles of the present invention;
FIGURE 2 is a block diagram of a system for voice quality measurement;
FIGURE 3 is a graph of an exemplary original input signal;
FIGURE 4 is a graph of an exemplary medium distortion input signal;
FIGURE 5 is a graph of an exemplary high distortion input signal;
FIGURE 6 is a graph of an exemplary pitch perturbation using the voice quality measurement for the original input signal of FIGURE 3;
FIGURE 7 is a graph of an exemplary pitch perturbation using the voice quality measurement for the medium distortion input signal of FIGURE 4; and
FIGURE 8 is a graph of an exemplary pitch perturbation using the voice quality measurement for the high distortion input signal of FIGURE 5.
(Best Mode for Carrying Out the Invention)
The following detailed description is presented to enable any person skilled in the art to make and use the invention. For purposes of explanation, specific nomenclature is set forth to provide a thorough understanding of the present invention. However, it will be apparent to one skilled in the art that these specific details are not required to practice the invention. Descriptions of specific applications are provided only as representative examples. Various modifications to the preferred embodiments will be readily apparent to one skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the invention. The present invention is not intended to be limited to the embodiments shown, but is to be accorded the widest possible scope consistent with the principles and features disclosed herein.
With reference now to FIGURE 1 of the Drawings there is illustrated therein a mobile communications network, generally designated by the reference numeral 100, utilizing the principles of the present invention. Communications network 100 is composed of a base station 110, base station transceiver 120 and mobile transceivers 130.
It is to be appreciated that the signal processing and QoS measurement innovations may be implemented at various points throughout the network. For example, the circuitry used to measure the QoS of the signals may be embedded either in mobile transceivers 130, or in the network, e.g., the base station 110, or in both. The QoS measurement innovations are best suited for implementation within digital signal processor (DSP) chips. For example, the real-time voice signal in digital form that is used to calculate the QoS is available within the memory of DSP chips commonly found in phones 130 and in base stations 110. Within the base station, the network element that contains the real-time voice signal is the DSP chip of the Transcoding Unit (TU) 125 as illustrated in Figure 1.
The primary measure of voice quality used within the telephone industry is the mean opinion score (MOS). A MOS is the result of a subjective listening test, wherein listeners compare various samples generated from voice streams and assign a quality value of 1 to 5 thereto. The MOS is an arithmetic mean determined from a large number of samples for a particular voice stream. Typically, MOS scores have been used to characterize the relative quality of compressed voice streams, although MOS scores can also be used to determine the perceived transmission quality for wireless voice streams, subject to a variety of conditions. As a point of reference, land lines transmitting at 16 kbps typically have MOS scores of around 3.6-3.7, whereas voice signals compressed to 4.75 kbps for wireless transmission have MOS scores around 3.2.
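The MOS computation described above is a plain arithmetic mean of listener ratings. The following minimal sketch illustrates it; the function name and the sample ratings are hypothetical and are not taken from the source.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Return the arithmetic mean of listener ratings on the 1-to-5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if not 1 <= rating <= 5:
            raise ValueError(f"rating {rating} is outside the 1-to-5 scale")
    return mean(ratings)


# Hypothetical ratings from six listeners for one voice stream.
example_ratings = [4, 3, 4, 3, 4, 3]
print(mean_opinion_score(example_ratings))
```

Because every rating needs a human listener, this calculation cannot be run on live traffic, which is the limitation the MonoVQ measurement is meant to address.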
Since the traditional method of generating a MOS score requires a large sample of people listening to and rating a transmission, it is obviously not practical to make real-time measurements of voice quality. The present invention overcomes this shortcoming by providing a means to measure the signal quality in real-time without the use of a reference signal. Applicant has found that the MonoVQ measurement generated by the present invention corresponds to the industry standard MOS and can be used as a means to provide a reliable measure of the signal quality.
With reference now to FIGURE 2 of the Drawings, there is illustrated therein a block diagram of the Mono Voice-Quality (MonoVQ) system, generally designated by the reference numeral 200, utilizing the principles of the present invention. MonoVQ system 200 includes a source of a speech or voice signal 202, which can be intercepted, for example, at a base station processing a wireless communication. A receiving circuit intercepts the voice signal 202 for processing. For analog signals, the MonoVQ system may include an analog-to-digital converter (ADC) 204 that converts the speech source signal 205 into a corresponding digital signal, X, designated by the reference numeral 206, as illustrated in FIGURE 2.
Next, preprocessing circuits receive the digital voice signal and perform initial processing of the signal before the MonoVQ system performs QoS analysis of the signal. The preprocessing circuits may include a Background Noise Canceller (BNC) 208 and a Voice Activity Detector (VAD) 214 that utilize standard telephony algorithms.
BNC 208 receives the digital voice signal 206 and outputs a voice after noise removal signal x' 210 and a background noise level signal B 212 as illustrated in FIGURE 2. VAD 214 receives the signal x' 210 and outputs a voice after silence removal signal x'' 216 and a voice activity factor signal V 218. Implementation of circuits ADC 204, BNC 208, and VAD 214 is known to those skilled in the art.
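The source does not specify the VAD algorithm beyond calling it a standard telephony algorithm. As a rough illustration of the data flow only (frames of x' in, active frames x'' and a per-frame activity flag out), the sketch below assumes a simple frame-energy threshold. The function names, the threshold, and the binary flag are assumptions; the patent's V is a multi-level voice activity type.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(sample * sample for sample in frame) / len(frame)


def simple_vad(frames, threshold):
    """Assumed energy-threshold VAD, not the patent's algorithm.

    Returns (active_frames, flags): the frames whose energy exceeds the
    threshold, and a 0/1 activity flag for every input frame.
    """
    flags = [1 if frame_energy(frame) > threshold else 0 for frame in frames]
    active_frames = [frame for frame, flag in zip(frames, flags) if flag]
    return active_frames, flags


# Hypothetical input: one silent frame followed by one active frame.
frames = [[0.0, 0.0, 0.0, 0.0], [0.5, -0.5, 0.5, -0.5]]
active, flags = simple_vad(frames, threshold=0.01)
print(flags)
```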
The preprocessing circuit can also include a plurality of circuits that receive the signal x'' and process it to output signals that are used to analyze the QoS of the original received voice signal 202. A spectra circuit 220 receives the signal x'' 216 and outputs a spectral perturbation factor S 222 that is received by the analysis circuit 232. A pitch circuit 224 receives the signal x'' 216 and outputs a pitch perturbation factor P 226 that is received by the analysis circuit 232, which can include short-term and long-term pitch perturbation factors. A loudness index L 230 is also generated at loudness circuit 228 as illustrated in Figure 2.
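Table 1: Perturbation metrics, equations (EQ1) and (EQ2)
[Equation image imgf000012_0001. The surviving text shows only a three-point weighted average of the form $(\tau_{i-1} + 2\tau_i + \tau_{i+1})/4$ inside sums running from $i = 2$ to $N - 1$.]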
Table 1 displays examples of mathematical equations that may be used to calculate the perturbation metrics. In the equations, f_i represents instantaneous pitch frequency, τ_i represents the pitch period, i represents a frame, and N is the total number of measurements. The above metrics are applied to the spectra and pitch measurements of the signal and used to determine the MonoVQ for a voice signal. The MonoVQ, as discussed above, is closely related to the MOS of a signal and is used as an indicator for the quality of service for a voice signal.
The analysis circuit 232 receives the signals S 222, P 226, and L 230 from spectra circuit 220, pitch circuit 224, and loudness circuit 228 respectively, and the signals B 212 and V 218 from BNC 208 and VAD 214, respectively. Based upon those input signals, analysis circuit 232 outputs a voice quality signal 234 providing an indication of the perceptual quality of the original received voice signal 202. Voice quality signal 234 provides an indication of various perceptual qualities of the QoS integrity of the received signal such as, for example, a hearing sensation loudness, a sharpness of sound, a fluctuation strength or a roughness.
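The equations of Table 1 survive only as fragments in this text, so their exact form cannot be confirmed. The sketch below shows one possible perturbation metric of the kind described for the pitch period. It assumes a relative deviation of each period from a three-point weighted average with weights 1/4, 1/2, 1/4, taken over i = 2 to N − 1. That form is an assumption inferred from the fragments, the function name is hypothetical, and none of it should be read as the patent's confirmed formula.

```python
def relative_perturbation(periods):
    """Assumed perturbation metric over a sequence of positive pitch periods.

    For each interior measurement, compare the period with the weighted
    average (prev + 2 * current + next) / 4, then normalise the summed
    absolute deviations by the summed averages.
    """
    if len(periods) < 3:
        raise ValueError("need at least three measurements")
    smoothed = [
        (periods[i - 1] + 2 * periods[i] + periods[i + 1]) / 4
        for i in range(1, len(periods) - 1)
    ]
    # smoothed[k] corresponds to periods[k + 1], the interior measurement.
    deviation = sum(abs(periods[k + 1] - s) for k, s in enumerate(smoothed))
    return deviation / sum(smoothed)


# A perfectly regular pitch track has no perturbation under this metric.
print(relative_perturbation([5.0, 5.0, 5.0, 5.0]))
```

A steady pitch track yields zero, and irregular tracks yield larger values, which matches the qualitative behaviour the description attributes to the perturbation factors.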
In one embodiment, the inputs to analysis 232 are sampled at a frame interval of 20 milliseconds (ms). For input speech sampled at 8000 Hz, each frame corresponds to 160 speech samples. By way of illustration, the inputs to analysis 232 may be denoted as follows: Pitch Perturbation Index: Pi; Spectral Perturbation Index: Si; Loudness Perturbation Index: Li; Voice Activity Type: Vi; and Background Noise Level: Bi, where the letter i denotes the frame index. The output of analysis 232 is a single number M for the entire speech signal that statistically relates to the MOS of the original signal. As discussed above, M and MOS are highly correlated when considered for a large set of speech utterances (signals).
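The frame arithmetic above (8000 samples per second × 0.020 s = 160 samples per frame) can be sketched as follows. The helper names are hypothetical, and the use of consecutive non-overlapping frames is an assumption, since the source does not say whether frames overlap.

```python
SAMPLE_RATE_HZ = 8000  # sampling rate given in the description
FRAME_MS = 20          # frame length given in the description


def samples_per_frame(sample_rate_hz=SAMPLE_RATE_HZ, frame_ms=FRAME_MS):
    """Number of speech samples in one frame."""
    return sample_rate_hz * frame_ms // 1000


def split_into_frames(samples, frame_len):
    """Split samples into consecutive non-overlapping frames.

    A trailing partial frame is dropped (an assumption of this sketch).
    """
    count = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(count)]


one_second = [0.0] * SAMPLE_RATE_HZ
frames = split_into_frames(one_second, samples_per_frame())
print(samples_per_frame(), len(frames))
```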
M is a statistical function of an internal variable Ui. In one embodiment, the function of Ui is the average, but in other embodiments other functions such as the standard deviation or the kurtosis of Ui may also be applicable. In general, M = g[Ui], where Ui is derived from the input signals Pi, Si, Li, Vi, and Bi, as follows. The first step is to classify the speech sample of the sampled frame. Certain frames of a conversation affect the QoS of a signal more than others. Table 2 below shows how varying levels of significance are assigned to different frame samples. The significance of the sample is indicated by Vi.
Table 2: Phonetic Classification of Input Speech based on Voice Activity
[Table image imgf000014_0001]
Next, the Vi levels shown in Table 2 are used to assign perceptual significance to the psycho-acoustic perturbation functions Si, Pi, and Li, where Ai = Vi × (Bs Si + Bp Pi + BL Li). For example, a high Si (spectral perturbation) during a stationary unvoiced sound like /sh/ is not as perceptually significant as a high Pi (pitch perturbation) during a stationary voiced sound like /a/ as in apple. The factors Bp, BL and Bs are for scaling purposes. Finally, the influence of background noise levels is accounted for, where Ui = Ai + Z × Bi, with Z = 0.9. High values of Bi bias Ui toward high values, regardless of other perceptual metrics. This is because high background noise levels in cellular phones are the most objectionable form of perceptual problems. If Bi is low (close to zero), background noise has less effect on perceptual quality.
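The per-frame combination and the averaging step described above can be sketched as follows, taking g to be the average as in the embodiment mentioned. The value Z = 0.9 comes from the description. The function names and the default scale factors of 1.0 for Bs, Bp and BL are placeholders, because the source gives no values for them, and the mapping from M onto the MOS scale is likewise not specified.

```python
from statistics import mean

Z = 0.9  # background-noise weight given in the description


def frame_score(v, s, p, l, b, bs=1.0, bp=1.0, bl=1.0, z=Z):
    """Compute Ui for one frame: Ai = Vi * (Bs*Si + Bp*Pi + BL*Li), Ui = Ai + Z*Bi.

    bs, bp and bl are placeholder scale factors, not values from the source.
    """
    a = v * (bs * s + bp * p + bl * l)
    return a + z * b


def mono_vq(frames, **scales):
    """Compute M as the average of Ui over all frames.

    Each frame is a tuple (Vi, Si, Pi, Li, Bi).
    """
    return mean([frame_score(*frame, **scales) for frame in frames])


# Hypothetical per-frame inputs (Vi, Si, Pi, Li, Bi).
frames = [(1.0, 0.2, 0.3, 0.1, 0.5), (0.5, 0.1, 0.1, 0.1, 0.0)]
print(mono_vq(frames))
```

Swapping `mean` for another statistic, such as the standard deviation, would correspond to the alternative embodiments the description mentions.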
With reference now to FIGURES 3-5 of the Drawings, there are illustrated therein examples of an original input signal 300, a medium distortion input signal 400, and a high distortion input signal 500. The examples in Figures 3, 4 and 5 are illustrative of the type of speech source signals 202 that are likely to be processed by the invention in actual operation. The type of network conditions that may affect the distortion of the signal 202 may include, but are not limited to, handovers from one base station to another base station, movement of the mobile telephone at different speeds (acceleration and deceleration of vehicles), poor radio signal reception, and turning around corners, i.e., knife edge effects that occur when a vehicle or pedestrian makes a right or left turn. The varying levels of distortion within Figures 3, 4 and 5 can be seen in the signal amplitude loss at several locations. The varying levels of distortion give the signals different shapes; i.e., if the three examples are superimposed, the distortion phenomenon is clearly observed.
With reference now to FIGURES 6-8 of the Drawings, there are illustrated therein graphs of pitch perturbations for the original signal 300, the medium distortion signal 400 and the high distortion signal 500, as determined using analysis circuit 232. As shown in FIGURE 6, the pitch perturbation of the original input signal 300 includes smooth and regular pitch tracks, which indicate that the user is receiving a signal of only minor distortion. As shown in FIGURE 7, the pitch perturbation of the medium distortion input signal 400 includes irregular pitch tracks. The irregular pitch tracks indicate the signal is experiencing a moderate level of distortion. As shown in FIGURE 8, the pitch perturbation of the high distortion input signal 500 includes both abnormal and irregular pitch tracks. The abnormal and irregular pitch tracks indicate that the user is receiving a signal that is experiencing a high level of distortion. The y-axis is the pitch (fundamental frequency) of the input signal in Hertz (Hz). For example, the sound /aaaaaaaa/ may give a pitch of 300 Hz. The sound /eeeeeee/ will give a different (lower) pitch in Hz. Sounds like /she/ are generally unvoiced, so the pitch value would simply be blank. The blank pitch values are indicated by the blank portions of the graph in Figures 6-8. Sharp deviations, as shown in Figure 7 and particularly pronounced in Figure 8, show that something is abnormal with regard to the pitch of the signal. The numerical computation of the pitch distortion via perturbation functions (Equations 1 and 2 from Table 1) provides data on the deterioration of perceptual quality of the signal and the related MOS. The perturbation is computed inside the pitch box 224, and the output is the pitch perturbation index P 226, with P computed every frame of 20 milliseconds.
It is to be appreciated that the spectra perturbation measurements are generated in the same manner as the pitch perturbation metrics described above. The numerical computation of the spectra distortion via perturbation functions (Equations 1 and 2 from Table 1) provides data on the deterioration of perceptual quality of the signal and the related MOS. The perturbation is computed inside the spectra box 220, and the output is the spectra perturbation index S 222.
The pitch perturbation and spectra perturbation measurements described above are generated at the pitch circuit 224 and spectra circuit 220, respectively, and then passed on to the analysis circuit 232. Similarly, the analysis circuit 232 receives signal data from other circuits, e.g., BNC 208, VAD 214 and loudness circuit 228. The analysis circuit 232 then uses the perturbation measurements and other data to determine the voice quality M as discussed with reference to Figure 2. M is directly related to the MOS of the signal. Thus, the QoS of the signal is determined based upon measurements of an in-service signal in a non-intrusive manner without the use of a reference signal.
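The idea of a per-frame pitch perturbation over a pitch track with blank (unvoiced) frames can be illustrated with the sketch below. Equations 1 and 2 of Table 1 are not reproduced in this section, so the measure used here, the relative change in pitch between consecutive voiced frames, is a stand-in chosen for illustration and is not the patent's perturbation function. The name `pitch_perturbation`, the use of `None` for a blank pitch value, and the example tracks are likewise assumptions.

```python
def pitch_perturbation(track):
    """Relative pitch change per frame; 0.0 where either frame is unvoiced.

    track: list of pitch values in Hz, one per 20 ms frame, with None
    marking an unvoiced (blank) frame. Voiced values are assumed > 0.
    """
    out = [0.0] * len(track)
    for i in range(1, len(track)):
        prev, cur = track[i - 1], track[i]
        if prev is not None and cur is not None:
            out[i] = abs(cur - prev) / prev
    return out


# Hypothetical tracks: a smooth one and one with sharp deviations.
smooth = [200.0, 202.0, None, 198.0, 200.0]
erratic = [200.0, 300.0, 150.0]
print(max(pitch_perturbation(smooth)))
print(max(pitch_perturbation(erratic)))
```

Under this stand-in measure, a smooth and regular track yields values near zero, while the sharp deviations of the kind described for Figures 7 and 8 yield large values.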
While the present invention has been described in connection with an exemplary embodiment, it will be understood that many modifications will be readily apparent to those skilled in the art, and this application is intended to cover any adaptations or variations thereof. For example, various types of circuit components for implementing the identified signal processing functions may be used without departing from the scope of the invention. This invention should be limited only by the claims and equivalents thereof. The present application claims priority from United States provisional patent application No. 60/267,569, entitled "System and Method for Voice Quality of Service Measurement", filed on February 9, 2001, which is incorporated herein by reference as if fully set forth.
(Industrial Applicability)
The present invention is applicable to communications networks. More specifically, the present invention provides a system, method and apparatus for measuring the quality of a voice signal in a communications network.

Claims

1. In a communications network, a method for determining an indication of speech quality in a voice signal received over said communications network, said method comprising the steps of: preprocessing said voice signal, wherein said preprocessing includes determining pitch and spectral perturbations of said voice signal; measuring a perceptual quality of the voice signal based at least in part on said pitch and spectral perturbations without use of a reference signal; and providing an indication of the measured perceptual quality.
2. The method according to claim 1 wherein said step of preprocessing said voice signal further includes determining a loudness measure of said voice signal.
3. The method according to claim 1 wherein said step of preprocessing said voice signal further includes determining the background noise level of said voice signal.
4. The method according to claim 1 wherein said step of preprocessing said voice signal further includes determining the voice activity factor of said voice signal.
5. The method according to claim 1 wherein said measuring step includes measuring the perceptual quality based upon one or more factors selected from the group consisting of: pitch perturbation, short-term pitch perturbation, long-term pitch perturbation, loudness distortion, spectral perturbation, voice activity factor and background noise level.
6. The method according to claim 1 wherein the providing step includes providing an indication of one or more perceptual qualities selected from the group consisting of: a hearing sensation loudness, a sharpness of sound, a fluctuation strength and a roughness.
7. The method according to claim 1 wherein said method for determining an indication of speech quality occurs at a base station.
8. The method according to claim 1 wherein said method for determining an indication of speech quality occurs at a mobile transceiver.
9. In a communications network, a system for determining an indication of speech quality in a voice signal received over said communications network, said system comprising: a plurality of communication devices; receiving means for receiving an electronic voice signal; preprocessing means, wherein said preprocessing includes determining pitch and spectral perturbations of said voice signal; measuring means for measuring the perceptual quality of the voice signal based at least in part on said pitch and spectral perturbations without use of a reference signal; and providing means for providing an indication of the measured perceptual quality.
10. The system according to claim 9 wherein said preprocessing means determines a loudness measure of said voice signal.
11. The system according to claim 9 wherein said preprocessing means determines the background noise level of said voice signal.
12. The system according to claim 9 wherein said preprocessing means further includes determining the voice activity factor of said voice signal.
13. The system according to claim 9 wherein said measuring means measures the perceptual quality based upon one or more factors selected from the group consisting of: pitch perturbation, short-term pitch perturbation, long-term pitch perturbation, loudness distortion, spectral distortion, voice activity factor and background noise level.
14. The system according to claim 9 wherein said preprocessing means determines the short-term pitch perturbation factor of said voice signal.
15. The system according to claim 9 wherein said preprocessing means determines the long-term pitch perturbation factor of said voice signal.
16. The system according to claim 9 wherein said providing means provides an indication of one or more perceptual qualities selected from the group consisting of: a hearing sensation loudness, a sharpness of sound, a fluctuation strength and a roughness.
17. An apparatus for determining an indication of speech quality in a voice signal received over a communications network, said apparatus comprising: a receiving circuit for receiving an electronic voice signal; and an analysis circuit, coupled to said receiving circuit, for measuring a perceptual quality of the voice signal based at least in part on said pitch and spectral perturbations, without use of a reference signal and for providing an indication of the measured perceptual quality.
18. The apparatus according to claim 17 wherein said receiving circuit includes an analog-to-digital converter for receiving an analog voice signal and outputting a corresponding digital voice signal.
19. The apparatus according to claim 17, further including a preprocessing circuit, coupled between said receiving circuit and said analysis circuit, for performing preprocessing of said digital voice signal.
20. The apparatus according to claim 19 wherein said preprocessing circuit further includes a background noise canceller.
21. The apparatus according to claim 19 wherein said preprocessing circuit further includes a voice activity detector.
22. The apparatus according to claim 19 wherein said preprocessing circuit further includes a spectra circuit and a pitch circuit.
23. The apparatus according to claim 19 wherein said preprocessing circuit further includes a loudness circuit.
24. The apparatus according to claim 17 wherein said analysis circuit measures the perceptual quality based upon a short-term pitch perturbation factor of said digital voice signal.
25. The apparatus according to claim 17 wherein said analysis circuit measures the perceptual quality based upon a long-term pitch perturbation factor of said digital voice signal.
26. The apparatus according to claim 17 wherein said analysis circuit measures the perceptual quality based upon a background noise level related to said digital voice signal.
27. The apparatus according to claim 17 wherein said analysis circuit provides an indication of one or more perceptual qualities chosen from the group consisting of: a hearing sensation loudness, a sharpness of sound and a fluctuation strength and roughness.
28. The apparatus according to claim 17 wherein said analysis circuit is located in a mobile telephone.
29. The apparatus according to claim 17 wherein said analysis circuit is located in a network base station.
PCT/JP2002/000658 2001-02-09 2002-01-29 System and method for voice quality of service measurement WO2002065456A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US26756901P 2001-02-09 2001-02-09
US60/267,569 2001-02-09

Publications (1)

Publication Number Publication Date
WO2002065456A1 (en) 2002-08-22

Family

ID=23019330

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2002/000658 WO2002065456A1 (en) 2001-02-09 2002-01-29 System and method for voice quality of service measurement

Country Status (1)

Country Link
WO (1) WO2002065456A1 (en)

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 69(1)EPC

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP