WO2002065456A1

WO2002065456A1 - System and method for voice quality of service measurement

Info

Publication number: WO2002065456A1
Application number: PCT/JP2002/000658
Authority: WO
Inventors: Kambiz Homayounfar
Original assignee: Genista Corporation
Priority date: 2001-02-09
Filing date: 2002-01-29
Publication date: 2002-08-22

Abstract

A method and apparatus for determining an indication of perceptual speech quality for a voice signal. A receiving circuit receives a single electronic voice signal. An analysis circuit, coupled to the receiving circuit, measures a perceptual quality of the voice signal without use of a reference signal and provides an indication of the measured perceptual quality. The measured quality can include an indication of one or more of the following for the received voice signal: a hearing sensation loudness, a sharpness of sound, a fluctuation strength, or a roughness.

Description

DESCRIPTION

SYSTEM AND METHOD FOR VOICE QUALITY OF SERVICE MEASUREMENT

(Technical Field)

The present invention relates generally to a system, method and apparatus for measuring the quality of a voice signal in a communications system, and more specifically to a system, method and apparatus for rating the perceptual quality of a voice signal.

(Background Art)

Quality of service (QoS) involves a perceptual measure of the quality of user devices such as wireless phones. The QoS relates to, for example, how the user perceives the quality of a phone call, which can include the accessibility of network communication, the retention of the communication, and the integrity of the communication. The accessibility factor includes the ability to access the network and place a phone call, and can be affected by, for example, the number of callers within a cell of a wireless phone network. The retention factor includes any interruptions to the call or loss of signal resulting in a dropped call. The integrity factor includes the voice quality of the communication, such as its clarity and freedom from static noise, and can also include other factors such as general customer service.

These various QoS measurements can affect revenue and customer relations for wireless service providers. For example, poor accessibility can result in lost potential calls and the corresponding revenue from such calls. Dropped calls, resulting from poor retention, can likewise result in lost on-line time and the revenue for that time. Poor integrity can result in callers "hanging up" and terminating a call early, or can even result in users switching service providers. Therefore, monitoring and measuring QoS is important for service providers.

Prior art algorithms exist for measuring the integrity aspect of QoS. These algorithms include the Perceptual Speech Quality Measure (PSQM) and Bark Spectral Distortion (BSD). However, current measuring systems require both the transmitted voice signal and the received voice signal to perform the QoS analysis, which means that the current art only works effectively for intrusive, out-of-service applications. In particular, current voice quality measuring systems can only monitor one channel at a time in a base station, and the channel under test cannot be used for subscriber service. Currently, no means exists to measure the voice quality of a communications signal without the use of both a transmitted voice signal and a received voice signal. Accordingly, a need exists for an improved system and method that provides non-intrusive voice quality measurement for in-service applications.

(Disclosure of Invention)

The present invention overcomes the aforementioned shortcomings of the prior art by providing a means to measure the voice quality of a communications signal without the use of a reference signal. The Mono Voice-Quality (MonoVQ) measurement system that is disclosed is based on the acoustic properties of distorted speech, which may include pitch statistics, pitch perturbations, loudness measures, and speech and spectral measures. These measures strongly relate to the acoustic sound qualities of hearing sensation loudness, sharpness of sound, fluctuation strength and roughness of communication signals.

A method consistent with the present invention determines an indication of speech quality for a voice signal. The method includes receiving an electronic voice signal, measuring a perceptual quality of the voice signal without use of a reference signal, and providing an indication of the measured perceptual quality.

An apparatus consistent with the present invention determines an indication of speech quality for a voice signal. The apparatus includes a receiving circuit for receiving an electronic voice signal. An analysis circuit, in communication with the receiving circuit, measures a perceptual quality of the voice signal without use of a reference signal and provides an indication of the measured perceptual quality.

It is, accordingly, an object of the present invention to set forth an improved system and method that measures the voice quality of a speech signal.

It is a further object of the invention to measure the voice quality of a speech signal by use of only the received signal for quality measurement.

It is a further object of the invention to measure the voice quality of a speech signal based upon pitch perturbation, loudness distortion, spectral perturbation, voice activity and background noise.

It is a further object of the invention to measure voice quality based upon hearing sensation loudness, sharpness of sound, fluctuation strength and roughness of the signal.

Further objects of the invention are apparent from reviewing the Disclosure of Invention, Best Mode for Carrying Out the Invention and the Claims set forth below.

(Brief Description of Drawings)

The inventions of this application are better understood in conjunction with the following drawings and detailed descriptions of the preferred embodiments. The various hardware and software elements used to carry out the invention are illustrated in the attached drawings in the form of block diagrams, flow charts, and other illustrations, in which:

FIGURE 1 is a generalized illustration of a communications network utilizing the principles of the present invention;

FIGURE 2 is a block diagram of a system for voice quality measurement;

FIGURE 3 is a graph of an exemplary original input signal;

FIGURE 4 is a graph of an exemplary medium distortion input signal;

FIGURE 5 is a graph of an exemplary high distortion input signal;

FIGURE 6 is a graph of an exemplary pitch perturbation using the voice quality measurement for the original input signal of FIGURE 3;

FIGURE 7 is a graph of an exemplary pitch perturbation using the voice quality measurement for the medium distortion input signal of FIGURE 4; and

FIGURE 8 is a graph of an exemplary pitch perturbation using the voice quality measurement for the high distortion input signal of FIGURE 5.

(Best Mode for Carrying Out the Invention)

The following detailed description is presented to enable any person skilled in the art to make and use the invention. For purposes of explanation, specific nomenclature is set forth to provide a thorough understanding of the present invention. However, it will be apparent to one skilled in the art that these specific details are not required to practice the invention. Descriptions of specific applications are provided only as representative examples. Various modifications to the preferred embodiments will be readily apparent to one skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the invention. The present invention is not intended to be limited to the embodiments shown, but is to be accorded the widest possible scope consistent with the principles and features disclosed herein.

With reference now to FIGURE 1 of the Drawings there is illustrated therein a mobile communications network, generally designated by the reference numeral 100, utilizing the principles of the present invention. Communications network 100 is composed of a base station 110, base station transceiver 120 and mobile transceivers 130.

It is to be appreciated that the signal processing and QoS measurement innovations may be implemented at various points throughout the network. For example, the circuitry used to measure the QoS of the signals may be embedded in the mobile transceivers 130, in the network (e.g., the base station 110), or in both. The QoS measurement innovations are best suited for implementation within digital signal processor (DSP) chips. For example, the real-time voice signal in digital form that is used to calculate the QoS is available within the memory of the DSP chips commonly found in phones 130 and in base stations 110. Within the base station, the network element that contains the real-time voice signal is the DSP chip of the Transcoding Unit (TU) 125, as illustrated in FIGURE 1.

The primary measure of voice quality used within the telephone industry is the mean opinion score (MOS). A MOS is the result of a subjective listening test, wherein listeners compare various samples generated from voice streams and assign each a quality value from 1 to 5. The MOS is the arithmetic mean of these ratings, determined over a large number of samples for a particular voice stream. Typically, MOS scores have been used to characterize the relative quality of compressed voice streams, although they can also be used to determine the perceived transmission quality of wireless voice streams under a variety of conditions. As a point of reference, land lines transmitting at 16 kbps typically have MOS scores of around 3.6-3.7, whereas voice signals compressed to 4.75 kbps for wireless transmission have MOS scores of around 3.2.
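Because a MOS is an arithmetic mean over listener ratings on a 1-to-5 scale, it can be illustrated in a few lines. The following is a minimal Python sketch for illustration only, not part of the disclosed system; the function name `mean_opinion_score` and the ten ratings are hypothetical.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Return the MOS: the arithmetic mean of listener ratings on the 1-5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for r in ratings:
        if not 1 <= r <= 5:
            raise ValueError(f"rating {r} is outside the 1-5 scale")
    return mean(ratings)


# Hypothetical panel of ten listeners rating one voice stream.
panel = [4, 3, 4, 3, 4, 3, 3, 4, 3, 4]
print(mean_opinion_score(panel))  # 3.5
```

The need for a whole panel of listeners per stream is exactly why this score cannot be produced in real time, which motivates the reference-free measurement described next.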

Since the traditional method of generating a MOS score requires a large sample of people listening to and rating a transmission, it is obviously not practical to make real-time measurements of voice quality. The present invention overcomes this shortcoming by providing a means to measure the signal quality in real-time without the use of a reference signal. Applicant has found that the MonoVQ measurement generated by the present invention corresponds to the industry standard MOS and can be used as a means to provide a reliable measure of the signal quality.

With reference now to FIGURE 2 of the Drawings, there is illustrated therein a block diagram of the Mono Voice-Quality (MonoVQ) system, generally designated by the reference numeral 200, utilizing the principles of the present invention. MonoVQ system 200 includes a source of a speech or voice signal 202, which can be intercepted, for example, at a base station processing a wireless communication. A receiving circuit intercepts the voice signal 202 for processing. For analog signals, the MonoVQ system may include an analog-to-digital converter (ADC) 204 that converts the speech source signal 205 into a corresponding digital signal, X, designated by the reference numeral 206, as illustrated in FIGURE 2.

Next, preprocessing circuits receive the digital voice signal and perform initial processing of the signal before the MonoVQ system performs QoS analysis of the signal. The preprocessing circuits may include a Background Noise Canceller (BNC) 208 and a Voice Activity Detector (VAD) 214 that utilize standard telephony algorithms.

BNC 208 receives the digital voice signal 206 and outputs a voice-after-noise-removal signal x' 210 and a background noise level signal B 212, as illustrated in FIGURE 2. VAD 214 receives the signal x' 210 and outputs a voice-after-silence-removal signal x'' 216 and a voice activity factor signal V 218. Implementation of the ADC 204, BNC 208, and VAD 214 circuits is known to those skilled in the art.
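The text leaves BNC 208 and VAD 214 to standard telephony algorithms without specifying them. As a rough, hypothetical stand-in, the Python sketch below uses a simple frame-energy threshold. It splits the signal into 160-sample frames (20 ms at 8000 Hz), takes the quietest frame energy as a crude background level B, and reports the fraction of frames judged active as a voice activity factor V. The names `frames`, `frame_energy_db` and `simple_vad` and the 10 dB margin are assumptions, and no actual noise cancellation (the x' step) is attempted.

```python
import math

FRAME_LEN = 160  # 20 ms at 8000 Hz, the frame size used in the text


def frames(samples, frame_len=FRAME_LEN):
    """Split a sample sequence into non-overlapping full frames (tail dropped)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def frame_energy_db(frame):
    """Mean-square energy of one frame in dB, floored to avoid log10(0)."""
    ms = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(max(ms, 1e-12))


def simple_vad(samples, margin_db=10.0):
    """Hypothetical energy-threshold stand-in for BNC 208 and VAD 214.

    Returns (x_active, B, V):
      x_active: samples of the active frames concatenated (a rough x'')
      B: background level, taken as the quietest frame energy in dB
      V: voice activity factor, the fraction of frames more than
         margin_db above B
    """
    chunks = frames(samples)
    if not chunks:
        raise ValueError("signal shorter than one frame")
    energies = [frame_energy_db(c) for c in chunks]
    background = min(energies)
    active = [e > background + margin_db for e in energies]
    x_active = [s for c, a in zip(chunks, active) if a for s in c]
    return x_active, background, sum(active) / len(chunks)


# Two quiet frames followed by two frames of a 200 Hz tone at 8 kHz.
quiet = [0.001] * 320
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(320)]
x2, B, V = simple_vad(quiet + tone)
print(len(x2), round(B), V)  # 320 -60 0.5
```

A production detector would adapt the background estimate over time rather than taking a single minimum; the fixed minimum is used here only to keep the sketch short.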

The preprocessing circuit can also include a plurality of circuits that receive the signal x'' 216 and process it to output signals that are used to analyze the QoS of the original received voice signal 202. A spectra circuit 220 receives the signal x'' 216 and outputs a spectral perturbation factor S 222 that is received by the analysis circuit 232. A pitch circuit 224 receives the signal x'' 216 and outputs to the analysis circuit 232 a pitch perturbation factor P 226, which can include short-term and long-term pitch perturbation factors. A loudness index L 230 is also generated by loudness circuit 228, as illustrated in FIGURE 2.
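The text does not say how pitch circuit 224 tracks pitch. One common technique, shown here only as an assumed example, is to pick the autocorrelation peak of each frame within a plausible pitch range. The search range (80-400 Hz), the voicing threshold of 0.3 and the function name `estimate_pitch` are hypothetical choices, not values from the disclosure.

```python
import math

SAMPLE_RATE = 8000  # Hz, matching the 8 kHz example in the text


def estimate_pitch(frame, fs=SAMPLE_RATE, f_min=80.0, f_max=400.0,
                   voicing_threshold=0.3):
    """Estimate the fundamental frequency (Hz) of one frame by autocorrelation.

    Returns None when the normalized autocorrelation peak is weak, i.e. the
    frame looks unvoiced; this mirrors the blank pitch values described for
    unvoiced sounds.
    """
    n = len(frame)
    r0 = sum(s * s for s in frame)  # zero-lag energy, used for normalization
    if r0 == 0.0:
        return None
    lag_min = int(fs / f_max)              # shortest period searched
    lag_max = min(int(fs / f_min), n - 1)  # longest period that fits the frame
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag is None or best_r / r0 < voicing_threshold:
        return None
    return fs / best_lag


# A 20 ms frame (160 samples) of a 200 Hz tone.
frame = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(160)]
print(estimate_pitch(frame))  # 200.0
```

The unnormalized autocorrelation favors shorter lags, which helps avoid picking a multiple of the true period; more robust trackers add normalization and smoothing across frames.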

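$$\sum_{i=2}^{N-1}\left| f_i - \frac{f_{i-1} + 2f_i + f_{i+1}}{4} \right| \qquad \text{(EQ1)}$$

$$\sum_{i=2}^{N-1}\left| \tau_i - \frac{\tau_{i-1} + 2\tau_i + \tau_{i+1}}{4} \right| \qquad \text{(EQ2)}$$

Table 1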

Table 1 lists examples of mathematical equations that may be used to calculate the perturbation metrics. In the equations, f_i represents the instantaneous pitch frequency, τ_i represents the pitch period, i represents a frame, and N is the total number of measurements. The above metrics are applied to the spectral and pitch measurements of the signal and are used to determine the MonoVQ for a voice signal. The MonoVQ, as discussed above, is closely related to the MOS of a signal and is used as an indicator of the quality of service for a voice signal.

The analysis circuit 232 receives the signals S 222, P 226, and L 230 from spectra circuit 220, pitch circuit 224, and loudness circuit 228, respectively, and the signals B 212 and V 218 from BNC 208 and VAD 214, respectively. Based upon those input signals, analysis circuit 232 outputs a voice quality signal 234 providing an indication of the perceptual quality of the original received voice signal 202. Voice quality signal 234 provides an indication of various perceptual qualities of the QoS integrity of the received signal such as, for example, a hearing sensation loudness, a sharpness of sound, a fluctuation strength or a roughness.
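A direct Python rendering of the perturbation form in Table 1 is sketched below, assuming each equation is a sum over i = 2 to N-1 of the absolute deviation of a measurement from its 1-2-1 weighted neighborhood average. Each term equals |2x_i - x_{i-1} - x_{i+1}|/4, a scaled second difference, so a constant or linearly drifting track contributes nothing. The function name `perturbation` is hypothetical.

```python
def perturbation(values):
    """Sum over i = 2..N-1 of |x_i - (x_{i-1} + 2*x_i + x_{i+1}) / 4|.

    `values` holds N per-frame measurements, e.g. pitch frequencies f_i
    (EQ1) or pitch periods tau_i (EQ2).
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least three measurements")
    total = 0.0
    for i in range(1, n - 1):  # 0-based indices covering i = 2 .. N-1
        smoothed = (values[i - 1] + 2 * values[i] + values[i + 1]) / 4
        total += abs(values[i] - smoothed)
    return total


print(perturbation([100, 110, 100, 110]))  # 10.0
print(perturbation([100, 102, 104, 106]))  # 0.0 (linear drift is not penalized)
```

Dividing the result by N - 2 would give a per-measurement average, which may be convenient when comparing signals of different lengths; that normalization is an assumption, not something the text specifies.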

In one embodiment, the inputs to analysis circuit 232 are updated once per 20-millisecond (ms) frame. For input speech sampled at 8000 Hz, each frame corresponds to 160 speech samples. By way of illustration, the inputs to analysis circuit 232 may be denoted as follows: Pitch Perturbation Index: P_i; Spectral Perturbation Index: S_i; Loudness Perturbation Index: L_i; Voice Activity Type: V_i; and Background Noise Level: B_i, where the subscript i denotes the frame index. The output of analysis circuit 232 is a single number M for the entire speech signal that statistically relates to the MOS of the original signal. As discussed above, M and MOS are highly correlated when considered over a large set of speech utterances (signals).

M is a statistical function of an internal variable U_i. In one embodiment, the function of U_i is the average, but in other embodiments other functions, such as the standard deviation or the kurtosis of U_i, may also be applicable. In general, M = g[U_i], where U_i is derived from the input signals P_i, S_i, L_i, V_i, and B_i as follows.

The first step is to classify the speech in each sampled frame. Certain frames of a conversation affect the QoS of a signal more than others. Table 2 below shows how varying levels of significance are assigned to different frame samples. The significance of the sample is indicated by V_i.

Table 2: Phonetic Classification of Input Speech based on Voice Activity
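The text names the average, the standard deviation and the kurtosis as candidate statistics g for collapsing the per-frame values U_i into the single score M. The sketch below shows all three using Python's `statistics` module; the population forms and the non-excess kurtosis (fourth central moment divided by the squared variance) are assumptions, since the text does not specify them, and `summarize` is a hypothetical name.

```python
import statistics


def summarize(u, g="mean"):
    """Collapse per-frame values U_i into a single score M = g[U_i]."""
    u = list(u)
    if not u:
        raise ValueError("no frames to summarize")
    if g == "mean":
        return statistics.fmean(u)
    if g == "std":
        return statistics.pstdev(u)  # population standard deviation
    if g == "kurtosis":
        mu = statistics.fmean(u)
        var = statistics.pvariance(u, mu)
        if var == 0:
            raise ValueError("kurtosis is undefined for constant input")
        m4 = statistics.fmean((x - mu) ** 4 for x in u)
        return m4 / var ** 2  # non-excess kurtosis (3.0 for a Gaussian)
    raise ValueError(f"unknown statistic {g!r}")


u = [1.0, 2.0, 3.0, 4.0]
print(summarize(u), summarize(u, "kurtosis"))  # 2.5 1.64
```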

Next, the V_i levels shown in Table 2 are used to assign perceptual significance to the psycho-acoustic perturbation functions S_i, P_i, and L_i, where A_i = V_i × (β_S S_i + β_P P_i + β_L L_i). For example, a high S_i (spectral perturbation) during a stationary unvoiced sound like /sh/ is not as perceptually significant as a high P_i (pitch perturbation) during a stationary voiced sound like /a/ as in apple. The factors β_P, β_L and β_S are for scaling purposes. Finally, the influence of the background noise level is accounted for by U_i = A_i + Z × B_i, with Z = 0.9. High values of B_i bias U_i toward high values, regardless of the other perceptual metrics, because high background noise levels in cellular phones are the most objectionable form of perceptual problem. If B_i is low (close to zero), background noise has less effect on perceptual quality.
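The per-frame combination above can be written out directly. In this minimal Python sketch, Z = 0.9 comes from the text, while the scaling factors β_P, β_S and β_L default to 1.0 and the example V_i weights are hypothetical placeholders, because the values of Table 2 are not reproduced here. M is taken as the average of U_i, the embodiment named in the text; `frame_score` and `m_score` are illustrative names.

```python
Z = 0.9  # background-noise weight given in the text


def frame_score(p, s, l, v, b, beta_p=1.0, beta_s=1.0, beta_l=1.0):
    """U_i = V_i * (beta_S*S_i + beta_P*P_i + beta_L*L_i) + Z * B_i.

    The beta defaults of 1.0 are placeholders, not values from the text.
    """
    a = v * (beta_s * s + beta_p * p + beta_l * l)  # A_i
    return a + Z * b


def m_score(frame_inputs, **betas):
    """M as the average of U_i over all frames (one embodiment in the text)."""
    u = [frame_score(*f, **betas) for f in frame_inputs]
    if not u:
        raise ValueError("no frames")
    return sum(u) / len(u)


# Hypothetical per-frame inputs (P_i, S_i, L_i, V_i, B_i).
frames_in = [
    (0.2, 0.1, 0.3, 1.0, 0.0),  # fully significant frame, no background noise
    (0.2, 0.1, 0.3, 0.5, 0.4),  # less significant frame, more background noise
]
print(m_score(frames_in))
```

Note how the second frame scores higher than the first despite its lower V_i: the Z × B_i term dominates, which is the bias toward high background noise that the text describes.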

With reference now to FIGURES 3-5 of the Drawings, there are illustrated therein examples of an original input signal 300, a medium distortion input signal 400, and a high distortion input signal 500. The examples in FIGURES 3, 4 and 5 are illustrative of the type of speech source signals 202 that are likely to be processed by the invention in actual operation. Network conditions that may distort the signal 202 include, but are not limited to, handovers from one base station to another, movement of the mobile telephone at different speeds (acceleration and deceleration of vehicles), poor radio signal reception, and turning around corners, i.e., knife-edge effects that occur when a vehicle or pedestrian makes a right or left turn. The varying levels of distortion in FIGURES 3, 4 and 5 can be seen in the signal amplitude loss at several locations. Because the distortion changes the shape of each signal, the distortion phenomena are clearly observed when the three examples are superimposed.

With reference now to FIGURES 6-8 of the Drawings, there are illustrated therein graphs of pitch perturbations for the original signal 300, the medium distortion signal 400 and the high distortion signal 500, as determined using analysis circuit 232. As shown in FIGURE 6, the pitch perturbation of the original input signal 300 includes smooth and regular pitch tracks, which indicate that the user is receiving a signal with only minor distortion. As shown in FIGURE 7, the pitch perturbation of the medium distortion input signal 400 includes irregular pitch tracks, which indicate that the signal is experiencing a moderate level of distortion. As shown in FIGURE 8, the pitch perturbation of the high distortion input signal 500 includes both abnormal and irregular pitch tracks, which indicate that the user is receiving a signal experiencing a high level of distortion.

The y-axis is the pitch (fundamental frequency) of the input signal in hertz (Hz). For example, the sound /aaaaaaaa/ may give a pitch of 300 Hz, while the sound /eeeeeee/ will give a different (lower) pitch. Sounds like /sh/ (as in "she") are generally unvoiced, so the pitch value is simply blank; these blank values appear as the blank portions of the graphs in FIGURES 6-8. Sharp deviations, as shown in FIGURE 7 and more pronounced in FIGURE 8, show that something is abnormal with regard to the pitch of the signal. The numerical computation of the pitch distortion via perturbation functions (Equations 1 and 2 of Table 1) provides data on the deterioration of the perceptual quality of the signal and the related MOS. The perturbation is computed inside the pitch circuit 224, and the output is the pitch perturbation index P 226, computed every 20-millisecond frame.
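The text states that P is produced every 20 ms frame but does not say which neighboring frames enter each value. Assuming a three-frame window around frame i, and skipping any frame where one of the three is unvoiced (the blank regions of FIGURES 6-8), a per-frame index might be computed as in the Python sketch below. The name `frame_pitch_perturbation` and the use of None for unvoiced frames are illustrative choices.

```python
from typing import List, Optional, Sequence


def frame_pitch_perturbation(
        track: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Per-frame pitch perturbation index P_i from a 20 ms pitch track.

    Unvoiced frames carry None. P_i is the absolute deviation of f_i from
    its 1-2-1 weighted neighborhood average and is defined only where
    frames i-1, i and i+1 are all voiced; otherwise it stays None.
    """
    out: List[Optional[float]] = [None] * len(track)
    for i in range(1, len(track) - 1):
        prev, cur, nxt = track[i - 1], track[i], track[i + 1]
        if prev is None or cur is None or nxt is None:
            continue  # a blank (unvoiced) neighbor: no index for this frame
        out[i] = abs(cur - (prev + 2 * cur + nxt) / 4)
    return out


# A smooth voiced run, an unvoiced gap, then a jumpy voiced run.
track = [200.0, 200.0, 200.0, None, 180.0, 220.0, 180.0]
print(frame_pitch_perturbation(track))
# [None, 0.0, None, None, None, 20.0, None]
```

A smooth track such as that of FIGURE 6 yields values near zero, while the sharp deviations of FIGURES 7 and 8 produce large per-frame values.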

It is to be appreciated that the spectral perturbation measurements are generated in the same manner as the pitch perturbation metrics described above. The numerical computation of the spectral distortion via perturbation functions (Equations 1 and 2 of Table 1) provides data on the deterioration of the perceptual quality of the signal and the related MOS. The perturbation is computed inside the spectra circuit 220, and the output is the spectral perturbation index S 222.
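The text says only that spectral perturbation is computed in the same manner as pitch perturbation; it does not name the per-frame spectral quantity. Purely for illustration, the Python sketch below assumes the spectral centroid of each frame (computed with a naive DFT so the example has no dependencies) and applies the same 1-2-1 neighborhood deviation across frames. `spectral_centroid`, `spectral_perturbation`, `tone` and the choice of the centroid are all hypothetical.

```python
import cmath
import math


def spectral_centroid(frame, fs=8000):
    """Magnitude-weighted mean frequency (Hz) of one frame via a naive DFT."""
    n = len(frame)
    mags = [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]  # bins from 0 Hz up to fs/2
    total = sum(mags)
    if total == 0.0:
        return 0.0
    return sum(k * fs / n * m for k, m in enumerate(mags)) / total


def spectral_perturbation(frames, fs=8000):
    """Sum of 1-2-1 neighborhood deviations of the per-frame centroids."""
    c = [spectral_centroid(f, fs) for f in frames]
    if len(c) < 3:
        raise ValueError("need at least three frames")
    return sum(abs(c[i] - (c[i - 1] + 2 * c[i] + c[i + 1]) / 4)
               for i in range(1, len(c) - 1))


def tone(freq, fs=8000, n=160):
    """One 20 ms frame of a unit-amplitude sine at `freq` Hz."""
    return [math.sin(2 * math.pi * freq * t / fs) for t in range(n)]


# Three 20 ms frames whose spectral centroid jumps in the middle.
print(round(spectral_perturbation([tone(1000), tone(2000), tone(1000)])))  # 500
```

A real implementation would use an FFT and likely a perceptual (e.g. Bark-scale) spectrum; the naive DFT only keeps this sketch self-contained.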

The pitch perturbation and spectral perturbation measurements described above are generated at the pitch circuit 224 and spectra circuit 220, respectively, and then passed on to the analysis circuit 232. Similarly, the analysis circuit 232 receives signal data from other circuits, e.g., BNC 208, VAD 214 and loudness circuit 228. The analysis circuit 232 then uses the perturbation measurements and other data to determine the voice quality M, as discussed with reference to FIGURE 2. M is directly related to the MOS of the signal. Thus, the QoS of the signal is determined from measurements of an in-service signal in a non-intrusive manner, without the use of a reference signal.

While the present invention has been described in connection with an exemplary embodiment, it will be understood that many modifications will be readily apparent to those skilled in the art, and this application is intended to cover any adaptations or variations thereof. For example, various types of circuit components for implementing the identified signal processing functions may be used without departing from the scope of the invention. This invention should be limited only by the claims and equivalents thereof. The present application claims priority from United States provisional patent application No. 60/267,569, entitled "System and Method for Voice Quality of Service Measurement", filed on February 9, 2001, which is incorporated herein by reference as if fully set forth.

(Industrial Applicability)

The present invention is applicable to communications networks. More specifically, the present invention provides a system, method and apparatus for measuring the quality of a voice signal in a communications network.

Claims

1. In a communications network, a method for determining an indication of speech quality in a voice signal received over said communications network, said method comprising the steps of: preprocessing said voice signal, wherein said preprocessing includes determining pitch and spectral perturbations of said voice signal; measuring a perceptual quality of the voice signal based at least in part on said pitch and spectral perturbations without use of a reference signal; and providing an indication of the measured perceptual quality.

2. The method according to claim 1 wherein said step of preprocessing said voice signal further includes determining a loudness measure of said voice signal.

3. The method according to claim 1 wherein said step of preprocessing said voice signal further includes determining the background noise level of said voice signal.

4. The method according to claim 1 wherein said step of preprocessing said voice signal further includes determining the voice activity factor of said voice signal.

5. The method according to claim 1 wherein said measuring step includes measuring the perceptual quality based upon one or more factors selected from the group consisting of: pitch perturbation, short-term pitch perturbation, long-term pitch perturbation, loudness distortion, spectral perturbation, voice activity factor and background noise level.

6. The method according to claim 1 wherein the providing step includes providing an indication of one or more perceptual qualities selected from the group consisting of: a hearing sensation loudness, a sharpness of sound, a fluctuation strength and a roughness.

7. The method according to claim 1 wherein said method for determining an indication of speech quality occurs at a base station.

8. The method according to claim 1 wherein said method for determining an indication of speech quality occurs at a mobile transceiver.

9. In a communications network, a system for determining an indication of speech quality in a voice signal received over said communications network, said system comprising: a plurality of communication devices; receiving means for receiving an electronic voice signal; preprocessing means, wherein said preprocessing includes determining pitch and spectral perturbations of said voice signal; measuring means for measuring the perceptual quality of the voice signal based at least in part on said pitch and spectral perturbations without use of a reference signal; and providing means for providing an indication of the measured perceptual quality.

10. The system according to claim 9 wherein said preprocessing means determines a loudness measure of said voice signal.

11. The system according to claim 9 wherein said preprocessing means determines the background noise level of said voice signal.

12. The system according to claim 9 wherein said preprocessing means further includes determining the voice activity factor of said voice signal.

13. The system according to claim 9 wherein said measuring means measures the perceptual quality based upon one or more factors selected from the group consisting of: pitch perturbation, short-term pitch perturbation, long-term pitch perturbation, loudness distortion, spectral distortion, voice activity factor and background noise level.

14. The system according to claim 9 wherein said preprocessing means determines the short-term pitch perturbation factor of said voice signal.

15. The system according to claim 9 wherein said preprocessing means determines the long-term pitch perturbation factor of said voice signal.

16. The system according to claim 9 wherein said providing means provides an indication of one or more perceptual qualities selected from the group consisting of: a hearing sensation loudness, a sharpness of sound, a fluctuation strength and a roughness.

17. An apparatus for determining an indication of speech quality in a voice signal received over a communications network, said apparatus comprising: a receiving circuit for receiving an electronic voice signal; and an analysis circuit, coupled to said receiving circuit, for measuring a perceptual quality of the voice signal based at least in part on said pitch and spectral perturbations, without use of a reference signal and for providing an indication of the measured perceptual quality.

18. The apparatus according to claim 17 wherein said receiving circuit includes an analog-to-digital converter for receiving an analog voice signal and outputting a corresponding digital voice signal.

19. The apparatus according to claim 17, further including a preprocessing circuit, coupled between said receiving circuit and said analysis circuit, for performing preprocessing of said digital voice signal.

20. The apparatus according to claim 19 wherein said preprocessing circuit further includes a background noise canceller.

21. The apparatus according to claim 19 wherein said preprocessing circuit further includes a voice activity detector.

22. The apparatus according to claim 19 wherein said preprocessing circuit further includes a spectra circuit and a pitch circuit.

23. The apparatus according to claim 19 wherein said preprocessing circuit further includes a loudness circuit.

24. The apparatus according to claim 17 wherein said analysis circuit measures the perceptual quality based upon a short-term pitch perturbation factor of said digital voice signal.

25. The apparatus according to claim 17 wherein said analysis circuit measures the perceptual quality based upon a long-term pitch perturbation factor of said digital voice signal.

26. The apparatus according to claim 17 wherein said analysis circuit measures the perceptual quality based upon a background noise level related to said digital voice signal.

27. The apparatus according to claim 17 wherein said analysis circuit provides an indication of one or more perceptual qualities chosen from the group consisting of: a hearing sensation loudness, a sharpness of sound, a fluctuation strength and a roughness.

28. The apparatus according to claim 17 wherein said analysis circuit is located in a mobile telephone.

29. The apparatus according to claim 17 wherein said analysis circuit is located in a network base station.