US3549806A - Fundamental pitch frequency signal extraction system for complex signals - Google Patents

Info

Publication number
US3549806A
Authority
US
United States
Prior art keywords
frequency
signal
amplitude
output
detector
Legal status
Expired - Lifetime
Application number
US636390A
Inventor
David E Wood
Current Assignee
General Electric Co
Original Assignee
General Electric Co
Application filed by General Electric Co
Application granted
Publication of US3549806A
Anticipated expiration
Expired - Lifetime (current legal status)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/93 Discriminating between voiced and unvoiced parts of speech signals

Definitions

  • This invention relates to pitch frequency signal extraction, and more particularly to extraction of the fundamental pitch frequency signal from a complex voice frequency signal.
  • Voiced sounds are produced by air being forced out of the lungs and through the vocal tract.
  • In passing through the vocal cords, the air stream is modulated so as to produce a spectrum rich in harmonics.
  • The size and shape of the throat and oral cavities, as controlled by tongue and lip positions, further modify this air stream by reinforcing or attenuating particular portions of the spectrum.
  • The final output energy is a function of the original excitation energy and of resonance filtering in the cavities.
  • The basic pulsation rate of the air passing between the vocal cords is present in the output energy waveform as a series of repetitive complexes which repeat at this basic rate.
  • The fundamental vocal pitch periodicity of the output energy, which may be termed the fundamental pitch frequency, varies from about 65 to 500 impulses per second for different speakers. Correct reproduction of the vocal pitch has been found to be essential to the synthesis of natural sounding speech, as required in vocoder communication systems, for example.
  • The fundamental pitch frequency signal may be readily extracted with a high degree of accuracy and minimum delay since, as I have discovered, a high amplitude peak occurs at the fundamental pitch frequency. Extraction of the fundamental pitch frequency signal is further facilitated by employment of a high signal sampling rate, which enables detection of transitions from one phoneme to the next, as well as detection of unvoiced periods.
  • One object of this invention is to provide a method and apparatus for rapidly and accurately determining the fundamental pitch frequency of a complex voice frequency signal from which the fundamental pitch frequency signal may have been lost.
  • A method of extracting the fundamental pitch frequency of a complex voice frequency signal comprises the steps of separating the signal into a spectral band of its constituent sinusoidal frequencies, and repetitively detecting the lowest frequency in each band at which a relatively large amplitude peak occurs. Consistent detection of this lowest frequency, relatively large amplitude peak at the same frequency establishes that frequency as the fundamental pitch frequency of the signal. Interpolation between abrupt amplitude changes from one lowest frequency peak to the next enables reconstruction of the fundamental pitch frequency signal.
  • Apparatus for extracting the fundamental pitch frequency signal of a complex voice frequency signal includes frequency analyzer means, responsive to the voice frequency signal, for separating the voice frequency signal into a spectral band of its sinusoidal signal constituents, and signal amplitude peak detecting means, responsive to the frequency analyzer means, for producing a pulse at the lowest frequency at which a relatively large amplitude peak occurs in each spectral band produced by the analyzer means. Gating means responsive to the amplitude peak detecting means pass the pulses produced by the amplitude peak detecting means to the output of the apparatus.
  • FIG. 1 is a block diagram of the complex signal fundamental pitch frequency extractor of the instant invention;
  • FIG. 2 is a series of waveforms of analyzed voiced speech as produced by the frequency analyzer of FIG. 1;
  • FIG. 3 is a block diagram of the frequency spectrum peak detector of FIG. 1;
  • FIGS. 4A-4E are a series of waveforms to aid in the description of FIG. 3;
  • FIG. 5 is a block diagram of the vocal pitch detector of the system of FIG. 1;
  • FIG. 6 is a block diagram of the voicing decision logic circuitry used in the system of FIG. 1; and
  • FIG. 7 is a schematic diagram of the smoothing circuit of the system of FIG. 1.
  • FIG. 1 illustrates the basic system for extracting the fundamental pitch frequency signal from a complex voice frequency signal.
  • Electrical signals corresponding to speech and produced, for example, by a microphone or other type of transducer are supplied through a high pass filter 10 to a full wave rectifier 11.
  • The function of high pass filter 10 is to block low frequency signal components that might otherwise interfere with the fundamental pitch frequency.
  • The purpose of full wave rectifier 11 is to produce a signal that has a strong frequency component at the fundamental pitch frequency during voiced speech; that is, to regenerate the fundamental pitch frequency. This is essential, since not only do many forms of telephone communication greatly attenuate or even completely block the fundamental pitch frequency, but high pass filter 10 itself may remove the fundamental pitch frequency from the signal.
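The regenerating effect of full wave rectifier 11 can be illustrated numerically. The sketch below is not taken from the patent: it assumes an amplitude-modulated tone as a stand-in for voiced speech, with a hypothetical carrier FC, pitch F0 and modulation depth M, and compares the spectral magnitude at F0 before and after taking the absolute value. The unrectified tone has energy only at FC and FC ± F0. Because the envelope 1 + M cos(2πF0t) is nonnegative for M ≤ 1, rectification yields that envelope multiplied by |sin(2πFCt)|, whose mean is nonzero, so a component at F0 reappears.

```python
import cmath
import math

FS = 8000    # sampling rate in Hz (assumed)
F0 = 100.0   # hypothetical pitch, Hz
FC = 1000.0  # hypothetical carrier standing in for a formant, Hz
M = 0.5      # hypothetical modulation depth; the text cites 10 to nearly 100 percent


def dft_magnitude(x, fs, freq):
    """Magnitude of one DFT bin at freq (freq * len(x) / fs should be an integer)."""
    return abs(sum(v * cmath.exp(-2j * math.pi * freq * n / fs) for n, v in enumerate(x)))


def voiced_like(n_samples):
    """AM tone: energy at FC and FC +/- F0, but none at F0 itself."""
    return [
        (1 + M * math.cos(2 * math.pi * F0 * n / FS)) * math.sin(2 * math.pi * FC * n / FS)
        for n in range(n_samples)
    ]


x = voiced_like(FS)              # one second of signal
rectified = [abs(v) for v in x]  # full wave rectification
before = dft_magnitude(x, FS, F0)
after = dft_magnitude(rectified, FS, F0)
print(f"|X(F0)| before rectification: {before:.3g}")
print(f"|X(F0)| after rectification:  {after:.3g}")
```

The regenerated component scales with M, which is consistent with the patent's later observation that the amplitude modulation of voiced speech shows up as a strong component in the rectified spectrum.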
  • Output of full wave rectifier 11 is supplied to a frequency analyzer 12, a first input of an amplitude ratio detector 13, and an absolute level threshold detector 14.
  • Frequency analyzer 12 may typically be that shown and described in D. E. Wood U.S. Pat. No. 3,243,703, issued Mar. 29, 1966 and assigned to the instant assignee.
  • The frequency analyzer repetitively scans an input signal at a fixed repetition rate and, for each scan, produces an output signal corresponding to the amplitudes of the sinusoidal constituents or components of the input signal.
  • The output signal of the frequency analyzer thus corresponds to a sequence of scans, each scan ranging from one end of the scanned frequency spectrum to the other. This is illustrated in the analyzed voiced speech waveforms shown in FIG. 2.
  • The frequency analyzer can facilitate extraction of the fundamental pitch frequency, provided the peak at the fundamental pitch frequency can be detected in each scan of the frequency analyzer by additional apparatus in the system of FIG. 1.
  • Frequency analysis of the rectified speech is typically performed by frequency analyzer 12 over the range of 65 Hz to 500 Hz.
  • The fundamental pitch component for both male and female speakers falls within this range, so that it can be resolved and accurately defined.
  • Prior to analysis, the input signal is heterodyned within the frequency analyzer to an intermediate frequency range.
  • The frequency analyzer typically includes a bank of 16 filters of 30 Hz bandwidth, which are scanned in order of frequency at a repetition rate of 500 scans per second, so that each filter is sampled every 2 milliseconds.
  • Output of frequency analyzer 12 is supplied to a frequency spectrum peak detector 15 and a vocal pitch detector 16. Since the output of frequency analyzer 12 is a continuous function of logarithmic amplitude versus frequency for each scan, the frequency of the fundamental sinusoidal component can be measured by detecting the lowest frequency in the spectrum scan where energy is a maximum, as indicated in FIG. 2 by the peaks occurring at the fundamental pitch frequency.
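A rough digital stand-in for frequency analyzer 12 makes the scan structure concrete. This is a sketch under stated assumptions, not the analog heterodyne analyzer of U.S. Pat. No. 3,243,703: the sampling rate, the 40 ms analysis window and the channel layout (16 channels 30 Hz apart starting at 65 Hz) are hypothetical, and all channels are evaluated at once per frame rather than swept in order of frequency. Stepping the window by 2 ms gives the 500 scans per second quoted above, and each scan is a list of logarithmic amplitudes, one per channel.

```python
import cmath
import math

FS = 8000        # sampling rate in Hz (assumed)
FRAME_LEN = 320  # 40 ms analysis window (assumed)
HOP = 16         # 2 ms between scans, i.e. 500 scans per second
CENTERS_HZ = [65 + 30 * k for k in range(16)]  # hypothetical: 16 channels, 30 Hz apart


def analyzer_scan(frame, fs=FS, centers=CENTERS_HZ):
    """One scan: log10 magnitude of the DC-free frame at each channel frequency."""
    mean = sum(frame) / len(frame)
    x = [v - mean for v in frame]  # drop DC, which would otherwise leak into low channels
    scan = []
    for fc in centers:
        acc = sum(v * cmath.exp(-2j * math.pi * fc * n / fs) for n, v in enumerate(x))
        scan.append(math.log10(abs(acc) + 1e-12))  # logarithmic amplitude per channel
    return scan


def scans(signal, frame_len=FRAME_LEN, hop=HOP):
    """Yield one scan every `hop` samples (every 2 ms at 8 kHz)."""
    for start in range(0, len(signal) - frame_len + 1, hop):
        yield analyzer_scan(signal[start:start + frame_len])
```

Fed a rectified frame whose envelope repeats at 125 Hz, the largest entry of the scan falls in the third channel (125 Hz), which is the lowest frequency peak the later stages look for.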
  • Frequency spectrum peak detector 15 detects high amplitude peaks by combining detection of zero amplitude slope with detection of a previously positive amplitude slope, so as to distinguish maxima from minima, and furnishes a gating signal to vocal pitch detector 16 upon detection of the lowest frequency in each spectrum scan at which such maximum occurs.
  • Pitch detection is accomplished in vocal pitch detector 16 by detecting, once voicing is established, the vocal pitch fundamental frequency. This frequency is indicated by the position of the lowest frequency high amplitude peak in the spectrum produced by analyzer 12. During the initial buildup of energy at the onset of voicing, the position of the lowest frequency high amplitude peak shifts erratically, and often appreciably, from one scan to the next. As voicing is established, however, the position of the lowest frequency high amplitude peak becomes relatively stable, and changes position only at rates resulting from changes in articulation. Thus, vocal pitch can be detected by measuring the position of the first high amplitude spectrum peak on successive scans; when the frequency change from scan to scan becomes sufficiently small, the frequency at which this spectrum peak occurs is the vocal pitch fundamental frequency.
  • The vocal pitch detector is designed to automatically hold the last fundamental pitch frequency value in the event the apparent pitch output is violently disturbed by bursts of interference or oddities of speech, causing it to change by more than a specified amount from the previous scan.
  • A 2 millisecond interval between frequency scans is sufficiently short to allow smooth tracking of true pitch changes, while the hold provision ensures continuity of output during the period of voicing.
  • During the hold, the vocal pitch detector searches for a new value of pitch; if a stable component is located and voicing continues to be indicated, the system effects a smooth transition from the held value of pitch to the new value, even though the vocal pitch detector itself makes the transition abruptly.
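The tracking and hold behaviour described above reduces to a small per-scan rule: accept the new lowest peak only if it lies within a tolerance of the peak found in the previous scan; otherwise keep the held value while continuing to look for a stable component. The Python sketch below assumes a hypothetical tolerance `max_step_hz`, since the text only speaks of "a specified amount", and it works on peak frequencies directly, whereas the patent's circuitry works with times and voltages within each scan.

```python
class PitchTracker:
    """Scan-to-scan pitch tracking with hold (a sketch; max_step_hz is hypothetical)."""

    def __init__(self, max_step_hz=10.0):
        self.max_step_hz = max_step_hz
        self.prev_peak = None  # lowest-frequency peak found in the previous scan
        self.output = None     # held pitch value (None until the first stable lock)

    def update(self, peak_hz):
        """Feed the lowest high-amplitude peak of one scan (None if none was found).

        Returns (pitch_output, stable), where stable means this peak agreed
        with the previous scan's peak to within max_step_hz.
        """
        stable = (
            peak_hz is not None
            and self.prev_peak is not None
            and abs(peak_hz - self.prev_peak) <= self.max_step_hz
        )
        if stable:
            self.output = peak_hz     # lock on, or move to a newly located stable value
        if peak_hz is not None:
            self.prev_peak = peak_hz  # remember this scan's peak for the next comparison
        return self.output, stable


tracker = PitchTracker(max_step_hz=10.0)
# Erratic onset, stable voicing, a burst of interference, then re-acquisition.
for peak in [180.0, 120.0, 122.0, 124.0, 300.0, 125.0, 126.0]:
    print(peak, tracker.update(peak))
```

The output stays at the held value through the 300 Hz disturbance and the first scan after it, and moves only once two successive scans agree again.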
  • Amplitude ratio detector 13 responds to the ratio of the logarithm of the maximum peak amplitude in the rectified speech frequency spectrum to the rectified speech amplitude averaged over a period, which typically may be equal to 3 to 6 times the scan time of frequency analyzer 12, or about 6 to 12 milliseconds. This averaging period is sufficiently long to avoid undue response to spurious transients in the signal.
  • The periodic amplitude modulation of voiced speech ranges from at least 10 percent to nearly 100 percent of the signal, and appears in the rectified speech spectrum as a strong component. Unvoiced speech or noise exhibits no such strong frequency component. Thus, a considerable margin exists in the values of this ratio between voiced and unvoiced speech, with the higher ratios indicating voiced speech.
  • Output of amplitude ratio detector 13 is supplied to a first input of voicing decision logic circuitry 17, furnishing a parameter upon which a decision as to whether or not voicing is present may be based.
  • A second input to voicing decision logic 17 is supplied by absolute level threshold detector 14, which determines whether there is sufficient energy in the output of full wave rectifier 11 to validly assume that speech is actually occurring; unless there is sufficient energy in the speech input signal, there is no reason to operate the system. When all conditions for voicing are met at an acceptable confidence level, an output signal is produced indicative of the presence of voicing.
  • A third input to voicing decision logic 17 is energized by output signals from vocal pitch detector 16 which indicate that stable tracking of the fundamental pitch frequency is occurring. Accordingly, an additional output signal from voicing decision logic circuitry 17 is supplied to a smoothing circuit 18, enabling the smoothing circuit to produce a fundamental pitch frequency output signal in response to signals from vocal pitch detector 16 during periods beginning when voicing is present and stable tracking of the fundamental pitch frequency is taking place, and ending only when voicing disappears.
  • The output signal supplied by vocal pitch detector 16 to smoothing circuit 18 comprises a voltage with stepwise amplitude changes from one scan to the next.
  • This voltage must represent a value corresponding to a stable fundamental pitch frequency, as determined by voicing decision logic circuitry 17, before the voicing decision logic circuitry can supply an output signal to smoothing circuit 18.
  • Smoothing circuit 18 then minimizes the effect of the voltage steps in the output signal of pitch detector 16, and continues to do so until voicing next disappears. Since these steps are relatively closely spaced, such as at 2 millisecond intervals, it is preferable to use a short smoothing time constant of approximately twice this spacing, or 4 to 5 milliseconds. Thus, relatively little delay is added by the smoothing operation.
  • When no voicing is detected, the system output signal is held constant at the value it had at the instant voicing last disappeared; when stable voicing conditions are detected, the system output signal is the fundamental pitch frequency output signal. The end of voicing thus causes the fundamental pitch frequency output signal to remain at a constant amplitude.
  • In summary, the system of FIG. 1 produces, during voicing, an output voltage corresponding to the fundamental pitch frequency signal.
  • The speech input signal is furnished to full wave rectifier 11 through high pass filter 10, which eliminates unwanted low frequency noise.
  • The full wave rectifier reinserts the fundamental pitch frequency in the event this frequency has been lost either in transmission or by attenuation in high pass filter 10.
  • The output of full wave rectifier 11 is broken into its constituent frequencies by frequency analyzer 12, which repetitively supplies scanned frequency spectrums to vocal pitch detector 16 and frequency spectrum peak detector 15.
  • Frequency spectrum peak detector 15 supplies a gating signal to vocal pitch detector 16.
  • The first peak in each scan produced by frequency analyzer 12 and detected by frequency spectrum peak detector 15 is supplied by vocal pitch detector 16 to smoothing circuit 18.
  • Smoothing circuit 18 adds interpolation between the individual peaks furnished by vocal pitch detector 16 so as to provide a smoothly varying output signal whose amplitude corresponds to the fundamental pitch frequency.
  • Operation of the system is governed by voicing decision logic circuitry 17, which requires both the presence of sufficient speech signal amplitude, as determined by absolute level threshold detector 14, and an indication of a sufficiently high amplitude lowest frequency spectrum peak relative to the short term average of the rectified speech produced by full wave rectifier 11, as determined by amplitude ratio detector 13. When an indication of stable tracking of the selected fundamental pitch frequency is also provided by vocal pitch detector 16, even momentarily, voicing decision logic circuitry 17 produces a signal which allows the amplitude representing the fundamental pitch frequency, as sensed by vocal pitch detector 16, to be supplied through smoothing circuit 18 to the system output for as long as voicing continues.
  • FIG. 3 is a block diagram of frequency spectrum peak detector 15 of FIG. 1.
  • The peak detector comprises a differentiator 21 receiving input signals from frequency analyzer 12 of FIG. 1.
  • Output signals from differentiator 21 are supplied to a hysteresis flip-flop 22.
  • This hysteresis flip-flop circuit, which may comprise a high gain differential amplifier, is actuated in the positive direction only when the positive input signal exceeds a threshold; however, it returns to its initial condition whenever the input signal returns to zero.
  • A circuit of this type is shown and described in D. E. Wood application Ser. No. 636,324, filed concurrently herewith, now U.S. Pat. No. 3,522,545, and assigned to the instant assignee.
  • Output signals from hysteresis flip-flop circuit 22 are furnished to a second differentiator 23 which, in turn, provides an output signal to the input of an inverter circuit 24.
  • Output signals from inverter circuit 24 are applied through a half wave rectifier 25 to the input of vocal pitch detector 16 shown in FIG. 1.
  • FIG. 4A represents the output signal of frequency analyzer 12 of FIG. 1 as supplied to the input of differentiator 21. This differentiator produces an output waveform as shown in FIG. 4B, wherein zero output voltage corresponds to the zero slope at the voltage maxima and minima in the output waveform of frequency analyzer 12.
  • Whereas the waveform of FIG. 4A is plotted both against time and against frequency, the waveforms of FIGS. 4B-4E are plotted solely against time. For this reason, the waveform of FIG. 4A also illustrates two major peaks, corresponding to the fundamental pitch frequency and the first harmonic thereof.
  • The output of differentiator circuit 21 drives flip-flop circuit 22 into an ON condition whenever the positive output voltage produced by the differentiator circuit exceeds a predetermined threshold level, as shown in FIG. 4B.
  • When the differentiator output returns to zero, flip-flop circuit 22 returns to its OFF condition.
  • The output pulses generated by flip-flop circuit 22, which are illustrated in FIG. 4C, are differentiated by differentiator 23, so that upon initiation of each output pulse by flip-flop circuit 22 a positive spike is generated by differentiator 23; similarly, upon completion of each output pulse by flip-flop circuit 22, a negative spike is generated.
  • The positive and negative spikes produced by differentiator 23, which are illustrated in FIG. 4D, are inverted by inverter circuit 24 and thereafter rectified by half wave rectifier 25, so that the output signal of peak detector 15 consists of positive voltage spikes generated each time hysteresis flip-flop circuit 22 switches to its OFF condition.
  • The peak detector output signal is illustrated by the waveforms of FIG. 4E.
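The FIG. 3 chain (differentiator, hysteresis flip-flop, second differentiator, inverter, half wave rectifier) can be emulated on a sampled scan. In this discrete sketch, which is an analogue under assumptions rather than the circuit itself, a first difference stands in for differentiator 21, the flip-flop turns on when the slope exceeds a hypothetical `threshold` and turns off when the slope falls to zero or below, and a spike is recorded at each OFF transition, marking a maximum that was preceded by a sufficiently positive slope. The first such spike gives the lowest frequency peak.

```python
def peak_detector_spikes(scan, threshold):
    """Indices of maxima flagged by a FIG. 3-style detector on one sampled scan.

    threshold: turn-on level of the hysteresis flip-flop (hypothetical units:
    amplitude change per sample).
    """
    spikes = []
    flip_flop_on = False
    for i in range(1, len(scan)):
        slope = scan[i] - scan[i - 1]            # differentiator 21
        if not flip_flop_on and slope > threshold:
            flip_flop_on = True                  # flip-flop 22 switches ON
        elif flip_flop_on and slope <= 0:
            flip_flop_on = False                 # slope back to zero: flip-flop 22 switches OFF
            spikes.append(i - 1)                 # OFF edge gives a spike; sample i-1 was the maximum
    return spikes


def lowest_peak(scan, threshold):
    """Index of the lowest-frequency peak in the scan, or None if there is none."""
    spikes = peak_detector_spikes(scan, threshold)
    return spikes[0] if spikes else None


print(peak_detector_spikes([0, 1, 3, 2, 2, 4, 6, 5], threshold=0.5))
```

Small ripples whose rising steps never exceed the threshold produce no spike at all, which is the purpose of the hysteresis.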
  • FIG. 5 is a block diagram of vocal pitch detector 16, shown in FIG. 1.
  • Input signals from frequency analyzer 12 of FIG. 1, which comprise the repetitive frequency spectrums produced thereby, are applied to the input of an amplitude threshold detector circuit 40 which, in turn, passes signals above a predetermined threshold level to the signal input of a gate circuit 41.
  • The control input of gate 41 is energized by signals from frequency spectrum peak detector 15 of FIG. 1, so that the gate is opened at the instant a frequency spectrum peak is detected.
  • The output voltage of the frequency analyzer at this instant, provided its amplitude exceeds the threshold set by threshold detector 40, is furnished to the input of a bistable multivibrator 42.
  • Bistable multivibrator 42 is reset by a signal from the frequency analyzer each time the frequency analyzer is internally actuated to scan a new spectrum of frequencies.
  • Rectangular output pulses produced by bistable multivibrator 42 are supplied to a differentiator circuit 43 which, through a half wave rectifier 39, provides narrow, sharp unipolar pulses to the inputs of a sample and hold circuit 44 and a narrow track gate circuit 45. These narrow, sharp pulses function as a sampling signal for sample and hold circuit 44 and as an input signal for narrow track gate circuit 45.
  • The reset signal for bistable multivibrator 42, which originates in frequency analyzer 12 and is preferably the scan sync signal produced by the Nth cycle amplifier illustrated in the aforementioned U.S. Pat. No. 3,243,703, is also supplied to the control or sample input of a second sample and hold circuit 48, as well as to the input of a sawtooth generator circuit 46.
  • This sync signal initiates a new sawtooth voltage waveform each time it is supplied to the input of generator 46.
  • The sawtooth voltage wave is supplied to the signal input of sample and hold circuit 44, as well as to one input of an amplitude comparator circuit 49, which produces a narrow output pulse whenever the amplitude levels of its applied input voltages are equal.
  • The sawtooth voltage wave is also furnished to the signal input of a third sample and hold circuit 47.
  • Output signals from sample and hold circuit 44 are provided to the signal input of sample and hold circuit 48.
  • Output signals from sample and hold circuit 48 are furnished to the second input to amplitude comparator circuit 49 which, in turn, supplies output pulses to the control input of narrow track gate circuit 45.
  • Output of narrow track gate circuit 45 is coupled to the control or sampling input of sample and hold circuit 47, as well as to voicing decision logic circuitry 17 shown in FIG. 1.
  • Output signals from sample and hold circuit 47 are supplied to the input of smoothing circuit 18 of FIG. 1.
  • Narrow track gate circuit 45 functions as a coincidence detector, and provides an output signal corresponding to the pulses supplied thereto from differentiator circuit 43 through half wave rectifier 39 as long as these pulses substantially coincide in time with the pulses from amplitude comparator circuit 49.
  • Although gate 45 must be sufficiently wide in time to follow the changes in vocal pitch frequency which produce the characteristic intonation of speech, these changes are relatively slow in comparison with the frequency analyzer scan interval; hence, gate 45 generally prevents transients not connected with normal speech from appearing in its output signal.
  • In sample and hold circuit 47, the output pulses from narrow track gate circuit 45 sample the sawtooth waveforms supplied from sawtooth generator 46.
  • Sample and hold circuit 47 thus provides a signal of amplitude corresponding to the voltage produced by sawtooth generator 46 at the instant a pulse is supplied by narrow track gate circuit 45.
  • In operation, the vocal pitch detector accepts each scan from the frequency analyzer and passes the portion of it whose amplitude exceeds a predetermined level through amplitude threshold detector 40 to gate 41. If, at the instant a peak is produced by peak detector 15, the amplitude of the spectrum exceeds the level set by amplitude threshold detector 40, thereby assuring that the detected peak is not merely a low level peak due to noise or some form of disturbance, gate 41 passes this instantaneous spectrum amplitude to bistable multivibrator 42 which, in turn, supplies a pulse to differentiator 43. The differentiator thus supplies output pulses in the form of narrow spikes while voicing occurs, a single spike occurring during each spectrum scan at the instant the first spectrum peak is detected. These spikes are rendered unipolar by rectifier 39. Because bistable multivibrator 42 is reset by the frequency analyzer only once per scan, only one spike is produced per scan.
  • Sawtooth generator 46, being synchronized with the frequency analyzer, produces repetitive sawtooth waves of duration equal to the scan plus the interval between two successive scans.
  • At the instant a spike occurs, the sawtooth voltage amplitude is sampled by sample and hold circuit 44 and supplied to sample and hold circuit 48 and, upon occurrence of the next reset signal from frequency analyzer 12, is supplied to amplitude comparator 49.
  • Due to the short delay introduced by sample and hold circuit 48, which is comparable to but a fraction of the duration of each spectrum, comparator 49 receives from sample and hold circuit 48 a signal which represents the frequency of the selected peak occurring in the preceding scan. Hence, amplitude comparator 49 compares the sawtooth voltage of the current scan with the output of sample and hold circuit 48, producing a pulse at the point in the current scan that corresponds to the previous peak frequency.
  • The output signal of narrow track gate circuit 45, by indicating stable tracking of the fundamental pitch frequency signal, constitutes a valid indicator of the presence of voicing, and hence is supplied to one input of voicing decision logic circuitry 17 as well as to the sampling or control input of sample and hold circuit 47.
  • The output of sample and hold circuit 47 constitutes a signal which can vary only slowly since, if the lowest frequency spectrum peaks occur at approximately regular times in successive scans, output pulses from narrow track gate circuit 45 occur at approximately regular times and hence sample the sawtooth voltage from generator 46 at approximately regular times. The sampled amplitude is thus approximately unchanged from sample to sample. In the event stable tracking ceases, as manifested by anticoincidence of the pulses supplied to narrow track gate circuit 45, sample and hold circuit 47 continues to produce a steady signal of amplitude equal to that provided at the time of the most recent output pulse from gate 45.
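The conversion from peak position to a voltage in FIG. 5 can be sketched as follows. The sketch assumes a linear sawtooth of hypothetical amplitude `V_MAX` that restarts every 2 ms scan period, and represents the output of peak detector 15 as a list of (time, amplitude) events for one scan. Only the first event above the amplitude threshold yields a sampled voltage, mirroring threshold detector 40, gate 41 and bistable multivibrator 42. If the analyzer sweeps frequency linearly in time, which is an assumption here, the sampled voltage is proportional to the frequency of that peak.

```python
SCAN_PERIOD_S = 0.002  # sawtooth period: one scan plus the gap before the next (500 scans/s)
V_MAX = 10.0           # hypothetical sawtooth amplitude, volts


def sawtooth_voltage(t_since_sync):
    """Sawtooth generator 46: ramps from 0 to V_MAX over one scan period after each sync."""
    return V_MAX * (t_since_sync % SCAN_PERIOD_S) / SCAN_PERIOD_S


def first_peak_voltage(peak_events, amp_threshold):
    """Voltage captured by sample and hold circuit 44 during one scan.

    peak_events: (seconds since scan sync, spectrum amplitude) for each pulse from
    peak detector 15, in time order. Only the first event above amp_threshold flips
    multivibrator 42, so later peaks in the same scan are ignored until the next reset.
    Returns None if no peak qualifies.
    """
    for t, amp in peak_events:
        if amp > amp_threshold:
            return sawtooth_voltage(t)
    return None


# A weak noise peak, then the fundamental, then a harmonic later in the same scan.
events = [(0.0002, 0.1), (0.0005, 0.8), (0.0011, 0.9)]
print(first_peak_voltage(events, amp_threshold=0.5))
```

In the circuit, this value then passes to sample and hold circuit 48 and is checked, through amplitude comparator 49 and narrow track gate 45, against the position of the first peak in the next scan.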
  • Voicing decision logic circuitry 17, shown in FIG. 6, comprises a first two-input AND gate 50 responsive to the continuous output signals both of absolute level threshold detector 14 and of amplitude ratio detector 13, the latter through an amplitude threshold detector 51.
  • The AND gate provides a voicing indication output signal.
  • The voicing indication output signal is supplied to one input of a second two-input AND gate 52, the second input of which is fulfilled by a continuous output signal supplied by an integrator circuit 54 in response to spikes produced by narrow track gate circuit 45 of vocal pitch detector 16.
  • Integrator 54, which may be of a well-known circuit type, is preferably shunted by a diode 55 in order to allow AND gate 52 to respond immediately upon initiation of a train of output pulses from narrow track gate 45. Output signals of AND gate 52, which are in the form of continuous signals, actuate a flip-flop 53 to its set condition. Output signals from flip-flop 53, when in the set condition, are supplied directly to smoothing circuit 18. Flip-flop 53 is reset through a NOT circuit 56 from the output of AND gate 50.
  • Voicing decision logic circuitry 17 produces a voicing indication output signal provided AND gate 50 receives signals both from absolute level threshold detector circuit 14, which requires that the output signal from full wave rectifier 11 be of sufficient amplitude to make it likely that a voicing signal is being supplied to the system, and from amplitude ratio detector 13, which requires that the ratio of peak amplitude in each spectrum to the average amplitude exceed the threshold of amplitude threshold detector 51, so as to make it likely that the output signal from full wave rectifier 11 is a voice signal and not merely large amplitude noise.
  • AND gate 52 combines recognition of a voicing indication output signal, as produced by AND gate 50, with a stable tracking indication supplied from vocal pitch detector 16 through diode-shunted integrator 54, in order to provide a large time constant for the smoothing circuit.
  • The logic supplied by AND gate 52 indicates that a peak at the same frequency in each spectrum is being detected, so that the fundamental pitch frequency corresponds to the frequency at which this peak is consistently detected.
  • Whenever AND gate 52 provides an output signal, an output is provided to smoothing circuit 18 from flip-flop circuit 53.
  • Upon completion of the output signal from AND gate 52, however, smoothing circuit 18 continues to receive a signal from flip-flop 53 for as long as AND gate 50 continues to produce a voicing indication output signal, so as to maintain the large time constant in smoothing circuit 18 for as long as voicing continues, even if stable tracking of the fundamental pitch frequency is no longer indicated by pulses from narrow track gate 45 of the vocal pitch detector.
  • When voicing ceases and the output of AND gate 50 disappears, NOT circuit 56 resets flip-flop 53.
  • The flip-flop thus deenergizes smoothing circuit 18, removing its large time constant so as to allow the smoothing circuit to quickly change output voltage values while the system seeks a new spectrum peak.
  • Meanwhile, sample and hold circuit 47 maintains the fundamental pitch frequency output signal constant.
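The FIG. 6 logic can be written as a per-scan state update. This sketch simplifies the continuous signals to booleans, assumes a hypothetical `ratio_threshold` for amplitude threshold detector 51, and collapses integrator 54 and diode 55 into a single `tracking` flag per scan, so it models the gating and latching only, not the analog timing.

```python
class VoicingDecisionLogic:
    """Per-scan sketch of the FIG. 6 gating; booleans replace the continuous signals."""

    def __init__(self, ratio_threshold):
        self.ratio_threshold = ratio_threshold  # amplitude threshold detector 51 (hypothetical value)
        self.flip_flop_53 = False               # set: smoothing enabled; reset: bypassed

    def step(self, level_ok, amplitude_ratio, tracking):
        """Advance one scan.

        level_ok: output of absolute level threshold detector 14.
        amplitude_ratio: output of amplitude ratio detector 13.
        tracking: True if narrow track gate 45 is producing pulses (stands in for integrator 54).
        Returns (voicing, smoothing_on).
        """
        voicing = level_ok and amplitude_ratio > self.ratio_threshold  # AND gate 50
        if voicing and tracking:                                       # AND gate 52
            self.flip_flop_53 = True                                   # set flip-flop 53
        if not voicing:                                                # NOT circuit 56
            self.flip_flop_53 = False                                  # reset flip-flop 53
        return voicing, self.flip_flop_53


logic = VoicingDecisionLogic(ratio_threshold=1.0)
# Voicing without tracking, tracking acquired, tracking lost while voiced, voicing ends.
for level_ok, ratio, tracking in [(True, 2.0, False), (True, 2.0, True), (True, 2.0, False), (False, 2.0, False)]:
    print(logic.step(level_ok, ratio, tracking))
```

The third step shows the latch: once set, flip-flop 53 stays set while voicing lasts, even after tracking pulses stop.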
  • FIG. 7 shows the circuitry of smoothing circuit 18 of FIG. 1.
  • This circuitry comprises a series-connected resistance 60 and capacitance 61 which, together, function as an integrator with a time constant preferably of about two spectrum periods.
  • An INHIBIT gate 62 is connected in parallel with resistance 60, and is controlled by output signals from flip-flop 53 in voicing decision logic circuitry 17.
  • Initially, flip-flop 53 produces no output signal, so that gate 62 forms a short circuit across resistance 60.
  • Input signals received from sample and hold circuit 47 of vocal pitch detector 16 are thus passed directly through gate 62 to the output of smoothing circuit 18.
  • The fundamental pitch frequency output signal produced by smoothing circuit 18 therefore corresponds to what might be described as a stepwise or box car signal; that is, the signal is substantially a replica of the output signal produced by sample and hold circuit 47 of the vocal pitch detector.
  • When an output signal is produced by AND gate 52 of voicing decision logic circuitry 17, a signal is provided at the output of flip-flop 53 which inhibits gate 62; that is, the path through the gate is open-circuited.
  • The effect of resistance 60 is thus inserted into smoothing circuit 18, establishing the large time constant therein.
  • The fundamental pitch frequency output signal therefore changes very slowly from this time on, until the circuit through gate 62 is once more closed.
  • Once gate 62 is closed again, the output signal produced by smoothing circuit 18 can change very rapidly, so that once voicing has again been detected by the system, along with stable tracking of the fundamental pitch frequency, the fundamental pitch frequency output voltage very rapidly reaches an amplitude corresponding to the fundamental pitch frequency. This provides the system with a very fast response.
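The behaviour of FIG. 7 can be simulated with a discrete first-order RC model. The sketch assumes that the input from sample and hold circuit 47 is constant over each 2 ms scan interval, in which case the capacitor voltage after one interval moves a fraction 1 - exp(-dt/tau) of the way toward the input; tau defaults to 4 ms, about two scan periods. When `smoothing_on` is False, gate 62 is treated as an ideal short across resistance 60, so the output follows the input immediately.

```python
import math


def smoothing_circuit(samples, smoothing_on, dt=0.002, tau=0.004):
    """Sketch of FIG. 7: an RC integrator whose resistor can be bypassed.

    samples: box car output of sample and hold circuit 47, one value per scan.
    smoothing_on: flip-flop 53 state per scan; False means gate 62 shorts resistance 60.
    dt: scan interval (2 ms); tau: RC time constant (about two scan periods).
    """
    alpha = 1.0 - math.exp(-dt / tau)  # exact RC step response for an input held over dt
    out = []
    y = None
    for x, on in zip(samples, smoothing_on):
        if y is None or not on:
            y = x                      # bypassed: output follows the input step at once
        else:
            y += alpha * (x - y)       # capacitance 61 charges through resistance 60
        out.append(y)
    return out


# A 20-unit pitch step arriving after smoothing has been enabled.
print(smoothing_circuit([100.0, 100.0, 120.0, 120.0, 120.0], [False, True, True, True, True]))
```

With smoothing enabled, the step is approached gradually over a few scans; with it disabled, the output is the box car input itself.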
  • The foregoing describes a method and apparatus for rapidly, accurately and reliably determining the fundamental pitch frequency signal of a complex voice frequency signal from which the fundamental pitch frequency signal may have been lost.
  • The system further enables real time determination of the fundamental pitch frequency signal of a complex voice frequency signal without computer assistance, even in the presence of signal disturbances.
  • A method of extracting the fundamental pitch frequency of a complex voice frequency signal comprising the steps of periodically separating said voice frequency signal into a spectrum of its constituent sinusoidal frequencies, repeatedly detecting the frequency of the lowest frequency high amplitude peak in each frequency spectrum of the signal, comparing the frequency of the lowest frequency high amplitude peak in a first spectrum with the frequency of the lowest frequency high amplitude peak in a succeeding spectrum, and equating the fundamental pitch frequency of said complex voice frequency signal to the frequency of the lowest frequency high amplitude peak of the spectrum upon detection of the lowest frequency high amplitude peaks at substantially identical frequencies in the compared spectrums.
  • The method of extracting the fundamental pitch frequency of a complex voice frequency signal of claim 1, including the preliminary step of full-wave rectifying said voice frequency signal prior to separating said voice frequency signal into a spectrum of constituent sinusoidal frequencies.
  • A method of extracting the fundamental pitch frequency signal of a complex voice frequency signal comprising the steps of separating the voice frequency signal into a spectrum of its constituent sinusoidal frequencies, repeatedly detecting the lowest frequency high amplitude peak in each frequency spectrum of the signal, storing a manifestation of the time at which said lowest frequency high amplitude peak occurs in a given spectrum, comparing the occurrence of the lowest frequency high amplitude peak of a first spectrum with the occurrence of the lowest frequency high amplitude peak of a succeeding spectrum, and providing an output signal corresponding to an interpolation of said peaks as long as said peaks occur at approximately the same time in each spectrum, whereby said output signal represents a reconstructed version of the fundamental pitch frequency signal.
  • The method of extracting the fundamental pitch frequency signal of a complex voice frequency signal of claim 4, including the preliminary step of full-wave rectifying said voice frequency signal prior to separating said voice frequency signal into a spectrum of constituent sinusoidal frequencies.
  • A method of extracting the fundamental pitch frequency signal of a complex voice frequency signal comprising the steps of periodically separating said voice frequency signal into spectra of its constituent sinusoidal frequencies, detecting the lowest frequency high amplitude peak in each of said frequency spectra, storing a manifestation of the time at which said lowest frequency high amplitude peak occurs in a given spectrum, comparing the occurrence of the lowest frequency high amplitude peak of a first spectrum with the occurrence of the lowest frequency high amplitude peak of a succeeding spectrum, detecting the amplitude of said voice frequency signal, and providing an output signal corresponding to an interpolation of said peaks as long as said peaks occur at approximately the same time in each of said spectra and the amplitude of said voice frequency signal exceeds a predetermined level.
  • Apparatus for extracting the fundamental pitch frequency of a complex voice frequency signal comprising: frequency analyzer means repetitively separating the signal into a spectrum of sinusoidal signal constituents; peak detector means responsive to said frequency analyzer means and producing a pulse at the lowest frequency at which a high amplitude peak occurs in each spectrum produced by said analyzer means; means for storing a manifestation of the time at which said lowest frequency high amplitude pulse occurs in a given spectrum; means for comparing the occurrence of the lowest frequency high amplitude pulse of a first spectrum with the occurrence of the lowest frequency high amplitude peak of a succeeding spectrum; and gating means responsive to said comparison means for passing the output of each peak to the output of said apparatus upon occurrence of said pulses at approximately the same time in successive spectra.
  • the apparatus of claim 10 including smoothing circuit means responsive to said gating means for interpolating the output signal between successive ones of said peaks.
  • the apparatus of claim 10 including full-wave rectifier means coupling said voice frequency signal to said frequency analyzer means.
  • the apparatus of claim 13 including high pass filter means coupling said voice frequency signal to said full-wave rectifier means.

Description

United States Patent
[72] Inventor: David E. Wood, Schenectady, N.Y.
[21] Appl. No.: 636,390
[22] Filed: May 5, 1967
[45] Patented: Dec. 22, 1970
[73] Assignee: General Electric Company
[54] FUNDAMENTAL PITCH FREQUENCY SIGNAL EXTRACTION SYSTEM FOR COMPLEX SIGNALS
14 Claims, 11 Drawing Figs.
[51] Int. Cl.: G10l 1/04
Primary Examiner: Kathleen H. Claffy
Assistant Examiner: Tom D'Amico
Attorneys: Richard R. Brainard, Marvin Snyder, Paul A. Frank, Neuhauser, Melvin Goldenberg and Oscar Waddell
ABSTRACT: A system for extracting the fundamental pitch frequency from a full-wave rectified complex voice frequency signal includes separating the signal into a spectral band of its constituent sinusoidal frequencies and repetitively scanning the spectral band, starting from the low frequency end, to detect the frequency of the first-occurring peak of relatively large amplitude in each scan of the spectral band. Consistent detection of the same frequency for this peak in each of the scans establishes the fundamental pitch frequency signal, allowing generation of a signal of amplitude corresponding to the fundamental pitch frequency.
[Drawings, 3 sheets: FIG. 1, system block diagram; FIG. 2, analyzed voiced speech, scans 1-6, log amplitude versus frequency; FIG. 3, frequency spectrum peak detector; FIGS. 4A-4E, peak detector waveforms; FIG. 5, vocal pitch detector; FIG. 6, voicing decision logic; FIG. 7, smoothing circuit.]
FUNDAMENTAL PITCH FREQUENCY SIGNAL EXTRACTION SYSTEM FOR COMPLEX SIGNALS
BACKGROUND OF THE INVENTION
This invention relates to pitch frequency signal extraction, and more particularly to extraction of the fundamental pitch frequency signal from a complex voice frequency signal.
In speaking, voiced sounds are produced by air being forced out of the lungs and through the vocal tract. The air stream, in passing through the vocal cords, is modulated so as to produce a spectrum rich in harmonics. The size and shape of the throat and oral cavities as controlled by tongue and lip positions produce further modifications to this air stream by reinforcing or attenuating particular portions of the spectrum. The final output energy is a function of the original excitation energy and resonance filtering in the cavities. The basic pulsation rate of the air passing between the vocal cords is present in the output energy waveform as a series of repetitive complexes which repeat at this basic rate. The fundamental periodicity for vocal pitch of the output energy, which may be termed the fundamental pitch frequency, varies from about 65 to 500 impulses per second for different speakers. Correct reproduction of the vocal pitch has been found to be essential to the synthesis of natural sounding speech, as required in vocoder communication systems for example.
Accurate automatic measurement of vocal pitch has heretofore proven to be a difficult problem. This problem stems from the wide acoustic variations which exist between different phonemes and their combinations. By utilization of the instant invention, however, the fundamental pitch frequency signal may be readily extracted with a high degree of accuracy and minimum delay since, as I have discovered, a high amplitude peak occurs at the fundamental pitch frequency. Extraction of the fundamental pitch frequency signal is further facilitated by employment of a high signal sampling rate, which enables detection of transitions from one phoneme to the next, as well as detection of unvoiced periods.
SUMMARY OF THE INVENTION
One object of this invention is to provide a method and apparatus for rapidly and accurately determining the fundamental pitch frequency of a complex voice frequency signal from which the fundamental pitch frequency signal may have been lost.
Another object is to provide a method and apparatus for reliably obtaining the fundamental pitch frequency signal of a voice frequency signal and for maintaining the signal constant during portions of voiced intervals in which tracking of the fundamental pitch frequency is disrupted.
Another object is to provide a simplified system for real-time determination of the fundamental pitch frequency signal of a complex voice frequency signal without computer assistance, even in the presence of signal disturbances.
Briefly, in accordance with a preferred embodiment of the invention, a method of extracting the fundamental pitch frequency of a complex voice frequency signal is described. This method comprises the steps of separating the signal into a spectral band of its constituent sinusoidal frequencies, and repetitively detecting the lowest frequency in each band at which a relatively large amplitude peak occurs. Consistent detection of the same frequency at which this lowest frequency, relatively large amplitude peak occurs establishes this frequency as the fundamental pitch frequency of the signal. Interpolation between abrupt amplitude changes from one lowest frequency peak to the next enables reconstruction of the fundamental pitch frequency signal.
According to another preferred embodiment of the invention, apparatus for extracting the fundamental pitch frequency signal of a complex voice frequency signal is provided. This apparatus includes frequency analyzer means responsive to the voice frequency signal for separating the voice frequency signal into a spectral band of the sinusoidal signal constituents thereof, and signal amplitude peak detecting means responsive to the frequency analyzer means and producing a pulse at the lowest frequency at which a relatively large amplitude peak occurs in each spectral band produced by the analyzer means. Gating means responsive to the amplitude peak detecting means are provided for passing pulses produced by the amplitude peak detecting means to the output of the apparatus.
BRIEF DESCRIPTION OF THE DRAWINGS
The features of the invention believed to be novel are set forth with particularity in the appended claims. The invention itself, however, both as to organization and method of operation, together with further objects and advantages thereof, may best be understood by reference to the following description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a block diagram of the complex signal fundamental pitch frequency extractor of the instant invention;
FIG. 2 is a series of waveforms of analyzed voiced speech as produced by the frequency analyzer of FIG. 1;
FIG. 3 is a block diagram of the frequency spectrum peak detector of FIG. 1;
FIGS. 4A-4E are a series of waveforms to aid in the description of FIG. 3;
FIG. 5 is a block diagram of the vocal pitch detector of the system of FIG. 1;
FIG. 6 is a block diagram of the voicing decision logic circuitry used in the system of FIG. 1; and
FIG. 7 is a schematic diagram of the smoothing circuit of the system of FIG. 1.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
FIG. 1 illustrates the basic system for extracting the fundamental pitch frequency signal from a complex voice frequency signal. Electrical signals corresponding to speech and produced, for example, by a microphone or other type of transducer are supplied through a high pass filter 10 to a full wave rectifier 11. The function of high pass filter 10 is to block low frequency signal components that might otherwise interfere with the fundamental pitch frequency. The purpose of full wave rectifier 11 is to produce a signal that has a strong frequency component at the fundamental pitch frequency during voiced speech; that is, to regenerate the fundamental pitch frequency. This is essential because many forms of telephone communication greatly attenuate or even completely block the fundamental pitch frequency, and high pass filter 10 itself may also remove the fundamental pitch frequency from the signal.
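The regeneration effect described above can be checked numerically. The Python sketch below is illustrative only: the 8 kHz sample rate, the 100 Hz fundamental and the helper `tone_magnitude` are assumptions, not values from the patent. It builds a test signal containing only the second and third harmonics of 100 Hz, as if the fundamental had been removed in transmission or by high pass filter 10, and measures the 100 Hz component before and after full-wave rectification (taking the absolute value).

```python
import math

def tone_magnitude(samples, freq_hz, rate_hz):
    """Amplitude of the component at freq_hz, via a single-bin DFT.

    Scaled by 2/N so that a unit-amplitude cosine at freq_hz yields about 1.0.
    """
    re = im = 0.0
    for n, s in enumerate(samples):
        phase = 2.0 * math.pi * freq_hz * n / rate_hz
        re += s * math.cos(phase)
        im += s * math.sin(phase)
    return 2.0 * math.hypot(re, im) / len(samples)

RATE_HZ = 8000.0   # assumed sample rate
F0_HZ = 100.0      # assumed fundamental, inside the 65-500 Hz pitch range
N = 800            # 0.1 s: a whole number of periods of every tone used

# Only the 2nd and 3rd harmonics are present: the fundamental is missing.
voiced = [math.cos(2 * math.pi * 2 * F0_HZ * n / RATE_HZ)
          + math.cos(2 * math.pi * 3 * F0_HZ * n / RATE_HZ) for n in range(N)]
rectified = [abs(s) for s in voiced]  # stand-in for full wave rectifier 11

before = tone_magnitude(voiced, F0_HZ, RATE_HZ)
after = tone_magnitude(rectified, F0_HZ, RATE_HZ)
print(f"{F0_HZ:.0f} Hz amplitude before: {before:.3f}, after: {after:.3f}")
```

Before rectification the 100 Hz component is essentially zero; after rectification a substantial 100 Hz component appears, because the rectified waveform repeats with the 10 ms period of the missing fundamental even though the input contains no 100 Hz term.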
The output of full wave rectifier 11 is supplied to a frequency analyzer 12, a first input of an amplitude ratio detector 13, and an absolute level threshold detector 14. Frequency analyzer 12 may typically be that shown and described in D. E. Wood U.S. Pat. No. 3,243,703, issued Mar. 29, 1966 and assigned to the instant assignee. The frequency analyzer repetitively scans an input signal at a fixed repetition rate and, for each scan, produces an output signal corresponding to the amplitude of the sinusoidal constituents or components of the input signal. Thus, the output signal of the frequency analyzer corresponds to a sequence of scans, each scan ranging from one end of the scanned frequency spectrum to the other. This is illustrated in the analyzed voiced speech waveforms of FIG. 2, wherein six scans, obtained sequentially according to their numerical designation, are plotted as logarithmic amplitude versus frequency. These waveforms indicate that, in each scan, a maximum peak of amplitude is generally obtained at the fundamental pitch frequency, with lesser peaks occurring at harmonics of the fundamental pitch frequency. Hence, the frequency analyzer can facilitate extraction of the fundamental pitch frequency, provided this peak at the fundamental pitch frequency can be detected in each scan of the frequency analyzer by additional apparatus in the system of FIG. 1.
Frequency analysis of the rectified speech is typically performed by frequency analyzer 12 over the range of 65 Hz to 500 Hz. The fundamental pitch component for both male and female speakers falls within this range, so that it can be resolved and accurately defined. Prior to analysis, the input signal is heterodyned within the frequency analyzer to an intermediate frequency range. Because the frequency analyzer typically includes a bank of 16 filters of 30 Hz bandwidth, scanned in order of frequency at a repetition rate of 500 scans per second, each filter is sampled every 2 milliseconds.
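The timing figures follow directly: 500 scans per second gives a scan period of 1/500 s = 2 ms, and stepping through 16 filters within each scan leaves about 125 µs per filter. The Python sketch below is a software stand-in for one scan of the analyzer, not the heterodyne circuit of U.S. Pat. No. 3,243,703. It assumes 16 hypothetical band centers spaced 30 Hz apart starting at 65 Hz, an 8 kHz sample rate, and a direct single-frequency DFT over a 40 ms window in place of each filter; all of these are illustrative choices.

```python
import math

RATE_HZ = 8000.0                      # assumed sample rate
SCANS_PER_SECOND = 500                # from the text
N_FILTERS = 16                        # from the text
BANDWIDTH_HZ = 30.0                   # from the text
# Assumed band layout: 65, 95, ..., 515 Hz, lowest band first.
CENTERS_HZ = [65.0 + BANDWIDTH_HZ * k for k in range(N_FILTERS)]

SCAN_PERIOD_S = 1.0 / SCANS_PER_SECOND  # 2 ms between scans
DWELL_S = SCAN_PERIOD_S / N_FILTERS     # 125 us per filter within a scan

def scan(window, rate_hz=RATE_HZ, centers_hz=CENTERS_HZ, floor=1e-9):
    """One analyzer scan: log10 amplitude at each band center, low to high."""
    n = len(window)
    spectrum = []
    for fc in centers_hz:
        re = sum(s * math.cos(2 * math.pi * fc * i / rate_hz) for i, s in enumerate(window))
        im = sum(s * math.sin(2 * math.pi * fc * i / rate_hz) for i, s in enumerate(window))
        amplitude = 2.0 * math.hypot(re, im) / n
        spectrum.append(math.log10(max(amplitude, floor)))  # log amplitude, as in FIG. 2
    return spectrum

# Usage: a 40 ms window (assumed) holding a 125 Hz test tone.
WINDOW = 320
tone = [math.cos(2 * math.pi * 125.0 * i / RATE_HZ) for i in range(WINDOW)]
spectrum = scan(tone)
print(f"scan period {SCAN_PERIOD_S * 1e3:.1f} ms, dwell {DWELL_S * 1e6:.0f} us")
print("strongest band:", CENTERS_HZ[spectrum.index(max(spectrum))], "Hz")
```

The window is deliberately longer than the 2 ms scan period, so successive software scans overlap; a 30 Hz resolution needs roughly 1/30 s of signal regardless of how often the bank is read out.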
The output of frequency analyzer 12 is supplied to a frequency spectrum peak detector 15 and a vocal pitch detector 16. Since the output of frequency analyzer 12 is a continuous function of logarithmic amplitude versus frequency for each scan, the frequency of the fundamental sinusoidal component can be measured by detecting the lowest frequency in the spectrum scan at which the energy reaches a maximum, as indicated in FIG. 2 by the peaks occurring at the fundamental pitch frequency. Frequency spectrum peak detector 15 detects high amplitude peaks by combining detection of zero amplitude slope with detection of a previously positive amplitude slope, so as to distinguish maxima from minima, and furnishes a gating signal to vocal pitch detector 16 upon detection of the lowest frequency in each spectrum scan at which such a maximum occurs.
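In sampled form, the rule in the preceding paragraph (a maximum is a zero slope that follows a positive slope) reduces to finding the first point where a rising run stops rising. The Python sketch below assumes each scan is available as a list of log-amplitude values ordered from low to high frequency. The function name and the `min_rise` parameter, which stands in loosely for the hysteresis threshold described for FIG. 3, are hypothetical; the separate "high amplitude" test is left to the vocal pitch detector.

```python
def lowest_peak_index(scan, min_rise=0.0):
    """Index of the lowest-frequency local maximum in one scan, or None.

    A maximum is declared where the slope has been positive (by more than
    min_rise) and then falls to zero or below. A minimum (negative slope
    followed by a positive one) is never reported.
    """
    rising = False
    for i in range(1, len(scan)):
        slope = scan[i] - scan[i - 1]
        if slope > min_rise:
            rising = True
        elif slope <= 0 and rising:
            return i - 1  # last sample before the slope stopped being positive
    return None

scan = [-2.0, -1.0, 0.0, -0.5, -1.5, -0.3, 0.2, -1.0]  # hypothetical log amplitudes
print(lowest_peak_index(scan))  # prints 2: the lower of the two peaks
```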
Pitch detection is accomplished in vocal pitch detector 16 by detecting, once voicing is established, the vocal pitch fundamental frequency. This frequency is indicated by the position of the lowest frequency high amplitude peak in the spectrum produced by analyzer 12. During the initial buildup of energy at the onset of voicing, the position of the lowest frequency high amplitude peak shifts erratically, and often appreciably, from one scan to the next. As voicing is established, however, the position of the lowest frequency high amplitude peak becomes relatively stable, and changes position only at rates resulting from changes in articulation. Thus, vocal pitch can be detected by measuring the position of the first high amplitude spectrum peak on successive scans; when the frequency change from scan to scan becomes sufficiently small, the frequency at which this spectrum peak occurs is the vocal pitch fundamental frequency.
The vocal pitch detector is designed to hold the last fundamental pitch frequency value automatically in the event the apparent pitch output is violently disturbed by bursts of interference or oddities of speech, causing it to change by more than a specified amount from the previous scan. A 2 millisecond interval between frequency scans is sufficiently short to allow smooth tracking of true pitch changes, while the hold provision ensures continuity of output during the period of voicing. During the hold condition, the vocal pitch detector searches for a new value of pitch; if a stable component is located and voicing continues to be indicated, the system effects a smooth transition from the held value of pitch to the new value, even though the vocal pitch detector itself makes the transition abruptly.
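The scan-to-scan logic of the two preceding paragraphs can be sketched as a small state machine. The Python below is an illustration under stated assumptions: the input is one candidate pitch per 2 ms scan (None when no peak was found), each candidate is compared with the previous scan's candidate, and `max_step_hz`, a hypothetical tolerance standing in for "sufficiently small" and "a specified amount", is set to 10 Hz purely for illustration.

```python
def track_pitch(candidates_hz, max_step_hz=10.0):
    """One output per scan: the new pitch when two scans agree, else the held value."""
    outputs = []
    held = None       # last accepted pitch (None until tracking first locks)
    previous = None   # candidate from the preceding scan
    for cand in candidates_hz:
        stable = (cand is not None and previous is not None
                  and abs(cand - previous) <= max_step_hz)
        if stable:
            held = cand   # consecutive scans agree: accept the new value
        # otherwise `held` is left unchanged (hold during a disturbance)
        outputs.append(held)
        previous = cand
    return outputs

# Onset jitter, stable pitch, one disturbed scan, then a new stable value.
print(track_pitch([180.0, 122.0, 120.0, 121.0, 300.0, 119.0, 118.0]))
# prints [None, None, 120.0, 121.0, 121.0, 121.0, 118.0]
```

The 300 Hz outlier is rejected and 121 Hz is held until two scans again agree, which mirrors the hold-and-search behavior described above.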
Output signals from frequency spectrum peak detector 15, in addition to being supplied to vocal pitch detector 16, are also furnished to a second input of amplitude ratio detector 13. Ratio detector 13 thus responds to the ratio of the logarithm of the maximum peak amplitude in the rectified speech frequency spectrum to the rectified speech amplitude averaged over a period which typically may be 3 to 6 times the scan time of frequency analyzer 12, or about 6 to 12 milliseconds. This averaging period is sufficiently long to avoid undue response to spurious transients in the signal. The periodic amplitude modulation of voiced speech ranges from at least 10 percent to nearly 100 percent of the signal, and appears in the rectified speech spectrum as a strong component. Unvoiced speech or noise exhibits no such strong frequency component. Thus, a considerable margin exists in the values of this ratio between voiced and unvoiced speech, with the higher ratios indicating voiced speech.
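The averaging window and ratio test can be sketched as follows. This Python is a hypothetical illustration: it assumes an 8 kHz sample rate, takes the spectral peak amplitude as already measured by the analyzer, and treats the "ratio of the logarithm" as a difference of logarithms (an amplitude ratio expressed in log form), which is only one possible reading of the text. The threshold of -0.5 (peak above roughly 0.32 times the mean) is invented for illustration.

```python
import math

SCAN_PERIOD_S = 0.002   # 500 scans per second (from the text)
RATE_HZ = 8000.0        # assumed sample rate

def averaging_samples(n_scans, rate_hz=RATE_HZ):
    """Number of samples in an averaging window spanning n_scans analyzer scans."""
    return round(n_scans * SCAN_PERIOD_S * rate_hz)

def log_amplitude_ratio(peak_amplitude, rectified, floor=1e-9):
    """log10(spectral peak amplitude / short-term mean of the rectified signal)."""
    mean = sum(rectified) / len(rectified)
    return math.log10(max(peak_amplitude, floor)) - math.log10(max(mean, floor))

def looks_voiced(peak_amplitude, rectified, threshold=-0.5):
    """Hypothetical decision: higher ratios indicate voiced speech."""
    return log_amplitude_ratio(peak_amplitude, rectified) > threshold

# Usage: a 5-scan (10 ms) window of a rectified signal with 80 percent modulation.
window = averaging_samples(5)
rectified = [1.0 + 0.8 * math.cos(2 * math.pi * 100.0 * n / RATE_HZ) for n in range(window)]
print(looks_voiced(0.8, rectified), looks_voiced(0.05, rectified))
```

A strong modulation component (0.8 against a mean of 1.0) is classed as voiced, while a weak 0.05 component, as might come from noise, is not.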
The output of amplitude ratio detector 13 is supplied to a first input of voicing decision logic circuitry 17, furnishing a parameter upon which a decision as to whether or not voicing is present may be based. A second input to voicing decision logic 17 is supplied by absolute level threshold detector 14, which determines whether there is sufficient energy in the output of full wave rectifier 11 to validly assume that speech is actually occurring; unless there is sufficient energy in the speech input signal, there is no reason to operate the system. When all conditions for voicing are met at an acceptable confidence level, an output signal indicating the presence of voicing is produced.
A third input to voicing decision logic 17 is energized by output signals from vocal pitch detector 16 which indicate that stable tracking of the fundamental pitch frequency is occurring. Accordingly, an additional output signal from voicing decision logic circuitry 17 is supplied to a smoothing circuit 18, enabling the smoothing circuit to produce a fundamental pitch frequency output signal in response to signals from vocal pitch detector 16 during periods beginning when both voicing is present and stable tracking of the fundamental pitch frequency is taking place, and ending only when voicing disappears.
The output signal supplied by vocal pitch detector 16 to smoothing circuit 18 comprises a voltage with stepwise amplitude changes from one scan to the next. This voltage must represent a value corresponding to a stable fundamental pitch frequency, as determined by voicing decision logic circuitry 17, before the voicing decision logic circuitry can supply an output signal to smoothing circuit 18. When the fundamental pitch frequency is stable and voicing is present, smoothing circuit 18 minimizes the effect of voltage steps in the output signal of pitch detector 16. This minimization continues until voicing next disappears. Since these steps are relatively closely spaced, such as at 2 millisecond intervals, it is preferable to use a short smoothing time constant of approximately twice this spacing, or 4 to 5 milliseconds. Thus, relatively little delay is added by the smoothing operation. When no voicing is detected, the system output signal is held constant at the value it had at the instant voicing last disappeared; when stable voicing conditions are detected, the system output signal comprises the fundamental pitch frequency output signal.
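A discrete-time sketch of this smoothing behavior follows, assuming one pitch value per 2 ms scan and a one-pole low-pass filter with a time constant of 4.5 ms (inside the 4 to 5 ms range given above). The function name and parameters are hypothetical. The unvoiced case is modeled as an exact hold; the smoothing circuit of FIG. 7 instead switches in a large time constant so that the output changes only very slowly, which this sketch idealizes.

```python
import math

def smooth_pitch(steps_hz, voiced, dt_s=0.002, tau_s=0.0045, initial_hz=0.0):
    """One-pole RC-style smoothing of a stepwise pitch value, held when unvoiced."""
    alpha = 1.0 - math.exp(-dt_s / tau_s)   # fraction of each step absorbed per scan
    y = initial_hz
    out = []
    for x, v in zip(steps_hz, voiced):
        if v and x is not None:
            y += alpha * (x - y)   # voiced: move part of the way toward the new step
        # unvoiced: y is left unchanged, so the output holds its last value
        out.append(y)
    return out

# Ten voiced scans at 120 Hz, then three unvoiced scans (input ignored).
out = smooth_pitch([120.0] * 10 + [200.0] * 3, [True] * 10 + [False] * 3)
print([round(v, 1) for v in out])
```

With these assumed values alpha is about 0.36 per scan, so after ten voiced scans (20 ms, about 4.4 time constants) the output is within roughly 1.2 percent of a steady input, and it then stays fixed while voicing is absent.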
To briefly recapitulate, the system of FIG. 1 produces an output voltage corresponding to the fundamental pitch frequency signal during voicing. The speech input signal is furnished to full wave rectifier 11 through high pass filter 10, which eliminates unwanted low frequency noise. The full wave rectifier reinserts the fundamental pitch frequency in the event this frequency has been lost either in transmission or by attenuation in high pass filter 10. The output of full wave rectifier 11 is broken into its constituent frequencies by frequency analyzer 12, which repetitively supplies scanned frequency spectrums to vocal pitch detector 16 and frequency spectrum peak detector 15. Frequency spectrum peak detector 15 supplies a gating signal to vocal pitch detector 16. Thus, assuming voicing is present, the first peak in each scan produced by frequency analyzer 12 and detected by frequency spectrum peak detector 15 is supplied by vocal pitch detector 16 to smoothing circuit 18. Moreover, when voicing is present, smoothing circuit 18 adds interpolation between the individual peaks furnished by vocal pitch detector 16 so as to provide a smoothly varying output signal of amplitude corresponding to the frequency of the fundamental pitch frequency signal.
Whether or not voicing is present is determined by voicing decision logic circuitry 17, which requires both the presence of sufficient speech signal amplitude, as determined by absolute level threshold detector 14, and indication of a sufficiently high amplitude lowest frequency spectrum peak relative to the short term average of the rectified speech produced by full wave rectifier 11, as determined by amplitude ratio detector 13. Additionally, when an indication of stable tracking of the selected fundamental pitch frequency, as determined by vocal pitch detector 16, is also provided, even momentarily, voicing decision logic circuitry 17 produces a signal which allows the amplitude of the fundamental pitch frequency, as sensed by vocal pitch detector 16, to be supplied through smoothing circuit 18 to the system output for as long as voicing continues.
FIG. 3 is a block diagram of frequency spectrum peak detector 15 of FIG. 1. The peak detector comprises a differentiator 21 receiving input signals from frequency analyzer 12 of FIG. 1. Output signals from differentiator 21 are supplied to a hysteresis flip-flop 22. This hysteresis flip-flop circuit, which may comprise a high gain differential amplifier, is actuated in the positive direction only when the positive input signal exceeds a threshold; however, this circuit returns to its initial condition whenever the input signal returns to zero. A circuit of this type is shown and described in D. E. Wood application Ser. No. 636,324 filed concurrently herewith, now U.S. Pat. No. 3,522,545, and assigned to the instant assignee. Output signals from hysteresis flip-flop circuit 22 are furnished to a second differentiator 23 which, in turn, provides an output signal to the input of an inverter circuit 24. Output signals from inverter circuit 24 are applied through a half wave rectifier 25 to the input of vocal pitch detector 16 shown in FIG. 1.
Operation of the frequency spectrum peak detector of FIG. 3 may best be understood with the aid of the waveforms of FIGS. 4A-4E, each of which represents voltage amplitude along a common time abscissa. FIG. 4A represents the output signal of frequency analyzer 12 of FIG. 1 as supplied to the input of differentiator 21. This differentiator produces the output waveform shown in FIG. 4B, wherein zero output voltage corresponds to the zero slope at the voltage maxima and minima in the output waveform of frequency analyzer 12. Although the waveform of FIG. 4A is plotted both against time and against frequency, the waveforms of FIGS. 4B-4E are plotted solely against time. For this reason, the waveform of FIG. 4A also illustrates two major peaks corresponding to the fundamental pitch frequency and the first harmonic thereof.
The output of differentiator circuit 21 drives flip-flop circuit 22 into an ON condition whenever the amplitude of positive output voltage produced by the differentiator circuit exceeds a predetermined threshold level, as shown in FIG. 4B. When the output voltage produced by differentiator 21 returns to zero, flip-flop circuit 22 returns to its OFF condition. The output pulses generated by flip-flop circuit 22, which are illustrated in FIG. 4C, are differentiated by differentiator 23, so that upon initiation of an output pulse by flip-flop circuit 22 a positive spike is generated by differentiator 23; similarly, upon completion of each output pulse by flip-flop circuit 22, a negative spike is generated. The positive and negative spikes produced by differentiator 23, which are illustrated in FIG. 4D, are inverted by inverter circuit 24 and thereafter rectified by half wave rectifier 25, so that the output signal of peak detector 15 corresponds to positive voltage spikes generated each time hysteresis flip-flop circuit 22 switches to its OFF condition. The peak detector output signal is illustrated by the waveform of FIG. 4E.
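The pulse chain of FIG. 3 can be modeled sample by sample. In the Python sketch below, which is illustrative rather than a description of the actual circuit, backward differences stand in for differentiators 21 and 23, a two-state variable stands in for hysteresis flip-flop 22, and `max(-edge, 0)` stands in for inverter 24 followed by half wave rectifier 25. The threshold value and the test data are assumptions. Because the slope is a backward difference, each spike is reported at the first sample after the maximum.

```python
def peak_detector_pulses(analyzer_output, threshold):
    """Discrete model of FIG. 3; returns the sample indices of output spikes."""
    pulses = []
    state = 0         # hysteresis flip-flop 22: 0 = OFF, 1 = ON
    prev_state = 0
    for n in range(1, len(analyzer_output)):
        slope = analyzer_output[n] - analyzer_output[n - 1]   # differentiator 21
        if slope > threshold:
            state = 1        # switches ON only above the threshold
        elif slope <= 0:
            state = 0        # returns OFF when the slope reaches zero
        # for 0 < slope <= threshold the flip-flop keeps its state (hysteresis)
        edge = state - prev_state        # differentiator 23: +1 rising, -1 falling
        spike = max(-edge, 0)            # inverter 24, then half wave rectifier 25
        if spike:
            pulses.append(n)             # one spike per ON-to-OFF transition
        prev_state = state
    return pulses

# Two peaks (at indices 3 and 8); a spike follows each one.
print(peak_detector_pulses([0, 2, 5, 6, 4, 1, 0.5, 2, 3.5, 3, 1], threshold=0.5))
# prints [4, 9]
```

Minima produce no spikes because the flip-flop only turns OFF after having been driven ON by a positive slope, and small ripples below the threshold never turn it ON.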
FIG. 5 is a block diagram of vocal pitch detector 16, shown in FIG. 1. Input signals from frequency analyzer 12 of FIG. 1, which comprise the repetitive frequency spectrums produced thereby, are applied to the input of an amplitude threshold detector circuit 40 which, in turn, passes signals above a predetermined threshold level to the signal input of a gate circuit 41. The control input of gate 41 is energized by signals from frequency spectrum peak detector 15 of FIG. 1, so that the gate is opened at the instant the frequency spectrum peak has been detected. The output voltage of the frequency analyzer at this instant, provided its amplitude exceeds the threshold set by threshold detector 40, is furnished to the input of a bistable multivibrator 42. Bistable multivibrator 42 is reset by a signal from the frequency analyzer each time the frequency analyzer is internally actuated to scan a new spectrum of frequencies.
Rectangular output pulses produced by bistable multivibrator 42 are supplied to a differentiator circuit 43 which, through a half wave rectifier 39, provides narrow, sharp unipolar pulses to the inputs of a sample and hold circuit 44 and a narrow track gate circuit 45. These narrow, sharp pulses function as a sampling signal for sample and hold circuit 44 and as an input signal for narrow track gate circuit 45.
The reset signal for bistable multivibrator 42, which originates in frequency analyzer 12 and is preferably the scan sync signal as produced by the Nth cycle amplifier illustrated in the aforementioned U.S. Pat. No. 3,243,703, is also supplied to the control or sample input of a second sample and hold circuit 48, as well as to the input of a sawtooth generator circuit 46. This sync signal initiates a new sawtooth voltage waveform each time it is supplied to the input of generator 46. The sawtooth voltage wave is supplied to the signal input of sample and hold circuit 44, as well as to one input of an amplitude comparator circuit 49 which produces a narrow output pulse whenever the amplitude levels of applied input voltages are equal. The sawtooth voltage wave is also furnished to the signal input of a third sample and hold circuit 47.
Output signals from sample and hold circuit 44 are provided to the signal input of sample and hold circuit 48. Output signals from sample and hold circuit 48 are furnished to the second input to amplitude comparator circuit 49 which, in turn, supplies output pulses to the control input of narrow track gate circuit 45. Output of narrow track gate circuit 45 is coupled to the control or sampling input of sample and hold circuit 47, as well as to voicing decision logic circuitry 17 shown in FIG. 1. Output signals from sample and hold circuit 47 are supplied to the input of smoothing circuit 18 of FIG. 1.
Narrow track gate circuit 45 functions as a coincidence detector, and provides an output signal corresponding to the pulses supplied thereto from differentiator circuit 43 through half wave rectifier 39 as long as these pulses substantially coincide in time with the pulses from amplitude comparator circuit 49. Although gate 45 must be sufficiently wide in time to follow changes in vocal pitch frequency which produce the characteristic intonation of speech, these changes are relatively slow in comparison with the frequency analyzer scan interval; hence, gate 45 generally prevents transients not connected with normal speech from appearing in its output signal. In sample and hold circuit 47, the output pulses from narrow track gate circuit 45 sample the sawtooth waveforms supplied from sawtooth generator 46. Thus, sample and hold circuit 47 provides a signal of amplitude corresponding to the amplitude of voltage produced by sawtooth generator 46 at the instant a pulse is supplied by narrow track gate circuit 45.
The vocal pitch detector functions by accepting each scan from the frequency analyzer and passing the portion thereof of amplitude greater than a predetermined level through amplitude threshold detector 40 to gate 41. If, at the instant a peak is produced by peak detector 15, the amplitude of the spectrum exceeds the level set by amplitude threshold detector 40, thereby assuring that the detected peak is not merely a low level peak due to noise or some form of disturbance, gate 41 passes this instantaneous spectrum amplitude to bistable multivibrator 42 which, in turn, supplies a pulse to differentiator 43. The differentiator thus supplies output pulses in the form of narrow spikes while voicing occurs. A single spike occurs during each spectrum scan at the instant the first spectrum peak is detected. These spikes are rendered unipolar by rectifier 39. Because bistable multivibrator 42 is reset by the frequency analyzer only once per scan, only one spike is produced per scan.
Sawtooth generator 46, being synchronized with the frequency analyzer, produces repetitive sawtooth waves of duration equal to the scan plus the interval between two successive scans. Upon occurrence of each narrow spike from rectifier 39, the sawtooth voltage amplitude at that instant is supplied by sample and hold circuit 44 to sample and hold circuit 48 and, upon occurrence of the next reset signal from frequency analyzer 12, is supplied to amplitude comparator 49. Due to the short delay introduced by sample and hold circuit 48, which is comparable to but a fraction of the duration of each spectrum, comparator 49 receives a signal from sample and hold circuit 48 which represents the frequency of the selected peak occurring in the preceding scan. Hence, in amplitude comparator 49, the output of sample and hold circuit 48, representing the sawtooth voltage amplitude at the instant of occurrence of the previous spike from rectifier 39, is compared with the presently occurring sawtooth and, when coincidence is detected, amplitude comparator 49 supplies a gate pulse to the control input of narrow track gate 45. Thus, the pulse produced by amplitude comparator 49 occurs during the … no output pulse is produced by narrow track gate circuit 45.
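The coincidence test performed by sample and hold circuit 48, amplitude comparator 49 and narrow track gate 45 amounts to asking whether the first peak of the present scan lies at nearly the same position as that of the preceding scan. A minimal Python sketch follows, assuming each scan has already been reduced to the position of its first qualifying peak (or `None` when no spike occurred); the `tolerance` parameter, measured in analyzer positions, is a hypothetical stand-in for the width of the narrow track gate.

```python
from typing import Iterable, List, Optional


def narrow_track_gate(peaks: Iterable[Optional[int]], tolerance: int = 1) -> List[bool]:
    """Return, per scan, whether the spike would pass the narrow track gate.

    Hypothetical model: `stored` plays the part of sample and hold circuit 48,
    holding the position of the most recent spike, and the comparison plays
    the part of amplitude comparator 49.
    """
    passed = []
    stored: Optional[int] = None
    for current in peaks:
        coincident = (
            current is not None
            and stored is not None
            and abs(current - stored) <= tolerance
        )
        passed.append(coincident)
        if current is not None:
            stored = current  # a scan without a spike leaves the held value unchanged
    return passed


print(narrow_track_gate([10, 10, 11, 20, None, 20]))
# prints [False, True, True, False, False, True]
```

The first scan can never pass, since there is nothing yet to compare against, and a jump from position 11 to 20 breaks the track until the new position repeats.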
Thus, output signal information from circuit 45, by indicating stable tracking of the fundamental pitch frequency signal, constitutes a valid indicator of the presence of voicing, and hence is supplied to one input of voicing decision logic circuitry 17 as well as to the sampling or control input of sample and hold circuit 47.
The output of sample and hold circuit 47 constitutes a signal which may vary only slowly since, if the lowest frequency spectrum peaks occur at approximately regular times in successive scans, output pulses from narrow track gate circuit 45 occur at approximately regular times and hence sample the sawtooth voltage from generator 46 at approximately regular times. The sampled amplitude is thus approximately unchanged from sample to sample. In the event stable tracking ceases, as manifested by anticoincidence of pulses supplied to narrow track gate circuit 45, sample and hold circuit 47 continues to produce a steady signal of amplitude equal to that provided at the time of the most recent output pulse from gate 45.
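The sample-and-hold behaviour of circuit 47 can be sketched as follows. The sketch assumes the sawtooth amplitude maps linearly to frequency, so that a peak position multiplied by a hypothetical `bin_hz` spacing stands in for the sampled sawtooth voltage; the output is updated only on scans where the narrow track gate passed a spike and otherwise keeps its most recent value.

```python
from typing import List, Optional, Sequence


def hold_pitch(
    peaks: Sequence[Optional[int]],
    passed: Sequence[bool],
    bin_hz: float,
    initial: float = 0.0,
) -> List[float]:
    """Hypothetical model of sample and hold circuit 47."""
    held = initial
    output = []
    for peak, ok in zip(peaks, passed):
        if ok and peak is not None:
            held = peak * bin_hz  # sample: a gate pulse from circuit 45 arrived
        output.append(held)  # hold: otherwise repeat the most recent sample
    return output


print(hold_pitch([10, 10, 11, 20, None, 20],
                 [False, True, True, False, False, True], bin_hz=10.0))
# prints [0.0, 100.0, 110.0, 110.0, 110.0, 200.0]
```

During the two scans in which tracking is lost, the output stays at 110.0, which is the steady signal the text describes.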
Voicing decision logic circuitry 17, shown in FIG. 6, comprises a first two-input AND gate 50 responsive to the continuous output signals both of absolute level threshold detector 14 and of amplitude ratio detector 13 through an amplitude threshold detector 51. When both inputs of AND gate 50 are fulfilled, the AND gate provides a voicing indication output signal. In addition, the voicing indication output signal is supplied to one input of a second two-input AND gate 52, the second input of which is fulfilled by a continuous output signal supplied by an integrator circuit 54 in response to spikes produced by narrow track gate circuit 45 of vocal pitch detector 16. Integrator 54, which may comprise the well-known RC circuit type, is preferably shunted by a diode 55 in order to allow AND gate 52 to respond immediately upon initiation of a train of output pulses from narrow track gate 45. Output signals of AND gate 52, which are in the form of continuous signals, actuate a flip-flop 53 to its set condition. Output signals from flip-flop 53, when in the set condition, are supplied directly to smoothing circuit 18. Flip-flop 53 is reset through a NOT circuit 56 from the output of AND gate 50.
Voicing decision logic circuitry 17 produces a voicing indication output signal provided AND gate 50 receives signals both from absolute level threshold detector circuit 14, which requires that the output signal from full wave rectifier 11 be of sufficient amplitude to make it likely that a voicing signal is being supplied to the system, and from amplitude ratio detector 13, which requires that the ratio of peak amplitude in each spectrum to the average amplitude of that spectrum exceed the level set by amplitude threshold detector 51, so as to make it likely that the output signal from full wave rectifier 11 is a voice signal and not merely large amplitude noise.
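The condition evaluated by AND gate 50 can be written as a single predicate. In the illustrative sketch below, `level` stands for the output of full wave rectifier 11 as seen by absolute level threshold detector 14, the peak-to-average ratio of one spectrum stands for the output of amplitude ratio detector 13, and both threshold values are hypothetical parameters.

```python
from typing import Sequence


def voicing_indication(
    level: float,
    spectrum: Sequence[float],
    level_threshold: float,
    ratio_threshold: float,
) -> bool:
    """Hypothetical model of AND gate 50 fed by detectors 14, 13 and 51."""
    if not spectrum:
        return False
    average = sum(spectrum) / len(spectrum)
    if average <= 0.0:
        return False  # a silent spectrum cannot indicate voicing
    ratio = max(spectrum) / average  # amplitude ratio detector 13
    loud_enough = level > level_threshold  # absolute level threshold detector 14
    peaky_enough = ratio > ratio_threshold  # amplitude threshold detector 51
    return loud_enough and peaky_enough
```

A strongly peaked spectrum such as `[1, 1, 8, 1, 1]` (peak-to-average ratio of about 3.3) passes a ratio threshold of 2, while a flat spectrum at the same level does not; that is how the ratio test rejects large amplitude noise that the level test alone would accept.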
AND gate 52 combines recognition of a voicing indication output signal, as produced by AND gate 50, with a stable tracking indication supplied from vocal pitch detector 16 through diode-shunted integrator 54 in order to provide a large time constant for the smoothing circuit. Thus, the logic supplied by AND gate 52 indicates that a peak at the same frequency in each spectrum is being detected, so that the fundamental pitch frequency corresponds to the frequency at which this peak is consistently detected. As long as AND gate 52 provides an output signal, an output is provided to smoothing circuit 18 from flip-flop circuit 53. Upon completion of the output signal from AND gate 52, however, smoothing circuit 18 continues to receive a signal from flip-flop 53 for as long as AND gate 50 continues to produce an output voicing indication signal, so as to maintain the large time constant in smoothing circuit 18 for as long as voicing continues, even if stable tracking of the fundamental pitch frequency is no longer indicated by virtue of absence of pulses from narrow track gate 45 of the vocal pitch detector. However, in the event the voicing indication is lost, NOT circuit 56 resets flip-flop 53. The flip-flop thus deenergizes smoothing circuit 18, removing the large time constant thereof so as to allow the smoothing circuit to quickly change output voltage values while the system seeks a new spectrum peak. During the time a new peak is being sought, however, sample and hold circuit 47 maintains the fundamental pitch frequency output signal constant. Once stable tracking and voicing have both been reestablished, the smoothing circuit again receives its long time constant almost instantaneously, due to passage of the first output pulse from narrow track gate circuit 45 through diode 55 to the set input of flip-flop circuit 53.
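Per scan, the set and reset behaviour of flip-flop 53 reduces to a small state machine. The Python sketch below simplifies under stated assumptions: integrator 54 and diode 55 are modelled as passing each tracking pulse straight through (the immediate-response case the diode is meant to provide), and the inputs are hypothetical per-scan booleans for the voicing indication of AND gate 50 and for a pulse from narrow track gate 45.

```python
from typing import List, Sequence


def long_time_constant(voiced: Sequence[bool], tracking: Sequence[bool]) -> List[bool]:
    """Hypothetical per-scan model of AND gate 52, NOT circuit 56 and flip-flop 53.

    Returns True for scans in which smoothing circuit 18 would keep its large
    time constant.
    """
    state = False  # flip-flop 53 starts reset
    states = []
    for v, t in zip(voiced, tracking):
        if not v:
            state = False  # NOT circuit 56: loss of voicing resets the flip-flop
        elif t:
            state = True  # AND gate 52: voicing plus stable tracking sets it
        # voicing without tracking: the flip-flop simply keeps its state
        states.append(state)
    return states


print(long_time_constant([True, True, True, True, False, True, True],
                         [False, True, False, False, False, False, True]))
# prints [False, True, True, True, False, False, True]
```

Scans 2 and 3 show the large time constant persisting through lost tracking while voicing continues; scan 4 shows the reset when voicing is lost, and scan 6 shows it returning on the first tracking pulse.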
FIG. 7 shows the circuitry of smoothing circuit 18 of FIG. 1. This circuitry comprises a series-connected resistance 60 and capacitance 61 which together function as an integrator with a time constant preferably of about two spectrum periods. An INHIBIT gate 62 is connected in parallel with resistance 60 and is controlled by output signals from flip-flop 53 in voicing decision logic circuitry 17. Thus, in the absence of output signals from AND gate 52, flip-flop 53 initially produces no output signal, so that gate 62 forms a short circuit across resistance 60. Input signals which may be received from sample and hold circuit 47 of vocal pitch detector 16 are thus passed directly through gate 62 to the output of smoothing circuit 18. Under these conditions, the fundamental pitch frequency output signal as produced by smoothing circuit 18 corresponds to what might be described as a stepwise or boxcar signal; that is, the signal is substantially a replica of the output signal produced by sample and hold circuit 47 of the vocal pitch detector. On the other hand, when an output signal is produced by AND gate 52 of voicing decision logic circuitry 17, a signal is provided at the output of flip-flop 53 which inhibits gate 62; that is, the path through the gate is open-circuited. The effect of resistance 60 is thus inserted into smoothing circuit 18, establishing the large time constant therein. The fundamental pitch frequency output signal therefore changes very slowly from this time on, until the circuit of gate 62 is once more closed. By removing the large time constant during intervals in which voicing ceases, the output signal produced by smoothing circuit 18 can change very rapidly, so that once voicing has again been detected by the system, along with stable tracking of the fundamental pitch frequency, the fundamental pitch frequency output voltage very rapidly reaches an amplitude value corresponding to that of the fundamental pitch frequency. This provides the system with a very fast response.
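If sample and hold circuit 47 is treated as supplying a value that stays constant over each spectrum period T, the switched RC network can be modelled exactly once per period. With the time constant tau equal to about two spectrum periods, each step moves the output a fraction 1 - exp(-T/tau) = 1 - exp(-0.5), about 0.39, of the way toward the input, while the bypassed case simply copies the input. The sketch below assumes that piecewise-constant input; the function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence


def smooth(
    held: Sequence[float],
    long_tc: Sequence[bool],
    tau_periods: float = 2.0,
) -> List[float]:
    """Hypothetical per-period model of smoothing circuit 18 (FIG. 7).

    held: output of sample and hold circuit 47, one value per period.
    long_tc: True when flip-flop 53 inhibits gate 62 (resistance 60 in circuit).
    tau_periods: RC time constant in spectrum periods (about two, per the text).
    """
    decay = math.exp(-1.0 / tau_periods)  # fraction of the gap left after one period
    out = []
    y = 0.0
    for x, slow in zip(held, long_tc):
        if slow:
            # Exact RC step response for an input held constant over the period.
            y = x + (y - x) * decay
        else:
            y = x  # gate 62 shorts resistance 60: output follows the input
        out.append(y)
    return out
```

For example, `smooth([100.0, 100.0, 200.0], [False, True, True])` returns 100.0 twice and then about 139.3: in the first period after the large time constant is switched in, the output covers only about 39 percent of the 100-unit jump.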
The foregoing describes a method and apparatus for rapidly and accurately determining, with a high degree of reliability, the fundamental pitch frequency signal of a complex voice frequency signal from which the fundamental pitch frequency signal may have been lost. The system further enables real time determination of the fundamental pitch frequency signal of a complex voice frequency signal without computer assistance, even in the presence of signal disturbances.
While only certain preferred features of the invention have been shown by way of illustration, many modifications and changes will occur to those skilled in the art. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and changes as fall within the true spirit and scope of the invention.
I claim:
1. A method of extracting the fundamental pitch frequency of a complex voice frequency signal comprising the steps of periodically separating said voice frequency signal into a spectrum of its constituent sinusoidal frequencies, repeatedly detecting the frequency of the lowest frequency high amplitude peak in each frequency spectrum of the signal, comparing the frequency of the lowest frequency high amplitude peak in a first spectrum with the frequency of the lowest frequency high amplitude peak in a succeeding spectrum, and equating the fundamental pitch frequency of said complex voice frequency signal to the frequency of the lowest frequency high amplitude peak of the spectrum upon detection of the lowest frequency high amplitude peaks at substantially identical frequencies in the compared spectrums.
2. The method of extracting the fundamental pitch frequency of a complex voice frequency signal of claim 1 including the preliminary step of full-wave rectifying said voice frequency signal prior to separating said voice frequency signal into a spectrum of constituent sinusoidal frequencies.
3. The method of extracting the fundamental pitch frequency signal of a complex voice frequency signal of claim 2 including high pass filtering said voice frequency signal prior to full wave rectifying said signal.
4. A method of extracting the fundamental pitch frequency signal of a complex voice frequency signal comprising the steps of separating the voice frequency signal into a spectrum of its constituent sinusoidal frequencies, repeatedly detecting the lowest frequency high amplitude peak in each frequency spectrum of the signal, storing a manifestation of the time at which said lowest frequency high amplitude peak occurs in a given spectrum, comparing the occurrence of the lowest frequency high amplitude peak of a first spectrum with the occurrence of the lowest frequency high amplitude peak of a succeeding spectrum, and providing an output signal corresponding to an interpolation of said peaks as long as said peaks occur at approximately the same time in each spectrum whereby said output signal represents a reconstructed version of the fundamental pitch frequency signal.
5. The method of extracting the fundamental pitch frequency signal of a complex voice frequency signal of claim 4 including the preliminary step of full-wave rectifying said voice frequency signal prior to separating said voice frequency signal into a spectrum of constituent sinusoidal frequencies.
6. A method of extracting the fundamental pitch frequency signal of a complex voice frequency signal comprising the steps of periodically separating said voice frequency signal into spectra of its constituent sinusoidal frequencies, detecting the lowest frequency high amplitude peak in each of said frequency spectra, storing a manifestation of the time at which said lowest frequency high amplitude peak occurs in a given spectrum, comparing the occurrence of the lowest frequency high amplitude peak of a first spectrum with the occurrence of the lowest frequency high amplitude peak of a succeeding spectrum, detecting the amplitude of said voice frequency signal, and providing an output signal corresponding to an interpolation of said peaks as long as said peaks occur at approximately the same time in each of said spectra and the amplitude of said voice frequency signal exceeds a predetermined level.
7. The method of extracting the fundamental pitch frequency signal of a complex voice frequency signal of claim 6 including continuing the output signal occurring upon last coincidence of said lowest frequency high amplitude peaks of successive spectra upon cessation of a stable frequency for successive lowest frequency peaks.
8. The method of extracting the fundamental pitch frequency signal of a complex voice frequency signal of claim 6 wherein said step of detecting the lowest frequency high amplitude peak in each frequency spectrum of the signal is performed only when said high amplitude peak is above a predetermined level with respect to the average amplitude of said spectrum.
9. The method of extracting the fundamental pitch frequency signal of a complex voice frequency signal of claim 7 wherein said step of detecting the lowest frequency high amplitude peak in each frequency spectrum of the signal is performed only when said high amplitude peak is above a predetermined level with respect to the average amplitude of said spectrum.
10. Apparatus for extracting the fundamental pitch frequency of a complex voice frequency signal comprising: frequency analyzer means repetitively separating the signal into a spectrum of sinusoidal signal constituents; peak detector means responsive to said frequency analyzer means and producing a pulse at the lowest frequency at which a high amplitude peak occurs in each spectrum produced by said analyzer means; means for storing a manifestation of the time at which said lowest frequency high amplitude pulse occurs in a given spectrum; means for comparing the occurrence of the lowest frequency high amplitude pulse of a first spectrum with the occurrence of the lowest frequency high amplitude peak of a succeeding spectrum; and gating means responsive to said comparison means for passing the output of each peak to the output of said apparatus upon occurrence of said pulses at approximately the same time in successive spectra.
11. The apparatus of claim 10 including smoothing circuit means responsive to said gating means for interpolating the output signal between successive ones of said peaks.
12. The apparatus of claim 11 including full-wave rectifier means coupling said voice frequency signal to said frequency analyzer means.
13. The apparatus of claim 10 including full-wave rectifier means coupling said voice frequency signal to said frequency analyzer means.
14. The apparatus of claim 13 including high pass filter means coupling said voice frequency signal to said full-wave rectifier means.
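Claims 1 and 2 together describe a complete procedure, which can be sketched end to end in Python. This is a software illustration under assumptions, not the claimed apparatus. A naive discrete Fourier transform over fixed, non-overlapping frames stands in for the repetitive frequency analyzer. The frame mean is removed before analysis so that the DC term of the rectified signal does not dominate. The relative threshold, tolerance and frame length are hypothetical choices. The high pass filtering of claim 3 is not modelled.

```python
import cmath
import math
from typing import List, Optional, Sequence


def magnitude_spectrum(frame: Sequence[float], max_bin: int) -> List[float]:
    """Naive DFT magnitudes for bins 0..max_bin (stand-in for the analyzer)."""
    n = len(frame)
    mean = sum(frame) / n
    centered = [v - mean for v in frame]  # assumed: DC removed before analysis
    return [
        abs(sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(centered)))
        for k in range(max_bin + 1)
    ]


def lowest_peak_bin(spectrum: Sequence[float], rel_threshold: float) -> Optional[int]:
    """Lowest-frequency local peak exceeding rel_threshold times the largest bin."""
    threshold = rel_threshold * max(spectrum)
    for i in range(1, len(spectrum) - 1):
        if spectrum[i - 1] < spectrum[i] >= spectrum[i + 1] and spectrum[i] > threshold:
            return i
    return None


def track_pitch(
    signal: Sequence[float],
    fs: float,
    frame_len: int,
    max_bin: int = 50,
    rel_threshold: float = 0.2,
    rectify: bool = True,
    tolerance: int = 1,
) -> List[Optional[float]]:
    """Per-frame pitch estimate in Hz, or None when successive peaks disagree."""
    bin_hz = fs / frame_len
    estimates: List[Optional[float]] = []
    previous: Optional[int] = None
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = list(signal[start:start + frame_len])
        if rectify:
            frame = [abs(v) for v in frame]  # full-wave rectification (claim 2)
        peak = lowest_peak_bin(magnitude_spectrum(frame, max_bin), rel_threshold)
        if peak is not None and previous is not None and abs(peak - previous) <= tolerance:
            estimates.append(peak * bin_hz)  # peaks agree: equate f0 to the peak
        else:
            estimates.append(None)  # no agreement with the preceding spectrum
        if peak is not None:
            previous = peak
    return estimates


# Test tone with harmonics at 200 Hz and 300 Hz and the 100 Hz fundamental absent.
fs, frame_len = 8000, 400
tone = [math.cos(2 * math.pi * 200 * n / fs) + math.cos(2 * math.pi * 300 * n / fs)
        for n in range(4 * frame_len)]
print(track_pitch(tone, fs, frame_len))                 # prints [None, 100.0, 100.0, 100.0]
print(track_pitch(tone, fs, frame_len, rectify=False))  # prints [None, 200.0, 200.0, 200.0]
```

For this particular tone, rectification creates a strong component at the 100 Hz difference frequency, so the rectified run reports 100 Hz from the second frame on. Without rectification, the lowest peak is the 200 Hz harmonic, which illustrates why the preliminary rectifying step matters when the fundamental has been lost.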
US636390A 1967-05-05 1967-05-05 Fundamental pitch frequency signal extraction system for complex signals Expired - Lifetime US3549806A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US63639067A 1967-05-05 1967-05-05

Publications (1)

Publication Number Publication Date
US3549806A true US3549806A (en) 1970-12-22

Family

ID=24551686

Family Applications (1)

Application Number Title Priority Date Filing Date
US636390A Expired - Lifetime US3549806A (en) 1967-05-05 1967-05-05 Fundamental pitch frequency signal extraction system for complex signals

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3852535A (en) * 1972-11-16 1974-12-03 Zurcher Jean Frederic Pitch detection processor
US3855418A (en) * 1972-12-01 1974-12-17 F Fuller Method and apparatus for phonation analysis leading to valid truth/lie decisions by vibratto component assessment
DE2649259A1 (en) * 1976-10-29 1978-05-03 Tekade Felten & Guilleaume Automatic telephone speech identification - uses spectral analysis to identify key features of voice characteristics
EP0092612A1 (en) * 1982-04-27 1983-11-02 Koninklijke Philips Electronics N.V. Speech analysis system
US20050097075A1 (en) * 2000-07-06 2005-05-05 Microsoft Corporation System and methods for providing automatic classification of media entities according to consonance properties
US7756874B2 (en) * 2000-07-06 2010-07-13 Microsoft Corporation System and methods for providing automatic classification of media entities according to consonance properties
US8082279B2 (en) 2001-08-20 2011-12-20 Microsoft Corporation System and methods for providing adaptive media property classification
EP1612773A2 (en) 2004-06-30 2006-01-04 Sony Corporation Sound signal processing apparatus and degree of speech computation method
EP1612773A3 (en) * 2004-06-30 2009-08-19 Sony Corporation Sound signal processing apparatus and degree of speech computation method
US10441142B2 (en) 2013-07-22 2019-10-15 Digital Endoscopy Gmbh Sealing component for an endoscope connector
US10076233B2 (en) 2013-10-30 2018-09-18 Digital Endoscopy Gmbh Device for transmitting a deflection movement, endoscope bending control unit, and endoscope
US10080483B2 (en) 2013-10-30 2018-09-25 Digital Endoscopy Gmbh Secondary endoscope mountable to a mother endoscope and a combination of a mother endoscope and a secondary endoscope
US10092171B2 (en) 2013-10-30 2018-10-09 Digital Endoscopy Gmbh Deflection movement transmission device, endoscope bending controller and endoscope
US10939803B2 (en) 2013-12-02 2021-03-09 Digital Endoscopy Gmbh Endoscope head and endoscope
US9955856B2 (en) 2013-12-19 2018-05-01 Digital Endoscopy Gmbh Device and a method for manufacturing an elongated hollow profile element, an elongated hollow profile element, and a bending section for an endoscope
US10874291B2 (en) 2014-01-23 2020-12-29 Digital Endoscopy Gmbh Fluid block for an endoscope control part and endoscope
CN106413522A (en) * 2014-01-24 2017-02-15 数字内镜检查股份有限公司 Tracking the fundamental frequency of a voice signal in real time
WO2015110525A1 (en) * 2014-01-24 2015-07-30 Digital Endoscopy Gmbh Tracking the fundamental frequency of a voice signal in real time
US10441152B2 (en) 2014-01-24 2019-10-15 Digital Endoscopy Gmbh Tracking the fundamental frequency of a voice signal in real time
US11278187B2 (en) 2015-08-07 2022-03-22 Digital Endoscopy Gmbh Endoscope head
