WO1994018667A1

WO1994018667A1 - Voice recording electronic scheduler

Info

Publication number: WO1994018667A1
Application number: PCT/US1994/001597
Authority: WO
Inventors: Ari B. Naim; Thomas J. O'brien
Original assignee: Naim Ari B; Brien Thomas J O
Priority date: 1993-02-11
Filing date: 1994-02-10
Publication date: 1994-08-18

Abstract

Electronic scheduling devices are widely available, but none use speech as the means of storing messages and controlling functionality. The use of speech, rather than a typewritten message, is a novel approach to this application because it makes the device language independent (not limited to a particular alphabet or keypad), removes the burden of manual message input, and makes the device available to the visually impaired and others incapable of using written language. By employing both voice recognition (43) and synthesis (12), voice commands and information can be interpreted through an interactive process. Integrating speech storage and control technology with an electronic appointment keeping device introduces an important new product that can be offered to individuals as a means of improving their work efficiency and daily productivity.

Description

VOICE RECORDING ELECTRONIC SCHEDULER

FIELD OF THE INVENTION

The present invention relates generally to digital voice recording devices coupled with a programmable daily scheduler that possesses alarm reminder options.

Functional operation of the device is accomplished either by means of an external switch on a keypad or through spoken voice commands. The device can also present information either on a visual display or in audio form (voice synthesis). By introducing an interactive dialogue between device and user, memory and voice recognition demands are appreciably reduced, which is important for compact design and portability.

BACKGROUND OF THE INVENTION

The prior art comprises various types of scheduling systems: hand-held (U.S. Pat. No. 4,117,542), desk-top (U.S. Pat. No. 4,548,510), and those operating through communication lines (U.S. Pat. No. 4,783,800). Of these, the present invention relates most closely to the hand-held device. The device disclosed in U.S. Pat. No. 4,117,542 is intended for storing and retrieving telephone numbers, street addresses, appointments, and agenda items. It can also offer normal calculator functions. The main difficulty with this device is that it is limited to the laborious task of manually typing messages and other information on the keypad.

Entering information by typing on the keypad has many disadvantages. The most obvious is the time required, especially for individuals not adept at typing. Another disadvantage is that messages cannot be entered while one is engaged in a task that does not permit use of one's hands and/or vision, such as driving a motor vehicle. Yet another is that only information which can be expressed by the characters on the keypad can be entered and stored in the device (so that Chinese and music, for example, cannot be entered on a keypad with English characters).

To alleviate these difficulties, various approaches have been taken, such as handwritten character recognition and voice input. Handwritten character recognition is disclosed in U.S. Pat. No. 4,276,541, in a device that has a designated area on its face where the user can write a message. The device embeds algorithms that decipher the writing and store it in data format. This approach, however, does not meet our objective of reducing manual input.

Alternatively, we choose voice as a means for entering information. Voice, or audio, input requires only limited manual contact with the device and eliminates the continual hand-eye coordination demanded by typing or writing. In addition, audio recording permits any language or sound to be recorded. Prior art in voice recording and reproduction for wrist watches appears in U.S. Pat. No. 4,391,530, where voice is coupled to alarm time entries to give context to the alarm times. An additional feature, a confirming external switch, is disclosed in U.S. Pat. No. 4,405,241. However, the size constraints of both designs limit the amount of memory storage and the number and types of functions available for data manipulation.

Though using voice as an input means reduces the extent of manual input, some manual input of information and functional operation is still necessary. By using voice in order to control a device, the simplicity of use is even further advanced and, in some instances, manual demands are completely eliminated. Voice control also alleviates the process of learning to operate the device.

Prior art in voice control of a small time keeping device is found in U.S. Pat. No. 4,635,286, which discloses voice control for a wristwatch. In the proposed approach, several difficulties are evident. First, space constraints of a wristwatch considerably limit memory size and computing power and therefore the vocabulary size and the number of functions that can be controlled. Second, the described voice input means make the voice recognition function a very difficult task because the beginning and ending boundaries of the spoken voice command input are not easy to extract. Third, the reference command words are prerecorded and assumed to be speaker-independent. And fourth, little attempt is made at recording messages (more than single words) and the proposed method of recording alarm times is cumbersome and requires continuous visual and manual interaction with the device.

An additional device and method are proposed in U.S. Pat. No. 5,014,317. In this device, the size is no longer constrained to that of a wristwatch, and the functionality includes message recording coupled with alarm time settings. In one embodiment of this patent, both message and alarm time settings are entered by voice. A word spotting algorithm is used to isolate the time and date information in order to set the alarm. The remainder of the message is not recognized, but rather stored for later retrieval. In the second embodiment of the same patent, name data is recognized from vocally input information. This approach to data entry, in which alarm time and date information or name information is extracted from a sentence, is very difficult and error-prone. At present, recognizing even single (unconnected) words from a limited vocabulary is a difficult problem when implemented in a real-world environment with background noise and variations in any one person's speech patterns. For example, AT&T's DSP16 speaker-dependent series of voice recognition components has a success rate of 95% for unconnected words when trained to recognize only the trainer's spoken words. The problem is compounded when the system is designed for speaker-independent applications, where speech pattern variations from speaker to speaker must be accounted for as well.

In the method proposed in U.S. Pat. No. 5,014,317, both speaker-independent voice recognition and word-spotting are employed, thus requiring the use of speaker-independent recognition algorithms and the added complexity of locating a particular word in a string of spoken words. In addition, no attempt is made at controlling the device using the spoken words.

Accordingly, a primary goal of the present invention is to provide a device in which voice entry is used not only to enter messages and alarm times but also to control functions of the device, such as switching between modes of operation (message entry and alarm time entry). The proposed method of accomplishing this goal removes the need for visual contact while also reducing memory and voice recognition requirements.

SUMMARY OF THE INVENTION

This invention is intended to offer a simple, easy-to-use means of keeping a daily schedule/agenda. Messages and appointments are stored either by recording an audio signal (e.g. voice) or by typing manually on an alphanumeric keypad. The device is fully operable either by voice commands or through a keypad. An electronic scheduler in accordance with the present invention comprises:
(a) a real time clock, comprising means for keeping current time and date; an alarm time register; an alarm date register; means for identifying a match between the current time and a set alarm time stored in the alarm time register; means for identifying a match between the current date and a set alarm date stored in the alarm date register; and means for outputting an alarm time reached signal or prerecorded message or sound when the current time matches the set alarm time and the current date matches the set alarm date;
(b) a random access memory (RAM) for storing units of compressed digital audio data defining a message;
(c) an audio storage/retrieval processor comprising: a microphone for receiving audio signals; amplifier means for amplifying the audio signals; a first low pass filter; an A/D converter for converting the amplified audio signals into digital audio data; data compression means for compressing digital audio data from the A/D converter into compressed digital audio data to be stored in the RAM; means for retrieving digital audio data from the RAM; means for expanding the retrieved data; means for D/A converting the expanded data; means for filtering the D/A converted data; and means for amplifying the filtered D/A converted data to reproduce the original input audio signal during audio playback;
(d) addressing means for assigning addresses to the units of compressed digital audio data, the addresses corresponding to storage locations in the RAM at which the units of data are stored;
(e) keypad means, comprising alphanumeric keys and function keys, for entering text information and alarm time and date settings;
(f) display means for displaying information retrieved by the addressing means;
(g) voice synthesis means for synthesizing audible speech, including means for synthesizing an audible indication that the electronic scheduler is ready to accept a message to be stored, and for synthesizing an audible readout of text entries entered through the keypad means; and
(h) secret option means for entering protected information for limited access.

In one preferred embodiment of the present invention, the real time clock comprises an oscillator, a time counting unit incrementing continuously based on a reference signal provided by the oscillator, a date counting unit, and means for making a periodic comparison between the alarm time stored in the alarm time register and the current time provided by the time counting unit. In addition, an electronic scheduler in accordance with the present invention may comprise means for initiating a programmed sequence that sounds an alarm or plays back a recorded message in response to the alarm time reached signal.
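The periodic comparison between the clock counters and the alarm registers can be modelled in software. The following is a minimal Python sketch under stated assumptions: Python's `datetime` stands in for the oscillator-driven time and date counting units, alarms have one-minute resolution, and the names `RealTimeClock`, `set_alarm`, `tick` and `on_alarm` are hypothetical, not taken from the specification.

```python
from dataclasses import dataclass, field
from datetime import date, datetime, time, timedelta
from typing import Callable, List, Optional


@dataclass
class RealTimeClock:
    """Software model of the real time clock: time/date counters plus the
    alarm time and alarm date registers (hypothetical names throughout)."""

    now: datetime                        # stands in for the time and date counting units
    alarm_time: Optional[time] = None    # alarm time register
    alarm_date: Optional[date] = None    # alarm date register
    on_alarm: List[Callable[[datetime], None]] = field(default_factory=list)

    def set_alarm(self, when: datetime) -> None:
        # Alarms are kept to minute resolution.
        self.alarm_time = time(when.hour, when.minute)
        self.alarm_date = when.date()

    def tick(self, seconds: int = 1) -> bool:
        """Advance the counters, then compare them with the alarm registers.

        Returns True if the alarm-time-reached signal was raised on this tick."""
        self.now += timedelta(seconds=seconds)
        if self.alarm_time is None or self.alarm_date is None:
            return False
        time_match = (self.now.hour, self.now.minute) == (
            self.alarm_time.hour,
            self.alarm_time.minute,
        )
        date_match = self.now.date() == self.alarm_date
        if not (time_match and date_match):
            return False
        # Clear both registers so the alarm fires only once.
        self.alarm_time = None
        self.alarm_date = None
        for handler in self.on_alarm:
            handler(self.now)  # e.g. sound the alarm or start message playback
        return True
```

Because the comparison runs once per tick, the tick interval in this sketch must not exceed one minute, or the matching minute could be skipped.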

Preferred embodiments may also comprise means for marking selected memory addresses with an alarm time such that corresponding data is to be played back or logged into a scheduling network along with other messages to be played back.

Preferred embodiments may advantageously include means for storing context information packets associated with selected messages, the context information indicating the time the associated message was entered; the alarm time(s) associated with the message; the number of times the message has been played; and a date on which the message may be automatically erased. An electronic scheduler in accordance with the present invention may also comprise a read only memory (ROM) containing prestored digitized commands, and means for audibly reading out commands appearing on the display means by extracting the prestored digitized commands from the ROM.

The addressing means may include a microprocessor or a digital signal processor controlling information flow between all components.

In addition, preferred embodiments may include a read only memory (ROM) storing firmware, application software, screen message data and prerecorded voice message data.

Preferred embodiments may also include means for grouping entered information into data groups where a group can include name, telephone number, address, and message; and search logic means for retrieving from memory all the stored information in a group when only a portion of the information is provided.

In addition, an electronic scheduler in accordance with the present invention may include: (i) speech patterning means for extracting identifying parameters from the digital audio data; (j) speech recognition means for comparing the extracted identifying parameters to each of a group of reference identifying parameters associated with a first reference vocabulary, and producing a match indication as a function of the comparing; (k) command logic means to effect the performance of predetermined functions of the electronic scheduler upon receiving the match indication; and (l) interactive speech control means for controlling the interaction of the command logic means with the voice synthesis means such that the voice synthesis means synthesizes prompts indicating when speech commands are to be input and which options are available at a given instant. The first reference vocabulary may either be factory installed and speaker-independent or be created through a training process with spoken utterances or sounds, by extracting identifying parameters from the utterances or sounds and storing them as the group of identifying parameters for the first reference vocabulary.

The speech recognition means may include means for producing a nonmatch indication when the match indication does not result from the input of a given spoken utterance, the nonmatch indication indicating that the given spoken utterance was not recognized.
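The specification does not fix a matching algorithm for the speech recognition means. As an illustration only, the sketch below uses dynamic time warping (DTW), a common choice for small speaker-dependent vocabularies, to compare an utterance's identifying parameters against trained templates, and returns `None` as the nonmatch indication when the best distance exceeds a rejection threshold. Feature extraction is not shown; frames are assumed to be equal-length numeric vectors, and `train`, `recognize` and the threshold value are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

Frame = Sequence[float]        # one vector of identifying parameters
Utterance = Sequence[Frame]    # parameters extracted from one spoken word


def dtw_distance(a: Utterance, b: Utterance) -> float:
    """Dynamic time warping distance, normalised by total length so that
    words of different durations can be compared."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def train(vocabulary: Dict[str, List[Utterance]], word: str, example: Utterance) -> None:
    """Speaker-dependent training: store the user's own utterance as a template."""
    vocabulary.setdefault(word, []).append(example)


def recognize(
    utterance: Utterance, vocabulary: Dict[str, List[Utterance]], threshold: float
) -> Optional[str]:
    """Return the best-matching word (match indication), or None when even the
    best template is farther than `threshold` (nonmatch indication)."""
    best_word: Optional[str] = None
    best_score = math.inf
    for word, templates in vocabulary.items():
        for template in templates:
            score = dtw_distance(utterance, template)
            if score < best_score:
                best_word, best_score = word, score
    return best_word if best_score <= threshold else None
```

Lowering the threshold rejects more poorly received utterances, which the device can then prompt the user to repeat, at the cost of asking for repetitions more often.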

In preferred embodiments, the predetermined functions include: turning on and off the electronic scheduler, retrieving specified stored information, setting the alarm time associated with a particular recorded message, and setting a secret code for limited data access. Preferred embodiments may also include: (m) a second reference vocabulary containing time and date information for use by the speech recognition means to extract time and date information from the extracted identifying parameters; (n) alarm time logic means to allow entry of an alarm time including time of day and date upon the speech recognition means producing a match indication; and (o) interactive speech recording means for controlling the interaction of the command logic means with the voice synthesis means, whereby the voice synthesis means synthesizes speech or sound prompts for indicating the required delivery time of an audio message input and which options are available at a given instant.

In addition, an electronic scheduler in accordance with the present invention may comprise means for audibly confirming the content of the information entered by the speech recognition means.

Thus, according to the invention, audio input is converted from analog to digital and stored in random access memory (RAM) for later retrieval. Unlike recording an audio signal on tape, digital memory storage offers the control, integrity and access necessary for a scheduling/agenda system. Other digital mass storage devices, such as optical or magnetic disk drives, can be used either in place of or in addition to the RAM. For example, recording "Call Joe at 555 1212" and having this message alert one to the task shortly before it must be done requires the ability to program an alarm and have this particular message ready for playback at that instant. Any audio input can thus be automatically incorporated into the scheduling/agenda system along with typed-in information; the digitized audio information simply receives a different storage location in the memory. The option of audio input rather than keypad entry is important for many reasons. The foremost advantage is the ease and speed with which a message can be recorded. One of the main reasons electronic organizers have captured only a small share of the overall market is the enormous patience required to enter information into the device. Since the market to which they appeal consists largely of busy individuals, this is a significant deterrent.

Another advantage the audio input offers is in lowering the level of the user's required sophistication and familiarity with technology. Yet another advantage of the audio input is that information other than what can be expressed as alphanumeric characters can be recorded, such as music or an individual's voice.

The output of the device encompasses both visual display and audio output. The display shows previously entered messages, messages in the process of being typed in, commands, functions and more. When an audio message is searched for and found, the display will indicate something to the effect of "Audio Information, press <PLAY> to listen." The audio output is achieved by accessing the particular block of data stored in the memory, passing it through a digital to analog converter, filtering it, amplifying it and outputting it through the speaker. The audio playback can be halted at any instant, played back repeatedly or saved for future reference.

In addition, there is an audio output that reads out the commands appearing on the visual display. This is especially useful when visual contact with the device is limited (e.g., when driving). Audio commands and readouts are accomplished by extracting prestored digitized commands and digits from the ROM concurrent with the display of these commands on the display.

The scheduling aspect of the device offers a method of logging into and retrieving from memory telephone numbers, addresses, appointments, meetings, and other information and daily activities. These entries can be classified into user-defined or factory-predefined categories (e.g., personal entries, business entries). Schedule inquiries can be made by date, by time, or by any keyword present in the stored information.

A programmable alarm is coupled with the scheduling aspects of the device. The alarm time, the time the alarm turns on, can be appended to any of the entries made, be it an appointment, a meeting, a phone call that must be made at a specific hour or any other alarm related need. Alarm times can be appended to audio inputs as well, extending the utility of the audio input in an important way. For example, an individual can quickly leave himself a note to remind himself of a task to be performed by simply speaking into the device and then keying in the hour for the alarm to turn on. An option is also available whereby the actual message entered will be played back instead of an alarm.

The option of having a message automatically played back instead of an alarm broadens the use to such things as games, a fun programmable alarm clock that wakes one up to whatever tune or message that was audibly entered, a message one individual leaves for another, and more.

The search capability offers access to stored entries when only a portion of a particular entry is provided. The method used is what is known in the field of artificial intelligence as a "top-down" search. This involves first searching all the name fields, then all the telephone number fields, then all the address fields, then all the message fields, and finally all the search index fields. The first item found that satisfies the search is displayed. The system continues to search through additional fields to locate another match. If the appropriate key is pressed, the system displays the next match found, until the message "search complete" is displayed to announce that the entire memory has been searched and all matching fields have been found. If additional matches are requested while the system is still searching, the message "searching" will flash on the screen to indicate this status.
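The top-down search order lends itself to a lazy generator: each request for the next match resumes the scan where it stopped, and exhaustion corresponds to the "search complete" message. This Python sketch assumes case-insensitive substring matching and a hypothetical `Entry` record holding the five field types named above.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple

# Field order of the top-down search described in the text.
SEARCH_ORDER = ("name", "telephone", "address", "message", "search_index")


@dataclass
class Entry:
    name: str = ""
    telephone: str = ""
    address: str = ""
    message: str = ""
    search_index: str = ""


def top_down_search(entries: List[Entry], fragment: str) -> Iterator[Tuple[str, int]]:
    """Yield (field, entry number) for every field containing the fragment,
    scanning all name fields first, then all telephone fields, and so on."""
    needle = fragment.lower()
    for field_name in SEARCH_ORDER:
        for number, entry in enumerate(entries):
            if needle in getattr(entry, field_name).lower():
                yield field_name, number
```

Calling `next()` on the generator models pressing the key for the next match; a `StopIteration` marks the point at which "search complete" would be displayed.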

In order to ensure the security of the information entered by the person using the device, an option is available that limits access to prespecified entries. A key labeled "SECRET" can be pressed before entering information so that the information to be currently entered can only be accessed by knowing a code word. The code word is programmable and can be changed at will.
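A minimal sketch of the SECRET option, assuming each entry carries a secret flag and that listing entries without the correct code word simply omits the flagged ones. The class and method names are hypothetical, and a plain string comparison is used for the code word.

```python
from typing import List, Optional, Tuple


class SecretStore:
    """Entries flagged SECRET are listed only when the code word is supplied."""

    def __init__(self, code_word: str) -> None:
        self._code_word = code_word
        self._entries: List[Tuple[str, bool]] = []  # (text, is_secret)

    def add(self, text: str, secret: bool = False) -> None:
        # `secret=True` plays the role of pressing SECRET before entry.
        self._entries.append((text, secret))

    def change_code(self, old: str, new: str) -> bool:
        """The code word is programmable, but only by someone who knows it."""
        if old != self._code_word:
            return False
        self._code_word = new
        return True

    def visible(self, code_word: Optional[str] = None) -> List[str]:
        unlocked = code_word is not None and code_word == self._code_word
        return [text for text, secret in self._entries if unlocked or not secret]
```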

Voice operation of the device is available through user spoken commands. Before a command is entered, the device provides an audio prompt to indicate when the command should be spoken and what command options are available at that instant. For example, after a message is entered through the audio means, a reminder alarm can be set for a specific time at which the message will be played back. After the message is entered, the system prompts the user by announcing: "alarm ?". By saying "No", the user completes the entry without appending an alarm time. By saying "Yes", the user receives the prompt "hour ?" and then says one word indicating the hour. The system then prompts: "minute ?"; the user then says two digits specifying the minute. The system then prompts: "AM ?"; the user says "Yes" for AM and "No" for PM. The system then prompts: "done ?". By saying "Yes" the user completes the entry; by saying "No", additional prompts are presented. Note that for this small example a vocabulary of only 15 words is necessary, i.e., Yes, No and the thirteen numbers (0, 1, ..., 12).
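The alarm-entry dialogue can be written as a small prompt-and-answer loop over recognized words. This Python sketch assumes the recognizer delivers one word per reply (or `None` for a nonmatch, which repeats the prompt), that minutes must be below 60, and that answering "No" to "done ?" restarts the time prompts; the specification only says that additional prompts are presented. The function name `alarm_dialogue` and the `say` callback are hypothetical.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple

# Yes, No and the thirteen numbers 0..12: the 15-word vocabulary from the text.
VOCABULARY = {"yes", "no"} | {str(n) for n in range(13)}


def alarm_dialogue(
    replies: Iterable[Optional[str]], say: Callable[[str], None]
) -> Optional[Tuple[int, int]]:
    """Run the alarm-entry dialogue; return (hour_24, minute), or None if the
    user declines to set an alarm. Each reply is a recognised word, or None
    for a nonmatch, in which case the current prompt is repeated."""
    words: Iterator[Optional[str]] = iter(replies)

    def ask(prompt: str, accept: Callable[[str], bool]) -> str:
        while True:
            say(prompt)
            word = next(words)
            if word is not None and word in VOCABULARY and accept(word):
                return word

    def is_yes_no(word: str) -> bool:
        return word in ("yes", "no")

    def is_hour(word: str) -> bool:
        return word.isdigit() and 1 <= int(word) <= 12

    def is_digit(word: Optional[str]) -> bool:
        return word is not None and len(word) == 1 and word.isdigit()

    def ask_minute() -> int:
        while True:
            say("minute ?")
            tens, ones = next(words), next(words)  # two spoken digits
            if is_digit(tens) and is_digit(ones):
                minute = 10 * int(tens) + int(ones)
                if minute < 60:
                    return minute

    if ask("alarm ?", is_yes_no) == "no":
        return None
    while True:
        hour = int(ask("hour ?", is_hour))
        minute = ask_minute()
        am = ask("AM ?", is_yes_no) == "yes"
        hour_24 = hour % 12 + (0 if am else 12)
        if ask("done ?", is_yes_no) == "yes":
            return hour_24, minute
        # Assumption: on "No", the time prompts are simply repeated.
```

Restricting each answer to a few legal words (yes/no, 1 to 12, single digits) keeps the active vocabulary at any prompt very small.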

The main objective of this device is to use voice as a means for entering information, such as appointments, offering the user a more efficient and less demanding mechanism for maintaining a schedule. In addition, the present invention introduces a method of exploiting voice recognition to control the device's functions. Because the dominant constraint of the device is its size, many sophisticated voice recognition systems, which demand high computational power, are prohibitive. The proposed method, however, offers a unique and practical solution that allows simpler, portable and less computationally demanding voice recognition algorithms to be used.

To this end, this specification also describes, in detail, one of many possible hardware implementations of the proposed objectives. The design pays special attention to power consumption, memory backup features, memory management and voice recognition error minimization. Power consumption is an important consideration for extended portable usage, since voice synthesis, recording and playback components consume a relatively large amount of power. Storage of audio data in digital form requires relatively large amounts of memory and so memory management is vital through data compression and automatic erasing features. And, with audio confirmation and automatic rejection of poorly received voice inputs, a means for reducing recognition errors, at little computational burden, is effected.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a block diagram of one embodiment of a voice controlled appointment keeper according to the present invention.

FIG. 2 depicts the audio storage/retrieval processor 11 portion of the block diagram shown in FIG. 1.

FIG. 3 depicts the circuitry for the keypad interface component 31 shown in FIG. 1.

FIG. 4 depicts the circuitry for the real-time clock component 5 shown in FIG. 1.

FIG. 5 depicts the circuitry for the power control component 3 shown in FIG. 1.

FIG. 6 is a flowchart of the voice recognition and training sequence used to recognize spoken commands for function execution and spoken information for data entry; this algorithm is implemented in software and is located in the voice recognition processor 43 in FIG. 1.

FIGS. 7A, 7B and 7C are flowcharts for the interactive voice control and voice information entry dialogue between the device and the user.

FIG. 8 is an example of interactive dialogue between the device and the user for entering an alarm time, corresponding to block 136 in FIG. 7A.

FIG. 9 shows one possible exterior design for a handheld version of the device.

DETAILED DESCRIPTION OF ONE PREFERRED EMBODIMENT

Reference is made to FIG. 1 showing the key building blocks of a device suited for carrying out the present invention. Component interconnectivity is specified by lines; arrows designate direction of information flow and no arrows indicate bi-directional flow.

Two means of inputting information into the device are available: speaking into the microphone 29 or keying in on the keypad 33. The information entered into the device is conveniently divided into two types. The first is "control information", intended to control the device in performing such functions as playback, searching the database, setting an alarm time and the like. The second type is "message information", which includes telephone numbers, names, notes and the like, and is usually intended for storage and retrieval purposes. Voice recognition is performed mostly on control information, while audio input of message information is limited to a stored digitized audio message. Voice recognition can, however, be used to input some types of message information, such as phone numbers, that are stored in text format, while the name and other information associated with that message information can be entered and retrieved as message information. All typed text information (names, numbers, addresses, memos, and functional commands) entered through the keypad provides both control and message type information.

The voice input undergoes the following processing before it is ready for storage in the device's memory. Audio frequency signals enter the microphone 29, where they are transformed into electrical signals and transferred to the Audio Storage/Retrieval Processor 11.

Refer to FIG. 2 for a detailed description of the Audio Storage/Retrieval Processor 11 in FIG. 1. The electrical signals from the microphone are amplified by the input amplifier 50 (FIG. 2) to raise the signal level, and then passed through a low pass filter 51 to remove aliasing frequencies. The amplified, anti-aliased audio signal is then digitized by an A/D converter 52, which converts the analog signal into a binary representation capable of being stored, retrieved, and manipulated by digital hardware. Compression of the binary representation is then performed by the data compressor 53 in order to increase the amount of recording time available for a given digital memory size. The same results can be achieved by performing the data compression in software; the functional block is represented as hardware to show its functional necessity. When the original audio signal is to be reproduced, the opposite operation, expansion (performed in the data expansion block 54), must be applied to the stored compressed binary representation. The data rate (the number of bits per unit time required to represent the signal effectively for reproduction) is a limiting factor in the amount of recording time available for a given digital memory size.
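To make the compressor 53 / expander 54 pair and the data-rate arithmetic concrete, here is a simplified coder of the ADPCM type, which the specification names as one suitable method. Assumptions: 16-bit signed PCM input, 4-bit codes (4:1 compression), and a step-size table generated by formula in the style of IMA ADPCM rather than copied from any standard, so the output is not interoperable with real ADPCM decoders. `recording_seconds` simply divides memory size in bits by the coded data rate.

```python
from typing import List, Sequence

# Assumption: 89 formula-generated step sizes, each about 10% larger than the
# last. This mimics the shape of the IMA ADPCM table but is NOT that table.
STEP_TABLE = [max(7, round(7 * 1.1 ** i)) for i in range(89)]
INDEX_ADJUST = [-1, -1, -1, -1, 2, 4, 6, 8]  # indexed by the 3 magnitude bits


def _clamp(value: int, low: int, high: int) -> int:
    return max(low, min(high, value))


def _dequantize(code: int, step: int) -> int:
    """Turn a 4-bit code (sign bit + 3 magnitude bits) back into a difference."""
    diff = step >> 3
    if code & 4:
        diff += step
    if code & 2:
        diff += step >> 1
    if code & 1:
        diff += step >> 2
    return -diff if code & 8 else diff


def adpcm_encode(samples: Sequence[int]) -> List[int]:
    """Compress 16-bit PCM samples to 4-bit codes (4:1)."""
    predicted, index, codes = 0, 0, []
    for sample in samples:
        step = STEP_TABLE[index]
        delta = sample - predicted
        code = 8 if delta < 0 else 0
        delta = abs(delta)
        # Quantise |delta| / step to 3 bits by successive approximation.
        if delta >= step:
            code |= 4
            delta -= step
        if delta >= step >> 1:
            code |= 2
            delta -= step >> 1
        if delta >= step >> 2:
            code |= 1
        # Track exactly what the decoder will reconstruct so both stay in sync.
        predicted = _clamp(predicted + _dequantize(code, step), -32768, 32767)
        index = _clamp(index + INDEX_ADJUST[code & 7], 0, len(STEP_TABLE) - 1)
        codes.append(code)
    return codes


def adpcm_decode(codes: Sequence[int]) -> List[int]:
    """Expand 4-bit codes back into approximate 16-bit PCM samples."""
    predicted, index, out = 0, 0, []
    for code in codes:
        step = STEP_TABLE[index]
        predicted = _clamp(predicted + _dequantize(code, step), -32768, 32767)
        index = _clamp(index + INDEX_ADJUST[code & 7], 0, len(STEP_TABLE) - 1)
        out.append(predicted)
    return out


def recording_seconds(memory_bytes: int, sample_rate: int, bits_per_sample: int) -> float:
    """Recording time = memory size in bits / data rate in bits per second."""
    return memory_bytes * 8 / (sample_rate * bits_per_sample)
```

For example, 1 MiB of RAM holds about 65.5 seconds of 16-bit audio sampled at 8 kHz, but about 262 seconds once coded to 4 bits per sample.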

There are several types of compression available to reduce the effective data rate without degrading the signal beyond intelligibility. One of these methods is Adaptive Differential Pulse Code Modulation (ADPCM), an algorithm capable of 2:1 to 8:1 compression that was developed primarily for speech data compression over telephone lines. Several commercially available integrated circuits perform this compression and expansion in hardware. In addition, there are current offerings that incorporate the microphone amplifier, filters, A/D converter, D/A converter, data compressor, and data expander all on the same substrate. Other compression techniques can be used in addition to, or in place of, ADPCM. Some algorithms are capable of compressing random binary data at ratios of 1.5:1 to 10:1 (depending on the redundancy of the data). Through the combined use of more than one compression algorithm, the data rate can be reduced to yield longer recording times without expanding memory resources or significantly affecting audio quality. The compressed data is stored in the RAM 9 (FIG. 1) for later retrieval.

Each converted input binary sequence defining a message is assigned an address so that it can be retrieved at any time, marked with an alarm time to be played back when the alarm goes off, or logged into the scheduling network along with other typed-in messages. To address the memory effectively, a data "header" or "footer" indicating the message length is stored with each message; alternatively, an "end of message" or "beginning of message" sequence is stored with each message to delimit its memory location. An alternative and more restrictive addressing method involves reserving a portion of memory for a table listing the addresses of the messages. In addition to this storage addressing information, a context information packet is also stored with the actual message. The context information contains the time the message was entered; the alarm(s), if any, associated with the message; the number of times the message has been played; a date on which the message may be automatically erased; and any other information (including a text reference message) that may be used to control or track the message.

Information can also be input via the keypad, which has alphabet keys, numeric keys, and control keys (see FIG. 9). Pressing the alphanumeric keys enters the letters and numbers indicated by the labels closest to each key. Function keys are also available to execute such operations as: search by various categories (e.g. telephone number, first or last name, profession), calculator functions, playback of voice recorded messages, name entry, telephone/facsimile number entry, address entry, user defined search item entry, and memorandum entry. These same functions and others can also be accessed through screen selection prompts, whereby the user selects the number corresponding to a particular function displayed on the screen.
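The header-based addressing and the context information packet described above for stored messages can be sketched as a single byte array in which each message is preceded by a fixed-size header. The field layout, widths and names below are assumptions for illustration; the specification lists the context items but not their encoding. Walking the memory hops from header to header using the stored length, and an automatic-erase pass drops expired messages and compacts what remains.

```python
import struct
from dataclasses import dataclass
from datetime import date
from typing import Iterator, Tuple

# Hypothetical header layout, big-endian, no padding:
#   I  message length in bytes (the "header" addressing information)
#   I  time the message was entered (seconds since an epoch)
#   I  associated alarm time (same units, 0 = no alarm)
#   H  number of times the message has been played
#   I  auto-erase date as date.toordinal() (0 = never erase)
HEADER = struct.Struct(">IIIHI")


@dataclass
class Context:
    entered: int
    alarm: int
    plays: int
    erase_on: int


class MessageMemory:
    """Messages stored back to back in one RAM image, each preceded by a header."""

    def __init__(self) -> None:
        self.ram = bytearray()

    def store(self, audio: bytes, ctx: Context) -> int:
        """Append a message and return its address (offset of its header)."""
        address = len(self.ram)
        self.ram += HEADER.pack(len(audio), ctx.entered, ctx.alarm, ctx.plays, ctx.erase_on)
        self.ram += audio
        return address

    def walk(self) -> Iterator[Tuple[int, Context, bytes]]:
        """Visit every message by hopping from one header to the next."""
        address = 0
        while address < len(self.ram):
            length, *fields = HEADER.unpack_from(self.ram, address)
            start = address + HEADER.size
            yield address, Context(*fields), bytes(self.ram[start:start + length])
            address = start + length

    def auto_erase(self, today: date) -> int:
        """Drop messages whose erase date has passed and compact the RAM.
        Returns the number erased; surviving messages get new addresses."""
        records = list(self.walk())
        self.ram = bytearray()
        erased = 0
        for _, ctx, audio in records:
            if ctx.erase_on and ctx.erase_on < today.toordinal():
                erased += 1
            else:
                self.store(audio, ctx)
        return erased
```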
When a key is pressed on the keypad, the keypad interface 31 (FIG. 1) senses it the next time it samples the keypad for a "keypress" (a sampling period of about 0.1 seconds is common). A binary code representing the particular key pressed is passed to the CPU 37, where the preprogrammed operation for that keypress is executed.

Refer to FIG. 3 for a detailed description of the keypad interface 31, which the system uses to determine which key, if any, is being pressed on the keypad 33. This particular implementation uses a row-by-row decode technique. The CPU periodically selects a row to be read through a latch 70. This latch is also used to control other system functions (such as volume control in this application) if there are outputs left over. The latch data is decoded by a decode circuit 71, a data selector, and the selected row is read by selecting another latch 73. The data read by the CPU is a bit pattern of the keypad, such that any key can be checked individually for a depression. The ON key is decoded separately so as to provide a switch that works without the CPU. This is needed to provide a user input for turning on the device during power-down mode. A latch 74 is used to read this key, as well as other signals, during power-on. This circuit can be implemented in many different ways and is shown here for functional completeness. The essential function of this circuit is sensing keypresses.
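The row-by-row decode can be sketched in software, with a callback standing in for the latch and decoder hardware. Assumptions: a set column bit means "pressed" (many real matrices are active-low), key codes are numbered `row * cols + col`, and the helper names are hypothetical.

```python
from typing import Callable, List


def scan_keypad(read_row: Callable[[int], int], rows: int, cols: int) -> List[int]:
    """One row-by-row scan. `read_row(r)` stands in for selecting row r through
    the output latch/decoder and reading the column bits back through the input
    latch; a set bit is assumed to mean "pressed". Returns key codes row*cols+col."""
    pressed: List[int] = []
    for row in range(rows):
        pattern = read_row(row)
        for col in range(cols):
            if pattern & (1 << col):
                pressed.append(row * cols + col)
    return pressed


def new_presses(previous: List[int], current: List[int]) -> List[int]:
    """Keys down in this scan but not in the previous one (one sampling period
    earlier), so a key held across several scans is reported only once."""
    before = set(previous)
    return [key for key in current if key not in before]
```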

There are three ways that information is presented to the user: first, as an alphanumeric character or picture on the display; second, the same as the first method but in addition a synthesized voice that audibly reads out whatever is displayed; or third, playback of the user's recorded message.

If the information was entered through the keypad, or audibly through voice recognition, the first and second output methods are available, i.e., the display on the face of the device and/or synthesized voice. When the user wants to hear a synthesized voice output of the display, each displayed alphanumeric character also triggers the output of a particular prestored digitized sound. The sequence of these sounds, each sounding out one character, produces the sound of the complete word or number. These digitized sounds are prestored in the ROM 7 (FIG. 1). There are other ways of producing synthesized voice output, and many commercial packages are readily available (e.g., the AT&T DSP16 series); see the further discussion of voice synthesis below.
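The character-by-character readout amounts to concatenating prestored clips. The sketch below assumes the ROM contents are modeled as a dictionary from character to a list of integer samples, skips characters with no stored sound, and adds an optional silent gap between clips; the names `read_out` and `sound_table` are hypothetical.

```python
from typing import Dict, List


def read_out(text: str, sound_table: Dict[str, List[int]], gap: int = 0) -> List[int]:
    """Concatenate the prestored sound clip for each displayed character.

    sound_table maps a character to its digitized samples (as if read from ROM);
    characters without an entry are skipped. `gap` inserts that many silent
    samples after each clip.
    """
    samples: List[int] = []
    for ch in text.upper():
        clip = sound_table.get(ch)
        if clip is None:
            continue
        samples.extend(clip)
        samples.extend([0] * gap)
    return samples
```

The resulting sample stream would then go through the same expansion, D/A conversion, filtering, and amplification path used for recorded messages.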

If the information was entered by speaking into the microphone as "message information" (i.e., without a recognition operation), only the third output method is possible (i.e., playback of the user's recorded message). The user is alerted to this by a message that appears on the Display 15 (FIG. 1) (e.g., "voice message - press <PLAY> to listen"). Pressing a particular key plays back the recorded message.

The operation by which audio information is presented to the user is shown in FIG. 2, which depicts the audio storage/retrieval processor 11 of FIG. 1. As shown in FIG. 2, audio output begins by addressing the memory block in RAM associated with the message to be heard. The data is then decompressed in the data expansion unit 54 into binary data for the D/A converter 55. The D/A converter converts the digital signal into a sampled audio signal, which passes through a Low Pass Filter 56 to remove unwanted harmonics and then through an Amplifier 57 that drives a Speaker 25. Once the message has been heard, it can be deleted or saved in memory, replayed, or tagged with a new alarm for future reference, among other options.
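The playback chain (expand, D/A convert, filter, amplify) can be sketched numerically. The patent does not name a compression scheme, so this sketch swaps in continuous mu-law companding (with mu = 255) purely as an illustrative assumption, uses a first-order IIR filter as a stand-in for Low Pass Filter 56, and leaves the D/A step implicit by keeping samples as floats in [-1, 1].

```python
import math
from typing import List

MU = 255.0  # mu-law parameter; an assumption, the patent does not name a codec


def mu_law_compress(x: float) -> float:
    """Compress a sample in [-1, 1] (recording side, shown for the round trip)."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Data expansion: inverse of mu_law_compress."""
    return math.copysign(((1.0 + MU) ** abs(y) - 1.0) / MU, y)


def low_pass(samples: List[float], alpha: float = 0.25) -> List[float]:
    """First-order IIR low-pass filter: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out


def amplify(samples: List[float], gain: float) -> List[float]:
    """Apply gain and clip to the [-1, 1] range a speaker driver could accept."""
    return [max(-1.0, min(1.0, gain * s)) for s in samples]


def playback(compressed: List[float], alpha: float = 0.25, gain: float = 2.0) -> List[float]:
    """Expand -> (implicit D/A) -> low-pass filter -> amplify, in the order of FIG. 2."""
    expanded = [mu_law_expand(c) for c in compressed]
    return amplify(low_pass(expanded, alpha), gain)
```

A smaller `alpha` filters more aggressively; the actual cutoff of the hardware filter would be chosen to suit the sampling rate, which the patent does not specify.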

The LCD (liquid crystal display) Display 15 (FIG. 1) in this example device is a text or graphic display for providing information to the user, such as status, options, or recorded information previously typed into the unit. The display is controlled by the CPU as defined by the firmware stored in the ROM.

The CPU 37 (FIG. 1) controls the information flow and operational functions of the unit. At each CPU instruction cycle, an instruction is fetched from the ROM, and the CPU then transfers data between itself and an external device or between two external devices. CPU operation can be interrupted periodically to handle maintenance functions such as reading the keypad for key presses, checking whether any alarm setting has reached its term (i.e., comparing the stored alarm times with the current time), and checking whether the "ON" button is depressed or the main battery level is low. The Oscillator 13 (FIG. 1) provides a time base for the internal functions of the CPU; as shown in FIG. 1, the same time base may serve as a reference for the audio storage/retrieval processor 11. The address decode circuit 35 shown in FIG. 1 controls the CPU's access to the peripherals or devices of the system by dividing the CPU's address space into portions large enough for the individual peripherals or devices.
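The role of the address decode circuit can be illustrated with a table lookup. This is a minimal sketch assuming a hypothetical 16-bit address map; the patent does not give actual address ranges, and a hardware decoder would typically compare only the upper address bits rather than scan a list.

```python
from typing import List, Tuple

# Hypothetical 16-bit address map; the patent does not give actual ranges.
ADDRESS_MAP: List[Tuple[int, int, str]] = [
    (0x0000, 0x7FFF, "ROM"),
    (0x8000, 0xDFFF, "RAM"),
    (0xE000, 0xE0FF, "keypad interface"),
    (0xE100, 0xE1FF, "real time clock"),
    (0xE200, 0xE2FF, "audio storage/retrieval processor"),
    (0xE300, 0xE3FF, "display"),
]


def decode(address: int) -> Tuple[str, int]:
    """Return (selected device, offset within that device's window)."""
    for low, high, device in ADDRESS_MAP:
        if low <= address <= high:
            return device, address - low
    raise ValueError(f"unmapped address {address:#06x}")
```

For example, under this assumed map an access to `0xE105` selects the real time clock at offset 5.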

The read only memory (ROM) 7 shown in FIG. 1 contains the firmware (hardware drivers), application software, screen message data, pre-recorded voice messages (user alerts, etc.), and other data required for operation of the device. Its function is to provide preprogrammed instructions for the CPU: at each instruction cycle, the ROM receives a binary address from the CPU and presents the data stored at that address.

The random access memory (RAM) 9 shown in FIG. 1 stores text and voice data along with some statistical information about the data. The CPU can access the information in the RAM at any time, and its organization depends entirely on the software driving the system. Additional digital mass storage devices, such as optical or magnetic disk drives, can also be used.

The real time clock 5 shown in FIG. 1 uses an oscillator to store and count the time of day, day of the week, day of the month, month, and year. Its alarm function wakes the unit from stand-by mode when one of the set alarm times matches the current clock time. In addition, its periodic interrupt function provides the CPU with interrupts at regular intervals for scanning the Keypad, checking the alarm times, and similar tasks. Input to the real time clock is used primarily for setting the clock time and the alarm time. Refer to FIG. 4 for the circuitry of the real time clock 5 in FIG. 1. The time counting unit 91 and date counting unit 92 increment continuously based on the reference frequency provided by the oscillator 90, and this counting continues even during stand-by mode (power off). The CPU can set the current time and date in these units at the user's discretion, and these units are the source from which the CPU obtains the current time and date for display to the user. The alarm times and dates set by the user are addressed and stored in the memory unit. The CPU extracts the next pending alarm time and date and loads them into the alarm time register 95 and the alarm date register 96. The comparator 93 periodically compares the alarm time register 95 with the time counter 91, and the comparator 94 likewise compares the alarm date register 96 with the date counter 92. When both the times and the dates match, the alarm time signal 98 reports this to the power control circuit 3 (FIG. 1 or FIG. 5). The clock address decoder 99 controls the registers and counters of this circuit by transmitting select signals 97 to the unit that is to be activated.
The instruction for activating the whole of the real time clock component is received from the address decode circuit 35 (FIG. 1) .
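The division of labor between the CPU (choosing the next alarm) and the comparators (matching time and date) can be sketched with Python's `datetime`. This sketch assumes "next pending alarm" means the earliest stored alarm not yet in the past and that the registers compare to the minute; both are assumptions, not details stated in the patent.

```python
from datetime import datetime
from typing import List, Optional


def next_pending_alarm(alarms: List[datetime], now: datetime) -> Optional[datetime]:
    """CPU step: choose the earliest stored alarm not yet in the past."""
    pending = [a for a in alarms if a >= now]
    return min(pending) if pending else None


def alarm_signal(now: datetime, alarm: Optional[datetime]) -> bool:
    """Comparators 93 and 94: assert the alarm only when time AND date both match.

    Times are compared to the minute, an assumed register resolution.
    """
    if alarm is None:
        return False
    time_match = (now.hour, now.minute) == (alarm.hour, alarm.minute)
    date_match = now.date() == alarm.date()
    return time_match and date_match
```

Requiring both comparators to agree is what prevents a 14:30 alarm set for one day from firing at 14:30 on a different day.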

The memory back-up circuit 17 ensures data retention in stand-by mode (power off) and protects the RAM from false writes during power on/off. A lithium battery 39 provides the power needed to retain RAM data and run the real time clock during stand-by mode. The power control circuit 3 boosts the voltage of the batteries 41 (either replaceable or rechargeable) to 5V to power the system during active operation (power on), and automatically removes the 5V, entering stand-by mode, when the CPU does not access the keypad interface 31 within a predefined time frame. As a result, the system turns off in the event of a CPU failure, and it can also be turned off (at the user's request, or automatically when user input has ceased for an extended period) by halting the regular keypad interface accesses. Refer to FIG. 5 for a more detailed circuit description of the power control circuit block of FIG. 1. The unit can be turned on from two sources: the user, via a simple momentary switch, and the arrival of an alarm time. Flip-flop 101 is SET by the alarm signal 98 (FIG. 4) or the ON signal 76 (FIG. 3). The output of the flip-flop turns the DC/DC converter 102 on or off, which applies or disconnects power to the circuit so as to conserve the battery while the unit is not in use. The flip-flop is reset by the time-out of the retriggerable One Shot 103, which occurs in the absence of keypad scans; the scans can be stopped deliberately or by a CPU failure. The low battery detect circuit 105 provides the system with a low-battery signal indicating that the battery voltage is below a predefined value. This signal can be used to alert the user or to shut down the system during an out-of-tolerance condition.
The reset circuit 104 provides the system with a momentary set-up time after power-on to allow for oscillator stabilization and low level hardware initialization, and is also necessary for the device start-up (power on) . The reset function is built into some CPUs, and is therefore shown only for functional completeness. This power method is replaceable with a simple switch, or can be implemented without a DC/DC converter.
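The flip-flop and retriggerable one-shot together behave like a watchdog timer. The following is a minimal discrete-time model, assuming time advances in abstract ticks and the one-shot period is a hypothetical `timeout_ticks`; it models the logic only, not the analog timing of FIG. 5.

```python
class PowerControl:
    """Discrete-time model of flip-flop 101 plus retriggerable one-shot 103."""

    def __init__(self, timeout_ticks: int) -> None:
        self.timeout = timeout_ticks
        self.powered = False   # flip-flop output, which enables the DC/DC converter
        self.remaining = 0     # ticks left before the one-shot times out

    def on_event(self) -> None:
        """ON key (signal 76) or alarm (signal 98) sets the flip-flop."""
        self.powered = True
        self.remaining = self.timeout

    def keypad_scan(self) -> None:
        """Each keypad-interface access retriggers the one-shot."""
        if self.powered:
            self.remaining = self.timeout

    def tick(self) -> None:
        """Advance one time unit; one-shot expiry resets the flip-flop."""
        if self.powered:
            self.remaining -= 1
            if self.remaining <= 0:
                self.powered = False
```

Because the scans are what keep power applied, a hung CPU that stops scanning removes its own power, which is the fail-safe behavior described above.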

The external interface 16 shown in FIG. 1 is used to transfer the memory contents to a personal computer or to provide additional memory to the device. Through this interface, the user can archive the information stored within the device by downloading the data to a personal computer or by storing it in external removable non-volatile memory. Alternatively, the interface can be used to upload information into the device's memory (e.g., voice mail). The interface also allows storage to be expanded, extending the available audio recording time and text storage capacity of the device.

Voice recognition of spoken commands and spoken data information is performed as follows. As FIG. 6 shows, the A/D converter 52 (FIG. 2) converts a spoken utterance into a digital data stream and passes it to the feature extractor 110, where the "identifying parameters" of the data stream are extracted. Each reference word in the vocabulary has identifying parameters that distinguish it from the other words in the same vocabulary. The feature extractor is followed by the edge detector 111, which locates the points in the recorded time frame where the utterance actually began and ended (there may be a short delay before the word is spoken) and uses those points as the reference for the beginning and end of the relevant identifying parameters. Determining the beginning and end boundaries of the utterance is essential to forming identifying parameter sets that are comparable to those in the reference vocabulary: a parameter set formed from a time-shifted, expanded, or compressed version of the same uttered word can yield very different identifying parameters and, consequently, a misclassification of the utterance.
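The patent does not say how the edge detector 111 finds the utterance boundaries. One common approach, sketched here as an assumption, is to split the signal into fixed-length frames, compute each frame's mean energy, and take the first and last frames whose energy exceeds a threshold; `frame_len` and `threshold` are hypothetical tuning parameters.

```python
from typing import List, Optional, Tuple


def detect_edges(samples: List[float], frame_len: int, threshold: float) -> Optional[Tuple[int, int]]:
    """Return (start, end) sample indices of the active region, or None if silent.

    Short-time energy endpointing: a frame is "active" when its mean squared
    amplitude exceeds `threshold`. Trailing samples that do not fill a whole
    frame are ignored.
    """
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(s * s for s in frame) / frame_len)
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len
```

Trimming to these boundaries before forming the parameter set is what makes a delayed utterance comparable to its reference, as the paragraph above explains.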

The identifying parameters extracted from the incoming utterance are compared 112 with the identifying parameters of each word in the reference vocabulary 117, and a distance is measured for each (e.g., the Hamming distance for binary code words). The comparison producing the best (e.g., smallest) distance is taken as the best match, and its distance is then compared with a defined threshold 114 to determine whether the match is close enough to be considered the same word. The final decision 115 is reached as follows: if the best distance does not pass the threshold, the utterance is not considered to match any word in the reference vocabulary, and the device requests a retry (i.e., that the same utterance be spoken again for another recognition attempt); if the best distance passes the threshold, the utterance is recognized as the word whose identifying parameters it was closest to. The recognized word can be either a command word, such as "YES" or "NO", or an information word, such as a digit for alarm time entry.

The reference vocabulary, characterized by the identifying parameters of the words it contains, is formed in a manner similar to the recognition process described above. The feature extractor 110 and the edge detector 111 construct the reference identifying parameters as the user of the device speaks words into the microphone. Because this training is performed by a particular user of the device, the method is referred to as speaker-dependent. Training begins by placing the device in training mode through an external switch (the answer to the training 112 question would then be YES), then pressing one of the ten digit keys and speaking into the microphone the number that corresponds to the pressed key.
(The device may ask for more than one trial for each word spoken in order to construct a more tolerant identifying parameter set.) Several more keys can be programmed to be available as spoken commands including the playback function, the alarm function, the memo function, the secret function and the telephone function. They too are pressed, while in training mode, and the user speaks the commands into the microphone, sometimes more than once. By doing this, the device is trained to associate the spoken utterance with the particular key that was pressed. For example, the user presses the key marked "3" and says "THREE". The training unit 116 now adjusts the parameter set in the reference vocabulary to regard the spoken word "THREE" as representing the same function as the pressing of the key marked "3".
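The nearest-match decision with a rejection threshold can be sketched for binary code words. This assumes the identifying parameters are packed into Python integers and that "passing the threshold" means a distance no greater than `max_distance`; both the packing and the threshold convention are illustrative assumptions.

```python
from typing import Dict, Optional


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary code words."""
    return bin(a ^ b).count("1")


def classify(params: int, vocabulary: Dict[str, int], max_distance: int) -> Optional[str]:
    """Return the closest vocabulary word, or None (request a retry) if none is close enough."""
    best_word: Optional[str] = None
    best_dist: Optional[int] = None
    for word, reference in vocabulary.items():
        d = hamming(params, reference)
        if best_dist is None or d < best_dist:
            best_word, best_dist = word, d
    if best_dist is None or best_dist > max_distance:
        return None  # no match: the device would ask the user to repeat
    return best_word
```

Returning `None` corresponds to the retry branch of the final decision 115, rather than forcing the utterance onto whichever word happens to be closest.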

In another embodiment, blocks 112 and 116 can be removed when the reference vocabulary identifying parameters have been factory installed and cannot be changed. This case pertains to speaker-independent systems, which can recognize spoken utterances regardless of who spoke them. This is usually accomplished by having a large number of speakers utter the words in the defined vocabulary. Each speaker produces identifying parameters for each word in the defined vocabulary that differ slightly from those produced by any other speaker. The different identifying parameters produced by the speakers can then be averaged and used as the "speaker-independent identifying parameters".
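Averaging parameter sets across speakers (or across repeated training trials by one speaker) can be sketched element-wise. This assumes real-valued, fixed-length parameter vectors; averaging would not apply directly to the binary code words of the Hamming-distance example, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def average_template(trials: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of several equal-length identifying-parameter vectors."""
    if not trials:
        raise ValueError("need at least one trial")
    n = len(trials[0])
    if any(len(t) != n for t in trials):
        raise ValueError("all trials must have the same length")
    return [sum(t[i] for t in trials) / len(trials) for i in range(n)]


def build_vocabulary(recordings: Dict[str, List[Sequence[float]]]) -> Dict[str, List[float]]:
    """Map each vocabulary word to the average of its recorded parameter sets."""
    return {word: average_template(trials) for word, trials in recordings.items()}
```

Requiring equal lengths reflects the earlier point that parameter sets are only comparable once the edge detector has aligned the utterance boundaries.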

The voice output used to sound information and prompts (e.g., requests for data from the user) is shown as the voice synthesis processor 12 block in FIG. 1. The functionality needed for this task is that required for storing voice patterns, such as the audio message storage means, and that required for audio playback, such as the audio retrieval means 11 (FIG. 1 or FIG. 2). Following the audio storage path in FIG. 2, the voice synthesis data patterns, after data compression 56, are transferred over the CPU data bus to the ROM for storage, along with the other data necessary for system operation. Voice synthesis output is initiated in different situations: automatically when reading out the display, if the user has selected that option; as part of a dialogue (user prompts) for voice or keypad command entry; as part of a dialogue for voice or keypad data entry; or in other situations, such as sounding voice alarms. When this occurs is controlled by the system software. The data for producing the intended words or sounds is read from the ROM and sent over the CPU data bus to the audio storage/retrieval processor 11 (FIG. 1 or FIG. 2), where it is expanded 54 (FIG. 2), D/A converted 55 (FIG. 2), filtered 56 (FIG. 2), and amplified 57 (FIG. 2) for the user to hear.

This voice synthesis method requires no additional hardware. Its drawback, however, is a significant increase in the ROM size needed to store the extensive speech-pattern data for the vocabulary. A separate voice synthesizer employing allophone, phoneme, LPC, or comparable methods would reduce the storage required for the vocabulary at the expense of additional hardware; with this approach, memory is saved because only pointers to the pattern sequences needed to reproduce the intended utterances are stored. Alternatively, the allophone or phoneme patterns can be stored in the system ROM, along with vocabulary look-up tables listing the allophones or phonemes needed for each word. The particular implementation determines whether additional hardware is needed for voice synthesis.
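The look-up-table approach can be sketched as a lexicon of phoneme pointers into a shared pattern store. The phoneme symbols follow ARPAbet-style spelling, and the pattern contents are stand-in integers rather than real speech data; both tables are hypothetical. The memory saving comes from each pattern (here, "N") being stored once and reused by every word that needs it, which pays off only at realistic pattern sizes and vocabulary counts.

```python
from typing import Dict, List

# Hypothetical phoneme patterns (stand-in sample values), each stored only once.
PHONEME_PATTERNS: Dict[str, List[int]] = {
    "N": [1, 1],
    "OW": [2, 2, 2],
    "W": [3],
    "AH": [4, 4],
    "Y": [5],
    "EH": [6, 6],
    "S": [7, 7, 7],
}

# Vocabulary look-up table: the phonemes needed for each word.
LEXICON: Dict[str, List[str]] = {
    "NO": ["N", "OW"],
    "ONE": ["W", "AH", "N"],
    "YES": ["Y", "EH", "S"],
}


def synthesize(word: str) -> List[int]:
    """Follow the look-up table and concatenate the referenced phoneme patterns."""
    samples: List[int] = []
    for phoneme in LEXICON[word.upper()]:
        samples.extend(PHONEME_PATTERNS[phoneme])
    return samples
```

A real phoneme synthesizer would also smooth the transitions between patterns; plain concatenation is kept here only to show the pointer structure.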

To enter commands or data information into the device by audio means, an interactive method is devised. This method combines the voice synthesis and voice recognition means discussed above as follows. Refer to FIG. 7 for a flowchart of one example software implementation of the interactive method. The recognition process can be initiated either by engaging an external switch 130 or by entering an audio message 160. (Other uses of the voice recognition system include voice recognition training 116 (FIG. 6).) When the external switch initiates recognition, the device prompts the user through voice synthesis to enter a command by sounding "ENTER ALARM". The user says "ALARM" 135, "SCHEDULE" 138, "NAME" 148, or "VOLUME" 151, and the device uses voice recognition to classify the uttered command as one of these possible commands in its vocabulary. A typical dialogue for information entry is given in FIG. 8, which illustrates an example alarm time entry 135 dialogue.

The interactive method effectively reduces the vocabulary size for voice recognition, which reduces the memory requirement and raises recognition performance (fewer words to recognize is a simpler problem). For example, the device sounds "set alarm?" and the user must answer either YES or NO, so the device does not need to recognize the words "set" or "alarm". In addition, the user no longer needs to remember the available commands, since options are presented and only a selection must be made. Moreover, the time at which the response arrives is approximately known, so the recognition algorithm need only operate on a specific sector of the recorded memory, further improving recognition performance. By combining voice recognition and voice commands with an electronic appointment keeping device, a technologically sophisticated device offering many benefits can be made readily available even to the most technophobic individuals.
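The prompt-then-restricted-recognition loop can be sketched as follows. This is a minimal model, assuming each dialogue step pairs a synthesized prompt with the small set of words valid at that step, and that `speak`, `listen`, and `recognize` are supplied by the synthesis and recognition layers; the prompt strings and retry limit are hypothetical, not the actual dialogue of FIG. 8.

```python
from typing import Callable, List, Optional, Sequence, Set, Tuple

# Each step: (prompt to synthesize, words the recognizer may return at this step).
Step = Tuple[str, Set[str]]

DIGITS = {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9"}


def run_dialogue(
    steps: Sequence[Step],
    speak: Callable[[str], None],
    listen: Callable[[], str],
    recognize: Callable[[str, Set[str]], Optional[str]],
    max_retries: int = 2,
) -> Optional[List[str]]:
    """Prompt, then recognize against only that step's small vocabulary; retry on no match."""
    answers: List[str] = []
    for prompt, allowed in steps:
        for _ in range(max_retries + 1):
            speak(prompt)
            word = recognize(listen(), allowed)
            if word is not None:
                answers.append(word)
                break
            speak("PLEASE REPEAT")
        else:
            return None  # gave up after repeated non-matches
    return answers
```

In a quick simulation, utterances can be plain strings and `recognize` can be `lambda u, allowed: u if u in allowed else None`; at the "SET ALARM?" step only YES and NO are ever candidates, which is the vocabulary reduction described above.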

The user depresses a button to begin recording a message. To conserve memory, it is advisable that the record button be held down while the message is being recorded and released only when recording is complete. When the message has been entered and the record button released, the interactive method begins by asking the user whether the recorded message is to be classified as secret or not secret. Further dialogue follows, as detailed in the example of FIG. 8. This process of entering the alarm time setting command 135 is shown in FIG. 7 as the audio message entered block 160. Message type information can be stored in different fields, such as name, telephone number, fax number, address, and more. The number of fields shown in the flowchart of FIG. 7, such as name and alarm, is bounded only by the size of the memory installed on the device. The design of the flowchart and the dialogue is a matter of software programming and is completely flexible and tailorable to specific applications.

FIG. 9 shows an example of the outer appearance of a device that encompasses the functions discussed in this patent.

Finally, many modifications and variations of the exemplary embodiment specified herein will fall within the true scope of the present invention. For example, the interactive method of controlling the device's functionality and entering messages into its memory banks is also applicable to larger computer-based systems, and it is especially helpful in applications where manual or visual contact with the device is limited. In such cases the applications may not be intended for scheduling but rather for information entry and functional control in general. In addition, the audio entry for message storage and the audio entry for voice recognition may be different hardware or software processes, depending on the particular recognition algorithm used.
Yet another variant of the present invention would include a personal computer link for downloading entered information, for loading into the device information entered through a personal computer, or for non-volatile memory expansion. Such a link could, of course, also be used for facsimile transmission of information if needed. Accordingly, the scope of protection of the following claims is intended to be broad enough to cover all such modifications and variations.

Claims

We claim:

1. An electronic scheduler, comprising:

(a) a real time clock, comprising means for keeping current time and date; an alarm time register; an alarm date register; means for identifying a match between the current time and a set alarm time stored in the alarm time register; means for identifying a match between the current date and a set alarm date stored in the alarm date register; and means for outputting an alarm time reached signal when the current alarm time matches the set alarm time and the current alarm date matches the set alarm date;

(b) a memory for storing units of compressed digital audio data defining a message;

(c) an audio storage/retrieval processor operatively coupled to said memory, comprising: a microphone for receiving audio signals; amplifier means operatively coupled to said microphone for amplifying said audio signals; an A/D converter for converting the amplified audio signals into digital audio data; data compression means for compressing digital audio data from said A/D converter into compressed digital audio data to be stored in said memory; means for retrieving digital audio data from said memory; means for expanding the retrieved data; means for D/A converting the expanded data; means for filtering the D/A converted data; means for amplifying the filtered D/A converted data to reproduce the original input audio signal during audio playback;

(d) addressing means, operatively coupled to said memory and said audio storage/retrieval processor, for assigning addresses to said units of compressed digital audio data, said addresses corresponding to storage locations in said memory at which said units of data are stored;

(e) keypad means, comprising alphanumeric keys and function keys, for entering text information and alarm time and date settings;

(f) display means for displaying information retrieved by said addressing means;

(g) voice synthesis means for synthesizing audible speech, including means for synthesizing an audible indication that the electronic scheduler is ready to accept a message to be stored, and for synthesizing an audible readout of text entries entered through said keypad means.

2. An electronic scheduler as recited in claim 1, wherein said real time clock comprises an oscillator, a time counting unit incrementing continuously based on a reference signal provided by said oscillator, and means for making a periodic comparison between the alarm time stored in the alarm time register and the current time provided by the time counting unit.

3. An electronic scheduler as recited in claim 2, further comprising means for initiating a programmed sequence that sounds an alarm or plays back a recorded message in response to said alarm time reached signal.

4. An electronic scheduler as recited in claim 1, further comprising means for marking selected addresses with an alarm time at which time corresponding data is to be played back or logged into a scheduling network along with other messages to be played back.

5. An electronic scheduler as recited in claim 1, further comprising means for storing context information packets associated with selected messages, said context information indicating the time the associated message was entered; alarm time(s) corresponding to the associated message; the number of times the message has been played; and a date on which the message may be automatically erased.

6. An electronic scheduler as recited in claim 1, further comprising a read only memory (ROM) containing prestored digitized commands, and means for audibly reading out commands appearing on said display means by extracting said prestored digitized commands from the ROM.

7. An electronic scheduler as recited in claim 1, wherein said addressing means comprises one member of a group including a microprocessor and a digital signal processor controlling information flow between all components.

8. An electronic scheduler as recited in claim 1, further comprising a read only memory (ROM) storing firmware, application software, screen message data and prerecorded voice message data.

9. An electronic scheduler as recited in claim 1, further comprising means for grouping entered information into data groups including name, telephone number, address, and message; and search logic means for retrieving from memory all said stored information in said data groups when only a portion of said information is provided.

10. An electronic scheduler as recited in claim 9, further comprising:

(i) speech patterning means for extracting identifying parameters from digital audio data;

(j) speech recognition means for comparing the extracted identifying parameters to each of a group of reference identifying parameters associated with a first reference vocabulary, and producing a match indication as a function of said comparing;

(k) command logic means coupled to said speech recognition means to effect the performance of predetermined functions of the electronic scheduler upon receiving said match indication; and

(l) interactive speech control means for controlling the interaction of said command logic means with said voice synthesis means such that said voice synthesis means synthesizes prompts indicating when and which speech commands are to be input.

11. An electronic scheduler as recited in claim 10, wherein said first reference vocabulary is either factory installed and speaker-independent or is created through a training process with spoken utterances or sounds by extracting from said utterances or sounds reference identifying parameters and storing said reference identifying parameters as the group of reference identifying parameters for the first reference vocabulary.

12. An electronic scheduler as recited in claim 11, wherein said speech recognition means comprises means for producing a nonmatch indication when said match indication does not result from the input of a given spoken utterance, said nonmatch indication indicating that said given spoken utterance was not recognized.

13. An electronic scheduler as recited in claim 10, wherein said predetermined functions include: turning on and off the electronic scheduler, retrieving specified stored information, setting the alarm time associated with a particular recorded message, entering name and telephone number, and setting a secret code for limited data access.

14. An electronic scheduler as recited in claim 10, further comprising:

(m) a second reference vocabulary containing time information for use by said speech recognition means to extract time information from the extracted identifying parameters;

(n) alarm time logic means coupled to said speech recognition means to allow entry of an alarm time upon said speech recognition means producing a match indication; and

(o) first interactive speech recording means for controlling the interaction of said alarm time logic means with said voice synthesis means, whereby said voice synthesis means synthesizes speech or sound prompts for indicating the required delivery time of alarm time.

15. An electronic scheduler as recited in claim 10, further comprising:

(p) a third reference vocabulary containing name and telephone number information for use by said speech recognition means to extract name and telephone number information from the extracted identifying parameters;

(q) name and telephone number logic means coupled to said speech recognition means to allow entry of a name and telephone number including time of day and date upon said speech recognition means producing a match indication; and

(r) second interactive speech recording means for controlling the interaction of said name and telephone number logic means with said voice synthesis means, whereby said voice synthesis means synthesizes speech or sound prompts for indicating the required delivery time of name and telephone number input.

16. An electronic scheduler as recited in claim 10, further comprising means for audibly confirming information entered by said speech recognition means.

17. An electronic scheduler as recited in claim 1, further comprising a personal computer link to external computer or memory for archiving information stored in said internal device memory or transferring data into or out of the electronic scheduler.

18. An electronic scheduler as recited in claim 1, further comprising secret option means for entering protected information for limited access.

19. An electronic scheduler as recited in claim 1, further comprising auto dialing means for producing a series of audible tones in appropriate frequency ranges to dial a recorded or typed in telephone number.

20. A method for using speech to operate an electronic device, comprising:

(a) a step for receiving spoken utterances as audio data and converting said utterances into digital spoken utterance data;

(b) speech patterning step whereby identifying parameters are extracted from said digital spoken utterance data;

(c) speech recognition step: comparing said extracted identifying parameters to each of a group of reference identifying parameters associated with a reference vocabulary, and producing a match indication as a function of said comparing; each word in said reference vocabulary, containing words for device functional command information, obtains an associated group of identifying parameters by forming speech patterns by speech patterning step;

(d) command function execution step for performing predetermined functions of the device upon receiving said match indication from speech recognition step;

(e) audible speech synthesis step for producing intended words or sounds upon retrieving from memory, expanding, D/A converting, filtering and amplifying for listening by the user; and

(f) interactive device control step for audibly delivering speech or sound prompts generated through voice synthesizing step, indicating required delivery time of speech commands relevant at each moment of operation, and then using said selected commands to perform one of a group of predetermined functions.

21. A method for using speech to operate an electronic device as recited in claim 20, wherein said predetermined functions include: turning on and off the electronic device, retrieving specified stored information, setting the alarm time associated with a particular recorded message, searching for stored information by providing only a portion of said stored information and setting a secret code for limited data access.

22. A method for using speech to operate an electronic device as recited in claim 20, further comprising interactive speech recording step for controlling the interaction of said command function execution step with said means for synthesizing audible speech or sound whereby said synthesis means audibly delivers speech-like or sound prompts for the purpose of indicating the required delivery time of said audio message input.

23. A method for using speech to operate an electronic device as recited in claim 20, wherein said synthesizing step audibly confirms information entered by said speech recognition step.