Method and System for Rapid Navigation in Aural User Interface
TECHNICAL FIELD OF THE INVENTION
The invention relates to user interfaces for accessing digital devices and services. More specifically, the invention is a method for enhancing the usability of voice and multimodal user interfaces. The user can choose from two modes for navigation through menu structures, one of which is a standard mode for normal use and the other a rapid mode for experienced use.
BACKGROUND OF THE INVENTION
User interfaces for digital devices and services typically use menu hierarchies both (a) to inform the user of alternative selection options and (b) to provide the means for the user to navigate through the menu layers to the desired selection option. Typically, the menu hierarchies are presented in visual form. In some situations, e.g. when hands-free and eyes-free use of digital devices and services is desired, a visual presentation of the menu hierarchies is not feasible. Such situations occur e.g. in cars and other vehicles or in the case of people with visual impairments, to mention a few.
In such situations as mentioned above a menu hierarchy can be presented in aural form using for example text-to-speech (TTS) synthesis or predefined voice prompts to inform a user of alternative selection options. However, although an aural or voice user interface is suitable for informing the user of the alternative selection options, it has the drawback of being relatively slow in presenting information to the user. When a user is navigating through the menu structure for the first time, this slowness is justified by the need to inform the user of the alternative selection options and to introduce him to the menu structure. After navigating the menu several times the user learns the menu structure, and the slowness becomes a major inconvenience to the user. The need for navigating through the menu layers nevertheless remains, but the user would prefer a faster means of going through the menu layers.
Several ways to speed up the navigation process in user interfaces are known. The first way is to accelerate the speed of the entire TTS synthesis or voice prompt presentation. This means that the aural presentation of the menu remains in the original form but is uttered faster, which may make the menu items harder to understand. Another known way to make the navigation process faster is to provide an option for the end user to customize the user interface by creating manual short-cuts. This is well known and widely used in the field of web browsing. Still another way to speed up the navigation process is to provide automatic short-cuts to digital services for the end user. The latter two ways are mainly used in visual user interfaces and they require rather many steps to be taken by the user to reconcile the menu structure with the short-cuts. It is also possible to arrange recommendations of the most wanted menu items based on earlier user behavior, but this upsets the original menu structure and complicates navigating through the infrequently used menu items. However, all the above-mentioned alternatives can be used as a complement to the present invention described in the section Summary of the Invention.
One known method to accelerate a TTS client application is described in the document US 6188983 "Method for Dynamically Altering Text-to-Speech (TTS) Attributes of a TTS Engine not Inherently Capable of Dynamic Attribute Alteration" [1]. This document discloses a method that enables a TTS client application to change e.g. pitch and speed while playback is in progress. This capability can be used when TTS engines do not allow these modifications to be made dynamically. The method is restricted to letting the user adjust TTS parameters such as the pitch and speed of TTS playback without stopping playback. The method is targeted at general purpose use in connection with TTS playback, where the text can be read faster or slower depending on the user request but the contents of the text remain in the original form. Simply put, this means that the user has to listen to the same litany of voice prompts (and text) again and again whenever navigating the menu items. This is inconvenient for users who are well familiar with the contents of a voice prompt (and text).
Another known method for using TTS or predefined audio prompts in user interfaces is described in the document WO 01/45086 entitled "System and Method of Voice Browsing for Mobile Terminals Using Dual-Mode Wireless Connection" [2]. Here, interactive voice response services are used by means of a voice mode and a data mode for alternately transmitting voice and data between the mobile terminal and the server application. During one call there is a capability of swapping between speech and data. The speech content, user input and commands vary according to the particular voice application. The main idea in this method is to use a grammar of limited size to improve speech recognition in general on the terminal side, but it does not influence the navigation properties, e.g. speed, of the aural user interface itself. Again, this means that the user has to listen through the same speech every time, which is inconvenient for users who are well familiar with the contents of the speech.
Therefore there is a need, especially for experienced users, to improve the navigation characteristics of aural user interfaces for accessing digital devices and services. To assure user-friendly hands-free and eyes-free operation of the user interface, a flexible and fast navigation process is essential.
SUMMARY OF THE INVENTION
The object of the invention is to provide a method and system which allows rapid voice scrolling of menu items to the user in aural user interfaces. The objective of the invention is achieved by activating the rapid mode in which mode voice prompts corresponding to menu items are shortened significantly and hence the user interface is faster than in prior art. In the standard mode when the rapid mode is deactivated full-length voice prompts are used.
The advantage of the invention is that those users who are well familiar with the menu structure can easily activate and use the rapid mode to avoid the slowness of the standard mode user interface. Meanwhile, in unfamiliar usage situations the users may choose to navigate through menu items in the standard mode. The invention enhances the usability of aural and multimodal user interfaces by providing experienced users with a faster way to present menu options, if desired. Otherwise the user can continue with the standard mode in normal order. A further advantage of this invention is that it can be used in association with other methods and systems to speed up the navigation process.
The method of present invention relates to a method for navigating in user interfaces for accessing digital devices and services, the device comprising at least an aural or multimodal user interface and connection to the network device, is characterized in that, it comprises steps, in which a second mode for an aural user interface is activated and said second mode comprising different and shorter content than the first mode.
The system of present invention relates to a system for navigating in aural user interfaces for accessing digital devices and services, the device comprising means for at least an aural or multimodal user interface and means for connection to the network device, is characterized in that, it comprises means for activating and deactivating a second mode for an aural user interface.
According to the present invention a network device is a server in a network or a computer device in a network using peer-to-peer connection.
One preferred embodiment of the invention is to use TTS engine embedded in the terminal device. In some embodiments the TTS engine can be in the network and accordingly the TTS engine should provide both standard and rapid prompts to the terminal device or, alternatively, either standard or rapid prompts depending on the current mode of the terminal device. One embodiment of the invention is to use predefined voice prompts both in standard and rapid mode. In one simplified embodiment of the invention the voice prompts in the rapid mode are just short audio signals e.g. beeps. In some embodiments of the invention the rapid mode is activated by a special key or by a long press of menu up and menu down key. In some embodiments of the invention the rapid mode is activated by the long press on menu up/down function key for a fast forward/reverse operation and deactivated by releasing said menu up/down key for a standard forward/reverse operation.
Some embodiments of the invention are described in the dependent claims.
BRIEF DESCRIPTION OF THE DRAWINGS
The foregoing and other objects, features and advantages of this invention will be apparent from the following more particular description of the preferred embodiments of the invention as illustrated in the accompanying drawings.
Fig 1. is a flow diagram of a method for selecting a mode according to the invention.
Fig 2. is a flow diagram of a method for rapid navigation in aural user interfaces according to one embodiment of the invention.
Fig 3a. is a flow diagram of a method for rapid navigation in aural user interfaces according to another embodiment of the invention.
Fig 3b. is a flow diagram of a method for rapid navigation in aural user interfaces according to an optional embodiment of the invention.
Fig 4. is a block diagram of a system for rapid navigation in aural user interfaces according to one embodiment of the invention.
Fig 5. is a block diagram of a system for rapid navigation in aural user interfaces according to another embodiment of the invention.
DETAILED DESCRIPTION
According to the invention an aural or voice user interface has two modes for navigating through menu structures and the two modes are a standard or normal mode and a rapid mode. In this document a standard mode is called a "first mode" and a rapid mode is called a "second mode". In user interfaces menu options (menu items) can be presented in aural form using e.g. voice prompts generated by text-to-speech (TTS) synthesis or predefined voice prompts. When the second mode is activated, voice prompts corresponding to menu items are shortened significantly by presenting only e.g. a first syllable of each voice prompt. When the second mode is deactivated the first mode is used where voice prompts corresponding to menu items are presented as full-length voice prompts.
Fig 1. shows a flow diagram of the two modes for navigating in aural user interfaces according to the invention. The first mode is depicted by block 11 and the second mode by block 12, and the change between these two blocks is activated by the user giving a sign, e.g. by pressing a separate key, by a long press of the menu up/down key, by a special voice key or by giving a voice command. The selection of the mode is independent of the actual location of the navigation process, i.e. the user may change the mode whenever he wants to do so. In the case of a timeout, which occurs e.g. when a large block of operations causes a network connection to be disconnected automatically after a fixed period of time, the mode automatically changes to the first (normal) mode. Thus, after a timeout the first mode is the default mode.
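The mode selection of Fig 1. can be sketched as a small two-state machine. The following Python sketch is illustrative only; the names Mode, ModeSelector, on_sign and on_timeout are hypothetical and do not appear in the specification. It assumes that a single sign toggles between the two modes and that a timeout always restores the first mode.

```python
from enum import Enum


class Mode(Enum):
    FIRST = "standard"   # block 11: full-length voice prompts
    SECOND = "rapid"     # block 12: shortened voice prompts


class ModeSelector:
    """Hypothetical two-state selector corresponding to blocks 11 and 12."""

    def __init__(self):
        # The first (standard) mode is assumed to be the default mode.
        self.mode = Mode.FIRST

    def on_sign(self):
        """The user gives a sign (key press, voice command): toggle the mode."""
        self.mode = Mode.SECOND if self.mode is Mode.FIRST else Mode.FIRST
        return self.mode

    def on_timeout(self):
        """After a timeout the first mode becomes the current mode again."""
        self.mode = Mode.FIRST
        return self.mode


selector = ModeSelector()
selector.on_sign()      # user activates the second mode
selector.on_timeout()   # timeout: back to the first mode
```

Because the selector holds no reference to the menu position, the sketch reflects the statement that the mode can be changed at any point of the navigation process.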
Fig 2. shows a flow diagram of a method for navigating in aural user interfaces according to one embodiment of the invention. The user starts a navigation process in the terminal device according to step 100 and a main menu is available for selecting a desired menu option as shown in step 102. Let us presume that the menu is fully voice-enabled and presented to the user using a TTS engine in an eyes-free situation, e.g. while driving a car. In the case of a mobile phone as a terminal device, the top-level menu structure of the main menu can be, in written form (in reality in aural form), the following:
(1) messages
(2) call register
(3) profiles
(4) settings
(5) games
(6) calculator
(7) task list
(8) calendar
(9) infrared
(10) radio
(11) extra functions
(12) services
The user would like to add a note on the item (7) "task list". However, he doesn't remember whether said item is called "task list", "notebook", "post-it stickers", "to-do list", or any number of similar expressions. But as an experienced user he does remember that the desired item is somewhere around the middle of the menu structure.
According to the invention the user has alternative ways to proceed. In step 104 in figure 2 he can choose to navigate in the standard (first) or rapid (second) mode. If he wants to change the mode from the default mode, which normally is the first mode, in step 106 he can select the second mode by giving a sign, e.g. by pressing or releasing a separate key, by a long press of the menu up/down key or by releasing the menu up/down key, by a special voice key or by giving a voice command. He can select the direction in which to browse the menu, up or down, in the conventional way. Let us presume the user's selection in step 106 is a long press of the menu down key. In step 108 the next item is fetched from the menu. In step 110 it is checked which of the two modes has been selected. In the case of the second mode the menu item is uttered in the shortened form according to step 112, e.g. the first syllable of the text "messages" can be "me". If this is the desired selection, the user gives a sign by pressing a key or by giving a voice command corresponding to "yes" in step 114 to move on to step 116, where the right item is selected. In this example "me" is not the right selection and the user gives a sign by pressing a key or by giving a voice command corresponding to "no" in step 114 to proceed to step 124 for the next menu item. If there are any items left in the same menu, the user returns from step 124 through step 126 to step 106 again. If the user wants to stop the navigation process or to select a new menu or submenu, he continues from step 124 to step 128. If all the items of the menu have been navigated, he also continues from step 126 to step 128 for a new selection of a menu.
In this example the user had an idea that the item "task list" is somewhere around the middle of the menu structure. This means that there is no need to change the mode in step 104, and according to the invention the user continues in the second mode through menu items 1 to 5. These items are uttered e.g. "me - ca - pro - se - ga" instead of "messages - call register - profiles - settings - games" before the right selection option is presented to him. In other words, according to the invention the user can fast-forward over the first five or so items and listen in detail only to that area of the menu in which he is reasonably confident the right menu item is located.
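The shortening of prompts in the second mode can be illustrated as follows. The specification does not define how the first syllable is determined; the Python sketch below assumes a naive stand-in rule (the leading consonants up to and including the first vowel), and the function name shorten is hypothetical. A real TTS engine would more likely rely on proper syllabification or on predefined short prompts.

```python
VOWELS = set("aeiouy")


def shorten(prompt: str) -> str:
    """Naive stand-in for 'first syllable': everything up to the first vowel.

    This heuristic is an assumption made for illustration; it is not a
    linguistically correct syllabification.
    """
    for position, character in enumerate(prompt.lower()):
        if character in VOWELS:
            return prompt[: position + 1]
    return prompt  # no vowel found: keep the prompt unchanged


first_five = ["messages", "call register", "profiles", "settings", "games"]
rapid = [shorten(item) for item in first_five]
print(" - ".join(rapid))  # prints "me - ca - pro - se - ga"
```

For the five menu items of the example this rule happens to reproduce the shortened prompts given in the text; it is not guaranteed to produce a sensible syllable for arbitrary prompts.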
When in the area of the menu where the desired item is most probably located, the user can decide to change the mode from the second to the first mode in step 104. The selection is made in step 106 by deactivating the second mode with a sign, e.g. by pressing a separate key, by a long press of the menu up/down key or by releasing said key, by a special voice key or by giving a voice command. This sign can be any of the aforementioned, and it can be independent of the previous sign used to change the mode if more than one sign is in use for changing the mode. If only one sign is in use for this purpose in the terminal device, the user gives the same sign once again to change the mode. After deactivating the second mode in step 106 the condition expressed in step 110 is not met and the selection will be the standard mode, i.e. the first mode according to step 118. Let us presume that after changing the mode according to steps 104 and 106 the next menu item is number 6, "calculator". Then the next menu item is expressed in the first mode in step 118 in the full-length form "calculator". In this case it is not the right choice, and according to step 120 the user goes on in the menu structure through steps 124 and 126. Then again in step 104 there is the possibility to change the mode, but in this case the user does not do so and continues with the present mode. Then in step 118 the next item is uttered as "task list", which is the right selection for the user, and he selects this item in step 122 to add a note to the task list. So in this example, so far the first seven menu items are presented in top-level menu up direction in the form "me - ca - pro - se - ga - calculator - task list" according to the invention. After this the user can choose in step 124 (and 126) whether he wants to make a new selection of a menu in step 128 or end the navigation in step 129.
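The walk through the menu described above can be simulated with a short Python sketch. It is a simplification of the flow diagram of figure 2 under stated assumptions: the shortened prompts come from a hypothetical predefined table, the user's mode-change sign is modelled as a set of item positions before which the sign is given, and the user's "yes" sign is modelled as a match against the wanted item. None of these names appear in the specification.

```python
# Hypothetical table of predefined prompts: (full-length prompt, shortened prompt).
PROMPTS = [
    ("messages", "me"),
    ("call register", "ca"),
    ("profiles", "pro"),
    ("settings", "se"),
    ("games", "ga"),
    ("calculator", "ca"),
    ("task list", "ta"),
]


def navigate(prompts, start_in_second_mode, sign_before, wanted):
    """Return the uttered prompts and the index of the selected item (or None).

    sign_before: 0-based positions before which the user gives the
    mode-change sign (steps 104 and 106).
    """
    second_mode = start_in_second_mode
    uttered = []
    for index, (full, short) in enumerate(prompts):    # step 108
        if index in sign_before:                        # steps 104, 106
            second_mode = not second_mode
        uttered.append(short if second_mode else full)  # step 110, then 112 or 118
        if full == wanted:                              # "yes" in step 114 or 120
            return uttered, index                       # step 116 or 122
    return uttered, None                                # menu exhausted: step 128


uttered, selected = navigate(
    PROMPTS, start_in_second_mode=True, sign_before={5}, wanted="task list"
)
print(" - ".join(uttered))  # prints "me - ca - pro - se - ga - calculator - task list"
```

The user of the example starts in the second mode, gives the sign before the sixth item, and selects the seventh item, which gives the sequence of prompts stated in the text.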
It should be noted that in figure 2 the loop comprising steps 104 and 106 can be situated anywhere in the flow diagram, i.e. the mode change can be performed whenever the user wants to do so. This same loop is also depicted in figure 1.
One preferred embodiment of the invention is to activate the second mode by a long press of the menu up/down key in steps 104 and 106 according to figure 2. While the menu up/down key is kept pressed down for a period of time t, rapid voice scroll in the second mode is executed in a loop according to steps 108, 110, 112, 114, (116), 124, and 126. After the time period t the menu up/down key is released, which means that a sign according to steps 104 and 106 is given by ending the long press of the menu up/down key (i.e. releasing the key), and the mode is changed to the first mode. Then, standard voice scroll in the first mode is executed in a loop according to steps 108, 110, 118, 120, (122), 124 and 126 until the mode is changed again in steps 104 and 106 whenever the user wants to do so.
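The long-press behaviour can be sketched as a function that derives the current mode from a sequence of key events. The Python sketch below is an assumption-laden illustration: the specification does not state how long a press must last to count as a long press, so the threshold LONG_PRESS_SECONDS, the event format and the function name mode_at are all hypothetical.

```python
LONG_PRESS_SECONDS = 0.8  # assumed threshold; not given in the specification


def mode_at(events, now):
    """Return "second" while the key has been held long enough, else "first".

    events: list of (time_in_seconds, "down" or "up") tuples sorted by time,
    describing presses and releases of the menu up/down key.
    """
    pressed_since = None
    for when, kind in events:
        if when > now:
            break
        # A "down" event starts a press; an "up" event ends it.
        pressed_since = when if kind == "down" else None
    if pressed_since is not None and now - pressed_since >= LONG_PRESS_SECONDS:
        return "second"  # rapid voice scroll while the key is held
    return "first"       # key released or press still too short


# The key is pressed at 0.0 s and released at 3.0 s, i.e. t = 3.0 s.
events = [(0.0, "down"), (3.0, "up")]
```

With these events the sketch reports the first mode shortly after the key goes down, the second mode while the key remains held, and the first mode again from the moment of release.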
According to one embodiment of the invention in the second mode a first syllable of each item can be replaced by a simple audio signal such as a beep "di". In this case the first seven menu items are presented in top-level menu up direction in the form "di - di - di - di - di - calculator - task list" according to the invention. According to some embodiments of the invention instead of a beep other audio signals or aural characteristics generated by the TTS engine or predefined voice prompts e.g. a tone, pitch or any combination of aforementioned can be linked to each menu item or each group of menu items.
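The simplified embodiment with audio signals can be sketched in the same manner. In the Python sketch below the function name render and its parameters are hypothetical; the string "di" stands in for the beep, which in a real device would be an audio signal rather than text.

```python
def render(prompts, rapid_count, signal="di"):
    """Present the first rapid_count items as a signal, the rest in full length."""
    return [
        signal if index < rapid_count else prompt
        for index, prompt in enumerate(prompts)
    ]


menu = [
    "messages", "call register", "profiles", "settings",
    "games", "calculator", "task list",
]
print(" - ".join(render(menu, rapid_count=5)))
# prints "di - di - di - di - di - calculator - task list"
```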
To summarize, one embodiment of the invention is a fast voice scroll of menu items in a mobile phone user interface. Selected menu items can be fast-forwarded or fast-reversed using the rapid (second) mode navigation according to the invention and then the right selection option can be presented in the standard (first) mode. This facilitates the menu navigation especially in hands-free and eyes-free usage situations.
Another embodiment of the invention is to use the information of past device and service usage behavior of the user as a basis for recommendations for the right selection option. The most likely option can be presented in different tone, pitch or other voice characteristics to facilitate option selection. This kind of enhancement is possible in terminal devices that implement a "recommendation engine" for recommending services e.g. mobile services (m-services).
Still another embodiment of the invention is a menu of music items of a voice-enabled MP3 player, which is either embedded as one application in a mobile phone or is a separate accessory device. The user can select between different pieces of music from a menu of music items, e.g. names of pieces of music. However, if all the names are provided to the user interface via TTS synthesis the process is very slow. Moreover, an experienced user in particular will remember the rough order of the pieces. In this case according to the invention it is possible to fast-forward or fast-reverse over those areas of the menu where the desired music item is not likely to be found.
Some further embodiments of the invention follow. In case the menu items are news headlines in the user interface of an Internet browser or a similar application in the terminal device, the headlines can be arranged in such a way that similar topics are next to each other. Thus the user can either listen to the headlines in the first mode or rapidly voice scroll over topic areas not interesting to him in the second mode according to the invention. In the case of email the menu items can be email headers prefixed with the name of the sender. Thus the user can rapidly voice scroll over messages from senders not interesting to him at the moment. In the case of personal organizers, mobile phones, PDAs or other similar devices the menu items can be calendar or task list entries arranged in order of due time and date. Thus according to the invention the user can rapidly voice scroll to entries corresponding to his rough time of interest, e.g. from notes for today to notes for the next days to notes for the next months.
Figure 3a shows a flow diagram of another embodiment of the invention. In addition to the embodiment of the invention described in figure 2, there is a possibility to set an attribute corresponding to the menu items or groups of menu items in the aural user interface in the second mode loop in step 111. This attribute is selected by the user by a predefined sign, e.g. by pressing or releasing a key or a combination of keys or by giving a voice command corresponding to the attribute. The attribute can be certain menu items or a group of items in the menu structure corresponding to their position in the menu defined by serial numbers or alphabetic order of items, certain types of news headlines or music items, names, email addresses or headers prefixed with the name of the sender, time or date, or any other similar attribute selection criteria. The attribute can also correspond to information about the previous usage behavior of the user. When the attribute option is set in step 111, the items or group of items selected by the attribute are run automatically in the second mode, without any action needed from the user, until said attribute is no longer valid, i.e. the menu has been scrolled down to an item that does not fulfill the attribute criteria. When the attribute is valid for the item or a group of items in the second mode navigation in step 113, the "attribute loop" is run through in the second mode according to step 112. When the attribute is invalid, the user is asked, according to the embodiment of the invention described in figure 2, whether or not to select a new mode, in this case the first mode navigation, according to steps 104 and 106. All the other steps, illustrated by dashed lines in the embodiment of figure 3a, are the same as depicted in figure 2. The attribute can be defined among menu items in different ways, e.g. it can be a group of items beginning with the same letter or presented by the same audio signal (beep), or in some other similar way. All other embodiments of the invention described in association with figure 2 are also feasible with this attribute feature. Accordingly, the aforementioned attribute option according to figure 3a is also applicable in the standard mode loop in figure 2 according to the invention.
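The "attribute loop" of figure 3a can be sketched as follows. The Python sketch is a simplification under stated assumptions: the attribute is modelled as a predicate on a menu item, the shortened prompt is taken to be the first two letters of the item, and the names auto_scroll, attribute and short_prompt are hypothetical.

```python
def auto_scroll(items, start, attribute, short_prompt=lambda item: item[:2]):
    """Run automatically in the second mode while the attribute is valid.

    Returns the uttered shortened prompts and the index of the first item
    for which the attribute is no longer valid (len(items) if none).
    """
    uttered = []
    index = start
    while index < len(items) and attribute(items[index]):  # step 113
        uttered.append(short_prompt(items[index]))          # step 112
        index += 1
    # At this point the user would be asked about the mode (steps 104, 106).
    return uttered, index


entries = ["sales report", "security update", "staff meeting",
           "team lunch", "travel plan"]

# Assumed attribute for illustration: items beginning with the same letter "s".
uttered, stop = auto_scroll(entries, 0, lambda item: item.startswith("s"))
print(uttered, stop)  # prints ['sa', 'se', 'st'] 3
```

In this sketch the three entries beginning with "s" are scrolled over without any action from the user, and the loop stops at "team lunch", the first entry that does not fulfill the attribute criteria.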
Figure 3b shows a flow diagram of an optional embodiment of the invention. In addition to the embodiment of the invention described in figure 2, there is a possibility to set an attribute corresponding to the menu items or groups of menu items in the aural user interface in the second mode loop in step 111. This attribute is selected by the user by a predefined sign, e.g. by pressing or releasing a key or a combination of keys or by giving a voice command corresponding to the attribute. In this embodiment the attribute is certain menu items or a group of items in the menu structure corresponding to successive prompts having similar contents. For example, in the case of email messages the menu items can be email headers prefixed with the name of the sender. According to the embodiment of the invention illustrated in figure 3b, the selection of an attribute for the rapid (second) mode is made in step 111. Let us presume in this example that the selected attribute is the name of the sender of the email message. Now, the user can rapidly voice scroll over messages from senders not interesting to him at the moment, as shown in figure 3a. There might be many messages from the same sender, and the messages are expressed by the name of the sender in the second mode, e.g. in chronological order where the latest message is presented first. For the name "John Smith" the voice prompt in the second mode could be "John". If he has sent five email messages, they are presented in the form "John-John-John-John-John" according to the rapid mode navigation illustrated in figure 3a. According to the optional embodiment illustrated in figure 3b the attribute for the rapid mode is the voice prompt "John". The latest message from John is presented in the second mode as a voice prompt "John" according to steps 113, 115 and 112 in figure 3b. According to the optional embodiment this second mode prompt "John" behaves like a "virtual" first mode prompt, because similar successive prompts are to follow.
In this case the successive prompts with similar contents are replaced by a "virtual" second mode prompt, e.g. by the audio signal "bib", according to step 117, where the "rapid rapid mode" is automatically activated after the voice prompt "John" has been presented as a menu item for the latest message. As a result, in this example the rapid mode navigation presents the series of prompts "John-bib-bib-bib-bib" instead of repeating the name five times. The prompts in the rapid rapid mode can preferably be similar or different audio signals or aural characteristics generated by the TTS engine or predefined voice prompts, e.g. a tone, pitch or any combination of the aforementioned, which can be linked to each menu item or each group of menu items. All other embodiments of the invention described in association with figure 2 are also feasible with this attribute feature. Accordingly, the aforementioned attribute option according to figure 3a is also applicable in the standard mode loop in figure 2 according to the invention.
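The replacement of successive similar prompts can be sketched as a single pass over the second mode prompts. In the Python sketch below the function name rapid_rapid is hypothetical, "similar contents" is simplified to exact equality of the shortened prompts, and the string "bib" stands in for the audio signal.

```python
def rapid_rapid(short_prompts, signal="bib"):
    """Replace each prompt that repeats its predecessor by an audio signal."""
    presented = []
    previous = None
    for prompt in short_prompts:
        # The first prompt of a run is uttered; repeats are replaced (step 117).
        presented.append(signal if prompt == previous else prompt)
        previous = prompt
    return presented


five_from_john = ["John"] * 5
print("-".join(rapid_rapid(five_from_john)))  # prints "John-bib-bib-bib-bib"
```

When a prompt from a different sender follows, the comparison fails and the new name is uttered again, so each run of messages from one sender is announced once by name.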
Figure 4 illustrates a block diagram of a system for rapid navigation in aural user interfaces according to one embodiment of the invention. In this embodiment the TTS engine 302 is embedded into the terminal device 30, which also comprises the user interface 300 for at least aural or multimodal input and output, the processor unit 304 associated with the memory 307, the DSP unit 305 for signal processing and the receiver or transceiver unit 338 for receiving (and transmitting) a radio frequency signal transmitted by the application server 35 through the network 33. The processor unit 304 operates to present menu items to the user via the user interface and the memory unit 307 stores the menu items and optional attributes. Other means that are required for presenting the menu items to the user are loudspeakers, microphones and a display associated with suitable drivers, illustrated by box 301. As an option, the recommendation engine (not depicted) can also be embedded in the terminal device in association with the TTS engine. Box 306 depicts input means for activating and deactivating the second mode, e.g. by pressing or releasing a separate key, by a long press of the menu up/down key or by releasing the key, or by using a voice key or giving a voice command. The TTS engine 302 is arranged so that there is a separate "pipeline" for the first mode navigation 310 and for the second mode navigation 312, and means 314 and 316 for selecting the right mode according to the selection made in box 306. The means illustrated by boxes 314 and 316 can be e.g. switches. On the network side the arrangement comprises at least an application program 350 and a transmitter 351 in association with the server 35.
Communication between the terminal device 30 and the application server 35 in the network, or a computer device in a network using a peer-to-peer connection, takes place by any known telecommunication system which is compliant with, but not limited to, at least one of the following: TCP/IP, CDMA, HSCSD, GPRS, WCDMA, EDGE, UMTS, Bluetooth, Teledesic, Iridium, Inmarsat, WLAN, DIGI-TV and i-mode.
Figure 5 illustrates a block diagram of a system for rapid navigation in aural user interfaces according to another embodiment of the invention. In this embodiment the TTS engine 454 is embedded on the network side 43 in the application server 45, which also comprises an application program 450, a processor unit 455 and associated memory 451 for signal processing and a transmitter or transceiver unit 452 for transmitting (and receiving) a radio frequency signal via the network 43 to the terminal device 40. As an option, the recommendation engine (not depicted) can also be embedded in association with the TTS engine. The TTS engine 454 is arranged so that there is a separate "pipeline" for the first mode navigation 456 and for the second mode navigation 458, and means for selecting the right mode, boxes 457 and 459. These means can be e.g. switches. The terminal device 40 comprises the user interface 400 for at least aural or multimodal input and output, the processor unit 404 for presenting menu items to the user, the memory 407 for storing menu items, the DSP unit 405 for signal processing and the receiver or transceiver unit 402 for receiving (and transmitting) a radio frequency signal from the application server 45 through the network 43. Box 406 depicts input means for activating and deactivating the second mode, e.g. by pressing or releasing a separate key, by a long press of the menu up/down key or by releasing said menu up/down key, or by using a voice key or giving a voice command. The signal processing is arranged in box 454 so that it provides to the box 405 of the terminal device 40 either both the first mode and second mode prompts or, alternatively, only the first mode or the second mode prompts depending on the current mode of the terminal device. The current mode of navigation is in accordance with the selection made in box 406.
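The two alternatives for delivering prompts from the network side can be sketched as follows. The Python sketch is illustrative only: the names PromptPair and serve_prompt are hypothetical, the prompts are represented as text rather than synthesized audio, and the way the terminal reports its current mode to the server is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PromptPair:
    first: Optional[str]   # first mode (full-length) prompt, if sent
    second: Optional[str]  # second mode (shortened) prompt, if sent


def serve_prompt(full, short, terminal_mode=None):
    """Select which prompts the server sends for one menu item.

    terminal_mode is None when the server does not track the terminal's
    mode; both prompts are then sent and the terminal selects locally.
    """
    if terminal_mode is None:
        return PromptPair(first=full, second=short)
    if terminal_mode == "first":
        return PromptPair(first=full, second=None)
    if terminal_mode == "second":
        return PromptPair(first=None, second=short)
    raise ValueError(f"unknown mode: {terminal_mode!r}")


both = serve_prompt("task list", "ta")
only_rapid = serve_prompt("task list", "ta", terminal_mode="second")
```

Sending both prompts lets the terminal switch modes without a round trip to the server, whereas sending only the prompt for the current mode reduces the amount of transmitted data; the specification leaves the choice open.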
Other means that are required for presenting the menu items to the user are loudspeakers, microphones and a display associated with suitable drivers, illustrated by box 401. Communication between the terminal device 40 and the application server 45, or a computer device in a network using a peer-to-peer connection, takes place by any known telecommunication system which is compliant with, but not limited to, at least one of the following: TCP/IP, CDMA, HSCSD, GPRS, WCDMA, EDGE, UMTS, Bluetooth, Teledesic, Iridium, Inmarsat, WLAN, DIGI-TV and i-mode.
The present invention is an optional user interface enhancement for the end user that can be used with several other known methods to speed up the navigation process in user interfaces. Three of these methods are mentioned here as examples. The first way is to accelerate the speed of the entire TTS synthesis or voice prompt presentation. The second known way to make the navigation process faster is to provide an option for the end user to customize the user interface by creating manual short-cuts, as is well known in the field of web browsing. The third way to speed up the navigation process is to provide automatic short-cuts to digital services for the end user. At least all the above-mentioned alternatives can be used as a complement to the embodiments of the present invention.
While presently preferred embodiments of the invention have been shown and described in particularity, those skilled in the art will recognize that the invention is not limited to the embodiments described herein. The invention may be otherwise embodied within the spirit and scope of the idea as set forth in the appended claims.
CITED DOCUMENTS
[1] US 6188983: "Method for Dynamically Altering Text-to-Speech (TTS) Attributes of a TTS Engine not Inherently Capable of Dynamic Attribute Alteration"
[2] WO 01/45086: "System and Method of Voice Browsing for Mobile Terminals Using Dual-Mode Wireless Connection"