WIRELESS AUDIO STREAMING SYSTEM AND METHOD
Field of the Invention
The present invention relates to the field of wireless audio service, and, in particular, the field of audio service provided to a customer or consumer at a mobile or remote location via wireless transmission wherein the audio content is chosen by the customer or consumer ahead of time or in real time from a vast number of possible sources available on a network such as the Internet. More particularly, this invention allows a user to access any streaming audio content, regardless of format, from a wireless device.
Background of the Invention
The Internet has revolutionized the ability of people to access information. The majority of public information that people care about is now on the Internet in one form or another. All of that information is accessible to anyone with Internet access and with basic skills in the process of searching for such information and in otherwise interfacing with the Internet, skills that are now routinely possessed even by children. In short, it is little exaggeration to say that, at least in the industrialized world, essentially all public information is readily available to all people.
This information takes a variety of forms. Some of it is factual information such as news, scientific and medical information, and some of it is entertainment such as music and video programming. The adaptability of digital storage, the universality of Internet protocols and the proliferation of interface modes allow virtually any information to be transmitted over the Internet.
While the first large application for the Internet was print, such as HTML text, audio is now rapidly gaining in proliferation and popularity. One of the reasons for the increase in audio usage on the Internet is the advent of widespread sampling, compression and delivery technologies such as MP3, RealAudio, Windows Media Player and others. Audio can now be delivered routinely either in "streaming" form, that is, the audio information is played and heard contemporaneously with its transmission, or by downloading and playback. Streaming audio has the advantage that it can be "live," allowing the user to hear the audio within milliseconds after its creation or play. This is useful in news programming, live musical performances, financial announcements, and conversation.
Until recently, Internet access was significantly limited by the capacity and expense of the hard wire connections between the user's personal computer and the rest of the Internet. Traditionally, these hard wire connections have been ordinary telephone lines. Successful attempts to address this limitation have included increasing the capacity of the connections through the use of high capacity wiring such as cable access and improving the efficiency of the information delivery through the use of compression and related technologies such as digital subscriber lines. A trait common to each of these approaches has been the use of hard wiring between the user's computer and the Internet; in other words, people have had access to the information of the Internet only if they were within a few feet of high speed access.
Wireless technology promises to change that. Initially used in cell phones or PCS devices to allow mobile access to the switched telephony system, wireless technology is rapidly being applied to allow people to access the Internet. This expansion is facilitated by the increasing use of digital transmission and by the more efficient use of the radio spectrum brought about by multiple access technologies such as CDMA and TDMA. The evolving use of connectionless wireless wherein data packets are transmitted with address headers rather than by establishing a connection with the destination implies smoother transition between the wireless link and the overall Internet.
It is estimated that a third of the workforce in the United States, about 42 million people, are defined as mobile, i.e., they are away from their offices at least twenty percent of the time. Approximately 42% of the adults in the United States, about 84 million people, use the Internet. By 2004, it is projected that the number of mobile telephone users worldwide will reach a billion people. Wireless carriers will probably offer "always on" wireless transmissions that support CD quality audio by late 2001. Based on the foregoing, there is a need and niche for a service allowing ready mobile wireless access to Internet audio.
Systems have begun to appear recently to address certain aspects of this niche. These can be broadly classified into two categories: (a) Playback services: systems that allow download and playback of Internet audio content and (b) Voice browsers: systems that allow on-demand access to text-based Internet audio content.
Playback services: The first category typically requires the user to select and explicitly download Internet audio content to special devices with storage capability. The user can then access this content anytime from anywhere through the device. Although this approach allows the user to access Internet content without needing a connection to the Internet, it has several drawbacks. First, special devices that have the ability to store content are required to enable this service. Second, this approach limits the content available to the user to that selected during download; live content such as news feeds is not available. Third, the amount of content is limited by the storage capability of the device. Finally, storage of audio content is liable to copyright and ownership issues. Examples in this category include offerings by companies such as Voquette, Audible, and Audio Basket. These systems allow the user to download content to devices such as an MP3 player. CommandAudio offers a similar service for storage and playback of "radio" content through special FM receivers with storage capability.
Voice browsers: The second category includes systems that allow access to Internet audio "on demand"; i.e. they do not require storage of content. This typically involves the user accessing Internet content through a (cellular) telephone or a handheld device with wireless access. The existing offerings in this space include the so-called "voice browsers" (e.g. Tell Me, Be Vocal, Quack). These systems allow the user to access categorized Internet content through voice commands. The categories typically include stocks, weather, news, etc. In most systems the delivery of content is accomplished by converting textual content to audio through text-to-speech conversion or by playing out pre-recorded audio samples. This approach has several limitations. First, these services provide access to limited content, depending on the categories made available by the service provider. As a result, the service can be personalized only to a limited extent. Also, this approach restricts the type of content that can be provided to users. Specifically, only text-based information can be played back through text-to-speech hardware and software. These systems are typically not able to play back "streaming" content. As a result, non-text content (such as music) is not available through this method. Also, live content such as Internet radio cannot be accessed. Further, producing pre-recorded content becomes increasingly expensive as the amount of content increases.
There are other systems that enable access to pre-recorded audio content from PC-based remote devices. For instance, the Intouch system allows PC-based users to play back pre-recorded audio content, which is stored at a central server, based on user profile. However, this system is limited to PC-based end-devices that contain standard decoding software such as a RealAudio player. The system does not support wireless devices, such as cellular phones, that do not have decoding software for Internet audio formats.
It is thus apparent that there is still a need for a system that will deliver pre-existing and live wireless streaming audio that accesses the "vast archive" of the Internet. Preferably, such a system would be capable of accessing all audio that is available on the Internet, regardless of the format, converting it to a desired format, and delivering it wirelessly to a mobile user device for play. The mobile device typically lacks the audio decoders that are prevalent on PC-based systems. Our invention addresses this requirement and allows access to any type of Internet content, without requiring special hardware devices. The following table compares the key features discussed above. The content types include textual content, music, and live content. The access type characterizes the access features available with each service. Personalization specifies the flexibility in selecting content. On-demand access specifies whether Internet content can be accessed when desired.
Summary of the Invention
The present invention addresses the needs described above by providing a system for true audio Internet access through a mobile wireless device.
Specifically, the invention describes a system that allows any streaming audio content, anywhere on the Internet, to be delivered to any mobile device. The service operates in two steps: (a) the user creates a profile of preferred audio programming by accessing an Internet web site, and (b) the user calls a pre-specified phone number to access the streaming audio from a mobile device.
The system comprises two key components: a Central Service Manager for managing user profiles, and a set of Media Interfaces (media translation and user navigation system) to deliver the audio to mobile devices.
The central service manager (CSM) is linked to the Internet via high speed wired access. The CSM provides a Web interface for the user to create a personalized audio profile. Before using the service, a user accesses the CSM via the web interface to create a profile containing an index of selected audio programming. The user can choose the programming from all the streaming audio content that is available over the Internet. The CSM is in communication with multiple Media Interfaces (MI). The MIs are located at hosting provider sites, customers' PBX, or within central offices of carriers. To use the audio service, the user accesses the service through a user device by connecting to the MI. On receiving a request from the user, the MI requests and receives streaming audio content from various audio content servers already linked to the Internet. The MI converts the audio received from the audio content servers to a desired format and then sends it out to the user devices.
The MI also provides an interactive navigation system to access the audio content. This navigation is provided in a variety of ways including interactive voice response, text-to-speech, and speech recognition. The user device itself can be an ordinary cell phone, a telephone, a dedicated audio playback device, or anything else capable of linking to the MI and receiving and playing the desired audio.
A significant advantage of the present invention is that the user can access audio without regard to the format of the audio, because the format is converted to a system format at the MI level. Further, the delivery of streaming audio in this format makes it possible to provide access to different forms of audio, including live Internet radio and music.
Brief Description of the Drawings
Fig. 1 shows a flowchart of the operation of a preferred embodiment of the system from the user's perspective.
Fig. 2 shows in schematic form the overall system architecture of a preferred embodiment of the invention.
Fig. 3 shows the components in the Central Service Manager.
Fig. 4 shows the components within the Media Interface.
Detailed Description of the Invention
Operation of the service: In operation, users first subscribe to the service by accessing a web interface through a PC or a mobile device with a data connection to the Internet. Examples of devices include a personal computer, a mobile laptop computer with a wireless link, or a PDA with a wireless connection. The user creates a password-controlled account associated with a frequently used phone number (or login ID) for using the service. The user specifies preferences and settings such as the nature of the audio programming that is desired and the times that the programming is to be delivered. The audio content can be selected from pre-defined categories or from any audio link on the Internet. The user can also assign specific keys on the telephone dial to specific content (e.g. key 1 for news, key 2 for sports update). The user can update the personalized preferences at any time through a Web, WAP (Wireless Application Protocol www.wapforum.com), or other data interface over a PC, a handheld device, or any data-enabled device connected to the Internet.
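By way of illustration only, the profile described above might be represented as in the following minimal Python sketch. The class names, field names, phone number, and example URLs are hypothetical and do not appear in the specification; a deployed system would also store a hash of the passcode rather than the passcode itself.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class AudioSelection:
    """One entry in a user's profile (hypothetical structure)."""
    title: str
    url: str            # any audio link on the Internet
    category: str = ""  # e.g. "news"; empty for a user-supplied link


@dataclass
class UserProfile:
    """Password-controlled account keyed by phone number (or login ID)."""
    phone_number: str
    passcode: str  # illustration only; store a hash in practice
    selections: Dict[str, AudioSelection] = field(default_factory=dict)
    key_map: Dict[str, str] = field(default_factory=dict)  # dial key -> title

    def add_selection(self, selection: AudioSelection) -> None:
        self.selections[selection.title] = selection

    def assign_key(self, key: str, title: str) -> None:
        """Bind a telephone key (e.g. "1") to a selection in the profile."""
        if title not in self.selections:
            raise KeyError(f"unknown selection: {title}")
        self.key_map[key] = title

    def selection_for_key(self, key: str) -> Optional[AudioSelection]:
        """Return the selection bound to a key, or None if the key is unassigned."""
        title = self.key_map.get(key)
        return self.selections.get(title) if title is not None else None


# Example from the text: key 1 for news, key 2 for sports update.
profile = UserProfile(phone_number="5550100", passcode="1234")
profile.add_selection(AudioSelection("news", "http://example.com/news.ra", "news"))
profile.add_selection(
    AudioSelection("sports update", "http://example.com/sports.mp3", "sports")
)
profile.assign_key("1", "news")
profile.assign_key("2", "sports update")
```

Keeping the key assignments separate from the selections lets the same profile be read out as a list or reached directly by a single key press.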
The user accesses the service through a user device (e.g. cellular phone, POTS phone, or a PDA (Personal Digital Assistant) with a cellular connection and audio capability). The user connects to the service by calling a pre-specified phone number. The MI identifies the user either through caller-ID recognition or through query and validates the identity of the user through a password check. The MI then retrieves the profile associated with the user. The MI uses text-to-speech conversion to read out the choices in the user's profile. The user makes a content selection and starts receiving the corresponding audio programming. The user can navigate through the choices through an interactive interface comprising voice response, text-to-speech, voice recognition, and graphical menu-based commands. Typical navigation commands include play, stop, pause, go to main menu, etc. These commands can be entered either by pressing specific keys on the keypad or by speaking them. The user can also specify the content of interest through a voice input. Another means of content selection is through an arbitrary search capability. The user specifies a certain category of content by saying a keyword. The MI recognizes the speech, does a search on that keyword, presents the results to the user, receives the selection, and then plays out the selected choice. The search is typically done on content already selected within the user's profile. It is also possible to search the entire Internet for content, based on the requested keyword. Figure 1 shows the operation of the system in one preferred embodiment using a phone interface.
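The keyword search over content already in the user's profile can be sketched as follows. This is an illustrative Python fragment only: it assumes the speech recognizer has already produced the keyword as text, and the function name and the simple substring match are assumptions, not the method of the specification.

```python
from typing import List, Tuple


def search_profile(entries: List[Tuple[str, str]], keyword: str) -> List[Tuple[str, str]]:
    """Return the (title, url) entries whose title contains the keyword.

    Matching is case-insensitive; an empty keyword matches nothing.
    """
    needle = keyword.strip().lower()
    if not needle:
        return []
    return [(title, url) for title, url in entries if needle in title.lower()]


# Hypothetical profile contents.
entries = [
    ("Morning News", "http://example.com/news.ra"),
    ("Sports Update", "http://example.com/sports.mp3"),
    ("Evening News", "http://example.com/evening.ra"),
]
results = search_profile(entries, "news")
```

The results would then be read out to the user through text-to-speech so that a selection can be made.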
The specific embodiment of Figure 1 operates as follows. When the user calls in, the system plays back a welcome greeting (1). If caller ID is available, the system identifies the user through the user's phone number and prompts the user to enter a passcode (4). If there is no caller ID, then the system prompts the user to enter a phone number (2). If the phone number does not match one in the database, the system prompts the user again (3). This repeats until a maximum number of retries is reached or until a matching phone number is obtained. In step 4, the user's passcode is retrieved. If the passcode is invalid, the system prompts for a valid passcode (5). If the passcode is valid, the system plays out the main menu options (6-7). An example of this is "Press 0 for commands, press # to go to main menu, etc.". At this point, the user responds by entering a choice. Suppose that the user decides to listen to the items in the profile. The system checks if there are clips available in the profile. If not, a message indicating this is played out (9) and the system returns to the main menu. If there are clips in the profile, the system plays out the list of selections available in the user's profile (11). The user responds by making a selection. The user could also enter other options to return to the main menu or to hear the list of commands again. Once the user has selected a clip, the system plays out an advertisement (12). As the ad plays out, the system makes a connection to the appropriate content server. If the connection is not established by the time the advertisement finishes, the system issues a wait message (13). The system then plays out the selection to the user. At any point during the playback, the user may press # to return to the main menu, 0 to hear commands, or 5 to pause. If the stream is a live stream, the system notifies the user that the selection is a live stream (17). The user can then choose to resume by pressing 6. If the stream is not live, the system notifies the user that a stream has been stopped (16) and queries if the stream should be bookmarked (19). The system bookmarks the stream for subsequent playout if the user responds with a bookmark request. This information is used during the content selection before step 12. The system checks to see if the stream had been bookmarked, and if so, fetches the appropriate location within the stream before playout in step 12.
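The identification and bookmarking portions of the Figure 1 flow can be sketched in Python as shown below, purely for illustration. The function names, the `prompt` callback, and the retry limit of three are assumptions: the specification refers only to "a maximum number of retries" for the phone number and does not state a limit for passcode attempts.

```python
from typing import Callable, Dict, Optional

MAX_RETRIES = 3  # assumed value; the text says only "a maximum number of retries"


def identify_caller(
    caller_id: Optional[str],
    accounts: Dict[str, str],          # phone number -> passcode
    prompt: Callable[[str], str],      # asks the caller for input, returns digits
    max_retries: int = MAX_RETRIES,
) -> Optional[str]:
    """Steps 1-5: return the validated phone number, or None on failure."""
    phone = caller_id
    attempts = 0
    while phone not in accounts:       # no caller ID, or no match (steps 2-3)
        if attempts >= max_retries:
            return None
        phone = prompt("phone number")
        attempts += 1
    for _ in range(max_retries):       # passcode check (steps 4-5)
        if prompt("passcode") == accounts[phone]:
            return phone
    return None


def handle_pause(
    clip_id: str,
    is_live: bool,
    position_seconds: float,
    wants_bookmark: bool,
    bookmarks: Dict[str, float],
) -> str:
    """Steps 16-19: only a stored (non-live) stream can be bookmarked."""
    if is_live:
        return "live stream"           # step 17: caller may resume with key 6
    if wants_bookmark:
        bookmarks[clip_id] = position_seconds
    return "stream stopped"            # step 16


def start_position(clip_id: str, bookmarks: Dict[str, float]) -> float:
    """Before step 12: resume from a bookmark if one exists, else from the start."""
    return bookmarks.get(clip_id, 0.0)
```

Passing the prompt as a callback keeps the flow independent of whether input arrives as DTMF digits or recognized speech.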
System architecture: An overall view of the architecture of a preferred embodiment of the invention is depicted in the diagram of Fig. 2. A Central Service Manager 10 (CSM), which is connected to the Internet, provides the interface 12 for personalizing the audio content selection. The interface is typically a web page that is accessed through a personal computer 8 or a handheld device 9 connected to the Internet. The user profile information is stored in a database 13 at the CSM. The CSM is in communication with Media Interfaces. Although the simplified diagram shows two MIs, 4 and 6, it will be appreciated that in reality there would be many such MIs. The MI is a gateway between the telephone network and the Internet. The MI and the CSM exchange user profile data. The user profile is replicated in a local database associated with each MI. The simplified diagram shows two databases 3 and 5 associated with MIs 4 and 6 respectively.
The MI is the access point for the service. The user device may be any device capable of playing audio and communicating with the MI. Illustrative devices include a "Plain Old Telephone Service" ("POTS") telephone, a current-generation cell phone 1 with circuit-switched access, and anything else capable of performing the desired function (e.g. a PDA 2). The simplified diagram shows only one device of each type connected to the MI, but it will be appreciated that in reality there would be many such devices in each category.
The MI retrieves audio content based on the user's profile and delivers it to the user's device in appropriate format. The selected audio content could be located at a server 7 anywhere on the Internet or specifically hosted at the CSM at a content server 11.
Figure 3 shows the details of the Central Service Manager (CSM). The components of the CSM include a web server 3, a wireless data server 4, a database 5, an advertisement server 6, a billing system 7, and a content management interface 10. Users create a profile of selected audio content through the web interface 1 or an alternate wireless data interface 2 (e.g. WAP, imode). The user profile information is stored in the database 5.
A central database is maintained at the CSM to allow a central storage location for the user profile data. This makes it possible for users to access the web server from any location.
The advertisement server 6 contains advertisements that are delivered to the user depending on demographics. The advertisements served, as well as user demographics, are captured in the database 5. The billing system 7 is used to bill the user for premium or subscription-based content.
The CSM contains content servers 8 for content that is hosted at the CSM. A cache 9 may be included to reduce the load on the content server. The content management interface 10 provides an interface for managing the content hosted on the CSM. This includes mechanisms to upload and update the content on the content servers.
In a specific embodiment of the CSM, the web server resides on a Unix platform, the database is an Oracle 8 database on a Unix platform, the wireless data server is an Ericsson WAP server, the content server is a RealAudio or a WindowsMedia streaming server on a Unix platform, and the cache is a NetCache C1100 appliance from Network Appliances.
Figure 4 shows the details of the Media Interface (MI). The MI connects to the Internet on one end 1 (e.g. through a T1 connection) and to a telephony interface 2 (e.g. an ISDN PRI line) on the other. The MI has two main functions: (a) providing delivery of the audio content and (b) providing a navigation system for the user interface. In the current embodiment, the MI enables delivery of RealAudio, mp3, and WindowsMedia format content to 1st and 2nd generation cellular phones and regular POTS phones. However, it is obvious to someone skilled in the art that the same technology can be applied to deliver other forms of content, including QuickTime, EPAC, and other proprietary audio codecs.
Content delivery includes the following steps: (a) determining the format of the audio stream, (b) converting the audio content to an appropriate format, and (c) delivering the content to the end-device.
The MI detects the content format in step 3 by checking the file format of the particular item in the user's profile (e.g. RealAudio). An appropriate media decoder (e.g. Real Player) is invoked in step 4 to decode the incoming stream. The decoder operates on the incoming stream in real time. The decoded stream is then converted to a form suitable for delivery over the circuit-switched telephony network in step 5. Specifically, the decoded content is converted to the PCM (pulse code modulation) format, which is the underlying transport mechanism over existing circuit-switched telephony networks. The decoded PCM signal is then written to the buffers on the line interface card that resides in the MI in step 6. The card outputs the signal over a channelized voice line (e.g. ISDN PRI) that is connected to the card. This signal is thus delivered into the PSTN. The PSTN routes the signal to the calling cellular phone over the wireless network. The key advantage of the PCM format is that the service is agnostic to the wireless bearer. In other words, the service works seamlessly with different transport mechanisms such as TDMA, CDMA, and GSM. The PCM format also enables delivery to any wired phone.
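Steps 3 and 5 above can be illustrated with the following Python sketch. Two assumptions are made that go beyond the specification: the format is detected from the file extension of the profile item, and the telephony PCM is taken to be 8-bit G.711 mu-law, the companding commonly used on North American T1/PRI lines. Decoding of the compressed stream (step 4) and sample-rate conversion to 8 kHz are omitted, and all names are hypothetical.

```python
import os
from typing import Iterable
from urllib.parse import urlparse

# Assumed extension-to-decoder table; the text names the formats only.
DECODERS = {
    ".ra": "RealAudio",
    ".rm": "RealAudio",
    ".mp3": "mp3",
    ".wma": "WindowsMedia",
    ".asf": "WindowsMedia",
}


def detect_format(url: str) -> str:
    """Step 3: pick a decoder name from the file extension in the URL path."""
    extension = os.path.splitext(urlparse(url).path)[1].lower()
    if extension not in DECODERS:
        raise ValueError(f"unsupported audio format: {extension!r}")
    return DECODERS[extension]


_BIAS = 0x84    # standard G.711 mu-law bias (132)
_CLIP = 32635   # largest magnitude that does not overflow after biasing


def linear_to_ulaw(sample: int) -> int:
    """Step 5: compress one signed 16-bit linear sample to a mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), _CLIP) + _BIAS
    exponent = magnitude.bit_length() - 8          # segment number, 0..7
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def encode_for_telephony(samples: Iterable[int]) -> bytes:
    """Encode decoded linear samples into the byte stream written to the line card."""
    return bytes(linear_to_ulaw(s) for s in samples)
```

Because the output is ordinary telephony PCM, the same byte stream can be carried to a TDMA, CDMA, GSM, or wired handset without the MI knowing which bearer is in use.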
The MI allows the user to navigate through the profile using a combination of DTMF (touch tone), text-to-speech conversion, voice recognition, and menu-based graphical technologies. In the case of DTMF-based control, the user presses keys on the telephone to make menu selections or to navigate through the options. The system interacts with the user by reading out profile selections and menu options through text-to-speech conversion or by transmitting decoded streaming audio content. The user can also communicate with the system through automated speech recognition. In the case of end-devices with graphical capability, such as WAP or imode-based phones, the user can also navigate through the system through menu-based commands.
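DTMF-based control can be sketched as a simple lookup, shown here in Python for illustration. The fixed keys (#, 0, 5, 6) are those given in the description of Figure 1, and the user-assigned keys follow the "key 1 for news" example; the enum and function names are hypothetical.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class Command(Enum):
    MAIN_MENU = "main menu"
    LIST_COMMANDS = "list commands"
    PAUSE = "pause"
    RESUME = "resume"
    PLAY_SELECTION = "play selection"
    UNKNOWN = "unknown"


# Fixed keys taken from the Figure 1 description.
FIXED_KEYS: Dict[str, Command] = {
    "#": Command.MAIN_MENU,
    "0": Command.LIST_COMMANDS,
    "5": Command.PAUSE,
    "6": Command.RESUME,
}


def interpret_key(key: str, key_map: Dict[str, str]) -> Tuple[Command, Optional[str]]:
    """Map one DTMF digit to a command and, where relevant, a profile selection.

    Fixed navigation keys take precedence over user-assigned content keys.
    """
    if key in FIXED_KEYS:
        return FIXED_KEYS[key], None
    if key in key_map:
        return Command.PLAY_SELECTION, key_map[key]
    return Command.UNKNOWN, None


user_keys = {"1": "news", "2": "sports update"}  # hypothetical assignments
```

A speech-recognition front end could feed the same function by translating a recognized phrase such as "pause" into the equivalent key.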
The MI is connected to a database 7 that replicates the user-profile database at the CSM. This replication reduces the traffic to the central database. It also reduces the latency for retrieving user profiles. The database is replicated through periodic update messages from the CSM. Note that a separate database is still maintained at the CSM to allow centralized access to the user profile data from a web interface.
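One way the periodic update messages might be applied at an MI is sketched below in Python. The message layout (a sequence number plus "upserts" and "deletes" entries) is an assumption made for illustration; the specification does not define the format of the update messages.

```python
from typing import Any, Dict


class ProfileReplica:
    """Local copy of the CSM user-profile database held at one MI."""

    def __init__(self) -> None:
        self.profiles: Dict[str, Dict[str, Any]] = {}  # phone number -> profile
        self.last_seq = 0                               # last update applied

    def apply(self, message: Dict[str, Any]) -> bool:
        """Apply one update message; ignore it if it is stale or a duplicate."""
        if message["seq"] <= self.last_seq:
            return False
        self.profiles.update(message.get("upserts", {}))
        for phone in message.get("deletes", []):
            self.profiles.pop(phone, None)
        self.last_seq = message["seq"]
        return True


replica = ProfileReplica()
replica.apply({"seq": 1, "upserts": {"5550100": {"keys": {"1": "news"}}}})
replica.apply({"seq": 2, "upserts": {"5550101": {"keys": {}}}, "deletes": ["5550100"]})
```

With a local copy in place, retrieving a profile when a call arrives is a local read rather than a round trip to the CSM.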
In a specific embodiment, the MI uses Dialogic D/480SC-2T1 hardware for the telephony interface, Nuance software for speech recognition, and Lernout & Hauspie hardware (TTS 2000/T) for text-to-speech conversion.
In a deployment of the complete system, the MI and the CSM could be collocated at a facility, or the MIs could be distributed among different geographic areas.
Revenue model: The wireless audio system described in this invention can be deployed in several models.
First, the system can be set up as a subscriber-based service sold to end-users. The service operator therefore realizes revenue through subscription fees. These fees can be based on the kind of audio that is delivered, the time of delivery, the quantity, and any other factors desired. Fees can also be charged to advertisers who present advertisements accompanying the requested audio. The requested audio can also be bundled with other fee-based information. The expenses of the service operator will generally include fees charged by the owners of the intellectual property rights to the audio that is delivered through the system, equipment costs, as well as operating costs for Internet bandwidth and telephony support.
A second mode of operation is to deploy the service through "private labels". In this model, content providers (such as a news channel like CNN) provide "wireless" access to their service. The service itself is operated through the system described in this invention, although it is branded by the content provider. The content provider may charge the end-user for the service or provide it for free to improve its brand. The content provider pays a license fee and/or usage-based fee to the service operator. In addition, if the service includes advertisements, the service operator may receive a share of the revenues.
A third model of operation is through an alliance with a cellular carrier. In this case, the carrier offers a service to its end-users. The service could be subscription based or free. The service operator runs the service and receives a revenue share and/or a license fee from the carrier.