[Figure residue, embedded data format fields: Copyright (1-3 bits); Date (16 bits); Unique Song ID (24-32 bits); Retail Channel (12-16 bits)]
USING EMBEDDED DATA WITH FILE SHARING
This application is a division of application Ser. No. 09/952,384, filed Sep. 11, 2001 (now U.S. Pat. No. 7,756,892), which:
(a) is a continuation in part of application Ser. No. 09/620,019, filed Jul. 20, 2000 (now U.S. Pat. No. 7,689,532);
(b) is a continuation in part of PCT application PCT/US01/22953, filed Jul. 20, 2001; and
(c) claims priority benefit to provisional patent applications 60/232,163, filed Sep. 11, 2000, and 60/257,822, filed Dec. 21, 2000.
These patent applications are hereby incorporated by reference.
This application also relates to U.S. Pat. Nos. 7,055,034 and 7,197,156, which are incorporated herein by reference.
The invention relates to file sharing systems for computer networks such as the Internet, and specifically relates to using embedded data in files to enhance such systems.
With the explosive growth of the Internet, file-sharing programs have evolved. One popular file sharing program is known as Napster, with a user base that has grown to between 10 and 20 million users in 1 year. This is one of the fastest growing products today. Currently, scores of music files can be found from Napster’s database of current online users, and downloaded from another user’s computer, in a data transfer scheme known as peer-to-peer file sharing. File-sharing is easily extended to all content, such as done with Scour.com.
In the Napster system, web site servers store a database of directories of the digital music libraries on the hard drives of thousands of registered users. The digital files of the songs themselves remain on the users’ hard drives. If a user wants a particular song title, he logs onto the Napster web site and types in a search query for the title. Client software on the user’s computer connects to the Napster server and receives a list of active users who have the requested file on their computer. In response to selecting a handle name, the client software opens a link between the user’s computer and the computer of the selected user, and the client software executing on the two computers transfers the requested file.
Many new file-sharing systems are evolving in which the database is dynamic and not stored on a central server. One example of software with a dynamic database is known as Gnutella. Initially, when a user logs on to the Gnutella network, the user downloads client software from a Gnutella website. Next, the user types in the Internet address of an established Gnutella user (e.g., from a listing available at the web site). The client software then transmits a signal on the network that informs other computers in the Gnutella file sharing network of its network address and connection status. Once a link with the other computer is secure, the other computer informs other computers of the Gnutella network that it has encountered in previous sessions of the user’s presence (e.g., address and connection status).
After this initial session, the client software stores the addresses of other computers that it has encountered in the Gnutella network. When the client software is loaded, it recalls these addresses and attempts to reconnect with the other computers located at these addresses in the Gnutella network. The Gnutella software enables users to exchange many types of files. It enables users to issue a search request for files containing a desired text string. In response, the Gnutella clients connected with the user’s computer search their respective hard drives for files satisfying the query. The client on the user’s computer receives the results (e.g., files and corresponding addresses) and displays a list of them. By clicking on a file item in the user interface, the user instructs the client software to transfer the selected file.
In another file sharing system known as Freenet, the identity of the person downloading and uploading the files can be kept secret. Alternatively, the files could be stored on a central server, but uploaded by users such that the central server does not know the origin or true content of the files.
Unfortunately, the file-sharing methodology also allows massive piracy of any content, such as text, music, video, software, and so on. However, due to the scalability and freedom of distribution with file-sharing, it provides a powerful tool to share information. As such, there is a need for technology that facilitates and enhances authorized file sharing while respecting copyrights.
A few examples of the benefits of file-sharing follow. A file sharing system allows unknown artists to obtain inexpensive and worldwide distribution of their creative works, such as songs, images, writings, etc. As files become more popular, they appear on more of the users’ computers, thus inherently providing scalability. In other words, there are more places from which to download the file and most likely several copies exist in close proximity to the downloading computer, thus improving efficiency. In addition, anonymous file-sharing, like Freenet, fosters political debate in places around the world where such debate might trigger reprisals from the government.
Current attempts to curb unauthorized file sharing include enforcement of copyright laws and use of files with content bombs. The current legal enforcement efforts allege that uses of file sharing systems violate copyright laws. Content bombs involve placing files that appear to be the correct content, but contain alternative content or viruses. For example, an MP3 file can have the middle replaced with someone saying “do not copy songs” instead of the desired music. Neither of these solutions will help the Internet grow and improve the quality of life, worldwide.
Current copy management systems allow copying, but block rendering on equipment if the person does not have rights, where rendering only refers to reading a text file, seeing an image, watching a movie, listening to an audio file, smelling a smell file, or executing a software program. Although this can limit piracy within a file-sharing system, it does not improve the system for the user. In fact, this rendering based method of copy protection detracts from the system. This detraction stems from the fact that current copy control systems are implemented on the user’s computer at the time of importing into the secure system, rendering, or moving to a portable rendering device or media, as described in the Secure Digital Music Initiative’s specifications version 1 (available at http://www.sdmi.org, and incorporated by reference). In other words, current copy control systems do not check rights at the time of copying or transfer between computers. For example, the user downloads the protected file, and then finds out that he/she cannot render the file (i.e. play the song). In addition, the user does not know if the file is the correct file or complete until after downloading and attempting to render the file. More specifically, the file is encrypted by a key related to a unique identifier within the user’s computer; thus, after copying to a new computer, the file cannot be decrypted. In addition, watermarks can only be used after the file has been decrypted, or are designed to screen open (i.e. decrypted) content for importation into the user’s secure management system after the file has been copied to their computer.
Another approach would be to use a database lookup to determine whether the content is allowed to be shared. For example, whether music in the MP3 file format can be shared could be determined from its ID3 song title tag. However, this solution does not scale. Specifically, every download requires accessing and searching this central database, and access to this database does not improve as the file becomes more popular. In addition, the approach can be bypassed by changing the file’s title tag or filename, although this makes searching more difficult.
A desirable solution includes embedding data throughout the content in which the embedded data has any of the following roles. The embedded data can have an identifier that has many uses, such as identifying the file as the content that the user desires, allowing the file to be tracked for forensic or accounting purposes, and connecting the user back to the owner and/or creator of the file. The embedded data can be analyzed in terms of continuity throughout the file to quickly demonstrate that the file is complete and not modified by undesirable content or viruses. An additional role is to identify the content as something that is allowed to be shared, or used to determine the level or type of sharing allowed, such as for subscription users only.
The embedded data may exist in the header or footer of the file, throughout the file as an out-of-band signal, such as within a frame header, or embedded in the content while being minimally perceived, most importantly without disturbing its function, also known as a watermark.
In the utilization of this embedded data, the computer from which the content is to be downloaded (i.e. the uploading computer) can check that the content is appropriate to be uploaded when the files (e.g., music files) on this computer are added to the central database and/or when the content is requested. Similarly, the downloading computer can also check that the requested content is appropriate before, after or during the downloading process. An appropriate file can be defined as any of the following: the content is allowed to be shared (i.e. it is not copyrighted material), the file is the correct content, and the content is complete and does not contain any viruses.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is an overview of a peer-to-peer file sharing system demonstrating locations at which embedded data can be used to control file-sharing.
FIG. 2 is a flowchart of an embedding process.
FIG. 3 is a flowchart of a detecting process.
FIG. 4 is a diagram of a file sharing system using embedded data.
FIG. 5 is a diagram of an embedded data format and corresponding database format.
FIG. 6 is a diagram illustrating an arrangement for generating a unique ID based on content.
The following sections describe systems and methods for using auxiliary data embedded in files to enhance file sharing systems. FIG. 1 depicts an example of a file sharing system for a computer network like the Internet. The solution described below uses data embedded in a file to identify a file as having content desired for downloading, to verify that the content of the file is complete and free of viruses, and to allow the file to be shared among users’ computers at the user’s share level. In many applications, an embedding process encodes auxiliary data in the file during creation, but it may also be embedded at a later time. For example, the file may be embedded (or re-embedded) as part of a file transfer process or electronic transaction where a user is granted usage rights for the file.
FIG. 2 depicts an embedding process for adding auxiliary data to files in a file sharing system. A data embedding process 200 (e.g., steganographic encoder, file header encoder, data frame header encoder, etc.) embeds auxiliary data 202 in a file 204 to create a data file 206 including the embedded data 202. The file may then be distributed in a file sharing system comprising a number of computers or other devices in communication with each other via a network. The auxiliary data embedded in the file is used to manage file sharing operations, and to enhance the user’s experience.
Types of Embedded Data
The embedded data can be placed in the header or footer of the file, throughout the file such as within frame headers, or hidden in the content itself using steganographic encoding technology such as digital watermarking. The file may contain any combination of text, audio, video, images and software, in compressed or uncompressed format.
Auxiliary data used to manage sharing of a file may be embedded in headers and footers of the file for each type. When the data is to be embedded throughout the file, the file can be broken into frames of known size, with a header for each frame including space for embedded data. For MPEG compressed audio and video, these frames already exist. The embedded data can be hidden in copyright, private or auxiliary bits. The data embedded in frame headers can be modified by the audio in any frame and/or encrypted (defined as dynamic locking in U.S. Pat. No. 7,055,034, already incorporated by reference) to improve its robustness to duplication in another content file, a content bomb, or virus.
With respect to watermarking, there are many known techniques for embedding data within software, image, audio, video, and text in the state of the art, and new techniques will evolve, especially for software. Examples of steganographic encoding and decoding technologies are described in U.S. Pat. Nos. 5,862,260 and 6,614,914. The watermark may exist only in one place in the content, several places in the content, or continuously throughout the content. For example, in an audio file, the watermark may be repeated in temporal segments of the audio track. In a still image, the watermark may be repeated in spatial segments of the image. In video, the watermark may be repeated in temporal or spatial segments of the video signal.
Roles of Embedded Data
The embedded data may include an identifier (ID) that serves as an index to an entry in a searchable database that describes or otherwise identifies the content of the file. For example, the database can include elements, where each element comprises an ID, song title, album (or CD) title, release year, and artist name. This database can be indexed by any of these elements, thus improving automated searching capabilities. Specifically, rather than needing to search for “Help and Beatles”, “The Beatles - Help!”, and so on, a unique ID can be used in a search query to identify The Beatles’ song Help, and different IDs may be used for different releases.
The user, via an automated search program, only needs to submit a search query including that ID. When searching, the user may be presented with a drop down menu of titles of files from the database that satisfy the search query. The search program automatically knows the ID from the database so that the correct file can be found and downloaded from a computer at an address associated with that file in the database. In addition, these IDs could help music be searched by year, which is desirable to many people who want to hear music from their high school or college days.
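The ID-based search described above can be illustrated with a minimal sketch. The record layout follows the elements named in the text (ID, song title, album title, release year, artist name); the numeric IDs, the network addresses, and the function names are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    song_id: int   # unique ID carried in the embedded data (hypothetical values)
    title: str
    album: str
    year: int
    artist: str
    address: str   # hypothetical address of a computer holding the file


RECORDS = [
    Record(0x0001A2, "Help!", "Help!", 1965, "The Beatles", "10.0.0.5:6699"),
    Record(0x0001A3, "Yesterday", "Help!", 1965, "The Beatles", "10.0.0.7:6699"),
    Record(0x0002B0, "Respect", "I Never Loved a Man the Way I Love You",
           1967, "Aretha Franklin", "10.0.0.9:6699"),
]

# Index by ID so a query carrying the ID resolves in one lookup.
BY_ID = {r.song_id: r for r in RECORDS}


def titles_matching(text):
    """Entries for a drop-down menu: (title, ID) pairs matching free text."""
    needle = text.lower()
    return [(r.title, r.song_id) for r in RECORDS
            if needle in r.title.lower() or needle in r.artist.lower()]


def ids_for_year(year):
    """IDs of all songs released in a given year."""
    return [r.song_id for r in RECORDS if r.year == year]


def locate(song_id):
    """Address from which the file with this ID can be downloaded."""
    return BY_ID[song_id].address
```

In this sketch the free-text match is used only to build the menu; once the user picks a title, the search program submits the ID, so spelling variants of the title no longer matter.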
In addition to facilitating automated searches for content in files, the ID may also be used to track these files. For example, the file transfer system can add the ID of a file to an event log when the file is transferred (e.g., downloaded, uploaded, etc.). The specific components of the file transfer system involved in the event logging process may vary with the implementation. Also, the time at which the event is triggered and logged may vary.
The client system responsible for sending a file may issue and log an event, and either store the log locally, and/or send it to a central or distributed database for communication to other systems. The client system that receives the file may perform similar event logging actions. Additionally, if a server system is involved in a file transfer, it may also perform similar event logging actions. For example, the server may transfer the file, or facilitate the transfer between two clients, and as part of this operation, log an event of the operation including the file ID, the type of event, etc. In distributed systems where no central server is involved, the event logs can be stored on computers in the file sharing network (or a subset of the computers), and composite event logs can be compiled by having the computers broadcast their event logs to each other. Each computer, in this approach, could maintain a copy of the event log, which is synchronized upon each broadcast operation. The log could be used to account for all file transfers, and be used to properly pay the rights holders.
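One way to picture the broadcast-and-synchronize step is as a set union of event records. The sketch below is only an illustration under stated assumptions: the `Event` fields, the `merge_logs` and `transfers_per_file` names, and the choice to treat identical records as duplicates are all hypothetical and not taken from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    file_id: int      # ID read from the embedded data
    kind: str         # e.g. "download" or "upload"
    node: str         # computer that logged the event
    timestamp: float  # time at which the event was logged


def merge_logs(*logs):
    """Compile a composite log: union of all events, duplicates removed."""
    merged = set()
    for log in logs:
        merged |= set(log)
    return sorted(merged, key=lambda e: (e.timestamp, e.node, e.kind, e.file_id))


def transfers_per_file(log, kind="download"):
    """Count events of one kind per file ID, e.g. as a basis for accounting."""
    counts = {}
    for event in log:
        if event.kind == kind:
            counts[event.file_id] = counts.get(event.file_id, 0) + 1
    return counts
```

Because the merge is a union, it gives the same composite log regardless of the order in which broadcasts arrive, which is what lets every computer converge on the same copy.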
Another use for the embedded data when it contains a unique ID, such as unique to the retailer, song, artist and/or rights holder, is to link the consumer to more information, such as information about the retailer, song, artist and/or rights holder. The ID could be used to link to the retailer’s web site, where the consumer can find additional songs in the same genre, year and by the same artist. Or, the ID could be used to link to the artist’s web site where the consumer finds additional information about the artist and song, and can locate other songs by the artist. Or, the ID could be used to link back to the rights owner, such as the record label where the consumer can find additional information and music.
This connected content link could be displayed by the file sharing application during the downloading process. This provides the user with benefits of not wasting time during the downloading process, and gaining access to more music and information. The file sharing company can use this process to increase the revenues generated from the file sharing system through deals with the companies who gain access to the user via the connected content links.
The unique ID could be generated from the content, such as done with CDDB, which generates an ID from a CD’s table of contents (TOC), and then steganographically embedded into the content. Alternatively, the unique ID may not be embedded but inherently linked to the content via a hash or fingerprint function that turns some or all of the content into a few bits of data. The number of bits allowed determines the likelihood that different files transform into the same number of bits. However, even with as few as 32 bits, this is unlikely. In addition, this is less likely if the hash function prioritizes parts of the data that are most perceptually relevant. This process is sometimes referred to as fingerprinting.
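The relationship between the number of bits and the likelihood of two files sharing an ID can be made concrete with a short sketch. It rests on assumptions not found in the specification: a truncated SHA-256 digest stands in for the hash function (it is neither CDDB’s method nor a perceptual fingerprint), and `content_id` and `collision_probability` are hypothetical names. The probability uses the standard birthday approximation for uniformly distributed IDs.

```python
import hashlib
import math


def content_id(data: bytes, bits: int = 32) -> int:
    """Derive an ID from the content by keeping the top `bits` bits of SHA-256."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def collision_probability(n_files: int, bits: int = 32) -> float:
    """Birthday approximation: chance that any two of n_files share an ID."""
    exponent = n_files * (n_files - 1) / 2 ** (bits + 1)
    return -math.expm1(-exponent)
```

Under these assumptions, any given pair of files collides with probability about 2^-32, which matches the statement that a collision is unlikely. The chance that some pair collides grows with the size of the catalog, so a system indexing very many files may prefer more bits.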
The embedded data, when continuously embedded throughout the content, can improve the reliability of the content by, for example, demonstrating that the content is complete and has no viruses. One way to make the embedded data continuous is to insert it in periodically spaced frame headers, or steganographically encode it at locations spread throughout the file.
A person trying to sabotage the file-sharing system can try to replicate the embedded data through a content bomb (such as audio repetitively saying “do not copy”) or virus to fool the system. Thus, the harder it is to duplicate the embedded data, the more reliable the system is. When trying to resist duplication, it is advantageous to encrypt the embedded data payload, thus making it harder to duplicate. In addition, the embedded data payload can be modified by the content to improve resistance to duplication. Finally, the embedded data can be modified by the content and then encrypted for more secure applications. The above three robustness methods are labeled dynamic locking and disclosed in patent application Ser. No. 09/404,291, already incorporated by reference.
When the embedded data is a watermark, meaning that it is steganographically embedded within the content and not just as auxiliary data in each frame, it is usually inherently robust to duplication because many watermarks use secret keys that are required to detect the watermark and read the information carried in it. One form of key is a pseudo-random noise (PN) sequence used as a carrier to embed, detect, and read the watermark. In particular, a spreading function is used to modulate the PN sequence with the watermark message. The resulting signal is then embedded into the host data (e.g., perceptual or transform domain data) using an embedding function. The embedding function modifies the host signal such that it makes subtle changes corresponding to the message signal. Preferably, these changes are statistically imperceptible to humans yet discernable in an automated steganographic decoding process. Encryption and changing the watermark message or PN sequence adaptively based on the content can improve the robustness of the watermark to duplication.
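The PN-carrier scheme outlined above can be sketched as a toy additive spread-spectrum embedder. This is an illustration under simplifying assumptions, not the encoders of the cited patents: the host is a plain list of samples, the PN sequence is produced by Python’s `random.Random` seeded with the key, and the names `pn_sequence`, `embed`, `detect`, `chips_per_bit`, and `strength` are hypothetical.

```python
import random


def pn_sequence(key, length):
    """Key-dependent pseudo-random carrier of +1/-1 chips."""
    rng = random.Random(key)
    return [rng.choice((-1, 1)) for _ in range(length)]


def embed(host, bits, key, chips_per_bit=256, strength=0.5):
    """Spread each message bit over chips_per_bit samples and add it to the host."""
    needed = len(bits) * chips_per_bit
    if len(host) < needed:
        raise ValueError("host signal too short for message")
    pn = pn_sequence(key, needed)
    marked = list(host)
    for i in range(needed):
        symbol = 1 if bits[i // chips_per_bit] else -1  # bit -> +1/-1
        marked[i] += strength * symbol * pn[i]           # subtle change to host
    return marked


def detect(signal, n_bits, key, chips_per_bit=256):
    """Correlate each segment with the keyed carrier; the sign gives the bit."""
    pn = pn_sequence(key, n_bits * chips_per_bit)
    bits = []
    for b in range(n_bits):
        start = b * chips_per_bit
        corr = sum(signal[start + j] * pn[start + j] for j in range(chips_per_bit))
        bits.append(1 if corr > 0 else 0)
    return bits
```

A detector that uses a different key correlates against an unrelated carrier and recovers essentially random bits, which is the sense in which the key makes the watermark hard to read or duplicate. A practical system would embed in a perceptual or transform domain and adapt the strength to the content.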
Alternatively, if the embedded data is generated from the content, the embedded data is inherently linked to the content and is difficult to duplicate in a virus or content bomb. For example, pseudo-randomly chosen frames can be hashed into a few data bits that can be embedded in other pseudo-randomly chosen frames. Thus, without knowledge of the pseudo-random sequence (i.e. key) used to choose the frames and the hash function, the hacker cannot duplicate the embedded data.
Importantly, header and footer structures should be of known size or protected so a hacker cannot slip a virus into the header or footer.
The embedded data can also demonstrate that the file is allowed to be shared, which means its owner has authorized copying (i.e. sharing) rights. The watermark message may include standard copy control information such as two message bits to encode copy permission states of “no more copy,” “copy once” and “copy freely.” Alternatively, a single bit can be used to indicate whether or not sharing is allowed.
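A two-bit copy control field of this kind could be handled as in the sketch below. The specific bit assignments are an assumption made for illustration (the text names the three states but not their encodings), as are the names `CopyState`, `encode`, `decode`, and `after_copy`. The transition from “copy once” to “no more copy” reflects the usual reading of those states.

```python
from enum import IntEnum


class CopyState(IntEnum):
    # Hypothetical bit assignments; the specification does not define them.
    COPY_FREELY = 0b00
    COPY_ONCE = 0b01
    NO_MORE_COPY = 0b10


def encode(state: CopyState):
    """Two message bits (high, low) for the watermark payload."""
    return (state >> 1) & 1, state & 1


def decode(high: int, low: int) -> CopyState:
    """Recover the state; raises ValueError for the unused pattern 0b11."""
    return CopyState((high << 1) | low)


def after_copy(state: CopyState) -> CopyState:
    """State to embed in the copy, or PermissionError if copying is refused."""
    if state is CopyState.COPY_FREELY:
        return CopyState.COPY_FREELY
    if state is CopyState.COPY_ONCE:
        return CopyState.NO_MORE_COPY
    raise PermissionError("no more copies allowed")
```

A file sharing client could call `after_copy` before a transfer and re-embed the returned state in the transferred file; the one-bit variant reduces to checking a single shared or not-shared flag.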
The copyright can be linked to other copy management systems. For example, according to the DVD-Audio specification (available at http://www.dvdforum.org) and the Portable Device Specification of the Secure Digital Music Initiative (available at http://www.sdmi.org), audio may be watermarked with copy control information. This information may automatically be passed along if encoded within a watermark robust enough to survive the compression used in most file-sharing systems. Alternatively, the watermark can be read and re-embedded as embedded data, possibly another type of watermark (as discussed in U.S. Pat. No. 7,197,156, already incorporated by reference).