SYSTEM AND METHOD FOR MARKING OF AUDIO DATA
BACKGROUND OF THE INVENTION
The invention concerns the marking of audio data in order to authenticate it or restrict its use. More particularly, the invention concerns marking of audio data by imposition of a code on the spectral content of the audio data.
Today, audio information, either alone or in combination with other kinds of information, is increasingly available on easily accessed networks and storage media. On such means, audio information is rendered in the form of audio data, that is, audio information in a form on which computer programs operate. In this regard, see the definition of "data" in the Dictionary of Computing, 4th Ed., Oxford Univ. Pr., 1996 at p. 118. Audio information is, generally, that information that may be perceived by the human auditory system. Such information is typically in the form of sound that is composed of signals in the frequency range of about 15 hertz to 20,000 hertz. Audio data, therefore, is sound that has been converted into a format that can be operated on by computers. Audio data is available and may be apprehended from broadcast and cable media, over networks (such as the internet), and from digital storage media such as compact disks (CDs) and digital audio tapes (DATs), and equivalents.
The owners and marketers of audio information have extremely valuable rights in this information. The unauthorized use and access of audio data costs these owners and marketers billions of dollars per year in lost revenues. Further, because of the widespread proliferation of digital processing apparatus, the information content of audio data can be altered, modified, copied, presented, and so on, without authorization.
Accordingly, the development and widespread adoption of digital processing technology, coupled with easily-accessible telecommunication and storage technology, has created the problem of protection of the content of audio data by technical means that are commensurate with the technologies that process, communicate, and store such data.
SUMMARY OF THE INVENTION
The invention provides a technological solution to this problem by imposing on the spectral content of audio data information known and accessible only to the owner of the audio information and authorized representatives. Hereinafter this is referred to as "mark coding" or "marking". The solution is embodied in a method and system in which a pseudo random sequence of numbers or indexes is generated in response to the audio information and to private information. The pseudo random sequence of numbers or indexes points to specific spectral components of the audio information which are evaluated for marking with marking information. Marking is accomplished by altering the spectral content of the audio information in such a way as to be virtually inaudible. The encoding of audio data with marking information adjusts the spectral distribution of energy in the audio information, without adding new frequency components or deleting existing ones. Consequently, the spectral population of the audio information does not change with mark coding.
Accordingly, decoding may be conducted using the spectral population of the mark coded audio information. Further, such mark coding can be robust enough to withstand a number of quantizations, or it may be so fragile as to be lost with the first quantization following encoding. Or, the mark coding can combine robust and fragile components.
Accordingly, it is an objective of this invention to mark audio data by means of information that is encoded into the data by a process driven by the content of the data itself.
It is a further objective to mark code the audio information by alteration of the energy distribution in the spectral representation of the information.
It is a further objective to make such mark coding inaudible.
It is a still further objective to provide both a robust form of mark coding, which can withstand a sequence of quantizations of the audio information, and a fragile form of mark coding, which is vulnerable to the first quantization following such coding.
These objectives and many benefits of this invention will be manifest when the following description is read while referring to the drawings, which are described below.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a block diagram showing a prior art system for marking an image.
Figure 2 is a block diagram illustrating the architecture of a system according to the invention.
Figures 3A-3C are spectral diagrams illustrating how the spectral content of audio information is coded according to the invention.
Figure 4 is a flow diagram illustrating the operation of the system of Figure 2 in coding audio information with marking information.
Figure 5 is a block diagram illustrating incorporation of the invention into an audio encoder.
DETAILED DISCUSSION OF PRIOR ART
Figure 1 illustrates a prior art system for identifying an image based upon information in the image that is obtained from, but distinct from, the contents of the image itself. The information may either be derived from the image or entered into the image. The system of Figure 1 derives, creates, or otherwise generates a digital mark that may be compared with the contents of an image 10 for the purpose of identifying, authenticating, or otherwise validating the image 10. The mark is derived by processing private information 11 in response to the contents of the image 10. In this regard, the private information may comprise any type of information in any form that can be transformed into a digital representation and that is private to a person, enterprise, or a machine having some relationship to the contents of the image 10. The private information may comprise, for example, a private number set or sequence such as a social security number, a driver's license number, a DNA sequence, a telephone number, a private character set or sequence, a private alphanumeric set or sequence, a private graphic, a private image, a private document, or a private code. It is necessary that the private information 11 be repeatable in the sense that, from one operation to another, the private information will not change. The private information 11 is input to a digitizing process 12 which reduces the private information to a digital electronic form having a predetermined size. In Figure 1, the digital form of the private
information is represented by a two-dimensional array 14 of ones and zeros that would reside, for example, in the memory of a computer. The two-dimensional array 14 may, of course, be assembled into a unique uni-dimensional multi-bit vector by scanning it in some regular fashion. At this point, the private information has been rendered in a repeatable manner into a digital object that may be processed in response to the contents of the image 10. An electronic image 16 of the image 10 is obtained by conventional means. For example, the electronic image 16 may be a digital image of the image 10. The digitizing element 18, for example, may implement the well-known ISO 1098 algorithm, with the product being an image feature such as spectral content that may be represented by a rectangular array 19 of samples, each sample being embodied in a digital number.
In Figure 1, the private information in the digital array 14 is scrambled in response to the digitized image contents produced at 18, with the scrambling being done in a linear feedback shift register (LFSR) 20 that is seeded by a 1×n digital number embodying the digitized private information available in the array 14. The LFSR 20 is a shift register of n stages whose operation is controlled by a sequence of clock pulses provided from a clock generator 22. The operation of the LFSR may be understood with reference to Sklar's DIGITAL COMMUNICATIONS: Fundamentals And Applications, (Prentice-Hall 1988), pp. 546-549. The LFSR 20 scrambles the digital number embodying the private information in a series of shifts, each shift occurring in response to a clock pulse produced by the clock generator 22. The clock generator 22 operates in response to the array of samples produced by the element 18. When the array has been traversed, an END OF ARRAY signal is produced that disables the clock generator 22 and unloads the contents of the LFSR 20. At the occasion of this signal, the operation of the LFSR 20 will have scrambled the digital number that embodies the private information in a pseudo-random manner. This process is also referred to as "randomization" or "pseudo-randomization". The LFSR contents are arranged into an array 24 of ones and zeros in a two-dimensional matrix. This produces a two-tone image, with each pixel of the image corresponding to a respective identically-located bit in the array of binary digits. This two-tone pixel array is presented at 26. It is possible to reduce the image of
the pixel array 26 and use it as a mark 28 to apprehend or enter information into the contents of the image 10. This is done by correlating the mark with the image contents. The result is a correlation pattern. Either a code is derived by the correlation pattern, or a code is imposed by adjusting the correlation pattern. In the mode where the correlation pattern is adjusted, care is taken to make any change in the spectral content of the image virtually imperceptible to the eye.
THE INVENTION
In the invention, a random or pseudo-random process is used to index into the spectral content of audio data in order to mark the audio data by encoding the data with information that is related to an interest in the program information embodied in the audio data. Such information ("marking information") may, for example, signify an owner, a licensee, a distributor, a marketer, or any person or enterprise having an interest in the program information. The marking information is encoded in the audio data in such a way as to make it virtually imperceptible to the human auditory system, yet tractable to a decoder. This process is referred to as "mark coding" or, simply, "encoding".
Figure 2 illustrates an exemplary system with which the invention may be practiced. In Figure 2, an audio program source 200 provides on 201 audio data embodying audio information program material. The form of the information at 201 is presumed to be compressed, formatted information that is referred to hereinafter as "audio data". In this regard, the audio data at 201 may be generated using any of a number of audio compression algorithms and formats. Examples of such include AC3 (Dolby), Pro Logic, THX, MPG, and other equivalent schemes. For convenience, it is assumed that the audio data has been compressed and formatted using MPEG audio compression. In this regard, reference is given to D. Pan's article entitled "A Tutorial On MPEG/Audio Compression", in IEEE Multimedia, V.2, No. 2: Summer 1995, pp. 60-74. Following the example, the audio data at 201 is the result of spectral decomposition of audio information, followed by processing, quantization, and coding and formatting for transmission. Because the operations and functions of the invention preferably are directed to a spectral representation of the audio data to be encoded, the audio data at 201 is deformatted and expanded to provide a digital representation ("digital audio") of
the audio information; this is accomplished at step 202. This partial inversion of the MPEG process advantageously provides the digital representation in a sequence of frames. The frames are clearly marked in the MPEG scheme. (As mentioned above, the MPEG scheme is merely an example.) Generally, block 202 of Figure 2 will provide a partition of digitized audio with a partition mark provided to indicate the boundary between adjacent partitions. In the MPEG scheme, a partition is a frame and the partition mark indicates the boundary between adjacent frames. A transform function at 208 transforms each unit of digitized audio into a spectral content signal that may be represented by a magnitude vs. frequency plot (also called a "spectral representation") having, for example, 1024 frequency bins. The transformed unit of digitized audio is evaluated by a threshold function at 210 for the purpose of determining whether the unit can be encoded. Since, as will become clearer as this description progresses, encoding involves altering the distribution of energy in a transformed unit of audio data, the thresholding component 210 evaluates the distribution of energy among the spectral components of the unit. In this regard, the threshold component 210 identifies units with high spectral density. In this description, the term "high spectral density" means that a spectral representation has active frequency components in no less than thirty percent of the frequency bins defined for the representation. Thus, for a 1024 bin fast Fourier transform (FFT), for example, approximately 307 bins would have to be active. Active frequency components are those exhibiting a signal magnitude above some noise floor. Thus, to determine which bins are "active", an average system noise magnitude is subtracted from the magnitudes of all bins in a sample. Once system noise is removed, the mean magnitude is calculated.
If this mean falls above a predefined threshold, the frame is ready for coding; if it falls below, the frame is not encoded. For transformed units that have high spectral density, the threshold component 210 enables a clock generator 212 that generates one clock pulse for each frequency bin in the spectral representation of the current unit of digitized audio, with the condition that the magnitude of the frequency or frequencies in the bin exceeds a minimum level. The clock pulses produced by the clock generator 212 operate a linear feedback shift register (LFSR) 214 seeded with a digital representation of private information 215. Following the last clock pulse, the LFSR 214 contains a digital
word from which a number can be extracted by, for example, truncation, modulo arithmetic, and so on. The number obtained from the LFSR indexes to a bin in the spectral representation of the current unit of digitized audio. The bin that is indexed by the contents of the LFSR 214 is inspected by a mark coding component 216 for encoding according to the invention. A buffer 218 accumulates the succession of transformed units of digitized audio, both coded and uncoded, in the sequence that they are transformed at 208, feeding them in sequence to an inverse transform component 219. The output of the inverse transform component 219 is fed to a compression unit 220 that renders the now-encoded audio data into the form and format that the audio data had at 201.
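The indexing path just described (a density check, one clock pulse per qualifying bin, then reduction of the register contents to a bin number) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the circuit of Figure 2: the 16-bit register width, the tap positions, the modulo reduction, and the names is_high_density, lfsr_step, and bin_index are all hypothetical choices made for the example.

```python
def is_high_density(magnitudes, noise_floor, min_fraction=0.30):
    """Return True when at least min_fraction of the bins are active.

    A bin counts as active when its magnitude exceeds noise_floor,
    an assumed, caller-supplied estimate of system noise.
    """
    active = sum(1 for m in magnitudes if m > noise_floor)
    return active >= min_fraction * len(magnitudes)


def lfsr_step(state, taps=(16, 14, 13, 11), nbits=16):
    """Advance a Fibonacci LFSR by one clock pulse (assumed 16-bit taps)."""
    feedback = 0
    for tap in taps:
        feedback ^= (state >> (nbits - tap)) & 1
    return (state >> 1) | (feedback << (nbits - 1))


def bin_index(magnitudes, seed, counting_threshold):
    """Clock the LFSR once per bin above counting_threshold, then reduce
    the register contents to a bin number by modulo arithmetic."""
    if not 0 < seed < (1 << 16):
        raise ValueError("seed must be a non-zero 16-bit value")
    state = seed  # digital representation of the private information
    for m in magnitudes:
        if m > counting_threshold:
            state = lfsr_step(state)
    return state % len(magnitudes)
```

Because the index depends only on the seed and on how many bins exceed the counting threshold, any later stage that is given the same seed and sees the same number of qualifying bins arrives at the same bin.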
Refer now to Figures 3A-3C for an understanding of how marking information is encoded on the spectral representations of the units of digitized audio that are provided by the transform component 208 of Figure 2. Figure 3A is a spectral plot representing data output by the transform component 208 that drives the clock generator 212 and that is buffered at 218. Figure 3A shows a plurality of frequency bins 1, 2, 3, ..., j-1, j, j+1, .... Assume that the clock generator 212 thresholds the magnitudes of the frequency bins, in sequence, comparing their contents against an established counting threshold magnitude 300. As an example, presume that every bin whose contents exceed the counting threshold magnitude 300 by a predetermined amount will result in the generation of one clock pulse by the clock generator 212. In this case, the amount by which any bin's magnitude must exceed the magnitude level 300 provides a magnitude margin that will guard against loss caused by subsequent processing that involves quantization as would be required, for example, by decoding and also by reencoding following a decoding.
Presume next that the number obtained from the LFSR 214 indexes to frequency bin j. A binary bit can be encoded in the unit of digitized audio that has just been transformed by considering the immediate neighbors of bin j, that is bin j-1 and bin j+1. A zero, for example, could be encoded according to the invention by making the magnitude of bin j exceed the magnitude of each of its immediate neighbors by a predetermined amount. If this were already the case, no further processing would be required to encode a zero. Presume, however, that the magnitude of the contents of bin j is less than the magnitude of the contents of
each of its neighboring bins, as illustrated in Figure 3A. In this case, the mark coding component 216 would redistribute the spectral energy between the three bins to achieve the results illustrated in Figure 3B. That is, a portion of the magnitude in bin j-1 would be deleted from that bin and added to the magnitude in bin j; similarly, a portion of the magnitude of the contents of bin j+1 would be subtracted from that bin and added to the contents of bin j. This illustrates one way in which one binary state (in this case, a zero) can be encoded into the audio data by redistribution of the spectral energy in the audio information. Figure 3C illustrates how another binary state (in this case, a one) can be encoded into the spectral energy distribution of audio data. Presume that the number derived from the LFSR 214 indexes to bin k in Figure 3C. In this case, the contents of bin k initially have a magnitude 305, while bins k-1 and k+1 have magnitudes (307 and 309, respectively) that are less than the magnitude of bin k. In this case a one is encoded by subtracting a portion of the magnitude of bin k and adding that portion to the contents of bin k-1, and subtracting another portion of the contents of bin k and adding that portion to the contents of bin k+1. These coding examples do not exhaust all of the possible combinations of the magnitudes of three adjacent bins and the ways in which they can be adjusted in order to encode one or another of two states. Neither do they exhaust the conditions that can apply to the contents of the bins prior to encoding. Further, the coding does not have to be binary, nor does it necessarily require three bins or even adjacent bins, although adjacency greatly assists in making the marking imperceptible. However, the invention does contemplate that the coding will be accomplished by redistribution of the spectral content of audio data.
The invention further contemplates that such spectral redistribution will take into account the counting threshold magnitude 300, taking care that the spectral redistribution will not change the number of bins causing clock pulses; this ensures that encoded bins can be identified and decoded. The encoding is also limited in that the spectral redistribution must not produce artifacts that are perceptible to the human auditory system. Thresholding the encoding by identifying noise-like spectral distributions partly achieves this purpose; the use of adjacent bins and limitations on the amount of energy redistributed therebetween also contributes to the imperceptibility of coding.
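The three-bin redistribution of Figures 3B and 3C can be sketched in Python as follows. This is a hedged illustration only: the function name encode_bit, the equal split of each transfer between donor and recipient, the use of the same margin for both bit states, and the decision to raise an error when a bin would cross the counting threshold are assumptions made for the example and are not specified in the description above.

```python
def encode_bit(magnitudes, j, bit, margin, counting_threshold):
    """Return a copy of magnitudes with `bit` encoded at bin j.

    bit 0: bin j ends up above each neighbour by at least `margin`.
    bit 1: bin j ends up below each neighbour by at least `margin`
           (the margin for a one is an assumption of this sketch).
    Magnitude is only moved between bin j and its two neighbours, so the
    sum over the three bins is unchanged.
    """
    if not 1 <= j <= len(magnitudes) - 2:
        raise ValueError("bin j needs two immediate neighbours")
    out = list(magnitudes)
    for n in (j - 1, j + 1):
        if bit == 0:
            shortfall = margin - (out[j] - out[n])
            if shortfall > 0:  # move magnitude from the neighbour into bin j
                out[n] -= shortfall / 2
                out[j] += shortfall / 2
        else:
            shortfall = margin - (out[n] - out[j])
            if shortfall > 0:  # move magnitude from bin j into the neighbour
                out[j] -= shortfall / 2
                out[n] += shortfall / 2
    # The number of bins that cause clock pulses must not change.
    for k in (j - 1, j, j + 1):
        before = magnitudes[k] > counting_threshold
        after = out[k] > counting_threshold
        if before != after:
            raise ValueError("redistribution would cross the counting threshold")
    return out
```

For example, with magnitudes [1.0, 6.0, 2.0, 7.0, 1.0], a margin of 1.0, and a counting threshold of 0.5, encoding a zero at bin 2 yields [1.0, 3.5, 6.25, 5.25, 1.0]; encoding a one at the same bin leaves the magnitudes unchanged because bin 2 already lies below both neighbours by more than the margin.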
Figure 4 illustrates a method for mark coding according to the invention and may be understood with reference to Figure 2. In Figure 4, the mark coding process starts at 400 with the presumption that decoded and decompressed audio data is fed in a sequence of defined and/or definable units to the transform component 208. Each unit of digitized audio is received and transformed by the process 208. When the unit of digitized audio is transformed by the transform component 208 at step 402, it is buffered at 403 and process flow passes to the threshold component 210, where a decision 404 determines whether the spectral distribution indicates that the transformed unit has high spectral density. If not, the negative exit is taken and conventional buffer control is accessed at 405 to ensure proper operation of the buffer 218. If the transformed unit has high spectral density, the positive exit is taken from the decision 404 and the first symbol of a code symbol set 408 is encoded as a pattern of ones and zeros ("bits"). A first location for the first bit is determined in step 410 by clocking the LFSR 214. The first location is indexed by the contents of the clocked LFSR; this location is one of the bins in the last-transformed unit of digitized audio. A bit is encoded at this location in step 412. For so long as bits remain to be encoded for the current symbol, the negative exit will be taken from decision 414, the next location will be accessed by cycling the LFSR 214 in step 410, and the next bit will be encoded in step 412. When the current symbol is encoded, the positive exit is taken from decision 414. Next, in decision 418, if more symbols remain to be encoded, the positive exit is taken, the next symbol is obtained in step 407 and is encoded as just described. When all symbols have been encoded, the negative exit is taken from decision 418.
The buffer 218 is processed in step 405, the next unit of digitized audio is obtained and transformed, and the just-described process is repeated. The mark coding process illustrated in Figure 4 can repeatedly encode a symbol sequence that forms the basis of marking information. Such information can be replicated throughout the audio program information, as deemed necessary for any particular application. This procedure may be employed to mark audio data at any one of a number of points in an audio information processing and distribution configuration. In this regard, it may be used where audio data is introduced or staged through nodes in a communication system such as the internet. It can also
be used to mark audio data that is being processed for storage on readable media such as CDs. Advantageously, the architecture of Figure 2 can be adapted for an encoding appliance that receives analog audio and converts it into an encoded audio data format. The procedure of Figure 4 can, of course, be implemented in a software program and executed in a general purpose digital computer; alternatively, the architecture of Figure 2 and the procedure of Figure 4 can be implemented in application specific integrated circuitry (ASIC).
As is known, quantization of an audio signal for encoding is an inherently lossy procedure. Thus, each time audio information is quantized, loss is introduced. Such loss typically results in alteration of spectral content. Thus, mark coding of audio data in order to insert marking information into audio information as taught, for example, in Figures 3B and 3C will be affected by subsequent quantization of the audio information. This provides the opportunity for three modes of encoding: robust, fragile, and a combination thereof. With reference to Figures 3B and 3C, robust mark coding is accomplished by maximizing the amount of spectral energy redistributed to encode bits up to a limit where such coding becomes perceptible. With careful attention to the limit, bits can be robustly encoded so as to mark audio information in a way that is tractable to a decoder after marked audio information has been subjected to quantization after mark coding. Fragile mark coding limits the amount of energy redistributed between adjacent bins to a level that would result in loss of encoded bits after subsequent quantization; for example, with a known level of noise introduced by a quantization procedure utilized in a known compression algorithm, those skilled in the art will be able to encode bits according to this invention by redistributing an amount of energy that is small enough to result in a statistically significant loss of bits after marked audio data has been quantized subsequent to marking.
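The difference between robust and fragile coding can be illustrated with a small Python sketch. It assumes, purely for illustration, a uniform quantizer that rounds each magnitude to the nearest multiple of a step size; real compression algorithms quantize differently, and the names quantize and zero_survives are hypothetical.

```python
def quantize(magnitudes, step):
    """Round each magnitude to the nearest multiple of `step`
    (an assumed stand-in for the quantizer of a compression algorithm)."""
    return [round(m / step) * step for m in magnitudes]


def zero_survives(magnitudes, j, step):
    """Return True when bin j still exceeds both neighbours after
    quantization, that is, when an encoded zero is still detectable."""
    q = quantize(magnitudes, step)
    return q[j] > q[j - 1] and q[j] > q[j + 1]


# Robust: a large amount of energy was moved into the middle bin.
robust = [3.0, 6.3, 4.8]
# Fragile: the middle bin exceeds its neighbours by less than the step.
fragile = [4.0, 4.2, 3.9]
```

With a step of 1.0, the robust triple quantizes to magnitudes of 3, 6, and 5 and the zero remains readable, while the fragile triple collapses to three equal magnitudes and the bit is lost.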
Decoding of marking information inserted into audio information according to this invention presumes that the marking information is embedded in spectral energy by redistributing the energy contents of adjacent frequency bins to implement a binary code, for example, according to the encoding illustrated in Figures 3B and 3C. Extraction of the marking information requires the detection of mark coded bins by indexing using an LFSR as illustrated in Figure 2. Since
the mark coding procedure does not alter the spectral population of a transformed unit of digitized audio, provision of the same private information used to mark code the audio information will result in production of numbers or indexes by the LFSR in the manner described above; in decoding, of course, the indexes will point to coding locations. Decoding will require detection of bit states, accumulation of bits, and identification of symbols. Of course, error correction codes may be employed with this invention.
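Detection of bit states and accumulation of bits might look like the following Python sketch. It assumes the binary convention of Figures 3B and 3C (a bin above both neighbours reads as a zero, a bin below both reads as a one); the names read_bit and bits_to_symbol, the treatment of other patterns as undetermined, and the most-significant-bit-first packing are hypothetical choices. A practical decoder would presumably also test the predetermined margin.

```python
def read_bit(magnitudes, j):
    """Return 0, 1, or None for the bin at index j.

    0    -> bin j exceeds both immediate neighbours
    1    -> bin j is below both immediate neighbours
    None -> neither pattern holds, so no bit is detected
    """
    left, mid, right = magnitudes[j - 1], magnitudes[j], magnitudes[j + 1]
    if mid > left and mid > right:
        return 0
    if mid < left and mid < right:
        return 1
    return None


def bits_to_symbol(bits):
    """Accumulate detected bits into an integer symbol, most
    significant bit first (an assumed packing order)."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value
```

The bin index j would come from an LFSR seeded with the same private information that was used for marking, as the preceding paragraph explains.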
A particularly efficacious embodiment of the invention is illustrated in Figure 5, where an audio encoder 500 receives an unencoded analog audio signal from a source 502. The analog signal is initially framed at 510. Each frame is transformed (by FFT, DFT, or any equivalent) at 511 and then quantized at 512. Without adaptation to incorporate the invention, the output of the quantizer would be fed to a data coding and formatting unit 513. However, the system architecture of Figure 2 can be adapted for the audio encoder 500. This adaptation, which is presented in Figure 5, assumes that audio data has already been transformed into its spectral components. Thus, in the audio encoder 500, the output of the quantizer 512 is a series of spectral representations, each ready for mark coding according to the invention. A mark coding apparatus 522 that operates according to the invention includes threshold, clock generation, LFSR, and mark coding components. A seed memory or register 520 stores private information for the LFSR. A buffer 523 stages transformed frames for processing by the marking apparatus 522 and then coding and formatting by data coding and formatting unit 513.
Clearly, other embodiments and modifications of this invention will occur readily to those of ordinary skill in the art in view of these teachings. Therefore, this invention is to be limited only by the following claims, which include all such embodiments and modifications when viewed in conjunction with the above specification and accompanying drawings.
I CLAIM: