WO2002013436A1

WO2002013436A1 - Method and system for steganographically embedding information bits in source signals

Info

Publication number: WO2002013436A1
Application number: PCT/US2001/024468
Authority: WO
Inventors: Mahalingam Ramkumar; Ali N. Akansu
Original assignee: Avway.Com Inc.
Priority date: 2000-08-09
Filing date: 2001-08-02
Publication date: 2002-02-14
Also published as: EP1317809A4; JP2004506379A; AU2001278163A1; EP1317809A1

Abstract

A method and system for embedding (E) information bits in a source signal (A) such as an image, audio or video. The embedding is performed optimally for a given worst-case scenario of unintentional or intentional attack on the host signal (to remove the embedded bits), and for a given distortion of the host signal due to information embedding.

Description

METHOD AND SYSTEM FOR STEGANOGRAPHICALLY EMBEDDING INFORMATION BITS IN SOURCE SIGNALS

This invention relates to a method and system for embedding information bits in a host signal. The embedding may be performed to determine origin of any perfect or imperfect copies of the composite (host plus message) signal, or to use the host signal as a cover for secret or covert communications, over a channel which is primarily meant for transmitting the host signal only.

BACKGROUND OF THE INVENTION

Data hiding or steganography is the art of hiding a message signal in a host signal, without any perceptual distortion of the host signal. The composite (host plus message) signal is also referred to as stego-signal. Though steganography is often confused with the relatively well-known cryptography, the two are but loosely related. Cryptography is about hiding the contents of a message. Steganography, on the other hand, is about concealing the very fact that a message is hidden. Steganography may be considered as communication through subliminal channels, or secret communication.

Rapid increases in the bandwidth available for dissemination and storage of digital content, and the availability of software tools for editing multimedia content such as video, images or audio, call for systems and methods to establish the origin of such content. In addition, large volumes of multimedia content being exchanged over insecure channels, such as the Internet, provide within themselves subliminal steganographic channels for secure or secret communications.

The proliferation of digital multimedia as opposed to conventional analog forms, is primarily a result of (1) the ease with which digital data can be exchanged over the Internet, and (2) the emergence of efficient multimedia data compression techniques.

The first reason listed above is also a major cause for concern. Unlimited perfect copies of the original content can be made, and distributed easily. It was this concern of protecting intellectual property rights of multimedia data in digital form, that primarily triggered researchers to find ways to watermark multimedia data. Watermarking the content is done by embedding some data in the host signal (original content). The embedded data may be an imperceptible signature, which the owner of the multimedia content should be able to extract when a dispute regarding ownership occurs.

Data hiding in multimedia could help in providing proof of origin and distribution of content. Multimedia content providers would be able to communicate with the compliant multimedia players (or renderers) through the subliminal, steganographic channel. This communication may control or restrict access to multimedia content, and carry out e-commerce for pay-per-use implementations.

A typical application of data hiding for multimedia content delivery may involve the content providers supplying the raw multimedia data (say a full length movie) along with some hidden agents or control data. The job of the distributors would be to package the content in some suitable format (such as MPEG) understandable by the player, for distribution of the multimedia through DVDs/CDs or live digital broadcasts, or by hosting web sites for downloads or streaming. The compliant multimedia players will typically be connected to the Internet.

In conventional multimedia distribution, the content provider loses all control over how the multimedia is used or abused the moment it is acquired by another party. The key idea behind data hiding is to re-establish control whenever the content is used. The content provider, by hiding an agent in the raw data, hopes to control access to the multimedia content. This can be done with the cooperation of the players, and an established protocol for communication between the content providers and the compliant multimedia players.

Data hiding can be broadly classified into two categories depending on whether the original content is needed for extraction of the hidden bits: (1) non-oblivious methods need the original content for extracting the hidden bits; and (2) oblivious detection methods extract the hidden bits without any knowledge of the original.

In most data hiding methods, the sequence of bits to be embedded, viz. B, is converted to a form suitable for embedding in a cover content. Initially, the bit sequence is converted to a signature sequence. Thereafter, the signature sequence is embedded in the cover content by an embedding function to obtain the stego-content.

From a signal processing perspective, data hiding methods can be classified into two categories, depending on the type of embedding and detecting operators. The first category includes methods where the embedding function adds the signature sequence linearly to the cover content to form the stego-content, and the detector detects from the stego-content via correlative processing (these methods are referred to as Type I methods in data hiding literature). In the second category the embedding function and the detector are non-linear, typically employing quantizers (these methods are referred to as Type II methods in data hiding literature). One of the important characteristics of the non-linear methods is their ability to suppress the noise due to the original content (or self-noise), even though the original content is not available at the receiver.

The present invention provides a unique data hiding technique that substantially reduces the effect that noise, distortion or corruption of the host signal have on the detected signal so as to greatly enhance the integrity of steganography techniques employing oblivious detection of the hidden data. The crux of the invention is a class of methods referred to as Type III methods of which Types I and II are just special cases. An optimal choice of parameters for the proposed Type III methods depending on the engineering constraints, can substantially improve the performance of data hiding.

The present invention also provides many additional advantages, which shall become apparent as described below.

SUMMARY OF THE INVENTION

A method for embedding a message signal in a host signal, the method comprising the steps of: (a) embedding the message signal into the host signal, thereby producing a stego signal; and (b) detecting an estimate of the message signal from the stego signal; provided that the detecting step (b) is not an exact inverse of the embedding step (a), and the host signal cannot be exactly extracted from the stego signal.

The embedding step (a) produces a value $b_i$ in the stego signal from a value $a_i$ in the host signal, and wherein the embedding step (a) comprises limiting to a limit value $\beta/2$ a magnitude of difference between $b_i$ and $a_i$.

Furthermore, the embedding step (a) employs a continuous periodic function to produce the stego signal, and wherein the detecting step (b) employs a continuous periodic function to produce the estimated message signal. The continuous periodic function is a triangular function f(x) having a period $\Delta$, wherein:

$-\Delta/4 \le f(x) \le \Delta/4$ for all x;

$f(0) = -\Delta/4$; and

$f(\Delta/2) = \Delta/4$.

Optionally, the embedding step (a) produces a value $b_i$ in the stego signal from a value $a_i$ in the host signal and a value $s_i$ in the message signal, such that the embedding step (a):

(i) is subject to a maximum distortion constraint P,

(ii) employs a continuous periodic function having a period $\Delta$, and

(iii) is represented by the function $b_i = E(a_i, s_i)$, and employs an algorithm as follows:

if $\mathrm{rem}(a_i/\Delta) > \Delta/2$, then $p_i = 3\Delta/4 - \mathrm{rem}(a_i/\Delta)$,

else $p_i = \mathrm{rem}(a_i/\Delta) - \Delta/4$;

$e_i = s_i - p_i$;

if $|e_i| > \beta/2$, then $e_i = \mathrm{sign}(e_i)\,\beta/2$;

$q_i = \mathrm{rem}(a_i/\Delta)$;

if $q_i > \Delta/2$, then $e_i = -e_i$;

if $a_i > 0$, then $b_i = a_i + e_i$, else $b_i = a_i - e_i$.

The method of the present invention is particularly useful when the stego signal is corrupted or distorted prior to detecting step (b). In this embodiment, a value $b_i$ in the stego signal is modified after the embedding step (a) to yield a value $c_i$ in the corrupted or distorted stego signal, such that the detecting step (b):

(i) produces a value $s_{ei}$ in the estimated message signal from a value $c_i$ in a distorted stego signal,

(ii) employs a continuous periodic function having a period $\Delta$, and

(iii) is represented by the function $s_{ei} = D(c_i)$, and employs an algorithm as follows:

$q_i = \mathrm{rem}(c_i/\Delta)$;

if $q_i > \Delta/2$, then $s_{ei} = 3\Delta/4 - q_i$,

else $s_{ei} = q_i - \Delta/4$.

Preferably the host signal is a sequence $a_i$, for i = 1 to N; the message signal is a sequence $s_i$, for i = 1 to N; the stego signal is a sequence $b_i$, for i = 1 to N; the corrupted or distorted stego signal is a sequence $c_i$, for i = 1 to N; and the estimated message signal is a sequence $s_{ei}$, for i = 1 to N.

The embedding step (a) preferably (i) imposes a limit $\beta/2$ on a magnitude difference between a value $b_i$ in the stego signal that is produced from a value $a_i$ in the host signal; and (ii) employs a continuous periodic function having a period $\Delta$ to produce the stego signal, wherein the limit $\beta/2$ and the period $\Delta$ are chosen to minimize a mean square distance between the message signal and the estimated message signal, subject to a maximum distortion constraint P of the embedding step (a).

The present invention also provides a method for mapping K information bits to a message signal $s_i$, i = 1 to N. This method comprises the step of: grouping the K information bits together to represent one of $2^L$ symbols, wherein each of the $2^L$ symbols is mapped to a basis vector, or its negative, of a $2^{L-1} \times 2^{L-1}$ orthogonal transform matrix. The orthogonal transform matrix is obtained from a cyclic all-pass filter and its circular shifts. The cyclic all-pass filter is preferably obtained from a key.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a block diagram of the data embedding, channel and detection operation according to the present invention;

Fig. 2 is a graph depicting a periodic triangular function employed by the detector D of Fig. 1;

Fig. 3(a) is a graph demonstrating that the distortion introduced while embedding S in A (to obtain B) of Fig. 1 in accordance with Type II will be uniformly distributed between $-\Delta/2$ and $+\Delta/2$;

Fig. 3(b) is a graph depicting the distribution of the distortion introduced in accordance with the method of the present invention; and

Fig. 3(c) is a graph depicting the distribution of the limiting noise $t_i$.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

The present invention is a method for efficient secure communication over subliminal channels provided by multimedia host signals like audio, images and video transmitted over any channel. For example, the host signal may be transmitted over the Internet or distributed in storage mediums by other means or even transmitted over analog channels, such as those used for analog television or radio broadcasts. Typically the host signal is expected to undergo some distortion before it reaches one or many end points where it may be stored or rendered.

In the method described herein, the host signal may be any form of naturally occurring signals, such as audio, image or video, or artificially synthesized versions of them. The host signal may further be represented in some transform domain. The choice of the transform may depend on the nature of the application. For example, if the host signal is an image and is not expected to be re-scaled, resized or rotated, any unitary transform may be used. On the other hand, if the image is likely to undergo rotation, scaling and/or translation, a Rotation-Scale-Translation invariant transform may be used. If the image is cropped, data embedding may be performed in many blocks of the image, so that the hidden bits can be extracted even if one such block survives. In general, the host signal can be coefficients of a one-to-one transform or a many-to-one transform. In the method described herein, the host signal can therefore be considered as a vector or a sequence of N real or complex numbers, represented by $A = [a_1\ a_2\ \cdots\ a_N]$.

A sequence of K bits, represented by

$I = [i_1\ i_2\ \cdots\ i_K]$, $i_j = 1/0$ for $1 \le j \le K$,

is mapped by a mapping M to a signature sequence S, $M: I \to S$, where $S = [s_1\ s_2\ \cdots\ s_N]$.

An embedder E embeds the sequence S in A to obtain the stego sequence B, B = E(A,S), where

$B = [b_1\ b_2\ \cdots\ b_N]$, the embedding being performed element-wise,

$b_1 = E(a_1, s_1)$, $b_2 = E(a_2, s_2)$, $b_3 = E(a_3, s_3)$, ..., $b_N = E(a_N, s_N)$,

subject to the constraint that $d(A,B) \le P$, where d(A,B) is some distance measure of signals A and B, and P is the maximum permitted distortion of the host signal. In the preferred embodiment the distance measure is the mean square error:

$$d(A,B) = \frac{1}{N}\sum_{i=1}^{N} |a_i - b_i|^2 .$$

The stego sequence B may undergo some distortion before it reaches the detector as C, given by C = B + Z, where

$Z = [z_1\ z_2\ \cdots\ z_N]$ is the noise in the channel used for transmitting the host signal, and $C = [c_1\ c_2\ \cdots\ c_N]$. The detector D obtains an estimate $S_e$ of the embedded signature sequence S:

S_e = D(C) .

The block diagram of data embedding, the channel and detection operation is shown in FIGURE 1.

Based on this generalization of the embedding and detecting functions E and D , prior art in this field can be categorized into two types.

Characteristics of Type I

• B = E(A,S) → B = A + S

• D(B) = A + S ≠ S

The above two equations imply that E and D are not inverses. In addition, if S is known one can obtain the original host sequence A from B as A = B - S .

Characteristics of Type II

• B = E(A,S)

• D(B) = S

Unlike Type I methods, the above two equations show that for Type II methods E and D are exact inverses. Additionally, unlike Type I methods, it is not possible to obtain A exactly, given B and S .

At the core of this invention is a class of embedding and detection operators E and D, which we shall refer to as Type III.

Characteristics of Type III

• B = E(A,S)

• D(B) ≠ S

The above two equations illustrate that E and D are not exact inverses (like Type I and unlike Type II). Further, given S and B it is not possible to obtain A (like Type II and unlike Type I).

In a preferred embodiment described herein, the detector D, where

$S_e = D(C)$, $S_e = [s_{e1}\ s_{e2}\ \cdots\ s_{eN}]$, is implemented by the following algorithm:

$q_i = \mathrm{rem}(c_i/\Delta)$;

if $q_i > \Delta/2$, then $s_{ei} = 3\Delta/4 - q_i$,

else $s_{ei} = q_i - \Delta/4$.

In the above algorithm, rem(x) stands for the remainder of the division operation x, computed on magnitudes so that the sign of the dividend is ignored. For example, rem(5/4)=1, rem(2/2)=0, rem(-6/4)=rem(6/4)=2.

The choice of the parameter $\Delta$ is dictated by the distortion constraint P and the energy of the channel noise Z. The detector may also be thought of as employing a periodic triangular function shown in FIGURE 2, $y = f(x) = f(x + m\Delta)$ for integer m. Also,

$-\Delta/4 \le f(x) \le \Delta/4$ for all x,

and specifically, $f(0) = -\Delta/4$ and $f(\Delta/2) = \Delta/4$.
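The detector described above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's reference implementation: the function names `rem` and `detect` are chosen here for convenience, and `rem` is assumed to act on the magnitude of its argument, as the rem(-6/4) = rem(6/4) example suggests.

```python
import math

def rem(x, delta):
    """Remainder of |x| divided by delta (sign of x ignored, per the text's examples)."""
    return math.fmod(abs(x), delta)

def detect(c, delta):
    """Triangular detector D: estimate the message value carried by a received value c."""
    q = rem(c, delta)
    if q > delta / 2:
        return 3 * delta / 4 - q   # falling half of the triangle
    return q - delta / 4           # rising half of the triangle

delta = 40
for x in (0, 10, 20, 30, 40, 45, -5):
    print(x, detect(x, delta))
```

With `delta = 40` the function takes the value -10 at 0 and +10 at 20, which matches the stated end points f(0) = -Δ/4 and f(Δ/2) = Δ/4, and it repeats with period Δ.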

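The embedding operation $b_i = E(a_i, s_i)$ is implemented by the following algorithm:

if $\mathrm{rem}(a_i/\Delta) > \Delta/2$, then $p_i = 3\Delta/4 - \mathrm{rem}(a_i/\Delta)$,

else $p_i = \mathrm{rem}(a_i/\Delta) - \Delta/4$;

$e_i = s_i - p_i$;

if $|e_i| > \beta/2$, then $e_i = \mathrm{sign}(e_i)\,\beta/2$;

if $\mathrm{rem}(a_i/\Delta) > \Delta/2$, then $e_i = -e_i$;

if $a_i > 0$, then $b_i = a_i + e_i$, else $b_i = a_i - e_i$.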

In the above algorithm, β is a parameter, the choice of which is dictated by the distortion constraint P and the energy of the channel noise Z. Also sign(x) equals +1 if the quantity x is positive and sign(x) equals -1 if the quantity x is negative. For example, sign(-20) = sign(-1) = -1, and sign(11) = sign(1) = +1.

As an example of the embedding and detection operations, let

A=[-65, -250, 19, 43, -172, 179, 178, -6], and

S=[10, 10, -10, 10, -10, 10, -10, 10],

$\Delta = 40$, and $\beta = 10$ (note that $-\Delta/4 \le s_i \le \Delta/4$ for all i).

Now B = E(A,S) is given by

B=[-60,-255, 14,48,-167, 180, 173,-11]. Let Z=[-4, -8, 2, -10, -5, 3, -6, -4]. Therefore,

C=B+Z=[-64, -263, 16, 38, -172, 183, 167, -15], and

S_e = D(C) is now

S_e=[6,7,6,-8,2,7,-3,5].

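Now let us consider $b_i = E(a_i, s_i)$ for i = 1, where $a_1 = -65$ and $s_1 = 10$.

If $\mathrm{rem}(a_i/\Delta) > \Delta/2$, then $p_i = 3\Delta/4 - \mathrm{rem}(a_i/\Delta)$: here rem(-65/40) = 25 > 20, so $p_1 = 30 - 25 = 5$.

$e_i = s_i - p_i$: $e_1 = 10 - 5 = 5$.

If $|e_i| > \beta/2$, then $e_i = \mathrm{sign}(e_i)\,\beta/2$: here $|e_1| = 5$ does not exceed $\beta/2 = 5$, so $e_1$ is unchanged.

If $\mathrm{rem}(a_i/\Delta) > \Delta/2$, then $e_i = -e_i$: here rem(-65/40) = 25 > 20, so $e_1 = -5$.

If $a_i > 0$, then $b_i = a_i + e_i$, else $b_i = a_i - e_i$: here $b_1 = -65 - (-5) = -60$.

Now $c_1 = b_1 + z_1 = -60 - 4 = -64$. For detection,

$q_i = \mathrm{rem}(c_i/\Delta)$: $q_1 = \mathrm{rem}(-64/40) = 24$.

If $q_i > \Delta/2$, then $s_{ei} = 3\Delta/4 - q_i$, else $s_{ei} = q_i - \Delta/4$: here $q_1 = 24 > 20$, so $s_{e1} = 30 - 24 = 6$.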
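The whole worked example can be replayed with a short script. The sketch below is an illustration under stated assumptions: the function names `rem`, `embed` and `detect` are introduced here, and `rem` is taken to operate on magnitudes as in the rem(-6/4) = rem(6/4) example. The sequences A, S and Z and the parameters Δ = 40, β = 10 are those of the example.

```python
import math

def rem(x, delta):
    """Remainder of |x| divided by delta (sign of x ignored)."""
    return math.fmod(abs(x), delta)

def embed(a, s, delta, beta):
    """Embedder E: move host value a toward the point that detects as s,
    but never by more than beta/2."""
    q = rem(a, delta)
    # p is what the detector would currently read from the unmodified host value
    p = 3 * delta / 4 - q if q > delta / 2 else q - delta / 4
    e = s - p
    if abs(e) > beta / 2:
        e = math.copysign(beta / 2, e)   # limiting step
    if q > delta / 2:
        e = -e                           # falling half of the triangle
    return a + e if a > 0 else a - e

def detect(c, delta):
    """Detector D: triangular periodic function of the received value."""
    q = rem(c, delta)
    return 3 * delta / 4 - q if q > delta / 2 else q - delta / 4

A = [-65, -250, 19, 43, -172, 179, 178, -6]
S = [10, 10, -10, 10, -10, 10, -10, 10]
Z = [-4, -8, 2, -10, -5, 3, -6, -4]
delta, beta = 40, 10

B = [embed(a, s, delta, beta) for a, s in zip(A, S)]
C = [b + z for b, z in zip(B, Z)]
S_e = [detect(c, delta) for c in C]
print(B)
print(C)
print(S_e)
```

The printed sequences agree with the B, C and S_e listed in the example, and no element of B differs from A by more than β/2 = 5.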

For Type II systems ($\Delta = \beta$) the distortion introduced, viz. B - A, for embedding S in A (to obtain B) will be uniformly distributed between $-\Delta/2$ and $+\Delta/2$ (FIGURE 3a), and the average energy of the distortion introduced in A will be $\Delta^2/12$. For Type III systems, the distribution of the distortion introduced is depicted in FIG. 3b. The average energy of the distortion for a Type III system is given by

$$\frac{\beta^2(3\Delta - 2\beta)}{12\Delta}.$$

While for Type II systems $s_i = D(b_i)$, for the proposed Type III system $D(b_i) - s_i = t_i$, where $t_i$ is noise due to "limiting" (limiting occurs when $\beta < \Delta$). The distribution of the limiting noise $t_i$ is shown in FIG. 3c, and the average energy of the limiting noise is given by

$$\frac{(\Delta - \beta)^3}{12\Delta}.$$
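Both energies can be checked by a short calculation. The derivation below is a sketch that assumes the unlimited embedding error $e$ is uniformly distributed on $[-\Delta/2, \Delta/2]$ (the Type II case stated above) and that limiting clips $|e|$ to $\beta/2$; the symbol $e$ for the unlimited error is notation introduced here.

```latex
% Assumption: e ~ Uniform[-\Delta/2, \Delta/2]; the embedded error is e clipped to |e| <= \beta/2.
\begin{align*}
\text{distortion energy}
  &= \frac{2}{\Delta}\left[\int_{0}^{\beta/2} x^{2}\,dx
     + \left(\frac{\beta}{2}\right)^{2}\left(\frac{\Delta}{2}-\frac{\beta}{2}\right)\right]
   = \frac{2}{\Delta}\cdot\frac{3\beta^{2}\Delta-2\beta^{3}}{24}
   = \frac{\beta^{2}(3\Delta-2\beta)}{12\Delta},\\[4pt]
\text{limiting noise energy}
  &= \frac{2}{\Delta}\int_{\beta/2}^{\Delta/2}\left(x-\frac{\beta}{2}\right)^{2}dx
   = \frac{2}{\Delta}\cdot\frac{1}{3}\left(\frac{\Delta-\beta}{2}\right)^{3}
   = \frac{(\Delta-\beta)^{3}}{12\Delta}.
\end{align*}
```

Setting $\beta = \Delta$ recovers the Type II values: a distortion energy of $\Delta^2/12$ and zero limiting noise.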

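The optimal choice of the parameters $\Delta$ and $\beta$ for a given signal-to-noise ratio (snr),

$$\mathrm{snr} = \frac{\text{Signal Energy}}{\text{Noise Energy}} = \frac{P}{\text{Energy of Channel Noise } Z} = \frac{\beta^2(3\Delta - 2\beta)/(12\Delta)}{\text{Energy of Channel Noise } Z},$$

is shown in Table 1. In Table 1,

SNR = 10 log₁₀(snr) dB, and

From the values of $\Delta$ and signal energy P, $\beta$ can be obtained by solving

$$\frac{\beta^2(3\Delta - 2\beta)}{12\Delta} = P.$$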
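The text does not say how this cubic in β is to be solved. One simple option, shown here purely as an illustration, is bisection: the left-hand side increases monotonically with β on the interval [0, Δ] (its derivative is β(Δ - β)/(2Δ)), so a single root exists there whenever 0 ≤ P ≤ Δ²/12. The function names are hypothetical.

```python
def distortion_energy(beta, delta):
    """Average Type III embedding distortion: beta^2 (3*delta - 2*beta) / (12*delta)."""
    return beta ** 2 * (3 * delta - 2 * beta) / (12 * delta)

def solve_beta(P, delta, tol=1e-9):
    """Find beta in [0, delta] with distortion_energy(beta, delta) == P by bisection.

    Bisection is an assumption of this sketch, not a method named in the text.
    """
    lo, hi = 0.0, float(delta)
    if not 0.0 <= P <= distortion_energy(hi, delta):
        raise ValueError("P must lie between 0 and delta**2 / 12")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if distortion_energy(mid, delta) < P:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

P = distortion_energy(10, 40)      # energy for the example's delta = 40, beta = 10
print(P, solve_beta(P, 40))
```

For the example parameters Δ = 40 and β = 10 the distortion energy is 100·100/480, about 20.83, and solving with that value of P returns β close to 10.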

Table 1 - Optimal Choice of k for different SNRs

The optimal parameters are chosen to minimize

$$J = \frac{(s_1 - s_{e1})^2 + (s_2 - s_{e2})^2 + \cdots + (s_N - s_{eN})^2}{N},$$

which is the normalized mean square distance between the embedded signature sequence and the detected signature sequence. The minimization is performed under the assumption that the channel noise Z has a Gaussian distribution. If Z is zero mean and has a variance of $\sigma^2$, then

where

and

The mapping $M: I \to S$ in the preferred embodiment takes the following form. The bit sequence I of K bits is grouped into K/L L-bit symbols. Each L-bit symbol will be mapped to one of $2^{L-1}$ basis vectors of an orthogonal transform. Thus we can embed $N/2^{L-1}$ symbols or $NL/2^{L-1}$ bits in the sequence A. For example, if N = 8192, for L = 2, 3, 4, 5, 6, 7, 8, 9, and 10, K = 8192, 6144, 4096, 2560, 1536, 896, 512, 288, and 160 bits respectively.
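The capacity figures follow directly from the formula K = N·L / 2^(L-1). A small sketch (the function name `capacity_bits` is introduced here for illustration):

```python
def capacity_bits(N, L):
    """Bits that fit in a host sequence of length N with L-bit symbols.

    Each symbol occupies one segment of Q = 2**(L-1) coefficients,
    so N // Q symbols of L bits each can be embedded.
    """
    Q = 2 ** (L - 1)
    return (N // Q) * L

for L in range(2, 11):
    print(L, capacity_bits(8192, L))
```

Larger L lowers the bit count but spreads each symbol over a longer basis vector.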

In the preferred embodiment, L bits corresponding to each symbol are assumed to represent a decimal number between 1 and $2^{L-1}$. This number is used as the index of the basis vector to be chosen.

The basis vectors of a Q x Q orthogonal transform, where $Q = 2^{L-1}$, are obtained from a random seed as follows. The random seed (or key) is used to generate a uniformly distributed random sequence

$[\theta_1, \theta_2, \ldots, \theta_{Q/2-1}]$, $-\pi \le \theta_i \le \pi$.

The $Q/2 - 1$ random numbers define the phase of the discrete Fourier transform (DFT) of a sequence H. The magnitudes of the discrete Fourier coefficients are assumed to be unity. Such a sequence H is cyclic all-pass of length Q. H is orthogonal to all its cyclic shifts. Such a sequence derived from a random seed, together with all its cyclic shifts, forms a complete basis, and can therefore be considered as the basis vectors of a Q x Q unitary transform matrix.

As an example, let Q=8. Let the $Q/2 - 1 = 3$ random numbers be

[-2.7489, -0.7854, 1.1781].

These random numbers are the angles of the Discrete Fourier Transform coefficients of H. The angles of the 8 coefficients of H are

[0, -2.7489, -0.7854, 1.1781, 0, -1.1781, 0.7854, 2.7489]

(the second half mirrors the first with opposite sign, so that H is real) and their magnitudes are equal to 1. The cyclic all-pass filter H is obtained by inverse Discrete Fourier Transform as

[0.2915, -0.1499, 0.3999, -0.0415, 0.5621, 0.5034, -0.2534, -0.3121].
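The construction of H can be sketched with a plain inverse DFT. The code below is an illustration under assumptions: the function names are introduced here, the phases at index 0 and Q/2 are set to zero as in the example, and Python's `random` module stands in for whatever keyed generator an implementation would actually use.

```python
import cmath
import math
import random

def allpass_from_phases(thetas):
    """Build a real cyclic all-pass sequence H of length Q = 2*(len(thetas)+1).

    thetas gives the DFT phases at indices 1 .. Q/2-1; the remaining phases
    follow from conjugate symmetry so that H is real. All magnitudes are 1.
    """
    Q = 2 * (len(thetas) + 1)
    phase = [0.0] * Q
    for k, th in enumerate(thetas, start=1):
        phase[k] = th
        phase[Q - k] = -th
    spectrum = [cmath.exp(1j * ph) for ph in phase]
    # Inverse DFT: H[n] = (1/Q) * sum_k X[k] * exp(+2*pi*j*k*n/Q)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / Q)
                for k in range(Q)).real / Q
            for n in range(Q)]

def allpass_from_key(key, Q):
    """Derive the phases from a key (hypothetical use of random.Random as the generator)."""
    rng = random.Random(key)
    return allpass_from_phases([rng.uniform(-math.pi, math.pi)
                                for _ in range(Q // 2 - 1)])

H = allpass_from_phases([-2.7489, -0.7854, 1.1781])
print([round(h, 4) for h in H])

def circ_corr(x, m):
    """Inner product of x with its own circular shift by m."""
    return sum(x[n] * x[(n - m) % len(x)] for n in range(len(x)))

print([round(circ_corr(H, m), 6) for m in range(8)])
```

The first print reproduces the H of the example to four decimals. The second shows the property the mapping relies on: H has unit energy and is orthogonal to each of its nonzero circular shifts.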

Each segment of length Q of the signature sequence S of length N carries information pertaining to one symbol between 0 and 2Q-1.

Symbol sequence: $[y_1, y_2, \ldots, y_{N/Q}]$; $0 \le y_i \le 2Q-1$ for all i

Signature sequence: $[S_1, S_2, \ldots, S_{N/Q}] = S$

The algorithm for obtaining the signature sequence is as follows:

for all i

sign = 1;

if $y_i \ge Q$, then shift = $y_i - Q$; sign = -1;

else shift = $y_i$;

$S_i$ = circularshift(sign x H, shift);

For example, if

H = [0.2915, -0.1499, 0.3999, -0.0415, 0.5621, 0.5034, -0.2534, -0.3121] and $y_i = 2$ (circular shift by 2), then

$S_i$ = [-0.2534, -0.3121, 0.2915, -0.1499, 0.3999, -0.0415, 0.5621, 0.5034].

As another example, if $y_i = 10$ (circular shift by 10-8=2 followed by negation), then

$S_i$ = [0.2534, 0.3121, -0.2915, 0.1499, -0.3999, 0.0415, -0.5621, -0.5034].
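The symbol-to-segment mapping above amounts to a signed circular shift. A minimal sketch, with the function name `symbol_to_segment` introduced here and the shift taken to the right, as the two examples imply:

```python
H = [0.2915, -0.1499, 0.3999, -0.0415, 0.5621, 0.5034, -0.2534, -0.3121]

def symbol_to_segment(y, H):
    """Map a symbol y in 0 .. 2Q-1 to a length-Q signature segment.

    Symbols below Q select a circular shift of H; symbols Q .. 2Q-1
    select the same shifts with the sign flipped.
    """
    Q = len(H)
    if not 0 <= y < 2 * Q:
        raise ValueError("symbol out of range")
    sign, shift = (-1.0, y - Q) if y >= Q else (1.0, y)
    # Right circular shift: output position n takes H[(n - shift) mod Q]
    return [sign * H[(n - shift) % Q] for n in range(Q)]

print(symbol_to_segment(2, H))
print(symbol_to_segment(10, H))
```

The two printed segments are the ones given for y = 2 and y = 10.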

The algorithm for the inverse mapping $M^{-1}: S_e \to I_e$ is as follows. Each segment of length Q of the detected sequence $S_e = [S_{e1}, S_{e2}, \ldots, S_{e,N/Q}]$ corresponds to a symbol. The embedded symbol is estimated as follows:

HH = DFT(H)

for all i

SS = DFT($S_{ei}$);

YY = IDFT(SS.*HH);

$y_{ei}$ = index(max(abs(YY)));

if (YY[$y_{ei}$]) < 0, then $y_{ei} = y_{ei} + Q$;

In the above algorithm, DFT stands for Discrete Fourier Transform and IDFT stands for Inverse DFT. $y_{ei}$ is the estimate of $y_i$, which is the symbol embedded in the i'th segment of S. Finally, the binary representation of $y_{ei}$ yields the corresponding sequence of bits $I_{ei}$, and

$I_e = [I_{e1}, I_{e2}, \ldots, I_{e,N/Q}]$.

$I_e$ is the estimate of the hidden bit sequence I.
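The inverse mapping can be sketched as a circular cross-correlation computed through the DFT. Two points in this sketch are assumptions rather than statements from the text: it multiplies by the complex conjugate of DFT(H), which is what makes the product a correlation whose peak lands at the shift (the printed algorithm writes the product without a conjugate), and it uses 0-based shift indices. The helper names are introduced here.

```python
import cmath
import math

H = [0.2915, -0.1499, 0.3999, -0.0415, 0.5621, 0.5034, -0.2534, -0.3121]

def dft(x, inverse=False):
    """Naive O(Q^2) DFT / inverse DFT, adequate for short segments."""
    Q = len(x)
    sgn = 1 if inverse else -1
    out = [sum(x[n] * cmath.exp(sgn * 2j * math.pi * k * n / Q) for n in range(Q))
           for k in range(Q)]
    return [v / Q for v in out] if inverse else out

def symbol_to_segment(y, H):
    """Forward mapping: signed right circular shift of H."""
    Q = len(H)
    sign, shift = (-1.0, y - Q) if y >= Q else (1.0, y)
    return [sign * H[(n - shift) % Q] for n in range(Q)]

def segment_to_symbol(segment, H):
    """Estimate the symbol from a (possibly noisy) detected segment."""
    Q = len(H)
    HH = dft(H)
    SS = dft(segment)
    # Conjugate product = circular cross-correlation of segment with H (assumption, see lead-in)
    YY = dft([s * h.conjugate() for s, h in zip(SS, HH)], inverse=True)
    shift = max(range(Q), key=lambda m: abs(YY[m]))
    # A negative peak means the negated basis vector was sent
    return shift + Q if YY[shift].real < 0 else shift

recovered = [segment_to_symbol(symbol_to_segment(y, H), H) for y in range(16)]
print(recovered)
```

Because H is all-pass, the correlation of a clean segment with H is close to a single spike of magnitude 1 at the embedded shift, so every symbol from 0 to 15 is recovered in this noise-free round trip.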

Claims

WHAT IS CLAIMED IS:

1. A method for embedding a message signal in a host signal, said method comprising the steps of: (a) embedding said message signal into said host signal, thereby producing a stego signal; and

(b) detecting an estimate of said message signal from said stego signal; provided that said detecting step (b) is not an exact inverse of said embedding step (a), and said host signal cannot be exactly extracted from said stego signal.

2. The method according to claim 1, wherein said stego signal is corrupted or distorted prior to detecting step (b).

3. The method according to claim 1, wherein said embedding step (a) produces a value $b_i$ in said stego signal from a value $a_i$ in said host signal, and wherein said embedding step (a) comprises limiting to a limit value $\beta/2$ a magnitude of difference between $b_i$ and $a_i$.

4. The method according to claim 1, wherein said embedding step (a) employs a continuous periodic function to produce said stego signal, and wherein said detecting step (b) employs said continuous periodic function to produce said estimated message signal.

5. The method according to claim 4, wherein said continuous periodic function is a triangular function.

6. The method according to claim 5, wherein said triangular function f(x) has a period $\Delta$, and wherein:

$-\Delta/4 \le f(x) \le \Delta/4$ for all x;

$f(0) = -\Delta/4$; and

$f(\Delta/2) = \Delta/4$.

7. The method according to claim 1, wherein said embedding step (a) produces a value $b_i$ in said stego signal from a value $a_i$ in said host signal and a value $s_i$ in said message signal, such that said embedding step (a):

(i) is subject to a maximum distortion constraint P,

(ii) employs a continuous periodic function having period $\Delta$, and

(iii) is represented by the function $b_i = E(a_i, s_i)$, and employs an algorithm as follows:

if $\mathrm{rem}(a_i/\Delta) > \Delta/2$, then $p_i = 3\Delta/4 - \mathrm{rem}(a_i/\Delta)$,

else $p_i = \mathrm{rem}(a_i/\Delta) - \Delta/4$;

$e_i = s_i - p_i$;

if $|e_i| > \beta/2$, then $e_i = \mathrm{sign}(e_i)\,\beta/2$;

$q_i = \mathrm{rem}(a_i/\Delta)$;

if $q_i > \Delta/2$, then $e_i = -e_i$;

if $a_i > 0$, then $b_i = a_i + e_i$; else $b_i = a_i - e_i$.

8. The method according to claim 2, wherein a value $b_i$ in said stego signal is modified after said embedding step (a) to yield a value $c_i$ in said corrupted or distorted stego signal, and wherein said detecting step (b):

(i) produces a value $s_{ei}$ in said estimated message signal from a value $c_i$ in said distorted stego signal,

(ii) employs a continuous periodic function having a period $\Delta$, and

(iii) is represented by the function $s_{ei} = D(c_i)$, and employs an algorithm as follows:

$q_i = \mathrm{rem}(c_i/\Delta)$;

if $q_i > \Delta/2$, then $s_{ei} = 3\Delta/4 - q_i$,

else $s_{ei} = q_i - \Delta/4$.

9. The method according to claim 2, wherein said host signal is a sequence $a_i$, for i = 1 to N; said message signal is a sequence $s_i$, for i = 1 to N; said stego signal is a sequence $b_i$, for i = 1 to N; said corrupted or distorted stego signal is a sequence $c_i$, for i = 1 to N; and said estimated message signal is a sequence $s_{ei}$, for i = 1 to N.

10. The method according to claim 1, wherein said embedding step (a):

(i) imposes a limit $\beta/2$ on a magnitude difference between a value $b_i$ in said stego signal that is produced from a value $a_i$ in said host signal;

(ii) employs a continuous periodic function having a period $\Delta$ to produce said stego signal, wherein said limit $\beta/2$ and said period $\Delta$ are chosen to minimize a mean square distance between said message signal and said estimated message signal, subject to a maximum distortion constraint P of said embedding step (a).

11. A method for mapping K information bits to a message signal $s_i$, i = 1 to N, said method comprising the steps of: grouping said K information bits together to represent one of $2^L$ symbols, wherein each said $2^L$ symbol is mapped to a basis vector or its negative of a $2^{L-1} \times 2^{L-1}$ orthogonal transform matrix.

12. The method according to claim 11, wherein said orthogonal transform matrix is obtained from a cyclic all-pass filter and its circular shifts.

13. The method according to claim 12, wherein said cyclic all-pass filter is obtained from a key.