WO2008014007A2 - Certification and authentication of data structures - Google Patents

Certification and authentication of data structures

Info

Publication number
WO2008014007A2
Authority
WO
WIPO (PCT)
Prior art keywords
query
answer
data
responder
digest
Application number
PCT/US2007/017072
Other languages
French (fr)
Other versions
WO2008014007A3 (en)
Inventor
Roberto Tamassia
Nikolaos Triandopoulos
Original Assignee
Brown University
Application filed by Brown University
Publication of WO2008014007A2
Publication of WO2008014007A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/64 Protecting data integrity, e.g. using checksums, certificates or signatures

Definitions

  • the teachings in accordance with the exemplary embodiments of this invention relate generally to data authentication and, more specifically, relate to authenticating general queries over structured data.
  • Although the techniques do not explicitly authenticate the corresponding search algorithm, the complexity of the resulting authenticated data structures is of the same order of magnitude as that of the search algorithm.
  • Another study described a general technique for designing consistency proofs for committed databases, a different problem than data authentication. However, the technique can be extended to provide a general framework for also designing authenticated data structures (that actually enjoy additional properties).
  • the authentication technique authenticates the searching algorithm that is used to produce the answer. Also, the model used is the pointer machine; the RAM model can be captured at an O(log M) overhead, where M is the total memory used.
  • the results presented herein operate on the RAM model, thus, they include a broader class of both static and dynamic query problems and can lead to more efficient constructions, where the answer validity and not the algorithm is verified.
  • a method includes: receiving a query on a structured data set; transforming the query into a plurality of set membership queries on a plurality of sets, wherein the plurality of sets are obtained by modifying the structured data set; determining a plurality of first answers corresponding to the plurality of set membership queries; processing the plurality of first answers to obtain a second answer corresponding to the query; and returning the second answer.
  • an electronic device includes: a memory configured to store a structured data set and a plurality of sets, wherein the plurality of sets are obtained by modifying the structured data set; and a data processor configured to receive a query on the structured data set, to transform the query into a plurality of set membership queries on the plurality of sets, to determine a plurality of first answers corresponding to the plurality of set membership queries; to process the plurality of first answers to obtain a second answer corresponding to the query, and to return the second answer.
  • a method includes: storing a data set and a certification image for the data set at a responder; storing at least one digest of the certification image at a data source; in response to an update to the data set by the data source, the data source using the at least one digest and the certification image to verify the update; receiving a query on the data set, wherein the query is received by the responder from the data source; transforming the query into a plurality of set membership queries on a plurality of sets, wherein the plurality of sets are obtained by modifying the data set; determining a plurality of first answers corresponding to the plurality of set membership queries; processing the plurality of first answers to obtain a second answer corresponding to the query; and returning the second answer and a corresponding proof for the second answer from the responder to the data source.
  • in another exemplary aspect of the invention, a system includes: a responder configured to store a data set and a certification image for the data set; and a data source configured to store at least one digest of the certification image, wherein in response to an update to the data set by the data source, the data source is configured to use the at least one digest and the certification image to verify the update, wherein the responder is configured to receive a query on the data set from the data source, to transform the query into a plurality of set membership queries on a plurality of sets, to determine a plurality of first answers corresponding to the plurality of set membership queries, to process the plurality of first answers to obtain a second answer corresponding to the query, and to return the second answer and a corresponding proof for the second answer to the data source, wherein the plurality of sets are obtained by modifying the data set.
  • a method includes: storing a data set and a certification image for the data set at a data source and a responder; storing at least one digest of the certification image at the data source and the responder; storing at least one signature of the at least one stored digest at the data source and the responder; in response to an update to the data set by the data source, updating the corresponding stored at least one digest by the data source and the responder; signing the updated at least one digest by the data source; transmitting the signed updated at least one digest from the data source to the responder; receiving a query on the data set, wherein the query is received by the responder from a query source; transforming the query into a plurality of set membership queries on a plurality of sets, wherein the transformation is performed by the responder and the plurality of sets are obtained by modifying the data set; determining by the responder a plurality of first answers corresponding to the plurality of set membership queries; processing by the responder the plurality of first answers
  • in another exemplary aspect of the invention, a system includes: a data source configured to store a data set, a certification image for the data set, at least one digest of the certification image, at least one signature of the at least one digest, to update the at least one digest in response to an update operation, to sign the updated at least one digest, and to transmit the signed updated at least one digest to the responder; a query source; and a responder configured to receive the signed updated at least one digest from the data source, to receive the query on the data set from the query source, to transform the query into a plurality of set membership queries on a plurality of sets obtained by modifying the data set, to determine a plurality of first answers corresponding to the plurality of set membership queries, to process the plurality of first answers to obtain a second answer corresponding to the query, and to send the second answer, a corresponding proof for the second answer, and the signed at least one digest to the query source, wherein the query source is configured to verify the received second answer by using the proof and the signed at least one
  • FIG. 1 shows a system in which the exemplary embodiments of the invention may be employed.
  • FIG. 2 depicts a flowchart illustrating one non-limiting example of a method for practicing the exemplary embodiments of this invention.
  • FIG. 3 depicts a flowchart illustrating another non-limiting example of a method for practicing the exemplary embodiments of this invention.
  • FIG. 4 depicts a flowchart illustrating another non-limiting example of a method for practicing the exemplary embodiments of this invention.
  • the exemplary embodiments of the invention introduce the important concept of (query) problem reducibility in data authentication.
  • An important reduction is shown: any query problem in the proposed query model is authenticated reduced to the fundamental set membership problem. This is a non-trivial reduction, meaning that efficiency is preserved in the reduction.
  • a trivial reduction authenticates the answer to a query by authenticating all possible query-answer pairs, which for most query problems is a set of infinite cardinality. This has many important consequences as will be subsequently discussed. Overall, through this reduction, one obtains general possibility results for the design of authenticated data structures.
  • Data authentication is a fundamental problem in data management, where it is desirable to design efficient cryptographic techniques that guarantee data authenticity in untrusted and adversarial data distribution environments.
  • the problem of authenticating general queries over structured data in the RAM model of computation is studied.
  • the problem is formally defined in its general form and a new model is put forward for data authentication, where answer validity rather than the querying process is actually authenticated, and it is shown that this approach achieves generality.
  • the notion of reducibility of query authentication primitives is introduced and it is shown that any general query based on the evaluation of relations over the data elements is reduced to authenticated membership queries. Using this, general possibility results under general assumptions and characterization theorems about the use of cryptographic techniques for the data authentication problem are illustrated.
  • the proposed authentication framework enjoys important properties and certain advantages over previous approaches, including: generality, expressiveness and sufficient conditions for the design of new efficient authenticated data structures.
  • Exemplary certification data structures are defined, which extend the concept of certifying algorithms to data structures.
  • An exemplary rigorous cryptographic framework is provided for the analysis of authenticated data structures by defining query authentication schemes. Also, reductions between query authentication schemes are defined.
  • An exemplary method is presented for constructing an authenticated data structure from a certification data structure that preserves super-efficient verification. Namely, a certification data structure with verification time asymptotically less than query time yields an authenticated data structure with the same property. Finally it is shown that the authentication of general queries can be reduced to the authentication of set membership queries. Note that these contributions may be viewed as non-limiting examples of the various exemplary embodiments of the invention described in further detail herein.
  • Section 2 provides definitions for the query model.
  • Section 3 describes certification data structures and their properties.
  • Section 4 provides the definitional framework, introducing query authentication schemes and their security requirements.
  • Section 5 introduces the reducibility among authentication schemes and illustrates the primary results. Proofs are included in the Appendix.
  • 2 Preliminaries
  • Definition 1 (Structured Data Set).
  • a structured data set (or, simply, a data
  • !R is defined as n
  • relation E edges in G
  • More complex graphs, e.g., with edge directions, weights, costs or associated data elements, can be represented by appropriately including new primitive data-element sets in $\mathcal{E}$ and corresponding sequences in $\mathcal{R}$ describing data elements' structure and various relations among them.
  • a query operation $Q_S$ on $S$ is a computable function $Q_S : \mathcal{Q} \to \mathcal{A}_S$, where $\mathcal{Q}$ is the query space (the set of all possible queries $q$ of specific type that can be issued about $S$) and $\mathcal{A}_S$ is the answer space (the set of all possible answers to queries drawn from $\mathcal{Q}$).
  • $\mathcal{Q}$ is the query space (the set of all possible queries $q$ of specific type that can be issued about $S$)
  • $\mathcal{A}_S$ is the answer space (the set of all possible answers to queries drawn from $\mathcal{Q}$).
  • any query in the query space is mapped to a unique answer in the answer space and that any answer corresponds to some query.
  • $S = (\mathcal{E}, \mathcal{R})$ represents a monotone subdivision of the plane into the polygons induced by the vertices and edges of a planar graph $G$
  • the point location query operation maps a point in the plane (query) to the unique region of the subdivision (answer) containing it.
  • query operation $Q_S$ is efficiently computable.
  • function $Q_S$ is evaluated on query $q \in \mathcal{Q}$ by a query answering algorithm that operates over $S$ through a query data structure appropriate for the type of queries in $\mathcal{Q}$.
  • query operation $Q_S$ can be defined independently of the data set $S$, such that the answer to query $q$ is $Q(S, q)$.
  • the query and answer spaces are also independent of S.
  • Unique answers are used without loss of generality.
  • Qs is a mapping not a function. That is, there may be situations where more than one answer can exist for a given query. For instance, a path query on a graph, given two vertices asks for any connecting path, if it exists.
  • data set $(\mathcal{E}_Q, \mathcal{R}_Q)$ represents a two-level search structure locating points in logarithmic time; here, data set $\mathcal{E}_Q$ includes information about the regions defined by the edges of graph $G$.
  • a data set S is said to be static if it stays the same over time and dynamic if it evolves over time through update operations performed on S.
  • An update operation $U_S$ for $S$ is a function that given an update $y \in \mathcal{Y}$, where $\mathcal{Y}$ is the set of all possible
  • the presented data querying model achieves generality by combining the expressiveness of relational databases with the power of the RAM computation model.
  • with index-annotated relations, complex data organizations are easily represented and accessed.
  • indirect addressing is supported by treating indexes as a distinct data type which is included in S, thus the model strictly contains the pointer machine model.
  • Answer testability: On input query $q \in \mathcal{Q}$, data set $(\mathcal{E}_C, \mathcal{R}_C)$, answer $a \in \mathcal{A}_S$ and answer test $\tau$, algorithm Verify accesses and processes only relations in $\mathcal{R}_C(\tau)$ and returns either 0 (rejects) or 1 (accepts).
  • $C(Q_S)$ is answer-efficient if the time complexity $T_C(n)$ of Certify is asymptotically at most the time complexity $T_A(n)$ of Answer, i.e., $T_C(n)$ is $O(T_A(n))$;
  • $C(Q_S)$ is time-efficient (resp. time super-efficient) if the time complexity $T_V(n)$ of Verify is asymptotically at most (resp. less than) the time complexity $T_A(n)$ of Answer, i.e., $T_V(n)$ is $O(T_A(n))$ (resp. $o(T_A(n))$); and analogously, (3) $C(Q_S)$ is space-efficient (resp. space super-efficient) if the space requirement $S_C(n)$ of $(\mathcal{E}_C, \mathcal{R}_C)$ is asymptotically at most (resp. less than) the space
  • an update algorithm $\text{Update}_C$ is responsible for handling updates in data set $S$ by accordingly updating $C(Q_S)$; that is, it produces the updated set $(\mathcal{E}'_C, \mathcal{R}'_C)$ and, in particular, the set of tuples in which $\mathcal{R}'_C$ and $\mathcal{R}_C$ differ.
  • Algorithm $\text{Update}_C$ additionally produces an update test (like the answer test above, a set of indices for tuples in $\mathcal{R}_C$) that validates the performed changes.
  • an update testing algorithm Updtest, on input an update $y \in \mathcal{Y}$, set $\mathcal{R}_C$, a set of tuples (changes in $\mathcal{R}_C$) and an update test, accepts if and only if the tuples correspond to the correct, according to $y$, new or deleted tuples in $\mathcal{R}_C$.
  • update efficiency and update-testing (super-) efficiency for C(Qs) with respect to the time complexity of Updatec and Updtest respectively, as they asymptotically compare to Update ⁇ .
  • Certification data structures introduce a general framework for studying data querying with respect to the answer validation and correctness verification. They support certification of queries in a computational setting where the notions of query answering and answer validation are conceptually and algorithmically separated in a clean way.
  • query certification may depend only on the certification support of the answer, i.e., subset R( ⁇ ).
  • the first result shows that for every query structure there may be an efficient certification structure, that is, a completeness result showing that all queries can be certified without loss of efficiency.
  • Lemma 1 Any query data structure for any query operation on any structured data set admits an answer-, time-, update-, update-testing- and space-efficient certification data structure. (Proof in the Appendix below.)
  • Certification data structures are designed to accompany two-party data query protocols in the straightforward way: party A possesses data sets $(\mathcal{E}_Q, \mathcal{R}_Q)$ and $(\mathcal{E}_C,$
  • Verify The underlying dynamic set S is controlled by B by creating update and query operations for S. Although both operations are performed at A, B is able to verify their correctness. Thus, this setting models certified outsourced computation: at any point in time, B maintains a correct certification image of outsourced set S allowing verification without loss of efficiency, by Lemma 1.
  • query authentication schemes extend certification structures in that answer validation is not performed in a collaborative setting; instead, the prover may be adversarial and answer verification is now achieved in the bounded computational model.
  • data authentication is examined in a non-conventional setting, where the creator (e.g. owner) of a data set is not the same entity as the one answering queries about the set and, in particular, the data owner does not control the corresponding data structure that is used to answer a query.
  • an intermediate, untrusted party answers the queries about the data set that are issued by an end-user.
  • the system 100 includes a Source ($\mathcal{S}$) 102, a User ($\mathcal{U}$) 104 and a Responder ($\mathcal{R}$) 106.
  • $\mathcal{S}$ 102 creates (and owns) a data set $S$ 108, which is maintained by query data structure $D(Q_S)$ 110 for query operation $Q_S : \mathcal{Q} \to \mathcal{A}_S$ on $S$ 108.
  • $Q_S$ is a query operation on data set $S$ 108
  • $\mathcal{Q}$ is a query space
  • $\mathcal{A}_S$ is an answer space.
  • $QAS(Q_S, S)$ 112 for query operation $Q_S$ on $S$
  • $C(Q_S) = ((\mathcal{E}_C, \mathcal{R}_C), \text{Certify}, \text{Verify})$ as defined and explained above.
  • $QAS(Q_S, S)$ 112 is as defined and explained below.
  • $\mathcal{R}$ 106 stores $S$ 108, by maintaining a copy of $D(Q_S)$ 110 and some auxiliary information $aux(S)$ (not shown) for $S$ 108.
  • $\mathcal{R}$ 106 also maintains $C(Q_S)$ 114.
  • $\mathcal{U}$ 104 issues queries about $S$ 108 to $\mathcal{R}$ 106 by sending to $\mathcal{R}$ 106 a query ($q$) 116, where $q \in \mathcal{Q}$.
  • $\mathcal{R}$ 106 then sends $a$ 118 to $\mathcal{U}$ 104.
  • For the dynamic case, on an update ($y$) 120 for $S$ 108 issued by $\mathcal{S}$ 102, where $y \in \mathcal{Y}$, $S$ 108 and $D(Q_S)$ 110 are appropriately updated by $\mathcal{S}$ 102 and $\mathcal{R}$ 106.
  • the system 100 may comprise a static system or static data set, where the system or data set stays the same over time (e.g. does not evolve over time through update operations).
  • the system 100 may comprise a confirmation response ($c$) 122 that is sent to $\mathcal{S}$ 102 in response to $\mathcal{R}$ 106 receiving $y$ 120.
  • $c$ 122 may be sent to $\mathcal{S}$ 102 in response to $\mathcal{R}$ 106 successfully performing $y$ 120.
  • $D(Q_S)$ 110 may be coupled to $QAS(Q_S, S)$ 112
  • $QAS(Q_S, S)$ 112 may be coupled to $C(Q_S)$ 114
  • $C(Q_S)$ 114 may be coupled to $D(Q_S)$ 110
  • $D(Q_S)$ 110, $QAS(Q_S, S)$ 112 and $C(Q_S)$ 114 may all be coupled together (e.g. to each other).
  • y 120 may comprise an update to C(Qs) 114.
  • y 120 may comprise an update to QAS(Qs, S) 112.
  • $aux(S)$, $S$ 108, $D(Q_S)$ 110, $QAS(Q_S, S)$ 112 and/or $C(Q_S)$ 114 may be stored (e.g. located) at a location external to $\mathcal{R}$ 106.
  • $\mathcal{R}$ 106 may act as an intermediary (e.g., controller) for access to or use of such external components by $\mathcal{U}$ 104.
  • the system 100 may comprise a network over which $\mathcal{S}$ 102, $\mathcal{U}$ 104 and/or $\mathcal{R}$ 106 can communicate.
  • a network may comprise the internet.
  • $\mathcal{U}$ 104 may comprise a client enabled to communicate with $\mathcal{R}$ 106 over a network.
  • $\mathcal{U}$ 104 may comprise a client enabled to communicate with $\mathcal{R}$ 106 over the internet.
  • $\mathcal{U}$ 104 may comprise an electronic device. Such an electronic device may comprise at least one data processor, at least one memory, a transceiver, and a user interface comprising a user input and a display device.
  • an encryption component may be employed.
  • the encryption component may be a separate entity (e.g. an integrated circuit, an Application Specific Integrated Circuit or ASIC) or may be integrated with other components (e.g. a program run by a data processor, functionality enabled by a data processor).
  • the authenticator algorithm Auth takes as input the secret and public key $(SK, PK)$, the query space $\mathcal{Q}$ (or an encoding of the query type) and data set $S$ of size $n$ and outputs an authentication string $\alpha$ and a verification structure $V$, that is, $(\alpha, V) \leftarrow \text{Auth}(SK, PK, \mathcal{Q}, S)$, where $\alpha, V \in \{0,1\}^*$.
  • Responder: The responder algorithm Res takes as input a query $q \in \mathcal{Q}$, a data set $S$ of size $n$ and a verification structure $V \in \{0,1\}^*$ and outputs an answer-proof pair $(a, p) \leftarrow \text{Res}(q, S, V)$, where $a \in \mathcal{A}_S$ and $p \in \{0,1\}^*$.
  • Verifier: The verifier algorithm Ver takes as input the public key $PK$, a query $q \in \mathcal{Q}$, an answer-proof pair $(a, p) \in \mathcal{A}_S \times \{0,1\}^*$ and an authentication string $\alpha \in \{0,1\}^*$ and either accepts the input (returns 1) or rejects (returns 0), that is, $\{0,1\} \leftarrow \text{Ver}(PK, q, (a, p), \alpha)$.
  • For the dynamic case, the existence of an update algorithm $\text{Auth}_U$ is additionally required such that it complements algorithm Auth and handles updates.
  • $\text{Auth}_U$: given update $y \in \mathcal{Y}$, it updates the authentication string and the verification structure: $(\alpha', V') \leftarrow \text{Auth}_U(SK, PK, \mathcal{Q}, S, y, \alpha, V)$.
  • the first consideration for a query authentication scheme is correctness.
  • it is desirable for the verification algorithm to accept answer-proof pairs generated by the responder algorithm and for these answers to generally always be correct.
  • since the user $\mathcal{U}$ trusts the data source $\mathcal{S}$ but not the responder $\mathcal{R}$, it is the responder that can act adversarially.
  • $\mathcal{R}$ always participates in the three-party protocol, i.e., it communicates with $\mathcal{S}$ and $\mathcal{U}$, as the protocol dictates.
  • $\mathcal{R}$ can adversarially try to cheat, by not providing the correct answer to a query and forging a false proof for this answer.
  • the security consideration is that, given any query issued by $\mathcal{U}$, no polynomially computationally bounded $\mathcal{R}$ can reply with a pair of an answer and an associated proof such that both the answer is not correct and $\mathcal{U}$ verifies the authenticity of the answer and, thus, accepts it.
  • the above considerations are expressed as the following two conditions for query authentication structure.
  • Definition 7 (Correctness).
  • a query authentication scheme (KeyG, Auth,
  • Definition 8 (Security).
  • a query authentication scheme (KeyG, Auth, Res,
  • QAS(Qs, S) denote a query authentication scheme (or QAS) for query operation Qs and data set S.
  • QAS query authentication scheme
  • authenticated reductions among QASs allow the design of a QAS using no other cryptographic tools but what another QAS provides and in a way that preserves correctness and security.
  • $QAS(Q_S, S)$ is authenticated reduced to $QAS(Q'_{S'}, S')$ if key generation algorithms KeyG and KeyG′ are identical, $QAS(Q_S, S)$ uses the public and secret keys generated by KeyG never explicitly, but only implicitly through black-box invocations of algorithms Auth′, Res′ and Ver′, and $QAS(Q_S, S)$ is correct and secure whenever $QAS(Q'_{S'}, S')$ is correct and secure.
  • Let $Q_S : \mathcal{Q} \to \mathcal{A}_S$ be a query operation.
  • Let $D(Q_S) = (\mathcal{E}_Q, \mathcal{R}_Q, \text{Answer})$ be a query data structure for $Q_S$.
  • $C(Q_S) = ((\mathcal{E}_C, \mathcal{R}_C), \text{Certify}, \text{Verify})$ for $S$ with respect to $Q_S$.
  • Q e Q( 1 Rc)
  • (D) Verifier: The verifier algorithm Ver first checks if the proof $p$ and answer $a$ are both well-formed and, if not, it rejects. Otherwise, by appropriately processing the proof $p$, algorithm Ver runs algorithm Verify on inputs $q$, $a$, $\mathcal{R}_C(\tau)$ and $\tau$.
  • algorithm Ver runs algorithm Ver′ on inputs $PK'$, $\langle t_i \rangle$, $(\text{yes}, p'(t_i))$ and $\alpha'$, and if $0 \leftarrow \text{Ver}'(PK', \langle t_i \rangle, (\text{yes}, p'(t_i)), \alpha')$, algorithm Ver rejects. Otherwise, Ver continues with the computation.
  • Ver accepts if and only if Verify accepts, i.e., if and only if $1 \leftarrow \text{Verify}(q, \mathcal{R}_C(\tau), a, \tau)$.
  • $Q_S$ is a general query operation of set $S$, parameterized by $QAS(Q_\in, \mathcal{R}_C)$, where $Q_\in$ is the set membership query operation and $\mathcal{R}_C$ is the certification image of $S$ with respect to the certification data structure in use.
  • $Q_S$ is a general query operation of set $S$, parameterized by $QAS(Q_\in, \mathcal{R}_C)$
  • $Q_\in$ is the set membership query operation
  • $\mathcal{R}_C$ is the certification image of $S$ with respect to the certification data structure in use.
  • Let $QAS(Q_\in, \mathcal{R}_C)$ be any query authentication structure for set membership queries and $QAS(Q_S, S)$ be the query authentication scheme constructed above. For any query operation $Q_S$ and any data set $S$, $QAS(Q_S, S)$ is correct and secure if and only if $QAS(Q_\in, \mathcal{R}_C)$ is correct and secure.
  • denote the size of the certification image and $s(n) = |\mathcal{R}_C(\tau)|$ the size of the certification support of an answer.
  • For any query operation $Q_S$, the query authentication scheme $QAS(Q_S, S)$
  • T a (n) O(T' a (n))
  • T r (ri) O(s(ri)T'J(ri)
  • the above lemma can be used to have general complexity results in terms of the presented parameterized query authentication scheme QAS(Qs, S).
  • QAS(Qs, S) parameterized query authentication scheme
  • asymptotic analysis is of interest, omitting improvements of constant factors. So, only the related costs with respect to the set size n are studied and not the exact implementation of the cryptographic primitives.
  • TjQi) O ⁇ TyQi) + sQi) log n); with respect to space complexity, SJn) is OQi + mQi)), SrQi) is OQt + S ⁇ n) + mQi)); with respect to the proof size, p ⁇ n) is O(sQi) log n); when k tuples are updated, these can be handled in O ⁇ k log n) time.
  • TJn OQnQi
  • T r Qi) is O ⁇ sQi) 4n + Tdn)
  • $T_v(n)$ is $O(s(n) + T_V(n))$
  • SJn is O ⁇ n + mQi)
  • SrQi) is OQi + -? ⁇ ? ( «) + mQi)
  • $p(n)$ is $s(n)$; when $k$ tuples are updated, these can be handled in $O(k\sqrt{n})$ time.
  • By Theorem 2, all query operations can be authenticated in the three-party authentication model.
  • Theorem 3 gives a detailed complexity analysis of the authenticated data structures derived by the corresponding query authentication schemes. Still, the above description depends on the complexity of the certification data structure used. These results hold for the RAM model of computation, which strictly includes the pointer machine model.
  • By Lemma 1, all query problems that have a query data structure have a certification data structure, and thus these results generalize and improve previously known possibility results. Additionally, this framework provides insights for super-efficient verification, as described in the following meta-theorem.
  • certification data structures support verification of outsourced computations for data querying in an information-theoretic setting.
  • a data set $S$ owned by a party A can be maintained by another party B and operations on $S$ decided by A can be certified and executed by B and then can be verified by A using the certification image $(\mathcal{E}_C, \mathcal{R}_C)$ of $S$ which is stored by A.
  • the idea is to use cryptographic primitives (e.g., hashing) that provide commitments of sets, subject to which membership can be securely and efficiently checked.
  • the source A can use either a one-way accumulator or a Merkle hash tree for creating and locally storing a secure digest (or a small collection of digests) of set $(\mathcal{E}_C, \mathcal{R}_C)$ which is now explicitly stored at party B.
  • Algorithms Certify and Verify are appropriately extended to reflect this change.
  • the answer verification at party A now looks for each relation needed to process in the answer test (instead of looking for it in its local memory) and before operating on each relation it performs a check to verify the corresponding set-membership proof against one of the digests that are locally stored.
  • a novel general authentication method for completely outsourcing a dynamic data set S to an untrusted server, ordering the server to execute any series of update or query operations on S, where queries can be of any type (for which there exists an efficiently answering procedure), and then being able to verify the results of any operation, where the data owner needs only to store a short digest of set S.
  • a client e.g., a small computational device, A
  • A uses space super-efficient protocols to check its data that resides at a remote and untrusted server (B) (see Appendix E).
  • any certification data structure can be transformed to a secure, consistent, space-optimal data outsourcing scheme in the client-server communication model.
  • the data source stores the certification image
  • the responder stores the data set and the certification image (parties A and B in the previous discussion).
  • the data source can outsource the certification image to the responder and still be able to check that the data set is correctly maintained and that space-optimality is achieved at the data source.
  • (1) a data source can completely outsource a dynamic data set $S$ to an untrusted server (responder), (2) by storing only a digest of $S$, the source can efficiently verify that any update operation on $S$ is performed correctly by the server and (3) any user can query $S$, for any type of efficiently answerable query, and verify that the answer returned by the responder is authentic.
  • (1) a data source can completely outsource a dynamic data set $S$ to an untrusted peer-to-peer network (more generally, a distributed responder) that supports the basic put-get functionality over data objects, (2) by storing only a digest of $S$, the source can efficiently verify that any update operation on $S$ is performed correctly by the peer-to-peer network and (3) any user can query $S$, for any type of efficiently answerable query, and verify that the answer returned by the untrusted peer-to-peer network is authentic.
  • Cryptographic Primitives: Some cryptographic primitives are now considered that are useful for this exposition. A definition is given of a signature scheme which has become the standard definition of security for signature schemes. Schemes that satisfy it are also known as signature schemes secure against adaptive chosen-message attack. A function $\nu : \mathbb{N} \to \mathbb{R}$ is negligible if for every positive polynomial $p(\cdot)$ and for
  • a cryptographic hash function h operates on a variable-length message M producing a fixed-length hash value h(M). Moreover, the desired security results are achieved by means of collision-resistance, an additional security property for hash function h.
  • $\mathcal{H}$ defines a family of collision-resistant hash functions if
  • $h(\cdot)$ operates on the concatenation of some well-defined and fixed-size binary representation of elements $e_{i_1}, e_{i_2}, \ldots, e_{i_t}$ or on the concatenation of the individual hashes using $h(\cdot)$ of some well-defined binary representation of elements $e_{i_1}, e_{i_2}, \ldots, e_{i_t}$.
  • $h$ essentially operates on one binary string, where the cost of the operation of $h$ on that string is at least proportional to its length.
  • a hash tree is a binary tree, where each node stores a hash value computed using a collision-resistant hash function. At leaf nodes the hash of the corresponding element is stored; at internal nodes the hash of the concatenation of the hash values of the children nodes.
  • Definition 14 ((One-way) Dynamic Accumulator).
  • Efficient Generation There is an efficient algorithm Gen that on input 1^k generates a random element f of F_k, auxiliary information aux_f and trapdoor information trd_f. Both aux_f and trd_f have sizes that are linear in k.
  • Efficient Evaluation Function f is a computable function f : A_f × X_k → A_f, where A_f, an efficiently samplable set of accumulation values, and X_k, the intended set of elements to be accumulated, constitute the input domain of f.
  • Function f is polynomial-time computable given the auxiliary information aux_f.
  • A'_f × X'_k denote the domains for which the computational procedure for function f ∈ F_k is defined. That is, in principle, A'_f ⊇ A_f and X'_k ⊇ X_k. For all probabilistic polynomial-time adversaries Adv_k
  • time super-efficient certification data structures are described, further justifying the importance of the notion of answer testability.
  • the certification image may be as large as the query structure
  • the certification support of the answer to any query has size asymptotically less than the "searching trail" of the query answering algorithm Answer.
  • a super-efficient certification data structure exploits this gap in certifying queries.
  • E comprises a set K of n key elements with a total ordering and a set V of n values, and R consists of two indexed relations, the key-value relation R_KV and the successor relation R_S over keys.
  • algorithm Certify returns as an answer test the indices in R_C of two tuples: if q ∈ K, the indices of tuple <q, suc(q)> of the successor relation R_S and of tuple <q, v> of the key-value relation R_KV are returned; otherwise, the indices of tuples <x, suc(x)>, <y, suc(y)> ∈ R_S are returned, such that x is the maximum element and y is the minimum element satisfying x < q < y, according to the total order of K. Algorithm Verify accesses these tuples and accepts or rejects accordingly.
  • a time super-efficient certification data structure stores the trapezoidal decomposition of the subdivision.
  • Each trapezoid is expressed as a tuple of five data elements: two vertices (defining the top and bottom sides), two edges (defining the left and right sides), and a region (containing the trapezoid).
  • the answer test is the index of the trapezoid containing the query point, which can be computed by a simple modification of the query algorithm.
  • Let (KeyG, Auth, Res, Ver) be a query authentication scheme for query operation Q_S : Q → A_S on structured data set S.
  • query authentication scheme (KeyG, Auth, Res, Ver) is secure if no probabilistic polynomial-time adversary A can win non-negligibly often in the following game:
  • A outputs an authentication string a, an answer a' and a proof p.
  • Algorithm Verify is defined as an augmentation of Answer operating as follows. On input a query q ∈ Q, set (E_C, R_C), an answer a and a sequence of indices, algorithm Verify starts executing algorithm Answer on input (q, (E_Q, R_Q)) and checks the execution
  • Ver' is secure. Assume that (KeyG, Auth, Res, Ver) is not secure; then with overwhelming probability responder Res responds to a query q ∈ Q incorrectly but the verifier Ver still fails to reject its input. Based on the soundness property of the certification data structure in use, one must admit that it is not algorithm Certify that cheats the verifier, that is, it is not the indices in the sequence that cause the problem, but rather the fact that algorithm Verify runs on incorrect data. Then there must be at least one tuple in R_C that, although it was verified to be a member of R_C, is not authentic, meaning that its index is correct but one or more of the data elements in the tuple have been (maliciously) altered. One thus concludes that for at least one query, the verification algorithm Ver' of query authentication scheme (KeyG', Auth', Res', Ver') failed to reject an invalid query-answer pair. This is a contradiction, since this scheme is assumed to be secure.
  • Dynamic accumulators have linear storage needs, constant-time verification and constant-size proofs, at an increased cost to support updates and processes (of witnesses). Note that trapdoor information can only be used by algorithm Auth and not by algorithm Res, for it would destroy the security of the scheme. Some interesting trade-offs between the update and process time costs are discussed in other publications (e.g., one can achieve a √n trade-off). See M. T. Goodrich, R. Tamassia, and J. Hasic. An efficient dynamic and distributed cryptographic accumulator. In Proc. of Information Security Conference (ISC), volume 2433 of LNCS, pages 372-388. Springer-Verlag, 2002.
  • ISC Information Security Conference
  • a client Cl completely outsources a data set S owned by the client to a remote, untrusted server Ser. S is generated and queried through a series of update and query operations issued by Cl. At any time, the client Cl keeps some state information s that encodes information about the current state of the outsourced set S. Then the communication protocol is as follows:
  • the client Cl keeps state information s and issues an update operation to the server Ser.
  • Server Ser performs the operation, i.e., Ser accordingly updates S to a new version S', and generates a proof π, which is then returned to the client Cl.
  • π ← certify(o, S, S').
  • the proof returned to the client is called a consistency proof.
  • Client Cl runs a verification algorithm, which takes as input the current state s, the operation o and the corresponding consistency proof π and either accepts or rejects the input. If the input is accepted, the state s is appropriately updated to a new state s'.
  • Definition 16 (Security for data outsourced schemes.) Let (certify, verify) be a data outsourced scheme with security parameter K. Let s be any state that is consistent with the set S that corresponds to any series of operations in an initially empty set, and let o be any operation. Then, (certify, verify) is said to be secure if the following requirements are satisfied.
  • Consistency For any polynomial-time adversary A, having oracle access to algorithms certify and verify, that on input a set S and an operation o produces a consistency proof π, whenever (yes, s') ← verify(s, π), the probability that either S' ← operate(o, S) does not hold or s' is not consistent with S' is negligible in the security parameter K.
  • a method comprises: receiving a query on a structured data set (box 21); transforming the query into a plurality of set membership queries on a plurality of sets, wherein the plurality of sets are obtained by modifying the structured data set (box 22); determining a plurality of first answers corresponding to the plurality of set membership queries (box 23); processing the plurality of first answers to obtain a second answer corresponding to the query (box 24); and returning the second answer (box 25).
  • a method as above further comprising: determining an answer test corresponding to the second answer; and returning the answer test.
  • the plurality of sets comprises a certification image.
  • a method as in any of the above, wherein the method is implemented by a computer program.
  • a computer program product comprises program instructions tangibly embodied on a computer-readable medium execution of which results in operations comprising: receiving a query on a structured data set; transforming the query into a plurality of set membership queries on a plurality of sets, wherein the plurality of sets are obtained by modifying the structured data set; determining a plurality of first answers corresponding to the plurality of set membership queries; processing the plurality of first answers to obtain a second answer corresponding to the query; and returning the second answer.
  • a computer program product as above further comprising: determining an answer test corresponding to the second answer; and returning the answer test.
  • a computer program product as above further comprising: verifying the second answer by utilizing the answer test.
  • a computer program product as in any of the above further comprising: in response to receiving an update for the structured data set, updating the structured data set and returning an update test.
  • a computer program product as in any of the above, wherein the plurality of sets comprises a certification image.
  • an electronic device comprises: a memory configured to store a structured data set and a plurality of sets, wherein the plurality of sets are obtained by modifying the structured data set; and a data processor configured to receive a query on the structured data set, to transform the query into a plurality of set membership queries on the plurality of sets, to determine a plurality of first answers corresponding to the plurality of set membership queries; to process the plurality of first answers to obtain a second answer corresponding to the query, and to return the second answer.
  • a method comprises: storing a data set and a certification image for the data set at a responder (box 31); storing at least one digest of the certification image at a data source (box 32); in response to an update to the data set by the data source, the data source using the at least one digest and the certification image to verify the update (box 33); receiving a query on the data set, wherein the query is received by the responder from the data source (box 34); transforming the query into a plurality of set membership queries on a plurality of sets, wherein the plurality of sets are obtained by modifying the data set (box 35); determining a plurality of first answers corresponding to the plurality of set membership queries (box 36); processing the plurality of first answers to obtain a second answer corresponding to the query (box 37); and returning the second answer and a corresponding proof for the second answer from the responder to the data source (box 38).
  • the method is implemented by a computer program.
  • a computer program product comprises program instructions tangibly embodied on a computer-readable medium execution of which results in operations comprising: storing a data set and a certification image for the data set at a responder; storing at least one digest of the certification image at a data source; in response to an update to the data set by the data source, the data source using the at least one digest and the certification image to verify the update; receiving a query on the data set, wherein the query is received by the responder from the data source; transforming the query into a plurality of set membership queries on a plurality of sets, wherein the plurality of sets are obtained by modifying the data set; determining a plurality of first answers corresponding to the plurality of set membership queries; processing the plurality of first answers to obtain a second answer corresponding to the query; and returning the second answer and a corresponding proof for the second answer from the responder to the data source.
  • a system comprises: a responder configured to store a data set and a certification image for the data set; and a data source configured to store at least one digest of the certification image, wherein in response to an update to the data set by the data source, the data source is configured to use the at least one digest and the certification image to verify the update, wherein the responder is configured to receive a query on the data set from the data source, to transform the query into a plurality of set membership queries on a plurality of sets, to determine a plurality of first answers corresponding to the plurality of set membership queries, to process the plurality of first answers to obtain a second answer corresponding to the query, and to return the second answer and a corresponding proof for the second answer to the data source, wherein the plurality of sets are obtained by modifying the data set.
  • the responder is further configured to determine an answer test corresponding to the second answer and to return the answer test to the data source.
  • the system comprises a peer-to-peer network.
  • a method includes: storing a data set and a certification image for the data set at a data source and a responder (box 41); storing at least one digest of the certification image at the data source and the responder (box 42); storing at least one signature of the at least one stored digest at the data source and the responder (box 43); in response to an update to the data set by the data source, updating the corresponding stored at least one digest by the data source and the responder (box 44); signing the updated at least one digest by the data source (box 45); transmitting the signed updated at least one digest from the data source to the responder (box 46); receiving a query on the data set, wherein the query is received by the responder from a query source (box 47); transforming the query into a plurality of set membership queries on a plurality of sets, wherein the transformation is performed by the responder and the plurality of sets are obtained by modifying the data set (box 48); determining by the respond
  • the responder is realized by means of a collection of first nodes in a peer-to-peer network, wherein each first node corresponds to a second node of a hash tree or an authenticated skip list.
  • a computer program product comprises program instructions tangibly embodied on a computer-readable medium execution of which results in operations comprising: storing a data set and a certification image for the data set at a data source and a responder; storing at least one digest of the certification image at the data source and the responder; storing at least one signature of the at least one stored digest at the data source and the responder; in response to an update to the data set by the data source, updating the corresponding stored at least one digest by the data source and the responder; signing the updated at least one digest by the data source; transmitting the signed updated at least one digest from the data source to the responder; receiving a query on the data set, wherein the query is received by the responder from a query source; transforming the query into a plurality of set membership queries on a plurality of sets, wherein the transformation is performed by the responder and the plurality of sets are obtained by modifying the data set; determining by the responder a plurality of
  • a computer program product as above further comprising: storing by the data source and the responder a set digest of all elements of the plurality of sets.
  • a computer program product as in any of the above further comprising: storing by the data source and the responder at least one hash tree or authenticated skip list for the plurality of sets; and storing by the data source and the responder at least one digest of at least one root node of said at least one hash tree or authenticated skip list.
  • the responder is realized by means of a collection of first nodes in a peer-to-peer network, wherein each first node corresponds to a second node of a hash tree or an authenticated skip list.
  • a system comprises: a data source configured to store a data set, a certification image for the data set, at least one digest of the certification image, at least one signature of the at least one digest, to update the at least one digest in response to an update operation, to sign the updated at least one digest, and to transmit the signed updated at least one digest to the responder; a query source; and a responder configured to receive the signed updated at least one digest from the data source, to receive the query on the data set from the query source, to transform the query into a plurality of set membership queries on a plurality of sets obtained by modifying the data set, to determine a plurality of first answers corresponding to the plurality of set membership queries, to process the plurality of first answers to obtain a second answer corresponding to the query, and to send the second answer, a corresponding proof for the second answer, and the signed at least one digest to the query source, wherein the query source is configured to verify the received second answer by using the proof and the signed at least
  • the responder is realized by means of a collection of first nodes in a peer-to-peer network, wherein each first node corresponds to a second node of a hash tree or an authenticated skip list.
  • the data source is further configured to store at least one hash tree or authenticated skip list for the plurality of sets and at least one digest of at least one root node of said at least one hash tree or authenticated skip list
  • the responder is further configured to store at least one hash tree or authenticated skip list for the plurality of sets and at least one digest of at least one root node of said at least one hash tree or authenticated skip list.
  • the data source is further configured to store at least one accumulator for the plurality of sets and at least one accumulator digest of accumulations of said at least one accumulator and wherein the responder is further configured to store at least one accumulator for the plurality of sets and at least one accumulator digest of accumulations of said at least one accumulator.
  • various exemplary embodiments of the invention can be implemented in different mediums, such as software, hardware, logic, special purpose circuits or any combination thereof.
  • some aspects may be implemented in software which may be run on a computing device, while other aspects may be implemented in hardware.
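
The hash tree mentioned among the concepts above can be illustrated with a minimal sketch. It assumes SHA-256 as the collision-resistant hash function and pads the leaves to a power of two; the function names (`build_levels`, `prove`, `verify`) and the `leaf:`/`node:` prefixes are hypothetical choices made for this illustration and are not taken from the patent text.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Collision-resistant hash; SHA-256 is an assumption of this sketch."""
    return hashlib.sha256(data).digest()


def build_levels(elements):
    """Return all tree levels, leaves first; the last level holds the root."""
    level = [h(b"leaf:" + e) for e in elements]
    # Pad with a fixed digest until the number of leaves is a power of two.
    while not level or len(level) & (len(level) - 1):
        level.append(h(b"pad"))
    levels = [level]
    while len(level) > 1:
        # An internal node stores the hash of its children's concatenated hashes.
        level = [h(b"node:" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels, index):
    """Collect the sibling hashes on the path from leaf `index` to the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])
        index //= 2
    return path


def verify(root, element, index, path):
    """Recompute the root from the element and its sibling path."""
    digest = h(b"leaf:" + element)
    for sibling in path:
        if index % 2 == 0:
            digest = h(b"node:" + digest + sibling)
        else:
            digest = h(b"node:" + sibling + digest)
        index //= 2
    return digest == root


elements = [b"a", b"b", b"c", b"d", b"e"]
levels = build_levels(elements)
root = levels[-1][0]
proof = prove(levels, 2)
print(verify(root, b"c", 2, proof))  # True
print(verify(root, b"x", 2, proof))  # False
```

A verifier that holds only `root` can check membership of one element with a logarithmic number of hash evaluations, which is the property the hash-based constructions above rely on.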

Abstract

One non-limiting, exemplary method includes: receiving a query on a structured data set; transforming the query into a plurality of set membership queries on a plurality of sets, wherein the plurality of sets are obtained by modifying the structured data set; determining a plurality of first answers corresponding to the plurality of set membership queries; processing the plurality of first answers to obtain a second answer corresponding to the query; and returning the second answer. This exemplary method can be extended, for example, to a three-party system involving a data source, a responder and a query source.

Description

CERTIFICATION AND AUTHENTICATION OF DATA STRUCTURES
TECHNICAL FIELD:
[0001] The teachings in accordance with the exemplary embodiments of this invention relate generally to data authentication and, more specifically, relate to authenticating general queries over structured data.
BACKGROUND:
[0002] The problem of data authentication is a fundamental one from both practical and theoretical perspectives. More and more in distributed and pervasive computing environments, information is delivered through untrusted computing entities, raising security threats with respect to data authenticity. From a theoretical point of view, data authentication introduces new dimensions in both algorithm design and cryptography. On one hand, known data management and data structuring techniques often need to be reexamined in the new data dissemination settings, where the data distributor and the data owner are different entities. On the other hand, directly applying traditional and well-studied message authentication techniques to data authentication, where data cannot be treated as a whole, is often inadequate to provide efficient solutions.
1 Previous and Related Work
[0003] Stream-based data authentication has been extensively studied, especially in the context of multicast authentication. See A. Lysyanskaya, R. Tamassia, and N. Triandopoulos. Multicast authentication in fully adversarial networks. In Proceedings of IEEE Symposium on Security and Privacy, pages 241-255, May 2004.
[0004] Authenticated Data Structures Extensive work exists on authenticated data structures, which model the security problem of data querying in untrusted or adversarial environments. See R. Tamassia. Authenticated data structures. In Proc. European Symp. on Algorithms, volume 2832 of Lecture Notes in Computer Science, pages 2-5. Springer-Verlag, 2003. This model augments a data structure such that, along with an answer to a query, a cryptographic proof is provided that can be used to verify the authenticity of the answer. Research initially focused on authenticating membership queries (mostly in the context of the certificate revocation problem), where various authenticated dictionaries based on extensions of the hash tree introduced by Merkle have been studied. One publication showed how the use of dynamic accumulators can realize a dynamic authenticated dictionary, and another used an interesting combination of hashing with accumulators to improve the efficiency of one-dimensional authenticated range searching. See M. T. Goodrich, R. Tamassia, and J. Hasic. An efficient dynamic and distributed cryptographic accumulator. In Proc. of Information Security Conference (ISC), volume 2433 of LNCS, pages 372-388. Springer-Verlag, 2002. More general queries, beyond membership queries, have been studied as well, where extensions of hash trees are used to authenticate various queries, including: basic operations (e.g., select, join) on databases; pattern matching in tries and orthogonal range searching; path queries and connectivity queries on graphs; queries on geometric objects (e.g., point location queries and segment intersection queries); and queries on XML documents.
[0005] General Authentication Techniques There has also been substantial progress in the design of generic authentication techniques, that is, the development of general authentication frameworks that can be used for the design of authenticated data structures for authenticating concrete queries, or the design of general authentication patterns that authenticate classes of queries. Work of this type is as follows. One model described how, by hashing over the search structure of data structures in a specific broad class, one can obtain authenticated versions of these data structures. The class of data structures is such that (i) the links of the structure form a directed acyclic graph G of bounded degree and with a single source node; and (ii) queries on the data structure correspond to a traversal of a subdigraph of G starting at the source. The results hold for the pointer machine model of computation, where essentially the entire search algorithm is authenticated. This way, an answer carries a proof that is proportional to the search time spent for generating the answer itself, and the answer verification has analogous time complexity. The method only handles static problems. It has also been shown how extensions of hash trees can be used to authenticate abstract properties of data that is organized as paths, where the properties are decomposable, i.e., the properties of two subpaths can be combined to give the property of the resulting path. Also, the authentication of the general fractional cascading data-structuring technique has been considered. This technique can lead to authentication of data structures that involve iterative searches over catalogs. The underlying model is the same as before, i.e., the pointer machine model. Although the techniques do not explicitly authenticate the corresponding search algorithm, the complexity of the resulting authenticated data structures is of the same order of magnitude as that of the searching algorithm.
Another study described a general technique for designing consistency proofs for committed databases, a different problem from data authentication. However, the technique can be extended to provide a general framework for also designing authenticated data structures (that actually enjoy additional properties). The authentication technique authenticates the searching algorithm that is used to produce the answer. Also, the model used is the pointer machine; the RAM model can be captured at an O(log M) overhead, where M is the total memory used. The results presented herein operate on the RAM model; thus, they include a broader class of both static and dynamic query problems and can lead to more efficient constructions, where the answer validity and not the algorithm is verified. Finally, it has been shown that for the dictionary problem and hash-based data authentication, the querying problem and the authentication problem are equivalent. That is, for authenticated dictionaries of size n, all costs related to authentication are at least logarithmic in n in the worst case. See R. Tamassia and N. Triandopoulos. Computational bounds on hierarchical data processing with applications to information security. In Proc. Int. Colloquium on Automata, Languages and Programming (ICALP), volume 3580 of LNCS, pages 153-165. Springer-Verlag, 2005.
[0006] Consistency Proofs and Privacy Recently, the study of an additional security property related to authenticated data structures has been initiated. Assuming a more adversarial environment for the user setting, one can consider the case where the data source can act unreliably. The new requirement is then data consistency, namely, the inability of the data source to provide different, i.e., contradictory, verifiable answers to the same query. This issue has been studied for hash trees, where it was shown how to enforce data consistency by augmenting hash trees. Another study introduced zero-knowledge sets, where a prover commits to a value for a set and membership queries can be verified by a verifier consistently (and in zero-knowledge). Still another study extended consistency proofs to range queries and also gave sufficient conditions for schemes to achieve consistency. Other works provide privacy-preserving verification but involve computationally more expensive operations.
[0007] Certifying Algorithms and Checking Primitives Extensive work on certifying algorithms models a computational gap between the computation of a program and the verification of its correctness. This is related to the idea behind the presented authentication framework.
SUMMARY:
[0008] In an exemplary aspect of the invention, a method includes: receiving a query on a structured data set; transforming the query into a plurality of set membership queries on a plurality of sets, wherein the plurality of sets are obtained by modifying the structured data set; determining a plurality of first answers corresponding to the plurality of set membership queries; processing the plurality of first answers to obtain a second answer corresponding to the query; and returning the second answer.
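The steps of this method can be sketched for one concrete query type. The sketch below assumes a one-dimensional range query over a set of numeric keys and assumes that the set derived from the data is a successor relation of tuples (x, suc(x)) with sentinel values at both ends. The names `successor_relation`, `respond` and `verify` are hypothetical, and the membership test passed to `verify` is a plain Python set lookup standing in for an authenticated set-membership check.

```python
import math


def chain(keys):
    """Sorted keys with -inf/+inf sentinels, so that every key has a successor."""
    return [-math.inf] + sorted(keys) + [math.inf]


def successor_relation(keys):
    """The set of tuples (x, suc(x)), obtained by modifying the data set."""
    c = chain(keys)
    return set(zip(c, c[1:]))


def respond(keys, lo, hi):
    """Responder: turn the range query [lo, hi] (lo <= hi) into membership
    queries on successor tuples and derive the answer from them."""
    c = chain(keys)
    tuples = [(x, y) for x, y in zip(c, c[1:]) if y >= lo and x <= hi]
    answer = [y for _, y in tuples if lo <= y <= hi]
    return tuples, answer


def verify(is_member, tuples, lo, hi, answer):
    """Accept iff every tuple is a member, the tuples form an unbroken chain
    that brackets [lo, hi], and the answer is exactly the keys in range."""
    if not tuples or not all(is_member(t) for t in tuples):
        return False
    if any(a[1] != b[0] for a, b in zip(tuples, tuples[1:])):
        return False
    if not (tuples[0][0] < lo and tuples[-1][1] > hi):
        return False
    return answer == [y for _, y in tuples if lo <= y <= hi]


keys = [5, 12, 20, 31, 47]
relation = successor_relation(keys)
tuples, answer = respond(keys, 10, 35)
print(answer)                                                # [12, 20, 31]
print(verify(relation.__contains__, tuples, 10, 35, answer))  # True
```

In this sketch the first answers are the results of the individual membership tests, and the second answer is the list of keys in the range. An omitted key would break the chain of successor tuples, so completeness of the answer is checked as well as correctness.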
[0009] In another exemplary aspect of the invention, an electronic device includes: a memory configured to store a structured data set and a plurality of sets, wherein the plurality of sets are obtained by modifying the structured data set; and a data processor configured to receive a query on the structured data set, to transform the query into a plurality of set membership queries on the plurality of sets, to determine a plurality of first answers corresponding to the plurality of set membership queries; to process the plurality of first answers to obtain a second answer corresponding to the query, and to return the second answer.
[0010] In a further exemplary aspect of the invention, a method includes: storing a data set and a certification image for the data set at a responder; storing at least one digest of the certification image at a data source; in response to an update to the data set by the data source, the data source using the at least one digest and the certification image to verify the update; receiving a query on the data set, wherein the query is received by the responder from the data source; transforming the query into a plurality of set membership queries on a plurality of sets, wherein the plurality of sets are obtained by modifying the data set; determining a plurality of first answers corresponding to the plurality of set membership queries; processing the plurality of first answers to obtain a second answer corresponding to the query; and returning the second answer and a corresponding proof for the second answer from the responder to the data source.
[0011] In another exemplary aspect of the invention, a system includes: a responder configured to store a data set and a certification image for the data set; and a data source configured to store at least one digest of the certification image, wherein in response to an update to the data set by the data source, the data source is configured to use the at least one digest and the certification image to verify the update, wherein the responder is configured to receive a query on the data set from the data source, to transform the query into a plurality of set membership queries on a plurality of sets, to determine a plurality of first answers corresponding to the plurality of set membership queries, to process the plurality of first answers to obtain a second answer corresponding to the query, and to return the second answer and a corresponding proof for the second answer to the data source, wherein the plurality of sets are obtained by modifying the data set.
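One way to picture the update verification in this two-party setting is sketched below. It is a simplification under stated assumptions: the data set is a fixed-size array whose length is a power of two, the digest is the root of a SHA-256 hash tree over the array, and the proof returned for an update is the old value together with its sibling path. The class names `Responder` and `Source` and the helper `root_from_path` are hypothetical and do not come from the patent text.

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256 is an assumed choice of collision-resistant hash."""
    return hashlib.sha256(data).digest()


def root_from_path(value, index, path):
    """Recompute the tree root from one leaf value and its sibling hashes."""
    digest = h(b"leaf:" + value)
    for sibling in path:
        pair = digest + sibling if index % 2 == 0 else sibling + digest
        digest = h(b"node:" + pair)
        index //= 2
    return digest


class Responder:
    """Untrusted party: stores the data and can rebuild the full hash tree."""

    def __init__(self, values):
        self.values = list(values)  # length assumed to be a power of two

    def _levels(self):
        level = [h(b"leaf:" + v) for v in self.values]
        levels = [level]
        while len(level) > 1:
            level = [h(b"node:" + level[i] + level[i + 1])
                     for i in range(0, len(level), 2)]
            levels.append(level)
        return levels

    def digest(self):
        return self._levels()[-1][0]

    def update(self, index, new_value):
        """Apply the update; return (old value, sibling path) as the proof."""
        path, i = [], index
        for level in self._levels()[:-1]:
            path.append(level[i ^ 1])
            i //= 2
        old_value = self.values[index]
        self.values[index] = new_value
        return old_value, path


class Source:
    """Data owner: keeps only the digest of the outsourced data."""

    def __init__(self, digest):
        self.digest = digest

    def verify_update(self, index, new_value, old_value, path):
        # The proof must first reproduce the digest the source already holds.
        if root_from_path(old_value, index, path) != self.digest:
            return False
        # The same siblings then yield the digest of the updated data set.
        self.digest = root_from_path(new_value, index, path)
        return True


data = [b"a", b"b", b"c", b"d"]
source = Source(Responder(data).digest())  # digest computed before outsourcing
server = Responder(data)
old_value, path = server.update(2, b"C")
print(source.verify_update(2, b"C", old_value, path))  # True
print(source.digest == server.digest())                # True
```

The source never stores the data itself; it accepts an update only if the returned proof is consistent with the digest it already holds, and it then advances its digest without contacting the responder again.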
[0012] In a further exemplary aspect of the invention, a method includes: storing a data set and a certification image for the data set at a data source and a responder; storing at least one digest of the certification image at the data source and the responder; storing at least one signature of the at least one stored digest at the data source and the responder; in response to an update to the data set by the data source, updating the corresponding stored at least one digest by the data source and the responder; signing the updated at least one digest by the data source; transmitting the signed updated at least one digest from the data source to the responder; receiving a query on the data set, wherein the query is received by the responder from a query source; transforming the query into a plurality of set membership queries on a plurality of sets, wherein the transformation is performed by the responder and the plurality of sets are obtained by modifying the data set; determining by the responder a plurality of first answers corresponding to the plurality of set membership queries; processing by the responder the plurality of first answers to obtain a second answer corresponding to the query; sending the second answer, a corresponding proof for the second answer, and the respective signed at least one digest from the responder to the query source; and verifying the second answer by the query source using the proof and the signed at least one digest.
[0013] In another exemplary aspect of the invention, a system includes: a data source configured to store a data set, a certification image for the data set, at least one digest of the certification image, at least one signature of the at least one digest, to update the at least one digest in response to an update operation, to sign the updated at least one digest, and to transmit the signed updated at least one digest to the responder; a query source; and a responder configured to receive the signed updated at least one digest from the data source, to receive the query on the data set from the query source, to transform the query into a plurality of set membership queries on a plurality of sets obtained by modifying the data set, to determine a plurality of first answers corresponding to the plurality of set membership queries, to process the plurality of first answers to obtain a second answer corresponding to the query, and to send the second answer, a corresponding proof for the second answer, and the signed at least one digest to the query source, wherein the query source is configured to verify the received second answer by using the proof and the signed at least one digest.
BRIEF DESCRIPTION OF THE DRAWINGS:
[0014] The foregoing and other aspects of embodiments of this invention are made more evident in the following Detailed Description, when read in conjunction with the attached Drawing Figures, wherein:
[0015] FIG. 1 shows a system in which the exemplary embodiments of the invention may be employed;
[0016] FIG. 2 depicts a flowchart illustrating one non-limiting example of a method for practicing the exemplary embodiments of this invention;
[0017] FIG. 3 depicts a flowchart illustrating another non-limiting example of a method for practicing the exemplary embodiments of this invention; and
[0018] FIG. 4 depicts a flowchart illustrating another non-limiting example of a method for practicing the exemplary embodiments of this invention.
DETAILED DESCRIPTION:
[0019] Data authentication over structured data, where data is disseminated by issuing queries, is considered below. Although the problem is examined in part from a theoretical point of view, it has numerous applications. Aiming at general results, a very general computational model is used, the RAM model, together with a very general data type and query model, in which data is organized according to the relational data model, slightly modified to fit the RAM model. A formal definitional framework is provided for the problem of authenticating answers to queries, and the inherent relationship of the framework with the concept of answer certification is identified. A new approach is put forward for solving the problem. Central to this work is the following: in contrast to previous general approaches, where the algorithm that answers a query is essentially what is authenticated, an answer-based approach is proposed, where, instead, the information necessary for verifying the answer is authenticated. Interestingly, this new approach has connections with certifying algorithms. The related concept of certification data structures (structures that are able to provide answer certification) is introduced, and a direct connection is proven between them and authenticated data structures (structures that provide authenticated queries, but in the bounded computational model). It is shown that any certification data structure has a corresponding authenticated data structure of the same complexity. In this way, one can exploit the computational gap that has been observed between answering a query or solving a problem and checking or certifying the correctness of the answer or of the computation. The presented methodology for decoupling the searching algorithm from the answer verification is modeled through a certification data structure, defined in Section 3.
[0020] Additionally, the exemplary embodiments of the invention introduce the important concept of (query) problem reducibility in data authentication. Informally, this means that a query of type A is authenticated reduced to a query of type B, when an authenticated data structure for B leads to an authenticated data structure for A. An important reduction is shown: any query problem in the proposed query model is authenticated reduced to the fundamental set membership problem. This is a non-trivial reduction, meaning that efficiency is preserved in the reduction. A trivial reduction authenticates the answer to a query by authenticating all possible query-answer pairs, which for most query problems is a set of infinite cardinality. This has many important consequences as will be subsequently discussed. Overall, through this reduction, one obtains general possibility results for the design of authenticated data structures.
[0021] Data authentication is a fundamental problem in data management, where it is desirable to design efficient cryptographic techniques that guarantee data authenticity in untrusted and adversarial data distribution environments. For example, herein the problem of authenticating general queries over structured data in the RAM model of computation is studied. The problem is formally defined in its general form and a new model is put forward for data authentication, where answer validity rather than the querying process is actually authenticated, and it is shown that this approach achieves generality. The notion of reducibility of query authentication primitives is introduced and it is shown that any general query based on the evaluation of relations over the data elements is reduced to authenticated membership queries. Using this, general possibility results under general assumptions and characterization theorems about the use of cryptographic techniques for the data authentication problem are illustrated. The proposed authentication framework enjoys important properties and certain advantages over previous approaches, including: generality, expressiveness and sufficient conditions for the design of new efficient authenticated data structures.
1 Contributions
[0022] The contributions of the approach presented can be summarized as follows:
Exemplary certification data structures are defined, which extend the concept of certifying algorithms to data structures. An exemplary rigorous cryptographic framework is provided for the analysis of authenticated data structures by defining query authentication schemes. Also, reductions between query authentication schemes are defined. An exemplary method is presented for constructing an authenticated data structure from a certification data structure that preserves super-efficient verification. Namely, a certification data structure with verification time asymptotically less than query time yields an authenticated data structure with the same property. Finally it is shown that the authentication of general queries can be reduced to the authentication of set membership queries. Note that these contributions may be viewed as non-limiting examples of the various exemplary embodiments of the invention described in further detail herein.
[0023] The disclosure is organized as follows. Section 2 provides definitions for the query model, Section 3 describes certification data structures and their properties, and Section 4 provides the definitional framework, introducing query authentication schemes and their security requirements. Section 5 introduces reducibility among authentication schemes and illustrates the primary results. Proofs are included in the Appendix.
2 Preliminaries
[0024] First, an exemplary relation-based data querying model, which is based on the RAM model of computation, is defined.
[0025] Definition 1 (Structured Data Set). A structured data set (or, simply, a data set) $S = (\mathcal{E}, \mathcal{R})$ consists of: (i) a collection $\mathcal{E} = \{E_1, \ldots, E_t\}$ of sets of data elements such that, for $1 \le i \le t$, set $E_i$ is a subset of a universe $U_i$, and (ii) a collection $\mathcal{R} = \{R_1, \ldots, R_k\}$ of indexed sequences of tuples of data elements such that, for $1 \le i \le k$, sequence $R_i = (R_i[1], \ldots, R_i[m_i])$ consists of $m_i$ distinct $p_i$-tuples from $E_{j_1} \times \cdots \times E_{j_{p_i}}$, where $1 \le j_1 \le \cdots \le j_{p_i} \le t$ and $p_i \le p$ for some integers $p$ and $m_i$. The size $n$ of data set $S = (\mathcal{E}, \mathcal{R})$ is defined by the expression in equation image imgf000011_0001 (not reproduced here). Also, it is assumed that $t$, $k$ and $p$ are constants (with respect to $n$).
[0026] This definition shares concepts with the relational data model for databases. A relation, mathematically defined as a subset of the Cartesian product of sets, is typically viewed as a set of tuples of elements of these sets. The model presented instead uses indexed sequences of tuples. Namely, each member $R_i$ of $\mathcal{R}$ is an array of tuples, where each tuple can be indexed by an integer. In this way, very general data organization and algorithmic paradigms are captured. For instance, a graph $G = (V, E)$ may correspond to data set $S_G = (\mathcal{E}, \mathcal{R})$, where $\mathcal{E} = V$ and $\mathcal{R}$ consists of a single sequence of indexed pairs representing relation $E$ (the edges in $G$). More complex graphs, e.g., with edge directions, weights, costs or associated data elements, can be represented by including new primitive data-element sets in $\mathcal{E}$ and corresponding sequences in $\mathcal{R}$ that describe the structure of data elements and the various relations among them.
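As a non-limiting illustration of the graph example above, the following minimal Python sketch encodes a graph as a structured data set with one element set and one indexed sequence of edge tuples. The names `StructuredDataSet`, `tuple_at` and `graph_to_data_set` are hypothetical and are introduced only for this sketch; sorting the edges is merely one way to fix an index order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructuredDataSet:
    """S = (E, R): element sets E and indexed sequences of tuples R."""
    element_sets: tuple  # E = (E_1, ..., E_t), each a frozenset
    relations: tuple     # R = (R_1, ..., R_k), each a tuple of distinct tuples

    def tuple_at(self, i, j):
        """Return R_i[j], using the 1-based indexing of Definition 1."""
        return self.relations[i - 1][j - 1]


def graph_to_data_set(vertices, edges):
    """Encode G = (V, E) as one element set (V) and one edge sequence."""
    vertex_set = frozenset(vertices)
    edge_seq = tuple(sorted(set(edges)))  # distinct tuples in a fixed order
    for u, v in edge_seq:
        if u not in vertex_set or v not in vertex_set:
            raise ValueError("edge endpoint outside the vertex set")
    return StructuredDataSet((vertex_set,), (edge_seq,))


s_g = graph_to_data_set({"a", "b", "c"}, [("a", "b"), ("b", "c")])
second_edge = s_g.tuple_at(1, 2)  # the tuple R_1[2]
```

Edge weights or directions could be added in the same spirit by appending further element sets and further sequences, as the paragraph above suggests.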
[0027] Definition 2 (Querying Model). Let $S = (\mathcal{E}, \mathcal{R})$ be a structured data set. A query operation $Q_S$ on $S$ is a computable function $Q_S : \mathcal{Q} \to \mathcal{A}_S$, where $\mathcal{Q}$ is the query space (the set of all possible queries $q$ of a specific type that can be issued about $S$) and $\mathcal{A}_S$ is the answer space (the set of all possible answers to queries drawn from $\mathcal{Q}$). The answer of a query $q \in \mathcal{Q}$ under $Q_S$ is $Q_S(q) \in \mathcal{A}_S$. An element $a \in \mathcal{A}_S$ of the answer space is the correct answer for query $q$ if and only if $Q_S(q) = a$.
[0028] Observe that the above definitions capture general query operations on data sets that are based on relations. As a non-limiting example, there may be a requirement that any query in the query space is mapped to a unique answer in the answer space and that any answer corresponds to some query. For instance, if $S_G = (\mathcal{E}, \mathcal{R})$ represents a monotone subdivision of the plane into the polygons induced by the vertices and edges of a planar graph $G$, the point location query operation maps a point in the plane (the query) to the unique region of the subdivision (the answer) containing it. Regarding the complexity of query answering, one may require that query operation $Q_S$ be efficiently computable. Typically, function $Q_S$ is evaluated on query $q \in \mathcal{Q}$ by a query answering algorithm that operates over $S$ through a query data structure appropriate for the type of queries in $\mathcal{Q}$.
[0029] Alternatively, but less conveniently, the query operation can be defined independently of the data set $S$, such that the answer to query $q$ is $Q(S, q)$. In this case, the query and answer spaces are also independent of $S$. Unique answers are used without loss of generality. There may be query problems for which $Q_S$ is a multi-valued mapping rather than a function. That is, there may be situations where more than one answer can exist for a given query. For instance, a path query on a graph, given two vertices, asks for any connecting path, if one exists. One can appropriately augment the query space for this type of query to include the index of the answer (according to some fixed ordering) that one wishes to obtain. These are presented as exemplary expansions on the concepts and techniques further described herein. Those of ordinary skill in the art will appreciate such expansions and their various applications.
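[0030] Definition 3 (Query Data Structure). A query data structure $D(Q_S) = (\mathcal{E}_Q, \mathcal{R}_Q, \mathrm{Answer})$ for query operation $Q_S : \mathcal{Q} \to \mathcal{A}_S$ on data set $S = (\mathcal{E}, \mathcal{R})$ consists of a structured data set $(\mathcal{E}_Q, \mathcal{R}_Q)$, such that $\mathcal{E} \subseteq \mathcal{E}_Q$ and $\mathcal{R} \subseteq \mathcal{R}_Q$, and an algorithm Answer, which on input a query $q \in \mathcal{Q}$ and data set $(\mathcal{E}_Q, \mathcal{R}_Q)$ returns $Q_S(q) \in \mathcal{A}_S$ in time polynomial in $n$ and $|q|$ by accessing and processing tuples in $\mathcal{R}_Q$.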
[0031] By Definition 1, for any data set $(\mathcal{E}, \mathcal{R})$ of size $n$, the total number of relations that exist in $\mathcal{R}$ (and thus can possibly be accessed by Answer) is $O(n^p) = \mathrm{poly}(n)$. This implies that the storage size of data set $(\mathcal{E}, \mathcal{R})$ is polynomially related to its size.
[0032] On input query $q$, algorithm Answer operates over $S$ through the use of $D(Q_S)$: by processing relations in $\mathcal{R}_Q$, Answer accesses relations in $\mathcal{R}$, evaluates conditions over elements in $S$ and produces the answer. For instance, for a point location algorithm that is based on segment trees and operates on planar subdivision $S_G = (\mathcal{E}, \mathcal{R})$, data set $S_Q = (\mathcal{E}_Q, \mathcal{R}_Q)$ represents a two-level search structure that locates points in logarithmic time; here, data set $S_Q$ includes information about the regions defined by the edges of graph $G$.
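The role of a query data structure can be sketched with a much simpler query than point location. The following Python fragment is an illustrative assumption, not the structure of the embodiments: it treats set membership over numeric keys as the query operation, builds a single sorted sequence of 1-tuples as the query structure, and answers by binary search. The names `constr_q` and `answer` are hypothetical stand-ins for a construction algorithm and for algorithm Answer.

```python
def constr_q(elements):
    """Build R_Q for membership queries: one sorted sequence of 1-tuples."""
    return (tuple((x,) for x in sorted(set(elements))),)


def answer(q, r_q):
    """Answer a membership query by binary search over the sequence R_1.

    Only O(log n) tuples of R_1 are accessed.
    """
    seq = r_q[0]
    lo, hi = 0, len(seq)
    while lo < hi:
        mid = (lo + hi) // 2
        if seq[mid][0] < q:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(seq) and seq[lo][0] == q


r_q = constr_q([15, 3, 8])
is_member = answer(8, r_q)
```

In this sketch the search structure is implicit in the sort order; a structure with explicit pointers would be represented by adding index-valued tuples to the sequences.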
[0033] A data set $S$ is said to be static if it stays the same over time and dynamic if it evolves over time through update operations performed on $S$. An update operation $U_S$ for $S$ is a function that, given an update $y \in \mathcal{Y}$, where $\mathcal{Y}$ is the set of all possible updates, results in changing one or more data elements in $S$ and accordingly one or more tuples in $\mathcal{R}$. If $S$ is static (resp. dynamic), data set $(\mathcal{E}_Q, \mathcal{R}_Q)$ can be constructed (resp. updated) by some algorithm $\mathrm{Constr}_Q$ (resp. $\mathrm{Update}_Q$) that runs on input $S$ (resp. $S$ and $y \in \mathcal{Y}$) in time polynomial in $n$.
[0034] The presented data querying model achieves generality by combining the expressiveness of relational databases with the power of the RAM computation model. By using index-annotated relations, complex data organizations are easily represented and accessed. For instance, indirect addressing is supported by treating indexes as a distinct data type which is included in S, thus the model strictly contains the pointer machine model.
[0035] The cryptographic primitives used are presented in the Appendix below.
3 Certification Data Structures
[0036] In this section, the decoupling of query answering and answer verification is explored. The notion of answer testability is first defined, formally expressed through a certification data structure. Intuitively, this notion captures the following important property in data querying: query operations on any data set return validated answers, that is, answers that can be tested to be correct given a (minimal) subset of specially selected relations over elements of the data set. In essence, queries are certified to return valid answers; actually this holds in a safe way (i.e., cheating is effectively disallowed).
[0037] Definition 4 (Certification Data Structure). Let $D(Q_S) = (\mathcal{E}_Q, \mathcal{R}_Q, \mathrm{Answer})$ be a query data structure for query operation $Q_S : \mathcal{Q} \to \mathcal{A}_S$ on data set $S = (\mathcal{E}, \mathcal{R})$ of size $n$. A certification data structure for $S$ with respect to query data structure $D(Q_S)$ is a triplet $C(Q_S) = ((\mathcal{E}_C, \mathcal{R}_C), \mathrm{Certify}, \mathrm{Verify})$, where $(\mathcal{E}_C, \mathcal{R}_C)$, called the certification image of $S$, is a structured data set and Certify and Verify are algorithms such that:
[0038] Answer tests: On input query $q \in \mathcal{Q}$ and data sets $(\mathcal{E}_Q, \mathcal{R}_Q)$ and $(\mathcal{E}_C, \mathcal{R}_C)$, algorithm Certify returns answer $a = Q_S(q)$ and an answer test $\tau$, which is a sequence of pairs $(i, j)$, each indexing a tuple $R_i[j]$ of $\mathcal{R}_C$. Answer test $\tau$ defines a subset $\mathcal{R}_C(\tau) \subseteq \mathcal{R}_C$, called the certification support of answer $a$.
[0039] Answer testability: On input query $q \in \mathcal{Q}$, data set $(\mathcal{E}_C, \mathcal{R}_C)$, answer $a \in \mathcal{A}_S$ and answer test $\tau$, algorithm Verify accesses and processes only relations in $\mathcal{R}_C(\tau)$ and returns either 0 (rejects) or 1 (accepts).
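[0040] Completeness: For all queries $q \in \mathcal{Q}$, it holds that $\mathrm{Verify}(q, \mathcal{R}_C, \mathrm{Certify}(q, \ldots)) = 1$, where the remaining arguments of Certify are given in equation image imgf000015_0001 (not reproduced here); that is, Verify accepts the answer and answer test returned by Certify.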
[0041] Soundness: For all queries $q \in \mathcal{Q}$, answers $a$ and answer tests $\tau$, whenever $\mathrm{Verify}(q, \mathcal{R}_C, a, \tau) = 1$, it holds that $a = Q_S(q)$.
[0042] Regarding complexity measures for certification data structure $C(Q_S)$: (1) $C(Q_S)$ is answer-efficient if the time complexity $T_C(n)$ of Certify is asymptotically at most the time complexity $T_A(n)$ of Answer, i.e., $T_C(n)$ is $O(T_A(n))$; (2) $C(Q_S)$ is time-efficient (resp. time super-efficient) if the time complexity $T_V(n)$ of Verify is asymptotically at most (resp. less than) the time complexity $T_A(n)$ of Answer, i.e., $T_V(n)$ is $O(T_A(n))$ (resp. $o(T_A(n))$); and analogously, (3) $C(Q_S)$ is space-efficient (resp. space super-efficient) if the space requirement $S_C(n)$ of $(\mathcal{E}_C, \mathcal{R}_C)$ is asymptotically at most (resp. less than) the space requirement $S_Q(n)$ of $(\mathcal{E}_Q, \mathcal{R}_Q)$, i.e., $S_C(n)$ is $O(S_Q(n))$ (resp. $o(S_Q(n))$). If $S$ is static, data set $(\mathcal{E}_C, \mathcal{R}_C)$ can be constructed by some algorithm $\mathrm{Constr}_C$ that runs on input $S$ in time polynomial in $n$.
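As a non-limiting illustration of Definition 4, the sketch below gives a small certification data structure for membership queries over a set of numeric keys. It is an assumed example rather than a construction taken from the embodiments: the certification image is one sequence of adjacent-key pairs with infinite sentinels, the answer test indexes a single pair, and the hypothetical helper `support` extracts the certification support. Verification reads one tuple, so it runs in constant time while the search in `certify` takes logarithmic time, which makes this sketch time super-efficient in the sense defined above.

```python
NEG_INF, POS_INF = float("-inf"), float("inf")


def constr_c(elements):
    """Build the certification image R_C: one sequence of adjacent-key pairs."""
    keys = [NEG_INF] + sorted(set(elements)) + [POS_INF]
    return (tuple(zip(keys, keys[1:])),)  # R_C = (R_1,)


def certify(q, r_c):
    """Return (answer, tau); tau indexes the pair (x, y) with x <= q < y."""
    pairs = r_c[0]
    lo, hi = 0, len(pairs) - 1
    while lo < hi:  # find the last pair whose first key is <= q
        mid = (lo + hi + 1) // 2
        if pairs[mid][0] <= q:
            lo = mid
        else:
            hi = mid - 1
    return pairs[lo][0] == q, [(1, lo + 1)]  # 1-based index of R_1[lo + 1]


def support(r_c, tau):
    """Certification support R_C(tau): only the tuples that tau indexes."""
    return {(i, j): r_c[i - 1][j - 1] for i, j in tau}


def verify(q, r_c_tau, a, tau):
    """Read only the supporting tuple; return 1 (accept) or 0 (reject)."""
    if len(tau) != 1 or tau[0] not in r_c_tau:
        return 0
    x, y = r_c_tau[tau[0]]
    if not (x <= q < y):
        return 0
    return 1 if a == (x == q) else 0


r_c = constr_c([15, 3, 8])
a, tau = certify(10, r_c)
accepted = verify(10, support(r_c, tau), a, tau)
```

The pair $(8, 15)$ certifies both that 8 is present and that 10 is absent, so a single tuple suffices for either kind of answer. Soundness here relies on the supporting tuples genuinely coming from the certification image, which is the collaborative setting of this section.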
[0043] For simplicity, the above definition corresponds to the static case. The dynamic case can be treated analogously. Informally, an update algorithm $\mathrm{Update}_C$ is responsible for handling updates to data set $S$ by updating $C(Q_S)$ accordingly; that is, it produces the updated set $(\mathcal{E}'_C, \mathcal{R}'_C)$ and, in particular, the set of tuples in which $\mathcal{R}'_C$ and $\mathcal{R}_C$ differ. Algorithm $\mathrm{Update}_C$ additionally produces an update test (like the answer test above, a set of indices for tuples in $\mathcal{R}_C$) that validates the performed changes. Similarly, an update testing algorithm Updtest, on input an update $y \in \mathcal{Y}$, set $\mathcal{R}_C$, a set of tuples (changes in $\mathcal{R}_C$) and an update test, accepts if and only if the tuples correspond to the correct, according to $y$, new or deleted tuples in $\mathcal{R}_C$. One can likewise define update efficiency and update-testing (super-)efficiency for $C(Q_S)$, with respect to the time complexity of $\mathrm{Update}_C$ and Updtest respectively, as they asymptotically compare to $\mathrm{Update}_Q$.
[0044] Certification data structures introduce a general framework for studying data querying with respect to answer validation and correctness verification. They support certification of queries in a computational setting where the notions of query answering and answer validation are conceptually and algorithmically separated in a clean way. In particular, answer validation may be based merely on the certification image $(\mathcal{E}_C, \mathcal{R}_C)$ of data set $S = (\mathcal{E}, \mathcal{R})$; the two data sets are related by sharing tuples, possibly through a subset relation. Also, query certification may depend only on the certification support of the answer, i.e., subset $\mathcal{R}_C(\tau)$.
[0045] The first result shows that for every query structure there may be an efficient certification structure, that is, a completeness result showing that all queries can be certified without loss of efficiency.
[0046] Lemma 1. Any query data structure for any query operation on any structured data set admits an answer-, time-, update-, update-testing- and space-efficient certification data structure. (Proof in the Appendix below.)
[0047] Certification data structures are designed to accompany two-party data query protocols in the straightforward way: party A possesses data sets $(\mathcal{E}_Q, \mathcal{R}_Q)$ and $(\mathcal{E}_C, \mathcal{R}_C)$ and runs algorithm Certify, and party B possesses data set $(\mathcal{E}_C, \mathcal{R}_C)$ and runs algorithm Verify. The underlying dynamic set $S$ is controlled by B, which creates update and query operations for $S$. Although both operations are performed at A, B is able to verify their correctness. Thus, this setting models certified outsourced computation: at any point in time, B maintains a correct certification image of outsourced set $S$, allowing verification without loss of efficiency, by Lemma 1. Although the existential proof of Lemma 1 is trivial (both parties execute the same algorithms on the same data), its significance is justified as follows: (1) in addition to showing that Definition 4 is meaningful, Lemma 1 proves the feasibility of answer testability for any computable query in a general querying and computational model; (2) time super-efficient certification is in general feasible (see Appendix B for some specific examples), so outsourced computations are important; and (3) in the bounded-computational model and using cryptography, certification data structures have important applications to popular and practical models of data querying, namely, authentication and consistency in third-party models and space super-efficiency of certified outsourced computations in the client-server model (Sections 5 and 6).
[0048] Relation to Certifying Algorithms. The presented certification data structures are related to, and inspired by, certifying algorithms. Both model the property of answer testability (of a program or an algorithm for a data structure) as distinct from algorithm execution. The main difference, though, is that herein the intrinsic property of a data structure is modeled to illustrate correctness for verification (and authentication, as will be seen) purposes. Certifying algorithms are generally designed to guard against an erroneous implementation of an algorithm. Thus, one can view Definition 4 as an extension of the theory of certifying algorithms to data structures. Indeed, certifying algorithms for data structures use the implementation of a data structure as a black box and add a wrapper data structure to catch errors. Instead, in Definition 4, the data structure is augmented to facilitate the certification process.
4 Authenticated Data Structures
[0049] In this section, a general model is formally described for data authentication in untrusted and adversarial environments by introducing query authentication schemes, cryptographic primitives (algorithms that use cryptography to satisfy certain properties) for the authentication of general queries over collections of structured data. Conceptually, query authentication schemes extend certification structures in that answer validation is not performed in a collaborative setting; instead, the prover may be adversarial and answer verification is now achieved in the bounded computational model. In particular, data authentication is examined in a non-conventional setting, where the creator (e.g. owner) of a data set is not the same entity as the one answering queries about the set and, in particular, the data owner does not control the corresponding data structure that is used to answer a query. In this setting, an intermediate, untrusted party answers the queries about the data set that are issued by an end-user. This model of data querying is formally defined as follows.
[0050] Definition 5 (Three-Party Data Querying Model). A three-party data querying model includes a source $\mathsf{S}$, a responder $\mathsf{R}$ and a user $\mathsf{U}$, where: (i) source $\mathsf{S}$ creates (and owns) a dynamic data set $S$, which is maintained by query data structure $D(Q_S)$ for query operation $Q_S : \mathcal{Q} \to \mathcal{A}_S$ on $S$; (ii) responder $\mathsf{R}$ stores $S$, by maintaining a copy of $D(Q_S)$ and some auxiliary information for $S$; (iii) user $\mathsf{U}$ issues queries about $S$ to responder $\mathsf{R}$ by sending to $\mathsf{R}$ a query $q \in \mathcal{Q}$; (iv) on a query $q \in \mathcal{Q}$ issued by $\mathsf{U}$, $\mathsf{R}$ computes answer $a = Q_S(q)$ and sends $a$ to $\mathsf{U}$; (v) on an update $y \in \mathcal{Y}$ for $S$ issued by the source, $S$ and $D(Q_S)$ are appropriately updated by $\mathsf{S}$ and $\mathsf{R}$.
[0051] Referring to FIG. 1, a system 100 is shown in which the exemplary embodiments of the invention may be employed. The system 100 includes a Source ($\mathsf{S}$) 102, a User ($\mathsf{U}$) 104 and a Responder ($\mathsf{R}$) 106. As noted above in Definition 5, $\mathsf{S}$ 102 creates (and owns) a data set $S$ 108, which is maintained by query data structure $D(Q_S)$ 110 for query operation $Q_S : \mathcal{Q} \to \mathcal{A}_S$ on $S$ 108. As noted further above, $Q_S$ is a query operation on data set $S$ 108, $\mathcal{Q}$ is a query space and $\mathcal{A}_S$ is an answer space. Note also the presence of a query authentication scheme $\mathrm{QAS}(Q_S, S)$ 112 for query operation $Q_S$ on $S$ 108 and a certification data structure $C(Q_S)$ 114, where $C(Q_S) = ((\mathcal{E}_C, \mathcal{R}_C), \mathrm{Certify}, \mathrm{Verify})$ as defined and explained above. $\mathrm{QAS}(Q_S, S)$ 112 is as defined and explained below.
[0052] With further reference to FIG. 1, $\mathsf{R}$ 106 stores $S$ 108, by maintaining a copy of $D(Q_S)$ 110 and some auxiliary information aux($S$) (not shown) for $S$ 108. $\mathsf{R}$ 106 also maintains $C(Q_S)$ 114. $\mathsf{U}$ 104 issues queries about $S$ 108 to $\mathsf{R}$ 106 by sending to $\mathsf{R}$ 106 a query ($q$) 116, where $q \in \mathcal{Q}$. On a query $q$ 116 issued by $\mathsf{U}$ 104, $\mathsf{R}$ 106 computes an answer ($a$) 118, where $a \in \mathcal{A}_S$ and $a = Q_S(q)$. $\mathsf{R}$ 106 then sends $a$ 118 to $\mathsf{U}$ 104. For the dynamic case, on an update ($y$) 120 for $S$ 108 issued by $\mathsf{S}$ 102, where $y \in \mathcal{Y}$, $S$ 108 and $D(Q_S)$ 110 are appropriately updated by $\mathsf{S}$ 102 and $\mathsf{R}$ 106.
[0053] In other embodiments, the system 100 may comprise a static system or static data set, where the system or data set stays the same over time (e.g. does not evolve over time through update operations). In further embodiments, the system 100 may comprise a confirmation response ($c$) 122 that is sent to $\mathsf{S}$ 102 in response to $\mathsf{R}$ 106 receiving $y$ 120. In other embodiments, $c$ 122 may be sent to $\mathsf{S}$ 102 in response to $\mathsf{R}$ 106 successfully performing $y$ 120. In further embodiments, $D(Q_S)$ 110 may be coupled to $\mathrm{QAS}(Q_S, S)$ 112, $\mathrm{QAS}(Q_S, S)$ 112 may be coupled to $C(Q_S)$ 114, $C(Q_S)$ 114 may be coupled to $D(Q_S)$ 110, or $D(Q_S)$ 110, $\mathrm{QAS}(Q_S, S)$ 112 and $C(Q_S)$ 114 may all be coupled together (e.g. to each other). In other embodiments, $y$ 120 may comprise an update to $C(Q_S)$ 114. In further embodiments, $y$ 120 may comprise an update to $\mathrm{QAS}(Q_S, S)$ 112. In further embodiments, one or more of aux($S$), $S$ 108, $D(Q_S)$ 110, $\mathrm{QAS}(Q_S, S)$ 112 and/or $C(Q_S)$ 114 may be stored (e.g. located) at a location external to $\mathsf{R}$ 106. In such a case, $\mathsf{R}$ 106 may act as an intermediary (e.g., controller) for access to or use of such external components by $\mathsf{U}$ 104.
[0054] In other embodiments, the system 100 may comprise a network over which $\mathsf{S}$ 102, $\mathsf{U}$ 104 and/or $\mathsf{R}$ 106 can communicate. As a non-limiting example, such a network may comprise the internet. As a non-limiting example, $\mathsf{U}$ 104 may comprise a client enabled to communicate with $\mathsf{R}$ 106 over a network. As a further non-limiting example, $\mathsf{U}$ 104 may comprise a client enabled to communicate with $\mathsf{R}$ 106 over the internet. As a non-limiting example, $\mathsf{U}$ 104 may comprise an electronic device. Such an electronic device may comprise at least one data processor, at least one memory, a transceiver, and a user interface comprising a user input and a display device. One or more of $\mathsf{S}$ 102, $\mathsf{U}$ 104 and/or $\mathsf{R}$ 106 include one or more components capable of implementing the exemplary embodiments of the invention. As a non-limiting example, an encryption component may be employed. As further non-limiting examples, the encryption component may be a separate entity (e.g. an integrated circuit, an Application Specific Integrated Circuit or ASIC) or may be integrated with other components (e.g. a program run by a data processor, functionality enabled by a data processor).
[0055] The model achieves generality and has many practical applications.
Regarding data authentication, it is desirable that the user can verify the validity of the answer given to him by the responder. For this verification process, it is desirable that the responder, along with the answer, gives to the user a proof that can be used in the verification. To capture this verification feature, the notion of a query authentication scheme is defined as follows.
[0056] Definition 6 (Query Authentication Scheme). A query authentication scheme for query operation $Q_S : \mathcal{Q} \to \mathcal{A}_S$ on structured data set $S$ is a quadruple of PPT algorithms (KeyG, Auth, Res, Ver) such that:
[0057] Key generation The key generation algorithm KeyG takes as input a security parameter $1^\kappa$ and outputs a key pair $(PK, SK)$; that is, $(PK, SK) \leftarrow \mathrm{KeyG}(1^\kappa)$.
[0058] Authenticator The authenticator algorithm Auth takes as input the secret and public key $(SK, PK)$, the query space $\mathcal{Q}$ (or an encoding of the query type) and data set $S$ of size $n$, and outputs an authentication string $\alpha$ and a verification structure $V$; that is, $(\alpha, V) \leftarrow \mathrm{Auth}(SK, PK, \mathcal{Q}, S)$, where $\alpha, V \in \{0, 1\}^*$.
[0059] Responder The responder algorithm Res takes as input a query $q \in \mathcal{Q}$, a data set $S$ of size $n$ and a verification structure $V \in \{0, 1\}^*$, and outputs an answer-proof pair $(a, p) \leftarrow \mathrm{Res}(q, S, V)$, where $a \in \mathcal{A}_S$ and $p \in \{0, 1\}^*$.
[0060] Verifier The verifier algorithm Ver takes as input the public key $PK$, a query $q \in \mathcal{Q}$, an answer-proof pair $(a, p) \in \mathcal{A}_S \times \{0, 1\}^*$ and an authentication string $\alpha \in \{0, 1\}^*$, and either accepts the input, returning 1, or rejects, returning 0; that is, $\{0, 1\} \leftarrow \mathrm{Ver}(PK, q, (a, p), \alpha)$.
[0061] Updates For the dynamic case, the existence of an update algorithm $\mathrm{Auth}_U$ is additionally required, such that it complements algorithm Auth and handles updates. In particular, given an update $y \in \mathcal{Y}$, algorithm $\mathrm{Auth}_U$ updates the authentication string and the verification structure: $(\alpha', V') \leftarrow \mathrm{Auth}_U(SK, PK, \mathcal{Q}, S, y, \alpha, V)$.
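The shape of the quadruple (KeyG, Auth, Res, Ver) can be sketched for the static case with set membership as the query operation. Everything concrete below is an assumption made for illustration and is not prescribed by the text: a SHA-256 Merkle tree serves as the verification structure, a Lamport one-time signature built from the same hash stands in for a generic signature scheme, the security parameter is fixed at 256 bits, and the function names are hypothetical. The sketch authenticates positive answers only; "no" answers are never accepted, and the `pk` argument of `auth` is kept only to mirror the interface.

```python
import hashlib
import secrets


def h(*parts):
    """SHA-256 over length-prefixed byte strings."""
    d = hashlib.sha256()
    for part in parts:
        d.update(len(part).to_bytes(4, "big") + part)
    return d.digest()


def bits(msg):
    """The 256 bits of the digest of msg."""
    return [(byte >> k) & 1 for byte in h(msg) for k in range(8)]


def key_g():
    """KeyG: Lamport one-time key pair (assumed signature, fixed parameter)."""
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    pk = [(h(x0), h(x1)) for x0, x1 in sk]
    return pk, sk


def sign(sk, msg):
    return [sk[i][b] for i, b in enumerate(bits(msg))]


def check_sig(pk, msg, sig):
    return len(sig) == 256 and all(
        h(s) == pk[i][b] for i, (s, b) in enumerate(zip(sig, bits(msg))))


def auth(sk, pk, items):
    """Auth: alpha = (root, signature on root), V = Merkle tree levels."""
    leaves = sorted(set(items))
    if not leaves:
        raise ValueError("this sketch needs a non-empty set")
    levels = [[h(b"leaf", x) for x in leaves]]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        cur = cur + [cur[-1]] if len(cur) % 2 else cur  # pair odd node with itself
        levels.append([h(b"node", cur[i], cur[i + 1]) for i in range(0, len(cur), 2)])
    root = levels[-1][0]
    return (root, sign(sk, root)), levels


def res(q, items, v):
    """Res: answer plus the sibling path of q's leaf (None if q is absent)."""
    leaves = sorted(set(items))
    if q not in leaves:
        return "no", None
    idx, path = leaves.index(q), []
    for level in v[:-1]:
        sib = idx ^ 1
        path.append((level[sib] if sib < len(level) else level[idx], idx & 1))
        idx //= 2
    return "yes", path


def ver(pk, q, answer_proof, alpha):
    """Ver: return 1 only for a 'yes' whose path rebuilds the signed root."""
    (a, path), (root, sig) = answer_proof, alpha
    if a != "yes" or path is None or not check_sig(pk, root, sig):
        return 0
    node = h(b"leaf", q)
    for sib, is_right in path:
        node = h(b"node", sib, node) if is_right else h(b"node", node, sib)
    return 1 if node == root else 0


pk, sk = key_g()
items = [b"x", b"y", b"z"]
alpha, v = auth(sk, pk, items)
verdict = ver(pk, b"y", res(b"y", items, v), alpha)
```

Because a Lamport key may sign only one message, this sketch suits a single call to `auth`; handling updates as in the dynamic case would call for a fresh key per update or a many-time signature scheme.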
[0062] The first consideration for a query authentication scheme is correctness. Intuitively, it is desirable that the verification algorithm accept answer-proof pairs generated by the responder algorithm and that these answers generally always be correct. There is also a security consideration for any query authentication scheme. Starting from the basis that, in the three-party data querying model, the user $\mathsf{U}$ trusts the data source $\mathsf{S}$ but not the responder $\mathsf{R}$, it is the responder that can act adversarially. First assume that $\mathsf{R}$ always participates in the three-party protocol, i.e., it communicates with $\mathsf{S}$ and $\mathsf{U}$ as the protocol dictates. Thus, denial-of-service attacks are not considered; they do not form an authentication attack but rather a data communication threat. However, $\mathsf{R}$ can adversarially try to cheat, by not providing the correct answer to a query and forging a false proof for this answer. Accordingly, the security consideration is that, given any query issued by $\mathsf{U}$, no polynomially bounded $\mathsf{R}$ can reply with an answer and an associated proof such that the answer is not correct and yet $\mathsf{U}$ verifies the authenticity of the answer and, thus, accepts it. The above considerations are expressed as the following two conditions for a query authentication scheme.
[0063] Definition 7 (Correctness). A query authentication scheme (KeyG, Auth, Res, Ver) is said to be correct if for all queries $q \in \mathcal{Q}$, if $(\alpha, V) \leftarrow \mathrm{Auth}(SK, PK, \mathcal{Q}, S)$ and additionally $(a, p) \leftarrow \mathrm{Res}(q, S, V)$, then with overwhelming probability it holds that $1 \leftarrow \mathrm{Ver}(PK, q, (a, p), \alpha)$ and $Q_S(q) = a$.
[0064] Definition 8 (Security). A query authentication scheme (KeyG, Auth, Res, Ver) for query operation $Q_S : \mathcal{Q} \to \mathcal{A}_S$ on structured data set $S$ is said to be secure if no probabilistic polynomial-time adversary $A$, given any query $q \in \mathcal{Q}$, the public key $PK$ and oracle access to the authenticator algorithm Auth, can output an authentication string $\alpha$, an answer $a'$ and a proof $p'$, such that $a'$ is an incorrect answer that passes the verification test, that is, $a' \ne Q_S(q)$ and $1 \leftarrow \mathrm{Ver}(PK, q, (a', p'), \alpha)$. (See also Appendix below.)
[0065] Definition 9 (Authenticated Data Structure). An authenticated data structure for queries in query space $\mathcal{Q}$ on a data set $S$ is a correct and secure query authentication scheme (KeyG, Auth, Res, Ver), or, as it is implied, a scheme where, given an authentication string $\alpha$, for algorithm Ver it holds that, for all queries $q \in \mathcal{Q}$, with all but negligible probability (measured over the probability space of the responder algorithm): $Q_S(q) = a$ if and only if there exists $p$ such that $1 \leftarrow \mathrm{Ver}(PK, q, (a, p), \alpha)$.
5 Authentication Reductions and General Authentication Results
[0066] The definitional framework of the previous sections will now be used in describing and proving the main results of this invention. The road map is as follows. First the notion of reducibility in data authentication is introduced, namely by defining reductions between query authentication schemes. It is then proven, using the disclosed framework of certification data structures, that the authentication of any query in the presented model is reduced to the authentication of set membership queries. In fact, one need authenticate only positive answers; that is, relation $\in$, and not $\notin$, needs to be authenticated. Implications of this result are then presented, in terms of concrete constructions. Using certification structures, a general methodology is provided for constructing correct and secure query authentication schemes and it is shown that any search structure for any query type in the presented querying model can be transformed into an authenticated data structure. Also, based on super-efficient query certification, a new approach is developed for data authentication, where only the information necessary for the answer verification is authenticated, and not the entire information used by the search algorithm, which leads to a powerful framework for the design of authentication structures with super-efficient verification.
[0067] Let QAS(Qs, S) denote a query authentication scheme (or QAS) for query operation Qs and data set S. Intuitively, authenticated reductions among QASs allow the design of a QAS using no other cryptographic tools but what another QAS provides and in a way that preserves correctness and security.
[0068] Definition 10 (Reductions of Query Authentication Schemes). Let $S$ and $S'$ be data sets, $Q_S : \mathcal{Q} \to \mathcal{A}_S$ and $Q'_S : \mathcal{Q}' \to \mathcal{A}'_S$ be query operations on $S$ and $S'$ respectively, and $\mathrm{QAS}(Q_S, S) = (\mathrm{KeyG}, \mathrm{Auth}, \mathrm{Res}, \mathrm{Ver})$ and $\mathrm{QAS}(Q'_S, S') = (\mathrm{KeyG}', \mathrm{Auth}', \mathrm{Res}', \mathrm{Ver}')$ be query authentication schemes for $Q_S$ on $S$ and $Q'_S$ on $S'$ respectively. $\mathrm{QAS}(Q_S, S)$ is said to be authenticated reduced to $\mathrm{QAS}(Q'_S, S')$ if key generation algorithms KeyG and KeyG' are identical, $\mathrm{QAS}(Q_S, S)$ uses the public and secret keys generated by KeyG never explicitly, but only implicitly through black-box invocations of algorithms Auth', Res' and Ver', and $\mathrm{QAS}(Q_S, S)$ is correct and secure whenever $\mathrm{QAS}(Q'_S, S')$ is correct and secure.
[0069] A general query authentication scheme Let S = (E, R) be a structured data set and Qs : Q → As be a query operation. Let D(Qs) = (EQ, RQ, Answer) be a query data structure for Qs. By Lemma 1, there exists a certification data structure C(Qs) = ((Ec, Rc), Certify, Verify) for S with respect to Qs. Let Q∈ : Q(Rc) → {yes, no} be the set membership query operation, where the query space Q(Rc) consists of the indexed tuples that exist in Rc. Assuming the existence of a secure and correct QAS(Q∈, Rc) = (KeyG', Auth', Res', Ver'), one next constructs QAS(Qs, S) = (KeyG, Auth, Res, Ver), a query authentication scheme for Qs and S, parameterized by QAS(Q∈, Rc) for set membership queries.
[0070] (A) Key-generation algorithm By definition it is the same as KeyG'; thus, SK = SK' and PK = PK'.
[0071] (B) Authenticator The authenticator algorithm Auth, using S and ConstrC, computes the structured data set Sc = (Ec, Rc) of the corresponding certification structure C(Qs) = ((Ec, Rc), Certify, Verify). Then Auth runs algorithm Auth' on input SK', PK', Q∈ and Rc. That is, algorithm Auth computes the pair (α', V') ← Auth'(SK', PK', Q∈, Rc), and then Auth outputs (α', V').
[0072] (C) Responder The responder algorithm Res first computes the structured data sets SQ = (EQ, RQ) and Sc = (Ec, Rc) using S and algorithms ConstrQ and ConstrC. Then, on input q, SQ and Sc, it simply runs algorithm Certify to produce the pair (a, τ). Then Res constructs the certification support Rc(τ) of answer a by accessing set Rc with the use of the indices in τ. For every tuple < t > in Rc(τ), algorithm Res runs the responder algorithm Res' on inputs < t >, Rc and V' to get (a'(t), p'(t)) ← Res'(< t >, Rc, V') and, if (t1, ..., t|τ|) is the sequence of tuples accessed in total, Res creates the sequence p' = (p'(t1), ..., p'(t|τ|)), sets p = (τ, Rc(τ), p') and finally outputs (a, p).
[0073] (D) Verifier The verifier algorithm Ver first checks whether the proof p and the answer a are both well-formed and, if not, it rejects. Otherwise, by appropriately processing the proof p, algorithm Ver runs algorithm Verify on inputs q, a, Rc(τ) and τ. Whenever algorithm Verify needs to access and process a tuple < ti >, where < ti > is the i-th tuple accessed by Verify, algorithm Ver runs algorithm Ver' on inputs PK', < ti >, (yes, p'(ti)) and α', and if 0 ← Ver'(PK', < ti >, (yes, p'(ti)), α'), algorithm Ver rejects. Otherwise, Ver continues with the computation. Finally, Ver accepts if and only if Verify accepts, i.e., if and only if 1 ← Verify(q, Rc(τ), a, τ).
[0074] Thus QAS(Qs, S) has been constructed, where Qs is a general query operation on set S, parameterized by QAS(Q∈, Rc), where Q∈ is the set membership query operation and Rc is the certification image of S with respect to the certification data structure in use. One can show the following results (see Appendix for proofs).
[0075] Theorem 1. Let QAS(Q∈, Rc) be any query authentication scheme for set membership queries and QAS(Qs, S) be the query authentication scheme constructed above. For any query operation Qs and any data set S, QAS(Qs, S) is correct and secure if and only if QAS(Q∈, Rc) is correct and secure.
[0076] Theorem 2. For any query operation Qs on any structured data set S, there exists a secure and correct query authentication scheme QAS(Qs, S) based on a certification data structure C(Qs). Moreover, QAS(Qs, S) is authenticated reduced to any secure and correct query authentication scheme QAS(Q∈, Rc) for the set membership query operation Q∈ on some certification image Rc of C(Qs).
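The construction of paragraphs [0069]-[0073] can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration only: the membership scheme is a deliberately naive stand-in (its digest is the set of all tuple hashes, which a source would sign in practice), and the certification structure is a toy exact-match lookup that only handles keys present in the set. The point is the wiring: the responder runs Certify and attaches one membership proof per tuple in the certification support, and the verifier checks each membership proof before running Verify on the support.

```python
import hashlib

def tuple_hash(index, tup):
    """Hash of an indexed tuple; repr() is an assumed canonical encoding."""
    return hashlib.sha256(repr((index, tup)).encode()).hexdigest()

# Stand-in for the membership scheme: the digest is the set of all tuple hashes.
def auth_membership(rc):
    return frozenset(tuple_hash(i, t) for i, t in enumerate(rc))

def res_membership(index, rc):
    return ("yes", None)              # positive answer with an empty proof

def ver_membership(digest, index, tup, answer_proof):
    answer, _proof = answer_proof
    return answer == "yes" and tuple_hash(index, tup) in digest

# Toy certification structure: exact-match lookup of a key that is present.
def certify_lookup(q, rc):
    for i, (key, value) in enumerate(rc):
        if key == q:
            return value, [i]         # answer and answer test (tuple indices)
    raise KeyError(q)

def verify_lookup(q, support, a):
    return len(support) == 1 and support[0] == (q, a)

# Generic responder: Certify, then one membership proof per support tuple.
def respond(q, rc, certify):
    a, tau = certify(q, rc)
    support = [rc[i] for i in tau]
    proofs = [res_membership(i, rc) for i in tau]
    return a, (tau, support, proofs)

# Generic verifier: check membership of every support tuple, then run Verify.
def verify(q, a, proof, digest, verify_cert):
    tau, support, proofs = proof
    if not (len(tau) == len(support) == len(proofs)):
        return False                  # proof is not well-formed
    for i, tup, ap in zip(tau, support, proofs):
        if not ver_membership(digest, i, tup, ap):
            return False
    return verify_cert(q, support, a)
```

Swapping the three `*_membership` functions for a hash tree or an accumulator changes the costs but not the structure of `respond` and `verify`, which is the sense in which the general scheme is parameterized by the membership scheme.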
[0077] The implications of Theorem 1 and Theorem 2 are now shown in terms of time and space complexity. First, define the cost measures that are of interest in a query authentication scheme QAS(Qs, S) = (KeyG, Auth, Res, Ver) for a set of size n. Let Ta(n), Tr(n), Tv(n) denote the time complexity of algorithms Auth, Res and Ver respectively, and Sa(n), Sr(n) denote the space complexity of Auth and Res. Also, for QAS(Q∈, Rc) = (KeyG', Auth', Res', Ver'), let T'a(n), T'r(n), T'v(n) denote the time complexity of algorithms Auth', Res' and Ver' respectively, and S'a(n), S'r(n) denote the space complexity of Auth' and Res'. Recall from Section 3 that for certification data structure C(Qs) = ((Ec, Rc), Certify, Verify), TA(n), TC(n), TV(n), SQ(n) and Sc(n) denote the time complexity of Answer, Certify and Verify and the space complexity of the query structure and of the certification structure, respectively. Also let p(n) denote the proof size in QAS(Qs, S) and p'(n) the proof size in QAS(Q∈, Rc). One has the following.
[0078] Lemma 2. Let S be a structured data set and C(Qs) = ((Ec, Rc), Certify, Verify) be a certification data structure for S. Let n be the size of S and let m(n) = |Rc| denote the size of the certification image and s(n) = |Rc(τ)| the size of the certification support of an answer.
[0079] For any query operation Qs, the query authentication scheme QAS(Qs, S) = (KeyG, Auth, Res, Ver) that is based on query authentication scheme QAS(Q∈, Rc) = (KeyG', Auth', Res', Ver') and uses certification data structure C(Qs) has the following performance.
[0080] 1. With respect to time complexity, Ta(n) = O(T'a(n)), Tr(n) = O(s(n) T'r(n) + TC(n)), Tv(n) = O(s(n) T'v(n) + TV(n));
[0081] 2. With respect to space complexity, Sa(n) = O(S'a(n) + n + m(n)), Sr(n) = O(S'r(n) + SQ(n) + m(n));
[0082] 3. With respect to the proof size, p(n) = O(s(n) p'(n)).
[0083] The above lemma can be used to have general complexity results in terms of the presented parameterized query authentication scheme QAS(Qs, S). By appropriately choosing known (secure and correct) constructions for authenticating set membership queries one can achieve trade-offs on the efficiency of general query authentication schemes. Here, asymptotic analysis is of interest, omitting improvements of constant factors. So, only the related costs with respect to the set size n are studied and not the exact implementation of the cryptographic primitives.
[0084] Theorem 3. Assume the settings of Lemma 2. For any query operation Qs, the query authentication scheme QAS(Qs, S) = (KeyG, Auth, Res, Ver) that is based on query authentication scheme QAS(Q∈, Rc) = (KeyG', Auth', Res', Ver') and uses certification data structure C(Qs) has the following performance.
[0085] Static Case Using only signatures, one has the following performance: with respect to time complexity, Ta(n) is O(m(n)), Tr(n) is O(s(n) + TC(n)), Tv(n) is O(s(n) + TV(n)); with respect to space complexity, Sa(n) is O(n + m(n)), Sr(n) is O(SQ(n) + m(n)); with respect to the proof size, p(n) is O(s(n)).
[0086] Dynamic Case Using signature amortization, one has the following performance:
[0087] Hash Tree With respect to time complexity, Ta(n) is O(m(n)), Tr(n) is O(s(n) log n + TC(n)), Tv(n) is O(TV(n) + s(n) log n); with respect to space complexity, Sa(n) is O(n + m(n)), Sr(n) is O(n + SQ(n) + m(n)); with respect to the proof size, p(n) is O(s(n) log n); when k tuples are updated, these can be handled in O(k log n) time.
[0088] Dynamic Accumulator With respect to time complexity, Ta(n) is O(m(n)), Tr(n) is O(s(n) √n + TC(n)), Tv(n) is O(s(n) + TV(n)); with respect to space complexity, Sa(n) is O(n + m(n)), Sr(n) is O(n + SQ(n) + m(n)); with respect to the proof size, p(n) is O(s(n)); when k tuples are updated, these can be handled in O(k √n) time.
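As a worked instance of how the hash-tree bounds follow from Lemma 2, one can substitute the per-tuple costs of a balanced hash tree into the lemma. The substitution below assumes (this is an assumption of the illustration, not a statement from the text) that the certification image has size m(n) polynomial in n, so that a membership proof, its generation and its verification each cost O(log n).

```latex
\begin{align*}
T_r(n) &= O\big(s(n)\,T'_r(n) + T_C(n)\big) = O\big(s(n)\log n + T_C(n)\big),\\
T_v(n) &= O\big(s(n)\,T'_v(n) + T_V(n)\big) = O\big(s(n)\log n + T_V(n)\big),\\
p(n)   &= O\big(s(n)\,p'(n)\big) = O\big(s(n)\log n\big).
\end{align*}
```

The accumulator bounds are obtained in the same way, with constant-size witnesses and constant-time witness verification in place of the logarithmic hash-tree costs.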
[0089] By Theorem 2, all query operations can be authenticated in the three-party authentication model. Theorem 3 gives a detailed complexity analysis of the authenticated data structures derived from the corresponding query authentication schemes. Still, the above description depends on the complexity of the certification data structure used. These results hold for the RAM model of computation, which strictly includes the pointer machine model. By Lemma 1, all query problems that have a query data structure have a certification data structure, and thus these results generalize and improve previously known possibility results. Additionally, this framework provides insights for super-efficient verification, as described in the following meta-theorem.
[0090] Theorem 4. Let S be a structured data set and Qs be a query operation on S. If there exists a time (space) super-efficient certification data structure for Qs, then there exists a time (space) super-efficient authenticated data structure for Qs.
6 Further Applications of This Framework
[0091] Three additional applications of this framework are discussed below.
[0092] It has been shown how certification data structures support verification of outsourced computations for data querying in an information-theoretic setting. In particular, a data set S owned by a party A can be maintained by another party B, and operations on S decided by A can be certified and executed by B and then verified by A using the certification image (Ec, Rc) of S, which is stored by A.
[0093] In the bounded computational model, one can also achieve storage outsourcing, where not only the computations for data querying over set S are outsourced to an (untrusted) entity, but also the certification image (Ec, Rc) of S is entirely outsourced to the same entity. For example, consider a certification data structure, where party A (source of data) runs algorithm Verify over certification image (Ec, Rc) for verifying the update and query operations on data set S that is maintained by party B (outsourcer), who runs algorithm Certify for certifying each operation. In this setting and based on the presented framework of certification data structures, it is possible to produce an authentication method for any type of data query, where party A does not explicitly store the certification image of S. Indeed, the idea is to have A store only a cryptographic commitment of (Ec, Rc) and still be able to verify its integrity throughout a series of updates on an initially empty set S. This property is referred to as consistency: data source A checks that any update on S, and thus on (Ec, Rc), is in accordance with the history of previous updates.
[0094] The idea is to use cryptographic primitives (e.g., hashing) that provide commitments of sets, subject to which membership can be securely and efficiently checked. For instance, the source A can use either a one-way accumulator or a Merkle hash tree for creating and locally storing a secure digest (or a small collection of digests) of set (Ec, Rc), which is now explicitly stored at party B. Algorithms Certify and Verify are appropriately extended to reflect this change. First, the answer test of any query on S now contains not only the indices of the relations needed for the answer verification, but also the relations themselves, as well as corresponding set-membership proofs that each such relation belongs to set (Ec, Rc) (these proofs are subject to the digest(s) stored by A). Second, the answer verification at party A now looks for each relation it needs to process in the answer test (instead of looking for it in its local memory), and before operating on each relation it verifies the corresponding set-membership proof against one of the digests that are locally stored.
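The digest-only storage at party A can be sketched for the hash-tree option as follows. This is a minimal illustration under stated assumptions: SHA-256 as the hash function, a number of leaves that is a power of two, and the `b"leaf"`/`b"node"` prefixes are all choices of this sketch, as are the function names. Party B keeps all tree levels and supplies a sibling path; party A keeps only the root, checks the old tuple against it, and derives the new root after an update from the same path.

```python
import hashlib

def H(*parts):
    """SHA-256 over the concatenation of the given byte strings."""
    h = hashlib.sha256()
    for p in parts:
        h.update(p)
    return h.digest()

def build_levels(leaves):
    """Party B: all tree levels, bottom-up (len(leaves) assumed a power of two)."""
    level = [H(b"leaf", x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        level = [H(b"node", level[j], level[j + 1])
                 for j in range(0, len(level), 2)]
        levels.append(level)
    return levels

def sibling_path(levels, i):
    """Party B: sibling hashes from leaf i up to (but excluding) the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[i ^ 1])     # sibling index differs in the lowest bit
        i //= 2
    return path

def root_from_path(leaf, i, path):
    """Recompute the root implied by a leaf value and its sibling path."""
    node = H(b"leaf", leaf)
    for sib in path:
        node = H(b"node", node, sib) if i % 2 == 0 else H(b"node", sib, node)
        i //= 2
    return node

def client_update(root, i, old_leaf, new_leaf, path):
    """Party A: verify the old tuple against the stored digest, return the new digest."""
    if root_from_path(old_leaf, i, path) != root:
        raise ValueError("proof does not match the stored digest")
    return root_from_path(new_leaf, i, path)
```

Because the same sibling path authenticates the old tuple and determines the new root, party A never needs more than the digest and one logarithmic-size proof per touched tuple.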
[0095] That is, a novel general authentication method is presented for completely outsourcing a dynamic data set S to an untrusted server, ordering the server to execute any series of update or query operations on S, where queries can be of any type (for which there exists an efficiently answering procedure), and then being able to verify the results of any operation, where the data owner needs only to store a short digest of set S. One, thus, obtains the following result, which finds applications in the popular and practical client-server data outsourcing model, where a client (e.g., a small computational device, A) uses space super-efficient protocols to check its data that resides at a remote and untrusted server (B) (see Appendix E).
[0096] Theorem 5. In the bounded computational model, any certification data structure can be transformed to a secure, consistent, space-optimal data outsourced scheme in the client-server communication model.
[0097] The above result is of independent interest but can also be applied to third- party data authentication. In Section 4, the data source stores the certification image, whereas the responder stores the data set and the certification image (parties A and B in the previous discussion). One can see that, using a secure data outsourced scheme, the data source can outsource the certification image to the responder and still be able to check that the data set is correctly maintained and that space-optimality is achieved at the data source. Thus, one has a general authentication method, where (1) a data source can completely outsource a dynamic data set S to an untrusted server (responder), (2) by storing only a digest of S, the source can efficiently verify that any update operation on S is performed correctly by the server and (3) any user can query S — for any type of efficiently answerable query — and verify that the answer returned by the responder is authentic.
[0098] Additionally, for example by using the distributed Merkle tree construction over peer-to-peer networks (which realizes a distributed authenticated dictionary, that is, a secure distributed query authentication scheme for membership queries) and the results of the previous section, one has that all queries can be authenticated even when the responder is a distributed peer-to-peer network. That is, one obtains a general authentication method, where (1) a data source can completely outsource a dynamic data set S to an untrusted peer-to-peer network (more generally, a distributed responder) that supports the basic put-get functionality over data objects, (2) by storing only a digest of S, the source can efficiently verify that any update operation on S is performed correctly by the peer-to-peer network and (3) any user can query S, for any type of efficiently answerable query, and verify that the answer returned by the untrusted peer-to-peer network is authentic.
[0099] Theorem 6. For any query operation on structured data sets there exists a space-optimal (at the data source side) and distributed (at the responder side) authenticated data structure.
[00100] It is noted that general results on zero-knowledge proofs can also be applied to the presented framework.
Appendix
A Definitions of Cryptographic Primitives
[00101] Cryptographic Primitives Some cryptographic primitives that are useful for this exposition are now considered. The definition of a signature scheme given below has become the standard definition of security for signature schemes. Schemes that satisfy it are also known as signature schemes secure against adaptive chosen-message attack. A function ν : N → R is negligible if for every positive polynomial p(·) and for sufficiently large k, ν(k) < 1/p(k).
[00102] Definition 11 (Signature scheme). The triple of PPT algorithms (G(·), Sign(·)(·), Verify(·)(·, ·)), where G is the key generation algorithm, Sign is the signature algorithm, and Verify is the verification algorithm, constitutes a digital signature scheme for a family (indexed by the public key PK) of message spaces M(·) if the following two properties hold:
[00103] Correctness If a message m is in the message space for a given public key PK, and SK is the corresponding secret key, then the output of Sign_SK(m) will always be accepted by the verification algorithm Verify_PK. More formally, for all values m and k: Pr[(PK, SK) ← G(1^k); σ ← Sign_SK(m) : m ∈ M_PK ∧ ¬Verify_PK(m, σ)] = 0.
[00104] Security Even if an adversary has oracle access to the signing algorithm that provides signatures on messages of the adversary's choice, the adversary cannot create a valid signature on a message not explicitly queried. More formally, for all families of probabilistic polynomial-time oracle Turing machines {A_k}, there exists a negligible function ν(k) such that Pr[(PK, SK) ← G(1^k); (Q, m, σ) ← A_k^{Sign_SK(·)}(1^k) : Verify_PK(m, σ) = 1 ∧ ¬(∃σ' | (m, σ') ∈ Q)] = ν(k).
[00105] A cryptographic hash function h operates on a variable-length message M, producing a fixed-length hash value h(M). Moreover, the desired security results are achieved by means of collision-resistance, an additional security property for hash function h. A cryptographic hash function h is called collision-resistant if (i) it takes as input a string of arbitrary length and outputs a short string; and (ii) it is infeasible to find two different strings x ≠ y that hash to the same value, i.e., form a collision h(x) = h(y). For completeness, a standard definition of a family of collision-resistant hash functions is given.
[00106] Definition 12 (Collision-resistant Hash Function). Let H be a probabilistic polynomial-time algorithm that, on input 1^k, outputs an algorithm h : {0, 1}* → {0, 1}^k. Then H defines a family of collision-resistant hash functions if:
[00107] Efficiency For all h ∈ H(1^k), for all x ∈ {0, 1}*, it takes polynomial time in k + |x| to compute h(x).
[00108] Collision-resistance For all families of probabilistic polynomial-time Turing machines {A_k}, there exists a negligible function ν(k) such that Pr[h ← H(1^k); (x_1, x_2) ← A_k(h) : x_1 ≠ x_2 ∧ h(x_1) = h(x_2)] = ν(k).
[00109] A cryptographic collision-resistant hash function h will be used to compute the digest of a structured data set S = (E, R). For this, assume some fixed, well-defined binary representation for any data element e in E, so that h can operate on e. That is, the notation h is overloaded to operate on data elements. Also, assume that rules have been defined so that h can operate over any finite sequence of elements. That is, essentially, h is further overloaded to also denote a multivariate hash function. In particular, h(e_i1, e_i2, ..., e_ik) is used to represent a hash value computed from elements e_i1, e_i2, ..., e_ik, which is called a digest of these elements using hash function h(·). For now, the exact definition of the multivariate extension of h is left unspecified. For instance, h(e_i1, e_i2, ..., e_ik) may denote that h(·) operates on the concatenation of some well-defined and fixed-size binary representation of elements e_i1, e_i2, ..., e_ik, or on the concatenation of the individual hashes, using h(·), of some well-defined binary representation of elements e_i1, e_i2, ..., e_ik. In both of these examples, h essentially operates on one binary string σ, where the cost of the operation of h on σ is at least proportional to its length |σ|.
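The two example extensions of h to sequences can be written down directly. The sketch below assumes SHA-256 for h and an 8-byte big-endian encoding of non-negative integers as the fixed-size binary representation; both are illustrative choices, and the function names are hypothetical.

```python
import hashlib

def h(data: bytes) -> bytes:
    """Underlying hash function (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(data).digest()

def encode(e: int) -> bytes:
    """Assumed fixed-size (8-byte, big-endian) representation of an element."""
    return e.to_bytes(8, "big")

def h_concat(*elements: int) -> bytes:
    """h applied to the concatenation of the fixed-size encodings."""
    return h(b"".join(encode(e) for e in elements))

def h_of_hashes(*elements: int) -> bytes:
    """h applied to the concatenation of the individual element hashes."""
    return h(b"".join(h(encode(e)) for e in elements))
```

In both variants the input to the outer call of `h` is a single byte string whose length grows linearly with the number of elements, which matches the cost remark at the end of the paragraph.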
[00110] The following cryptographic primitive is based on the Merkle hash tree.
[00111] Definition 13 (Hash Tree). For a set of n elements, a hash tree is a binary tree, where each node stores a hash value computed using a collision-resistant hash function. At leaf nodes, the hash of the corresponding element is stored; at internal nodes, the hash of the concatenation of the hash values of the children nodes is stored.
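A minimal sketch of Definition 13 computes the root hash level by level. SHA-256 and the rule of promoting an unpaired node unchanged to the next level are assumptions of this illustration; the definition itself does not fix how an odd number of nodes is handled.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(elements):
    """Root hash of a binary hash tree over a non-empty list of byte strings."""
    level = [h(e) for e in elements]      # leaf nodes: hash of each element
    while len(level) > 1:
        # internal nodes: hash of the concatenation of the two children
        nxt = [h(level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])         # unpaired node is promoted unchanged
        level = nxt
    return level[0]
```

The root serves as the digest of the whole set: changing any element changes its leaf hash and therefore every hash on the path to the root.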
[00112] Finally, dynamic accumulators are reviewed. A standard definition is used.
[00113] Definition 14 ((One-way) Dynamic Accumulator). An accumulator for a family of inputs {X_k} is a family of families of functions G = {F_k} with the following properties.
[00114] Efficient Generation There is an efficient algorithm Gen that on input 1^k generates a random element f of F_k, an auxiliary information aux_f and a trapdoor information trd_f. Both aux_f and trd_f have sizes that are linear in k.
[00115] Efficient Evaluation Function f is a computable function f : A_f × X_k → A_f, where A_f, an efficiently samplable set of accumulation values, and X_k, the proposed set of elements to be accumulated, constitute the input domain of f. Function f is polynomial-time computable given the auxiliary information aux_f.
[00116] Quasi-Commutativity For all f ∈ F_k, a ∈ A_f and x_1, x_2 ∈ X_k, it holds that f(f(a, x_1), x_2) = f(f(a, x_2), x_1).
[00117] Witnesses Let a ∈ A_f and x ∈ X_k. A value w ∈ A_f is called a witness for x in a, under f, if a = f(w, x).
[00118] Updates Let X ⊆ X_k, x ∈ X, and a_0, a_X, w ∈ A_f, such that f(a_0, X) = f(w, x) = a_X. Let OP = {insert, delete} be the set of update operations on set X, such that insert(x̂) = X ∪ {x̂}, x̂ ∈ X_k − X, and delete(x̂) = X − {x̂}, x̂ ∈ X. A one-way accumulator is dynamic if there exist efficient algorithms U_op, W_op, op ∈ OP, such that:
[00119] - U_op(f, trd_f, a_X, x̂) = a_X̂ ∈ A_f such that a_X̂ = f(a_0, op(x̂)), that is, a_X̂ = a_{X ∪ {x̂}} or a_X̂ = a_{X − {x̂}};
[00120] - W_op(f, aux_f, a_X, a_X̂, x, x̂) = w' ∈ A_f such that a_X̂ is as above and a_X̂ = f(w', x).
[00121] Security An accumulator is one-way (secure) if the following holds true. Let A'_f × X'_k denote the domains for which the computational procedure for function f ∈ F_k is defined. That is, in principle, A'_f ⊇ A_f and X'_k ⊇ X_k. For all probabilistic polynomial-time adversaries Adv_k,
[00122] Pr[f ← Gen(1^k); a_0 ← A_f; (x, w, X) ← Adv_k(f, aux_f, A_f, a_0) : X ⊆ X_k; w ∈ A'_f; x ∈ X'_k; x ∉ X; f(w, x) = f(a_0, X)] = ν(k).
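The algebra of Definition 14 can be tried out on a toy RSA-style accumulator with f(a, x) = a^x mod N. This instantiation, the tiny modulus, the element values and all function names are assumptions of the sketch; the parameters are far too small to offer any security and serve only to make quasi-commutativity, witnesses and updates concrete. The modular inverse via `pow(x, -1, m)` requires Python 3.8 or later.

```python
# Toy parameters: the factorization of N plays the role of the trapdoor.
P, Q = 23, 47
N = P * Q
PHI = (P - 1) * (Q - 1)       # known only to the holder of the trapdoor

def f(a: int, x: int) -> int:
    """Accumulation function f(a, x) = a^x mod N."""
    return pow(a, x, N)

def accumulate(a0: int, xs) -> int:
    """f(a0, X): fold f over all elements (order is irrelevant)."""
    a = a0
    for x in xs:
        a = f(a, x)
    return a

def witness(a0: int, xs, x: int) -> int:
    """Witness for x: the accumulation of every other element."""
    return accumulate(a0, [y for y in xs if y != x])

def verify(acc: int, w: int, x: int) -> bool:
    return f(w, x) == acc

def insert(acc: int, x_new: int) -> int:
    """Update on insertion; no trapdoor is needed in this instantiation."""
    return f(acc, x_new)

def delete_with_trapdoor(acc: int, x_old: int) -> int:
    """Update on deletion: an x_old-th root, computed with the trapdoor.
    Requires gcd(x_old, PHI) == 1."""
    return f(acc, pow(x_old, -1, PHI))

def witness_after_insert(w: int, x_new: int) -> int:
    """A witness stays valid after an insertion once x_new is folded in."""
    return f(w, x_new)
```

Deletion is the step that uses the trapdoor here, which is consistent with the remark in the proof sketch of Theorem 3 that trapdoor information must stay with the authenticator and not with the responder.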
B Time Super-Efficient Certification Data Structures
[00123] In this section, examples of time super-efficient certification data structures are described, further justifying the importance of the notion of answer testability. For time super-efficient certification structures, although the certification image may be as large as the query structure, the certification support of the answer to any query has size asymptotically less than the "searching trail" of the query answering algorithm Answer. In this case, a super-efficient certification data structure exploits this gap in certifying queries.
[00124] For example, a simple case is the dictionary problem, where S = (E, R) is an ordered key-value set of size n: E consists of a set K of n key elements with a total ordering and a set V of n values, and R consists of two indexed relations, the key-value relation RKV and the successor relation Rs over keys. The query operation Qs has as its query space the universe that key elements are drawn from and as its answer space the set of all possible key-value pairs; to any query (key) q, Qs maps the answer (key-value pair) (k, v) if q = k, q ∈ K and (k, v) ∈ RKV (v is the value of k), or the answer ⊥ (denoting a negative membership answer) if no such condition is satisfied. Consider any search tree that implements the dictionary query data structure. Then the set (EQ, RQ) of a query data structure is an augmentation of (E, R); for instance, EQ now includes tree nodes and pointers, and RQ now includes the node-data and parent-child relations. There exists a time super-efficient (and space-efficient) certification data structure for the dictionary problem. Set (Ec, Rc) is simply (E, R). On input of a query q, algorithm Certify returns as an answer test the indices in Rc of two tuples: if q ∈ K, the indices of tuple < q, suc(q) > of the successor relation Rs and of tuple < q, v > of the key-value relation RKV are returned; otherwise, the indices of tuples < x, suc(x) >, < y, suc(y) > ∈ Rs are returned, such that x is the maximum element and y is the minimum element satisfying x < q < y, according to the total order of K. Algorithm Verify accesses these tuples and accepts or rejects accordingly. For instance, if a = ⊥ and the indices of two tuples < x, suc(x) >, < y, suc(y) > of the successor relation are in answer test τ, then it accepts if x < q < y and suc(x) = y; Verify rejects in all other cases.
[00125] It is easy to see that the soundness and completeness conditions hold. Note that the completeness property depends on both the answer testing algorithm Verify and on the certification image (Ec, Rc). For instance, although a different relation (other than the successor relation) could satisfy the soundness property, this choice may not satisfy completeness. For instance, the "odd-rank-difference" relation (two keys have ranks in the sorted set S with odd difference), which includes the successor relation, satisfies only the soundness condition. Note that TV(n) = O(1) although TA(n) = O(log n); also Sc(n) = O(SQ(n)) = O(n). The dynamic extension of this certification data structure is straightforward. Further note that the successor relation can be used to support, in a very similar way, a time super-efficient certification data structure for one-dimensional range searching.
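The dictionary certification structure can be sketched as follows. The sketch simplifies the description above in ways that are its own assumptions: sentinel keys -inf and +inf are added to the successor relation so that every missing key falls strictly between two consecutive keys, the answer test carries a single tuple index (the key-value tuple for a positive answer, one successor tuple < x, y > for a negative one), and keys are assumed to be numbers. Certify performs the logarithmic search; Verify only reads one tuple by index.

```python
import bisect

NEG_INF, POS_INF = float("-inf"), float("inf")

def build_certification_image(pairs):
    """Key-value relation r_kv (sorted by key) and successor relation r_s."""
    keys = sorted(pairs)
    r_kv = [(k, pairs[k]) for k in keys]
    chain = [NEG_INF] + keys + [POS_INF]
    r_s = list(zip(chain, chain[1:]))     # tuples < x, suc(x) >
    return r_kv, r_s

def certify(q, r_kv, r_s):
    """Responder side: O(log n) search, returns (answer, answer test)."""
    # (q,) sorts just before any (q, v), so this finds the first key >= q.
    i = bisect.bisect_left(r_kv, (q,))
    if i < len(r_kv) and r_kv[i][0] == q:
        return r_kv[i], ("kv", i)
    return None, ("suc", i)               # r_s[i] is the gap containing q

def verify(q, answer, test, r_kv, r_s):
    """Verifier side: a single tuple access and O(1) comparisons."""
    rel, i = test
    if rel == "kv" and answer is not None:
        return 0 <= i < len(r_kv) and r_kv[i] == answer and answer[0] == q
    if rel == "suc" and answer is None:
        if not 0 <= i < len(r_s):
            return False
        x, y = r_s[i]
        return x < q < y
    return False
```

The gap between the two costs is what makes the structure time super-efficient: the search trail of `certify` is never re-examined by `verify`.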
[00126] Also, consider the point location problem, where one asks to find the region of a planar subdivision of size n containing a given query point. Using existing efficient point-location algorithms, point location queries can be answered in time O(log n). A time super-efficient certification data structure stores the trapezoidal decomposition of the subdivision. Each trapezoid is expressed as a tuple of five data elements: two vertices (defining the top and bottom sides), two edges (defining the left and right sides), and a region (containing the trapezoid). The answer test is the index of the trapezoid containing the query point, which can be computed by a simple modification of the query algorithm. The inclusion of the query point in the answer test trapezoid is tested in O(1) time. That is, again, TV(n) = O(1) although TA(n) = O(log n). This certification data structure also has a dynamic extension. Additional examples include data structures for other geometric problems (e.g., convex hull) and also index structures for queries on relational databases, where, for instance, the correctness of the results of complex SQL-type queries ("SELECT (), FROM (), WHERE ()") seems to be verifiable independently of the searching through multi-dimensional tree-like index structures (thus, with time complexity that is better by at least a logarithmic factor).
C Formal Security Definition for Query Authentication Schemes
[00127] Definition 15 (Security - Formal version of Definition 8). Let (KeyG, Auth, Res, Ver) be a query authentication scheme for query operation Qs : Q → As on structured data set S. One says that query authentication scheme (KeyG, Auth, Res, Ver) is secure if no probabilistic polynomial-time adversary A can win non-negligibly often in the following game:
[00128] 1. A key pair is generated: (PK, SK) ← KeyG(1^k).
[00129] 2. The adversary A is given:
[00130] - The public key PK as input.
[00131] - Oracle access to the authenticator, i.e., for 1 ≤ i ≤ poly(k), where poly(·) is a polynomial, the adversary can specify a structured data set S_i of size n and obtain (α_i, V_i) ← Auth(SK, PK, Q, S_i). However, the adversary cannot issue more than one query with the same data set S_i. That is, for all i ≠ j, S_j ≠ S_i.
[00132] - A query q ∈ Q.
[00133] 3. At the end, A outputs an authentication string α, an answer a' and a proof p'.
[00134] The adversary wins the game if the following violation occurs:
[00135] Violation of the security property. The adversary manages to construct an authentication string α in such a way that, given a query q ∈ Q, the adversary outputs an incorrect answer-proof pair (a', p') that passes the verification test. Namely, the adversary wins if one of the following holds:
[00136] - The authenticator was never queried with S and yet the verification algorithm does not reject, i.e., 1 ← Ver(PK, q, (a', p'), α).
[00137] - The authenticator was queried with S and yet a' ≠ Qs(q) and the verification algorithm accepts, i.e., 1 ← Ver(PK, q, (a', p'), α).
D Proofs
[00138] Proof of Lemma 1
[00139] Proof. First the static case is discussed. Let S = (E, R) be a structured data set of size n and Qs : Q → As be a query operation on S. Let D(Qs) = (EQ, RQ, Answer) be a query data structure for Qs. A certification data structure C(Qs) = ((Ec, Rc), Certify, Verify) is now described for S with respect to query data structure D(Qs). First set (Ec, Rc) = (EQ, RQ). Algorithm Certify is an augmented version of Answer. Given a query q ∈ Q and sets (Ec, Rc), (EQ, RQ), Certify creates an empty sequence τ of indices of tuples in Rc and then it runs Answer on input (q, (EQ, RQ)) to produce the answer Qs(q). Also, whenever algorithm Answer accesses a tuple Ri[j] in RQ, algorithm Certify adds (i, j) to the end of sequence τ. When Answer terminates, so does Certify, which returns the output a = Qs(q) produced by Answer and sequence τ as the corresponding answer test.
[00140] Algorithm Verify is defined as an augmentation of Answer operating as follows. On input a query q ∈ Q, set (Ec, Rc), an answer a and a sequence τ, algorithm Verify starts executing algorithm Answer on input (q, (EQ, RQ)) and checks the execution of Answer subject to sequence τ. That is, each time Answer retrieves a tuple Ri[j] in (EQ, RQ), Verify removes the first element of τ and compares it to (i, j), rejecting the input if the comparison fails. When Answer terminates, the answer computed by Answer is compared with the answer provided as input: if the two answers agree (are equal), then Verify accepts its input; otherwise it rejects.
[00141] It is now shown that the soundness and completeness conditions are satisfied. Soundness is easily seen to hold, since algorithm Verify tests the tuple-access trail of the same algorithm Answer (which correctly implements query operation Qs) on executions with the same input. Thus, algorithm Certify reports an answer that is correct for its input query. Also, it reports an answer test that is not rejected by algorithm Verify. With respect to completeness, one can easily see that this requirement also holds: when algorithm Verify accepts on input (q, Rc, a, τ), then it is always the case that a = Qs(q). Indeed, when operating on the valid data set and on input q, algorithm Answer returns the unique, correct answer for q. Finally, it is also easy to see that the certification data structure is answer-, time- and space-efficient. This follows from the fact that, for any input, Certify and Verify do a total amount of work that is only a constant factor more than the work of Answer, thus TC(n) = O(TA(n)) and TV(n) = O(TA(n)), and (Ec, Rc) = (EQ, RQ), thus Sc(n) = O(SQ(n)). Observe that each pair (i, j) in the answer test τ is accessed in constant time.
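The generic construction in this proof can be sketched with binary search standing in for Answer. The choice of binary search over a sorted list of key-value tuples, the single-relation index i in place of the pair (i, j), and the function names are assumptions of the illustration. Certify wraps every tuple access of Answer to record the index; Verify re-runs Answer and checks each access against the recorded trail before comparing answers.

```python
def answer(q, access, n):
    """Binary search for key q; every tuple read goes through access(i)."""
    lo, hi = 0, n - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        key, value = access(mid)
        if key == q:
            return value
        if key < q:
            lo = mid + 1
        else:
            hi = mid - 1
    return None                       # key not present

def certify(q, rq):
    """Run Answer and record the index of every accessed tuple."""
    trail = []
    def access(i):
        trail.append(i)
        return rq[i]
    return answer(q, access, len(rq)), trail

def verify(q, rq, a, trail):
    """Re-run Answer, checking each access against the recorded trail."""
    remaining = list(trail)
    def access(i):
        if not remaining or remaining.pop(0) != i:
            raise ValueError("access trail mismatch")
        return rq[i]
    try:
        recomputed = answer(q, access, len(rq))
    except ValueError:
        return False
    return recomputed == a
```

As the proof notes, this generic structure is only efficient, not super-efficient: Verify repeats the whole search, so its cost matches that of Answer up to a constant factor.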
[00142] The dynamic case is treated analogously. Instead of the query answering algorithm Answer, one augments the update algorithm UpdateQ of the query data structure to define the update and the update testing algorithms, UpdateC and Updtest respectively, of certification data structure C(Qs). The soundness, completeness and complexity properties hold in a similar way as in the static case.
[00143] Proof Sketch of Theorem 1. First the correctness property is discussed. Suppose that query authentication scheme (KeyG', Auth', Res', Ver') is correct. One wants to show that (KeyG, Auth, Res, Ver) is correct. This easily follows from checking that the verifier Ver does not reject when given an answer-proof pair from the responder Res, for a query issued in Q. Indeed, from the completeness property of the certification data structure, the answer testing algorithm Verify does not reject, and additionally the correctness of (KeyG', Auth', Res', Ver') guarantees that Ver does not reject because of a rejection by Ver'.
[00144] For the security one argues as follows. Suppose that (KeyG', Auth', Res', Ver') is secure. Assume that (KeyG, Auth, Res, Ver) is not secure; then, with non-negligible probability, responder Res responds to a query q ∈ Q incorrectly but the verifier Ver still fails to reject its input. Based on the soundness property of the certification data structure in use, one must conclude that it is not algorithm Certify that cheats the verifier, that is, it is not the indices in sequence τ that cause the problem, but rather the fact that algorithm Verify runs on incorrect data. Then there must be at least one tuple in Rc(τ) that, although it was verified to be a member of Rc, is not authentic, meaning that its index is correct but one or more of the data elements in the tuple have been (maliciously) altered. One thus concludes that for at least one query, the verification algorithm Ver' of query authentication scheme (KeyG', Auth', Res', Ver') failed to reject an invalid query-answer pair. This is a contradiction, since this scheme is assumed to be secure.
[00145] Proof Sketch of Theorem 2. The result follows by the construction QAS(Qs, S) and the fact that there exist secure query authentication schemes QAS(Qe, ) for membership queries on any data set: in particular, digital signatures, Merkle's hash tree and one-way accumulators provide a correct and secure implementation of QAS(Qe, ).
[00146] Proof Sketch of Lemma 2. It follows directly by the construction of QAS(Qs, S) and the use of QAS(Qe, Rc) and C(Qs) = (£c, Rc, Certify, Verify).
[00147] Proof Sketch of Theorem 3. For the static case, the use of signatures alone provides a satisfactory time-space trade-off. That is, every indexed tuple in the certification image Rc is signed. QAS(Qe, Rc) in this case is very simple: Auth signs all tuples in Rc and sets a to be all these signatures with V = ⊥; Res, along with the (positive) answer to an ∈ query, returns the corresponding tuples in Rc(j) and the corresponding signature; and Ver simply verifies a number of signatures.
[00148] For the dynamic case, the extensive use of signatures is not an efficient solution, since after every update on the set S all signatures have to be updated. Alternatively, signature amortization can be used, where only one digest of set Rc is signed (incurring O(1) update (signing) cost). Two alternative options for computing the digest of set Rc are: (i) the use of a hash tree and (ii) the use of an accumulator. The construction of QAS(Qe, Rc) is straightforward. Hash trees have linear storage needs, logarithmic access, update and verification times and logarithmic proof size. Dynamic accumulators, on the other hand, have linear storage needs, constant time verification and constant proof size, at an increased cost to support updates and processes (of witnesses). Note that trapdoor information can only be used by algorithm Auth and not by algorithm Res, for it would destroy the security of the scheme. Some interesting trade-offs between the update and process time costs are discussed in other publications (e.g., one can achieve a √n trade-off). See M. T. Goodrich, R. Tamassia, and J. Hasic. An efficient dynamic and distributed cryptographic accumulator. In Proc. of Information Security Conference (ISC), volume 2433 of LNCS, pages 372-388. Springer-Verlag, 2002.
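As a non-limiting illustration of option (i), the following minimal Python sketch builds a hash tree over the tuples of a certification image and checks a logarithmic-size membership proof against the root digest. The names h, build_levels, prove and verify_membership, the choice of SHA-256, the byte-string encoding of tuples, the 0x00/0x01 domain-separation prefixes, and the rule of duplicating the last node on odd-sized levels are all assumptions made for the example and are not taken from the description. Signing of the root digest is omitted, and a non-empty list of tuples is assumed.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Collision-resistant hash (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).digest()


def build_levels(tuples):
    """Return all tree levels: levels[0] holds leaf hashes, levels[-1] == [root]."""
    level = [h(b"\x00" + t) for t in tuples]  # prefix 0x00 marks a leaf
    levels = [level]
    while len(level) > 1:
        # Assumed padding rule: duplicate the last node of an odd-sized level.
        padded = level if len(level) % 2 == 0 else level + [level[-1]]
        level = [
            h(b"\x01" + padded[i] + padded[i + 1])  # prefix 0x01 marks an inner node
            for i in range(0, len(padded), 2)
        ]
        levels.append(level)
    return levels


def prove(levels, index):
    """Collect the sibling hashes on the path from leaf `index` to the root."""
    proof = []
    for level in levels[:-1]:
        sibling_index = index ^ 1
        # A missing sibling means the node was paired with its own duplicate.
        sibling = level[sibling_index] if sibling_index < len(level) else level[index]
        proof.append((sibling, index % 2 == 0))  # True: sibling sits on the right
        index //= 2
    return proof


def verify_membership(root, element, proof):
    """Recompute the root from the element and its proof, then compare."""
    node = h(b"\x00" + element)
    for sibling, sibling_on_right in proof:
        if sibling_on_right:
            node = h(b"\x01" + node + sibling)
        else:
            node = h(b"\x01" + sibling + node)
    return node == root
```

In this sketch the responder would run prove, while the verifier only needs the (signed) root digest and runs verify_membership; the proof holds one hash per tree level, which is the logarithmic proof size mentioned above.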
E Consistency for Data Outsourcing in the Client-Server Model
[00149] The problem of secure data outsourcing in a client-server communication model is considered. In this setting, a client Cl completely outsources a data set S owned by the client to a remote, untrusted server Ser. S is generated and queried through a series of update and query operations issued by Cl. At any time, the client Cl keeps some state information s that encodes information about the current state of the outsourced set S. Then the communication protocol is as follows:
[00150] 1. The client Cl keeps state information s and issues an update operation o to the server Ser.
[00151] 2. Server Ser performs the operation, i.e., Ser accordingly updates S to a new version S', and generates a proof π, which is then returned to the client Cl. One can write π ← certify(o, S, S'). The proof returned to the client is called a consistency proof.
[00152] 3. Client Cl runs a verification algorithm, which takes as input the current state s, the operation o and the corresponding consistency proof π and either accepts or rejects the input. If the input is accepted, the state s is appropriately updated to a new state s'. One can write {(yes, s'), (no, ⊥)} ← verify(s, π).
[00153] The above protocol and pair of algorithms (certify, verify) are called a data outsourced scheme. Next described is the security requirement that a data outsourced scheme should satisfy. Intuitively, one wants the scheme to satisfy correctness and consistency, meaning that a correct behavior by the server Ser to any operation will be accepted by the verification algorithm, but any inconsistency or misbehavior by Ser with respect to any single update operation will be immediately detected and rejected.
[00154] More formally, consider a data outsourced scheme (certify, verify), and let operate(·) be the algorithm that, given the current set S and an operation o, performs the operation o and brings the file to the updated version S'. That is, S' ← operate(o, S). Let τ = (o1, ..., ot) be a sequence of t operations issued by the client Cl on an initially empty set S0 and initial empty state s = ⊥, and let S be the set after the last operation is performed. One can say that s is a consistent state for series τ with respect to the scheme in consideration, if s has been computed by running algorithms operate, certify and verify sequentially for all operations o1, ..., ot in τ. In this case, one simply says that s is consistent with S.
[00155] Definition 16 (Security for data outsourced schemes.) Let (certify, verify) be a data outsourced scheme with security parameter K. Let s be any state that is consistent with the set S that corresponds to any series of operations in an initially empty set, and let o be any operation. Then, (certify, verify) is said to be secure if the following requirements are satisfied.
[00156] Correctness. Whenever π ← certify(o, S, operate(o, S)), then it holds that (yes, s') ← verify(s, π). That is, if the new operation o is performed correctly and the consistency proof is generated using algorithm certify, then the verification algorithm accepts and computes the new state s' (which is consistent with the new set).
[00157] Consistency. For any polynomial-time adversary A, having oracle access to algorithms certify and verify, that on input a set S and an operation o produces a consistency proof π, whenever (yes, s') ← verify(s, π), then the probability that either S' ← operate(o, S) does not hold or s' is not consistent with S' is negligible in the security parameter K. That is, assuming a polynomially bounded adversary that observes a polynomial number of protocol invocations and then produces a consistency proof π, if π for the new operation o is accepted by the verification algorithm, then with overwhelming probability the operation has been performed correctly and the new state is consistent with the new set.
[00158] According to this definition, if the client Cl starts from an empty set and outsources it to the server Ser (through appropriate update operations) using a secure data outsourced scheme, Cl will end up with a state that is consistent with the final data set. Thus, the data set is consistent with the history of updates and all future operations will be verified. With respect to efficiency, a data outsourced scheme is said to be time-efficient if the verification time is sublinear in the data set size. Furthermore, an authenticated storage scheme is space-efficient if the state information stored by the client Cl is sublinear in the data set size. It is space-optimal if the state information is of constant size.
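The interface of a data outsourced scheme can be made concrete with a deliberately naive Python sketch, given here only as a non-limiting illustration and not as the construction contemplated by the description. The following are assumptions of the example: the client state s is a SHA-256 digest of the whole set (the digest of the empty set plays the role of the empty state ⊥), operations are ("insert", x) or ("delete", x) pairs over strings, the consistency proof is simply the old set itself, and verify takes the operation o as an explicit argument. Under these assumptions the state has constant size, but verification is linear in the data set size, so the sketch is not time-efficient in the sense defined above.

```python
import hashlib


def digest(items):
    """Digest of a set of strings: hash of its sorted, length-prefixed elements."""
    hasher = hashlib.sha256()
    for item in sorted(items):
        encoded = item.encode("utf-8")
        hasher.update(len(encoded).to_bytes(4, "big") + encoded)
    return hasher.digest()


def operate(op, current_set):
    """Apply an update operation and return the new set S'."""
    kind, item = op
    if kind == "insert":
        return current_set | {item}
    if kind == "delete":
        return current_set - {item}
    raise ValueError("unknown operation: %r" % (kind,))


def certify(op, old_set, new_set):
    """Server side. Naive proof (an assumption of this sketch): ship the old set."""
    return frozenset(old_set)


def verify(state, op, proof):
    """Client side. Accept only if the proof matches the current state s."""
    if digest(proof) != state:
        return ("no", None)  # None stands in for the empty output
    # The client replays the operation on the authenticated old set
    # and derives the new state s' itself.
    return ("yes", digest(operate(op, proof)))
```

A client would start from digest(frozenset()), call verify on each proof returned by the server, and replace its state with the returned s' whenever the verdict is yes.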
[00159] In one non-limiting, exemplary embodiment, and as illustrated in FIG. 2, a method comprises: receiving a query on a structured data set (box 21); transforming the query into a plurality of set membership queries on a plurality of sets, wherein the plurality of sets are obtained by modifying the structured data set (box 22); determining a plurality of first answers corresponding to the plurality of set membership queries (box 23); processing the plurality of first answers to obtain a second answer corresponding to the query (box 24); and returning the second answer (box 25).
[00160] A method as above, further comprising: determining an answer test corresponding to the second answer; and returning the answer test. A method as above, further comprising: verifying the second answer by utilizing the answer test. A method as in any of the above, further comprising: in response to receiving an update for the structured data set, updating the structured data set and returning an update test. A method as in any of the above, wherein the plurality of sets comprises a certification image. A method as in any of the above, wherein the method is implemented by a computer program.
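The steps of the method above can be sketched in Python for one simple kind of query, namely whether a value y belongs to a set of integers. The certification image used here, the set of adjacent pairs of the sorted sequence extended with sentinel values, is an assumption chosen for illustration and is not taken from the description; likewise the names certification_image, answer and verify_answer are hypothetical. The callback is_member stands in for any authenticated set membership check, and y is assumed to be a finite number.

```python
import bisect

NEG_INF, POS_INF = float("-inf"), float("inf")


def _sorted_keys(data):
    """Sorted distinct keys, bracketed by sentinel values."""
    return [NEG_INF] + sorted(set(data)) + [POS_INF]


def certification_image(data):
    """Assumed certification image: all pairs of adjacent keys."""
    keys = _sorted_keys(data)
    return set(zip(keys, keys[1:]))


def answer(data, y):
    """Responder: return the answer to 'is y in data?' and an answer test."""
    keys = _sorted_keys(data)
    i = bisect.bisect_right(keys, y) - 1  # keys[i] <= y < keys[i + 1]
    pair = (keys[i], keys[i + 1])
    return (y == keys[i]), pair


def verify_answer(y, claimed, pair, is_member):
    """Verifier: check the answer using one set membership query on the image."""
    low, high = pair
    if not is_member(pair):  # the pair must belong to the image
        return False
    if not (low <= y < high):  # the pair must bracket the queried value
        return False
    return claimed == (y == low)  # the claimed answer must follow from the pair
```

Both positive and negative answers reduce to a membership query on the image: a pair (low, high) that brackets y shows y is present when y equals low and absent otherwise.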
[00161] In another non-limiting, exemplary embodiment, a computer program product comprises program instructions tangibly embodied on a computer-readable medium, execution of which results in operations comprising: receiving a query on a structured data set; transforming the query into a plurality of set membership queries on a plurality of sets, wherein the plurality of sets are obtained by modifying the structured data set; determining a plurality of first answers corresponding to the plurality of set membership queries; processing the plurality of first answers to obtain a second answer corresponding to the query; and returning the second answer.
[00162] A computer program product as above, further comprising: determining an answer test corresponding to the second answer; and returning the answer test. A computer program product as above, further comprising: verifying the second answer by utilizing the answer test. A computer program product as in any of the above, further comprising: in response to receiving an update for the structured data set, updating the structured data set and returning an update test. A computer program product as in any of the above, wherein the plurality of sets comprises a certification image.
[00163] In another non-limiting, exemplary embodiment, an electronic device comprises: a memory configured to store a structured data set and a plurality of sets, wherein the plurality of sets are obtained by modifying the structured data set; and a data processor configured to receive a query on the structured data set, to transform the query into a plurality of set membership queries on the plurality of sets, to determine a plurality of first answers corresponding to the plurality of set membership queries; to process the plurality of first answers to obtain a second answer corresponding to the query, and to return the second answer.
[00164] An electronic device as above, wherein the data processor is further configured to determine an answer test corresponding to the second answer and to return the answer test. An electronic device as in any of the above, wherein the data processor is further configured, in response to receiving an update for the structured data set, to update the structured data set and return an update test. An electronic device as in any of the above, wherein the plurality of sets comprises a certification image. An electronic device as in any of the above, embodied as a responder node in a network. An electronic device as in any of the above, wherein the network comprises a peer-to-peer network.
[00165] In another non-limiting, exemplary embodiment, and as illustrated in FIG. 3, a method comprises: storing a data set and a certification image for the data set at a responder (box 31); storing at least one digest of the certification image at a data source (box 32); in response to an update to the data set by the data source, the data source using the at least one digest and the certification image to verify the update (box 33); receiving a query on the data set, wherein the query is received by the responder from the data source (box 34); transforming the query into a plurality of set membership queries on a plurality of sets, wherein the plurality of sets are obtained by modifying the data set (box 35); determining a plurality of first answers corresponding to the plurality of set membership queries (box 36); processing the plurality of first answers to obtain a second answer corresponding to the query (box 37); and returning the second answer and a corresponding proof for the second answer from the responder to the data source (box 38).
[00166] A method as above, wherein the certification image comprises the plurality of sets. A method as in any of the above, further comprising: determining an answer test corresponding to the second answer; and returning the answer test. A method as in any of the above, further comprising: in response to receiving an update for the structured data set, updating the structured data set and returning an update test. A method as in any of the above, wherein the method is implemented by a computer program.
[00167] In another non-limiting, exemplary embodiment, a computer program product comprises program instructions tangibly embodied on a computer-readable medium execution of which results in operations comprising: storing a data set and a certification image for the data set at a responder; storing at least one digest of the certification image at a data source; in response to an update to the data set by the data source, the data source using the at least one digest and the certification image to verify the update; receiving a query on the data set, wherein the query is received by the responder from the data source; transforming the query into a plurality of set membership queries on a plurality of sets, wherein the plurality of sets are obtained by modifying the data set; determining a plurality of first answers corresponding to the plurality of set membership queries; processing the plurality of first answers to obtain a second answer corresponding to the query; and returning the second answer and a corresponding proof for the second answer from the responder to the data source.
[00168] A computer program product as above, wherein the certification image comprises the plurality of sets. A computer program product as in any of the above, further comprising: determining an answer test corresponding to the second answer; and returning the answer test. A computer program product as in any of the above, further comprising: in response to receiving an update for the structured data set, updating the structured data set and returning an update test.
[00169] In another non-limiting, exemplary embodiment, a system comprises: a responder configured to store a data set and a certification image for the data set; and a data source configured to store at least one digest of the certification image, wherein in response to an update to the data set by the data source, the data source is configured to use the at least one digest and the certification image to verify the update, wherein the responder is configured to receive a query on the data set from the data source, to transform the query into a plurality of set membership queries on a plurality of sets, to determine a plurality of first answers corresponding to the plurality of set membership queries, to process the plurality of first answers to obtain a second answer corresponding to the query, and to return the second answer and a corresponding proof for the second answer to the data source, wherein the plurality of sets are obtained by modifying the data set.
[00170] A system as above, wherein the certification image comprises the plurality of sets. A system as in any of the above, wherein the responder is further configured to determine an answer test corresponding to the second answer and to return the answer test to the data source. A system as in any of the above, wherein the system comprises a peer-to-peer network.
[00171] In another non-limiting, exemplary embodiment, and as illustrated in FIG. 4, a method includes: storing a data set and a certification image for the data set at a data source and a responder (box 41); storing at least one digest of the certification image at the data source and the responder (box 42); storing at least one signature of the at least one stored digest at the data source and the responder (box 43); in response to an update to the data set by the data source, updating the corresponding stored at least one digest by the data source and the responder (box 44); signing the updated at least one digest by the data source (box 45); transmitting the signed updated at least one digest from the data source to the responder (box 46); receiving a query on the data set, wherein the query is received by the responder from a query source (box 47); transforming the query into a plurality of set membership queries on a plurality of sets, wherein the transformation is performed by the responder and the plurality of sets are obtained by modifying the data set (box 48); determining by the responder a plurality of first answers corresponding to the plurality of set membership queries (box 49); processing by the responder the plurality of first answers to obtain a second answer corresponding to the query (box 50); sending the second answer, a corresponding proof for the second answer, and the respective signed at least one digest from the responder to the query source (box 51); and verifying the second answer by the query source using the proof and the signed at least one digest (box 52).
[00172] A method as above, further comprising: storing by the data source and the responder a set digest of all elements of the plurality of sets. A method as in any of the above, further comprising: storing by the data source and the responder at least one hash tree or authenticated skip list for the plurality of sets; and storing by the data source and the responder at least one digest of at least one root node of said at least one hash tree or authenticated skip list. A method as in any of the above, wherein the responder is realized by means of a collection of first nodes in a peer-to-peer network, wherein each first node corresponds to a second node of a hash tree or an authenticated skip list. A method as in any of the above, further comprising: storing by the data source and the responder at least one accumulator for the plurality of sets; and storing by the data source and the responder at least one accumulator digest of accumulations of said at least one accumulator. A method as in any of the above, wherein the method is implemented by a computer program.
[00173] In another non-limiting, exemplary embodiment, a computer program product comprises program instructions tangibly embodied on a computer-readable medium execution of which results in operations comprising: storing a data set and a certification image for the data set at a data source and a responder; storing at least one digest of the certification image at the data source and the responder; storing at least one signature of the at least one stored digest at the data source and the responder; in response to an update to the data set by the data source, updating the corresponding stored at least one digest by the data source and the responder; signing the updated at least one digest by the data source; transmitting the signed updated at least one digest from the data source to the responder; receiving a query on the data set, wherein the query is received by the responder from a query source; transforming the query into a plurality of set membership queries on a plurality of sets, wherein the transformation is performed by the responder and the plurality of sets are obtained by modifying the data set; determining by the responder a plurality of first answers corresponding to the plurality of set membership queries; processing by the responder the plurality of first answers to obtain a second answer corresponding to the query; sending the second answer, a corresponding proof for the second answer, and the respective signed at least one digest from the responder to the query source; and verifying the second answer by the query source using the proof and the signed at least one digest.
[00174] A computer program product as above, further comprising: storing by the data source and the responder a set digest of all elements of the plurality of sets. A computer program product as in any of the above, further comprising: storing by the data source and the responder at least one hash tree or authenticated skip list for the plurality of sets; and storing by the data source and the responder at least one digest of at least one root node of said at least one hash tree or authenticated skip list. A computer program product as in any of the above, wherein the responder is realized by means of a collection of first nodes in a peer-to-peer network, wherein each first node corresponds to a second node of a hash tree or an authenticated skip list. A computer program product as in any of the above, further comprising: storing by the data source and the responder at least one accumulator for the plurality of sets; and storing by the data source and the responder at least one accumulator digest of accumulations of said at least one accumulator.
[00175] In another non-limiting, exemplary embodiment, a system comprises: a data source configured to store a data set, a certification image for the data set, at least one digest of the certification image, at least one signature of the at least one digest, to update the at least one digest in response to an update operation, to sign the updated at least one digest, and to transmit the signed updated at least one digest to the responder; a query source; and a responder configured to receive the signed updated at least one digest from the data source, to receive the query on the data set from the query source, to transform the query into a plurality of set membership queries on a plurality of sets obtained by modifying the data set, to determine a plurality of first answers corresponding to the plurality of set membership queries, to process the plurality of first answers to obtain a second answer corresponding to the query, and to send the second answer, a corresponding proof for the second answer, and the signed at least one digest to the query source, wherein the query source is configured to verify the received second answer by using the proof and the signed at least one digest.
[00176] A system as above, wherein the data source is further configured to store a set digest of all elements of the plurality of sets and wherein the responder is further configured to store a set digest of all elements of the plurality of sets. A system as in any of the above, wherein the responder is realized by means of a collection of first nodes in a peer-to-peer network, wherein each first node corresponds to a second node of a hash tree or an authenticated skip list. A system as in any of the above, wherein the data source is further configured to store at least one hash tree or authenticated skip list for the plurality of sets and at least one digest of at least one root node of said at least one hash tree or authenticated skip list and wherein the responder is further configured to store at least one hash tree or authenticated skip list for the plurality of sets and at least one digest of at least one root node of said at least one hash tree or authenticated skip list. A system as in any of the above, wherein the data source is further configured to store at least one accumulator for the plurality of sets and at least one accumulator digest of accumulations of said at least one accumulator and wherein the responder is further configured to store at least one accumulator for the plurality of sets and at least one accumulator digest of accumulations of said at least one accumulator.
[00177] Generally, various exemplary embodiments of the invention can be implemented in different mediums, such as software, hardware, logic, special purpose circuits or any combination thereof. As a non-limiting example, some aspects may be implemented in software which may be run on a computing device, while other aspects may be implemented in hardware.
[00178] The foregoing description has provided by way of exemplary and non-limiting examples a full and informative description of the best method and apparatus presently contemplated by the inventors for carrying out the invention. However, various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. However, all such and similar modifications of the teachings of this invention will still fall within the scope of this invention.
[00179] Furthermore, some of the features of the preferred embodiments of this invention could be used to advantage without the corresponding use of other features. As such, the foregoing description should be considered as merely illustrative of the principles of the invention, and not in limitation thereof.

Claims

What is claimed is:
1. A method comprising: receiving a query on a structured data set; transforming the query into a plurality of set membership queries on a plurality of sets, wherein the plurality of sets are obtained by modifying the structured data set; determining a plurality of first answers corresponding to the plurality of set membership queries; processing the plurality of first answers to obtain a second answer corresponding to the query; and returning the second answer.
2. A method as in claim 1, further comprising: determining an answer test corresponding to the second answer; and returning the answer test.
3. A method as in claim 2, further comprising: verifying the second answer by utilizing the answer test.
4. A method as in any one of the above, further comprising: in response to receiving an update for the structured data set, updating the structured data set and returning an update test.
5. A method as in any one of the above, wherein the plurality of sets comprises a certification image.
6. A method as in any one of the above, wherein the method is implemented by a computer program.
7. An electronic device comprising: a memory configured to store a structured data set and a plurality of sets, wherein the plurality of sets are obtained by modifying the structured data set; and a data processor configured to receive a query on the structured data set, to transform the query into a plurality of set membership queries on the plurality of sets, to determine a plurality of first answers corresponding to the plurality of set membership queries; to process the plurality of first answers to obtain a second answer corresponding to the query, and to return the second answer.
8. An electronic device as in claim 7, wherein the data processor is further configured to determine an answer test corresponding to the second answer and to return the answer test.
9. An electronic device as in claim 7 or 8, wherein the data processor is further configured, in response to receiving an update for the structured data set, to update the structured data set and return an update test.
10. An electronic device as in claim 7, 8 or 9, wherein the plurality of sets comprises a certification image.
11. An electronic device as in claim 7, 8, 9 or 10, embodied as a responder node in a network.
12. An electronic device as in claim 11, wherein the network comprises a peer-to-peer network.
13. A method comprising: storing a data set and a certification image for the data set at a responder; storing at least one digest of the certification image at a data source; in response to an update to the data set by the data source, the data source using the at least one digest and the certification image to verify the update; receiving a query on the data set, wherein the query is received by the responder from the data source; transforming the query into a plurality of set membership queries on a plurality of sets, wherein the plurality of sets are obtained by modifying the data set; determining a plurality of first answers corresponding to the plurality of set membership queries; processing the plurality of first answers to obtain a second answer corresponding to the query; and returning the second answer and a corresponding proof for the second answer from the responder to the data source.
14. A method as in claim 13, wherein the certification image comprises the plurality of sets.
15. A method as in claim 13 or 14, further comprising: determining an answer test corresponding to the second answer; and returning the answer test.
16. A method as in claim 13, 14 or 15, further comprising: in response to receiving an update for the structured data set, updating the structured data set and returning an update test.
17. A method as in claim 13, 14, 15 or 16, wherein the method is implemented by a computer program.
18. A system comprising: a responder configured to store a data set and a certification image for the data set; and a data source configured to store at least one digest of the certification image, wherein in response to an update to the data set by the data source, the data source is configured to use the at least one digest and the certification image to verify the update, wherein the responder is configured to receive a query on the data set from the data source, to transform the query into a plurality of set membership queries on a plurality of sets, to determine a plurality of first answers corresponding to the plurality of set membership queries, to process the plurality of first answers to obtain a second answer corresponding to the query, and to return the second answer and a corresponding proof for the second answer to the data source, wherein the plurality of sets are obtained by modifying the data set.
19. A system as in claim 18, wherein the certification image comprises the plurality of sets.
20. A system as in claim 18 or 19, wherein the responder is further configured to determine an answer test corresponding to the second answer and to return the answer test to the data source.
21. A system as in claim 18, 19 or 20, wherein the system comprises a peer-to-peer network.
22. A method comprising: storing a data set and a certification image for the data set at a data source and a responder; storing at least one digest of the certification image at the data source and the responder; storing at least one signature of the at least one stored digest at the data source and the responder; in response to an update to the data set by the data source, updating the corresponding stored at least one digest by the data source and the responder; signing the updated at least one digest by the data source; transmitting the signed updated at least one digest from the data source to the responder; receiving a query on the data set, wherein the query is received by the responder from a query source; transforming the query into a plurality of set membership queries on a plurality of sets, wherein the transformation is performed by the responder and the plurality of sets are obtained by modifying the data set; determining by the responder a plurality of first answers corresponding to the plurality of set membership queries; processing by the responder the plurality of first answers to obtain a second answer corresponding to the query; sending the second answer, a corresponding proof for the second answer, and the respective signed at least one digest from the responder to the query source; and verifying the second answer by the query source using the proof and the signed at least one digest.
23. A method as in claim 22, further comprising: storing by the data source and the responder a set digest of all elements of the plurality of sets.
24. A method as in claim 22 or 23, further comprising: storing by the data source and the responder at least one hash tree or authenticated skip list for the plurality of sets; and storing by the data source and the responder at least one digest of at least one root node of said at least one hash tree or authenticated skip list.
25. A method as in claim 24, wherein the responder is realized by means of a collection of first nodes in a peer-to-peer network, wherein each first node corresponds to a second node of a hash tree or an authenticated skip list.
26. A method as in claim 22, 23, 24 or 25, further comprising: storing by the data source and the responder at least one accumulator for the plurality of sets; and storing by the data source and the responder at least one accumulator digest of accumulations of said at least one accumulator.
27. A method as in claim 22, 23, 24, 25 or 26, wherein the method is implemented by a computer program.
28. A system comprising: a data source configured to store a data set, a certification image for the data set, at least one digest of the certification image, at least one signature of the at least one digest, to update the at least one digest in response to an update operation, to sign the updated at least one digest, and to transmit the signed updated at least one digest to the responder; a query source; and a responder configured to receive the signed updated at least one digest from the data source, to receive the query on the data set from the query source, to transform the query into a plurality of set membership queries on a plurality of sets obtained by modifying the data set, to determine a plurality of first answers corresponding to the plurality of set membership queries, to process the plurality of first answers to obtain a second answer corresponding to the query, and to send the second answer, a corresponding proof for the second answer, and the signed at least one digest to the query source, wherein the query source is configured to verify the received second answer by using the proof and the signed at least one digest.
29. A system as in claim 28, wherein the data source is further configured to store a set digest of all elements of the plurality of sets and wherein the responder is further configured to store a set digest of all elements of the plurality of sets.
30. A system as in claim 28 or 29, wherein the responder is realized by means of a collection of first nodes in a peer-to-peer network, wherein each first node corresponds to a second node of a hash tree or an authenticated skip list.
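
For illustration only, the flow recited in claims 22 and 24 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the hash tree layout, the helper names (`build_levels`, `prove`, `verify_path`, `respond`, `verify_answer`), the example sets, and the conjunctive query "is x in set A and in set B" are all hypothetical. An HMAC with a shared demo key stands in for the data source's signature; a real deployment would use a public-key signature so that the query source holds no signing key. Only positive answers are proved; non-membership proofs are omitted.

```python
import hashlib
import hmac

KEY = b"demo-key"  # assumption: stand-in for the data source's signing key


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def sign(digest: bytes) -> bytes:
    # Assumption: HMAC replaces a real public-key signature for brevity.
    return hmac.new(KEY, digest, hashlib.sha256).digest()


def build_levels(elements):
    """Return the levels of a hash tree over the sorted elements, leaves first."""
    if not elements:
        raise ValueError("sketch assumes non-empty sets")
    level = [h(b"leaf:" + e.encode()) for e in sorted(elements)]
    levels = []
    while True:
        if len(level) > 1 and len(level) % 2:
            level = level + [level[-1]]  # simplification: pad odd levels
        levels.append(level)
        if len(level) == 1:
            return levels
        level = [h(b"node:" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]


def prove(levels, elements, x):
    """Collect the sibling hashes on the path from leaf x to the root."""
    idx = sorted(elements).index(x)
    path = []
    for level in levels[:-1]:
        sibling = idx ^ 1
        path.append((level[sibling], sibling < idx))  # (hash, sibling_is_left)
        idx //= 2
    return path


def verify_path(x, path, root):
    """Recompute the root digest from x and its path, then compare."""
    node = h(b"leaf:" + x.encode())
    for sibling, sibling_is_left in path:
        pair = sibling + node if sibling_is_left else node + sibling
        node = h(b"node:" + pair)
    return hmac.compare_digest(node, root)


# Data source: build one hash tree per set and sign each root digest.
sets = {"A": {"alice", "bob", "carol"}, "B": {"bob", "dave"}}
trees = {name: build_levels(s) for name, s in sets.items()}
signed = {name: (t[-1][0], sign(t[-1][0])) for name, t in trees.items()}


def respond(x):
    """Responder: split the query into one membership query per set."""
    first_answers = {name: x in s for name, s in sets.items()}
    second_answer = all(first_answers.values())
    proof = None
    if second_answer:
        proof = {name: prove(trees[name], sets[name], x) for name in sets}
    return second_answer, proof, signed


def verify_answer(x, answer, proof, signed_digests):
    """Query source: accept a positive answer only if every path checks out."""
    if not answer or proof is None:
        return False  # negative answers are not provable in this sketch
    for name, (root, sig) in signed_digests.items():
        if not hmac.compare_digest(sig, sign(root)):
            return False
        if not verify_path(x, proof[name], root):
            return False
    return True


answer, proof, digests = respond("bob")
print(answer, verify_answer("bob", answer, proof, digests))
```

In this sketch the responder returns the answer, one hash path per set as the proof, and the signed root digests; the query source recomputes each root from the queried element and accepts only if it matches a digest whose signature checks out.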
PCT/US2007/017072 2006-07-28 2007-07-31 Certification and authentication of data structures WO2008014007A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US83387706P 2006-07-28 2006-07-28
US60/833,877 2006-07-28

Publications (2)

Publication Number Publication Date
WO2008014007A2 (en) 2008-01-31
WO2008014007A3 (en) 2008-12-24

Family

ID=38982148

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2007/017072 WO2008014007A2 (en) 2006-07-28 2007-07-31 Certification and authentication of data structures

Country Status (1)

Country Link
WO (1) WO2008014007A2 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7100049B2 (en) * 2002-05-10 2006-08-29 Rsa Security Inc. Method and apparatus for authentication of users and web sites
US20040230843A1 (en) * 2003-08-20 2004-11-18 Wayne Jansen System and method for authenticating users using image selection

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2338127A1 (en) * 2008-08-29 2011-06-29 Brown University Cryptographic accumulators for authenticated hash tables
EP2338127A4 (en) * 2008-08-29 2013-12-04 Univ Brown Cryptographic accumulators for authenticated hash tables
US8726034B2 (en) 2008-08-29 2014-05-13 Brown University Cryptographic accumulators for authenticated hash tables
US9098725B2 (en) 2008-08-29 2015-08-04 Brown University Cryptographic accumulators for authenticated hash tables
CN113972980A (en) * 2020-07-24 2022-01-25 国民技术股份有限公司 Method and device for optimizing lattice code polynomial multiplication operation based on number theory transformation

Also Published As

Publication number Publication date
WO2008014007A3 (en) 2008-12-24

Similar Documents

Publication Publication Date Title
Zhang et al. GEM^2-tree: A gas-efficient structure for authenticated range queries in blockchain
Xu et al. vchain: Enabling verifiable boolean range queries over blockchain databases
Hu et al. Spatial query integrity with voronoi neighbors
Meng et al. Grecs: Graph encryption for approximate shortest distance queries
Cormode et al. Verifying computations with streaming interactive proofs
Papamanthou et al. Authenticated hash tables
EP2338127B1 (en) Cryptographic accumulators for authenticated hash tables
Goodrich et al. Efficient authenticated data structures for graph connectivity and geometric search problems
Tamassia et al. Certification and Authentication of Data Structures.
Hu et al. Authenticating location-based services without compromising location privacy
Papamanthou et al. Time and space efficient algorithms for two-party authenticated data structures
US20120030468A1 (en) System and method for optimal verification of operations on dynamic sets
Rahman et al. A blockchain-enabled privacy-preserving verifiable query framework for securing cloud-assisted industrial internet of things systems
US9049185B1 (en) Authenticated hierarchical set operations and applications
Li et al. Integrity-verifiable conjunctive keyword searchable encryption in cloud storage
Ostrovsky et al. Efficient consistency proofs for generalized queries on a committed database
Azraoui et al. Publicly verifiable conjunctive keyword search in outsourced databases
Goodrich et al. Efficient verification of web-content searching through authenticated web crawlers
Yang et al. Authentication of function queries
Hong et al. Privacy protection and integrity verification of aggregate queries in cloud computing
Zhang et al. CorrectMR: Authentication of distributed SQL execution on MapReduce
WO2008014002A2 (en) Super-efficient verification of dynamic outsourced databases
WO2008014007A2 (en) Certification and authentication of data structures
He et al. FMSM: A fuzzy multi-keyword search scheme for encrypted cloud data based on multi-chain network
Zhang et al. Distributed kNN query authentication

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 07836353

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase in:

Ref country code: DE

NENP Non-entry into the national phase in:

Ref country code: RU

122 EP: PCT application non-entry in European phase

Ref document number: 07836353

Country of ref document: EP

Kind code of ref document: A2