CN102750354A - Method for analyzing and processing non-structured data query operating language - Google Patents

Method for analyzing and processing non-structured data query operating language

Info

Publication number
CN102750354A
CN102750354A CN2012101908325A CN201210190832A CN102750354A CN 102750354 A CN102750354 A CN 102750354A CN 2012101908325 A CN2012101908325 A CN 2012101908325A CN 201210190832 A CN201210190832 A CN 201210190832A CN 102750354 A CN102750354 A CN 102750354A
Authority
CN
China
Prior art keywords
query module
index
key-value store
internal command
query
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012101908325A
Other languages
Chinese (zh)
Other versions
CN102750354B (en)
Inventor
王建民
丁贵广
卓安
黄向东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN201210190832.5A
Publication of CN102750354A
Application granted
Publication of CN102750354B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method for parsing and processing an unstructured data query and operation language, and belongs to the technical field of computer data management. The method defines a structured query language for querying unstructured data; the language is easy to extend, can incorporate user-defined query functions, and has a syntax similar to the query language of conventional relational databases. The method comprises the following steps: starting a query module in a key-value store; receiving a user's query language request, parsing it, and converting it into an internal command; having the query module call the functional modules in the key-value store to execute the internal command; and returning the result to the user after the command has executed. With the query module at its core, the method accesses the underlying key-value store through an SQL-like language, so that users can easily operate the key-value store and manage unstructured data.

Description

A method for parsing and processing an unstructured data query and operation language
Technical field
The present invention relates to a method for parsing and processing an unstructured data query and operation language, and belongs to the technical field of computer data management.
Background
With the growing variety of emerging applications such as the Internet and the continued development of enterprise informatization, large amounts of unstructured data have appeared. Unstructured data has rich data types and complex structure and lacks clear, uniformly defined structural constraints. Combined with its massive scale, highly dynamic nature, diverse application scenarios, and the need for unified, federated access, this makes unstructured data management a major challenge.
Traditional relational databases struggle to offer an effective solution for handling massive unstructured data. The data model of a traditional database is organized schema-first, whereas unstructured data is organized schema-later. As a result, data management methods built on relational algebra are no longer effective for unstructured data. The sheer volume of unstructured data also leaves traditional databases inadequate in performance and scalability.
Emerging key-value stores break the schema-first logic of traditional databases by being schemaless, while the key-value access pattern guarantees high-speed reads and writes. Popular, rapidly developing key-value stores include HBase, MongoDB, Dynamo, and Cassandra. They guarantee storage and scalability for massive data by running as distributed clusters, and the present invention is based on key-value stores of this kind.
However, emerging key-value stores lack mature query methods and query languages. For example, HBase provides API access, and Cassandra provides an API plus an SQL-like language called CQL. Constrained by their underlying stores, they support only simple queries and updates on unstructured data, provide no complex analysis functions, and do not consider how the language should describe large-volume data. The founders of CouchDB and SQLite have jointly attempted to design UnQL, a unified query language for key-value stores, but it is still at an early stage and does not effectively address multi-feature queries over unstructured data.
From the perspective of end users and applications, an unstructured data query language should address the following problems:
(1) Support querying unstructured data stored in key-value stores.
Unstructured data is today mostly stored in key-value stores, which serve as the solution for massive data and efficient reads and writes, yet key-value stores often provide no easy-to-use query language.
(2) Provide unified querying over the multiple features of different kinds of unstructured data.
Existing languages such as CQL offer only simple query functions and cannot perform feature-based retrieval on unstructured data, for example retrieving images by features such as histogram or color, or retrieving audio by MFCC features.
(3) Perform data query and analysis efficiently.
Traditional data querying implements only indexing and simple statistical functions. For massive unstructured data, many results can be obtained only by analyzing the data, so the query language should support as many data analysis functions as possible.
Summary of the invention
The object of the present invention is to propose a method for parsing and processing an unstructured data query and operation language. To address the problems in the field of unstructured data management, the underlying key-value store is accessed through an SQL-like language, so that users can easily operate the key-value store to manage unstructured data.
The method for parsing and processing an unstructured data management query language proposed by the present invention comprises the following steps:
(1) Start the query module in the key-value store; the query module listens for query language requests from users.
(2) The query module receives a user's query language request and parses it, as follows:
(2-1) The client connects to the query module through a query language driver, establishes a session between the client and the query module, saves the session information for the duration of the session, and sends query language statements to the query module.
(2-2) Using its parser, the query module converts the query language request sent by the client into an internal command.
(3) The internal command is examined. If it specifies the key-value store table for the session, the query module saves the name of that table, and subsequent commands in the session are executed against that table by default. If a similarity keyword appears anywhere in the query statement, the query module hands the internal command to the index call module in the key-value store. If a function keyword appears anywhere in the query statement, the query module hands the internal command to the function call module in the key-value store.
(4) The query module calls the functional modules in the key-value store to execute the internal command, as follows:
(4-1) If the internal command is a structured query command, the key-value store's server executes it.
(4-2) If the internal command creates a key-value store index, the key-value store's server executes it.
(4-3) If the internal command creates an index that is not native to the key-value store, an index implementation library is constructed and called to execute the command.
(4-4) If the internal command runs a data analysis function, a data function analysis module is constructed and called to execute the command, and the query module obtains the command's execution status and result.
(4-5) If the internal command is a large data transfer, an independent data transfer stream waits for the client to connect. Once connected, the file is transferred over that stream. When the transfer ends, the query module saves the transferred file and keeps the session between the client and the query module open.
(4-6) If the internal command is a user-defined index creation, index query, or function creation: for user-defined index creation and index query, a keyword indicates the index's creation parameters and index type, and the index is created or queried accordingly; for user-defined function creation, the query module uses the function keyword and the function's variable-length parameters in the query statement to select the matching function from the supported function types listed in the query module's configuration file, and sets up the function.
(4-7) If the internal command is a joint query over multiple index types, the query module splits the query into a sub-clause for each index type, reads the priority of each kind of index query from the query module's configuration file, reorders the sub-clauses accordingly, and runs the query.
(5) The query module returns the query result to the client.
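The routing rules of step (3) can be illustrated with a minimal Python sketch. It is not the patented implementation: the Session and InternalCommand classes, the route function, the module names, the order in which the keywords are checked, and the use keyword for selecting a table are all assumptions made for illustration. Only the like and function keywords are taken from the embodiment described in this document.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    """Per-client session state kept by the query module (step 2-1)."""
    current_table: Optional[str] = None


@dataclass
class InternalCommand:
    """Hypothetical internal command produced by the parser (step 2-2)."""
    target: str                  # name of the module that should execute it
    text: str                    # the original statement
    table: Optional[str] = None  # table the session is currently bound to


def has_keyword(statement: str, keyword: str) -> bool:
    """True if `keyword` appears as a whole word anywhere in the statement."""
    pattern = rf"\b{re.escape(keyword)}\b"
    return re.search(pattern, statement, re.IGNORECASE) is not None


def route(statement: str, session: Session) -> InternalCommand:
    """Apply the step (3) rules to pick the module that executes a statement."""
    words = statement.strip().rstrip(";").split()
    # Table-selection command (assumed syntax: "use <table>"):
    # remember the table so later commands in the session default to it.
    if len(words) == 2 and words[0].lower() == "use":
        session.current_table = words[1]
        return InternalCommand("session", statement, session.current_table)
    # A similarity keyword anywhere in the statement goes to the index call module.
    if has_keyword(statement, "like"):
        return InternalCommand("index_call_module", statement, session.current_table)
    # A function keyword anywhere in the statement goes to the function call module.
    if has_keyword(statement, "function"):
        return InternalCommand("function_call_module", statement, session.current_table)
    # Everything else is treated as a plain structured command for the store's server.
    return InternalCommand("kv_server", statement, session.current_table)
```

A real parser would build a syntax tree instead of scanning for keywords; the sketch only shows how the session table and the two keywords determine which module receives the command.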
The method proposed by the present invention defines a structured query language for querying unstructured data. Its syntax resembles the query language of traditional relational databases, and it is easy to extend and can incorporate user-defined query functions. The core of the method is the query module. Interfaces are designed so that the query module is loosely coupled to the key-value store, and a query module built for one key-value store can easily be ported to another. The method provides several kinds of feature retrieval, including user-defined ones, so it can manage many kinds of unstructured data directly. The method also supports reading and writing large data (such as files), data analysis functions that can be executed in a distributed fashion, and configurable query priorities, all of which help manage unstructured data efficiently.
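The loose coupling between the query module and the key-value store can be pictured with a short Python sketch. The KeyValueStore interface, its put and get methods, and the InMemoryStore and QueryModule classes are hypothetical names chosen for illustration; the patent does not specify the interface.

```python
from typing import Dict, Optional, Protocol


class KeyValueStore(Protocol):
    """Hypothetical minimal interface the query module depends on."""

    def put(self, table: str, key: str, value: bytes) -> None: ...

    def get(self, table: str, key: str) -> Optional[bytes]: ...


class InMemoryStore:
    """Toy backend for illustration; a real adapter would wrap an actual key-value store."""

    def __init__(self) -> None:
        self._tables: Dict[str, Dict[str, bytes]] = {}

    def put(self, table: str, key: str, value: bytes) -> None:
        self._tables.setdefault(table, {})[key] = value

    def get(self, table: str, key: str) -> Optional[bytes]:
        return self._tables.get(table, {}).get(key)


class QueryModule:
    """Depends only on the KeyValueStore interface, not on a concrete store."""

    def __init__(self, store: KeyValueStore) -> None:
        self._store = store

    def insert(self, table: str, key: str, value: bytes) -> None:
        self._store.put(table, key, value)

    def lookup(self, table: str, key: str) -> Optional[bytes]:
        return self._store.get(table, key)
```

Under this assumption, porting the query module to another store means writing one more adapter class with the same two methods; the QueryModule code itself does not change.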
Embodiment
The method for parsing and processing an unstructured data management query language proposed by the present invention comprises the following steps:
(1) Start the query module in the key-value store; the query module listens for query language requests from users.
(2) The query module receives a user's query language request and parses it, as follows:
(2-1) The client connects to the query module through a query language driver, establishes a session between the client and the query module, saves the session information for the duration of the session, and sends query language statements to the query module.
(2-2) Using its parser, the query module converts the query language request sent by the client into an internal command.
(3) The internal command is examined. If it specifies the key-value store table for the session, the query module saves the name of that table, and subsequent commands in the session are executed against that table by default. If the similarity keyword (like) appears anywhere in the query statement, the query module hands the internal command to the index call module in the key-value store. If the function keyword (function) appears anywhere in the query statement, the query module hands the internal command to the function call module in the key-value store.
(4) The query module calls the functional modules in the key-value store to execute the internal command, as follows:
(4-1) If the internal command is a structured query command, such as creating a table, creating a column family, or adding or deleting data in a column family, the key-value store's server executes it.
(4-2) If the internal command creates a key-value store index, the key-value store's server executes it.
(4-3) If the internal command creates an index that is not native to the key-value store, such as a high-dimensional index for images or a full-text index for text, an index implementation library is constructed and called to execute the command.
(4-4) If the internal command runs a data analysis function, a data function analysis module is constructed and called to execute the command, and the query module obtains the command's execution status and result.
(4-5) If the internal command is a large data transfer, an independent data transfer stream waits for the client to connect. Once connected, the file is transferred over that stream. When the transfer ends, the query module saves the transferred file and keeps the session between the client and the query module open.
(4-6) If the internal command is a user-defined index creation, index query, or function creation: the query language proposed by the present invention uses semi-open keywords to support creating and querying multiple kinds of indexes and to support multiple kinds of functions. For user-defined index creation and index query, a keyword (for example, with) indicates the index's creation parameters and index type, and the index is created or queried accordingly. For user-defined function creation, the query module uses the function keyword and the function's variable-length parameters in the query statement to select the matching function from the supported function types listed in the query module's configuration file, and sets up the function.
(4-7) If the internal command is a joint query over multiple index types: in a more complex query statement, the key-value store's default index queries (filters on column values or keys) may appear together with several user-defined index queries. In that case the query module splits the query into a sub-clause for each index type, reads the priority of each kind of index query from the query module's configuration file, reorders the sub-clauses accordingly, and runs the query.
(5) The query module returns the query result to the client.
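The clause reordering of step (4-7) can be sketched in Python as follows. The index type names, the priority values, and the order_clauses function are assumptions made for illustration; the patent states only that priorities are read from the query module's configuration file, not their format or values.

```python
from typing import Dict, List, Tuple

# Hypothetical priorities as they might appear in the query module's
# configuration file; a lower number means "run earlier".
INDEX_PRIORITY: Dict[str, int] = {
    "key": 0,        # key lookup in the key-value store
    "column": 1,     # column-value filter in the key-value store
    "fulltext": 2,   # user-defined full-text index
    "histogram": 3,  # user-defined high-dimensional image index
}

Clause = Tuple[str, str]  # (index type, clause text)


def order_clauses(clauses: List[Clause], priority: Dict[str, int]) -> List[Clause]:
    """Reorder sub-clauses by configured priority; unknown index types run last."""
    last = len(priority)
    # sorted() is stable, so clauses with equal priority keep their original order.
    return sorted(clauses, key=lambda clause: priority.get(clause[0], last))
```

Running cheap, selective lookups before expensive similarity searches is one plausible reason to make the order configurable; the sketch leaves that policy entirely to the priority table.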

Claims (1)

1. A method for parsing and processing an unstructured data management query language, characterized in that the method comprises the following steps:
(1) starting the query module in the key-value store, the query module listening for query language requests from users;
(2) the query module receiving a user's query language request and parsing it, as follows:
(2-1) the client connects to the query module through a query language driver, establishes a session between the client and the query module, saves the session information for the duration of the session, and sends query language statements to the query module;
(2-2) using its parser, the query module converts the query language request sent by the client into an internal command;
(3) examining the internal command: if it specifies the key-value store table for the session, the query module saves the name of that table, and subsequent commands in the session are executed against that table by default; if a similarity keyword appears anywhere in the query statement, the query module hands the internal command to the index call module in the key-value store; if a function keyword appears anywhere in the query statement, the query module hands the internal command to the function call module in the key-value store;
(4) the query module calling the functional modules in the key-value store to execute the internal command, as follows:
(4-1) if the internal command is a structured query command, the key-value store's server executes it;
(4-2) if the internal command creates a key-value store index, the key-value store's server executes it;
(4-3) if the internal command creates an index that is not native to the key-value store, an index implementation library is constructed and called to execute the command;
(4-4) if the internal command runs a data analysis function, a data function analysis module is constructed and called to execute the command, and the query module obtains the command's execution status and result;
(4-5) if the internal command is a large data transfer, an independent data transfer stream waits for the client to connect, and once connected the file is transferred over that stream; when the transfer ends, the query module saves the transferred file and keeps the session between the client and the query module open;
(4-6) if the internal command is a user-defined index creation, index query, or function creation: for user-defined index creation and index query, a keyword indicates the index's creation parameters and index type, and the index is created or queried accordingly; for user-defined function creation, the query module uses the function keyword and the function's variable-length parameters in the query statement to select the matching function from the supported function types listed in the query module's configuration file, and sets up the function;
(4-7) if the internal command is a joint query over multiple index types, the query module splits the query into a sub-clause for each index type, reads the priority of each kind of index query from the query module's configuration file, reorders the sub-clauses accordingly, and runs the query;
(5) the query module returning the query result to the client.
CN201210190832.5A 2012-06-11 2012-06-11 Method for analyzing and processing non-structured data query operating language Active CN102750354B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210190832.5A CN102750354B (en) 2012-06-11 2012-06-11 Method for analyzing and processing non-structured data query operating language

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210190832.5A CN102750354B (en) 2012-06-11 2012-06-11 Method for analyzing and processing non-structured data query operating language

Publications (2)

Publication Number Publication Date
CN102750354A (en) 2012-10-24
CN102750354B (en) 2014-08-20

Family

ID=47030539

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210190832.5A Active CN102750354B (en) 2012-06-11 2012-06-11 Method for analyzing and processing non-structured data query operating language

Country Status (1)

Country Link
CN (1) CN102750354B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103425779A (en) * 2013-08-19 2013-12-04 曙光信息产业股份有限公司 Data processing method and data processing device
CN107122418A (en) * 2017-03-31 2017-09-01 北京奇艺世纪科技有限公司 A kind of querying method and device
CN108090139A (en) * 2017-11-30 2018-05-29 北京邮电大学 A kind of document retrieval method and device
CN108090219A (en) * 2014-12-24 2018-05-29 北京奇虎科技有限公司 The processing method and processing device of database onboard data
CN108846003A (en) * 2018-04-20 2018-11-20 广东电网有限责任公司 A kind of unstructured machine data processing method and processing device
CN113326033A (en) * 2021-06-09 2021-08-31 北京八分量信息科技有限公司 Key-value storage system with multi-language API
CN113468209A (en) * 2021-07-27 2021-10-01 广西电网有限责任公司 High-speed memory database access method for power grid monitoring system
CN116303581A (en) * 2023-05-19 2023-06-23 山东浪潮数字商业科技有限公司 Method, system, equipment and medium for adapting split-flow query load among heterogeneous databases

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7194483B1 (en) * 2001-05-07 2007-03-20 Intelligenxia, Inc. Method, system, and computer program product for concept-based multi-dimensional analysis of unstructured information
US20080201290A1 (en) * 2007-02-16 2008-08-21 International Business Machines Corporation Computer-implemented methods, systems, and computer program products for enhanced batch mode processing of a relational database
CN102129469A (en) * 2011-03-23 2011-07-20 华中科技大学 Virtual experiment-oriented unstructured data accessing method
CN102298641A (en) * 2011-09-14 2011-12-28 清华大学 Method for uniformly storing files and structured data based on key value bank

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
田万鹏 等: "一种基于特征的非结构化数据演化管理建模框架", 《计算机研究与发展》 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103425779A (en) * 2013-08-19 2013-12-04 曙光信息产业股份有限公司 Data processing method and data processing device
CN108090219A (en) * 2014-12-24 2018-05-29 北京奇虎科技有限公司 The processing method and processing device of database onboard data
CN107122418A (en) * 2017-03-31 2017-09-01 北京奇艺世纪科技有限公司 A kind of querying method and device
CN108090139A (en) * 2017-11-30 2018-05-29 北京邮电大学 A kind of document retrieval method and device
CN108090139B (en) * 2017-11-30 2021-10-01 北京邮电大学 File retrieval method and device
CN108846003A (en) * 2018-04-20 2018-11-20 广东电网有限责任公司 A kind of unstructured machine data processing method and processing device
CN113326033A (en) * 2021-06-09 2021-08-31 北京八分量信息科技有限公司 Key-value storage system with multi-language API
CN113326033B (en) * 2021-06-09 2023-08-11 北京八分量信息科技有限公司 Key-value storage system with multi-language API
CN113468209A (en) * 2021-07-27 2021-10-01 广西电网有限责任公司 High-speed memory database access method for power grid monitoring system
CN116303581A (en) * 2023-05-19 2023-06-23 山东浪潮数字商业科技有限公司 Method, system, equipment and medium for adapting split-flow query load among heterogeneous databases
CN116303581B (en) * 2023-05-19 2023-08-04 山东浪潮数字商业科技有限公司 Method, system, equipment and medium for adapting split-flow query load among heterogeneous databases

Also Published As

Publication number Publication date
CN102750354B (en) 2014-08-20

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant