CN102289483A - Method for automatically gathering metadata of space science data facing global change research - Google Patents

Method for automatically gathering metadata of space science data for global change research

Info

Publication number
CN102289483A
Authority
CN
China
Prior art keywords
data
metadata
file
server
download
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2011102205375A
Other languages
Chinese (zh)
Other versions
CN102289483B (en)
Inventor
杨风雷
林青慧
黎建辉
沈志宏
胡良霖
周园春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Computer Network Information Center of CAS
Original Assignee
Computer Network Information Center of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Computer Network Information Center of CAS
Priority to CN 201110220537 (CN102289483B)
Publication of CN102289483A
Application granted
Publication of CN102289483B
Legal status: Active
Anticipated expiration

Abstract

The invention discloses a method for automatically gathering metadata of space science data for global change research and belongs to the field of information technology. The method comprises the following steps: 1) selecting a data source, the server periodically accessing the data source to generate a download list and a download task; 2) accessing the data source's server according to the current download list and download task, downloading the metadata files, and storing them on the server; 3) checking the quality of the downloaded metadata files, extracting metadata items from the files that pass the check, and performing calculation and conversion; 4) storing the metadata file paths and the metadata items in a metadata item database and building an index; and 5) establishing a one-to-one mapping among the metadata files, the data in the metadata item database, and the index data by means of the file path, the id value in the metadata item database, and the id value of the index entry. With this method, the massive global change space science data metadata resources distributed around the world can be queried in one stop.

Description

Method for automatically gathering metadata of space science data for global change research
Technical field
The invention belongs to the fields of information technology and global change research, and in particular relates to a method that uses information technology to automatically gather metadata of distributed, multi-source, heterogeneous space science data in the field of global change research.
Background art
As the negative effects of global change grow more severe and global environmental problems become increasingly prominent, global change research is receiving unprecedented attention. This is evident from the four major global environmental change programmes initiated in succession by international scientific organizations such as the International Council of Scientific Unions (ICSU): the World Climate Research Programme (WCRP), initiated in 1980; the International Geosphere-Biosphere Programme (IGBP), initiated in 1987; the biodiversity programme DIVERSITAS, initiated in 1991; and the International Human Dimensions Programme on Global Environmental Change (IHDP), initiated in 1996. In 2001 these four programmes jointly formed the Earth System Science Partnership (ESSP), whose purpose is to promote integrated research on the Earth system, foster cooperation among Earth system projects, and deepen human understanding of the Earth system.
The means and methods mainly used in current global change research include analysis of global trends, simulation, and data mining. In the current era of big science, every one of these research methods requires a large volume of scientific data as its foundation and support. The rapid development of observation technology, especially satellite remote sensing, makes it possible to monitor the behaviour of the whole Earth system, and the resulting space science data provide a direct basis for global change research.
In global change research there are numerous research models at different levels and from different angles, and these usually need scientific data from various sources as their basis. Even a single research model often needs to gather remote sensing space science data from several data sources, and these sources are generally distributed and heterogeneous. For example, because of natural conditions such as weather, the remote sensing imagery of a single data source may be unable to cover the whole study area, so other data sources usually have to be used as substitutes. When distributed, heterogeneous remote sensing space science data cannot be located and gathered quickly and accurately, their spatial distribution and structural diversity greatly limit the computational scale of scientific research. Solving these problems requires a fully functional platform for automatically gathering space science data.
Since automatically gathering the metadata is a prerequisite for automatically gathering the space science data themselves, global change projects and scientists urgently need a platform that can automatically gather metadata of distributed, multi-source, heterogeneous space science data. Such a platform makes it convenient to query and locate scientific data metadata, so that scientists can query, easily and in one stop, the massive global change space science data metadata resources distributed around the world. A search of the literature found no existing method or platform that solves the above problems.
Summary of the invention
In view of the above need to automatically gather global change space science data metadata resources, the object of the present invention is to provide a method for automatically gathering metadata of space science data for global change research. Taking the characteristics of space science data into account and following the divide-and-conquer approach of systems engineering, the invention gathers space science data metadata resources automatically through the following steps: dynamic discovery of metadata resources, dynamic gathering of metadata, unified expression and conversion of metadata, and fast, accurate unified retrieval of metadata.
The present invention comprises the following steps (as shown in Fig. 1):
(1) Dynamic discovery of metadata resources
As the volume of global change scientific data expands massively, a large number of high-quality data resources have emerged, and the metadata of most of these data resources are published under open data-sharing policies. Continually adding new metadata resources and discovering them dynamically is the key to gathering metadata resources automatically. This requires a highly compatible data source gathering interface that discovers and confirms metadata resources dynamically, in a transparent and efficient manner, so as to better integrate distributed remote sensing space science data metadata resources.
(2) Dynamic gathering of metadata
This step concerns how to keep metadata records consistent between the data providers and the platform that automatically gathers space science data metadata for global change research. To this end, based on an analysis of how each data source updates its metadata resources, different metadata harvesting modes and frequencies are defined, and a metadata download module is built on them, so as to strike a good balance between the need for up-to-date data and the need to reduce system load, and to synchronize metadata in real time or near real time.
(3) Unified expression and conversion of metadata
Metadata of distributed space science data are expressed in diverse ways. To manage and retrieve the metadata in a unified manner, the metadata of global change scientific data must be expressed uniformly. To solve the problem of diverse expression of metadata from different sources, a local metadata information model with good compatibility is built on the basis of the international data representation and data interoperability standards in the field of global change scientific data, and local metadata converters are developed to parse, convert, and extract information from metadata of different sources, thereby achieving unified expression of the metadata.
(4) Fast, accurate unified retrieval of metadata
As noted above, global change data are characterized by complex relationships, lack of structure, large volume, multiple scales, variation over time, and strong heterogeneity. Once the metadata have been uniformly converted and expressed, an efficient, fast (numeric) index system and a metadata database system must be built to handle the very large volume of scientific data metadata, so that metadata can be retrieved and located in a unified, fast, and accurate manner.
To achieve the above object, the present invention adopts the following technical solution:
A method for automatically gathering metadata of space science data for global change research comprises the following steps:
(1) For each data source, the server periodically repeats the following process: according to the data product types, it generates a list of possibly valid ids of the data source's metadata files and validates each id (the criterion being whether the metadata file that the id identifies exists), thereby obtaining the list of currently valid metadata file ids; for each valid id it generates the complete metadata file url (including the url of the corresponding picture file); it then combines these metadata file urls (including the corresponding picture file urls) into a download list and a download task, and starts the download task (as shown in Fig. 2).
(2) After checking the target task (whether it is normal, its task type, and so on), the server starts a corresponding number of download threads according to the task volume and the resource situation, and dynamically allocates the download list among the threads according to the thread situation and a fairness rule (by default, tasks that have already been downloaded are not downloaded again). Each thread then, following the order of its allocated list, connects to the data server according to the configuration file, obtains the file stream, stores the downloaded content, reduces the size of picture files, and recovers from and corrects exceptions that occur during downloading (as shown in Fig. 3).
(3) After the metadata files (including the corresponding picture files) have been downloaded, the server performs a quality check on the downloaded files (including whether each file can be opened normally, whether the file size matches, and whether the metadata file, the picture file, and the reduced picture file correspond one to one), extracts metadata items, identifies the latitude and longitude values of the four vertices by calculation, converts the metadata, stores the metadata, and builds a numeric index (as shown in Fig. 4).
(4) The server stores the metadata in different forms, namely files (completed in the previous step), metadata item database data, and index data, and maps them to one another according to rules to form a logically unified metadata environment, on the basis of which it provides a logically unified metadata retrieval interface (as shown in Fig. 5).
(5) The user retrieval part provides a user retrieval interface, performs the relevant calculations on the query submitted by the user (latitude and longitude, or a spatial object), sorts the result data according to rules (such as distance), and completes metadata query and location according to the user's needs (as shown in Fig. 6).
Through the above steps, the automatic gathering of space science data metadata for global change research is fully realized.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
With the method of the present invention, scientists can query, easily and in one stop, the massive global change space science data metadata resources distributed around the world, which solves the problem of automatically gathering metadata in global change research.
Brief description of the drawings
Fig. 1 is a flow chart of the automatic gathering of space science data metadata for global change research;
Fig. 2 shows the dynamic discovery of metadata resources;
Fig. 3 shows the dynamic gathering of metadata;
Fig. 4 shows the unified expression and conversion of metadata;
Fig. 5 shows metadata storage management;
Fig. 6 shows the user retrieval part.
Detailed description of the embodiments
An embodiment of the present invention is described below, taking Landsat data as an example.
First, for each specific data source to be gathered, the address of its data website (for example, the Landsat data website is http://glovis.usgs.gov/) and the data product types to be gathered (for example, LANDSAT-7 SLC_off) are compiled.
Next, the server generates a list of possibly valid data product ids according to the Landsat data product type, time, day of year, and so on. A Landsat data product id is a character string of the form LXSPPPRRRYYYYDDDGSIVV, in which the fields have the following meanings:
L: Landsat data.
X: product type (M for MSS, T for TM, E for ETM+).
S: satellite (1, 2, 3, 4, 5, 7).
PPP: WRS path. The global range is 001-251; the range for China is 114-151.
RRR: WRS row. The global range is 001-248; the range for China is 011-051.
YYYY: year of the data product.
DDD: day of year of the data product (001-366).
GSI: ground station identifier (for example, AAA for the North American station and BJC for the Beijing, China station).
VV: version (two digits).
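The field layout above can be illustrated with a short parser. This is only a sketch under stated assumptions: the `LandsatId` type and `parse_landsat_id` function are hypothetical names, and the assumption that the station code is three upper-case letters or digits is ours, not the patent's.

```python
import re
from typing import NamedTuple


class LandsatId(NamedTuple):
    """Fields of an LXSPPPRRRYYYYDDDGSIVV product id (hypothetical type)."""
    sensor: str       # X: M = MSS, T = TM, E = ETM+
    satellite: int    # S: 1, 2, 3, 4, 5 or 7
    path: int         # PPP: WRS path
    row: int          # RRR: WRS row
    year: int         # YYYY
    day_of_year: int  # DDD: 001-366
    station: str      # GSI: ground station identifier
    version: str      # VV: two digits


# Fixed-width pattern that follows the field list in the text.
_ID_PATTERN = re.compile(
    r"^L(?P<sensor>[MTE])(?P<satellite>[123457])"
    r"(?P<path>\d{3})(?P<row>\d{3})"
    r"(?P<year>\d{4})(?P<day>\d{3})"
    r"(?P<station>[A-Z0-9]{3})(?P<version>\d{2})$"
)


def parse_landsat_id(product_id: str) -> LandsatId:
    """Split a product id into its fields, rejecting malformed ids."""
    match = _ID_PATTERN.match(product_id)
    if match is None:
        raise ValueError(f"not a valid product id: {product_id!r}")
    day = int(match["day"])
    if not 1 <= day <= 366:
        raise ValueError(f"day of year out of range: {day}")
    return LandsatId(
        sensor=match["sensor"],
        satellite=int(match["satellite"]),
        path=int(match["path"]),
        row=int(match["row"]),
        year=int(match["year"]),
        day_of_year=day,
        station=match["station"],
        version=match["version"],
    )
```

For the id used later in this description, `parse_landsat_id("LE71370312010294SGS00")` yields sensor `E`, satellite 7, path 137, row 31, year 2010, day 294, station `SGS`, and version `00`.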
The server then validates each id in the generated data product id list. Validity can be verified by submitting the id to a verification interface of the data website (for Landsat, for example, the interface http://edcsns17.cr.usgs.gov/EarthExplorer/order/bulkDownload.php) and checking whether the metadata file exists; alternatively, the metadata file can be accessed directly at the url spliced as described below, and the id is valid if the file exists.
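The generate-then-validate step can be sketched as follows. The function names and parameters are hypothetical, and the existence check is passed in as a callable (`metadata_exists`) because the patent does not specify how the request is issued; a real deployment would presumably wrap an HTTP request to one of the interfaces mentioned above.

```python
from typing import Callable, Iterable, Iterator, List


def candidate_ids(sensor: str, satellite: int, paths: Iterable[int],
                  rows: Iterable[int], year: int, days: Iterable[int],
                  station: str, version: str = "00") -> Iterator[str]:
    """Yield possibly valid product ids of the form LXSPPPRRRYYYYDDDGSIVV."""
    rows = list(rows)   # materialize so the inner loops can be repeated
    days = list(days)
    for path in paths:
        for row in rows:
            for day in days:
                yield (f"L{sensor}{satellite}{path:03d}{row:03d}"
                       f"{year:04d}{day:03d}{station}{version}")


def filter_valid(ids: Iterable[str],
                 metadata_exists: Callable[[str], bool]) -> List[str]:
    """Keep only the ids whose metadata file exists at the data source."""
    return [product_id for product_id in ids if metadata_exists(product_id)]
```

Injecting the check keeps the sketch runnable offline and leaves the choice between the verification interface and direct file access to the caller.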
For each valid data product id, the url of the target file (the metadata file and its corresponding picture file) is generated by splicing the data product id onto the fixed part of the metadata file url (for example, the metadata file of data product id=LE71370312010294SGS00 is at http://edcsns17.cr.usgs.gov/cgi-bin/EarthExplorer/fgdc.cgi?dataset_name=LANDSAT_ETM&entity_id=LE71370312010294SGS00&format=HTM). The valid metadata file urls generated in this way (including the corresponding picture file urls) are combined into a download file list and a download task, which is then started. In the download list of a download task, the share of urls belonging to a given data product type out of all urls in the task equals the share of that product type's valid ids out of all valid ids currently in the data source, and the urls are ordered by data product type and then by time.
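A minimal sketch of the url splicing and of the proportional composition of a download list is given below. The endpoint and query parameters are taken from the example url in the text; the function names, the use of the `dataset_name` value as the product type key, and the rounding of each quota are assumptions made for illustration.

```python
from typing import Dict, List
from urllib.parse import urlencode

# Fixed part of the metadata file url, as given in the example above.
FGDC_ENDPOINT = "http://edcsns17.cr.usgs.gov/cgi-bin/EarthExplorer/fgdc.cgi"


def metadata_url(product_id: str, dataset_name: str = "LANDSAT_ETM") -> str:
    """Splice a product id onto the fixed part of the metadata url."""
    query = urlencode({"dataset_name": dataset_name,
                       "entity_id": product_id,
                       "format": "HTM"})
    return f"{FGDC_ENDPOINT}?{query}"


def build_download_list(valid_ids_by_type: Dict[str, List[str]],
                        n: int) -> List[str]:
    """Pick about n urls so that each product type's share of the list
    matches its share of all valid ids; order by type, then by time."""
    total = sum(len(ids) for ids in valid_ids_by_type.values())
    if total == 0 or n <= 0:
        return []
    n = min(n, total)  # N may not exceed the number of valid ids
    urls: List[str] = []
    for product_type in sorted(valid_ids_by_type):
        # Characters 9..15 of the id hold YYYYDDD, so this sorts by time.
        ids = sorted(valid_ids_by_type[product_type], key=lambda i: i[9:16])
        # Rounding is an assumption; quotas may not sum exactly to n.
        quota = round(n * len(ids) / total)
        urls.extend(metadata_url(i, product_type) for i in ids[:quota])
    return urls
```

With four valid ETM+ ids and two valid TM ids, a task of three urls would contain two ETM+ urls followed by one TM url, each group in time order.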
When the dynamic metadata gathering part of the server detects a Landsat download task that can be started, it starts a corresponding number of download threads according to the number of urls in the download list and the resource situation of the data source's data server, and dynamically allocates the download list to the download threads according to the thread situation and the fairness principle (allocation rule: completely random). Each thread then, following the order of its allocated list, connects to the metadata server according to the configuration file, obtains the file stream, stores the downloaded content, reduces the size of picture files, and recovers from and corrects exceptions during downloading.
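The completely random allocation rule can be sketched as below. The function name and the optional `seed` parameter are hypothetical, and the downloading, picture resizing, and error recovery performed by each thread are deliberately left out.

```python
import random
from typing import List, Optional, Sequence


def assign_randomly(urls: Sequence[str], n_threads: int,
                    seed: Optional[int] = None) -> List[List[str]]:
    """Assign each url to a uniformly chosen thread.

    Each per-thread list keeps the relative order of the download list,
    so a thread can work through its share in list order.
    """
    if n_threads < 1:
        raise ValueError("at least one download thread is required")
    rng = random.Random(seed)  # a seed makes the allocation reproducible
    buckets: List[List[str]] = [[] for _ in range(n_threads)]
    for url in urls:
        buckets[rng.randrange(n_threads)].append(url)
    return buckets
```

Random allocation balances load only on average; a round-robin or work-queue scheme would give tighter balance, but the text names random allocation as the rule, so the sketch follows it.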
The metadata files downloaded by the server (including the corresponding picture files) first undergo a quality check, which covers whether each file can be opened, whether the file size is the same as before downloading, and whether the metadata file, the picture file, and the reduced picture file correspond one to one. From each metadata file that passes the check, metadata items are extracted (such as the data generation time, the latitude and longitude values of the four vertices of the area covered by the data entity in the file, and the cloud cover). The latitude and longitude values of the four vertices are identified by calculation, on the principle that the northernmost point has the largest latitude, the southernmost point the smallest latitude, the easternmost point the largest longitude, and the westernmost point the smallest longitude.
The metadata items are then converted as necessary. For example, because positive numbers are easier to handle when building a numeric index, a fixed positive number such as 180 is added to every latitude and longitude value so that all of them become numbers greater than or equal to 0; and, for consistent expression and efficient indexing, items such as the data product type are converted to numeric codes. On this basis a numeric index is built on the four vertices' latitude and longitude values and similar items (each record has a unique data id value), and the remaining metadata content (including the path of the metadata file) is inserted into the metadata item database of the server (each record has a unique data id value).
In this process, a one-to-one mapping is established among the metadata file, the metadata item database record, and the numeric index entry by means of the file path, the data id value in the metadata item database, and the data id value in the numeric index entry (which is the same value as the data id in the metadata item database), thereby forming a data retrieval interface through which the data can be accessed in a unified way.
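The extent calculation, the +180 shift, and the shared id that ties file path, database record, and index entry together can be sketched with an in-memory SQLite database. The table and column names are hypothetical, SQLite stands in for whatever database and index system the server actually uses, and the simple min/max rule does not handle scenes that cross the 180° meridian.

```python
import sqlite3
from typing import Sequence, Tuple

OFFSET = 180.0  # fixed positive number added to every latitude/longitude


def shifted_extent(
    corners: Sequence[Tuple[float, float]]
) -> Tuple[float, float, float, float]:
    """Return (north, south, east, west) from four (lat, lon) corners,
    each shifted by OFFSET so that all values are >= 0."""
    lats = [lat for lat, _ in corners]
    lons = [lon for _, lon in corners]
    return (max(lats) + OFFSET, min(lats) + OFFSET,
            max(lons) + OFFSET, min(lons) + OFFSET)


def create_store(conn: sqlite3.Connection) -> None:
    """Create the metadata item table and the numeric extent index."""
    conn.execute(
        "CREATE TABLE metadata_item (id INTEGER PRIMARY KEY, "
        "file_path TEXT UNIQUE NOT NULL, acquired TEXT, cloud_cover REAL)")
    conn.execute(
        "CREATE TABLE extent_index (id INTEGER PRIMARY KEY "
        "REFERENCES metadata_item(id), north REAL, south REAL, "
        "east REAL, west REAL)")
    conn.execute(
        "CREATE INDEX idx_extent ON extent_index (south, north, west, east)")


def store_record(conn: sqlite3.Connection, file_path: str, acquired: str,
                 cloud_cover: float,
                 corners: Sequence[Tuple[float, float]]) -> int:
    """Insert one record; the same id is used in both tables, and the
    file path column links the record back to the metadata file."""
    cur = conn.execute(
        "INSERT INTO metadata_item (file_path, acquired, cloud_cover) "
        "VALUES (?, ?, ?)", (file_path, acquired, cloud_cover))
    data_id = cur.lastrowid
    conn.execute(
        "INSERT INTO extent_index (id, north, south, east, west) "
        "VALUES (?, ?, ?, ?, ?)", (data_id, *shifted_extent(corners)))
    return data_id
```

Because the index entry reuses the database id and the database row stores the file path, any one of the three representations can be reached from either of the others with a single lookup.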
After this, the user can use the retrieval interface provided by the user retrieval part and supply query conditions either by entering latitude and longitude values directly or by submitting features of a specific spatial object (the server establishes the correspondence between such spatial object features and latitude/longitude in advance, and when the user submits a spatial object feature the server first converts it to latitude and longitude values). The server adds the fixed value (for example, 180) to the latitude and longitude values entered by the user or obtained by conversion, performs the relevant calculations on the query conditions, and sorts the query results by conditions such as the distance between the centre point of each result's area and the centre point of the query area, the time, and the cloud cover. This shows what spatial data exist for a specific area, for example the list of Landsat data products covering the area together with their generation times and cloud cover. If the user needs the data, a data order request can be submitted to the corresponding data source (such as Landsat).
In this way, the automatic gathering of metadata of distributed, multi-source, heterogeneous space science data is realized.
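The query step can be sketched as follows. The `Record` type and `search` function are hypothetical; the overlap test, the plain Euclidean distance in degrees between centre points, and the ascending sort on time and cloud cover are assumptions, since the text names the sort keys but not their exact definition or direction.

```python
from dataclasses import dataclass
from math import hypot
from typing import Iterable, List

OFFSET = 180.0  # same fixed value that was added when the index was built


@dataclass
class Record:
    data_id: int
    north: float        # extent values are stored already shifted
    south: float
    east: float
    west: float
    acquired: str       # ISO date, so string order is time order
    cloud_cover: float


def search(records: Iterable[Record], lat_min: float, lat_max: float,
           lon_min: float, lon_max: float) -> List[Record]:
    """Return records overlapping the query box, nearest centre first,
    then by acquisition time, then by cloud cover."""
    south, north = lat_min + OFFSET, lat_max + OFFSET
    west, east = lon_min + OFFSET, lon_max + OFFSET
    centre_lat, centre_lon = (south + north) / 2, (west + east) / 2

    def centre_distance(r: Record) -> float:
        return hypot((r.south + r.north) / 2 - centre_lat,
                     (r.west + r.east) / 2 - centre_lon)

    hits = [r for r in records
            if r.south <= north and r.north >= south
            and r.west <= east and r.east >= west]
    return sorted(hits, key=lambda r: (centre_distance(r),
                                       r.acquired, r.cloud_cover))
```

Shifting the query by the same offset as the stored extents keeps the comparison purely numeric, which is the stated reason for the conversion.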

Claims (9)

1. A method for automatically gathering metadata of space science data for global change research, comprising the steps of:
1) selecting a data source, the server periodically accessing the selected data source and generating a download list and a download task, wherein: first, a list of possibly valid ids of the data source's metadata files is generated according to the data product types of the data source; then, whether the metadata file identified by each id in the list of possibly valid ids exists is verified, thereby obtaining the list of currently valid metadata file ids; finally, a complete metadata file url is generated from each valid id, and the download list and download task are generated from the metadata file urls;
2) the server, according to the current download list and download task, accessing the data server of the data source, downloading the data, and saving it to the server;
3) the server performing a quality check on the downloaded metadata files and extracting metadata items from the metadata files that pass the check;
4) saving the metadata items and the metadata file paths in a metadata item database and building an index;
5) the server establishing a one-to-one mapping among the metadata files, the metadata item database data, and the index data by means of the metadata file path, the data id value in the metadata item database, and the id value of the index entry.
2. the method for claim 1 is characterized in that the described metadata item that extracts comprises: latitude and longitude value, the cloud amount on data generation time, four summits of solid data data area that file comprises.
3. method as claimed in claim 2, the method for building up that it is characterized in that described index is: at first the latitude and longitude value on four summits of solid data data area that file comprises of extracting is calculated the longitude of the longitude of the latitude value of the latitude value of definite north point, south point, east point, Western-style pastry; Four summit latitude and longitude value to determining then are converted to the number more than or equal to 0; Set up described index according to the summit latitude and longitude value after the conversion at last.
4. method as claimed in claim 3 is characterized in that described index is the Numerical Index of metadata item.
5. method as claimed in claim 3 is characterized in that described four summit latitude and longitude value are added 180 respectively, thereby described four summit latitude and longitude value are converted to number more than or equal to 0.
6. the method for claim 1, it is characterized in that the effective id of described basis generates complete meta data file url, and according to the method that meta data file url generates described download list and downloading task be: each downloading task correspondence lists table once, when set downloading the url total quantity and be N, the url quantity that a certain categorical data product need be downloaded in each download list accounts for this downloading task and need download effective id quantity that the ratio of url total quantity is equal to the type data product and account for total effectively ratio of id quantity in the current data source; And url arranges according to data product type, time sequencing successively; Wherein, N is less than or equal to total effectively id number.
7. method as claimed in claim 6 is characterized in that server distributes to download thread with the url in the described download list according to the mode of Random assignment.
8. as claim 1 or 2 or 3 or 4 or 5 or 6 or 7 described methods, it is characterized in that described quality check comprises: whether meta data file can be opened, file size with download before whether identical, whether meta data file, picture file, the picture file that dwindles corresponding one by one.
9. method as claimed in claim 8 is characterized in that described server provides a Retrieval Interface.
CN 201110220537 2011-08-02 2011-08-02 Method for automatically gathering metadata of space science data facing global change research Active CN102289483B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110220537 CN102289483B (en) 2011-08-02 2011-08-02 Method for automatically gathering metadata of space science data facing global change research

Publications (2)

Publication Number Publication Date
CN102289483A true CN102289483A (en) 2011-12-21
CN102289483B CN102289483B (en) 2012-12-19

Family

ID=45335910

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110220537 Active CN102289483B (en) 2011-08-02 2011-08-02 Method for automatically gathering metadata of space science data facing global change research

Country Status (1)

Country Link
CN (1) CN102289483B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5970490A (en) * 1996-11-05 1999-10-19 Xerox Corporation Integration platform for heterogeneous databases
CN101159603A (en) * 2007-10-30 2008-04-09 中兴通讯股份有限公司 Wireless network mass data storing method
CN101329682A (en) * 2008-07-22 2008-12-24 华北电力大学 Method for integrating distribution type isomerization information resource

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104361437A (en) * 2014-10-31 2015-02-18 北京思特奇信息技术股份有限公司 Quality inspection and management method of diversified data interfaces and quality inspection and management system of diversified data interfaces
CN107315767A (en) * 2017-05-17 2017-11-03 中国科学院计算机网络信息中心 A kind of convergence method for reconstructing of flux data
CN107315767B (en) * 2017-05-17 2020-08-04 中国科学院计算机网络信息中心 Convergent reconstruction method of flux data

Also Published As

Publication number Publication date
CN102289483B (en) 2012-12-19

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant