CN104239477A - Method and device for analyzing time series data - Google Patents

Publication number
CN104239477A
Authority
CN
China
Prior art keywords
time series
series data
characteristic information
data
extraction
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410447046.8A
Other languages
Chinese (zh)
Inventor
陈军
梁玫娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING YOUTEJIE INFORMATION TECHNOLOGY Co Ltd
Original Assignee
BEIJING YOUTEJIE INFORMATION TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2014-09-03
Filing date
2014-09-03
Publication date
2014-12-24
Application filed by BEIJING YOUTEJIE INFORMATION TECHNOLOGY Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Abstract

The invention discloses a method and a device for analyzing time series data, used to display time series data accurately and intuitively. The method comprises the following steps: extracting characteristic information of time series data; searching for time series data having the same characteristic information; analyzing whether the time series data having the same characteristic information are produced by the same source code; and, if so, displaying the time series data having the same characteristic information as a cluster.

Description

A method and device for analyzing time series data
Technical field
The present invention relates to the field of Internet technology, and in particular to a method and device for analyzing time series data.
Background
With the rapid development of information technology, people generate large amounts of digital information in social and economic activities. Enterprise IT infrastructure keeps growing in scale, and IT monitoring and operations systems are widely deployed. At the same time, the data generated by sensors and smart appliances, and the logs generated by transaction systems (securities trading systems, e-commerce transaction systems), are enormous in volume and varied in format, which makes them difficult to use.
Reviewing large volumes of log information is also a major problem. As logs grow in volume and variety, log data exceeds human cognitive capacity, and manual effort alone cannot keep up with the rate at which machines produce data. Analyzing log content and tracing potential problems becomes increasingly difficult, especially once correlation analysis across multiple logs is involved: experienced operators are needed to follow the chain of events, filter out noise, and finally diagnose the root cause of a problem.
The prior art implements keyword search over log content and presentation of the results: the system searches according to the keywords entered by the user and then presents the search results to the user in timestamp order.
Current log search and analysis techniques usually use the timestamp as the display order. After a keyword is entered, related logs of a given type are not shown together in the results the user sees, and logs of different types are mixed together. The user must master search techniques and filter the results manually, which greatly increases the difficulty of reviewing the results and the time spent on it.
Summary of the invention
The present invention provides a method and device for analyzing time series data, so that time series data can be displayed accurately and intuitively.
The present invention provides a method for analyzing time series data, comprising:
extracting characteristic information of time series data;
searching for time series data having the same characteristic information;
analyzing whether the time series data having the same characteristic information are produced by the same source code;
when the time series data having the same characteristic information are produced by the same source code, displaying the time series data having the same characteristic information as a cluster.
Optionally, extracting the characteristic information of the time series data comprises:
matching the time series data against a preset regular expression;
when the match succeeds, determining the preset regular expression to be the characteristic information of the time series data.
Optionally, extracting the characteristic information of the time series data comprises:
extracting, in order, the special characters in the time series data that are neither letters nor digits;
determining the special characters to be the characteristic information of the time series data.
Optionally, extracting the characteristic information of the time series data comprises:
obtaining a text feature template of the time series data;
determining the text feature template to be the characteristic information of the time series data.
Optionally, displaying the time series data having the same characteristic information as a cluster comprises:
displaying the time series data having the same characteristic information together in one place.
The present invention provides a device for analyzing time series data, comprising:
an extraction module, configured to extract characteristic information of time series data;
a search module, configured to search for time series data having the same characteristic information;
an analysis module, configured to analyze whether the time series data having the same characteristic information are produced by the same source code;
a cluster display module, configured to display the time series data having the same characteristic information as a cluster when those data are produced by the same source code.
Optionally, the extraction module comprises:
a matching submodule, configured to match the time series data against a preset regular expression;
a determination submodule, configured to determine, when the match succeeds, the preset regular expression to be the characteristic information of the time series data.
Optionally, the extraction module comprises:
an extraction submodule, configured to extract, in order, the special characters in the time series data that are neither letters nor digits;
a determination submodule, configured to determine the special characters to be the characteristic information of the time series data.
Optionally, the extraction module comprises:
an obtaining submodule, configured to obtain a text feature template of the time series data;
a determination submodule, configured to determine the text feature template to be the characteristic information of the time series data.
Optionally, the cluster display module is configured to display the time series data having the same characteristic information together in one place.
In this embodiment, time series data produced by the same source code are aggregated and displayed together, so that time series data can be displayed accurately and intuitively. The user does not need to perform complex search or filter operations, or to master regular expression syntax or other query statements; the user only needs to upload the data and enter a keyword to run a query. The system automatically clusters the search results and presents each cluster together, which makes the results easier to review and analyze.
Other features and advantages of the present invention will be set forth in the following description, and in part will become apparent from the description or be understood by practicing the invention. The objects and other advantages of the invention can be realized and obtained by the structure particularly pointed out in the written description, the claims, and the accompanying drawings.
The technical solution of the present invention is described in further detail below with reference to the drawings and embodiments.
Brief description of the drawings
The accompanying drawings provide a further understanding of the invention and form part of the description. Together with the embodiments, they serve to explain the invention and do not limit it. In the drawings:
Fig. 1 is a flowchart of a method for analyzing time series data according to an embodiment of the present invention;
Fig. 2 is a flowchart of a method for analyzing time series data according to another embodiment of the present invention;
Fig. 3 is a flowchart of a method for analyzing time series data according to another embodiment of the present invention;
Fig. 4 is a flowchart of a method for analyzing time series data according to another embodiment of the present invention;
Fig. 5 is a block diagram of a device for analyzing time series data according to another embodiment of the present invention;
Fig. 6 is a block diagram of an extraction module according to an embodiment of the present invention;
Fig. 7 is a block diagram of an extraction module according to an embodiment of the present invention;
Fig. 8 is a block diagram of an extraction module according to an embodiment of the present invention.
Detailed description of the embodiments
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described here serve only to illustrate and explain the invention and are not intended to limit it.
The embodiments of the present invention are mainly concerned with analyzing time series data. Time series data are data collected at different points in time; they reflect how the state or degree of some thing or phenomenon changes over time. For example, China's gross domestic product (GDP) from 1949 to 2009 is time series data. In the embodiments of the invention, time series data include not only logs but also all timestamped data produced by sensors, smart appliances, and transaction systems (e-commerce, banking, Internet finance) and the like.
Fig. 1 is a flowchart of a method for analyzing time series data according to an embodiment of the present invention. As shown in Fig. 1, the method comprises:
Step S11: extracting characteristic information of time series data;
Step S12: searching for time series data having the same characteristic information;
Step S13: analyzing whether the time series data having the same characteristic information are produced by the same source code;
Step S14: when the time series data having the same characteristic information are produced by the same source code, displaying the time series data having the same characteristic information as a cluster.
For example, suppose a time series record is:
01/Aug/2014:12:07:39[Error]:status code is 1
Analysis yields the following characteristic information for this record:
"[Error]:status code is %d"
Other time series records with the same characteristic information can then be found:
02/Aug/2014:12:08:40[Error]:status code is 5
03/Aug/2014:12:09:59[Error]:status code is 10
......
Analysis shows that the source code that produced these records is:
logging("[Error]:status code is %d", code)
These records are therefore assigned to one class and can be shown together in the search results, which makes them easier for the user to review.
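The example above can be sketched in code. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes the timestamp layout of the sample records, treats every run of digits in the message as a %d placeholder, and uses an identical template as a stand-in for "produced by the same logging statement". The names TIMESTAMP, feature_template and cluster_by_template are hypothetical.

```python
import re
from collections import defaultdict

# Assumed timestamp layout, taken from the sample records (dd/Mon/yyyy:hh:mm:ss).
TIMESTAMP = re.compile(r"^\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2}\s*")


def feature_template(record: str) -> str:
    """Drop the leading timestamp and mask integer values with %d."""
    message = TIMESTAMP.sub("", record)
    return re.sub(r"\d+", "%d", message)


def cluster_by_template(records):
    """Group records whose masked messages are identical."""
    clusters = defaultdict(list)
    for record in records:
        clusters[feature_template(record)].append(record)
    return dict(clusters)


records = [
    "01/Aug/2014:12:07:39[Error]:status code is 1",
    "02/Aug/2014:12:08:40[Error]:status code is 5",
    "03/Aug/2014:12:09:59[Error]:status code is 10",
]
clusters = cluster_by_template(records)
for template, members in clusters.items():
    print(template, len(members))
```

Masking only digit runs is a deliberate simplification; variable fields such as user names or paths would need additional placeholder rules.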
In this embodiment, time series data produced by the same source code are aggregated and displayed together, so that time series data can be displayed accurately and intuitively. The user does not need to perform complex search or filter operations, or to master regular expression syntax or other query statements; the user only needs to upload the data and enter a keyword to run a query. The system automatically clusters the search results and presents each cluster together, which makes the results easier to review and analyze.
Fig. 2 is a flowchart of a method for analyzing time series data according to another embodiment of the present invention. As shown in Fig. 2, step S11 optionally comprises:
Step S21: matching the time series data against a preset regular expression;
Step S22: when the match succeeds, determining the preset regular expression to be the characteristic information of the time series data.
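Steps S21 and S22 can be sketched as follows. This is only an illustration: the patent does not list any preset expressions, so the two patterns in PRESET_PATTERNS and the function name regex_feature are assumptions made for the example.

```python
import re
from typing import Optional

# Hypothetical preset expressions; the source does not specify any.
PRESET_PATTERNS = [
    r"\[Error\]:status code is \d+",
    r"\[Info\]:user \w+ logged in",
]
COMPILED = [(pattern, re.compile(pattern)) for pattern in PRESET_PATTERNS]


def regex_feature(record: str) -> Optional[str]:
    """Return the first preset expression that matches the record (S21/S22).

    The matching expression itself, not the matched text, serves as the
    characteristic information. None means no preset expression matched.
    """
    for pattern, compiled in COMPILED:
        if compiled.search(record):
            return pattern
    return None


print(regex_feature("01/Aug/2014:12:07:39[Error]:status code is 1"))
```

Records that return the same pattern string then share the same characteristic information and can be grouped in step S12.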
Fig. 3 is a flowchart of a method for analyzing time series data according to another embodiment of the present invention. As shown in Fig. 3, step S11 optionally comprises:
Step S31: extracting, in order, the special characters in the time series data that are neither letters nor digits;
Step S32: determining the special characters to be the characteristic information of the time series data.
For example, the symbols in a log record that are neither letters nor digits, such as spaces, punctuation marks, brackets, hyphens, and underscores, are extracted in order (preserving both their order and the number of times they occur) and used as the characteristic information of the time series data.
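A minimal sketch of steps S31 and S32, assuming that "letter" and "digit" follow Python's Unicode-aware str.isalnum; the function name special_char_signature is hypothetical.

```python
def special_char_signature(record: str) -> str:
    """Keep every character that is neither a letter nor a digit, in order.

    str.isalnum is Unicode-aware, so letters and digits from any script
    are skipped; spaces, punctuation and brackets are kept, including
    repeated occurrences.
    """
    return "".join(ch for ch in record if not ch.isalnum())


first = special_char_signature("01/Aug/2014:12:07:39[Error]:status code is 1")
second = special_char_signature("03/Aug/2014:12:09:59[Error]:status code is 10")
print(repr(first))
print(first == second)
```

Because digits are dropped, records that differ only in numeric values, even values of different lengths, yield the same signature.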
Fig. 4 is a flowchart of a method for analyzing time series data according to another embodiment of the present invention. As shown in Fig. 4, step S11 optionally comprises:
Step S41: obtaining a text feature template of the time series data;
Step S42: determining the text feature template to be the characteristic information of the time series data.
For example, the text feature template of a given type of time series data is extracted by means such as data mining or machine learning, and the text feature template is used as the characteristic information of the time series data.
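The patent does not say which data mining or machine learning technique produces the template. As one simple stand-in, the sketch below compares whitespace-separated tokens across records of equal token count and replaces every position that varies with a wildcard. The function name merge_into_template and the "<*>" wildcard are assumptions for illustration.

```python
def merge_into_template(records, wildcard="<*>"):
    """Derive a template by keeping tokens that agree across all records.

    Assumes the records already have their timestamps removed and split
    into the same number of whitespace-separated tokens.
    """
    token_rows = [record.split() for record in records]
    if not token_rows or len({len(row) for row in token_rows}) != 1:
        raise ValueError("records must be non-empty and have equal token counts")
    template = []
    for column in zip(*token_rows):
        # A column with a single distinct value is constant text.
        template.append(column[0] if len(set(column)) == 1 else wildcard)
    return " ".join(template)


template = merge_into_template([
    "[Error]:status code is 1",
    "[Error]:status code is 5",
    "[Error]:status code is 10",
])
print(template)
```

Real template miners handle records of differing lengths and noisy tokens; this sketch only shows the core idea of separating constant text from variable fields.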
In the options above, the characteristic information of time series data can be obtained by at least one of the three methods described. Extracting the characteristic information makes it possible to find time series data having the same characteristic information, which can then be determined to have been produced by the same source code. In this way, the time series data produced by the same source code can be analyzed more accurately and, once clustered, displayed intuitively, which makes them easier to review and analyze.
Optionally, step S14 comprises:
displaying the time series data having the same characteristic information together in one place.
In this option, the time series data having the same characteristic information, that is, the time series data produced by the same source code, are shown together in one region, which makes them easier to review and analyze.
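One way to picture this display step is to take records that arrive interleaved in timestamp order and render each cluster as one contiguous block. The sketch below assumes each record has already been paired with its characteristic information; render_clusters and the output layout are hypothetical.

```python
from itertools import groupby


def render_clusters(tagged_records):
    """Render (feature, record) pairs so that each cluster is contiguous.

    The input is assumed to be in timestamp order. Python's sort is
    stable, so records keep their time order inside each cluster.
    """
    ordered = sorted(tagged_records, key=lambda pair: pair[0])
    lines = []
    for feature, group in groupby(ordered, key=lambda pair: pair[0]):
        members = [record for _, record in group]
        lines.append(f"{feature}  [count={len(members)}]")
        lines.extend(f"    {record}" for record in members)
    return lines


tagged = [
    ("A", "a1"),
    ("B", "b1"),
    ("A", "a2"),
]
for line in render_clusters(tagged):
    print(line)
```

Sorting by feature before grouping is what turns the timestamp-ordered, mixed result list described in the background section into one region per cluster.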
Fig. 5 is a block diagram of a device for analyzing time series data according to an embodiment of the present invention. As shown in Fig. 5, the device comprises:
an extraction module 51, configured to extract characteristic information of time series data;
a search module 52, configured to search for time series data having the same characteristic information;
an analysis module 53, configured to analyze whether the time series data having the same characteristic information are produced by the same source code;
a cluster display module 54, configured to display the time series data having the same characteristic information as a cluster when those data are produced by the same source code.
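The four modules can be pictured as methods of one small class. This is a hypothetical wiring, not the device as claimed: the patent does not describe how the analysis module decides that records share source code, so the sketch assumes that the feature must appear as a literal in exactly one line of the available source text. All names are illustrative.

```python
import re
from collections import defaultdict


class TimeSeriesAnalyzer:
    """Hypothetical wiring of modules 51 to 54."""

    def __init__(self, extract, source_lines):
        self.extract = extract            # extraction module 51
        self.source_lines = source_lines  # source text consulted by module 53

    def search(self, records):
        """Search module 52: group records whose features are equal."""
        groups = defaultdict(list)
        for record in records:
            groups[self.extract(record)].append(record)
        return groups

    def same_source(self, feature):
        """Analysis module 53 (assumed rule): exactly one source line
        contains the feature as a literal."""
        return sum(feature in line for line in self.source_lines) == 1

    def clusters(self, records):
        """Cluster display module 54: keep groups that pass the analysis."""
        return {
            feature: members
            for feature, members in self.search(records).items()
            if self.same_source(feature)
        }


def masked_message(record):
    # Assumed layout: a 20-character timestamp followed by the message.
    return re.sub(r"\d+", "%d", record[20:])


analyzer = TimeSeriesAnalyzer(
    extract=masked_message,
    source_lines=['logging("[Error]:status code is %d", code)'],
)
result = analyzer.clusters([
    "01/Aug/2014:12:07:39[Error]:status code is 1",
    "02/Aug/2014:12:08:40[Error]:status code is 5",
    "03/Aug/2014:12:10:00[Warn]:disk usage 91",
])
print(result)
```

Passing the extraction function in as a parameter mirrors the optional submodules of Figs. 6 to 8: any of the three extraction approaches could be plugged in without changing the other modules.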
Fig. 6 is a block diagram of an extraction module according to an embodiment of the present invention. As shown in Fig. 6, the extraction module 51 optionally comprises:
a matching submodule 61, configured to match the time series data against a preset regular expression;
a determination submodule 62, configured to determine, when the match succeeds, the preset regular expression to be the characteristic information of the time series data.
Fig. 7 is a block diagram of an extraction module according to an embodiment of the present invention. As shown in Fig. 7, the extraction module optionally comprises:
an extraction submodule 71, configured to extract, in order, the special characters in the time series data that are neither letters nor digits;
a determination submodule 72, configured to determine the special characters to be the characteristic information of the time series data.
Fig. 8 is a block diagram of an extraction module according to an embodiment of the present invention. As shown in Fig. 8, the extraction module optionally comprises:
an obtaining submodule 81, configured to obtain a text feature template of the time series data;
a determination submodule 82, configured to determine the text feature template to be the characteristic information of the time series data.
Optionally, the cluster display module 54 is configured to display the time series data having the same characteristic information together in one place.
For the device in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the method embodiments and is not elaborated here.
Those skilled in the art will appreciate that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. Thus, if these changes and modifications fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to include them as well.

Claims (10)

1. A method for analyzing time series data, characterized by comprising:
extracting characteristic information of time series data;
searching for time series data having the same characteristic information;
analyzing whether the time series data having the same characteristic information are produced by the same source code;
when the time series data having the same characteristic information are produced by the same source code, displaying the time series data having the same characteristic information as a cluster.
2. The method of claim 1, characterized in that extracting the characteristic information of the time series data comprises:
matching the time series data against a preset regular expression;
when the match succeeds, determining the preset regular expression to be the characteristic information of the time series data.
3. The method of claim 1, characterized in that extracting the characteristic information of the time series data comprises:
extracting, in order, the special characters in the time series data that are neither letters nor digits;
determining the special characters to be the characteristic information of the time series data.
4. The method of claim 1, characterized in that extracting the characteristic information of the time series data comprises:
obtaining a text feature template of the time series data;
determining the text feature template to be the characteristic information of the time series data.
5. The method of claim 1, characterized in that displaying the time series data having the same characteristic information as a cluster comprises:
displaying the time series data having the same characteristic information together in one place.
6. A device for analyzing time series data, characterized by comprising:
an extraction module, configured to extract characteristic information of time series data;
a search module, configured to search for time series data having the same characteristic information;
an analysis module, configured to analyze whether the time series data having the same characteristic information are produced by the same source code;
a cluster display module, configured to display the time series data having the same characteristic information as a cluster when those data are produced by the same source code.
7. The device of claim 6, characterized in that the extraction module comprises:
a matching submodule, configured to match the time series data against a preset regular expression;
a determination submodule, configured to determine, when the match succeeds, the preset regular expression to be the characteristic information of the time series data.
8. The device of claim 6, characterized in that the extraction module comprises:
an extraction submodule, configured to extract, in order, the special characters in the time series data that are neither letters nor digits;
a determination submodule, configured to determine the special characters to be the characteristic information of the time series data.
9. The device of claim 6, characterized in that the extraction module comprises:
an obtaining submodule, configured to obtain a text feature template of the time series data;
a determination submodule, configured to determine the text feature template to be the characteristic information of the time series data.
10. The device of claim 6, characterized in that the cluster display module is configured to display the time series data having the same characteristic information together in one place.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410447046.8A CN104239477A (en) 2014-09-03 2014-09-03 Method and device for analyzing time series data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410447046.8A CN104239477A (en) 2014-09-03 2014-09-03 Method and device for analyzing time series data

Publications (1)

Publication Number Publication Date
CN104239477A (en) 2014-12-24

Family

ID=52227536

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410447046.8A Pending CN104239477A (en) 2014-09-03 2014-09-03 Method and device for analyzing time series data

Country Status (1)

Country Link
CN (1) CN104239477A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110275489A (en) * 2018-03-13 2019-09-24 发那科株式会社 Data time series analysis device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040103116A1 (en) * 2002-11-26 2004-05-27 Lingathurai Palanisamy Intelligent retrieval and classification of information from a product manual
CN101641674A (en) * 2006-10-05 2010-02-03 斯普兰克公司 Time series search engine

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
梁晓雪 等: "基于聚类的日志分析技术综述与展望", 《云南大学学报》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110275489A (en) * 2018-03-13 2019-09-24 发那科株式会社 Data time series analysis device
CN110275489B (en) * 2018-03-13 2021-04-20 发那科株式会社 Time series data analysis device

Similar Documents

Publication Publication Date Title
US11907244B2 (en) Modifying field definitions to include post-processing instructions
Brehmer et al. Overview: The design, adoption, and analysis of a visual document mining tool for investigative journalists
CN108829858B (en) Data query method and device and computer readable storage medium
US11941016B2 (en) Using specified performance attributes to configure machine learning pipepline stages for an ETL job
US11947550B1 (en) Fast ad-hoc filtering of time series analytics
CN110532019B (en) Method for tracing history of software code segment
CN105187242B (en) A kind of user's anomaly detection method excavated based on variable-length pattern
CN113688288B (en) Data association analysis method, device, computer equipment and storage medium
CN110543571A (en) knowledge graph construction method and device for water conservancy informatization
CN108304382B (en) Quality analysis method and system based on text data mining in manufacturing process
CN102521316A (en) Pattern matching framework for log analysis
CN105354325A (en) Document retrieval and analysis system
JP7375861B2 (en) Related score calculation systems, methods and programs
CN105550375A (en) Heterogeneous data integrating method and system
US20150269138A1 (en) Publication Scope Visualization and Analysis
CN103500158A (en) Method and device for annotating electronic document
CN105302730A (en) Calculation model detection method, testing server and service platform
CN110990445A (en) Data processing method, device, equipment and medium
CN103324407B (en) Information processing unit and information processing method
CN112634004B (en) Method and system for analyzing blood-cause atlas of credit investigation data
Shrivastava et al. Implementation of Apriori algorithm using WEKA
CN104240107B (en) Community data screening system and method thereof
CN103092987A (en) Fast document retrieval method and device
CN116484084B (en) Metadata blood-margin analysis method, medium and system based on application information mining
KR102345410B1 (en) Big data intelligent collecting method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20141224