CN104239477A - Method and device for analyzing time series data - Google Patents
- Publication number
- CN104239477A (application number CN201410447046.8A)
- Authority
- CN
- China
- Prior art keywords
- time series
- series data
- characteristic information
- data
- extraction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Abstract
The invention discloses a method and a device for analyzing time series data, used to display time series data accurately and intuitively. The method comprises the following steps: extracting characteristic information of time series data; searching for time series data having the same characteristic information; analyzing whether the time series data having the same characteristic information were generated by the same source code; and, if so, displaying the time series data having the same characteristic information in clustered form.
Description
Technical field
The present invention relates to the field of Internet technology, and in particular to a method and a device for analyzing time series data.
Background art
With the rapid development of information technology, people generate large amounts of digital information in all kinds of social and economic activity. Enterprise IT infrastructure keeps growing in scale, and IT monitoring and operations systems are widely deployed. At the same time, the data generated by sensors and smart appliances and the logs produced by transaction systems (securities trading systems, e-commerce transaction systems) are enormous in volume and vary in format, which makes them hard to use.
Viewing large volumes of log information is a further problem. As logs grow in volume and variety, log data exceeds human cognitive capacity, and manual effort alone cannot keep up with the rate at which machines produce data. Analyzing log content and tracing potential problems becomes increasingly difficult. In particular, once multiple logs must be correlated, experienced operators are needed to follow the chain of events, filter out noise, and finally diagnose the root cause of a problem.
Existing technology implements keyword search over log content and presentation of the results: the system searches according to the keywords entered by the user and then presents the search results to the user in timestamp order.
Current log search and analysis techniques usually present logs in timestamp order. After a keyword is entered, related logs of a given type are not shown together in the results the user sees, and logs of different types are mixed together. The user must master search techniques to filter the results, which greatly increases both the difficulty of viewing them and the time spent.
Summary of the invention
The invention provides a method and a device for analyzing time series data, so that time series data can be displayed accurately and intuitively.
The invention provides a method for analyzing time series data, comprising:
Extracting characteristic information of time series data;
Searching for time series data having the same characteristic information;
Analyzing whether the time series data having the same characteristic information were produced by the same source code;
When the time series data having the same characteristic information were produced by the same source code, displaying the time series data having the same characteristic information as a cluster.
Optionally, extracting the characteristic information of the time series data comprises:
Matching the time series data against a preset regular expression;
When the match succeeds, determining the preset regular expression to be the characteristic information of the time series data.
Optionally, extracting the characteristic information of the time series data comprises:
Extracting, in order, the special characters in the time series data that are neither letters nor digits;
Determining the special characters to be the characteristic information of the time series data.
Optionally, extracting the characteristic information of the time series data comprises:
Obtaining a text feature template of the time series data;
Determining the text feature template to be the characteristic information of the time series data.
Optionally, displaying the time series data having the same characteristic information as a cluster comprises:
Displaying the time series data having the same characteristic information together.
The invention also provides a device for analyzing time series data, comprising:
An extraction module, configured to extract characteristic information of time series data;
A search module, configured to search for time series data having the same characteristic information;
An analysis module, configured to analyze whether the time series data having the same characteristic information were produced by the same source code;
A cluster display module, configured to display the time series data having the same characteristic information as a cluster when those data were produced by the same source code.
Optionally, the extraction module comprises:
A matching submodule, configured to match the time series data against a preset regular expression;
A determination submodule, configured to determine, when the match succeeds, the preset regular expression to be the characteristic information of the time series data.
Optionally, the extraction module comprises:
An extraction submodule, configured to extract, in order, the special characters in the time series data that are neither letters nor digits;
A determination submodule, configured to determine the special characters to be the characteristic information of the time series data.
Optionally, the extraction module comprises:
An obtaining submodule, configured to obtain a text feature template of the time series data;
A determination submodule, configured to determine the text feature template to be the characteristic information of the time series data.
Optionally, the cluster display module is configured to display the time series data having the same characteristic information together.
In this embodiment, time series data produced by the same source code are aggregated and displayed together, so that time series data can be displayed accurately and intuitively. The user does not need to perform complicated search or filter operations, nor to master regular expression syntax or other query statements; the user only needs to upload the data and enter keywords to query. The system automatically clusters the search results and presents each cluster together, which makes the results easy to view and analyze.
Other features and advantages of the present invention will be set forth in the following description and will in part become apparent from the description, or may be learned by practicing the invention. The objects and other advantages of the invention may be realized and obtained by means of the structure particularly pointed out in the written description, the claims, and the accompanying drawings.
The technical solution of the present invention is described in further detail below with reference to the drawings and embodiments.
Brief description of the drawings
The accompanying drawings provide a further understanding of the invention and form part of the description. Together with the embodiments of the invention they serve to explain the invention and do not limit it. In the drawings:
Fig. 1 is a flowchart of a method for analyzing time series data according to an embodiment of the invention;
Fig. 2 is a flowchart of a method for analyzing time series data according to another embodiment of the invention;
Fig. 3 is a flowchart of a method for analyzing time series data according to another embodiment of the invention;
Fig. 4 is a flowchart of a method for analyzing time series data according to another embodiment of the invention;
Fig. 5 is a block diagram of a device for analyzing time series data according to another embodiment of the invention;
Fig. 6 is a block diagram of an extraction module according to an embodiment of the invention;
Fig. 7 is a block diagram of an extraction module according to an embodiment of the invention;
Fig. 8 is a block diagram of an extraction module according to an embodiment of the invention.
Detailed description
Preferred embodiments of the invention are described below with reference to the drawings. It should be understood that the preferred embodiments described here serve only to illustrate and explain the invention and are not intended to limit it.
The embodiments of the invention mainly analyze time series data. Time series data are data collected at different points in time; they reflect how the state or degree of some thing or phenomenon changes over time. For example, the change in China's gross domestic product (GDP) from 1949 to 2009 is time series data. In the embodiments of the invention, time series data include not only logs but also all timestamped data produced by sensors, smart appliances, and transaction systems (e-commerce, banking, Internet finance).
Fig. 1 is a flowchart of a method for analyzing time series data according to an embodiment of the invention. As shown in Fig. 1, the method comprises:
Step S11, extracting characteristic information of time series data;
Step S12, searching for time series data having the same characteristic information;
Step S13, analyzing whether the time series data having the same characteristic information were produced by the same source code;
Step S14, when the time series data having the same characteristic information were produced by the same source code, displaying the time series data having the same characteristic information as a cluster.
For example, suppose a piece of time series data is:
01/Aug/2014:12:07:39 [Error]:status code is 1
Analysis yields the following characteristic information for this time series data:
"[Error]:status code is %d"
Other time series data with the same characteristic information can then be found:
02/Aug/2014:12:08:40 [Error]:status code is 5
03/Aug/2014:12:09:59 [Error]:status code is 10
......
Analysis shows that the source code producing these time series data is:
logging("[Error]:status code is %d", code)
These time series data are therefore assigned to one class and can be displayed together in the search results, which makes them easy for the user to view.
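The example above can be sketched in Python. The patent does not say how step S13 decides that records come from the same source code; the sketch below assumes, purely for illustration, that a list of printf-style format strings taken from the logging calls is available and converts each one into a regular expression. The names `format_to_regex`, `group_by_source`, and `SOURCE_FORMATS`, and the second format string, are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical list of format strings found in logging calls in the source code.
SOURCE_FORMATS = [
    "[Error]:status code is %d",
    "[Info]:user %s logged in",
]

def format_to_regex(fmt):
    """Turn a printf-style format string (%d and %s only) into a compiled regex."""
    pieces = []
    for part in re.split(r"(%d|%s)", fmt):
        if part == "%d":
            pieces.append(r"-?\d+")      # an integer argument
        elif part == "%s":
            pieces.append(r".+?")        # any non-empty string argument
        else:
            pieces.append(re.escape(part))  # literal text from the format string
    # Anchor at the end of the line; the timestamp prefix is left unmatched.
    return re.compile("".join(pieces) + r"$")

def group_by_source(lines, formats=SOURCE_FORMATS):
    """Assign each line to the first format string that could have produced it."""
    compiled = [(fmt, format_to_regex(fmt)) for fmt in formats]
    clusters = defaultdict(list)
    for line in lines:
        for fmt, pattern in compiled:
            if pattern.search(line):
                clusters[fmt].append(line)
                break
    return dict(clusters)

lines = [
    "01/Aug/2014:12:07:39 [Error]:status code is 1",
    "01/Aug/2014:12:08:01 [Info]:user alice logged in",
    "02/Aug/2014:12:08:40 [Error]:status code is 5",
    "03/Aug/2014:12:09:59 [Error]:status code is 10",
]
clusters = group_by_source(lines)
```

With this input, the three `[Error]` records fall under one format string and the `[Info]` record under another, which matches the grouping described in the example.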
In this embodiment, time series data produced by the same source code are aggregated and displayed together, so that time series data can be displayed accurately and intuitively. The user does not need to perform complicated search or filter operations, nor to master regular expression syntax or other query statements; the user only needs to upload the data and enter keywords to query. The system automatically clusters the search results and presents each cluster together, which makes the results easy to view and analyze.
Fig. 2 is a flowchart of a method for analyzing time series data according to another embodiment of the invention. As shown in Fig. 2, step S11 optionally comprises:
Step S21, matching the time series data against a preset regular expression;
Step S22, when the match succeeds, determining the preset regular expression to be the characteristic information of the time series data.
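Steps S21 and S22 can be illustrated with a minimal Python sketch. The preset patterns and the function name `extract_feature_by_regex` are assumptions made for this example; the patent does not list any concrete regular expressions.

```python
import re

# Hypothetical preset regular expressions, one per known record type.
PRESET_PATTERNS = [
    re.compile(r"\[Error\]:status code is \d+$"),
    re.compile(r"\[Info\]:user \S+ logged in$"),
]

def extract_feature_by_regex(line, patterns=PRESET_PATTERNS):
    """Return the text of the first preset pattern that matches, else None.

    The matching pattern itself serves as the characteristic information,
    so two records are "the same" when they match the same preset pattern.
    """
    for pattern in patterns:
        if pattern.search(line):
            return pattern.pattern
    return None

feature_a = extract_feature_by_regex("01/Aug/2014:12:07:39 [Error]:status code is 1")
feature_b = extract_feature_by_regex("02/Aug/2014:12:08:40 [Error]:status code is 5")
feature_c = extract_feature_by_regex("02/Aug/2014:12:08:41 disk usage 93 percent")
```

Here `feature_a` and `feature_b` are the same pattern string, while `feature_c` is `None` because no preset pattern matches; how unmatched records are handled is left open in the source.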
Fig. 3 is a flowchart of a method for analyzing time series data according to another embodiment of the invention. As shown in Fig. 3, step S11 optionally comprises:
Step S31, extracting, in order, the special characters in the time series data that are neither letters nor digits;
Step S32, determining the special characters to be the characteristic information of the time series data.
For example, the non-letter, non-digit symbols in the log, such as spaces, punctuation marks, brackets, hyphens, and underscores, are extracted in order (preserving their order and number of occurrences) and used as the characteristic information of the time series data.
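A minimal Python sketch of steps S31 and S32 follows. The function name `special_char_signature` is hypothetical, and `str.isalnum` is used as the test for "letter or digit", which is Unicode-aware and therefore an assumption about how non-ASCII text should be treated.

```python
def special_char_signature(line):
    """Keep only the characters that are neither letters nor digits, in order.

    Order and number of occurrences are preserved, so the result acts as a
    structural fingerprint of the record.
    """
    return "".join(ch for ch in line if not ch.isalnum())

sig_1 = special_char_signature("01/Aug/2014:12:07:39 [Error]:status code is 1")
sig_2 = special_char_signature("03/Aug/2014:12:09:59 [Error]:status code is 10")
sig_3 = special_char_signature("01/Aug/2014:12:07:39 user=alice action=login")
```

The first two records produce the same signature even though the status code has a different number of digits, while the third record, which has a different layout, produces a different one.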
Fig. 4 is a flowchart of a method for analyzing time series data according to another embodiment of the invention. As shown in Fig. 4, step S11 optionally comprises:
Step S41, obtaining a text feature template of the time series data;
Step S42, determining the text feature template to be the characteristic information of the time series data.
For example, a text feature template for a given type of time series data is extracted by means such as data mining or machine learning, and the text feature template is used as the characteristic information of the time series data.
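The patent does not specify which data mining or machine learning technique produces the template. As a simple stand-in, and not the patent's method, the sketch below compares whitespace-separated tokens position by position across a set of records and replaces every position that varies with a wildcard. The function name `mine_template` and the wildcard marker `<*>` are assumptions.

```python
def mine_template(lines, wildcard="<*>"):
    """Derive a token template from records that share the same token count.

    Positions where every record has the same token keep that token;
    positions where the tokens differ become the wildcard.
    """
    token_rows = [line.split() for line in lines]
    if not token_rows:
        return ""
    length = len(token_rows[0])
    if any(len(row) != length for row in token_rows):
        raise ValueError("all records must have the same number of tokens")
    template = []
    for column in zip(*token_rows):
        template.append(column[0] if len(set(column)) == 1 else wildcard)
    return " ".join(template)

template = mine_template([
    "01/Aug/2014:12:07:39 [Error]:status code is 1",
    "02/Aug/2014:12:08:40 [Error]:status code is 5",
    "03/Aug/2014:12:09:59 [Error]:status code is 10",
])
```

For these three records the timestamp and the status code vary, so the resulting template is `<*> [Error]:status code is <*>`. A production system would also need to handle records with differing token counts, which this sketch rejects.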
In the optional solutions above, the characteristic information of the time series data can be obtained by at least one of the three methods described. Extracting the characteristic information makes it possible to find time series data having the same characteristic information and to determine that such data were produced by the same source code. In this way, time series data produced by the same source code can be identified more accurately, clustered, and then displayed intuitively, which makes them easy for the user to view and analyze.
Optionally, step S14 comprises:
Displaying the time series data having the same characteristic information together.
In this optional solution, time series data having the same characteristic information, that is, time series data produced by the same source code, are displayed together in one region, which makes them easy for the user to view and analyze.
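The difference between timestamp-ordered output and clustered display can be shown with a short Python sketch. The text rendering, the function names `numeric_template` and `render_clusters`, and the choice of replacing digit runs with `%d` as the feature are all assumptions for illustration.

```python
import re
from collections import defaultdict

def numeric_template(line):
    """Hypothetical feature: drop the timestamp, replace digit runs with %d."""
    message = line.split(" ", 1)[1]
    return re.sub(r"\d+", "%d", message)

def render_clusters(records, feature_of):
    """Group records by feature and render each group as one contiguous block."""
    clusters = defaultdict(list)
    for record in records:
        clusters[feature_of(record)].append(record)
    out = []
    for feature, members in clusters.items():
        out.append(f"== {feature} ({len(members)}) ==")
        out.extend(f"  {member}" for member in members)
    return "\n".join(out)

records = [
    "01/Aug/2014:12:07:39 [Error]:status code is 1",
    "01/Aug/2014:12:07:45 [Info]:cache size is 20",
    "02/Aug/2014:12:08:40 [Error]:status code is 5",
]
rendered = render_clusters(records, numeric_template)
print(rendered)
```

Although the `[Info]` record falls between the two `[Error]` records in time, the rendered output places the two `[Error]` records in one block and the `[Info]` record in another, with timestamp order preserved inside each block.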
Fig. 5 is a block diagram of a device for analyzing time series data according to an embodiment of the invention. As shown in Fig. 5, the device comprises:
An extraction module 51, configured to extract characteristic information of time series data;
A search module 52, configured to search for time series data having the same characteristic information;
An analysis module 53, configured to analyze whether the time series data having the same characteristic information were produced by the same source code;
A cluster display module 54, configured to display the time series data having the same characteristic information as a cluster when those data were produced by the same source code.
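One way to picture how modules 51 to 54 fit together is the Python skeleton below. It is a sketch under stated assumptions, not the patent's implementation: the class name `TimeSeriesAnalysisDevice`, the pluggable `extract` and `same_source` callables, the helper `digits_to_placeholder`, and the `KNOWN_FORMATS` set are all hypothetical.

```python
import re
from collections import defaultdict

class TimeSeriesAnalysisDevice:
    """Skeleton wiring together the four modules of Fig. 5."""

    def __init__(self, extract, same_source):
        self.extract = extract          # extraction module 51
        self.same_source = same_source  # decision rule used by analysis module 53

    def search(self, records):
        """Search module 52: collect records that share characteristic information."""
        groups = defaultdict(list)
        for record in records:
            groups[self.extract(record)].append(record)
        return dict(groups)

    def analyze(self, groups):
        """Analysis module 53: keep groups judged to come from the same source code."""
        return {f: m for f, m in groups.items() if self.same_source(f, m)}

    def display(self, clusters):
        """Cluster display module 54: return each cluster as one unit for display."""
        return [(feature, list(members)) for feature, members in clusters.items()]

    def run(self, records):
        return self.display(self.analyze(self.search(records)))

def digits_to_placeholder(record):
    """Hypothetical extractor: drop the timestamp, replace digit runs with %d."""
    return re.sub(r"\d+", "%d", record.split(" ", 1)[1])

# Hypothetical set of format strings known to appear in the source code.
KNOWN_FORMATS = {"[Error]:status code is %d"}

device = TimeSeriesAnalysisDevice(
    extract=digits_to_placeholder,
    same_source=lambda feature, members: feature in KNOWN_FORMATS,
)
result = device.run([
    "01/Aug/2014:12:07:39 [Error]:status code is 1",
    "01/Aug/2014:12:07:45 [Info]:cache size is 20",
    "02/Aug/2014:12:08:40 [Error]:status code is 5",
])
```

In this sketch, groups that the analysis step does not confirm are simply left out of the clustered output; the source does not say how such records should be presented.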
Fig. 6 is a block diagram of an extraction module according to an embodiment of the invention. As shown in Fig. 6, the extraction module 51 optionally comprises:
A matching submodule 61, configured to match the time series data against a preset regular expression;
A determination submodule 62, configured to determine, when the match succeeds, the preset regular expression to be the characteristic information of the time series data.
Fig. 7 is a block diagram of an extraction module according to an embodiment of the invention. As shown in Fig. 7, the extraction module optionally comprises:
An extraction submodule 71, configured to extract, in order, the special characters in the time series data that are neither letters nor digits;
A determination submodule 72, configured to determine the special characters to be the characteristic information of the time series data.
Fig. 8 is a block diagram of an extraction module according to an embodiment of the invention. As shown in Fig. 8, the extraction module optionally comprises:
An obtaining submodule 81, configured to obtain a text feature template of the time series data;
A determination submodule 82, configured to determine the text feature template to be the characteristic information of the time series data.
Optionally, the cluster display module 54 is configured to display the time series data having the same characteristic information together.
As for the device in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the method embodiments and is not elaborated here.
In this embodiment, time series data produced by the same source code are aggregated and displayed together, so that time series data can be displayed accurately and intuitively. The user does not need to perform complicated search or filter operations, nor to master regular expression syntax or other query statements; the user only needs to upload the data and enter keywords to query. The system automatically clusters the search results and presents each cluster together, which makes the results easy to view and analyze.
Those skilled in the art should understand that embodiments of the invention may be provided as a method, a system, or a computer program product. Accordingly, the invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage and optical storage) containing computer-usable program code.
The invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. Thus, if these changes and modifications fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to include them as well.
Claims (10)
1. A method for analyzing time series data, characterized by comprising:
Extracting characteristic information of time series data;
Searching for time series data having the same characteristic information;
Analyzing whether the time series data having the same characteristic information were produced by the same source code;
When the time series data having the same characteristic information were produced by the same source code, displaying the time series data having the same characteristic information as a cluster.
2. The method of claim 1, characterized in that extracting the characteristic information of the time series data comprises:
Matching the time series data against a preset regular expression;
When the match succeeds, determining the preset regular expression to be the characteristic information of the time series data.
3. The method of claim 1, characterized in that extracting the characteristic information of the time series data comprises:
Extracting, in order, the special characters in the time series data that are neither letters nor digits;
Determining the special characters to be the characteristic information of the time series data.
4. The method of claim 1, characterized in that extracting the characteristic information of the time series data comprises:
Obtaining a text feature template of the time series data;
Determining the text feature template to be the characteristic information of the time series data.
5. The method of claim 1, characterized in that displaying the time series data having the same characteristic information as a cluster comprises:
Displaying the time series data having the same characteristic information together.
6. A device for analyzing time series data, characterized by comprising:
An extraction module, configured to extract characteristic information of time series data;
A search module, configured to search for time series data having the same characteristic information;
An analysis module, configured to analyze whether the time series data having the same characteristic information were produced by the same source code;
A cluster display module, configured to display the time series data having the same characteristic information as a cluster when those data were produced by the same source code.
7. The device of claim 6, characterized in that the extraction module comprises:
A matching submodule, configured to match the time series data against a preset regular expression;
A determination submodule, configured to determine, when the match succeeds, the preset regular expression to be the characteristic information of the time series data.
8. The device of claim 6, characterized in that the extraction module comprises:
An extraction submodule, configured to extract, in order, the special characters in the time series data that are neither letters nor digits;
A determination submodule, configured to determine the special characters to be the characteristic information of the time series data.
9. The device of claim 6, characterized in that the extraction module comprises:
An obtaining submodule, configured to obtain a text feature template of the time series data;
A determination submodule, configured to determine the text feature template to be the characteristic information of the time series data.
10. The device of claim 6, characterized in that the cluster display module is configured to display the time series data having the same characteristic information together.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410447046.8A CN104239477A (en) | 2014-09-03 | 2014-09-03 | Method and device for analyzing time series data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410447046.8A CN104239477A (en) | 2014-09-03 | 2014-09-03 | Method and device for analyzing time series data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104239477A true CN104239477A (en) | 2014-12-24 |
Family
ID=52227536
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410447046.8A Pending CN104239477A (en) | 2014-09-03 | 2014-09-03 | Method and device for analyzing time series data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104239477A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110275489A (en) * | 2018-03-13 | 2019-09-24 | 发那科株式会社 | Data time series analysis device |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040103116A1 (en) * | 2002-11-26 | 2004-05-27 | Lingathurai Palanisamy | Intelligent retrieval and classification of information from a product manual |
CN101641674A (en) * | 2006-10-05 | 2010-02-03 | 斯普兰克公司 | Time series search engine |
Non-Patent Citations (1)
Title |
---|
梁晓雪 等: "基于聚类的日志分析技术综述与展望", 《云南大学学报》 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110275489A (en) * | 2018-03-13 | 2019-09-24 | 发那科株式会社 | Data time series analysis device |
CN110275489B (en) * | 2018-03-13 | 2021-04-20 | 发那科株式会社 | Time series data analysis device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11907244B2 (en) | Modifying field definitions to include post-processing instructions | |
Brehmer et al. | Overview: The design, adoption, and analysis of a visual document mining tool for investigative journalists | |
CN108829858B (en) | Data query method and device and computer readable storage medium | |
US11941016B2 (en) | Using specified performance attributes to configure machine learning pipepline stages for an ETL job | |
US11947550B1 (en) | Fast ad-hoc filtering of time series analytics | |
CN110532019B (en) | Method for tracing history of software code segment | |
CN105187242B (en) | A kind of user's anomaly detection method excavated based on variable-length pattern | |
CN113688288B (en) | Data association analysis method, device, computer equipment and storage medium | |
CN110543571A (en) | knowledge graph construction method and device for water conservancy informatization | |
CN108304382B (en) | Quality analysis method and system based on text data mining in manufacturing process | |
CN102521316A (en) | Pattern matching framework for log analysis | |
CN105354325A (en) | Document retrieval and analysis system | |
JP7375861B2 (en) | Related score calculation systems, methods and programs | |
CN105550375A (en) | Heterogeneous data integrating method and system | |
US20150269138A1 (en) | Publication Scope Visualization and Analysis | |
CN103500158A (en) | Method and device for annotating electronic document | |
CN105302730A (en) | Calculation model detection method, testing server and service platform | |
CN110990445A (en) | Data processing method, device, equipment and medium | |
CN103324407B (en) | Information processing unit and information processing method | |
CN112634004B (en) | Method and system for analyzing blood-cause atlas of credit investigation data | |
Shrivastava et al. | Implementation of Apriori algorithm using WEKA | |
CN104240107B (en) | Community data screening system and method thereof | |
CN103092987A (en) | Fast document retrieval method and device | |
CN116484084B (en) | Metadata blood-margin analysis method, medium and system based on application information mining | |
KR102345410B1 (en) | Big data intelligent collecting method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20141224 |