CN102982012A - Method and device used for obtaining target character strings in disorder text - Google Patents


Info

Publication number
CN102982012A
Authority
CN
China
Prior art keywords
text
character
sequence text
sequence
string
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2011102644476A
Other languages
Chinese (zh)
Other versions
CN102982012B (en)
Inventor
李彦宏
舒迅
方勇
王波
徐文涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201110264447.6A priority Critical patent/CN102982012B/en
Publication of CN102982012A publication Critical patent/CN102982012A/en
Application granted granted Critical
Publication of CN102982012B publication Critical patent/CN102982012B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention aims to provide a method and device for obtaining target character strings in disorder (out-of-sequence) text. A disorder text processing device obtains the disorder text to be processed and performs permutation and combination on the characters in the disorder text to obtain one or more character sequences corresponding to the disorder text. According to the one or more character sequences, a matching query is carried out in a target pattern bank to obtain the target character strings in the disorder text. Compared with the prior art, because the characters in the disorder text are permuted and combined and the matching query is carried out in the target pattern bank on the result, the target character strings containing prohibited information in the disorder text are obtained and the prohibited information is effectively identified, which enhances the ability of system applications to filter prohibited information.

Description

Method and device for obtaining a target string in out-of-sequence text
Technical field
The present invention relates to the field of computer technology, and in particular to a method and device for obtaining a target string in out-of-sequence text.
Background
In online forums, users often embed information that the forum prohibits, such as advertising, pornography, violence or illegal content, in out-of-sequence text such as vertical text or diagonal text, and successfully submit that text to the forum, thereby publishing the prohibited information. This succeeds because the prior art can mainly identify and filter prohibited information in normally ordered text, but cannot effectively identify prohibited information in out-of-sequence text.
Therefore, how to effectively identify the target strings that contain such prohibited information in out-of-sequence text has become a problem in urgent need of a solution.
Summary of the invention
The purpose of the present invention is to provide a method and device for obtaining a target string in out-of-sequence text.
According to one aspect of the present invention, a method for obtaining a target string in out-of-sequence text is provided, wherein the method comprises the following steps:
a. obtaining the out-of-sequence text to be processed;
b. performing permutation and combination on the characters in the out-of-sequence text to obtain one or more character strings corresponding to the out-of-sequence text;
c. performing a matching query in a target pattern bank according to the one or more character strings, to obtain the target string in the out-of-sequence text.
According to another aspect of the present invention, a device for obtaining a target string in out-of-sequence text is provided, wherein the device comprises:
a text acquisition device, for obtaining the out-of-sequence text to be processed;
a permutation and combination device, for performing permutation and combination on the characters in the out-of-sequence text to obtain one or more character strings corresponding to the out-of-sequence text;
a matching query device, for performing a matching query in a target pattern bank according to the one or more character strings, to obtain the target string in the out-of-sequence text.
Compared with the prior art, the present invention performs permutation and combination on the characters in the out-of-sequence text and runs a matching query on the result in the target pattern bank to obtain the target strings that contain prohibited information. Prohibited information in out-of-sequence text is thereby effectively identified, which strengthens the ability of system applications to filter prohibited information.
Description of drawings
Other features, objects and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments, made with reference to the accompanying drawings:
Fig. 1 is a schematic diagram of a device for obtaining a target string in out-of-sequence text according to one aspect of the present invention;
Fig. 2 is a schematic diagram of a device for obtaining a target string in out-of-sequence text according to a preferred embodiment of the present invention;
Fig. 3 is a flow chart of a method for obtaining a target string in out-of-sequence text according to another aspect of the present invention;
Fig. 4 is a flow chart of a method for obtaining a target string in out-of-sequence text according to a preferred embodiment of the present invention.
The same or similar reference numerals in the drawings denote the same or similar components.
Embodiment
The present invention is described in further detail below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of a device for obtaining a target string in out-of-sequence text according to one aspect of the present invention. The out-of-sequence text processing device 1 comprises a text acquisition device 11, a permutation and combination device 12 and a matching query device 13. Here, the out-of-sequence text processing device 1 includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud formed by multiple servers. Here, a cloud consists of a large number of computers or network servers based on cloud computing, where cloud computing is a kind of distributed computing: a super virtual computer composed of a group of loosely coupled computers.
Specifically, the text acquisition device 11 obtains the out-of-sequence text to be processed. More specifically, the text acquisition device 11 obtains the out-of-sequence text periodically or in real time in response to an event trigger, for example by listening in real time for the submit requests that users send from their user equipment, so as to obtain the out-of-sequence text entered by a user, or by periodically reading the out-of-sequence text directly from a third-party device over an agreed communication mode. Here, "out-of-sequence text" means text that is not written in the order in which people normally read, but that follows an out-of-sequence rule people can still recognize; it includes, but is not limited to, vertical text, diagonal text and S-shaped text. For example, suppose the out-of-sequence text processing device 1 is an online forum server. A user enters a piece of vertical text in the input interface of a forum web page; the user equipment packages it as a forum post into an HTTP request and submits it over the HTTP protocol to the text acquisition device 11 of the out-of-sequence text processing device 1; the text acquisition device 11, which listens for user messages in real time, receives and parses the HTTP request and obtains the vertical text it contains. As another example, the text acquisition device 11 periodically calls a configured application programming interface (API) to send a third-party device a request for out-of-sequence text, and receives the vertical-text document that the third-party device returns in response. Those skilled in the art will understand that the above ways of obtaining out-of-sequence text are merely examples; other existing or future ways of obtaining out-of-sequence text, where applicable to the present invention, also fall within its scope of protection and are incorporated herein by reference.
Subsequently, the permutation and combination device 12 performs permutation and combination on the characters in the out-of-sequence text to obtain one or more character strings corresponding to the out-of-sequence text. Specifically, working on the out-of-sequence text provided by the text acquisition device 11, the permutation and combination device 12 may, for example, assign an index to each character of the text in turn and permute and combine those indexes to obtain the one or more character strings. Alternatively, it may generate a character matrix with the same numbers of rows and columns as the out-of-sequence text, map each character of the text to the element at the corresponding position of the matrix, and realize the permutation and combination by performing matrix operations on the character matrix, thereby obtaining the one or more character strings. For example, suppose the text acquisition device 11 provides the following out-of-sequence text, in which each English word stands for one character of the original:
quick handle #
# speed certificate
Each character of this text is indexed in left-to-right, top-to-bottom order: the character 'quick' has index 1, 'handle' has index 2, 'speed' has index 5 and 'certificate' has index 6. The permutation and combination device 12 performs a full permutation and combination of the indexes 1 to 6 and maps each index arrangement back to its characters to obtain the one or more character strings corresponding to the out-of-sequence text. For instance, index combination "123" corresponds to the string 'quick', 'handle', '#'; index combination "26" corresponds to the string "certificates handling"; and index combination "15" corresponds to the string "fast". As another example, suppose the text acquisition device 11 provides the following out-of-sequence text of 9 rows and 2 columns:
×a
×b
Subtract@
Fertile 1
Tea 2
Special .
Valency c
Short o
Pin m
It generates a 9×2 character matrix A with the same numbers of rows and columns as the out-of-sequence text, and maps each character of the text to the element at the corresponding position of the matrix:
[Figure: character matrix A, whose nine rows are the nine lines of the text above]
The permutation and combination device 12 transposes character matrix A to obtain the transposed matrix A':
[Figure: transposed matrix A', whose two rows are the two columns of A]
Each row of A' is then mapped to one character string corresponding to the out-of-sequence text, such as "×× slimming tea special-price promotion" and "ab@12.com". Those skilled in the art will understand that the above ways of obtaining character strings are merely examples; other existing or future ways of obtaining character strings, where applicable to the present invention, also fall within its scope of protection and are incorporated herein by reference.
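The two approaches described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names, the `max_len` cap (a full enumeration grows factorially) and the placeholder letters standing in for the original characters are all assumptions.

```python
from itertools import permutations

def index_strings(text_rows, max_len=3):
    """Map each arrangement of 1-based character indexes to its string.

    max_len is a hypothetical cap on how many indexes are combined.
    """
    # Index characters left to right, top to bottom, as in the example.
    chars = [ch for row in text_rows for ch in row]
    result = {}
    for k in range(1, max_len + 1):
        for combo in permutations(range(1, len(chars) + 1), k):
            result[combo] = "".join(chars[i - 1] for i in combo)
    return result

def transpose_strings(text_rows):
    """Transpose the character matrix; each row of the transpose is a string."""
    return ["".join(column) for column in zip(*text_rows)]

# Placeholder 2x3 text: A, B, C, D stand in for the four real characters.
strings = index_strings(["AB#", "#CD"])
print(strings[(2, 6)])   # characters at indexes 2 and 6
print(strings[(1, 5)])   # characters at indexes 1 and 5

# Placeholder 9x2 text: the second column is the one quoted in the example.
rows = ["Pa", "Qb", "R@", "S1", "T2", "U.", "Vc", "Wo", "Xm"]
print(transpose_strings(rows))
```

Transposition touches each character once, whereas the index enumeration produces a very large number of candidates, which is one reason a matrix-based approach may be preferable for regular layouts.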
Then, the matching query device 13 performs a matching query in the target pattern bank according to the one or more character strings, to obtain the target string in the out-of-sequence text. Specifically, for the one or more character strings obtained by the permutation and combination device 12, the matching query device 13 performs, for example, a string matching query or a regular expression matching query for each character string in turn in the target pattern bank. If a character string matches content in the target pattern bank, that character string is a target string in the out-of-sequence text; in this way the one or more target strings in the out-of-sequence text are obtained. Here, the "target pattern bank" stores target strings and regular expressions for target strings, and includes, but is not limited to, a relational database, memory storage or hard disk storage. Here, a "target string" includes, but is not limited to, a telephone number, an e-mail address or a website URL. For example, suppose the character strings corresponding to the out-of-sequence text include the string "certificates handling". The matching query device 13 runs a matching query for it in the target pattern bank, determines that it matches the string "certificates handling" stored there, and thereby identifies it as a target string in the out-of-sequence text. As another example, suppose the character strings corresponding to the out-of-sequence text include the string "ab@12.com". The matching query device 13 runs a matching query for it in the target pattern bank, determines that it matches the regular expression "/^\w+((-\w+)|(\.\w+))*@[A-Za-z0-9]+((\.|-)[A-Za-z0-9]+)*\.[A-Za-z0-9]+/" stored there, and thereby identifies it as a target string in the out-of-sequence text. Those skilled in the art will understand that the above ways of obtaining target strings are merely examples; other existing or future ways of obtaining target strings, where applicable to the present invention, also fall within its scope of protection and are incorporated herein by reference.
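The matching query can be sketched as below. The in-memory sets standing in for the target pattern bank, the names `LITERAL_TARGETS`, `REGEX_TARGETS` and `find_targets`, and the exact-match policy for literal strings are assumptions for illustration; the e-mail pattern is an assumed reading of the regular expression quoted above.

```python
import re

# Hypothetical in-memory target pattern bank: literal strings plus regexes.
LITERAL_TARGETS = {"certificates handling"}
REGEX_TARGETS = [
    # Assumed reading of the e-mail expression quoted in the text.
    re.compile(
        r"^\w+((-\w+)|(\.\w+))*@[A-Za-z0-9]+((\.|-)[A-Za-z0-9]+)*\.[A-Za-z0-9]+"
    ),
]

def find_targets(candidates):
    """Return the candidate strings that match the pattern bank, in order."""
    hits = []
    for candidate in candidates:
        literal_hit = candidate in LITERAL_TARGETS
        regex_hit = any(p.match(candidate) for p in REGEX_TARGETS)
        if literal_hit or regex_hit:
            hits.append(candidate)
    return hits

print(find_targets(["ab@12.com", "hello", "certificates handling"]))
```

A production system would more likely hold the patterns in a database, as the text suggests, and might use substring rather than exact matching for literals.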
Preferably, the text acquisition device 11, the permutation and combination device 12 and the matching query device 13 operate continuously. Specifically, the text acquisition device 11 obtains the out-of-sequence text to be processed; subsequently, the permutation and combination device 12 performs permutation and combination on the characters in the out-of-sequence text to obtain one or more character strings corresponding to it; then, the matching query device 13 performs a matching query in the target pattern bank according to the one or more character strings, to obtain the target string in the out-of-sequence text. Here, those skilled in the art will understand that "continuously" means that each device performs, respectively, the acquisition of out-of-sequence text, the derivation of character strings and the acquisition of target strings according to a set or real-time-adjusted operating mode, until the text acquisition device 11 stops obtaining out-of-sequence text to be processed for a long period.
Fig. 2 is a schematic diagram of a device for obtaining a target string in out-of-sequence text according to a preferred embodiment of the present invention, in which the permutation and combination device 12' further comprises a matrix generation unit 121' and a matrix operation unit 122'.
Specifically, the matrix generation unit 121' generates, from the out-of-sequence text, a character matrix corresponding to it, in which each character of the out-of-sequence text corresponds to the character element at the corresponding position of the matrix. The matrix operation unit 122' then permutes and combines the character elements of the character matrix by matrix operations, to obtain the one or more character strings. More specifically, the matrix generation unit 121' generates, for example, a character matrix with the same numbers of rows and columns as the out-of-sequence text and maps each character of the text to the element at the corresponding position of the matrix. The matrix operation unit 122' then performs one or more matrix operations, such as matrix transposition or matrix transformation, on the character matrix obtained by the matrix generation unit 121' to obtain one or more new character matrices, and maps the character elements of one or more rows of each new character matrix to a character string. This realizes the permutation and combination of the characters of the out-of-sequence text and yields the one or more character strings corresponding to it. Here, the ways of mapping the character elements of one or more rows of a new character matrix to a character string include, but are not limited to: reading each row left to right, reading each row right to left, reading the rows top to bottom, or reading the rows bottom to top, and concatenating the character elements in that order into one character string; or connecting the character elements of every row of the new character matrix head to tail in order into one character string. For example, the matrix generation unit 121' takes the out-of-sequence text:
Perpendicular literary composition
The row word
and generates the corresponding 2×2 character matrix B:
[Figure: character matrix B, whose two rows are the two lines of the text above]
The matrix operation unit 122' performs matrix operations on B to obtain several new character matrices, including:
[Figure: character matrix B1]
[Figure: character matrix B2]
Then, for each new character matrix, the characters of each row are read left to right and the rows are connected head to tail from top to bottom. This gives the character string corresponding to B1, which reads "vertically arranged text" (the first column of B followed by the second), and the character string corresponding to B2 (the second line of the original text followed by the first). Here, the matrix operations include, but are not limited to, matrix transposition, matrix transformation and matrix multiplication. Those skilled in the art will understand that the above ways of generating a character matrix and obtaining character strings are merely examples; other existing or future ways of generating a character matrix or obtaining character strings, where applicable to the present invention, also fall within its scope of protection and are incorporated herein by reference.
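The matrix operations and reading orders described above can be sketched as follows. The function names are hypothetical, the letters a to d are placeholders for the four characters of the example, and the identification of B1 with a transposition and B2 with a row swap is an assumption inferred from the resulting strings.

```python
def transpose(matrix):
    """Swap rows and columns of a character matrix."""
    return [list(column) for column in zip(*matrix)]

def flip_rows(matrix):
    """Reverse the order of the rows (one simple matrix transformation)."""
    return matrix[::-1]

def read_rows(matrix, right_to_left=False, bottom_to_top=False):
    """Join the rows head to tail in the requested reading order."""
    rows = matrix[::-1] if bottom_to_top else matrix
    return "".join(
        "".join(row[::-1] if right_to_left else row) for row in rows
    )

# Placeholder for matrix B: first text line is "ab", second is "cd".
B = [["a", "b"], ["c", "d"]]
print(read_rows(transpose(B)))            # columns of B, left column first
print(read_rows(flip_rows(B)))            # second line, then first line
print(read_rows(B, right_to_left=True))   # each row reversed
```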
Preferably, the matrix operation unit 122' permutes and combines the character elements of the character matrix according to preset out-of-sequence text types, using the matrix operation that corresponds to each type, to obtain the one or more character strings. Specifically, for the preset out-of-sequence text types, such as vertical text and diagonal text, the matrix operation unit 122' looks up the corresponding matrix operation formula, for example in a preset mapping table of matrix operation formulas; the formula corresponding to vertical text, for instance, is matrix transposition. It then performs matrix operations on the character matrix according to these formulas only, and performs no other matrix operations on it, thereby realizing the permutation and combination of the characters in the out-of-sequence text and obtaining the one or more character strings. Preferably, the out-of-sequence text processing device 1 can collect statistics on the out-of-sequence texts that have appeared in the corresponding application, determine the types that occur most frequently, and use those types as the preset out-of-sequence text types. For example, suppose the preset out-of-sequence text types are vertical text and diagonal text, and the character matrix C is:
[Figure: character matrix C]
The matrix operation unit 122' first performs a matching query in the mapping table of matrix operation formulas according to the text ID of vertical text and obtains the matrix operation formula corresponding to that text type, such as matrix transposition. It then applies this formula to character matrix C, realizing one permutation and combination of the character elements of the matrix, and obtains the corresponding one or more character strings. Next, the matrix operation unit 122' uses the same method to obtain the matrix operation formula corresponding to diagonal text, realizing another permutation and combination of the character elements of the matrix, and obtains the corresponding one or more character strings. Here, the mapping table of matrix operation formulas may be stored in the matrix operation unit 122', or read directly, over an agreed communication interface, from other components of the out-of-sequence text processing device 1 or from a third-party device. Those skilled in the art will understand that the above ways of obtaining character strings are merely examples; other existing or future ways of obtaining character strings, where applicable to the present invention, also fall within its scope of protection and are incorporated herein by reference.
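A lookup table from preset text type to operation might look like the sketch below. The type keys, the `OPERATIONS` table and the `diagonals` helper are hypothetical; in particular, reading diagonals is an assumed stand-in for whatever formula the mapping table holds for diagonal text, which the source does not specify.

```python
def transpose(matrix):
    """Rows of the result are the columns of the input."""
    return [list(column) for column in zip(*matrix)]

def diagonals(matrix):
    """Rows of the result are the top-left to bottom-right diagonals."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    result = []
    for offset in range(-(n_rows - 1), n_cols):
        result.append([
            matrix[r][r + offset]
            for r in range(n_rows)
            if 0 <= r + offset < n_cols
        ])
    return result

# Hypothetical mapping table: text type -> operation to apply.
OPERATIONS = {"vertical": transpose, "diagonal": diagonals}

def candidate_strings(matrix, preset_types):
    """Apply only the operations registered for the preset text types."""
    strings = []
    for text_type in preset_types:
        operation = OPERATIONS[text_type]
        strings.extend("".join(row) for row in operation(matrix))
    return strings

C = ["ab", "cd"]  # placeholder character matrix
print(candidate_strings(C, ["vertical", "diagonal"]))
```

Restricting the operations to the preset types keeps the number of candidate strings small, which is the benefit the paragraph above describes.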
Preferably, the out-of-sequence text includes, but is not limited to, at least any one of the following:
- vertical text;
- diagonal text;
- S-shaped text.
Specifically, out-of-sequence text includes vertical text, meaning text whose characters are arranged from top to bottom or from bottom to top, for example:
Originally perpendicular
Style of writing
The literary composition row
This is perpendicular
Out-of-sequence text includes diagonal text, meaning text whose characters are arranged along a certain slope, for example:
Oblique Zhe & &
& Row Shi &
& Lift lift &
& & The example word
Out-of-sequence text includes S-shaped text, meaning text whose characters are arranged in the shape of the letter S, for example:
^^^0^
^^1^^
^0^^^
8^^^^
^9^^^
^^0^^
^^^^0
^^^1^
^^2^^
^3^^^
4^^^^
Those skilled in the art will understand that each of the above text types can be used on its own to form out-of-sequence text, and that several of them can also be combined to form out-of-sequence text. Those skilled in the art will understand that the above out-of-sequence texts are merely examples; other existing or future out-of-sequence texts, where applicable to the present invention, also fall within its scope of protection and are incorporated herein by reference.
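For layouts such as the S-shaped example above, where each row carries a single meaningful character surrounded by a filler, the hidden string can be recovered by dropping the filler and reading the rows top to bottom. The sketch below assumes the filler character is known in advance ('^' here) and uses a hypothetical function name.

```python
def read_single_char_rows(rows, filler="^"):
    """Drop the filler and join what remains, reading rows top to bottom."""
    return "".join(ch for row in rows for ch in row if ch != filler)

# The S-shaped example from the text, one string per row.
s_text = [
    "^^^0^", "^^1^^", "^0^^^", "8^^^^", "^9^^^", "^^0^^",
    "^^^^0", "^^^1^", "^^2^^", "^3^^^", "4^^^^",
]
print(read_single_char_rows(s_text))
```

The recovered digit string could then be checked against a telephone-number pattern in the target pattern bank.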
In a preferred embodiment (with reference to Fig. 1), the out-of-sequence text processing device 1 further comprises a character string screening device (not shown). This preferred embodiment is described in detail below with reference to Fig. 1. The text acquisition device 11 obtains the out-of-sequence text to be processed; then, the permutation and combination device 12 performs permutation and combination on the characters in the out-of-sequence text to obtain one or more character strings corresponding to it. The detailed process is the same as that performed by the text acquisition device 11 and the permutation and combination device 12 in the embodiment described above with reference to Fig. 1; for brevity it is incorporated herein by reference and not repeated.
Specifically, the character string screening device selects one or more preferred character strings from the one or more character strings according to preset screening rules, and the matching query device 13 performs the matching query in the target pattern bank according to the one or more preferred character strings, to obtain the target string. More specifically, from the one or more character strings obtained by the permutation and combination device 12, the character string screening device selects one or more preferred character strings according to the preset screening rules, which include, but are not limited to: 1) screening out character strings whose content duplicates that of another character string; 2) screening out character strings that contain only special characters such as $, %, &, *, ~ or spaces; 3) selecting only character strings made up of a particular kind of character, for example strings containing only Arabic numerals, only Chinese characters, or only double-byte characters. The matching query device 13 then runs the matching query in the target pattern bank for the one or more preferred character strings obtained by the character string screening device, to obtain the target string. Those skilled in the art will understand that the above ways of selecting preferred character strings are merely examples; other existing or future ways of selecting preferred character strings, where applicable to the present invention, also fall within its scope of protection and are incorporated herein by reference.
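The three screening rules can be sketched as a single filter. The function name `screen`, the `only_digits` switch and the exact set of special characters are assumptions; rule 3 is shown only for Arabic numerals.

```python
import re

# Rule 2: strings made up solely of these characters are discarded.
SPECIAL_ONLY = re.compile(r"[$%&*~\s]+")
# Rule 3 (one variant): keep only strings of ASCII digits.
DIGITS_ONLY = re.compile(r"[0-9]+")

def screen(candidates, only_digits=False):
    """Apply the screening rules and keep the first occurrence of each string."""
    seen = set()
    kept = []
    for candidate in candidates:
        if candidate in seen:                         # rule 1: duplicates
            continue
        seen.add(candidate)
        if not candidate or SPECIAL_ONLY.fullmatch(candidate):  # rule 2
            continue
        if only_digits and not DIGITS_ONLY.fullmatch(candidate):  # rule 3
            continue
        kept.append(candidate)
    return kept

print(screen(["ab", "ab", "$%", "123", " ~"]))
print(screen(["ab", "ab", "$%", "123", " ~"], only_digits=True))
```

Screening before the matching query reduces the number of lookups against the target pattern bank.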
In another preferred embodiment (with reference to Fig. 1), the out-of-sequence text processing device 1 further comprises a preprocessing device (not shown). This preferred embodiment is described in detail below with reference to Fig. 1. The text acquisition device 11 obtains the out-of-sequence text to be processed; the matching query device 13 performs a matching query in the target pattern bank according to the one or more character strings, to obtain the target string in the out-of-sequence text. The detailed process is the same as that performed by the text acquisition device 11 and the matching query device 13 in the embodiment described above with reference to Fig. 1; for brevity it is incorporated herein by reference and not repeated.
Specifically, the preprocessing device preprocesses the out-of-sequence text according to preset preprocessing rules to obtain preprocessed text, and the permutation and combination device 12 performs permutation and combination on the characters in the preprocessed text to obtain one or more character strings corresponding to the preprocessed text. More specifically, the preprocessing device preprocesses the out-of-sequence text according to preset preprocessing rules such as filtering out special characters or converting variant characters to standard characters, and obtains the preprocessed text; the permutation and combination device 12 then performs permutation and combination on the characters in the preprocessed text obtained by the preprocessing device, to obtain the one or more character strings corresponding to the preprocessed text. For example, suppose the out-of-sequence text is:
Vertical # # #
Hang oblique # #
Wen # row #
Zi # # text
The preprocessing device first looks up each character of the out-of-sequence text in the special character library and finds that the character '#' is a special character; it then filters that character out of the out-of-sequence text to obtain a first preprocessing result:
Vertical
Hang oblique
Wen row
Zi text
The preprocessing device then looks up each character of the first preprocessing result in the variant character library and accordingly converts the Mars-script characters to standard characters: 'Vertical' becomes 'erect', 'Hang' becomes 'row', 'Wen' becomes 'text' and 'Zi' becomes 'word', giving a second preprocessing result:
Erect
Row oblique
Text row
Word text
This second preprocessing result is used as the preprocessed text. The permutation and combination device 12 then performs permutation and combination on the characters of the preprocessed text to obtain one or more character strings corresponding to it, such as "erect row text word", "oblique row text" and "oblique erect". Here, the special character library in this embodiment stores predefined special characters and includes, but is not limited to, a relational database, memory storage or hard disk storage; the variant character library in this embodiment stores the mapping between variant characters, such as chrysanthemum script and Mars script, and the standard characters corresponding to them, and likewise includes, but is not limited to, a relational database, memory storage or hard disk storage. Here, those skilled in the art will understand that the special character library may be independent of the variant character library or integrated into it. Those skilled in the art will understand that the above ways of preprocessing out-of-sequence text are merely examples; other existing or future ways of preprocessing out-of-sequence text, where applicable to the present invention, also fall within its scope of protection and are incorporated herein by reference.
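The two preprocessing passes in this example can be sketched as below. The in-memory `SPECIAL_CHARS` set and `VARIANT_MAP` dictionary stand in for the special character library and the variant character library, and the upper-case letters are hypothetical placeholders for the Mars-script characters.

```python
# Hypothetical stand-ins for the two libraries described in the text.
SPECIAL_CHARS = {"#"}
VARIANT_MAP = {"V": "v", "H": "h", "W": "w", "Z": "z"}

def preprocess(rows):
    """Filter special characters, then convert variant characters, row by row."""
    result = []
    for row in rows:
        kept = [ch for ch in row if ch not in SPECIAL_CHARS]   # first pass
        converted = [VARIANT_MAP.get(ch, ch) for ch in kept]   # second pass
        result.append("".join(converted))
    return result

# Placeholder for the 4x4 example: upper-case letters are "variant" forms.
print(preprocess(["V###", "Ho##", "W#r#", "Z##t"]))
```

As in the example above, removing the filler shifts the remaining characters to the left, so the column positions of the preprocessed text differ from those of the original.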
Preferably, the preprocessing rule preprocesses the out-of-sequence text based on, but not limited to, at least any one of the following:
- filtering out special characters in the out-of-sequence text;
- converting variant characters in the out-of-sequence text into standard characters;
- converting half-width characters in the out-of-sequence text into full-width characters.
Specifically, when the preprocessing rule filters out special characters in the out-of-sequence text, these special characters include, but are not limited to, ^, *, |, ◎, ⊙, ★, and the like, and may be stored in the special-character storehouse. When the preprocessing rule converts variant characters in the out-of-sequence text into standard characters, these variant characters include, but are not limited to, chrysanthemum script, Martian script, and the like, and may be stored in the variant-character storehouse. When the preprocessing rule converts half-width characters in the out-of-sequence text into full-width characters, all single-byte characters in the out-of-sequence text are converted into double-byte characters. Those skilled in the art will understand that the above preprocessing rules are merely examples; other existing or future preprocessing rules, where applicable to the present invention, also fall within its scope of protection and are incorporated herein by reference.
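The three preprocessing rules above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `SPECIAL_CHARS`, `VARIANT_MAP`, `to_fullwidth`, and `preprocess` are hypothetical, the single variant-to-standard entry is an assumed example, and applying the rules in this order is our assumption.

```python
# Hypothetical storehouse contents; real entries would be loaded from the
# special-character and variant-character storehouses.
SPECIAL_CHARS = set("^*|◎⊙★")
VARIANT_MAP = {"竪": "竖"}  # assumed example of a variant -> standard mapping


def to_fullwidth(ch: str) -> str:
    """Convert one half-width ASCII character to its full-width form."""
    if ch == " ":
        return "\u3000"  # ideographic (full-width) space
    code = ord(ch)
    if 0x21 <= code <= 0x7E:  # printable ASCII maps to U+FF01..U+FF5E
        return chr(code + 0xFEE0)
    return ch  # already full-width, or not convertible (e.g. a newline)


def preprocess(text: str) -> str:
    """Apply the three rules: filter, normalize variants, widen."""
    out = []
    for ch in text:
        if ch in SPECIAL_CHARS:
            continue  # rule 1: filter out special characters
        ch = VARIANT_MAP.get(ch, ch)  # rule 2: variant -> standard character
        out.append(to_fullwidth(ch))  # rule 3: half-width -> full-width
    return "".join(out)
```

Filtering runs before widening so that the half-width special characters are removed while they still match the entries of `SPECIAL_CHARS`.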
Preferably, matching inquiry device 13 performs the matching query in the target pattern storehouse according to the one or more character strings, based on at least any one of the following manners, to obtain the target string:
- string matching;
- regular-expression matching.
Specifically, when matching inquiry device 13 performs the matching query in the target pattern storehouse by string matching, it for example matches each character string obtained by permutation and combination device 12, in turn, against all character strings retrieved from the target pattern storehouse; if a match succeeds, that character string is a target string in the out-of-sequence text. When matching inquiry device 13 performs the matching query by regular-expression matching, it for example matches each character string obtained by permutation and combination device 12, in turn, against all regular expressions retrieved from the target pattern storehouse; if a character string satisfies one of these regular expressions, that character string is a target string in the out-of-sequence text. Those skilled in the art will understand that the above matching manners may be used by matching inquiry device 13 either individually or in combination. Those skilled in the art will also understand that the above matching manners are merely examples; other existing or future matching manners, where applicable to the present invention, also fall within its scope of protection and are incorporated herein by reference.
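A minimal sketch of combining the two matching manners is given below. The storehouse contents (`LITERAL_TARGETS`, `REGEX_TARGETS`) and the function name `match_targets` are hypothetical, and treating string matching as exact equality is an assumption; the text does not say whether substring matches also count.

```python
import re

# Hypothetical contents of a target pattern storehouse (illustrative only).
LITERAL_TARGETS = {"certificate handling"}
REGEX_TARGETS = [re.compile(r"\d{11}")]  # assumed pattern: an 11-digit number


def match_targets(candidates, literals=LITERAL_TARGETS, regexes=REGEX_TARGETS):
    """Return the candidate strings that match a stored string or regex."""
    hits = []
    for s in candidates:
        if s in literals:  # string matching (exact equality assumed)
            hits.append(s)
        elif any(r.fullmatch(s) for r in regexes):  # regular-expression matching
            hits.append(s)
    return hits
```

Each candidate is checked against the literal set first because a set lookup is cheaper than running every regular expression.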
Preferably, the target string includes, but is not limited to, at least any one of the following:
- a telephone number;
- an Internet address;
- an e-mail address;
- an instant messaging account.
Specifically, the target string includes a telephone number, such as a fixed-line telephone number or a mobile telephone number. The target string includes an Internet address, such as an IP address, a domain name address, or another Uniform Resource Identifier (URI) used on the Internet. The target string includes an e-mail address, such as a 163 mailbox address or a Hotmail mailbox address. The target string includes an instant messaging account, such as a QQ account or an MSN account. Those skilled in the art will understand that the above target strings are merely examples; other existing or future target strings, where applicable to the present invention, also fall within its scope of protection and are incorporated herein by reference.
In another preferred embodiment (with reference to Fig. 1), out-of-sequence text-processing equipment 1 further comprises an after-treatment device (not shown) and a generator (not shown). This preferred embodiment is described in detail below with reference to Fig. 1. Permutation and combination device 12 permutes and combines the characters in the out-of-sequence text to obtain one or more character strings corresponding to the out-of-sequence text; matching inquiry device 13 performs a matching query in the target pattern storehouse according to the one or more character strings to obtain the target string in the out-of-sequence text. The detailed process is the same as that performed by permutation and combination device 12 and matching inquiry device 13 in the embodiment described above with reference to Fig. 1; for brevity, it is incorporated herein by reference and not repeated.
Specifically, text deriving means 11 obtains the pending out-of-sequence text that a user submits through subscriber equipment. More specifically, the user enters the out-of-sequence text in browser software, an application program, or client software by interacting with the subscriber equipment through means that include, but are not limited to, a keyboard, mouse, remote control, touch pad, or handwriting device. Taking the keyboard as an example, the user enters the out-of-sequence text in an input text box of the application program and, by clicking a "Submit" button or in some other way, triggers the subscriber equipment to send the text over the network to out-of-sequence text-processing equipment 1 according to an agreed communication protocol; text deriving means 11 receives the text in real time by listening for user messages. Here, the subscriber equipment may be any electronic product capable of human-computer interaction with the user through a keyboard, mouse, remote control, touch pad, or voice-control device, including, but not limited to, a computer, smartphone, PDA, or IPTV. Out-of-sequence text-processing equipment 1 and the subscriber equipment may communicate in any manner, including, but not limited to, mobile communication based on 3GPP, LTE, or WiMAX, computer network communication based on TCP/IP or UDP, and short-range wireless transmission based on Bluetooth or infrared transmission standards. The network connecting out-of-sequence text-processing equipment 1 and the subscriber equipment includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN, or a wireless ad hoc network. Those skilled in the art will understand that the above manner of obtaining out-of-sequence text is merely an example; other existing or future manners of obtaining out-of-sequence text, where applicable to the present invention, also fall within its scope of protection and are incorporated herein by reference.
The after-treatment device post-processes the out-of-sequence text according to the target string to obtain the aftertreatment text corresponding to the out-of-sequence text; the generator then provides the aftertreatment text to the subscriber equipment. Specifically, the after-treatment device post-processes the out-of-sequence text according to the target string that matching inquiry device 13 obtained from it. The post-processing manners include, but are not limited to: 1) deleting the target string from the out-of-sequence text; 2) replacing each character of the target string in the out-of-sequence text with another meaningless character, such as a space; 3) deleting the entire content of the out-of-sequence text that contains the target string. Then, the generator provides the aftertreatment text obtained by the after-treatment device to the subscriber equipment, using any known technical means by which a computer presents human-readable information, such as screen display or loudspeaker playback. Taking screen display as an example, the generator uses a page technology such as JSP, ASP, or PHP to provide the aftertreatment text to the subscriber equipment in a certain format, for example as a link or as page text, for the user to browse. Those skilled in the art will understand that the above manners of post-processing out-of-sequence text and of providing the aftertreatment text are merely examples; other existing or future manners of post-processing out-of-sequence text or of providing the aftertreatment text, where applicable to the present invention, also fall within its scope of protection and are incorporated herein by reference.
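The three post-processing manners listed above can be sketched as follows. Because the characters of a target string are generally not adjacent in the out-of-sequence text, this sketch assumes that the permutation step also records the (row, column) position of every character of the target string; that bookkeeping, the function name `postprocess`, and the mode names are our assumptions and are not described in the text.

```python
def postprocess(lines, target_positions, mode="mask", filler=" "):
    """Post-process out-of-sequence text given the target's character positions.

    lines            -- the out-of-sequence text, one string per line
    target_positions -- (row, column) pairs of the target string's characters
    mode             -- "delete": remove the target characters
                        "mask":   replace each target character with filler
                        "drop":   discard the whole text
    """
    if mode not in ("delete", "mask", "drop"):
        raise ValueError(f"unknown mode: {mode}")
    if mode == "drop":
        return []
    targets = set(target_positions)
    result = []
    for r, line in enumerate(lines):
        kept = []
        for c, ch in enumerate(line):
            if (r, c) not in targets:
                kept.append(ch)
            elif mode == "mask":
                kept.append(filler)
            # mode == "delete": the target character is simply skipped
        result.append("".join(kept))
    return result
```

For example, masking column 1 of the lines `["xa", "xb", "x@"]` leaves `["x ", "x ", "x "]`, which preserves the layout of the remaining text.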
Preferably, the after-treatment device also post-processes the out-of-sequence text according to the target string in combination with user related information, to obtain the aftertreatment text corresponding to the out-of-sequence text. Specifically, the after-treatment device combines the target string with user related information, such as the user's historical behavior record and user attributes, when post-processing the out-of-sequence text. For example, suppose the out-of-sequence text is a user's post in a network forum and contains target strings such as a telephone number or an e-mail address. By statistically analyzing this user's historical behavior record, the after-treatment device finds that the interval between the user's consecutive posts is below a preset posting-interval threshold and determines that the posts are not made by a human; accordingly, the after-treatment device directly deletes the entire content of this out-of-sequence text. For another example, continuing the above example, the after-treatment device learns from the user attributes that this user is a moderator of a board of the network forum whose reputation is above a preset reputation threshold; accordingly, the after-treatment device performs no post-processing on this out-of-sequence text and outputs it directly as the aftertreatment text. Here, the manners of obtaining user related information include, but are not limited to: obtaining it from the registration information left when the user logs in to a web page through the subscriber equipment, or obtaining it from the user's historical behavior information, which is recorded by the subscriber-equipment side or the network side while the user browses web pages through the subscriber equipment or is extracted from cookies information. Those skilled in the art will understand that the above manner of post-processing out-of-sequence text is merely an example; other existing or future manners of post-processing out-of-sequence text, where applicable to the present invention, also fall within its scope of protection and are incorporated herein by reference.
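The two examples above (automated posting and a reputable moderator) can be sketched as a small decision function. Everything here is illustrative: the `UserInfo` fields, the threshold values, the use of the average interval, the order of the checks, and the default action of masking are all assumptions; the text only states the two outcomes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UserInfo:
    post_intervals: List[float]  # seconds between consecutive posts
    is_moderator: bool = False
    reputation: float = 0.0


def choose_action(has_target: bool, user: UserInfo,
                  min_interval: float = 5.0,
                  reputation_threshold: float = 80.0) -> str:
    """Return "keep", "mask", or "drop" for a text given the user's record."""
    if not has_target:
        return "keep"  # nothing to post-process
    if user.is_moderator and user.reputation > reputation_threshold:
        return "keep"  # trusted moderator: output the text unchanged
    if user.post_intervals:
        average = sum(user.post_intervals) / len(user.post_intervals)
        if average < min_interval:
            return "drop"  # posting too fast: treated as non-human
    return "mask"  # assumed default: hide only the target string
```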
Preferably, the user related information includes, but is not limited to, at least any one of the following:
- the user's historical behavior record;
- user attributes;
- the address of the subscriber equipment.
Specifically, the user related information includes the user's historical behavior record, such as the times at which the user posted, the number of the user's posts, and the evaluation scores of the user's posts. For example, the more of the user's past posts were judged to be spam, the more likely the after-treatment device is to delete an out-of-sequence text containing a target string that this user publishes. The user related information includes user attributes, such as the user's registration time and reputation. For example, the longer the user has been registered and the higher the user's reputation, the less likely the after-treatment device is to delete an out-of-sequence text containing a target string that this user publishes. The user related information includes the address of the subscriber equipment. For example, if the address of the subscriber equipment is on a blacklist restricting access to the network forum, the after-treatment device is more likely to delete the out-of-sequence text that the user submits through this subscriber equipment. Preferably, the after-treatment device also post-processes the out-of-sequence text according to the target string in combination with any combination of the above user related information, to obtain the aftertreatment text corresponding to the out-of-sequence text. Those skilled in the art will understand that the above user related information is merely an example; other existing or future user related information, where applicable to the present invention, also falls within its scope of protection and is incorporated herein by reference.
Fig. 3 illustrates a flow chart of a method for obtaining the target string in out-of-sequence text according to one aspect of the present invention. Here, out-of-sequence text-processing equipment 1 includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud made up of multiple servers. The cloud is made up of a large number of computers or network servers based on cloud computing, where cloud computing is a form of distributed computing: a super virtual computer composed of a group of loosely coupled computers.
Specifically, in step S1, out-of-sequence text-processing equipment 1 obtains the pending out-of-sequence text. More specifically, in step S1, out-of-sequence text-processing equipment 1 obtains the pending out-of-sequence text either periodically or in real time in response to an event trigger, for example by listening in real time for the submission requests that users send through subscriber equipment so as to obtain the out-of-sequence text the user entered, or by periodically reading the out-of-sequence text directly from a third-party device over an agreed communication channel. Here, "out-of-sequence text" means text content that is not written in the order in which people normally read but whose out-of-order pattern people can still recognize, including, but not limited to, vertical-line text, diagonal text, and S-line text. For example, suppose out-of-sequence text-processing equipment 1 is a network forum server. The user enters a piece of vertical-line text in the input interface of a forum web page through the subscriber equipment; the subscriber equipment then packages this information into an HTTP request as a forum post and submits it to out-of-sequence text-processing equipment 1 over HTTP. In step S1, out-of-sequence text-processing equipment 1, listening for user messages in real time, receives and parses this HTTP request and obtains the vertical-line text in it. For another example, in step S1, out-of-sequence text-processing equipment 1 periodically calls a configured application programming interface (API) to send a request for out-of-sequence text to a third-party device and receives the document containing vertical-line text that the third-party device returns in response. Those skilled in the art will understand that the above manner of obtaining out-of-sequence text is merely an example; other existing or future manners of obtaining out-of-sequence text, where applicable to the present invention, also fall within its scope of protection and are incorporated herein by reference.
Subsequently, in step S2, out-of-sequence text-processing equipment 1 permutes and combines the characters in the out-of-sequence text to obtain one or more character strings corresponding to the out-of-sequence text. Specifically, in step S2, out-of-sequence text-processing equipment 1 takes the out-of-sequence text it obtained in step S1 and, for example, assigns a corresponding index to each character in turn and permutes and combines these indices to obtain the one or more character strings corresponding to the out-of-sequence text. Alternatively, it generates a character matrix with the same numbers of rows and columns as the out-of-sequence text, maps each character of the text to the character element at the corresponding position of the matrix, and permutes and combines the text by performing matrix operations on this character matrix, thereby obtaining the one or more character strings corresponding to the out-of-sequence text. For example, in step S2, out-of-sequence text-processing equipment 1 takes the out-of-sequence text it obtained in step S1:
quick handle #
# speed certificate
Each character in this out-of-sequence text (each English word above stands for a single character of the original) is indexed in turn from left to right and from top to bottom, so that the character 'quick' has index 1, 'handle' has index 2, 'speed' has index 5, and 'certificate' has index 6. Out-of-sequence text-processing equipment 1 performs full permutation and combination on indices 1 to 6 and maps each index in a combination back to its character, obtaining the one or more character strings corresponding to the out-of-sequence text. For instance, index combination "123" corresponds to the character string "quick handle #", index combination "26" to "handle certificate" (that is, "certificate handling"), and index combination "15" to "quick speed" (that is, "fast"). For another example, in step S2, out-of-sequence text-processing equipment 1 takes the out-of-sequence text of 9 rows and 2 columns that it obtained in step S1:
× a
× b
reduce @
fat 1
tea 2
special .
price c
promote o
sell m
It generates a 9×2 character matrix A with the same numbers of rows and columns as this out-of-sequence text and maps each character of the text to the character element at the corresponding position of the matrix, so that row i of A holds the two characters of line i of the text.
In step S2, out-of-sequence text-processing equipment 1 performs a matrix transpose operation on character matrix A to obtain the 2×9 transposed matrix A′:
A′ = [ ×  ×  reduce  fat  tea  special  price  promote  sell ]
     [ a  b  @       1    2    .        c      o        m    ]
Each row of A′ is then mapped to one character string corresponding to the out-of-sequence text, giving "×× reduce fat tea special price promote sell" (that is, "×× slimming tea special-price promotion") and "ab@12.com". Those skilled in the art will understand that the above manner of obtaining character strings is merely an example; other existing or future manners of obtaining character strings, where applicable to the present invention, also fall within its scope of protection and are incorporated herein by reference.
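Both approaches described for step S2 can be sketched briefly. The function names `index_strings` and `transpose_strings` are hypothetical, and the Chinese characters in the usage note are our back-translation of the machine-translated example, so they are an assumption. Because full permutation of n indices grows factorially, the sketch also assumes a `max_len` bound on the length of index combinations, which the text does not mention.

```python
from itertools import permutations


def index_strings(lines, max_len=3):
    """Yield the strings for every ordered index combination up to max_len.

    Characters are indexed left to right, top to bottom (0-based here,
    1-based in the description).
    """
    chars = [ch for line in lines for ch in line]
    for k in range(1, max_len + 1):
        for combo in permutations(range(len(chars)), k):
            yield "".join(chars[i] for i in combo)


def transpose_strings(lines, pad=" "):
    """Transpose the character matrix and return one string per new row."""
    if not lines:
        return []
    width = max(len(line) for line in lines)
    padded = [line.ljust(width, pad) for line in lines]  # make rows equal length
    return ["".join(column) for column in zip(*padded)]
```

With the 9×2 example written as `["×a", "×b", "减@", "肥1", "茶2", "特.", "价c", "促o", "销m"]`, `transpose_strings` returns the promotional phrase as its first string and `"ab@12.com"` as its second. With the 2×3 example written as `["快办#", "#速证"]`, the index combinations "26" and "15" of the description appear among the results of `index_strings(..., max_len=2)`.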
Then, in step S3, out-of-sequence text-processing equipment 1 performs a matching query in the target pattern storehouse according to the one or more character strings to obtain the target string in the out-of-sequence text. Specifically, in step S3, out-of-sequence text-processing equipment 1 takes the one or more character strings it obtained in step S2 and, for example, performs a string-matching query or a regular-expression-matching query for each character string in turn in the target pattern storehouse; if a character string matches content in the target pattern storehouse, that character string is a target string in the out-of-sequence text, and the one or more target strings in the out-of-sequence text are obtained accordingly. Here, the "target pattern storehouse" stores target strings and regular expressions for target strings, and includes, but is not limited to, a relational database, memory, or hard-disk storage. Here, the "target string" includes, but is not limited to, a telephone number, an e-mail address, or a website URL. For example, suppose the character strings corresponding to the out-of-sequence text include the string "certificate handling". In step S3, out-of-sequence text-processing equipment 1 performs a matching query for it in the target pattern storehouse, determines that it matches the character string "certificate handling" stored there, and thus identifies it as a target string in the out-of-sequence text. For another example, suppose the character strings corresponding to the out-of-sequence text include the string "ab@12.com". In step S3, out-of-sequence text-processing equipment 1 performs a matching query for it in the target pattern storehouse, determines that it matches the regular expression "/^\w+((-\w+)|(\.\w+))*\@[A-Za-z0-9]+((\.|-)[A-Za-z0-9]+)*\.[A-Za-z0-9]+/" stored there, and thus identifies it as a target string in the out-of-sequence text. Those skilled in the art will understand that the above manner of obtaining the target string is merely an example; other existing or future manners of obtaining the target string, where applicable to the present invention, also fall within its scope of protection and are incorporated herein by reference.
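The regular-expression example can be checked with a few lines of code. The pattern below assumes that the expression quoted in the text is the widely used e-mail pattern with its backslashes and "@" restored, since those characters appear to have been lost in the published text; the names `EMAIL_PATTERN` and `is_email_like` are hypothetical.

```python
import re

# Assumed reconstruction of the e-mail pattern quoted in the description.
EMAIL_PATTERN = re.compile(
    r"^\w+((-\w+)|(\.\w+))*@[A-Za-z0-9]+((\.|-)[A-Za-z0-9]+)*\.[A-Za-z0-9]+"
)


def is_email_like(candidate: str) -> bool:
    """Return True if the candidate string matches the e-mail pattern."""
    return EMAIL_PATTERN.match(candidate) is not None
```

Under this assumption, `is_email_like("ab@12.com")` holds, while a candidate without an "@" sign is rejected.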
Preferably, the above steps operate continuously. Specifically, in step S1, out-of-sequence text-processing equipment 1 obtains the pending out-of-sequence text; subsequently, in step S2, it permutes and combines the characters in the out-of-sequence text to obtain one or more character strings corresponding to the out-of-sequence text; then, in step S3, it performs a matching query in the target pattern storehouse according to the one or more character strings to obtain the target string in the out-of-sequence text. Here, those skilled in the art will understand that "continuously" means that the steps respectively keep obtaining out-of-sequence text, obtaining character strings, and obtaining target strings according to a set or real-time-adjusted operating mode, until out-of-sequence text-processing equipment 1 stops obtaining pending out-of-sequence text for an extended period.
Fig. 4 illustrates a flow chart of a method for obtaining the target string in out-of-sequence text according to a preferred embodiment of the present invention.
Specifically, in step S21', out-of-sequence text-processing equipment 1 generates, from the out-of-sequence text, a character matrix corresponding to it, in which each character of the out-of-sequence text corresponds to the character element at the corresponding position of the matrix; then, in step S22', out-of-sequence text-processing equipment 1 permutes and combines the character elements of the character matrix by matrix operations to obtain the one or more character strings. More specifically, in step S21', out-of-sequence text-processing equipment 1 for example generates, according to the numbers of rows and columns of the out-of-sequence text, a character matrix with the same numbers of rows and columns, and maps each character of the text to the character element at the corresponding position of the matrix. Then, in step S22', out-of-sequence text-processing equipment 1 for example performs one or more matrix operations, such as matrix transposition or matrix transformation, on the character matrix obtained in step S21' to obtain one or more corresponding new character matrices, and maps the character elements of one or more rows of each new character matrix to a character string, thereby permuting and combining the characters of the out-of-sequence text and obtaining the one or more character strings corresponding to it. Here, the manner of mapping the character elements of one or more rows of a new character matrix to a character string includes, but is not limited to: concatenating the character elements in an order such as left to right within each row, right to left within each row, top to bottom across rows, or bottom to top across rows; or joining the character elements of every row of the new character matrix end to end in sequence to form one character string. For example, in step S21', out-of-sequence text-processing equipment 1 takes the out-of-sequence text:
vertical text
line character
and generates the 2×2 character matrix B corresponding to this out-of-sequence text (each English word stands for a single character of the original):
B = [ vertical  text      ]
    [ line      character ]
In step S22', out-of-sequence text-processing equipment 1 obtains multiple new character matrices by performing matrix operations on B, including B1, the transpose of B, and B2, which is B with its two rows exchanged:
B1 = [ vertical  line      ]    B2 = [ line      character ]
     [ text      character ]         [ vertical  text      ]
Then, the characters of each row of each new character matrix are read from left to right, and the rows are joined end to end from top to bottom, giving the character string "vertical line text character" (that is, "vertically written text") for B1 and the character string "line character vertical text" for B2. Here, the above matrix operations include, but are not limited to, matrix transposition, matrix transformation, and matrix multiplication. Those skilled in the art will understand that the above manners of generating a character matrix and obtaining character strings are merely examples; other existing or future manners of generating a character matrix or obtaining character strings, where applicable to the present invention, also fall within its scope of protection and are incorporated herein by reference.
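Steps S21' and S22' can be sketched as building a character matrix, transforming it, and reading the rows out in a chosen order. The helper names `to_matrix`, `transpose`, and `read_rows` are hypothetical, padding short lines with spaces is an assumption, and the Chinese characters in the usage note are our back-translation of the 2×2 example.

```python
def to_matrix(lines, pad=" "):
    """Step S21': map each character to the matrix element at its position."""
    width = max(len(line) for line in lines)
    return [list(line.ljust(width, pad)) for line in lines]


def transpose(matrix):
    """One possible matrix operation for step S22'."""
    return [list(column) for column in zip(*matrix)]


def read_rows(matrix, right_to_left=False, bottom_to_top=False):
    """Join the rows of a matrix end to end in the requested reading order."""
    rows = reversed(matrix) if bottom_to_top else matrix
    parts = []
    for row in rows:
        parts.append("".join(reversed(row) if right_to_left else row))
    return "".join(parts)
```

With `B = to_matrix(["竖文", "行字"])`, `read_rows(transpose(B))` gives the string for B1, and `read_rows(B, bottom_to_top=True)` gives the string for B2, since reading B from its last row upward is equivalent to exchanging its two rows.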
Preferably, in step S22', out-of-sequence text-processing equipment 1 permutes and combines the character elements of the character matrix according to preset out-of-sequence text types, using the matrix operations corresponding to those types, to obtain the one or more character strings. Specifically, in step S22', out-of-sequence text-processing equipment 1 takes the preset out-of-sequence text types, such as vertical-line text and diagonal text, and for example looks up, in a preset mapping table of matrix operation formulas, the matrix operation formula corresponding to each type; for instance, the formula corresponding to vertical-line text is the matrix transpose formula. It performs matrix operations on the character matrix according to these formulas, and performs no other matrix operations on it, thereby permuting and combining the characters of the out-of-sequence text and obtaining the one or more character strings. Preferably, the out-of-sequence texts that have appeared in the application concerned can be counted, and the types that occur most frequently can be taken as the preset out-of-sequence text types. For example, suppose the preset out-of-sequence text types comprise vertical-line text and diagonal text, and the character matrix C is:
[Image in the original publication: character matrix C]
In step S22', out-of-sequence text-processing equipment 1 first performs a matching query in the mapping table of matrix operation formulas according to the text ID of vertical-line text and obtains the matrix operation formula corresponding to this text type, such as matrix transposition. It then uses this formula to perform the matrix operation on character matrix C, realizing one permutation and combination of the character elements of the matrix and obtaining the corresponding one or more character strings. Next, out-of-sequence text-processing equipment 1 obtains the matrix operation formula corresponding to diagonal text in the same way, realizing another permutation and combination of the character elements of the matrix and obtaining the corresponding one or more character strings. Here, the mapping table of matrix operation formulas may be stored in out-of-sequence text-processing equipment 1 or read directly from a third-party device through an agreed communication interface. Those skilled in the art will understand that the above manner of obtaining character strings is merely an example; other existing or future manners of obtaining character strings, where applicable to the present invention, also fall within its scope of protection and are incorporated herein by reference.
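The lookup of an operation per preset text type can be sketched as a dictionary from type ID to function. The table `OPERATIONS`, the type IDs, and the function names are hypothetical. The text names matrix transposition for vertical-line text but gives no formula for diagonal text, so the diagonal readout below is only our stand-in for it. Rows are assumed to have equal length.

```python
def transpose_read(matrix):
    """Vertical-line text: read each column top to bottom."""
    return ["".join(column) for column in zip(*matrix)]


def diagonal_read(matrix):
    """Diagonal text (assumed operation): read each down-right diagonal."""
    rows, cols = len(matrix), len(matrix[0])
    strings = []
    for offset in range(-(rows - 1), cols):
        strings.append("".join(
            matrix[r][r + offset]
            for r in range(rows)
            if 0 <= r + offset < cols
        ))
    return strings


# Hypothetical mapping table: preset text type ID -> operation.
OPERATIONS = {"vertical": transpose_read, "diagonal": diagonal_read}


def candidates(matrix, preset_types):
    """Apply only the operations registered for the preset text types."""
    result = []
    for text_type in preset_types:
        result.extend(OPERATIONS[text_type](matrix))
    return result
```

Restricting the operations to the preset types avoids the cost of trying every possible permutation, which is the point of the preferred embodiment.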
Preferably, the out-of-sequence text includes, but is not limited to, at least any one of the following:
- vertical-line text;
- diagonal text;
- S-line text.
Specifically, the out-of-sequence text includes vertical-line text, meaning text whose content is arranged from top to bottom or from bottom to top, such as:
Originally perpendicular
Style of writing
The literary composition row
This is perpendicular
Out-of-sequence text also includes diagonal text, meaning text whose content is arranged along a line of a certain slope, such as the following text:
Oblique Zhe & &
& Row Shi &
& Lift lift &
& & The example word
Out-of-sequence text also includes S-shaped text, meaning text whose content is arranged along the shape of the letter S, such as the following text:
^^^0^
^^1^^
^0^^^
8^^^^
^9^^^
^^0^^
^^^^0
^^^1^
^^2^^
^3^^^
4^^^^
Those skilled in the art will understand that each of the above types of text may be used alone to form an out-of-sequence text, and that several of them may also be combined to form an out-of-sequence text. Those skilled in the art will understand that the above out-of-sequence texts are only examples; other existing or future out-of-sequence texts, if applicable to the present invention, also fall within the protection scope of the present invention and are incorporated herein by reference.
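As a rough illustration of why such layouts hide a target string, the S-shaped example above can be read back by dropping the filler character and concatenating what remains, row by row. This is a simplified sketch rather than the matrix-operation procedure of the embodiments; the function name `read_shaped_text` and the assumption that "^" is the only filler character are illustrative.

```python
from typing import Iterable


def read_shaped_text(rows: Iterable[str], filler: str = "^") -> str:
    """Concatenate the non-filler characters of each row, top to bottom."""
    return "".join(ch for row in rows for ch in row if ch != filler)


# The S-shaped example grid from the text above.
s_shaped = [
    "^^^0^",
    "^^1^^",
    "^0^^^",
    "8^^^^",
    "^9^^^",
    "^^0^^",
    "^^^^0",
    "^^^1^",
    "^^2^^",
    "^3^^^",
    "4^^^^",
]
print(read_shaped_text(s_shaped))  # 01089001234
```

The recovered digit sequence is the kind of character string that would then be checked against the target pattern storehouse.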
In a preferred embodiment (with reference to Fig. 3), the process further comprises step S4 (not shown). The preferred embodiment is described in detail below with reference to Fig. 3. In step S1, out-of-sequence text-processing equipment 1 obtains the out-of-sequence text to be processed; then, in step S2, the equipment permutes and combines the characters in the out-of-sequence text to obtain one or more character strings corresponding to the out-of-sequence text. The detailed processes are the same as those performed in steps S1 and S2 of the embodiment described above with reference to Fig. 3; for brevity, they are incorporated herein by reference and are not repeated.
Specifically, in step S4, out-of-sequence text-processing equipment 1 selects one or more preferred character strings from the one or more character strings according to a preset screening rule; then, in step S3, the equipment performs a matching query in the target pattern storehouse according to the one or more preferred character strings, to obtain the target string. More specifically, in step S4, the equipment applies the preset screening rule to the one or more character strings obtained in step S2 and selects one or more preferred character strings from them. The screening rule includes, but is not limited to: 1) screening out character strings whose content duplicates that of another character string; 2) screening out character strings that consist only of special characters such as $, %, &, *, ~ and the space character; 3) selecting character strings that consist only of characters of a specific type, for example only Arabic numerals, only Chinese characters, or only double-byte characters. Then, in step S3, the equipment performs a matching query in the target pattern storehouse with the one or more preferred character strings obtained in step S4, to obtain the target string. Those skilled in the art will understand that the above manner of selecting preferred character strings is only an example; other existing or future manners of selecting preferred character strings, if applicable to the present invention, also fall within the protection scope of the present invention and are incorporated herein by reference.
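The three screening rules can be sketched as a single filter. The sketch below is illustrative only: the function name `screen`, the `digits_only` switch and the exact contents of `SPECIAL_CHARS` are assumptions, and only the "Arabic numerals" variant of rule 3 is shown.

```python
import re
from typing import Iterable, List

# Assumed set of special characters for rule 2 (taken from the examples in the text).
SPECIAL_CHARS = set("$%&*~ ")


def screen(strings: Iterable[str], digits_only: bool = False) -> List[str]:
    """Select preferred character strings from the candidate strings."""
    seen = set()
    preferred = []
    for s in strings:
        # Rule 1: screen out strings whose content duplicates an earlier one.
        if s in seen:
            continue
        seen.add(s)
        # Rule 2: screen out empty strings and strings made only of special characters.
        if not s or all(ch in SPECIAL_CHARS for ch in s):
            continue
        # Rule 3 (optional): keep only strings made solely of Arabic numerals.
        if digits_only and not re.fullmatch(r"[0-9]+", s):
            continue
        preferred.append(s)
    return preferred


candidates = ["123", "123", "$%&", "ab1", "  "]
print(screen(candidates))                    # ['123', 'ab1']
print(screen(candidates, digits_only=True))  # ['123']
```

Screening before the matching query reduces the number of strings that have to be compared against the target pattern storehouse.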
In a further preferred embodiment (with reference to Fig. 3), the process further comprises step S5 (not shown). The preferred embodiment is described in detail below with reference to Fig. 3. In step S1, out-of-sequence text-processing equipment 1 obtains the out-of-sequence text to be processed; in step S3, the equipment performs a matching query in the target pattern storehouse according to the one or more character strings, to obtain the target string in the out-of-sequence text. The detailed processes are the same as those performed in steps S1 and S3 of the embodiment described above with reference to Fig. 3; for brevity, they are incorporated herein by reference and are not repeated.
Specifically, in step S5, out-of-sequence text-processing equipment 1 preprocesses the out-of-sequence text according to a preset preprocessing rule to obtain a preprocessed text; then, in step S2, the equipment permutes and combines the characters in the preprocessed text to obtain one or more character strings corresponding to the preprocessed text. More specifically, in step S5, the equipment preprocesses the out-of-sequence text according to preset preprocessing rules, such as filtering out special characters and converting variant characters into normal characters, to obtain the preprocessed text; then, in step S2, the equipment permutes and combines the characters in the preprocessed text obtained in step S5 to obtain the one or more character strings corresponding to the preprocessed text. For example, suppose that the out-of-sequence text is:
Vertical # # #
The oblique # # of Hang
The capable # of Wen #
Zi # # literary composition
In step S5, out-of-sequence text-processing equipment 1 first performs a matching query in the special character library for each character of the out-of-sequence text and determines that the character '#' is a special character; the equipment then filters this character out of the out-of-sequence text and obtains the first preprocessing result:
Vertical
Hang is oblique
Wen is capable
The Zi literary composition
Then, out-of-sequence text-processing equipment 1 performs a matching query in the variant character library for each character of this first preprocessing result and accordingly converts the Martian-script characters into normal characters: 'Vertical' is converted into 'perpendicular', 'Hang' into 'row', 'Wen' into 'literary composition', and 'Zi' into 'word', thereby obtaining the second preprocessing result:
Perpendicular
Row tiltedly
The literary composition row
The word literary composition
This second preprocessing result is taken as the preprocessed text. Then, in step S2, out-of-sequence text-processing equipment 1 permutes and combines the characters in this preprocessed text to obtain the one or more character strings corresponding to it, such as "perpendicular style of writing word", "diagonal literary composition" and "tiltedly perpendicular". Here, the special character library in the illustrated embodiment stores predefined special characters, and includes but is not limited to a relational database, a memory store or a hard-disk store. The variant character library in the illustrated embodiment stores the mapping between variant characters, such as chrysanthemum script and Martian script, and the normal characters corresponding to them, and likewise includes but is not limited to a relational database, a memory store or a hard-disk store. Those skilled in the art will understand that the special character library may be independent of the variant character library or may be integrated into it. Those skilled in the art will understand that the above manner of preprocessing the out-of-sequence text is only an example; other existing or future manners of preprocessing out-of-sequence text, if applicable to the present invention, also fall within the protection scope of the present invention and are incorporated herein by reference.
Preferably, the preprocessing rule preprocesses the out-of-sequence text based on, but not limited to, at least one of the following:
- filtering out the special characters in the out-of-sequence text;
- converting the variant characters in the out-of-sequence text into normal characters;
- converting the half-width characters in the out-of-sequence text into full-width characters.
Specifically, the preprocessing rule may preprocess the out-of-sequence text by filtering out its special characters; such special characters include, but are not limited to, ^, *, |, ◎, [character reproduced as image BDA0000089706910000261 in the original publication], ⊙ and ★, and can be stored in the special character library. The preprocessing rule may preprocess the out-of-sequence text by converting its variant characters into normal characters; such variant characters include, but are not limited to, chrysanthemum script and Martian script, and can be stored in the variant character library. The preprocessing rule may preprocess the out-of-sequence text by converting its half-width characters into full-width characters, that is, by converting all single-byte characters in the out-of-sequence text into double-byte characters. Those skilled in the art will understand that the above preprocessing rules are only examples; other existing or future preprocessing rules, if applicable to the present invention, also fall within the protection scope of the present invention and are incorporated herein by reference.
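The three preprocessing rules can be chained as in the following sketch. It is a minimal illustration under assumptions: the contents of `SPECIAL_CHARS`, the circled-digit entries of `VARIANT_TO_NORMAL` (a stand-in for a real variant character library) and all function names are hypothetical. The half-width to full-width conversion relies on the Unicode layout in which the printable ASCII range U+0021 to U+007E maps to U+FF01 to U+FF5E at a fixed offset of 0xFEE0, with the space mapping to U+3000.

```python
# Assumed special character library (examples taken from the text).
SPECIAL_CHARS = set("^*|◎⊙★#")

# Hypothetical variant character library: variant form -> normal form.
VARIANT_TO_NORMAL = {"①": "1", "②": "2", "③": "3"}


def filter_special(text: str) -> str:
    """Rule 1: drop every character found in the special character library."""
    return "".join(ch for ch in text if ch not in SPECIAL_CHARS)


def normalize_variants(text: str) -> str:
    """Rule 2: replace each variant character with its normal form."""
    return "".join(VARIANT_TO_NORMAL.get(ch, ch) for ch in text)


def to_full_width(text: str) -> str:
    """Rule 3: convert half-width ASCII characters to full-width forms."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch == " ":
            out.append("\u3000")            # ideographic space
        elif 0x21 <= code <= 0x7E:
            out.append(chr(code + 0xFEE0))  # fixed offset to the full-width block
        else:
            out.append(ch)                  # newlines and non-ASCII stay unchanged
    return "".join(out)


def preprocess(text: str) -> str:
    """Apply the three rules in order: filter, normalize, widen."""
    return to_full_width(normalize_variants(filter_special(text)))


print(preprocess("^1*②#3"))  # １２３
```

Filtering runs first so that filler characters are removed before they could be widened into full-width forms that the special character library would no longer recognize.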
Preferably, in step S3, out-of-sequence text-processing equipment 1 performs the matching query in the target pattern storehouse according to the one or more character strings, to obtain the target string, based on at least one of the following modes:
- string matching;
- regular expression matching.
Specifically, where out-of-sequence text-processing equipment 1 performs the matching query in the target pattern storehouse by string matching, in step S3 the equipment, for example, compares each character string obtained in step S2 in turn with all the character strings retrieved from the target pattern storehouse; if a match succeeds, that character string is a target string in the out-of-sequence text. Where the equipment performs the matching query by regular expression matching, in step S3 it, for example, tests each character string obtained in step S2 in turn against all the regular expressions retrieved from the target pattern storehouse; if a character string satisfies a regular expression, that character string is a target string in the out-of-sequence text. Those skilled in the art will understand that each of the above matching modes may be used alone by out-of-sequence text-processing equipment 1 to perform the matching query on the character strings, and that several of them may also be combined for that purpose. Those skilled in the art will understand that the above matching modes are only examples; other existing or future matching modes, if applicable to the present invention, also fall within the protection scope of the present invention and are incorporated herein by reference.
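A minimal sketch of the two matching modes, combined in one query, is given below. The contents of `LITERAL_PATTERNS` and `REGEX_PATTERNS` are hypothetical stand-ins for a target pattern storehouse, and the two regular expressions are simplified illustrations rather than complete validators for telephone numbers or e-mail addresses.

```python
import re
from typing import Iterable, List

# Hypothetical target pattern storehouse.
LITERAL_PATTERNS = {"example.com"}
REGEX_PATTERNS = [
    re.compile(r"0\d{2,3}\d{7,8}"),                 # simplified fixed-line number
    re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+"),      # simplified e-mail address
]


def match_targets(strings: Iterable[str]) -> List[str]:
    """Return the candidate strings that match any stored pattern."""
    targets = []
    for s in strings:
        by_string = s in LITERAL_PATTERNS                          # string matching
        by_regex = any(p.fullmatch(s) for p in REGEX_PATTERNS)     # regex matching
        if by_string or by_regex:
            targets.append(s)
    return targets


candidates = ["01089001234", "hello", "user@example.com", "example.com"]
print(match_targets(candidates))
```

Whole-string matching (`fullmatch`) is used here for simplicity; a deployment that must find targets embedded in longer candidate strings would more likely use a search over substrings.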
Preferably, the target string includes, but is not limited to, at least one of the following:
- telephone number;
- Internet address;
- e-mail address;
- instant messaging account.
Specifically, the target string may be a telephone number, such as a fixed-line telephone number or a mobile phone number. The target string may be an Internet address, such as an IP address, a domain name, or another Uniform Resource Identifier (URI) used on the Internet. The target string may be an e-mail address, such as a 163 e-mail address or a Hotmail e-mail address. The target string may be an instant messaging account, such as a QQ account or an MSN account. Those skilled in the art will understand that the above target strings are only examples; other existing or future target strings, if applicable to the present invention, also fall within the protection scope of the present invention and are incorporated herein by reference.
In another preferred embodiment (with reference to Fig. 3), the process further comprises step S6 (not shown) and step S7 (not shown). The preferred embodiment is described in detail below with reference to Fig. 3. In step S2, out-of-sequence text-processing equipment 1 permutes and combines the characters in the out-of-sequence text to obtain one or more character strings corresponding to the out-of-sequence text; in step S3, the equipment performs a matching query in the target pattern storehouse according to the one or more character strings, to obtain the target string in the out-of-sequence text. The detailed processes are the same as those performed in steps S2 and S3 of the embodiment described above with reference to Fig. 3; for brevity, they are incorporated herein by reference and are not repeated.
Specifically, in step S1, out-of-sequence text-processing equipment 1 obtains the out-of-sequence text to be processed that a user submits through user equipment. More specifically, the user enters the out-of-sequence text in browser software, an application program or client software by interacting with the user equipment through means including, but not limited to, a keyboard, mouse, remote control, touch pad or handwriting device. Taking the keyboard as an example, the user enters the out-of-sequence text in an input text box of the application program and, by clicking a "submit" button or in another manner, triggers the user equipment to send the out-of-sequence text over the network to out-of-sequence text-processing equipment 1 according to an agreed communication protocol; out-of-sequence text-processing equipment 1 receives the out-of-sequence text in real time by listening for user messages. Here, the user equipment may be any electronic product capable of human-machine interaction with the user through a keyboard, mouse, remote control, touch pad or voice-control device, including but not limited to a computer, smartphone, PDA or IPTV. Out-of-sequence text-processing equipment 1 and the user equipment may communicate by any communication mode, including but not limited to mobile communication based on 3GPP, LTE or WiMAX, computer network communication based on the TCP/IP or UDP protocols, and short-range wireless transmission based on the Bluetooth or infrared transmission standards. The network connecting out-of-sequence text-processing equipment 1 and the user equipment includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN, or a wireless ad hoc network. Those skilled in the art will understand that the above manner of obtaining the out-of-sequence text is only an example; other existing or future manners of obtaining out-of-sequence text, if applicable to the present invention, also fall within the protection scope of the present invention and are incorporated herein by reference.
In step S6, out-of-sequence text-processing equipment 1 post-processes the out-of-sequence text according to the target string, to obtain a post-processed text corresponding to the out-of-sequence text; then, in step S7, the equipment provides the post-processed text to the user equipment. Specifically, in step S6, the equipment post-processes the out-of-sequence text according to the target string it obtained from that text in step S3. The post-processing modes include, but are not limited to: 1) deleting the target string from the out-of-sequence text; 2) replacing each character of the target string in the out-of-sequence text with another, meaningless character, such as a space; 3) deleting the entire content of the out-of-sequence text that contains the target string. Then, in step S7, the equipment provides the post-processed text obtained in step S6 to the user equipment by any known technical means by which a computer presents human-readable information, such as screen display or loudspeaker playback. Taking screen display as an example, in step S7 the equipment uses a page technology such as JSP, ASP or PHP to provide the post-processed text obtained in step S6 to the user equipment in a certain format, for example as a link or as page text, for the user to browse. Those skilled in the art will understand that the above manners of post-processing the out-of-sequence text and of providing the post-processed text are only examples; other existing or future manners of post-processing out-of-sequence text or of providing post-processed text, if applicable to the present invention, also fall within the protection scope of the present invention and are incorporated herein by reference.
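The three post-processing modes can be sketched as follows. Because the characters of a target string are usually scattered across an out-of-sequence text, the sketch assumes that the (row, column) position of each target character was recorded during permutation; that bookkeeping, the function name `postprocess` and the mode names are assumptions and are not described in the original text.

```python
from typing import Iterable, List, Tuple


def postprocess(rows: List[str],
                target_positions: Iterable[Tuple[int, int]],
                mode: str = "mask",
                mask_char: str = " ") -> List[str]:
    """Post-process a text grid given the positions of target characters."""
    hit = set(target_positions)
    if not hit:
        return list(rows)          # no target string: leave the text as it is
    if mode == "delete_all":
        return []                  # mode 3: delete the entire text
    result = []
    for r, row in enumerate(rows):
        if mode == "delete":       # mode 1: remove the target characters
            new_row = "".join(ch for c, ch in enumerate(row)
                              if (r, c) not in hit)
        elif mode == "mask":       # mode 2: replace them with a meaningless character
            new_row = "".join(mask_char if (r, c) in hit else ch
                              for c, ch in enumerate(row))
        else:
            raise ValueError(f"unknown mode: {mode}")
        result.append(new_row)
    return result


# Hypothetical grid whose second column spells the target string "123".
grid = ["a1", "b2", "c3"]
positions = [(0, 1), (1, 1), (2, 1)]
print(postprocess(grid, positions, mode="mask", mask_char="*"))  # ['a*', 'b*', 'c*']
```

Masking preserves the layout of the remaining text, whereas deletion shifts the characters that follow each removed position.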
Preferably, in step S6, out-of-sequence text-processing equipment 1 further post-processes the out-of-sequence text according to the target string in combination with user-related information, to obtain the post-processed text corresponding to the out-of-sequence text. Specifically, in step S6, the equipment post-processes the out-of-sequence text according to the target string in combination with user-related information such as the user's historical behavior records and user attributes. For example, suppose that the out-of-sequence text is a post made by a user in an online forum and contains target strings such as a telephone number and an e-mail address. In step S6, the equipment statistically analyzes the user's historical behavior records, finds that the interval between the user's consecutive posts is below a preset posting interval threshold, and determines that the posts are not made manually; accordingly, the equipment directly deletes the entire content of the out-of-sequence text. As another example, continuing the example above, in step S6 the equipment learns from the user attributes that the user is the moderator of a board of the online forum and that the user's reputation is higher than a preset reputation threshold; accordingly, the equipment does not post-process the out-of-sequence text at all and outputs it directly as the post-processed text. Here, the manners of obtaining user-related information include, but are not limited to: obtaining it from the login information left when the user logs in to a web page through the user equipment, or obtaining it from the user's historical behavior information extracted from the cookie information recorded, by the user equipment side or the network side, while the user browses web pages through the user equipment. Those skilled in the art will understand that the above manner of post-processing the out-of-sequence text is only an example; other existing or future manners of post-processing out-of-sequence text, if applicable to the present invention, also fall within the protection scope of the present invention and are incorporated herein by reference.
Preferably, the user-related information includes, but is not limited to, at least one of the following:
- the user's historical behavior records;
- user attributes;
- the address of the user equipment.
Specifically, the user-related information may comprise the user's historical behavior records, such as the times at which the user posted, the number of the user's posts, and the ratings of the user's posts. For example, the more of the user's past posts have been judged to be spam, the higher the probability that, in step S6, out-of-sequence text-processing equipment 1 deletes an out-of-sequence text containing a target string that this user has published. The user-related information may comprise user attributes, such as how long the user has been registered and the user's reputation. For example, the longer the user has been registered and the higher the user's reputation, the lower the probability that, in step S6, the equipment deletes an out-of-sequence text containing a target string that this user has published. The user-related information may comprise the address of the user equipment. For example, if the address of the user equipment is on a blacklist restricting access to the online forum, the probability that, in step S6, the equipment deletes an out-of-sequence text submitted through this user equipment is higher. Preferably, in step S6, the equipment post-processes the out-of-sequence text according to the target string in combination with any combination of the above user-related information, to obtain the post-processed text corresponding to the out-of-sequence text. Those skilled in the art will understand that the above user-related information is only an example; other existing or future user-related information, if applicable to the present invention, also fall within the protection scope of the present invention and are incorporated herein by reference.
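One way to combine the target string with user-related information is a small decision function, sketched below. The text describes these factors as raising or lowering a deletion probability; the sketch simplifies that into fixed rules. The `UserInfo` fields, the threshold values and the action names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UserInfo:
    min_post_interval_s: float   # shortest interval between consecutive posts
    is_moderator: bool           # user attribute
    reputation: float            # user attribute, assumed to lie in [0, 1]
    address_blacklisted: bool    # whether the user equipment address is blacklisted


def choose_action(user: UserInfo,
                  has_target: bool,
                  interval_threshold_s: float = 5.0,
                  reputation_threshold: float = 0.8) -> str:
    """Pick a post-processing action for a text submitted by this user."""
    if not has_target:
        return "keep"
    # Very rapid posting suggests automated posting; blacklisted addresses are distrusted.
    if user.address_blacklisted or user.min_post_interval_s < interval_threshold_s:
        return "delete_all"
    # A moderator with a high reputation is trusted: output the text unchanged.
    if user.is_moderator and user.reputation > reputation_threshold:
        return "keep"
    # Default: mask only the target string.
    return "mask"


spammer = UserInfo(0.5, False, 0.1, False)
moderator = UserInfo(600.0, True, 0.95, False)
regular = UserInfo(600.0, False, 0.5, False)
print(choose_action(spammer, True), choose_action(moderator, True), choose_action(regular, True))
```

A probabilistic variant closer to the wording of the text would compute a score from the same fields and compare it with a tunable threshold.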
It is obvious to those skilled in the art that the present invention is not limited to the details of the above exemplary embodiments, and that the present invention can be realized in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in all respects as exemplary and not restrictive. The scope of the present invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalents of the claims are therefore intended to be embraced by the present invention. No reference numeral in a claim should be construed as limiting the claim concerned. In addition, the word "comprising" obviously does not exclude other units or steps, and the singular does not exclude the plural. A plurality of units or devices recited in a system claim may also be implemented by a single unit or device, in software or in hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.

Claims (24)

1. the method for a computer implemented target string for obtaining out-of-sequence text, wherein, the method may further comprise the steps:
A obtains pending out-of-sequence text;
B carries out permutation and combination to the character in the described out-of-sequence text, obtains the one or more character strings corresponding with described out-of-sequence text;
C carries out matching inquiry according to described one or more character strings in the target pattern storehouse, to obtain the target string in the described out-of-sequence text.
2. method according to claim 1, wherein, described step b also comprises:
B1 generates the character matrix corresponding with described out-of-sequence text according to described out-of-sequence text, and wherein, each character in the described out-of-sequence text is corresponding to the character element of correspondence position in the described character matrix;
B2 carries out the permutation and combination of described character element to described character matrix by matrix operation, to obtain described one or more character string.
3. method according to claim 2, wherein, described step b2 also comprises:
-according to default out-of-sequence text, described character matrix is carried out the permutation and combination of described character element by the matrix operation corresponding with described out-of-sequence text, to obtain described one or more character string.
4. method according to claim 3, wherein, described out-of-sequence text comprise following at least each:
-perpendicular style of writing is originally;
-diagonal text;
-S composes a piece of writing this.
5. each described method in 4 according to claim 1, wherein, the method also comprises:
-according to the screening rule that presets, from described one or more character strings, select one or more preferred characters sequences;
Wherein, described step c also comprises:
-according to described one or more preferred characters sequences, carry out matching inquiry in described target pattern storehouse, to obtain described target string.
6. each described method in 5 according to claim 1, wherein, the method also comprises:
-according to default preprocessing rule described out-of-sequence text is carried out pre-service, obtain preprocessed text;
Wherein, described step b also comprises:
-character in the described preprocessed text is carried out permutation and combination, obtain the one or more character strings corresponding with described preprocessed text.
7. method according to claim 6, wherein, each carries out pre-service to described out-of-sequence text to described preprocessing rule at least based on following:
Specific character in the described out-of-sequence text of-filtering;
-the special-shaped literal in the described out-of-sequence text is converted into normal text;
-half size character in the described out-of-sequence text is converted to full size character.
8. each described method in 7 according to claim 1, wherein, described step c also comprises:
-according to described one or more character strings, based on following at least each mode, carry out matching inquiry in the target pattern storehouse, to obtain described target string:
-string matching;
-matching regular expressions.
9. each described method in 8 according to claim 1, wherein, described target string comprise following at least each:
-telephone number;
-internet address;
-E-mail address;
-instant messaging account.
10. each described method in 9 according to claim 1, wherein, described step a also comprises:
-obtain the user by the pending described out-of-sequence text of subscriber equipment submission;
Wherein, the method also comprises:
R carries out aftertreatment according to described target string to described out-of-sequence text, to obtain the aftertreatment text corresponding with described out-of-sequence text;
-described aftertreatment text is offered described subscriber equipment.
11. method according to claim 10, wherein, described step r also comprises:
-according to described target string, in conjunction with user related information, described out-of-sequence text is carried out aftertreatment, to obtain the aftertreatment text corresponding with described out-of-sequence text.
12. method according to claim 11, wherein, described user related information comprise following at least each:
-user historical behavior record;
-user property;
The address of-subscriber equipment.
13. the equipment for the target string that obtains out-of-sequence text, wherein, this equipment comprises:
The text deriving means is used for obtaining pending out-of-sequence text;
The permutation and combination device is used for the character of described out-of-sequence text is carried out permutation and combination, obtains the one or more character strings corresponding with described out-of-sequence text;
The matching inquiry device is used for according to described one or more character strings, carries out matching inquiry in the target pattern storehouse, to obtain the target string in the described out-of-sequence text.
14. equipment according to claim 13, wherein, described permutation and combination device also comprises:
The matrix generation unit is used for according to described out-of-sequence text, generates the character matrix corresponding with described out-of-sequence text, and wherein, each character in the described out-of-sequence text is corresponding to the character element of correspondence position in the described character matrix;
The matrix operation unit is used for described character matrix is carried out the permutation and combination of described character element by matrix operation, to obtain described one or more character string.
15. equipment according to claim 14, wherein, described matrix operation unit also is used for according to default out-of-sequence text, described character matrix is carried out the permutation and combination of described character element by the matrix operation corresponding with described out-of-sequence text, to obtain described one or more character string.
16. equipment according to claim 15, wherein, described out-of-sequence text comprise following at least each:
-perpendicular style of writing is originally;
-diagonal text;
-S composes a piece of writing this.
17. each described equipment in 16 according to claim 13, wherein, this equipment also comprises:
The character string screening plant is used for according to the screening rule that presets, and selects one or more preferred characters sequences from described one or more character strings;
Wherein, described matching inquiry device also is used for according to described one or more preferred characters sequences, carries out matching inquiry in described target pattern storehouse, to obtain described target string.
18. each described equipment in 17 according to claim 13, wherein, this equipment also comprises:
Pretreatment unit is used for according to default preprocessing rule described out-of-sequence text being carried out pre-service, obtains preprocessed text;
Wherein, described permutation and combination device also is used for the character of described preprocessed text is carried out permutation and combination, obtains the one or more character strings corresponding with described preprocessed text.
19. The equipment according to claim 18, wherein the preprocessing rule preprocesses the out-of-sequence text based on at least one of the following:
- filtering out specific characters in the out-of-sequence text;
- converting variant (non-standard) characters in the out-of-sequence text into standard characters;
- converting half-width characters in the out-of-sequence text into full-width characters.
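The three preprocessing operations listed above can be sketched in Python as follows. The set of filtered characters and the variant-to-standard mapping are hypothetical placeholders; the half-width to full-width conversion uses the fixed offset between ASCII and the Unicode Fullwidth Forms block.

```python
# Hypothetical examples only; a real rule set would be configured separately.
FILTERED_CHARS = set("*-#~")
VARIANT_MAP = {"①": "1", "②": "2", "③": "3"}


def filter_chars(text):
    """Remove the specific characters named in FILTERED_CHARS."""
    return "".join(ch for ch in text if ch not in FILTERED_CHARS)


def normalize_variants(text):
    """Replace variant characters with their standard counterparts."""
    return "".join(VARIANT_MAP.get(ch, ch) for ch in text)


def to_full_width(text):
    """Convert half-width ASCII characters to full-width characters."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch == " ":
            out.append("\u3000")            # ideographic space
        elif 0x21 <= code <= 0x7E:
            out.append(chr(code + 0xFEE0))  # U+FF01..U+FF5E
        else:
            out.append(ch)
    return "".join(out)


def preprocess(text):
    """Apply all three steps in one fixed (assumed) order."""
    return to_full_width(normalize_variants(filter_chars(text)))


print(preprocess("①*3-8"))
```

If the text is converted to full-width characters as the claim recites, the entries in the target pattern library would need to use the same width for matching to succeed.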
20. The equipment according to any one of claims 13 to 19, wherein the matching inquiry device is further configured to perform, according to the one or more character sequences, a matching inquiry in the target pattern library in at least one of the following ways, so as to obtain the target character string:
- string matching;
- regular expression matching.
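The two matching modes can be sketched with Python's standard `re` module. The contents of the pattern library below (one literal string and one regular expression for an 11-digit number beginning with 1) are hypothetical and stand in for whatever the target pattern library actually holds.

```python
import re

# Hypothetical target pattern library.
LITERAL_PATTERNS = ["example.com"]
REGEX_PATTERNS = [re.compile(r"1\d{10}")]


def match_targets(sequences):
    """Return every target character string found in the sequences."""
    found = []
    for seq in sequences:
        # String matching: plain substring search.
        for literal in LITERAL_PATTERNS:
            if literal in seq:
                found.append(literal)
        # Regular expression matching: collect every non-overlapping match.
        for regex in REGEX_PATTERNS:
            found.extend(m.group(0) for m in regex.finditer(seq))
    return found


print(match_targets(["visit example.com today", "call13800138000now"]))
```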
21. The equipment according to any one of claims 13 to 20, wherein the target character string comprises at least one of the following:
- a telephone number;
- an Internet address;
- an e-mail address;
- an instant messaging account.
22. The equipment according to any one of claims 13 to 21, wherein the text acquisition device is further configured to obtain the out-of-sequence text to be processed that a user submits through user equipment;
Wherein the equipment further comprises:
A post-processing device, configured to post-process the out-of-sequence text according to the target character string, so as to obtain post-processed text corresponding to the out-of-sequence text;
A providing device, configured to provide the post-processed text to the user equipment.
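The claim leaves the form of post-processing open. One possible form, assumed here for illustration, is to mask the characters of a target character string that was found by reading the text vertically. The function name `mask_column_target` and the mask character are hypothetical.

```python
def mask_column_target(lines, target, mask="*"):
    """Mask every occurrence of `target` that runs down a single column.

    `lines` are the rows of the out-of-sequence text. Rows are padded to
    equal width for the search; trailing whitespace is stripped from the
    result, so original trailing spaces are not preserved.
    """
    if not lines or not target:
        return list(lines)
    width = max(len(line) for line in lines)
    grid = [list(line.ljust(width)) for line in lines]
    for col in range(width):
        column = "".join(row[col] for row in grid)
        start = column.find(target)
        while start != -1:
            for r in range(start, start + len(target)):
                grid[r][col] = mask
            start = column.find(target, start + len(target))
    return ["".join(row).rstrip() for row in grid]


text = ["call 1 now", "text 3 you", "ring 8 bye"]
for line in mask_column_target(text, "138"):
    print(line)
```

The masked lines would then be the post-processed text that is returned to the user equipment.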
23. The equipment according to claim 22, wherein the post-processing device is further configured to post-process the out-of-sequence text according to the target character string in combination with user-related information, so as to obtain the post-processed text corresponding to the out-of-sequence text.
24. The equipment according to claim 23, wherein the user-related information comprises at least one of the following:
- the user's historical behavior records;
- user attributes;
- the address of the user equipment.
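How the user-related information influences post-processing is not specified in the claims. The sketch below shows one hypothetical policy in which the three kinds of information select among passing, masking, flagging, or rejecting a submission. Every field name, threshold, and address in it is an assumption.

```python
from dataclasses import dataclass


@dataclass
class UserInfo:
    prior_violations: int  # derived from the user's historical behavior records
    is_verified: bool      # an example of a user attribute
    device_address: str    # address of the user equipment, e.g. an IP address


# Hypothetical deny list; 203.0.113.0/24 is reserved for documentation.
BLOCKED_ADDRESSES = {"203.0.113.7"}


def choose_action(targets, user):
    """Pick a post-processing action from the targets found and the user info."""
    if not targets:
        return "pass"
    if user.device_address in BLOCKED_ADDRESSES or user.prior_violations >= 3:
        return "reject"
    if user.is_verified:
        return "flag"
    return "mask"


user = UserInfo(prior_violations=0, is_verified=False, device_address="198.51.100.2")
print(choose_action(["13800138000"], user))
```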
CN201110264447.6A 2011-09-07 2011-09-07 Method and device used for obtaining target character strings in disorder text Active CN102982012B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110264447.6A CN102982012B (en) 2011-09-07 2011-09-07 Method and device used for obtaining target character strings in disorder text

Publications (2)

Publication Number Publication Date
CN102982012A true CN102982012A (en) 2013-03-20
CN102982012B CN102982012B (en) 2017-03-22

Family

ID=47856054

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110264447.6A Active CN102982012B (en) 2011-09-07 2011-09-07 Method and device used for obtaining target character strings in disorder text

Country Status (1)

Country Link
CN (1) CN102982012B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4831775A (en) * 1988-11-21 1989-05-23 Sherman Daniel A Spill resistant rodent bait station
CN1369831A (en) * 2001-01-15 2002-09-18 精工爱普生株式会社 Character processing method, its device and storage medium
CN101876965A (en) * 2009-04-30 2010-11-03 国际商业机器公司 Method and system used for processing text
CN101986368A (en) * 2010-11-16 2011-03-16 无敌科技(西安)有限公司 Language learning electronic device and transposition word display method thereof
CN102110229A (en) * 2009-12-29 2011-06-29 欧姆龙株式会社 Word recognition method, and information processing device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
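Li Haizhu (李海柱): "Research on font embedding and vertical text layout in Mongolian web pages" (蒙古文网页中字体嵌入和文字竖排研究), Journal of Inner Mongolia Normal University (Natural Science Chinese Edition) (《内蒙古师范大学学报(自然科学汉文版)》), vol. 34, no. 1, 31 March 2005 (2005-03-31), pages 48 - 52 *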

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104111932A (en) * 2013-04-17 2014-10-22 北京启明星辰信息技术股份有限公司 Recognition method and device of ID (identity) card numbers
CN104111932B (en) * 2013-04-17 2018-02-13 北京启明星辰信息技术股份有限公司 A kind of recognition methods of ID card No. and device
CN104699662A (en) * 2015-03-18 2015-06-10 北京交通大学 Method and device for recognizing whole symbol string
CN104699662B (en) * 2015-03-18 2017-12-22 北京交通大学 The method and apparatus for identifying overall symbol string
CN106547727A (en) * 2015-09-23 2017-03-29 北京国双科技有限公司 The method and device of string processing
CN106815249A (en) * 2015-11-30 2017-06-09 腾讯科技(深圳)有限公司 Vertical text advertisements filter method and device
CN106815249B (en) * 2015-11-30 2022-01-07 腾讯科技(深圳)有限公司 Vertical text advertisement filtering method and device
CN111767908A (en) * 2019-04-02 2020-10-13 顺丰科技有限公司 Character detection method, device, detection equipment and storage medium
CN115410207A (en) * 2021-05-28 2022-11-29 国家计算机网络与信息安全管理中心天津分中心 Detection method and device for vertical texts
CN115410207B (en) * 2021-05-28 2023-08-29 国家计算机网络与信息安全管理中心天津分中心 Detection method and device for vertical text

Also Published As

Publication number Publication date
CN102982012B (en) 2017-03-22

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant