Publication number: CN104111932 A
Publication type: Application
Application number: CN 201310134135
Publication date: 22 Oct 2014
Filing date: 17 Apr 2013
Priority date: 17 Apr 2013
Publication number: 201310134135.2, CN 104111932 A, CN 104111932A, CN 201310134135, CN-A-104111932, CN104111932 A, CN104111932A, CN201310134135, CN201310134135.2
Inventors: 许金鹏, 王洋, 李健安, 薛萍
Applicant: 北京启明星辰信息技术股份有限公司, 北京启明星辰信息安全技术有限公司
Recognition method and device of ID (identity) card numbers
CN 104111932 A
Abstract
The invention discloses a method and device for recognizing ID (identity) card numbers, addressing the current lack of a mature method for recognizing ID card numbers in network data. The method includes: identifying, in a character stream, a character string that may be an ID card number; verifying that character string against the ID card number encoding rules; and taking a character string that passes verification as a possibly valid ID card number. The method and device of the embodiments can filter and recognize high-traffic web page content in the network security field with high accuracy, high speed, and a small memory footprint on hardware devices.
Claims (10), translated from Chinese
1. A method for recognizing ID card numbers, comprising: identifying, in a character stream, a character string that may be an ID card number; verifying the character string that may be an ID card number against the ID card number encoding rules; and taking a character string that passes verification as a possibly valid ID card number.
2. The method of claim 1, wherein identifying, in the character stream, the character string that may be an ID card number comprises: performing character recognition on the character stream using a hash table capable of distinguishing valid digits, ignorable characters, illegal characters, and possible terminators; and recognizing a string consisting of eighteen digits, or of seventeen digits and one X or x, as the character string that may be an ID card number.
3. The method of claim 1, wherein verifying the character string against the ID card number encoding rules comprises: verifying the address code, the date-of-birth code, and the check code of the ID card number against the ID card number encoding rules.
4. The method of claim 3, wherein verifying the address code of the ID card number comprises: using a two-dimensional array to verify the address code in the first six characters of the character string that may be an ID card number.
5. The method of claim 4, wherein: the first dimension of the two-dimensional array is used to verify the major administrative region number of the address code in the first three characters of the character string, and the second dimension of the two-dimensional array is used to verify the district number of the address code in the fourth to sixth characters of the character string; wherein the length of the first dimension is greater than or equal to the number of valid major administrative region numbers represented by three digits, and the length of the second dimension is greater than or equal to the number of district numbers represented by three digits within all valid major administrative region numbers.
6. A device for recognizing ID card numbers, comprising: a recognition module configured to identify, in a character stream, a character string that may be an ID card number; a verification module configured to verify the character string that may be an ID card number against the ID card number encoding rules; and an execution module configured to take a character string that passes verification as a possibly valid ID card number.
7. The device of claim 6, wherein the recognition module comprises: a recognition unit configured to perform character recognition on the character stream using a hash table capable of distinguishing valid digits, ignorable characters, illegal characters, and possible terminators; and a judgment unit configured to recognize a string consisting of eighteen digits, or of seventeen digits and one X or x, as the character string that may be an ID card number.
8. The device of claim 6, wherein the verification module comprises: a first verification unit configured to verify the address code of the ID card number against the ID card number encoding rules; a second verification unit configured to verify the date-of-birth code of the ID card number against the ID card number encoding rules; and a third verification unit configured to verify the check code of the ID card number against the ID card number encoding rules.
9. The device of claim 8, wherein: the first verification unit is configured to use a two-dimensional array to verify the address code in the first six characters of the character string that may be an ID card number.
10. The device of claim 9, wherein: the first verification unit is configured to use the first dimension of the two-dimensional array to verify the major administrative region number of the address code in the first three characters of the character string, and to use the second dimension of the two-dimensional array to verify the district number of the address code in the fourth to sixth characters of the character string; wherein the length of the first dimension is greater than or equal to the number of valid major administrative region numbers represented by three digits, and the length of the second dimension is greater than or equal to the number of district numbers represented by three digits within all valid major administrative region numbers.
Description, translated from Chinese
Method and device for recognizing ID card numbers

Technical Field

[0001] The present invention relates to number recognition technology, and in particular to a method and device for recognizing ID card numbers.

Background

[0002] ID card number recognition has important applications in the network security field, such as intrusion detection, short message filtering, and information queries. With the development of network technology and the spread of the Internet, the security and protection of network information such as personal information is receiving more and more attention.

[0003] In this document, recognizing an ID card number means that a network information processing system identifies and extracts valid ID card numbers from network data or from file data (such as web pages). Such numbers may have been published on the network by someone in violation of the rules, which can easily leak private personal information.

[0004] Improper leakage of ID card information can be countered by recognizing ID card numbers and then raising alerts or hiding them. At present, however, there is no reasonably mature technology for doing so.

Summary of the Invention

[0005] The technical problem to be solved by the present invention is the current lack of a reasonably mature way to recognize ID card numbers in network data.

[0006] To solve this technical problem, the present invention provides a method for recognizing ID card numbers, comprising:

[0007] identifying, in a character stream, a character string that may be an ID card number;

[0008] verifying the character string that may be an ID card number against the ID card number encoding rules;

[0009] taking a character string that passes verification as a possibly valid ID card number.

[0010] Preferably, identifying, in the character stream, the character string that may be an ID card number comprises:

[0011] performing character recognition on the character stream using a hash table capable of distinguishing valid digits, ignorable characters, illegal characters, and possible terminators;

[0012] recognizing a string consisting of eighteen digits, or of seventeen digits and one X or x, as the character string that may be an ID card number.

[0013] Preferably, verifying the character string that may be an ID card number against the ID card number encoding rules comprises:

[0014] verifying the address code, the date-of-birth code, and the check code of the ID card number against the ID card number encoding rules.

[0015] Preferably, verifying the address code of the ID card number against the ID card number encoding rules comprises:

[0016] using a two-dimensional array to verify the address code in the first six characters of the character string that may be an ID card number.

[0017] Preferably, the first dimension of the two-dimensional array is used to verify the major administrative region number of the address code in the first three characters of the character string, and the second dimension of the two-dimensional array is used to verify the district number of the address code in the fourth to sixth characters of the character string;

[0018] wherein the length of the first dimension of the two-dimensional array is greater than or equal to the number of valid major administrative region numbers represented by three digits, and the length of the second dimension is greater than or equal to the number of district numbers represented by three digits within all valid major administrative region numbers.

[0019] This application also provides a device for recognizing ID card numbers, comprising:

[0020] a recognition module configured to identify, in a character stream, a character string that may be an ID card number;

[0021] a verification module configured to verify the character string that may be an ID card number against the ID card number encoding rules;

[0022] an execution module configured to take a character string that passes verification as a possibly valid ID card number.

[0023] Preferably, the recognition module comprises:

[0024] a recognition unit configured to perform character recognition on the character stream using a hash table capable of distinguishing valid digits, ignorable characters, illegal characters, and possible terminators;

[0025] a judgment unit configured to recognize a string consisting of eighteen digits, or of seventeen digits and one X or x, as the character string that may be an ID card number.

[0026] Preferably, the verification module comprises:

[0027] a first verification unit configured to verify the address code of the ID card number against the ID card number encoding rules;

[0028] a second verification unit configured to verify the date-of-birth code of the ID card number against the ID card number encoding rules;

[0029] a third verification unit configured to verify the check code of the ID card number against the ID card number encoding rules.

[0030] Preferably, the first verification unit is configured to use a two-dimensional array to verify the address code in the first six characters of the character string that may be an ID card number.

[0031] Preferably, the first verification unit is configured to use the first dimension of the two-dimensional array to verify the major administrative region number of the address code in the first three characters of the character string, and to use the second dimension of the two-dimensional array to verify the district number of the address code in the fourth to sixth characters of the character string;

[0032] wherein the length of the first dimension of the two-dimensional array is greater than or equal to the number of valid major administrative region numbers represented by three digits, and the length of the second dimension is greater than or equal to the number of district numbers represented by three digits within all valid major administrative region numbers.

[0033] Compared with the prior art, the embodiments of this application can filter and recognize high-traffic web page content in the network security field with high accuracy, high speed, and a small memory footprint on hardware devices. By mapping the values of the three-digit major region code onto a valid interval of 0-149, the embodiments save memory and reduce cost.

[0034] Other features and advantages of the invention are set out in the description that follows; in part they will be apparent from the description, or may be learned by practicing the invention. The objectives and other advantages of the invention can be realized and obtained through the structures particularly pointed out in the description, the claims, and the drawings.

Brief Description of the Drawings

[0035] The drawings provide a further understanding of the technical solution of the invention and form part of the description. Together with the embodiments of this application they serve to explain the technical solution of the invention and do not limit it.

[0036] FIG. 1 is a schematic flowchart of the ID card number recognition method according to an embodiment of this application.

[0037] FIG. 2 is a schematic flowchart of the ID card number recognition system according to an embodiment of this application.

Detailed Description

[0038] Implementations of the invention are described in detail below with reference to the drawings and embodiments, so that the way the invention applies technical means to solve the technical problem and achieve the technical effect can be fully understood and put into practice. The embodiments of this application and the individual features of those embodiments may be combined with one another where they do not conflict, and all such combinations fall within the scope of protection of the invention.

[0039] In addition, the steps shown in the flowcharts of the drawings can be executed in a computer system, for example as a set of computer-executable instructions. Although the flowcharts show a logical order, in some cases the steps shown or described may be executed in a different order.

[0040] The second-generation ID card currently used in China carries an eighteen-character number. Under its encoding rules, the number consists of a seventeen-digit body code followed by a one-character check code. The seventeen-digit body code contains, in order, a six-digit address code, an eight-digit date-of-birth code, and a three-digit sequence code. The six-digit address code is the administrative division code of the place where the person's permanent household registration is located. The date-of-birth code gives the year, month, and day of birth, with a four-digit year and no separators between year, month, and day. The sequence code is the serial number assigned to people born on the same day within the area identified by the same address code. The check code is computed from the preceding seventeen digits using the ISO 7064:1983 MOD 11-2 check algorithm.

[0041] In the six-digit address code, the first and second digits are the province code of the person's permanent household registration, the third and fourth digits are the city code, and the fifth and sixth digits are the district or county code. Alternatively, the first three digits of the address code can be treated as a major administrative region number and the last three digits as a district number.

[0042] The eight-digit date-of-birth code consists, in order, of a four-digit year code, a two-digit month code, and a two-digit day code. Valid year codes are from 1900 to 2100 inclusive. Valid month codes are from 1 to 12 inclusive. Valid day codes are from 1 to 31 when the month code is 1, 3, 5, 7, 8, 10, or 12; from 1 to 30 when the month code is 4, 6, 9, or 11; from 1 to 29 when the month code is 2 and the year code is a leap year; and from 1 to 28 when the month code is 2 and the year code is a common year.
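The date-of-birth rules in [0042] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `is_leap_year` and `is_valid_birth_date` are hypothetical, and the Gregorian leap-year rule is an assumption, since the text only says "leap year".

```python
def is_leap_year(year: int) -> bool:
    # Assumption: standard Gregorian rule; the patent does not spell it out.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def is_valid_birth_date(code: str) -> bool:
    """Check an 8-character YYYYMMDD date-of-birth code against [0042]."""
    if len(code) != 8 or not all(c in "0123456789" for c in code):
        return False
    year, month, day = int(code[:4]), int(code[4:6]), int(code[6:])
    if not 1900 <= year <= 2100:
        return False
    if not 1 <= month <= 12:
        return False
    if month in (1, 3, 5, 7, 8, 10, 12):
        max_day = 31
    elif month in (4, 6, 9, 11):
        max_day = 30
    else:  # February
        max_day = 29 if is_leap_year(year) else 28
    return 1 <= day <= max_day
```

For instance, `is_valid_birth_date("19621230")` accepts the date that appears in the worked example of [0061], while a 31st of April or a 29 February in a common year is rejected.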

[0043] In the three-digit sequence code, the first two digits are the allocation code of the police station under the local government of the person's permanent household registration, and the third digit indicates sex by its parity. The embodiments of this application do not verify or confirm the three-digit sequence code.

[0044] The one-character check code can be recomputed from the seventeen-digit body code using the standard algorithm. If the check code equals the computed result, the seventeen-digit body code and check code form a genuine, valid ID card number; if it does not match the computed result, they form an invalid ID card number.
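The check-code recomputation in [0044] can be sketched as follows. The patent names only the algorithm (ISO 7064 MOD 11-2); the weight table and check-character string below are the usual published constants for eighteen-character Chinese ID numbers and are supplied here as an assumption, as are the function names.

```python
# Position weights 2**(17 - i) mod 11 for i = 0..16 (MOD 11-2).
WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
# Check character indexed by (weighted sum mod 11); a value of 10 is written X.
CHECK_CHARS = "10X98765432"


def compute_check_char(body: str) -> str:
    """Return the check character for a 17-digit body code."""
    if len(body) != 17 or not all(c in "0123456789" for c in body):
        raise ValueError("body code must be exactly 17 ASCII digits")
    total = sum(int(c) * w for c, w in zip(body, WEIGHTS))
    return CHECK_CHARS[total % 11]


def has_valid_check_char(candidate: str) -> bool:
    """True if the 18th character matches the recomputed check character."""
    if len(candidate) != 18:
        return False
    try:
        expected = compute_check_char(candidate[:17])
    except ValueError:
        return False
    # The terminator may arrive as X or x, so compare case-insensitively.
    return candidate[17].upper() == expected
```

A candidate is accepted only when the recomputed character equals the one read from the stream, which is the equality test the paragraph above describes.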

[0045] In real life, to make ID card numbers easier to read and write, people often insert auxiliary characters such as dashes or dots into the middle of the number to split it into sections.

[0046] As shown in FIG. 1, the ID card number recognition method of an embodiment of this application mainly includes the following.

[0047] S110: use a hash table to identify, in the character stream of the network data, a character string that may be an ID card number.

[0048] The embodiments of this application perform character recognition using a 256-entry character hash table, indexed by 8-bit byte value, that can distinguish valid digits, ignorable characters, illegal characters, and possible terminators. The table assigns a different value to each of these classes. Specifically, the embodiments set the value for valid digits to 1, for ignorable characters to 2, for possible terminators to 3, and for illegal characters to 4.

[0049] In the embodiments of this application, the character string contains ignorable auxiliary characters, digits, and possibly a terminator, where the terminator is the character X or x. If the string contains the terminator X or x, it contains seventeen digits; if it does not, it contains eighteen digits. A string consisting of eighteen digits, or of seventeen digits and one X or x, is recognized as a character string that may be an ID card number.

[0050] S120: verify the character string against the ID card number encoding rules. This verification mainly checks the address code, the date-of-birth code, and the check code of the ID card number against the encoding rules.

[0051] S130: take a character string that passes verification as a possibly valid ID card number.

[0052] The embodiments of this application build a 256-entry character HASH table covering every possible value of an 8-bit character, setting the value for valid digits to 1, for ignorable characters to 2, for possible terminators to 3, and for illegal characters to 4. The value 1 marks the 10 valid digits, the characters 0 through 9. The value 2 marks ignorable separator characters; in this embodiment the 9 ignorable characters are: space, tab, carriage return, line feed, dot (.), dash (-), equals sign (=), slash (/), and backslash (\). The value 3 marks the characters X and x, which may terminate an ID card number. Every character that is not one of these 10 digits, an ignorable character, or a terminator is treated as an invalid character, with the value 4.
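The 256-entry table of [0052] could be built along the following lines. This is a sketch under assumptions: the constant names and `build_char_table` are hypothetical, and the class values follow the ones used by the per-character processing in [0054] (1 digit, 2 ignorable, 3 terminator, 4 illegal).

```python
DIGIT, IGNORABLE, TERMINATOR, ILLEGAL = 1, 2, 3, 4

# The 9 ignorable separators listed in [0052].
DEFAULT_IGNORABLE = b" \t\r\n.-=/\\"


def build_char_table(ignorable: bytes = DEFAULT_IGNORABLE) -> bytes:
    """Return a 256-entry table mapping each byte value to its class."""
    table = bytearray([ILLEGAL]) * 256   # everything is illegal by default
    for b in b"0123456789":
        table[b] = DIGIT
    for b in ignorable:
        table[b] = IGNORABLE
    for b in b"Xx":
        table[b] = TERMINATOR
    return bytes(table)


CHAR_TABLE = build_char_table()
```

Classifying a byte is then a single index operation, for example `CHAR_TABLE[ord("-")]`. Passing a longer `ignorable` argument is one way to admit further separators in other embodiments.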

[0053] In other embodiments of this application there may be more ignorable separator characters, such as the exclamation mark (!), asterisk (*), and hash sign (#). Any character that can appear in the middle of an ID card number to split it into sections for easier reading and writing can be treated as an ignorable character in the embodiments of this application, with the value 2 in the character hash table.

[0054] When the input string is read, the type attribute of each character is looked up in the character HASH table and a different action is taken for each type. One character of the string to be processed is read at a time, and that character is used as the index into the character HASH table. A valid character (value 1) is recorded if the recorded length is less than 19, and is otherwise discarded. An ignorable character (value 2) is discarded directly. For a terminator (value 3): if the recorded string has length 18, it may already be a valid ID card number and extraction is complete; if the recorded string has length 17, the terminator is appended and the string is taken as a possibly valid ID card number, completing extraction; a recorded string of any other length is not an ID card number and is cleared (the buffer is cleared). For an illegal character (value 4): if the recorded string has length 18, it is submitted as a possibly valid ID card number; otherwise the recorded string is cleared (the buffer is cleared).
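The per-character processing of [0054] can be sketched as a single pass over the byte stream. The names below are hypothetical. Two points are assumptions not stated in the text: the buffer is cleared after every submission, and the end of the input is treated like an illegal character.

```python
DIGIT, IGNORABLE, TERMINATOR, ILLEGAL = 1, 2, 3, 4

# 256-entry class table: digits, the 9 separators of [0052], and X/x.
_table = bytearray([ILLEGAL]) * 256
for _b in b"0123456789":
    _table[_b] = DIGIT
for _b in b" \t\r\n.-=/\\":
    _table[_b] = IGNORABLE
for _b in b"Xx":
    _table[_b] = TERMINATOR
CHAR_TABLE = bytes(_table)


def extract_candidates(data: bytes) -> list:
    """Return every string in `data` that may be an ID card number."""
    found = []
    buf = bytearray()
    for b in data:
        kind = CHAR_TABLE[b]          # one table lookup per input byte
        if kind == DIGIT:
            if len(buf) < 19:         # record; further digits are dropped
                buf.append(b)
        elif kind == IGNORABLE:
            continue                  # separators are skipped
        elif kind == TERMINATOR:
            if len(buf) == 17:
                buf.append(b)         # 17 digits plus X or x
                found.append(buf.decode("ascii"))
            elif len(buf) == 18:
                found.append(buf.decode("ascii"))
            buf.clear()
        else:                         # illegal character
            if len(buf) == 18:
                found.append(buf.decode("ascii"))
            buf.clear()
    if len(buf) == 18:                # assumption: end of input ends a run
        found.append(buf.decode("ascii"))
    return found
```

Because digits keep being recorded up to a length of 19, a run of 19 or more digits never has length 18 when it ends, so it is cleared rather than submitted. The candidates returned here have only the right shape; they still need the address, date, and check-code verification of step S120.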

[0055] In the character HASH table built by the embodiments of this application, the entry for each ASCII code holds the attribute of that code. For example, hash[48] is 1: ASCII code 48 is the character '0', so its value is 1 (valid ID card character). hash[45] is 2: ASCII code 45 is the character '-', so its value is 2 (ignorable separator character). Using the preset character HASH table, the embodiments index the table with each character of the data stream under inspection to obtain that character's type. To classify any character a, only the value of hash[a] needs to be checked.

[0056] With a single table lookup per character, the embodiments of this application handle each character: valid ID card characters are recorded, invalid characters trigger termination handling, ignorable characters are filtered out, and terminators are checked. Combined with the ID card number length, this quickly yields strings that match the required length, completing the extraction and verification of ID card numbers.

[0057] The embodiments of this application split the six-digit address code into a three-digit major administrative region number and a three-digit district number, and use a two-dimensional array to verify the first six characters of a character string that may be an ID card number. The length of the first dimension of the two-dimensional array is greater than or equal to the number of valid major administrative region numbers represented by three digits, and the length of the second dimension is greater than or equal to the number of district numbers represented by three digits within all valid major administrative region numbers. The first dimension is used to verify the major administrative region number in the first three characters of the string, and the second dimension is used to verify the district number in the fourth to sixth characters. The first dimension records all major administrative region numbers found in valid ID card numbers, and the second dimension records the valid district numbers under every valid major administrative region number. In the embodiments of this application, the first dimension is served by a one-dimensional major-region hash table and the second by a two-dimensional administrative-district hash table; together these two tables allow the six-digit address code to be checked quickly.

[0058] 二维数组第一维的长度大于等于三位数字所表示的有效的大行政区号码的数量, 第二维的长度大于等于所有有效的大行政区号码内三位数字所表示的区内号码的数量。 [0058] The two-dimensional array of length greater than the first dimension equal to the number of valid administrative regions represented three-digit number, the second dimension of length greater than or equal effective administrative regions in all three-digit numbers represented the district number number. 由于大行政区号码有许多空值,本申请的实施例为节省硬件设备的内存空间,利用大区号码映射表对三位数字的大行政区号码进行验证,该大区号码映射表的空间大于〇但小于将三位数字的大区代码的数值0-999映射为一个数值小于999的有效区间,比如为0-149的有效区间。 Due to the large number of administrative regions have many null values, the embodiment of the present application in order to save memory space hardware, using a large area map for the three-digit number of large administrative number to verify that the region is greater than the number of the mapping table space square but less than the value of three-digit area code 0-999 large mapped to a value less than 999 valid range, for example, the effective range of 0-149. 该大区号码映射表中存储有目前全国身份证号码的大行政区号码。 The region number mapping table stores the current national identity card number of large administrative number.

[0059] 本申请的实施例为利用长度为1000项的区内号码映射表对每个三位数字的区内号码进行验证,其中存储三位数字的区内号码,用来对大行政区号码之后的三位数字进行验证。 [0059] Examples of the application for the use of a length of 1000 map of the region number be validated for each three-digit zone number, which is stored in the area of three-digit number, used after number of administrative regions three-digit number for verification. 通过两次查表,即可完成六位数字地址码的有效性确认。 Through two look-up table to complete the six-digit address validity confirmation code.

[0060] To save space, an embodiment of the present application builds district hash tables to verify the six-digit address code of an ID card number quickly. A mapping is built for the three-digit major administrative region number that maps the 1000 possible values onto 150 entry values, and a 150 × 1000 two-dimensional array is then built to confirm the six-digit address code. The region hash table performs the hash mapping of the major administrative region number. Because many values in the major administrative region number space are unused, this embodiment maps the number onto a range of length 150, which greatly reduces the memory required on the hardware device. In the region hash table, entries 0 to 148 each correspond to one valid major administrative region number, and entry 149 denotes an invalid region number. The value read from the region hash table serves as the first subscript of the two-dimensional array, and the intra-region number in the ID card number serves as the second subscript. In the two-dimensional district hash table, a value of 0 means that the number is not a valid address code, and a value of 1 means that it is a valid address code.
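The two-lookup check can be sketched in Python as follows. This is an illustration under stated assumptions: the table names `hash1` and `hash2` follow the expression used in the worked example of this description, while `is_valid_address_code` is a hypothetical helper. The tables are populated with a single sample entry (region 110 mapped to slot 1, district 108 marked valid) rather than real national data.

```python
INVALID_REGION = 149

# First level: 1000 entries, three-digit region number -> compact slot (0-149).
hash1 = [INVALID_REGION] * 1000
# Second level: 150 rows x 1000 columns of 0/1 flags, one byte per flag.
hash2 = [bytearray(1000) for _ in range(150)]

# Sample data only, mirroring the values in the worked example.
hash1[110] = 1
hash2[1][108] = 1


def is_valid_address_code(candidate: str) -> bool:
    """Check the first six characters of a candidate with two table lookups."""
    head = candidate[:6]
    if len(head) != 6 or not (head.isascii() and head.isdigit()):
        return False
    slot = hash1[int(head[:3])]              # lookup 1: region number -> slot
    return hash2[slot][int(head[3:])] == 1   # lookup 2: (slot, district) -> flag
```

Row 149 is left all zeros in this sketch, so an unknown region number fails the second lookup without a separate branch.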

[0061] For example, consider the string "aaaa123asd110.108-196212302873uuuuuuuuuu" obtained from a web page. The characters "aaaa" are read first, one at a time; the character hash table identifies all of them as invalid characters, and they are discarded. The characters "123" are read next; the character hash table identifies them as valid characters, and they are recorded. Then "asd" is read. When "a" is read, the recorded string contains 3 digits, which is neither 15 nor 18, the lengths of a valid ID card number, so "a" is discarded as an invalid character. The characters "s" and "d" are discarded in the same way, and the recorded string "123" can no longer be part of a valid ID card number, so it is discarded as well. Next, "110.108-196212302873" is read. The character hash table identifies "110", "108" and "196212302873" as valid characters, which are recorded in order, and identifies "." and "-" as ignorable characters, which are skipped. The recorded string "110108196212302873" has now reached 18 characters, so it is stored as a possible ID card number and checked to determine whether it may be a valid one. The region check expression Hash2[hash1[110]][108] is evaluated: the hash value of region number 110 is 1, and entry Hash2[1][108] is 1, meaning valid, so the first six characters "110108" form a valid address code. "19621230" is then checked to determine whether it is a valid date-of-birth code. Finally, the ISO 7064:1983 MOD 11-2 check algorithm is applied to the last character "3" of the string "110108196212302873". In this way it can be determined whether the string "110108196212302873" is a possibly valid ID card number.
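The final check-character step can be sketched as follows. This is a minimal Python illustration, not the patent's implementation. The weights and the check-character table are the ones commonly published for 18-character resident ID numbers under ISO 7064 MOD 11-2; the patent text does not list them, so they are an assumption here, as are the function names.

```python
# Position weights for the first 17 digits (2**(18 - i) mod 11 for i = 1..17).
WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
# Check character selected by the weighted sum modulo 11.
CHECK_CHARS = "10X98765432"


def expected_check_char(first17: str) -> str:
    """Compute the check character implied by the first 17 digits."""
    total = sum(int(digit) * weight for digit, weight in zip(first17, WEIGHTS))
    return CHECK_CHARS[total % 11]


def has_valid_check_char(candidate: str) -> bool:
    """Return True if the 18th character matches the computed check character."""
    if len(candidate) != 18:
        return False
    body = candidate[:17]
    if not (body.isascii() and body.isdigit()):
        return False
    return expected_check_char(body) == candidate[17].upper()
```

By this sketch's arithmetic, the first 17 digits of the sample string give a weighted sum of 223, and 223 mod 11 is 3, which selects the check character 9. Under the assumed weights, `has_valid_check_char("110108196212302873")` therefore returns `False`, so the sample would be rejected at this last step.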

[0062] As shown in Figure 2, the ID card number recognition device of this embodiment mainly comprises a recognition module 210, a verification module 220 and an execution module 230.

[0063] The recognition module 210 is configured to recognize, from a character stream, a string that may be an ID card number;

[0064] The verification module 220 is connected to the recognition module 210 and is configured to verify the string that may be an ID card number against the ID card number encoding rules;

[0065] The execution module 230 is connected to the verification module 220 and is configured to treat a string that may be an ID card number and that passes verification as a possibly valid ID card number.

[0066] As shown in Figure 2, the recognition module 210 comprises a recognition unit 211 and a judging unit 212.

[0067] The recognition unit 211 is configured to perform character recognition on the character stream using a hash table that distinguishes valid digits, ignorable characters, illegal characters and possible terminators;

[0068] The judging unit 212 is connected to the recognition unit 211 and the verification module 220 and is configured to recognize a string of eighteen digits, or a string of seventeen digits followed by one X or x, as the string that may be an ID card number.
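The behaviour of the recognition unit and the judging unit can be sketched as a single pass over the character stream. This Python sketch is a simplification under stated assumptions: the names `CHAR_CLASS` and `find_candidates` are hypothetical, the ignorable set is assumed to be "." and "-", and neither 15-digit numbers nor the "possible terminator" class are handled.

```python
DIGIT, IGNORABLE, ILLEGAL = 0, 1, 2

# 256-entry classification table indexed by byte value, playing the role of
# the character hash table: one indexed read classifies each input byte.
CHAR_CLASS = [ILLEGAL] * 256
for byte in b"0123456789":
    CHAR_CLASS[byte] = DIGIT
for byte in b".-":                 # assumed set of ignorable separators
    CHAR_CLASS[byte] = IGNORABLE


def find_candidates(stream: bytes) -> list:
    """Return every 18-character run that may be an ID card number."""
    candidates = []
    recorded = []
    for byte in stream:
        kind = CHAR_CLASS[byte]
        if kind == IGNORABLE:
            continue                           # skip separators, keep the run
        if kind == DIGIT:
            recorded.append(chr(byte))
        elif byte in b"Xx" and len(recorded) == 17:
            recorded.append("X")               # X is accepted only in position 18
        else:
            recorded.clear()                   # illegal character: drop the run
            continue
        if len(recorded) == 18:
            candidates.append("".join(recorded))
            recorded.clear()
    return candidates
```

Applied to the sample string from the worked example, the sketch discards "aaaa", drops the short run "123" when "a" arrives, skips "." and "-", and emits "110108196212302873" once 18 characters have been recorded.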

[0069] As shown in Figure 2, the verification module 220 comprises a first verification unit 221, a second verification unit 222 and a third verification unit 223.

[0070] The first verification unit 221 is connected to the execution module 230 and to the judging unit 212 of the recognition module 210, and is configured to verify the address code of the string that may be an ID card number against the ID card number encoding rules;

[0071] The second verification unit 222 is connected to the execution module 230 and to the judging unit 212 of the recognition module 210, and is configured to verify the date-of-birth code of the string that may be an ID card number against the ID card number encoding rules;

[0072] The third verification unit 223 is connected to the execution module 230 and to the judging unit 212 of the recognition module 210, and is configured to verify the check code of the string that may be an ID card number against the ID card number encoding rules.
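The date-of-birth check performed by the second verification unit can be sketched as follows. The description does not say how the date is validated, so this Python sketch assumes a plain calendar check on characters 7 to 14 of an 18-character candidate (format YYYYMMDD); the function name `is_valid_birth_date` is hypothetical.

```python
from datetime import date


def is_valid_birth_date(candidate: str) -> bool:
    """Return True if characters 7-14 of the candidate form a real calendar date."""
    code = candidate[6:14]
    if len(code) != 8 or not (code.isascii() and code.isdigit()):
        return False
    try:
        # date() raises ValueError for impossible dates such as month 13 or 30 February.
        date(int(code[:4]), int(code[4:6]), int(code[6:8]))
    except ValueError:
        return False
    return True
```

A stricter variant could also reject dates in the future or implausibly far in the past; that policy is outside what the text specifies.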

[0073] The first verification unit 221 is configured to verify the address code in the first six characters of the string that may be an ID card number by means of a two-dimensional array.

[0074] The first verification unit 221 is configured to use the first dimension of the two-dimensional array to verify the major administrative region number of the address code in the first three characters of the string that may be an ID card number, and to use the second dimension to verify the intra-region number of the address code in the fourth to sixth characters. The length of the first dimension is greater than or equal to the number of valid major administrative region numbers that three digits can represent, and the length of the second dimension is greater than or equal to the number of intra-region numbers that three digits can represent within any valid major administrative region.

[0075] Those skilled in the art should understand that the components of the device and the steps of the method provided by the above embodiments of the present application may be concentrated on a single computing device or distributed over a network of multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device; alternatively, they may each be made into individual integrated circuit modules, or several of the modules or steps may be made into a single integrated circuit module. The present invention is therefore not limited to any specific combination of hardware and software.

[0076] Although the embodiments of the present invention are disclosed above, they are described only to facilitate understanding of the invention and are not intended to limit it. Any person skilled in the art to which the invention belongs may make modifications and changes in the form and details of implementation without departing from the spirit and scope disclosed by the invention, but the scope of patent protection of the invention remains as defined by the appended claims.

Patent Citations
Cited Patent | Filing date | Publication date | Applicant | Title
CN101651938A * | 2 Jul 2009 | 17 Feb 2010 | 优视动景(北京)技术服务有限公司 | Telephone number recognition system for mobile terminal and application method thereof
CN101976333A * | 18 Nov 2010 | 16 Feb 2011 | 上海合合信息科技发展有限公司 | Method for automatically distinguishing first-generation identity card from second-generation identity card
CN102982012A * | 7 Sep 2011 | 20 Mar 2013 | 百度在线网络技术(北京)有限公司 | Method and device used for obtaining target character strings in disorder text
US5421619 * | 22 Dec 1993 | 6 Jun 1995 | Drexler Technology Corporation | Laser imaged identification card
Classifications
International Classification: G06F17/30
Cooperative Classification: G06F17/30985, G06F17/30867
Legal Events
Date | Code | Event | Description
22 Oct 2014 | C06 | Publication |
26 Nov 2014 | C10 | Entry into substantive examination |