CN105512108A - English pun recognition method based on likelihood ratio estimation - Google Patents

English pun recognition method based on likelihood ratio estimation

Info

Publication number
CN105512108A
Authority
CN
China
Prior art keywords
pun
sentence
english
likelihood ratio
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510918577.5A
Other languages
Chinese (zh)
Other versions
CN105512108B (en)
Inventor
邹航
王月芳
孔令璇
李�瑞
刘树英
戴继生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University
Original Assignee
Jiangsu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2015-12-11
Filing date
2015-12-11
Publication date
2016-04-20
Application filed by Jiangsu University
Priority to CN201510918577.5A
Publication of CN105512108A
Application granted
Publication of CN105512108B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities

Abstract

The invention discloses an English pun recognition method based on likelihood ratio estimation. The method comprises the following steps. Step 1: software reads the English sentence to be recognized. Step 2: the pun word and all notional words of the sentence from step 1 are extracted and denoted $h$ and $w_m$ ($m = 1, 2, \ldots, M$) respectively, and the two meanings of the pun word $h$ are denoted $I_1$ and $I_2$. Step 3: the degree of correlation between each notional word $w_m$ and each pun meaning $I_i$ ($i = 1, 2$) is determined and denoted $R(w_m, I_i)$; its value is collected in advance by questionnaire survey. Step 4: the values $R(w_m, I_i)$ obtained in step 3 are used to construct a likelihood ratio $\lambda(I)$. Step 5: whether the sentence carries a pun meaning is judged from the computed value of $\lambda(I)$. When $\lambda(I)$ approaches 0, the sentence carries a pun meaning; otherwise, it does not. The invention proposes a probabilistic calculation method that can accurately quantify the ambiguity of a sentence and recognize puns, overcoming the defect that conventional methods cannot analyze pun meanings accurately and quantitatively.

Description

An English pun recognition method based on likelihood ratio estimation
Technical field
The invention belongs to the field of natural language processing and relates to the recognition of English puns. Specifically, it is an English pun recognition method based on likelihood ratio estimation.
Background art
In recent years, the rise of computational linguistics has brought new vitality to the research and development of linguistics and has also provided a brand-new approach to the study of puns. Computational linguistics usually relies on probabilistic methods and uses computer technology to obtain useful statistical information from large-scale real text. Chinese scholars have produced relatively few results that apply statistical methods to puns. The paper by Zhao Huijun, "A quantitative model of pun pragmatic translation", Foreign Languages Research 135 (5) (2012) 72-76, proposed a relatively simple quantitative model of pun pragmatic translation. This was a useful attempt by Chinese scholars to apply computational linguistics and artificial intelligence techniques to pun translation.
Most foreign scholars treat the quantitative measurement of lexical incongruity as the focus of pun recognition research. However, the academic community has not yet established a strict and precise standard for measuring lexical incongruity, and this uncertainty creates many difficulties for the recognition and analysis of puns. The current mainstream pun recognition methods fail to provide a general computational-cognitive theory for pun analysis and recognition. Pun recognition research based on computational linguistics is still at an early stage, and its feature extraction, computational model design, and theoretical analysis methods all need further improvement and development.
Summary of the invention
To solve the above problems, the present invention adopts a likelihood ratio measure suited to analyzing the incongruity of pun words. It proposes an English pun recognition method based on likelihood ratio estimation that can recognize English puns quickly and automatically. The technical scheme adopted is as follows.
An English pun recognition method based on likelihood ratio estimation comprises the following steps:
Step 1: read the English sentence to be recognized by software;
Step 2: extract the pun word and all notional words of the sentence from step 1, denoted $h$ and $w_m$ ($m = 1, 2, \ldots, M$) respectively; the two meanings carried by the pun word $h$ are denoted $I_1$ and $I_2$;
Step 3: determine the degree of correlation between each notional word $w_m$ ($m = 1, 2, \ldots, M$) and each pun meaning $I_i$ ($i = 1, 2$), denoted $R(w_m, I_i)$;
Step 4: construct the likelihood ratio $\lambda(I)$ from the values $R(w_m, I_i)$ obtained in step 3;
Step 5: judge from the computed value of $\lambda(I)$ whether the sentence carries a pun meaning.
In a preferred technical scheme, the software in step 1 is implemented in Matlab or Visual C++.
In a preferred technical scheme, step 2 further comprises the following. A corpus containing the parts of speech of words and the pun words is built manually and stored in a computer. The computer then extracts the pun word and all notional words by querying the corpus.
In a preferred technical scheme, the values of the correlation degree $R(w_m, I_i)$ in step 3 are obtained in advance by questionnaire survey.
In a preferred technical scheme, the values of the correlation degree $R(w_m, I_i)$ lie between 0 and 10.
In a preferred technical scheme, the likelihood ratio $\lambda(I)$ in step 4 is computed as:
$$\lambda(I) = \log\frac{P(w_1, \ldots, w_M \mid I_1)}{P(w_1, \ldots, w_M \mid I_2)} = \sum_{m=1}^{M} R(w_m, I_1) - \sum_{m=1}^{M} R(w_m, I_2)$$
where $P(\cdot \mid \cdot)$ denotes the conditional probability function and $\log(\cdot)$ denotes the natural logarithm.
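The text states the second equality without derivation. One reading makes it hold exactly. It assumes that the notional words are conditionally independent given the intended meaning, and that $\log P(w_m \mid I_i) = R(w_m, I_i) + c_m$, where the offset $c_m$ depends on the word but not on the meaning and therefore cancels. This is offered as an assumption, not as the authors' own derivation:

```latex
\begin{aligned}
\lambda(I) &= \log\frac{P(w_1, \ldots, w_M \mid I_1)}{P(w_1, \ldots, w_M \mid I_2)}
            = \log\frac{\prod_{m=1}^{M} P(w_m \mid I_1)}{\prod_{m=1}^{M} P(w_m \mid I_2)} \\
           &= \sum_{m=1}^{M} \log P(w_m \mid I_1) - \sum_{m=1}^{M} \log P(w_m \mid I_2) \\
           &= \sum_{m=1}^{M} \bigl[R(w_m, I_1) + c_m\bigr] - \sum_{m=1}^{M} \bigl[R(w_m, I_2) + c_m\bigr] \\
           &= \sum_{m=1}^{M} R(w_m, I_1) - \sum_{m=1}^{M} R(w_m, I_2).
\end{aligned}
```

Under this reading, $\lambda(I) \approx 0$ means the notional words lend roughly equal support to both meanings. This matches the criterion that treats a near-zero value as the sign of a pun.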
In a preferred technical scheme, step 5 judges whether the sentence carries a pun meaning as follows: when $|\lambda(I)| < 1$, the sentence is judged to carry a pun meaning; otherwise, it is judged not to.
Beneficial effects of the present invention:
Based on likelihood ratio estimation theory, the present invention provides a probabilistic calculation method that accurately quantifies the ambiguity of a sentence and recognizes puns. It overcomes the defect that conventional methods cannot analyze pun meanings accurately and quantitatively.
Description of the drawings
Fig. 1 is the flowchart of the English pun recognition method proposed by the present invention;
Fig. 2 shows the survey statistics of the degree of correlation between each notional word of example sentence (a) and the pun meanings ($I_1$ = reign and $I_2$ = rain);
Fig. 3 shows the survey statistics of the degree of correlation between each notional word of example sentence (b) and the pun meanings ($I_1$ = reign and $I_2$ = rain);
Fig. 4 shows the survey statistics of the degree of correlation between each notional word of example sentence (c) and the pun meanings ($I_1$ = reign and $I_2$ = rain);
Fig. 5 shows the survey statistics of the degree of correlation between each notional word of example sentence (d) and the pun meanings ($I_1$ = reign and $I_2$ = rain).
Embodiments
The present invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 shows the flowchart of the English pun recognition method proposed by the present invention, which comprises the following steps:
Step 1: read the English sentence to be recognized by software. The sentence consists of one pun word and M notional words. The pun word, prepositions, adverbs and articles are not counted in the total M.
Step 2: manually build a corpus containing the parts of speech of words and the pun words, and store it in a computer. By querying the corpus, the computer extracts the pun word and all notional words of the sentence from step 1, denoted $h$ and $w_m$ ($m = 1, 2, \ldots, M$) respectively; the two meanings carried by the pun word $h$ are denoted $I_1$ and $I_2$;
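The corpus lookup in step 2 can be sketched as follows. The sketch assumes the manually built corpus is a dictionary from lower-cased words to part-of-speech tags, with a separate set of pun words. The function name `extract_words`, the tag names and the toy corpus are hypothetical and are not taken from the patent. As step 1 specifies, the pun word, prepositions, adverbs and articles are left out of the notional words.

```python
import re

# Tags excluded from the notional-word count M, per step 1 of the embodiment.
EXCLUDED_TAGS = {"preposition", "adverb", "article"}


def extract_words(sentence, pos_corpus, pun_words):
    """Return (h, notional_words) for one sentence by corpus lookup.

    pos_corpus: hypothetical dict mapping a lower-cased word to a POS tag.
    pun_words:  hypothetical set of lower-cased pun words from the corpus.
    Tokens absent from pos_corpus are skipped. No lemmatisation is done, so
    inflected forms such as "ended" would need their own corpus entries.
    """
    tokens = re.findall(r"[a-z']+", sentence.lower())
    h = None
    notional = []
    for tok in tokens:
        if h is None and tok in pun_words:
            h = tok  # the first pun word found is taken as h
            continue
        tag = pos_corpus.get(tok)
        if tag is not None and tag not in EXCLUDED_TAGS:
            notional.append(tok)
    return h, notional


# Toy corpus covering example sentence (a) only. Words it does not list
# ("is", "since", "has", "had") are skipped by the lookup.
corpus = {
    "britain": "noun", "wet": "adjective", "place": "noun",
    "queen": "noun", "long": "adjective",
    "a": "article", "the": "article",
}
print(extract_words("Britain is a wet place since the queen has had a long reign.",
                    corpus, {"reign", "rain"}))
# ('reign', ['britain', 'wet', 'place', 'queen', 'long'])
```

The toy corpus yields the five notional words that the worked example lists for sentence (a). A real corpus would also need lemmatised entries to produce forms such as "end" and "fall".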
Step 3: determine the degree of correlation between each notional word $w_m$ ($m = 1, 2, \ldots, M$) and each pun meaning $I_i$ ($i = 1, 2$), denoted $R(w_m, I_i)$;
The values $R(w_m, I_i)$ can be collected in advance by questionnaire survey, and the number of participants should generally exceed 50. Each participant independently judges the semantic correlation within each pair $(w_m, I_i)$ and gives a score from 0 to 10, where 0 means completely unrelated and 10 means extremely related. Each estimate $R(w_m, I_i)$ is the mean of the semantic correlation scores for $(w_m, I_i)$ obtained in the survey.
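Below is a minimal sketch of the survey aggregation. It assumes the responses are held in a dict keyed by (word, meaning) pairs. The names `estimate_r` and `estimate_r_table` and the sample scores are hypothetical illustrations, not the survey data behind Figs. 2-5. The sketch averages the 0-10 scores for each pair, rejects out-of-range scores, and warns when 50 or fewer participants answered.

```python
import warnings
from statistics import mean

# The text says the number of participants "should generally be greater than 50".
MIN_PARTICIPANTS = 50


def estimate_r(scores):
    """Estimate R(w_m, I_i) as the mean of the 0-10 survey scores for one pair."""
    if not scores:
        raise ValueError("no survey scores for this (word, meaning) pair")
    if any(not 0 <= s <= 10 for s in scores):
        raise ValueError("survey scores must lie between 0 and 10")
    if len(scores) <= MIN_PARTICIPANTS:
        warnings.warn(f"only {len(scores)} participants; "
                      f"more than {MIN_PARTICIPANTS} recommended")
    return mean(scores)


def estimate_r_table(responses):
    """Map each (word, meaning) pair to its estimated R value.

    responses: hypothetical layout {(word, meaning): [score, score, ...]}.
    """
    return {pair: estimate_r(scores) for pair, scores in responses.items()}


# Illustrative scores from 60 imaginary participants per pair.
responses = {
    ("queen", "reign"): [9] * 30 + [10] * 30,
    ("queen", "rain"): [0] * 45 + [2] * 15,
}
print(estimate_r_table(responses))
```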
Step 4: construct the likelihood ratio $\lambda(I)$ from the values $R(w_m, I_i)$ obtained in step 3:
$$\lambda(I) = \log\frac{P(w_1, \ldots, w_M \mid I_1)}{P(w_1, \ldots, w_M \mid I_2)} = \sum_{m=1}^{M} R(w_m, I_1) - \sum_{m=1}^{M} R(w_m, I_2) \tag{1}$$
In formula (1), $P(\cdot \mid \cdot)$ denotes the conditional probability function and $\log(\cdot)$ denotes the natural logarithm.
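Formula (1) reduces to two sums over the notional words, which can be sketched as below. The function name `likelihood_ratio` and the R values are hypothetical. The values are invented so that each word favours one meaning, and they are not read from Figs. 2-5.

```python
def likelihood_ratio(r_table, notional_words, meaning_1, meaning_2):
    """Formula (1): lambda(I) = sum_m R(w_m, I_1) - sum_m R(w_m, I_2).

    r_table maps (word, meaning) -> R value on the 0-10 scale. A missing
    pair raises KeyError instead of silently counting as 0.
    """
    support_1 = sum(r_table[(w, meaning_1)] for w in notional_words)
    support_2 = sum(r_table[(w, meaning_2)] for w in notional_words)
    return support_1 - support_2


# Made-up R values: "queen" favours reign, "wet" favours rain.
r_table = {
    ("queen", "reign"): 9.0, ("queen", "rain"): 1.0,
    ("wet", "reign"): 1.0, ("wet", "rain"): 9.0,
}
print(likelihood_ratio(r_table, ["queen", "wet"], "reign", "rain"))  # 0.0
print(likelihood_ratio(r_table, ["wet"], "reign", "rain"))           # -8.0
```

In the first call the two words pull equally towards each meaning, so the ratio is zero. In the second, only the rain-leaning word is present, so the ratio is strongly negative.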
Step 5: judge from the computed value of $\lambda(I)$ whether the sentence carries a pun meaning. If $\lambda(I)$ is close to zero (for example, $|\lambda(I)| < 1$), the sentence is judged to carry a pun meaning; otherwise, it is judged not to.
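The decision rule of step 5 is a single threshold test. This sketch assumes the threshold of 1 from the example bound $|\lambda(I)| < 1$, and the function name `has_pun_meaning` is hypothetical. The first call uses the value $\lambda(I) = 0.04$ reported for sentence (a) in Table 3.

```python
def has_pun_meaning(lam, threshold=1.0):
    """Step 5: the sentence carries a pun meaning when |lambda(I)| < threshold.

    threshold=1.0 follows the example bound |lambda(I)| < 1 given in the text.
    """
    return abs(lam) < threshold


print(has_pun_meaning(0.04))  # True: lambda(I) reported for sentence (a)
print(has_pun_meaning(-8.0))  # False
```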
The effect of the present invention is further illustrated below with an example.
To evaluate the performance of the proposed method, it is used to recognize puns in the 4 English example sentences of Table 1.
Table 1 English example sentences
No. Example sentence
(a) Britain is a wet place since the queen has had a long reign.
(b) Britain is a wet place since the autumn has had a long reign.
(c) The king's reign ended and his heir took over.
(d) Rain fell on the city last night.
As Table 1 shows, reign and rain in sentence (a) form a homophonic pun. Sentence (b) replaces the word queen in sentence (a) with autumn, which destroys the pun context; such a sentence is generally classed as "de-punned" (pun removed). Sentences (c) and (d) are both non-puns. Sentence (c) clearly expresses the meaning $I_1$ = reign, and sentence (d) clearly expresses the meaning $I_2$ = rain. Across all these sentences there are 13 notional words in total. For convenience they are numbered as follows: $w_1$ = Britain, $w_2$ = wet, $w_3$ = place, $w_4$ = queen, $w_5$ = long, $w_6$ = autumn, $w_7$ = king, $w_8$ = end, $w_9$ = heir, $w_{10}$ = take over, $w_{11}$ = fall, $w_{12}$ = city, $w_{13}$ = last night. The questionnaire asks each participant to independently judge the semantic correlation within each word pair in Table 2 and to give a correlation score from 0 to 10, where 0 means completely unrelated and 10 means extremely related.
Table 2 survey content
Figs. 2-5 show the statistics of the degree of correlation between each notional word of each example sentence in Table 1 and the pun meanings ($I_1$ = reign and $I_2$ = rain). As Figs. 2-5 show, the notional words of sentence (a) support both $I_1$ = reign and $I_2$ = rain, to differing degrees. The notional words of sentences (b) and (d) mainly support $I_2$ = rain, and the notional words of sentence (c) mainly support $I_1$ = reign. This indicates that $R(w_m, I_i)$ measures the degree of correlation between notional words and pun meanings fairly accurately.
The likelihood ratio $\lambda(I)$ of each example sentence is computed from relation (1) and the correlation values between the notional words and the pun meanings shown in Figs. 2-5. For sentence (a), the likelihood ratio $\lambda(I)$ is computed as follows:
The results are shown in Table 3. As Table 3 shows, sentence (a) gives $\lambda(I) = 0.04$, a value close to zero whose absolute value is less than 1, so sentence (a) is judged to carry a pun meaning. The values of $\lambda(I)$ for the other sentences are not close to zero, so those sentences are judged not to carry a pun meaning. This recognition result agrees with the actual classification of the sentences, confirming the effectiveness of the present invention.
Table 3 Calculated likelihood ratio $\lambda(I)$ for each example sentence
The above is only intended to describe the technical scheme and specific embodiments of the present invention and is not intended to limit its scope of protection. It should be understood that any change, improvement or equivalent substitution made without departing from the substance and spirit of the present invention falls within the scope of protection of the present invention.

Claims (7)

1. An English pun recognition method based on likelihood ratio estimation, characterized by comprising the following steps:
Step 1: read the English sentence to be recognized by software;
Step 2: extract the pun word and all notional words of the sentence from step 1, denoted $h$ and $w_m$ ($m = 1, 2, \ldots, M$) respectively; the two meanings carried by the pun word $h$ are denoted $I_1$ and $I_2$;
Step 3: determine the degree of correlation between each notional word $w_m$ ($m = 1, 2, \ldots, M$) and each pun meaning $I_i$ ($i = 1, 2$), denoted $R(w_m, I_i)$;
Step 4: construct the likelihood ratio $\lambda(I)$ from the values $R(w_m, I_i)$ obtained in step 3;
Step 5: judge from the computed value of $\lambda(I)$ whether the sentence carries a pun meaning.
2. The English pun recognition method based on likelihood ratio estimation according to claim 1, characterized in that the software in step 1 is implemented in Matlab or Visual C++.
3. The English pun recognition method based on likelihood ratio estimation according to claim 1, characterized in that step 2 further comprises the following. A corpus containing the parts of speech of words and the pun words is built manually and stored in a computer. The computer then extracts the pun word and all notional words by querying the corpus.
4. The English pun recognition method based on likelihood ratio estimation according to claim 1, characterized in that the values of the correlation degree $R(w_m, I_i)$ in step 3 are obtained in advance by questionnaire survey.
5. The English pun recognition method based on likelihood ratio estimation according to claim 4, characterized in that the values of the correlation degree $R(w_m, I_i)$ lie between 0 and 10.
6. The English pun recognition method based on likelihood ratio estimation according to claim 1, characterized in that the likelihood ratio $\lambda(I)$ in step 4 is computed as:
$$\lambda(I) = \log\frac{P(w_1, \ldots, w_M \mid I_1)}{P(w_1, \ldots, w_M \mid I_2)} = \sum_{m=1}^{M} R(w_m, I_1) - \sum_{m=1}^{M} R(w_m, I_2)$$
where $P(\cdot \mid \cdot)$ denotes the conditional probability function and $\log(\cdot)$ denotes the natural logarithm.
7. The English pun recognition method based on likelihood ratio estimation according to claim 1, characterized in that step 5 judges whether the sentence carries a pun meaning as follows: when $|\lambda(I)| < 1$, the sentence is judged to carry a pun meaning; otherwise, it is judged not to.
CN201510918577.5A 2015-12-11 2015-12-11 English pun recognition method based on likelihood ratio estimation Active CN105512108B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510918577.5A CN105512108B (en) 2015-12-11 2015-12-11 English pun recognition method based on likelihood ratio estimation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510918577.5A CN105512108B (en) 2015-12-11 2015-12-11 English pun recognition method based on likelihood ratio estimation

Publications (2)

Publication Number Publication Date
CN105512108A 2016-04-20
CN105512108B 2018-04-03

Family

ID=55720101

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510918577.5A Active CN105512108B (en) 2015-12-11 2015-12-11 English pun recognition method based on likelihood ratio estimation

Country Status (1)

Country Link
CN (1) CN105512108B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106021233A (en) * 2016-05-24 2016-10-12 仲恺农业工程学院 Experiment method and application for metonymy processing of hierarchical quantization based on textual context information

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5500920A (en) * 1993-09-23 1996-03-19 Xerox Corporation Semantic co-occurrence filtering for speech recognition and signal transcription applications
CN1811916A (en) * 2005-01-24 2006-08-02 乐金电子(惠州)有限公司 Phonic proving method for speech recognition system
CN203351061U (en) * 2013-07-01 2013-12-18 哈尔滨金融学院 English pun vocabulary learning device having cognitive pragmatic function

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
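Justine Kao et al.: "Play on Words: Predicting Punniness with Statistics and Semantics", HTTPS://WWW.SEMANTICSCHOLAR.ORG/PAPER/PLAYONWORDSPREDICTINGPUNNINESSWITHSTATISTICSKAOTAN/0735A15E6710E58F81EDED71E34DB2841354CC4D *
Justine T. Kao et al.: "The Funny Thing About Incongruity: A Computational Model of Humor in Puns", HTTPS://WWW.SEMANTICSCHOLAR.ORG/PAPER/THEFUNNYTHINGABOUTINCONGRUITYACOMPUTATIONALKAOLEVY/53E876027C6E6FAF5F6AB561A07CB382EE3CE694 *
Zhang Cuiling (张翠玲): "Evaluation of speech evidence based on the likelihood ratio method" (基于似然率方法的语音证据评价), Evidence Science (证据科学) *
Zhang Yanyun (张艳云) et al.: "An experimental study of Bayesian likelihood ratio theory" (贝叶斯似然率理论的实验研究), 广东公安科 *
Wang Xiancai (王先财) et al.: "Research on the application of the likelihood ratio in forensic speech evidence" (似然率在法庭语音取证中的应用研究), 创新技术导报 *
Zhao Huijun (赵会军): "A quantitative model of pun pragmatic translation" (双关语语用翻译量化模型), Foreign Languages Research (外语研究) *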

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106021233A (en) * 2016-05-24 2016-10-12 仲恺农业工程学院 Experiment method and application for metonymy processing of hierarchical quantization based on textual context information
CN106021233B (en) * 2016-05-24 2018-07-27 广东外语外贸大学 Metonymy machining experiment method based on the quantization of text language ambience information level and application

Also Published As

Publication number Publication date
CN105512108B (en) 2018-04-03

Similar Documents

Publication Publication Date Title
CN108399163B (en) Text similarity measurement method combining word aggregation and word combination semantic features
Yu et al. Chinese spelling error detection and correction based on language model, pronunciation, and shape
CN103235772B (en) A kind of text set character relation extraction method
Meister et al. Language model evaluation beyond perplexity
CN106557462A (en) Name entity recognition method and system
CN112148832B (en) Event detection method of dual self-attention network based on label perception
Layton et al. Recentred local profiles for authorship attribution
CN110347787B (en) Interview method and device based on AI auxiliary interview scene and terminal equipment
CN102568475A (en) System and method for assessing proficiency in Putonghua
CN111597356B (en) Intelligent education knowledge map construction system and method
CN103646112A (en) Dependency parsing field self-adaption method based on web search
US20150161096A1 (en) Method for detecting grammatical errors, error detection device for same and computer-readable recording medium having method recorded thereon
CN100543735C (en) File similarity measure method based on file structure
Walker 20 Variation analysis
CN113761890B (en) Multi-level semantic information retrieval method based on BERT context awareness
CN109508460B (en) Unsupervised composition running question detection method and unsupervised composition running question detection system based on topic clustering
CN103688254A (en) Example-based error detection system for automatic evaluation of writing, method for same, and error detection apparatus for same
CN105786971B (en) A kind of grammer point recognition methods towards international Chinese teaching
TWI477987B (en) Methods for sentimental analysis of news text
Zhang et al. Contextual similarity is more valuable than character similarity: An empirical study for chinese spell checking
CN109344233A (en) A kind of Chinese personal name recognition method
CN105512108A (en) English pun recognition method based on likelihood ratio estimation
CN103455638A (en) Behavior knowledge extracting method and device combining reasoning and semi-automatic learning
Putra et al. Sentence boundary disambiguation for Indonesian language
Oco et al. Measuring language similarity using trigrams: Limitations of language identification

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant