WO2014049186A1 - Method for generating semantic patterns - Google Patents

Method for generating semantic patterns

Info

Publication number
WO2014049186A1
Authority
WO
WIPO (PCT)
Prior art keywords
pattern
grammatical
terms
categories
candidate
Application number
PCT/ES2013/070638
Other languages
Spanish (es)
French (fr)
Inventor
Valentín Miguel MORENO PELAYO
Pablo Miguel SUÁREZ LÓPEZ
Anabel FRAGA VÁZQUEZ
Juan Bautista LLORENS MORILLO
Eugenio PARRA CORREDOR
Original Assignee
Universidad Carlos III de Madrid
Application filed by Universidad Carlos III de Madrid
Publication of WO2014049186A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Definitions

  • The present invention relates to natural language recognition methods. More specifically, it falls within the methods for generating semantic patterns that make it possible to organize information.
  • The methodology proposed here provides complex sentence-level patterns, which can be expressed in terms of simpler ones such as those commonly used, with the advantage of identifying larger regions of the texts from which semantics can be extracted more precisely. Consequently, they are semantically richer.
  • They are obtained fully automatically from natural language documents.
  • A method is developed whose steps allow the automatic generation of indexing patterns, taking a corpus as input and producing as output a list of patterns ordered by frequency. Optionally, it also includes several special functions that organize and represent the intermediate information obtained, and capabilities for expanding the generated patterns to other hierarchical formats.
  • The method for generating semantic patterns includes at least the following steps:
  • The groups are tuples of grammatical categories of terms.
  • The terms of the text include at least one punctuation mark and/or one word.
  • The categories are grouped from the categories of adjacent terms in a first iteration.
  • This type of pattern candidate is called basic.
  • The categories are grouped from the categories of terms separated from each other by at least one intermediate term whose specific category is optionally present in the pattern candidate.
  • This type of candidate is called a pattern candidate with optional element(s).
  • At least one of the components of the pattern is itself a pattern candidate from a previous iteration.
  • This type of pattern candidate is called compound.
  • The step of determining the semantic category is performed on the groups only if one of their components is a verb grammatical category.
  • One of its components is a pattern from a previous iteration, from which it acquires (inherits) its semantic category.
  • FIG. 1 shows in a diagram the main steps according to a possible embodiment.
  • FIG. 1 shows the sequence of steps for obtaining a semantic pattern. The starting point is a text, for which the grammatical category of its words, word groups and punctuation marks must be determined 11.
  • Categories are grouped 12 under different criteria. This yields groups (tuples) containing the grammatical codes associated with the words and/or punctuation marks of the text that have been grouped. There are several ways of grouping and, therefore, different types of pattern candidates (basic, compound, or with optional terms).
  • Basic pattern (C1 C2): composed only of terms with grammatical categories. The tuples may be binary or n-ary. In the present example, without loss of generality, binary tuples are chosen.
  • Indexing patterns contribute to identifying texts through grammatical and semantic categories. These patterns will be generated and stored through three defined data structures:
  • Map, which contains sequentially, that is, in order of appearance, the patterns identified from the terms or tokens; and
  • Patterns, which contains the generated patterns, grouped and ordered by frequency.
  • P = (P1 -> 83 times, P2 -> 55 times, P3 -> 40 times, ..., Pn -> 1 time).
  • The working corpora consist of texts that may include, alongside their terms, information on their grammatical category. One can work on them directly or preprocess them to represent the morphological information of the words with other sets of grammatical tags.
  • In the example, a corpus created for the English language from numerous representative sources is used. However, the method described here should not be considered limited to a specific corpus.
  • T = (C138, C22, C55, C11, C127, C22, C111, C138, C22, C50, C111, C147, C22, C11, C151, C138, C22, C50, C13, C11, C138, C22, C162, C82, C151, C138, C22, C29, C142, C138, C22, C127, C111, C138, C22, C144, C11, C54);
  • C127 preposition
  • C111 relative pronoun
  • C50 comma [,]
  • C147 verb to have;
  • The terms of the set T, with their grammatical categories, are grouped into groups MPk, following the order of appearance in the text.
  • The form of grouping can be chosen so as to create groups of several terms. In the present example, grouping is done in pairs (only the first pair is underlined).
  • T = (C138, C22, C55, C11, C127, C22, C111, C138, C22, C50, C111, C147, C22, C11, C151,
  • MP = (MP1, MP2, ..., MPn);
  • The frequency of occurrence of each group (pair) Pk is counted to generate the basic patterns P.
  • A valid pattern candidate can be established when the frequency exceeds a threshold, for example more than 3 times.
  • This process can be repeated iteratively to generate compound patterns.
  • A compound pattern contains at least one other pattern as one of its terms.
  • The process is similar to the previous one, except that terms are replaced by their equivalent subpattern rather than by categories.
  • New patterns can then be found which are, by their nature, compound patterns.
  • T = (C138, C22, C55, C11, C127, C22, C111, C138, C22, C50, C111, C147, C22, C11, C151,
  • T = (P1, C55, C11, C127, C22, C111, P1, C50, C111, C147, C22, C11, C151, P1, C50, C13,
  • New patterns are obtained and saved in the Map. In this case, the new patterns are:
  • The frequency of occurrence of each group (pair) Pk is counted to generate the compound patterns, which are stored in the Patterns table.
  • Pn-1 = (P1, C22), 1 time
  • This process is bounded by a maximum number of substitution levels, or it runs until no more substitutions are possible. However, it is advantageous to have a configurable maximum stop level, in case the domain knowledge is extensive or there is a level beyond which the patterns are no longer useful and extracting them is therefore unnecessary.
  • Patterns with optional elements are patterns formed by a tuple that may contain N optional intermediate elements (either grammatical categories or subpatterns). To generate these patterns, the components of each pattern are searched for in order in the token list, allowing intermediate elements between them. The maximum number of consecutive intermediate elements allowed is configurable; its value is usually two (2). These patterns are then stored in the Map and added to the Patterns structure, ordered by frequency of appearance.
  • P99 = (C11, C22) is defined; when applied to T, MOk is obtained.
  • T = (C138, C22, C55, C11, C127, C22, C111, C138, C22, C50, C111, C147, C22,
  • MO = (MO1, MO1', MO2);
  • The corresponding semantic code can be associated with the help of a taxonomy.
  • If PT contributes semantics (it contains at least one verb, directly or indirectly), its semantics are associated with the patterns that contain it (in this example, PS).
  • PF (PG, PH): although it is categorized as having no direct semantics, because it ignores the essence of the patterns it contains, the pattern PF has associated with it the semantics (if any) of the patterns PG and PH.
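
The search for patterns with optional elements described in the snippets above can be sketched as follows. This is a minimal illustration in Python, not the patent's implementation: the function name `find_with_optional`, the parameter `max_gap` and the greedy "nearest match" strategy are assumptions made for the example. The token list is the beginning of the example sequence T, and the pattern is P99 (C11, C22).

```python
def find_with_optional(tokens, pattern, max_gap=2):
    """Return index tuples where the pattern's components occur in order,
    allowing up to max_gap consecutive intermediate tokens between them.

    Assumption: the nearest following occurrence of each component is taken.
    """
    matches = []
    for start, tok in enumerate(tokens):
        if tok != pattern[0]:
            continue
        positions = [start]
        for component in pattern[1:]:
            prev = positions[-1]
            # Look at the immediate neighbour plus at most max_gap further tokens.
            window = range(prev + 1, min(prev + 2 + max_gap, len(tokens)))
            nxt = next((i for i in window if tokens[i] == component), None)
            if nxt is None:
                break
            positions.append(nxt)
        else:
            # Every component was found within the allowed distance.
            matches.append(tuple(positions))
    return matches


tokens = ["C138", "C22", "C55", "C11", "C127", "C22", "C111"]
p99 = ["C11", "C22"]
print(find_with_optional(tokens, p99))             # one intermediate element (C127) is skipped
print(find_with_optional(tokens, p99, max_gap=0))  # adjacency required: no match
```

With `max_gap=0` the function reduces to the search for basic (adjacent) patterns, which is why a single configurable limit covers both cases in this sketch.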

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to the methods for recognising natural language. More concretely, the invention relates to the methods for generating semantic patterns that enable the organisation of the information. The invention includes the steps of: determining the grammatical category of each term of a text, assigning the grammatical categories into groups, counting the frequency of appearance of each group, establishing a pattern candidate if the frequency of appearance of a group is sufficiently high, determining the semantic category of the pattern candidate using a pre-defined taxonomy, and identifying a pattern when the pattern candidate has an associated semantic category.

Description

METHOD FOR GENERATING SEMANTIC PATTERNS
Technical Field of the Invention
The present invention relates to natural language recognition methods. More specifically, it falls within the methods for generating semantic patterns that make it possible to organize information.
State of the Art
In the field of natural language recognition, a tool that automatically generates semantic patterns is needed.
Although there are different ways and techniques for relating concepts semantically, extracting relations by means of semantic patterns is one of the most widely used. Applying this technique requires drawing up a list of patterns for different types of semantic relations. These patterns must be relatively frequent in the documents. The following documents stand out among the background related to the invention.
Llorens J., Morato J., Genova G. RSHP: An information representation model based on relationships. In: Ernesto Damiani, Lakhmi C. Jain, Mauro Madravio (Eds.), Soft Computing in Software Engineering (Studies in Fuzziness and Soft Computing Series, Vol. 159), Springer, pp. 221-253. 2004. This document proposes a graph-based representation model for relating concepts. It differs from the present invention in that, although it states the need to feed that model with patterns, it does not include a method for obtaining them automatically.
Alshawi H. Processing Dictionary Definitions with Phrasal Pattern Hierarchies. Computational Linguistics. July-December 1987, 13 (3-4), pp. 195-202. This document proposes extracting taxonomic relations by means of patterns from the definitions of the words in a dictionary. Its scope is limited compared with the present proposal, since it starts from documents with a certain structure in order to obtain a single specific type of relation between pairs of terms.
R.A. Amsler. A taxonomy for English nouns and verbs. Proceedings of the 19th Annual Meeting of the Association for Computational Linguistics. Stanford, California, 1981, pp. 133-138. This document proposes applying pattern distributions, semi-automatically and starting from structured documents, in order to identify hierarchical structures among dictionary terms. It differs from the present proposal in that the working documents are structured and, moreover, the patterns are not obtained automatically and serve a very specific purpose.
In these and other works, the patterns obtained are intended to establish relations between two concepts, and they are obtained from structured sources such as dictionaries, taxonomies, thesauri or ontologies, in many cases not fully automatically. When applied, these traditional patterns identify two related concepts within a sentence without generally considering the sentence's overall meaning and structure.
In contrast, the methodology proposed here provides complex sentence-level patterns, which can be expressed in terms of simpler ones such as those commonly used, with the advantage of identifying larger regions of the texts from which semantics can be extracted more precisely. Consequently, they are semantically richer. In addition, they are obtained fully automatically from natural language documents.
Brief Description of the Invention
In view of the problems identified in the state of the art, it would therefore be desirable to have a method that overcomes these drawbacks; in particular, one that provides for the automatic identification of patterns from a corpus and the implementation of functions that organize the generated data.
As a solution, a method is developed whose steps allow the automatic generation of indexing patterns, taking a corpus as input and producing as output a list of patterns ordered by frequency. Optionally, it also includes several special functions that organize and represent the intermediate information obtained, and capabilities for expanding the generated patterns to other hierarchical formats.
The method for generating semantic patterns includes at least the following steps:
- Determine the grammatical category of each term in a sequence of terms of a text.
- Group the grammatical categories of the terms of that sequence. The groups are formed following the order of the terms in the sequence.
- Count the frequency of occurrence of each group.
- Establish a group as a pattern candidate when the group's frequency of occurrence exceeds a threshold.
- Determine the semantic category of the pattern candidate according to a taxonomy predefined for a plurality of groups, based on the grammatical category of the terms that make up the pattern candidate.
- Identify a pattern candidate as an actual pattern when the candidate has an associated semantic category.
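
The steps listed above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function name `generate_patterns`, the category labels `DET`, `NOUN` and `VERB`, and the representation of the taxonomy as a dictionary from grammatical category to semantic category are all hypothetical. Step 1 (grammatical tagging) is assumed to have been done already, and groups are binary tuples of adjacent categories.

```python
from collections import Counter


def generate_patterns(categories, taxonomy, threshold=3):
    """Sketch of steps 2 to 6 for binary groups of adjacent categories.

    categories: grammatical categories of the terms, in text order.
    taxonomy:   hypothetical mapping {grammatical category: semantic category}.
    Returns {group: (frequency, semantic category)} for the identified patterns.
    """
    # Step 2: group adjacent categories, following the order of the sequence.
    groups = list(zip(categories, categories[1:]))
    # Step 3: count the frequency of occurrence of each group.
    freq = Counter(groups)
    # Step 4: a group is a pattern candidate when its frequency exceeds the threshold.
    candidates = [g for g, n in freq.most_common() if n > threshold]
    # Steps 5 and 6: a candidate is a pattern only if the taxonomy assigns it
    # a semantic category through one of its components.
    patterns = {}
    for group in candidates:
        semantic = next((taxonomy[c] for c in group if c in taxonomy), None)
        if semantic is not None:
            patterns[group] = (freq[group], semantic)
    return patterns


categories = ["DET", "NOUN", "VERB"] * 5
taxonomy = {"VERB": "action"}
print(generate_patterns(categories, taxonomy))
```

In this toy input the group (DET, NOUN) is frequent enough to be a candidate but is discarded, because none of its components has a semantic category in the assumed taxonomy.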
Optionally, the groups are tuples of grammatical categories of terms.
Optionally, for grouping into grammatical categories, the terms of the text include at least one punctuation mark and/or one word.
Optionally, to establish a pattern candidate, the categories are grouped from the categories of adjacent terms in a first iteration. This type of pattern candidate is called basic.
As an alternative to the previous case, to establish a pattern candidate, the categories are grouped from the categories of terms separated from each other by at least one intermediate term whose specific category is optionally present in the pattern candidate. This type of candidate is called a pattern candidate with optional element(s).
Optionally, after a first iteration, for a subsequent grouping into grammatical categories, at least one of the components of the pattern is itself a pattern candidate from a previous iteration. This type of pattern candidate is called compound.
Optionally, the step of determining the semantic category is performed on the groups only if one of their components is a verb grammatical category.
Optionally, in the previous case, one of its components is a pattern from a previous iteration, from which it acquires (inherits) its semantic category.
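
The verb rule and the inheritance of semantics by compound patterns can be illustrated with a short recursive lookup. This is a sketch under assumptions: patterns are modelled as nested Python tuples, the function name `semantic_category` is hypothetical, and the taxonomy is a dictionary that only lists verb categories. The label "possession" for C147 (the verb to have) is invented for the example.

```python
def semantic_category(component, taxonomy):
    """Return the semantic category of a category or (nested) pattern, or None.

    component: a grammatical category (str) or a pattern (tuple of components).
    taxonomy:  hypothetical mapping {verb category: semantic category}.
    """
    if isinstance(component, str):
        # A plain grammatical category contributes semantics only if the
        # taxonomy lists it (in this sketch, only verb categories are listed).
        return taxonomy.get(component)
    # A pattern inherits the first semantic category found among its
    # components, which may themselves be patterns from earlier iterations.
    for part in component:
        found = semantic_category(part, taxonomy)
        if found is not None:
            return found
    return None


taxonomy = {"C147": "possession"}          # C147: verb to have; label is illustrative
basic_no_verb = ("C138", "C22")            # basic pattern without a verb
basic_with_verb = ("C111", "C147")         # basic pattern containing a verb
compound = (basic_no_verb, basic_with_verb)

print(semantic_category(basic_no_verb, taxonomy))
print(semantic_category(compound, taxonomy))
```

The compound pattern has no verb among its direct components, yet it obtains a semantic category through the subpattern that contains one.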
Brief Description of the Figures
To complement the description and to aid a better understanding of the features of the invention, the following figures accompany this specification as an integral part of it:
FIG. 1: shows in a diagram the main steps according to a possible embodiment.
Detailed Description of the Invention
The present invention is further illustrated by the following example, which is not intended to limit its scope.
FIG. 1 shows the sequence of steps for obtaining a semantic pattern. The starting point is a text, for which the grammatical category of its words, word groups and punctuation marks must be determined 11.
Once the grammatical categories have been identified, categories are grouped 12 under different criteria. This yields groups (tuples) containing the grammatical codes associated with the words and/or punctuation marks of the text that have been grouped. There are several ways of grouping and, therefore, different types of pattern candidates (basic, compound, or with optional terms).
With this information, it is possible to find out which groups or tuples are most common and to count frequencies of occurrence 13 in each iteration. By defining a frequency threshold, one can iteratively search for those pattern candidates whose frequency of occurrence exceeds the threshold 14.
Those that meet the required condition are established as pattern candidates 15. To check whether or not they are patterns, the candidates must have associated semantics. To this end, a taxonomy is searched and the semantic category of the components of the possible pattern is determined 16. If the pattern contains a specific semantic category, preferably contributed by a verb, it is inferred that the candidate is indeed a semantic pattern 17.
The procedure of the present invention will be better understood with the following practical example.
A possible pattern, or pattern candidate, is defined as a tuple that may contain both grammatical categories (C) and other patterns (P), that is, subpatterns. Since candidates and patterns have the same structure, the term pattern is used generically for now, on the understanding that it denotes a possible pattern, that is, a candidate still to be verified in later stages.
Examples of pattern types:
Basic pattern (C1 C2): composed only of terms with grammatical categories. The tuples may be binary or n-ary. In the present example, without loss of generality, binary tuples are chosen.
Compound patterns otherwise: (P1 C2), (C1 P2), (P1 P2).
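
One way to arrive at compound patterns of the forms listed above is to replace each occurrence of an already identified pattern by its name and then group adjacent elements again. The following is a minimal sketch, assuming binary patterns and non-overlapping left-to-right replacement; the function name `substitute` and the pattern name `P1` for (C138, C22) are illustrative.

```python
def substitute(tokens, pattern, name):
    """Replace each non-overlapping occurrence of a binary pattern with its name."""
    out, i = [], 0
    while i < len(tokens):
        if tuple(tokens[i:i + 2]) == pattern:
            out.append(name)      # the two categories collapse into the subpattern
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


tokens = ["C138", "C22", "C11", "C22", "C137", "C22"]
rewritten = substitute(tokens, ("C138", "C22"), "P1")

# Grouping adjacent elements again yields candidates; those that contain a
# pattern name are compound, for example of the form (P1 C2).
pattern_names = {"P1"}
compound = [g for g in zip(rewritten, rewritten[1:]) if pattern_names & set(g)]
print(rewritten)
print(compound)
```

Repeating the substitution and regrouping with each newly found pattern gives one iteration per level of nesting, which is where a maximum level could be enforced.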
Indexing patterns help to identify texts by means of grammatical and semantic categories. These patterns are generated and stored using three defined data structures:
Terms or tokens, which contain sequentially, that is, in order of appearance, the categories or patterns of the text;
Map, which contains sequentially, that is, in order of appearance, the patterns identified from the terms or tokens; and
Patterns, which contain the generated patterns, grouped and ordered by frequency.
For example, if the Tokens structure is represented by:
T = (C138, C22, C11, C22, C137, C22, C5, C22, C11, C13, 22, C29, C111, C132, C22, C11, C67), then the Map is generated as:
PM = (PM1, PM2, PM3, PM4, PM5, PM6, PM7, ..., PMn),
where:
PM1 = (C138, C22), PM2 = (C22, C11), PM3 = (C11, C22), ...
and the Patterns table is as shown below:
P = (P1 -> 83 times, P2 -> 55 times, P3 -> 40 times, ..., Pn -> 1 time).
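
The relation between the three structures can be made concrete with a few lines of Python. This is only a sketch: plain lists and `collections.Counter` stand in for the Tokens, Map and Patterns structures, the variable names are assumptions, and only the first nine tokens of the example are used, so the counts differ from the illustrative figures given in the text.

```python
from collections import Counter

# Tokens: grammatical categories in order of appearance.
tokens = ["C138", "C22", "C11", "C22", "C137", "C22", "C5", "C22", "C11"]

# Map: binary groups of adjacent tokens, also in order of appearance.
pattern_map = list(zip(tokens, tokens[1:]))

# Patterns: the distinct groups with their counts, ordered by frequency.
patterns = Counter(pattern_map).most_common()

print(pattern_map[:3])   # corresponds to PM1, PM2, PM3 in the example
print(patterns[0])       # the most frequent group and its count
```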
The procedure described uses different independent modules capable of analyzing the texts and generating sentences, tokens, maps and patterns. A brief description of the most important steps of the methodology follows.
Generation of sentences and tokens
The working corpora consist of texts that may include, alongside their terms, information on their grammatical category. One can work on them directly or preprocess them to represent the morphological information of the words with other sets of grammatical tags. In the example, a corpus created for the English language from numerous representative sources is used. However, the method described here should not be considered limited to a specific corpus.
The codes that appear next to each term, whether a word or a punctuation mark, indicate the grammatical category to which it belongs. This information is also used for the subsequent semantic analysis.
Text extracted from the Brown corpus:
The/at Fulton/np-tl County/nn-tl Grand/jj-tl Jury/nn-tl said/vbd
Friday/nr an/at investigation/nn of/in Atlanta's/np$ recent/jj
primary/nn election/nn produced/vbd
The grammatical tags shown and their corresponding categories are:
[Figure imgf000007_0001: table of grammatical tags and their categories]
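The tagged format shown above can be read with a few lines of code. The following is a minimal sketch, assuming that every item has the form word/tag and that items are separated by whitespace; the function name parse_tagged_text is hypothetical and does not come from the source.

```python
def parse_tagged_text(text):
    """Split whitespace-separated 'word/tag' items into (word, tag) pairs."""
    pairs = []
    for item in text.split():
        # Split on the LAST slash so that a word containing '/' keeps it.
        word, slash, tag = item.rpartition("/")
        if not slash:
            raise ValueError(f"missing grammatical tag in {item!r}")
        pairs.append((word, tag))
    return pairs


sample = "The/at Fulton/np-tl County/nn-tl Grand/jj-tl Jury/nn-tl said/vbd"
pairs = parse_tagged_text(sample)
tags = [tag for _, tag in pairs]  # the sequence of grammatical tags only
```

Only the tag sequence is needed for the pattern generation steps; the words themselves are used later, when semantics are attached to verbs.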
It is also possible to work without this grammatical information and obtain it term by term using specialized morphological analysis tools. These tools usually have their own sets of grammatical tags, defined according to the information needs established for a specific domain. Grammatical information assigned to the previous text by a specialized tool using a specific set of grammatical codes:
The/1879 Fulton/1801 County/1793 Grand/1850 Jury/1793 said/1828
Friday/1814 an/1880 investigation/1792 of/1857 Atlanta's/1802
recent/1850 primary/1793 election/1793 produced/1944
The grammatical categories shown and the grammatical codes associated with them are:
[Figure imgf000008_0001: table of grammatical categories and their associated codes]
The sentences are extracted from the corpus using a set of delimiter characters. Once the sentences have been identified, their terms are normalized and this information is stored in the Tokens data structure.
Example:
The following example shows how sentences from the Brown Corpus have been split and the token structures extracted, taking into account that commas and end-of-sentence symbols are filtered for use in natural language processing.
1) The jury further said in term-end presentments that the city Executive Committee, 2) which had over-all charge of the election, 3) "deserves the praise and thanks of the City of Atlanta" for the manner in which the election was conducted.
The grammatical category Ck of each term of the text is determined. This yields the token structure, T.
T = (C138, C22, C55, C11, C127, C22, C111, C138, C22, C50, C111, C147, C22, C11, C151, C138, C22, C50, C13, C11, C138, C22, C162, C82, C151, C138, C22, C29, C142, C138, C22, C127, C111, C138, C22, C144, C11, C54);
Where: C138 = definite article; C22 = noun; C55 = adverb; C11 = verb; C127 = preposition; C111 = relative pronoun; C50 = comma [,]; C147 = verb to have; C151 = preposition of; C13 = symbol; C162 = conjunction and; C82 = absolute verb; C29 = apostrophe; C142 = preposition for; C144 = verb to be; C54 = period [.]
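The sentence extraction and term normalization described in this section can be sketched as follows. This is only an illustration under stated assumptions: the source does not list the delimiter characters or define the normalization, so the delimiter set ".!?", the lower-casing, and the names split_sentences and normalize_terms are all hypothetical. Punctuation marks are kept as terms, as in the structure T above, where the comma and the period have their own categories.

```python
import re

SENTENCE_DELIMITERS = ".!?"  # assumption: the source does not enumerate them


def split_sentences(text, delimiters=SENTENCE_DELIMITERS):
    """Cut the text after each delimiter character."""
    chars = re.escape(delimiters)
    chunks = re.findall(f"[^{chars}]+[{chars}]?", text)
    return [chunk.strip() for chunk in chunks if chunk.strip()]


def normalize_terms(sentence):
    """Lower-case a sentence and split it into words and punctuation marks."""
    # A word may contain internal hyphens or apostrophes (e.g. "over-all");
    # any other non-space character becomes a term of its own.
    return re.findall(r"[a-z0-9]+(?:[-'][a-z0-9]+)*|[^\sa-z0-9]",
                      sentence.lower())


text = "The election was conducted. Which had over-all charge, then?"
sentences = split_sentences(text)
tokens = [normalize_terms(sentence) for sentence in sentences]
```

Assigning a category Ck to each term would then be done by the corpus tags or by a morphological analysis tool, which this sketch does not attempt to reproduce.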
Basic pattern generation
To generate the basic patterns, the terms of the sequence T, with their grammatical categories, are grouped into groups MPk following their order of appearance in the text.
The grouping scheme can be chosen to create groups of several terms. In the present example, the terms are grouped in pairs (only the first pair, (C138, C22), is underlined in the original).
T = (C138, C22, C55, C11, C127, C22, C111, C138, C22, C50, C111, C147, C22, C11, C151, C138, C22, C50, C13, C11, C138, C22, C162, C82, C151, C138, C22, C29, C142, C138, C22, C127, C111, C138, C22, C144, C11, C54);
MP = (MP1, MP2, ..., MPn);
Where: MP1 = (C138, C22); MP2 = (C22, C55); MP3 = (C55, C11); ...; MPn = (C11, C54);
The frequency of occurrence of each group (pair) Pk is counted to generate the basic patterns P.
P = (P1, P2, ..., Pn);
Where: P1 = (C138, C22), 7 times; P2 = (C22, C55), 2 times; ...; Pn = (C11, C54), 1 time;
From this analysis, a group can be established as a valid pattern candidate when its frequency exceeds a threshold, for example more than 3 times.
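The grouping, counting and thresholding steps above can be sketched as follows. This is a minimal illustration rather than the implementation used by the authors: the function names group_adjacent and candidate_patterns are hypothetical, and the category codes are represented as plain strings.

```python
from collections import Counter

# Token structure T from the example, one category code per term.
T = ("C138 C22 C55 C11 C127 C22 C111 C138 C22 C50 C111 C147 C22 C11 C151 "
     "C138 C22 C50 C13 C11 C138 C22 C162 C82 C151 C138 C22 C29 C142 C138 "
     "C22 C127 C111 C138 C22 C144 C11 C54").split()


def group_adjacent(categories, size=2):
    """Build the groups MPk: every run of `size` adjacent categories, in order."""
    return [tuple(categories[i:i + size])
            for i in range(len(categories) - size + 1)]


def candidate_patterns(categories, size=2, threshold=3):
    """Count each group and keep those seen more than `threshold` times."""
    frequencies = Counter(group_adjacent(categories, size))
    return {group: n for group, n in frequencies.items() if n > threshold}


groups = group_adjacent(T)
candidates = candidate_patterns(T)
```

With a threshold of 3, the pair (C138, C22), which occurs 7 times, is the only group in T that qualifies as a pattern candidate.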
Generation of composite patterns
This process can be repeated iteratively to generate composite patterns. A composite pattern contains at least one other pattern as one of its terms. The process is similar to the previous one, except that the terms are replaced by their equivalent subpattern instead of by categories. In this way new patterns can be found which are, by their nature, composite patterns.
In the previous example, starting from the grammatical categories that make up T and from the basic patterns, a substitution is performed that replaces the most frequent pattern, P1, as follows:
P1 = (C138, C22)
T = (C138, C22, C55, C11, C127, C22, C111, C138, C22, C50, C111, C147, C22, C11, C151, C138, C22, C50, C13, C11, C138, C22, C162, C82, C151, C138, C22, C29, C142, C138, C22, C127, C111, C138, C22, C144, C11, C54);
After substituting pattern P1, T becomes:
T = (P1, C55, C11, C127, C22, C111, P1, C50, C111, C147, C22, C11, C151, P1, C50, C13, C11, P1, C162, C82, C151, P1, C29, C142, C138, C22, C127, P1, C22, C144, C11, C54);
Then, from the new tuples formed with the elements of the substituted pattern, new patterns are obtained and saved in the Map. In this case, the new patterns are:
MP1-1 = (P1, C55); MP2-1 = (C111, P1); MP3-1 = (P1, C50); MPn-1 = (P1, C22);
As with the basic patterns, the frequency of occurrence of each group (pair) Pk is counted to generate the composite patterns, which are stored in the Patterns table.
P = (P1-1, P2-1, ..., Pn-1);
Where: P1-1 = (P1, C55), 1 time; P2-1 = (C111, P1), 1 time; P3-1 = (P1, C50), 2 times; Pn-1 = (P1, C22), 1 time;
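The substitution step can be sketched as follows. To keep the illustration short, it is applied only to the first 18 categories of T; the function names substitute and composite_groups are hypothetical, and the left-to-right, non-overlapping replacement is an assumption about how the substitution is carried out.

```python
from collections import Counter

# First 18 categories of the token structure T from the example.
T_PREFIX = ("C138 C22 C55 C11 C127 C22 C111 C138 C22 C50 "
            "C111 C147 C22 C11 C151 C138 C22 C50").split()


def substitute(sequence, pattern, label):
    """Replace each non-overlapping occurrence of `pattern` with `label`."""
    pattern = tuple(pattern)
    result, i = [], 0
    while i < len(sequence):
        if tuple(sequence[i:i + len(pattern)]) == pattern:
            result.append(label)
            i += len(pattern)
        else:
            result.append(sequence[i])
            i += 1
    return result


def composite_groups(sequence, label):
    """Count the adjacent pairs that contain the substituted pattern."""
    pairs = Counter(zip(sequence, sequence[1:]))
    return {pair: n for pair, n in pairs.items() if label in pair}


substituted = substitute(T_PREFIX, ("C138", "C22"), "P1")
composites = composite_groups(substituted, "P1")
```

On this prefix, (P1, C50) is found twice, while (P1, C55) and (C111, P1) are found once each. Only pairs that contain P1 are kept, because the other pairs were already counted as basic patterns.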
This process is bounded by a maximum number of substitution levels, or else it runs until no more substitutions are possible. It is advantageous for the maximum level at which the process stops to be configurable, for cases where domain knowledge is extensive or where the level beyond which patterns are no longer useful is known, so that extracting them is unnecessary.
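The two stopping conditions can be sketched with a loop like the one below. This is an assumption-laden illustration: it substitutes only the single most frequent pair at each level, it names the new patterns P1, P2, ... by level, and the names extract_levels, threshold and max_levels are hypothetical. The source does not specify how many patterns are substituted per level.

```python
from collections import Counter

T = ("C138 C22 C55 C11 C127 C22 C111 C138 C22 C50 C111 C147 C22 C11 C151 "
     "C138 C22 C50 C13 C11 C138 C22 C162 C82 C151 C138 C22 C29 C142 C138 "
     "C22 C127 C111 C138 C22 C144 C11 C54").split()


def substitute(sequence, pattern, label):
    """Replace each non-overlapping occurrence of `pattern` with `label`."""
    result, i = [], 0
    while i < len(sequence):
        if tuple(sequence[i:i + len(pattern)]) == tuple(pattern):
            result.append(label)
            i += len(pattern)
        else:
            result.append(sequence[i])
            i += 1
    return result


def extract_levels(sequence, threshold=3, max_levels=None):
    """Substitute the most frequent pair level by level.

    Stops when no pair exceeds `threshold` (no more substitutions are
    possible) or when `max_levels` levels have been processed.
    """
    patterns, level = {}, 0
    while max_levels is None or level < max_levels:
        counts = Counter(zip(sequence, sequence[1:]))
        if not counts:
            break
        pair, frequency = counts.most_common(1)[0]
        if frequency <= threshold:
            break
        level += 1
        label = f"P{level}"
        patterns[label] = pair
        sequence = substitute(sequence, pair, label)
    return sequence, patterns


final_sequence, found = extract_levels(T, threshold=3)
```

On T with a threshold of 3, the loop stops after one level, because once (C138, C22) has been substituted no remaining pair occurs more than 3 times.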
Generation of patterns with optional terms
Patterns with optional elements are patterns formed by a tuple that may contain N optional intermediate elements (either grammatical categories or subpatterns). To generate these patterns, the components of each pattern are searched for in order in the token list, allowing intermediate elements to appear between them. The maximum admissible number of consecutive intermediate elements is configurable; its value is usually two (2). These patterns are then saved in the Map and added to the Patterns structure, ordered by frequency of occurrence.
For example, P99 = (C11, C22) is defined; applying it to T yields MOk.
T = (C138, C22, C55, C11, C127, C22, C111, C138, C22, C50, C111, C147, C22, C11, C151, C138, C22, C50, C13, C11, C138, C22, C162, C82, C151, C138, C22, C29, C142, C138, C22, C127, C111, C138, C22, C144, C11, C54);
MO1 = (C11, [C127], C22); O1 = (C11, [C127], C22), 1 time;
MO1' = (C11, [C138], C22); O1' = (C11, [C138], C22), 1 time;
MO2 = (C11, [C151, C138], C22); O2 = (C11, [C151, C138], C22), 1 time;
MO = (MO1, MO1', MO2);
O = (O1, O1', O2)
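The search for P99 = (C11, C22) with optional intermediate elements can be sketched as follows. The sketch handles two-component patterns only and matches the nearest occurrence of the second component; both are simplifying assumptions, and the name optional_patterns is hypothetical.

```python
from collections import Counter

T = ("C138 C22 C55 C11 C127 C22 C111 C138 C22 C50 C111 C147 C22 C11 C151 "
     "C138 C22 C50 C13 C11 C138 C22 C162 C82 C151 C138 C22 C29 C142 C138 "
     "C22 C127 C111 C138 C22 C144 C11 C54").split()


def optional_patterns(sequence, first, second, max_gap=2):
    """Find `first` ... `second` with 1 to `max_gap` elements in between.

    Returns a Counter keyed by (first, intermediate_elements, second).
    """
    found = Counter()
    for i, item in enumerate(sequence):
        if item != first:
            continue
        # j - i - 1 is the number of intermediate elements (1..max_gap).
        for j in range(i + 2, min(i + max_gap + 2, len(sequence))):
            if sequence[j] == second:
                found[(first, tuple(sequence[i + 1:j]), second)] += 1
                break  # keep only the nearest match for this occurrence
    return found


matches = optional_patterns(T, "C11", "C22", max_gap=2)
```

Applied to T with a maximum of two intermediate elements, the search finds the three patterns listed above, with [C127], [C138] and [C151, C138] as optional elements, once each.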
Semantic characteristics
Once the patterns are available, the semantic category to which they belong can be determined. To do so, a predefined taxonomy is used to determine the semantic category of the previously generated groups according to the terms that compose them.
In the present embodiment, the semantics reside in the terms whose grammatical category is of verb type.
Once the grammatical categories have been identified, the corresponding semantic code can be associated with the help of a taxonomy.
To include semantics in the generated patterns, the elements that form each pattern are checked to see whether any of them corresponds to a verb-type grammatical category, and its corresponding semantic code is saved. This gives four scenarios when defining the semantics of a pattern with an associated verb:
Case 1: Pattern with semantics: it has a verb-type category with an associated semantic code. For example, "feed", "eat" and "drink" belong to the semantic code "Feeding".
Example:
PK = (CA, CB) = (NOUN, VERB), where VERB has a semantic code with value x, for example.
Case 2: Pattern that obtains its semantics directly from the verb, because the verb is not contained in a semantic group.
Example:
PS = (PT, CB) = (PT, VERB), where VERB does not belong to a semantic group but carries the semantics of the verb it intrinsically represents.
If PT contributes semantics (it contains at least one verb, directly or indirectly), its semantics are associated with the patterns that contain it (PS in this example).
Case 3: Pattern without semantics, because it has no verb-type category.
Example:
PV = (CJ, CA) = (DEFINITE ARTICLE, NOUN).
Case 4: Pattern without direct semantics, because it is composed of two (2) subpatterns.
Example:
PF = (PG, PH). Although it is categorized as having no direct semantics, because the nature of the patterns it contains is unknown, pattern PF is associated with the semantics (if any) contained in patterns PG and PH.
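The four cases can be sketched as a small classification function. The data model is an assumption made for illustration: a pattern is a tuple whose components are either Term objects or nested pattern tuples, the taxonomy is a dictionary from verb to semantic code, and the names Term, TAXONOMY and pattern_semantics are hypothetical. Case 4 is generalized here to any pattern made up only of subpatterns.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Term:
    category: str                 # grammatical category, e.g. "VERB"
    lemma: Optional[str] = None   # the verb itself, when the term is a verb


# Hypothetical taxonomy built from the example given for Case 1.
TAXONOMY = {"feed": "Feeding", "eat": "Feeding", "drink": "Feeding"}


def pattern_semantics(pattern, taxonomy=TAXONOMY):
    """Return (case, codes): the case number and the set of semantic codes."""
    subpatterns = [c for c in pattern if isinstance(c, tuple)]
    verbs = [c for c in pattern
             if isinstance(c, Term) and c.category == "VERB"]
    codes = set()
    for sub in subpatterns:
        # Semantics of a subpattern are inherited by the containing pattern.
        codes |= pattern_semantics(sub, taxonomy)[1]
    if subpatterns and len(subpatterns) == len(pattern):
        return 4, codes           # only subpatterns: no direct semantics
    if not verbs:
        return 3, codes           # no verb-type category
    in_group = any(verb.lemma in taxonomy for verb in verbs)
    for verb in verbs:
        # Case 2 falls back to the verb itself as its own semantics.
        codes.add(taxonomy.get(verb.lemma, verb.lemma))
    return (1 if in_group else 2), codes


pk = (Term("NOUN"), Term("VERB", "eat"))
pt = (Term("DEFINITE ARTICLE"), Term("NOUN"))
ps = (pt, Term("VERB", "say"))
pf = (pk, pt)
```

In this sketch pk falls under Case 1 with the code "Feeding", ps under Case 2 with the verb "say" as its own semantics, pt under Case 3, and pf under Case 4 while inheriting "Feeding" from pk.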

Claims

1. Method for generating semantic patterns, characterized in that it comprises the following steps:
- determining the grammatical category (11) Ck of each term of a sequence T of terms of a text,
- grouping into groups (12) MPk the grammatical categories of the terms of the sequence T, where the groups are formed following the order of the terms in the sequence,
- counting the frequency (13) of occurrence of each group P,
- establishing a pattern candidate (15) when the frequency of occurrence of a group exceeds a threshold,
- determining the semantic category of the pattern candidate (16) according to a taxonomy predefined for a plurality of groups, as a function of the grammatical category of the terms that compose said pattern candidate,
- identifying a pattern (17) when the pattern candidate has an associated semantic category.
2. Method according to claim 1, characterized in that the groups are tuples of grammatical categories of terms.
3. Method according to claim 2, characterized in that, for the grouping into grammatical categories, the terms of the text comprise at least one of the following elements:
- a punctuation mark,
- a word.
4. Method according to claim 2 or 3, characterized in that, to establish a pattern candidate, the categories are grouped from the categories of adjacent terms in a first iteration.
5. Method according to claim 2 or 3, characterized in that, to establish a pattern candidate, the categories are grouped from the categories of terms separated from each other by at least one intermediate term whose specific category has an optional presence in the pattern candidate.
6. Method according to claim 4 or 5, characterized in that, after a first iteration, for a subsequent grouping into grammatical categories, at least one of the components of the pattern is in turn a pattern candidate from a previous iteration.
7. Method according to any one of the preceding claims, characterized in that the step of determining the semantic category is performed on the groups when one of their components is a verb grammatical category.
8. Method according to claim 7, characterized in that the component whose grammatical category is verb is a pattern from a previous iteration.
PCT/ES2013/070638 2012-09-26 2013-09-16 Method for generating semantic patterns WO2014049186A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
ES201231484 2012-09-26
ESP201231484 2012-09-26

Publications (1)

Publication Number Publication Date
WO2014049186A1 (en) 2014-04-03

Family

ID=50387044

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/ES2013/070638 WO2014049186A1 (en) 2012-09-26 2013-09-16 Method for generating semantic patterns

Country Status (1)

Country Link
WO (1) WO2014049186A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3510554A4 (en) * 2016-09-09 2020-03-11 Ascent Technologies Inc. Real-time regulatory compliance alerts using modularized and taxonomy-based classification of regulatory obligations

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000073936A1 (en) * 1999-05-28 2000-12-07 Sehda, Inc. Phrase-based dialogue modeling with particular application to creating recognition grammars for voice-controlled user interfaces
US20050108630A1 (en) * 2003-11-19 2005-05-19 Wasson Mark D. Extraction of facts from text
US20100287162A1 (en) * 2008-03-28 2010-11-11 Sanika Shirwadkar method and system for text summarization and summary based query answering

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13840851

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13840851

Country of ref document: EP

Kind code of ref document: A1