WO2002097674A2 - Efficient collation element structure for handling large numbers of characters
- Publication number
- WO2002097674A2 (application PCT/US2002/016186)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- collation
- weight value
- weight
- primary
- collation element
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/02—Comparing digital values
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/22—Arrangements for sorting or merging computer data on continuous record carriers, e.g. tape, drum, disc
- G06F7/24—Sorting, i.e. extracting data from one or more carriers, rearranging the data in numerical or other ordered sequence, and rerecording the sorted data on the original carrier or on a different carrier or set of carriers sorting methods in general
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2207/00—Indexing scheme relating to methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F2207/02—Indexing scheme relating to groups G06F7/02 - G06F7/026
- G06F2207/025—String search, i.e. pattern matching, e.g. find identical word or best match in a string
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
- Y10S707/99934—Query formulation, input preparation, or translation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99937—Sorting
Definitions
- The present invention relates to the process of indexing and sorting data within a database system. More specifically, the present invention relates to a method and an apparatus for providing an efficient collation element structure for encoding sorting weights for a large number of characters.
- Multi-lingual sorting is typically accomplished by converting strings of characters into corresponding strings of collation elements (these strings are also known as sorting keys), and then comparing the strings of collation elements to perform the sorting operation. This conversion process is typically accomplished by looking up characters in a collation weight table that contains a corresponding collation weight for each character.
- Unicode Technical Report No. 10 specifies a collation element structure that includes a 16-bit primary weight value followed by an eight-bit secondary weight value and an eight-bit tertiary weight value.
- The primary weight value identifies a character, while the secondary weight value specifies an accent on the character, and the tertiary weight value specifies case information (and possibly related punctuation) for the character.
- For example, the primary weight value may specify that a character is an "a", the secondary value may specify that the character carries an accent, and the tertiary value may specify that the character is upper case ("A").
- During sorting, a comparison function typically compares the primary weights first. If the primary weights match, the comparison function compares the secondary weights. If both primary and secondary weights match, the comparison function compares the tertiary weights.
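The level-by-level comparison described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `CollationElement` type, the `compare_keys` function, and all weight values are hypothetical names and numbers chosen for the example.

```python
from typing import List, NamedTuple


class CollationElement(NamedTuple):
    """One collation element in the 16/8/8-bit layout (illustrative type)."""
    primary: int    # identifies the base character
    secondary: int  # accent information
    tertiary: int   # case information


def compare_keys(a: List[CollationElement], b: List[CollationElement]) -> int:
    """Return -1, 0 or 1 by comparing primary, then secondary, then tertiary weights."""
    for level in range(3):
        weights_a = [element[level] for element in a]
        weights_b = [element[level] for element in b]
        if weights_a != weights_b:
            # Python compares lists lexicographically, element by element.
            return -1 if weights_a < weights_b else 1
    return 0


# Hypothetical weights: same primary for every form of "a", a larger one for "b".
LOWER_A = CollationElement(0x0100, 1, 1)
UPPER_A = CollationElement(0x0100, 1, 2)
ACCENTED_A = CollationElement(0x0100, 2, 1)
LOWER_B = CollationElement(0x0101, 1, 1)
```

With these assumed weights, a primary difference later in the string outranks an accent difference earlier in the string, because the secondary level is consulted only when every primary weight matches.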
- The 16-bit primary weight value specified in Unicode Technical Report No. 10 can only encode 65,536 different characters. However, it is becoming necessary to provide more than 65,536 characters. This can be accomplished by increasing the size of the primary weight value to 32 bits (4 bytes). However, increasing the size of the primary weight value from 16 to 32 bits has a number of disadvantages: (1) more memory is required to build a linguistic index to support the
- One embodiment of the present invention provides a system for facilitating use of a collation element that supports a large number of characters.
- The system operates by receiving the collation element and reading a primary weight value from a primary weight field within the collation element. If the primary weight value falls within a reserved set of values, the system reads an additional portion of the primary weight value from both a secondary weight field and a tertiary weight field within the collation element. On the other hand, if the primary weight value is not within the reserved set of values, the system reads a secondary weight value from the secondary weight field, and also reads a tertiary weight value from the tertiary weight field. In one embodiment of the present invention, if the primary weight value falls within a reserved set of values, the system sets the secondary weight value to a secondary default value, and sets the tertiary weight value to a tertiary default value.
- The collation element adheres to the Unicode standard.
- The primary weight value identifies a character.
- The secondary weight value specifies an accent on the character, and the tertiary weight value can specify case information for the character.
- The collation element is four bytes in size, of which the primary weight field is two bytes, the secondary weight field is one byte and the tertiary weight field is one byte, unless a value in the primary weight field belongs to the reserved set of values, in which case the primary weight field takes up all four bytes of the collation element.
- The reserved set of values for the primary weight value includes hexadecimal values 0xFFF0-0xFFFF.
- The collation element is taken from a collation weight table that is used to map characters to collation weights in order to establish an ordering between strings of characters.
- The system additionally constructs a sorting key for a string by reading each character in the string and looking up a corresponding collation element for each character from the collation weight table. The system subsequently adds the corresponding collation element for each character to the sorting key. Note that if this sorting key is associated with a record in a database, the sorting key can be used to construct a linguistic index for the database.
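As a rough sketch of how sorting keys might feed a linguistic index, the example below orders row numbers by their keys. Everything here is an assumption made for illustration: the function names, the representation of an element as a `(primary, secondary, tertiary)` tuple, and the tiny weight table are not taken from the patent.

```python
from typing import Dict, List, Sequence, Tuple

Element = Tuple[int, int, int]  # (primary, secondary, tertiary), illustrative layout


def make_sorting_key(text: str, weight_table: Dict[str, Element]) -> List[Element]:
    """Look up one collation element per character and collect them in order."""
    return [weight_table[character] for character in text]


def level_order(key: Sequence[Element]) -> Tuple[Tuple[int, ...], ...]:
    """Regroup a key so all primaries compare first, then secondaries, then tertiaries."""
    return tuple(tuple(element[level] for element in key) for level in range(3))


def build_linguistic_index(column: Sequence[str],
                           weight_table: Dict[str, Element]) -> List[int]:
    """Return row numbers of the target column in linguistic order."""
    return sorted(
        range(len(column)),
        key=lambda row: level_order(make_sorting_key(column[row], weight_table)),
    )


# Hypothetical weight table and target column.
TABLE = {"a": (1, 1, 1), "A": (1, 1, 2), "b": (2, 1, 1)}
COLUMN = ["b", "A", "a", "ab"]
INDEX = build_linguistic_index(COLUMN, TABLE)
```

The records themselves stay in insertion order; only the list of row numbers is sorted, which mirrors the idea of an index that specifies an ordering over an unsorted data file.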
- FIG. 1 illustrates a computer system with a database in accordance with an embodiment of the present invention.
- FIG. 2 illustrates alternative structures for a collation element in accordance with an embodiment of the present invention.
- FIG. 3A illustrates how a sorting key is created in accordance with an embodiment of the present invention.
- FIG. 3B is a flow chart illustrating the process of creating a sorting key in accordance with an embodiment of the present invention.
- FIG. 4 is a flow chart illustrating the process of reading a collation element in accordance with an embodiment of the present invention.
- A computer readable storage medium may be any device or medium that can store code and/or data for use by a computer system. This includes, but is not limited to, magnetic and optical storage devices such as disk drives, magnetic tape, CDs (compact discs) and DVDs (digital versatile discs or digital video discs), and computer instruction signals embodied in a transmission medium (with or without a carrier wave upon which the signals are modulated).
- The transmission medium may include a communications network, such as the Internet.
- FIG. 1 illustrates a computer system 102 with a database 104 in accordance with an embodiment of the present invention.
- Computer system 102 can generally include any type of computer system, including, but not limited to, a computer system based on a microprocessor, a mainframe computer, a digital signal processor, a portable computing device, a personal organizer, a device controller, and a computational engine within an appliance.
- Database 104 can include any type of system for storing data in non-volatile storage. This includes, but is not limited to, systems based upon magnetic, optical, and magneto-optical storage devices, as well as storage devices based on flash memory and/or battery-backed up memory.
- Database 104 includes a data file 106 comprising a collection of records, which are stored in insertion order. Data file 106 can be referenced through one or more indexes, such as index 108, which specifies an ordering for the records in data file 106. This ordering is typically determined by sorting an associated target column in data file 106.
- Each character string in the target column is first converted into a sorting key by looking up characters in collation weight table 110.
- Collation weight table 110 is simply an array that contains a collation element for each possible character.
Structure of Collation Element
- FIG. 2 illustrates alternative structures for a collation element 204 in accordance with an embodiment of the present invention. As illustrated in FIG. 2, collation element 204 is produced by performing a lookup into collation weight table 110.
- Collation element 204 occupies four bytes of data, and can have one of two forms.
- In the first form, the first two bytes of collation element 204 contain primary weight field 206, while the third byte contains secondary weight field 208 and the fourth byte contains tertiary weight field 210.
- In the second form, the first two bytes of collation element 204 contain a reserved value in the range 0xFFF0-0xFFFF. This reserved value indicates that the third and fourth bytes of collation element 204 contain an extended portion of the primary weight field, instead of the secondary and tertiary weight values. In this case, the secondary and tertiary weight values are set to default values.
- The second form supports more than 1,000,000 different characters because each of the 16 possible values 0xFFF0-0xFFFF in the first and second bytes of collation element 204 is associated with 16 bits, or 65,536 possible values, in the third and fourth bytes of collation element 204.
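The capacity figure follows directly from the two counts given above, 16 reserved prefixes and 16 bits of extension:

```latex
\[
  16 \times 2^{16} \;=\; 2^{4} \times 2^{16} \;=\; 2^{20} \;=\; 1{,}048{,}576 \;>\; 1{,}000{,}000
\]
```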
- The secondary and tertiary weight values can be set to default values because new characters with identifiers greater than 65,536 are Chinese, Japanese and Korean (CJK) characters, mainly Han and Hangul Jamo characters, and there are no accent and case differences between Han/Hangul Jamo characters.
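The two forms of the four-byte collation element can be sketched as packing routines. This is only an illustration under stated assumptions: the function names are hypothetical, the element is modeled as a 32-bit integer whose high-order 16 bits are the "first two bytes", and the range checks are added for the example rather than described in the patent.

```python
RESERVED_MIN = 0xFFF0  # start of the reserved range 0xFFF0-0xFFFF


def pack_standard(primary: int, secondary: int, tertiary: int) -> int:
    """First form: 16-bit primary, 8-bit secondary, 8-bit tertiary."""
    if not 0 <= primary < RESERVED_MIN:
        raise ValueError("primary weight must lie below the reserved range")
    if not 0 <= secondary <= 0xFF or not 0 <= tertiary <= 0xFF:
        raise ValueError("secondary and tertiary weights must fit in one byte")
    return (primary << 16) | (secondary << 8) | tertiary


def pack_extended(reserved_prefix: int, extension: int) -> int:
    """Second form: reserved 16-bit prefix followed by 16 more bits of primary weight."""
    if not RESERVED_MIN <= reserved_prefix <= 0xFFFF:
        raise ValueError("prefix must lie in the reserved range 0xFFF0-0xFFFF")
    if not 0 <= extension <= 0xFFFF:
        raise ValueError("extension must fit in two bytes")
    return (reserved_prefix << 16) | extension
```

Rejecting ordinary primary weights at or above 0xFFF0 keeps the two forms distinguishable when the element is later read back.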
- FIG. 3A illustrates how a sorting key is created in accordance with an embodiment of the present invention.
- A string 302 is converted character-by-character into a string of collation elements (weights) that comprise sorting key 304.
- FIG. 3B is a flow chart illustrating the process of creating a sorting key 304 in accordance with an embodiment of the present invention. For each character 202 in a string 302, the system reads character 202 (step 306) and looks up a collation element 204 (step 308). The system then adds collation element 204 to sorting key 304 (step 310).
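The loop of steps 306 to 310 can be sketched as below. The sketch assumes, purely for illustration, that the collation weight table is a dictionary from characters to 32-bit integers and that each element is appended to the key as four big-endian bytes; the function name and the sample weights are hypothetical.

```python
import struct
from typing import Dict


def create_sorting_key(string: str, collation_weight_table: Dict[str, int]) -> bytes:
    """Build a sorting key by appending one 4-byte collation element per character."""
    sorting_key = bytearray()
    for character in string:                          # step 306: read the character
        element = collation_weight_table[character]   # step 308: look up its element
        sorting_key += struct.pack(">I", element)     # step 310: add it to the key
    return bytes(sorting_key)


# Hypothetical table: one first-form element and one second-form element.
SAMPLE_TABLE = {"a": 0x01000101, "\U00020000": 0xFFF00000}
SAMPLE_KEY = create_sorting_key("a\U00020000", SAMPLE_TABLE)
```

Because both forms occupy four bytes, the key grows by a fixed amount per character regardless of which form an element uses.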
- FIG. 4 is a flow chart illustrating the process of reading a collation element in accordance with an embodiment of the present invention.
- The system starts by receiving collation element 204 during a sorting process or some other operation requiring comparisons between sorting keys (step 402). Next, the system determines if the first two (higher order) bytes of collation element 204 contain a reserved value, which is greater than or equal to 0xFFF0 (step 404). If so, the system takes the primary weight value to be all four bytes of collation element 204, and the secondary and tertiary weight values are set to default values (step 406).
- Otherwise, the system sets the primary weight value to be the first and second bytes of collation element 204. This is accomplished by shifting collation element 204 by 16 bits to the right, and then taking the remaining two bytes as the primary weight value.
- The secondary weight value is taken from the third (second to lowest order) byte of collation element 204. This is accomplished by shifting collation element 204 right by eight bits and taking the secondary weight value to be the lower order byte of the remaining word.
- The tertiary weight value is taken from the fourth (lowest order) byte of collation element 204 (step 408).
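The reading process of steps 402 to 408 can be sketched with shifts and masks. This is a minimal sketch under assumptions: the element is treated as a 32-bit integer, the function name is hypothetical, and the default secondary and tertiary values are placeholders because the text does not state what the defaults are.

```python
from typing import Tuple

RESERVED_MIN = 0xFFF0
DEFAULT_SECONDARY = 0x01  # placeholder default, not specified in the text
DEFAULT_TERTIARY = 0x01   # placeholder default, not specified in the text


def read_collation_element(element: int) -> Tuple[int, int, int]:
    """Return (primary, secondary, tertiary) for a 32-bit collation element."""
    high_bytes = (element >> 16) & 0xFFFF      # step 404: inspect the first two bytes
    if high_bytes >= RESERVED_MIN:
        # Step 406: all four bytes form the primary weight; use default lower levels.
        return element & 0xFFFFFFFF, DEFAULT_SECONDARY, DEFAULT_TERTIARY
    # Step 408: split the element into its three fields.
    primary = high_bytes                       # shift right by 16 bits
    secondary = (element >> 8) & 0xFF          # shift right by 8, keep the low byte
    tertiary = element & 0xFF                  # lowest order byte
    return primary, secondary, tertiary
```

A single comparison against 0xFFF0 is enough to tell the forms apart, since the reserved range runs to the top of the 16-bit space.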
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN02809865.XA CN1531692B (en) | 2001-05-31 | 2002-05-22 | Efficient collation element structure for handling large numbers of characters |
AU2002311984A AU2002311984A1 (en) | 2001-05-31 | 2002-05-22 | Efficient collation element structure for handling large numbers of characters |
JP2003500784A JP4685348B2 (en) | 2001-05-31 | 2002-05-22 | Efficient collating element structure for handling large numbers of characters |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/872,552 US6877003B2 (en) | 2001-05-31 | 2001-05-31 | Efficient collation element structure for handling large numbers of characters |
US09/872,552 | 2001-05-31 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2002097674A2 WO2002097674A2 (en) | 2002-12-05 |
WO2002097674A3 WO2002097674A3 (en) | 2004-02-19 |
Family
ID=25359815
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2002/016186 WO2002097674A2 (en) | 2001-05-31 | 2002-05-22 | Efficient collation element structure for handling large numbers of characters |
Country Status (5)
Country | Link |
---|---|
US (1) | US6877003B2 (en) |
JP (1) | JP4685348B2 (en) |
CN (1) | CN1531692B (en) |
AU (1) | AU2002311984A1 (en) |
WO (1) | WO2002097674A2 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2006134332A (en) * | 2004-11-05 | 2006-05-25 | Microsoft Corp | Automated collation creation |
Families Citing this family (125)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8645137B2 (en) | 2000-03-16 | 2014-02-04 | Apple Inc. | Fast, language-independent method for user authentication by voice |
CA2390849A1 (en) * | 2002-06-18 | 2003-12-18 | Ibm Canada Limited-Ibm Canada Limitee | System and method for sorting data |
US7359905B2 (en) * | 2003-06-24 | 2008-04-15 | Microsoft Corporation | Resource classification and prioritization system |
US7941311B2 (en) * | 2003-10-22 | 2011-05-10 | Microsoft Corporation | System and method for linguistic collation |
US7676476B2 (en) * | 2004-08-25 | 2010-03-09 | Microsoft Corporation | Data types with incorporated collation information |
US20060212449A1 (en) * | 2005-03-21 | 2006-09-21 | Novy Alon R J | Method and apparatus for generating relevance-sensitive collation keys |
CN100393071C (en) * | 2005-06-30 | 2008-06-04 | 杭州华三通信技术有限公司 | Method for configuring access control list and its application |
US8677377B2 (en) | 2005-09-08 | 2014-03-18 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US9318108B2 (en) | 2010-01-18 | 2016-04-19 | Apple Inc. | Intelligent automated assistant |
US8977255B2 (en) | 2007-04-03 | 2015-03-10 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US9053089B2 (en) | 2007-10-02 | 2015-06-09 | Apple Inc. | Part-of-speech tagging using latent analogy |
US8620662B2 (en) * | 2007-11-20 | 2013-12-31 | Apple Inc. | Context-aware unit selection |
US9330720B2 (en) | 2008-01-03 | 2016-05-03 | Apple Inc. | Methods and apparatus for altering audio output signals |
US8996376B2 (en) | 2008-04-05 | 2015-03-31 | Apple Inc. | Intelligent text-to-speech conversion |
US10496753B2 (en) | 2010-01-18 | 2019-12-03 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
JP5391583B2 (en) * | 2008-05-29 | 2014-01-15 | 富士通株式会社 | SEARCH DEVICE, GENERATION DEVICE, PROGRAM, SEARCH METHOD, AND GENERATION METHOD |
US20100030549A1 (en) | 2008-07-31 | 2010-02-04 | Lee Michael M | Mobile device having human language translation capability with positional feedback |
US9959870B2 (en) | 2008-12-11 | 2018-05-01 | Apple Inc. | Speech recognition involving a mobile device |
US8140517B2 (en) * | 2009-04-06 | 2012-03-20 | International Business Machines Corporation | Database query optimization using weight mapping to qualify an index |
US9858925B2 (en) | 2009-06-05 | 2018-01-02 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
US10241644B2 (en) | 2011-06-03 | 2019-03-26 | Apple Inc. | Actionable reminder entries |
US10255566B2 (en) | 2011-06-03 | 2019-04-09 | Apple Inc. | Generating and processing task items that represent tasks to perform |
US10241752B2 (en) | 2011-09-30 | 2019-03-26 | Apple Inc. | Interface for a virtual digital assistant |
US9431006B2 (en) | 2009-07-02 | 2016-08-30 | Apple Inc. | Methods and apparatuses for automatic speech recognition |
US10679605B2 (en) | 2010-01-18 | 2020-06-09 | Apple Inc. | Hands-free list-reading by intelligent automated assistant |
US10553209B2 (en) | 2010-01-18 | 2020-02-04 | Apple Inc. | Systems and methods for hands-free notification summaries |
US10276170B2 (en) | 2010-01-18 | 2019-04-30 | Apple Inc. | Intelligent automated assistant |
US10705794B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
DE202011111062U1 (en) | 2010-01-25 | 2019-02-19 | Newvaluexchange Ltd. | Device and system for a digital conversation management platform |
US8682667B2 (en) | 2010-02-25 | 2014-03-25 | Apple Inc. | User profiling for selecting user specific voice input processing information |
US10762293B2 (en) | 2010-12-22 | 2020-09-01 | Apple Inc. | Using parts-of-speech tagging and named entity recognition for spelling correction |
US9262612B2 (en) | 2011-03-21 | 2016-02-16 | Apple Inc. | Device access using voice authentication |
US10057736B2 (en) | 2011-06-03 | 2018-08-21 | Apple Inc. | Active transport based notifications |
US9509757B2 (en) * | 2011-06-30 | 2016-11-29 | Google Inc. | Parallel sorting key generation |
US8994660B2 (en) | 2011-08-29 | 2015-03-31 | Apple Inc. | Text correction processing |
US10134385B2 (en) | 2012-03-02 | 2018-11-20 | Apple Inc. | Systems and methods for name pronunciation |
US9483461B2 (en) | 2012-03-06 | 2016-11-01 | Apple Inc. | Handling speech synthesis of content for multiple languages |
US9280610B2 (en) | 2012-05-14 | 2016-03-08 | Apple Inc. | Crowd sourcing information to fulfill user requests |
US9721563B2 (en) | 2012-06-08 | 2017-08-01 | Apple Inc. | Name recognition system |
US9495129B2 (en) | 2012-06-29 | 2016-11-15 | Apple Inc. | Device, method, and user interface for voice-activated navigation and browsing of a document |
US9576574B2 (en) | 2012-09-10 | 2017-02-21 | Apple Inc. | Context-sensitive handling of interruptions by intelligent digital assistant |
US9547647B2 (en) | 2012-09-19 | 2017-01-17 | Apple Inc. | Voice-based media searching |
CN103827862B (en) * | 2012-09-20 | 2017-09-01 | 株式会社东芝 | Data processing equipment, data management system, data processing method |
KR20230137475A (en) | 2013-02-07 | 2023-10-04 | 애플 인크. | Voice trigger for a digital assistant |
US9368114B2 (en) | 2013-03-14 | 2016-06-14 | Apple Inc. | Context-sensitive handling of interruptions |
AU2014233517B2 (en) | 2013-03-15 | 2017-05-25 | Apple Inc. | Training an at least partial voice command system |
WO2014144579A1 (en) | 2013-03-15 | 2014-09-18 | Apple Inc. | System and method for updating an adaptive speech recognition model |
WO2014197334A2 (en) | 2013-06-07 | 2014-12-11 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
WO2014197336A1 (en) | 2013-06-07 | 2014-12-11 | Apple Inc. | System and method for detecting errors in interactions with a voice-based digital assistant |
US9582608B2 (en) | 2013-06-07 | 2017-02-28 | Apple Inc. | Unified ranking with entropy-weighted information for phrase-based semantic auto-completion |
WO2014197335A1 (en) | 2013-06-08 | 2014-12-11 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US10176167B2 (en) | 2013-06-09 | 2019-01-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
EP3937002A1 (en) | 2013-06-09 | 2022-01-12 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
AU2014278595B2 (en) | 2013-06-13 | 2017-04-06 | Apple Inc. | System and method for emergency calls initiated by voice command |
DE112014003653B4 (en) | 2013-08-06 | 2024-04-18 | Apple Inc. | Automatically activate intelligent responses based on activities from remote devices |
US9620105B2 (en) | 2014-05-15 | 2017-04-11 | Apple Inc. | Analyzing audio input for efficient speech and music recognition |
US10592095B2 (en) | 2014-05-23 | 2020-03-17 | Apple Inc. | Instantaneous speaking of content on touch devices |
US9502031B2 (en) | 2014-05-27 | 2016-11-22 | Apple Inc. | Method for supporting dynamic grammars in WFST-based ASR |
US9430463B2 (en) | 2014-05-30 | 2016-08-30 | Apple Inc. | Exemplar-based natural language processing |
US9734193B2 (en) | 2014-05-30 | 2017-08-15 | Apple Inc. | Determining domain salience ranking from ambiguous words in natural speech |
US10170123B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Intelligent assistant for home automation |
US10289433B2 (en) | 2014-05-30 | 2019-05-14 | Apple Inc. | Domain specific language for encoding assistant dialog |
US10078631B2 (en) | 2014-05-30 | 2018-09-18 | Apple Inc. | Entropy-guided text prediction using combined word and character n-gram language models |
US9633004B2 (en) | 2014-05-30 | 2017-04-25 | Apple Inc. | Better resolution when referencing to concepts |
US9842101B2 (en) | 2014-05-30 | 2017-12-12 | Apple Inc. | Predictive conversion of language input |
AU2015266863B2 (en) | 2014-05-30 | 2018-03-15 | Apple Inc. | Multi-command single utterance input method |
US9760559B2 (en) | 2014-05-30 | 2017-09-12 | Apple Inc. | Predictive text input |
US9785630B2 (en) | 2014-05-30 | 2017-10-10 | Apple Inc. | Text prediction using combined word N-gram and unigram language models |
US9715875B2 (en) | 2014-05-30 | 2017-07-25 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US10659851B2 (en) | 2014-06-30 | 2020-05-19 | Apple Inc. | Real-time digital assistant knowledge updates |
US10446141B2 (en) | 2014-08-28 | 2019-10-15 | Apple Inc. | Automatic speech recognition based on user feedback |
US9818400B2 (en) | 2014-09-11 | 2017-11-14 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US10789041B2 (en) | 2014-09-12 | 2020-09-29 | Apple Inc. | Dynamic thresholds for always listening speech trigger |
US9668121B2 (en) | 2014-09-30 | 2017-05-30 | Apple Inc. | Social reminders |
US9886432B2 (en) | 2014-09-30 | 2018-02-06 | Apple Inc. | Parsimonious handling of word inflection via categorical stem + suffix N-gram language models |
US10074360B2 (en) | 2014-09-30 | 2018-09-11 | Apple Inc. | Providing an indication of the suitability of speech recognition |
US10127911B2 (en) | 2014-09-30 | 2018-11-13 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US9646609B2 (en) | 2014-09-30 | 2017-05-09 | Apple Inc. | Caching apparatus for serving phonetic pronunciations |
US10552013B2 (en) | 2014-12-02 | 2020-02-04 | Apple Inc. | Data detection |
US9711141B2 (en) | 2014-12-09 | 2017-07-18 | Apple Inc. | Disambiguating heteronyms in speech synthesis |
US9865280B2 (en) | 2015-03-06 | 2018-01-09 | Apple Inc. | Structured dictation using intelligent automated assistants |
US9721566B2 (en) | 2015-03-08 | 2017-08-01 | Apple Inc. | Competing devices responding to voice triggers |
US9886953B2 (en) | 2015-03-08 | 2018-02-06 | Apple Inc. | Virtual assistant activation |
US10567477B2 (en) | 2015-03-08 | 2020-02-18 | Apple Inc. | Virtual assistant continuity |
US9899019B2 (en) | 2015-03-18 | 2018-02-20 | Apple Inc. | Systems and methods for structured stem and suffix language models |
US9842105B2 (en) | 2015-04-16 | 2017-12-12 | Apple Inc. | Parsimonious continuous-space phrase representations for natural language processing |
US10083688B2 (en) | 2015-05-27 | 2018-09-25 | Apple Inc. | Device voice control for selecting a displayed affordance |
US10127220B2 (en) | 2015-06-04 | 2018-11-13 | Apple Inc. | Language identification from short strings |
US10101822B2 (en) | 2015-06-05 | 2018-10-16 | Apple Inc. | Language input correction |
US9578173B2 (en) | 2015-06-05 | 2017-02-21 | Apple Inc. | Virtual assistant aided communication with 3rd party service in a communication session |
US10255907B2 (en) | 2015-06-07 | 2019-04-09 | Apple Inc. | Automatic accent detection using acoustic models |
US10186254B2 (en) | 2015-06-07 | 2019-01-22 | Apple Inc. | Context-based endpoint detection |
US11025565B2 (en) | 2015-06-07 | 2021-06-01 | Apple Inc. | Personalized prediction of responses for instant messaging |
US10671428B2 (en) | 2015-09-08 | 2020-06-02 | Apple Inc. | Distributed personal assistant |
US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
US9697820B2 (en) | 2015-09-24 | 2017-07-04 | Apple Inc. | Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks |
US11010550B2 (en) | 2015-09-29 | 2021-05-18 | Apple Inc. | Unified language modeling framework for word prediction, auto-completion and auto-correction |
US10366158B2 (en) | 2015-09-29 | 2019-07-30 | Apple Inc. | Efficient word encoding for recurrent neural network language models |
US11587559B2 (en) | 2015-09-30 | 2023-02-21 | Apple Inc. | Intelligent device identification |
US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10049668B2 (en) | 2015-12-02 | 2018-08-14 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US10223066B2 (en) | 2015-12-23 | 2019-03-05 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US10446143B2 (en) | 2016-03-14 | 2019-10-15 | Apple Inc. | Identification of voice inputs providing credentials |
US9934775B2 (en) | 2016-05-26 | 2018-04-03 | Apple Inc. | Unit-selection text-to-speech synthesis based on predicted concatenation parameters |
US9972304B2 (en) | 2016-06-03 | 2018-05-15 | Apple Inc. | Privacy preserving distributed evaluation framework for embedded personalized systems |
US10049663B2 (en) | 2016-06-08 | 2018-08-14 | Apple, Inc. | Intelligent automated assistant for media exploration |
DK179588B1 (en) | 2016-06-09 | 2019-02-22 | Apple Inc. | Intelligent automated assistant in a home environment |
US10067938B2 (en) | 2016-06-10 | 2018-09-04 | Apple Inc. | Multilingual word prediction |
US10586535B2 (en) | 2016-06-10 | 2020-03-10 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10192552B2 (en) | 2016-06-10 | 2019-01-29 | Apple Inc. | Digital assistant providing whispered speech |
US10490187B2 (en) | 2016-06-10 | 2019-11-26 | Apple Inc. | Digital assistant providing automated status report |
US10509862B2 (en) | 2016-06-10 | 2019-12-17 | Apple Inc. | Dynamic phrase expansion of language input |
DK179049B1 (en) | 2016-06-11 | 2017-09-18 | Apple Inc | Data driven natural language event detection and classification |
DK179343B1 (en) | 2016-06-11 | 2018-05-14 | Apple Inc | Intelligent task discovery |
DK179415B1 (en) | 2016-06-11 | 2018-06-14 | Apple Inc | Intelligent device arbitration and control |
DK201670540A1 (en) | 2016-06-11 | 2018-01-08 | Apple Inc | Application integration with a digital assistant |
US10043516B2 (en) | 2016-09-23 | 2018-08-07 | Apple Inc. | Intelligent automated assistant |
US10593346B2 (en) | 2016-12-22 | 2020-03-17 | Apple Inc. | Rank-reduced token representation for automatic speech recognition |
DK201770439A1 (en) | 2017-05-11 | 2018-12-13 | Apple Inc. | Offline personal assistant |
DK179496B1 (en) | 2017-05-12 | 2019-01-15 | Apple Inc. | USER-SPECIFIC Acoustic Models |
DK179745B1 (en) | 2017-05-12 | 2019-05-01 | Apple Inc. | SYNCHRONIZATION AND TASK DELEGATION OF A DIGITAL ASSISTANT |
DK201770431A1 (en) | 2017-05-15 | 2018-12-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
DK201770432A1 (en) | 2017-05-15 | 2018-12-21 | Apple Inc. | Hierarchical belief states for digital assistants |
DK179560B1 (en) | 2017-05-16 | 2019-02-18 | Apple Inc. | Far-field extension for digital assistant services |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA1265623A (en) * | 1987-06-11 | 1990-02-06 | Eddy Lee | Method of facilitating computer sorting |
CA1280215C (en) * | 1987-09-28 | 1991-02-12 | Eddy Lee | Multilingual ordered data retrieval system |
US5551018A (en) * | 1993-02-02 | 1996-08-27 | Borland International, Inc. | Method of storing national language support text by presorting followed by insertion sorting |
US5485373A (en) * | 1993-03-25 | 1996-01-16 | Taligent, Inc. | Language-sensitive text searching system with modified Boyer-Moore process |
US5440482A (en) * | 1993-03-25 | 1995-08-08 | Taligent, Inc. | Forward and reverse Boyer-Moore string searching of multilingual text having a defined collation order |
US5675818A (en) * | 1995-06-12 | 1997-10-07 | Borland International, Inc. | System and methods for improved sorting with national language support |
US5873111A (en) * | 1996-05-10 | 1999-02-16 | Apple Computer, Inc. | Method and system for collation in a processing system of a variety of distinct sets of information |
US6381616B1 (en) * | 1999-03-24 | 2002-04-30 | Microsoft Corporation | System and method for speeding up heterogeneous data access using predicate conversion |
- 2001-05-31: US application US09/872,552, patent US6877003B2 (en), status: Expired - Lifetime (not active)
- 2002-05-22: AU application AU2002311984A, patent AU2002311984A1 (en), status: Abandoned (not active)
- 2002-05-22: WO application PCT/US2002/016186, patent WO2002097674A2 (en), status: Application Filing (active)
- 2002-05-22: CN application CN02809865.XA, patent CN1531692B (en), status: Expired - Lifetime (not active)
- 2002-05-22: JP application JP2003500784A, patent JP4685348B2 (en), status: Expired - Lifetime (not active)
Non-Patent Citations (3)
Title |
---|
DAVIS M: "ICU Collation Design Documentation" INTERNET PUBLICATION, [Online] 31 May 2000 (2000-05-31), XP002249274 Retrieved from the Internet: <URL:http://oss.software.ibm.com/cvs/icu/~checkout~/icuhtml/design/collation/ICU_collation_design.htm?rev=1.11&content-type=text/plain> [retrieved on 2003-07-25] * |
DAVIS MARK ET AL: "UNICODE TECHNICAL STANDARD #10, UNICODE COLLATION ALGORITHM" UNICODE TECHNICAL REPORT, [Online] 23 March 2001 (2001-03-23), XP002249276 Retrieved from the Internet: <URL:http://www.unicode.org/reports/tr10/tr10-8.html> [retrieved on 2003-07-25] cited in the application * |
HO C ET AL: "Multilingual Collation and CJK Sorts in Oracle 9i" PROCEEDINGS OF 18TH INTERNATIONAL UNICODE CONFERENCE, [Online] 24 - 27 April 2001, XP002249275 HONG KONG Retrieved from the Internet: <URL:http://www.unicode.org/iuc/iuc18/papers.html> [retrieved on 2003-07-25] cited in the application * |
Also Published As
Publication number | Publication date |
---|---|
JP2005517221A (en) | 2005-06-09 |
WO2002097674A3 (en) | 2004-02-19 |
CN1531692B (en) | 2010-12-08 |
US6877003B2 (en) | 2005-04-05 |
CN1531692A (en) | 2004-09-22 |
JP4685348B2 (en) | 2011-05-18 |
AU2002311984A1 (en) | 2002-12-09 |
US20020184251A1 (en) | 2002-12-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6877003B2 (en) | Efficient collation element structure for handling large numbers of characters | |
US7877364B2 (en) | Method of storing and retrieving miniaturised data | |
TW312771B (en) | ||
US7185018B2 (en) | Method of storing and retrieving miniaturized data | |
US7512533B2 (en) | Method and system of creating and using chinese language data and user-corrected data | |
JPH026252B2 (en) | ||
TWI604318B (en) | Method of data sorting | |
US20050251519A1 (en) | Efficient language-dependent sorting of embedded numerics | |
JP2007042146A (en) | Method and system of creating and using chinese data and user-corrected data | |
US20080319982A1 (en) | Method and Apparatus for Manipulating Data Files | |
JP2001357031A (en) | Method and system for converting unicode text into mixed code page | |
EP1691298B1 (en) | Method and system of creating and using Chinese language data and user-corrected data | |
AU777314B2 (en) | A method of storing and retrieving miniaturised data | |
Afzal et al. | Urdu Computing Standards: Development of Urdu Zabta Takhti-WG2 N2413-2-SC2 N3589-2 (UZT) 1.01 | |
JP4061283B2 (en) | Apparatus, method and program for converting lexical data to data | |
JP2005275880A (en) | Device, method and program for converting word and phrase into data | |
Nishimura | The Retrieval System for the Stellar Bibliography in Japan | |
JPH03282961A (en) | Mutual conversion dictionary system | |
JP2000029879A (en) | Document retrieving device and controlling method | |
JPH01181123A (en) | Information retrieving device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states | Kind code of ref document: A2 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG UZ VN YU ZA ZM ZW |
AL | Designated countries for regional patents | Kind code of ref document: A2 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
WWE | Wipo information: entry into national phase | Ref document number: 2003500784 Country of ref document: JP |
WWE | Wipo information: entry into national phase | Ref document number: 02809865X Country of ref document: CN |
REG | Reference to national code | Ref country code: DE Ref legal event code: 8642 |
122 | Ep: pct application non-entry in european phase |