WO2012090033A1 - A system and a method for visually aided telephone calls - Google Patents

A system and a method for visually aided telephone calls Download PDF

Info

Publication number
WO2012090033A1
WO2012090033A1 (PCT/IB2010/056151)
Authority
WO
WIPO (PCT)
Prior art keywords
video call
video
call
server
mobile phone
Prior art date
Application number
PCT/IB2010/056151
Other languages
French (fr)
Inventor
Oguz Demirci
Original Assignee
Turkcell Teknoloji Arastirma Ve Gelistirme Anonim Sirketi
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Turkcell Teknoloji Arastirma Ve Gelistirme Anonim Sirketi filed Critical Turkcell Teknoloji Arastirma Ve Gelistirme Anonim Sirketi
Priority to PCT/IB2010/056151 priority Critical patent/WO2012090033A1/en
Publication of WO2012090033A1 publication Critical patent/WO2012090033A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M1/00Substation equipment, e.g. for use by subscribers
    • H04M1/72Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/142Image acquisition using hand-held instruments; Constructional details of the instruments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M2250/00Details of telephonic subscriber devices
    • H04M2250/52Details of telephonic subscriber devices including functional features of a camera

Abstract

This invention relates to a system and a method for visually aided telephone calls using OCR. The system (1) comprises at least one mobile phone (2) which includes at least one camera (21) and a specific button (22) and which initiates a video call, at least one mobile phone (3) which takes a call coming from the other mobile phone (2), at least one carrier network (4) for providing wireless communication between the mobile phones (2 and 3), and at least one server (5) which takes a video call coming from the mobile phone (2), employs an optical character recognition (OCR) algorithm on the transferred video frames using principal component analysis to discriminate characters and to extract, non-limiting, digits in the phone number format, and directs the video call as an audio call to another mobile phone (3) according to the recognized phone number of the video call content.

Description

DESCRIPTION
A SYSTEM AND A METHOD FOR VISUALLY AIDED TELEPHONE
CALLS
Field of the invention
This invention relates to a system and a method for visually aided telephone calls using OCR (Optical character recognition).
Prior art
Pattern recognition is the assignment of some sort of output value to a given input value. In classification, pattern recognition attempts to assign each input value to one of a given set of classes (for example, determining whether a given email is "spam" or "non-spam").
Pattern recognition, however, encompasses other types of output as well. One example is regression, which assigns a real-valued output to each input. Another is sequence labeling, which assigns a class to each member of a sequence of values (for example, part-of-speech tagging, which assigns a part of speech to each word in an input sentence). Another example is parsing, which assigns a parse tree to an input sentence, describing the syntactic structure of the sentence.
Pattern recognition methods generally aim to provide a reasonable answer for all possible inputs and to perform "fuzzy" matching of inputs. Pattern matching methods, in contrast to recognition methods, look for exact matches between the input and pre-existing patterns. A common example of a pattern-matching method is regular expression matching, which looks for patterns of a given sort in textual data and is widely used in text editors and word processors. The difference between pattern recognition and pattern matching is that pattern matching is generally not considered a type of machine learning, although pattern-matching methods can sometimes provide output of similar quality to that of pattern-recognition methods.
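To make the pattern-matching side concrete, here is a minimal sketch that extracts digit sequences in a phone-number-like format from text with a regular expression; the pattern and the assumed number format (optional country code plus ten digits with optional separators) are illustrative assumptions, not part of the invention.

```python
import re

# Hypothetical phone-number pattern: optional country code, then ten digits,
# allowing spaces, dots or dashes as separators (an assumed format).
PHONE_PATTERN = re.compile(r"(?:\+\d{1,3}[\s.-]?)?(?:\d[\s.-]?){10}")

def extract_phone_numbers(text: str) -> list[str]:
    """Return candidate phone numbers found in recognized text."""
    return [m.group().strip() for m in PHONE_PATTERN.finditer(text)]

print(extract_phone_numbers("Call us at +90 532 123 45 67 today"))
# ['+90 532 123 45 67']
```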
Many fields, including psychology, psychiatry, cognitive science and computer science, have studied various recognition methods.
Optical character recognition, described below, is an example that combines pattern recognition and matching methods:
Optical character recognition (OCR) is the mechanical or electronic translation of scanned images of handwritten or printed text into machine-encoded text. OCR is usually used to convert books and documents into electronic files, to computerize a record-keeping system in an office, or to publish the text. OCR makes it possible to edit the text, search for a word or phrase, store it more compactly, display or print a copy free of scanning artifacts, and apply techniques such as machine translation, text-to-speech and text mining to it. OCR is a field of research in pattern recognition.
Microsoft Tag, 2D barcode readers, GetFugu, Kooba, Upcode, Layar, Junaio and Google Goggles are applications that try to recognize barcodes and icons/images, usually in specific formats, and help users access digital content effortlessly. However, none of these products processes video transferred over 3G networks; they perform the processing on the mobile platform instead.
US2006182311A1, US2003012410A1, US2010008265A1, US6700990B1, WO2009114039A1, US7227893B1 and US2006233423A1 are some of the patents examined. Some of these patent documents were written to solve problems like watermarking, surveillance or pose estimation, or describe pattern recognition systems without giving details of the algorithms used and the specific purposes they were designed for. None of these patent documents mentions a telecommunication application connecting a caller that initiated a video call to the other party based on a telephone number or pattern recognized in the scene.
The United States patent number US2008233999 discloses a mobile station device that includes a camera and a calling module that uses information regarding images captured by the camera for automatically placing a call from the mobile station device. An exemplary mobile station device includes a camera that is configured to capture an image of something in the camera's field of vision. A calling module determines a number to call based on the captured image and automatically calls the number. An exemplary method of communicating using a mobile station device equipped with a camera includes capturing an image with the camera. The captured image is used to determine a number and to automatically call that number.
The United States patent number US6594503 discloses a communication device, such as a cellular mobile phone or a cordless phone, that has a dial unit for communication with a base station. An input device for dialing a phone number, which forwards a coded dial signal to the dial unit, is implemented as an optical character recognition (OCR) scanner that reads phone numbers from a printed or hand-written original. Optionally, especially for recognizing hand-writing, the scanner sends graphical representations of the input data to the base station, which obtains the coded dial signals via an external processor.
The Japan patent number JP6311220 discloses a device which can recognize an image showing the shape of a user's lips and thereby makes dialing possible. An image pickup part, a feature extraction part, a memory and a shape recognition part are controlled by a CPU; features are extracted from the image showing the shape of the lips using the image pickup part, feature extraction part, memory and dictionary and are recognized as the corresponding character data; a database is searched with the character data by a telephone number retrieval part, a telephone number corresponding to the character data is read out, and the telephone number is sent out by a sending part. When the image cannot be recognized, the user is notified by a recognition disable output part, and when the telephone number cannot be retrieved from the character data, the user is notified by a retrieval disable output part.
Summary of the invention
The object of the invention is to provide a system and a method that use the OCR ability to recognize phone numbers from the calling subscriber's video call content.
A further object of the invention is to provide a system and a method that direct phone calls to another subscriber according to the phone number recognized in the video call content.
Detailed description of the invention
"A system and a method for visually aided telephone calls" designed to fulfill the objects of the present invention is illustrated in the attached figures, where: Figure 1 - is the schematic view of the system.
Figure 2 - is the flow diagram of the method.
Figure 3 - is the flow diagram of the "extracting phone number to be called from video frames of the video call by using OCR algorithm" step of the method. The parts in the figure are each given a reference numeral where the numerals refer to the following:
1. System
2. Mobile phone
21. Camera
22. Button
3. Mobile phone
4. Carrier network
5. Server
1000. Method
U1. Caller subscriber
U2. Calling subscriber
A system (1) for visually aided telephone calls comprises;
- at least one mobile phone (2) which includes at least one camera (21) and a specific button (22) and which initiates a video call,
- at least one mobile phone (3) which takes a call coming from the other mobile phone (2),
- at least one carrier network (4) for providing wireless communication between the mobile phones (2 and 3),
- at least one server (5) which takes a video call coming from the mobile phone (2), employs an optical character recognition (OCR) algorithm on the transferred video frames using principal component analysis to discriminate characters and to extract, non-limiting, digits in the phone number format and directs the video call as an audio call to another mobile phone (3) according to the recognized phone number of the video call content.
The system (1) applies image processing tools and then connects calling parties, non-limiting, via audio calls to endpoints defined by printed phone numbers or patterns. This is achieved by processing the video call content using image processing algorithms.
The server (5) employs an optical character recognition (OCR) algorithm on the transferred video frames using principal component analysis to discriminate characters and to extract, non-limiting, digits in the phone number format.
Mobile phone (2) has a specific button (22) for initiating a video call towards server (5).
Therefore, the system (1) enables a mobile phone (2) user (U1) to initiate a video phone call to a certain service number using a specific button (22); the call is then directed to a certain mobile phone (3) based on the findings of the pattern recognition and optical character recognition operations.
Using the system (1) and a specific button (22), even blind users (U1) can initiate phone calls without having to read and dial the telephone numbers or perceive the figures printed in posters. The user (U1) can easily initiate a video call using the previously defined button (22) and direct his camera (21) to a scene containing either a figure or a printed telephone number. In the first scenario, the figure is recognized using the pattern recognition algorithm and an audio call is set up between the calling subscriber (U2) and the number associated with the identified figure. In the second scenario, the telephone number is recognized using the optical character recognition algorithm that uses principal component analysis and an audio call is set up between the two ends.
In the preferred embodiment of the invention, the carrier network (4) is a UMTS (Universal Mobile Telecommunications System) network. Mobile phone (2) users (U1) dial various sets of numbers to connect their calls to different endpoints. Many companies try to obtain numbers that are easy to remember and access so that customers can reach them more easily. As an alternative method, without having to deal with the number to dial, a certain video call number can be used to initiate a video call, and the video content transferred from the subscriber to the service provider's servers (5) can be processed to establish the connection. The same video call number can be used for multiple companies. Some merchants may not want to declare their telephone number but just use figures to be called, and certain patterns can be directed to different numbers at different times with modifications only on the service provider side.
In another scenario, the same video call number can be used with a scene including a visually noticeable telephone number. The video content provided by the user (U1) is transferred to the service provider's servers (5) and processed with optical character recognition (OCR). The telephone number to be called is extracted, and the audio call between the user (U1) and the extracted number is established afterwards. The system (1) provides effortless connection between the users (U1 and U2), and even visually impaired people can initiate phone calls without having to deal with the telephone number. Besides, a dedicated button (22) can be defined on mobile phones (2) to be used with the video call service.
One of the optimized and preferred ways of dialing numbers is clicking on them on touch screen phones when they are available in websites, emails or text messages. The designed system (1) extends this ability to printed telephone numbers in the environment.
Video call services built on 3G networks can also be used unilaterally for a mobile phone (2) user (U1) to communicate with the telephone service provider and its servers (5). After the user (U1) initiates the video call, the video of the scene is transferred over the network (4) to the service provider's servers (5). In the first of this invention's scenarios, where the pattern in the scene is matched to one of the figures in the server's (5) library, the system (1) uses the pattern recognition algorithm. After it has been decided that one of the figures in the library exists in the scene, an audio call between the caller and the number associated with the pattern is established. The following explains the optical character recognition algorithm the system (1) uses to detect the phrases or the telephone numbers in the scene.
A method (1000) for visually aided telephone calls comprises the steps of;
- initiating a video call towards a server (5) (1100),
- directing a mobile phone's (2) camera (21) to a scene (1200),
- establishing a connection (1300),
- directing the video call to the server (5) (1400),
- extracting a phone number to be called from video frames of the video call by using OCR algorithm (1500),
- establishing an audio call between mobile phones (2 and 3) (1600) (Figure 2).
The system (1) for visually aided telephone calls comprises;
- a mobile phone (2) which is adapted to initiate a video call towards a server (5),
- a carrier network (4) which is adapted to establish a connection,
- a carrier network (4) which is adapted to direct the video call to the server (5),
- a server (5) which is adapted to extract a phone number to be called from the video frames of the video call by using the OCR algorithm,
- a server (5) which establishes an audio call between the mobile phones (2 and 3).
In the method (1000), firstly the mobile phone (2) user (U1) initiates a video call towards a server (5) (1100). After initiating the video call, the user (U1) directs his mobile phone's (2) camera (21) to a scene (1200). This scene can be anything in the environment, such as a newspaper, a picture or a guide. To initiate the video call, the user (U1) only presses the specific button (22) on the mobile phone (2), which starts a video call towards the server (5).
After this, the carrier network (4) establishes a connection (1300) between the mobile phone (2) and the server (5) and directs the video call to the server (5) (1400). After the video call reaches the server (5), the server (5) starts to detect a telephone number in the video frames of the video call. To achieve this, the server (5) compares the video frames with the images stored in its library.
By comparing the video frames with the images stored in the library, the server (5) extracts a phone number to be called from the video frames of the video call by using the OCR algorithm (1500).
Extracting a phone number to be called from video frames of the video call by using the OCR algorithm (1500) comprises the sub-steps of (a code sketch of this loop follows the list);
- assigning a zero value to "N", where "N" is the number of matches achieved so far (1501),
- obtaining a video frame from the video call (1502),
- checking whether a text is recognized (1503),
- if the text is recognized, keeping that video frame (1504),
- obtaining the phone number to be called (1505),
- increasing the value of "N" (1506),
- checking whether the value of "N" is bigger than the value of "k", where "k" is a predetermined number of repeated matches required for a correct match (1507),
- if "N" is not bigger than "k", obtaining a new video frame from the video call (1502) (in other words, going to step 1502),
- if "N" is bigger than "k", finishing the operation (1508),
- if the text is not recognized, checking whether a pattern is matched (1509),
- if the pattern is matched, keeping that video frame (1504) (in other words, going to step 1504),
- if the pattern is not matched, deleting that frame (1510),
- obtaining a new video frame from the video call (1502) (in other words, going to step 1502).
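As referenced above, a minimal sketch of this loop in code, with the OCR and pattern-matching steps injected as hypothetical helpers; all names here are illustrative assumptions, not from the patent.

```python
from typing import Callable, Iterator, Optional, TypeVar

F = TypeVar("F")  # a video frame, in whatever representation the server uses

def extract_number(
    frames: Iterator[F],                           # step 1502: frames from the video call
    recognize_text: Callable[[F], Optional[str]],  # step 1503: OCR, number or None
    match_pattern: Callable[[F], Optional[str]],   # step 1509: mapped number or None
    k: int,                                        # repetitions required for a correct match
) -> Optional[str]:
    n = 0                                          # step 1501: achieved match count
    number = None
    for frame in frames:
        text = recognize_text(frame)
        if text is not None:
            number = text                          # steps 1504-1505: keep frame, take number
        else:
            mapped = match_pattern(frame)
            if mapped is None:
                continue                           # step 1510: delete the frame, fetch next
            number = mapped                        # steps 1504-1505 via the pattern branch
        n += 1                                     # step 1506
        if n > k:                                  # step 1507
            return number                          # step 1508: finish the operation
    return None
```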
The system (1) for visually aided telephone calls comprises a server (5) which is adapted to perform;
- assigning a zero value to "N", where "N" is the number of matches achieved so far,
- obtaining a video frame from the video call,
- checking whether a text is recognized,
- if the text is recognized, keeping that video frame,
- obtaining the phone number to be called,
- increasing the value of "N",
- checking whether the value of "N" is bigger than the value of "k", where "k" is a predetermined number of repeated matches required for a correct match,
- if "N" is not bigger than "k", obtaining a new video frame from the video call,
- if "N" is bigger than "k", finishing the operation,
- if the text is not recognized, checking whether a pattern is matched,
- if the pattern is matched, keeping that video frame,
- if the pattern is not matched, deleting that frame,
- obtaining a new video frame from the video call.
To extract a phone number to be called from the video frames of the video call, the server (5) firstly assigns a zero value to "N", where "N" is the number of matches achieved so far (1501). After that, the server (5) obtains a first video frame from the video call (1502). After obtaining the video frame (1502), the server (5) checks whether any text is recognized (1503). If a text is recognized, the server keeps that video frame (1504).
After the text is recognized, the server (5) obtains the phone number to be called (1505). The server (5) increases the value of "N" (1506) and checks whether the value of "N" is bigger than the value of "k" (1507). If "N" is not bigger than "k", the server (5) obtains a new video frame from the video call (1502).
If "N" is bigger than "k", the server (5) finishes the operation (1508). If the text is not recognized, the server checks whether any pattern is matched (1509). If the pattern is matched, the server (5) keeps that video frame (1504). After the pattern is matched, the server (5) obtains the phone number to be called (1505). The server (5) increases the value of "N" (1506) and checks whether the value of "N" is bigger than the value of "k" (1507). If "N" is not bigger than "k", the server (5) obtains a new video frame from the video call (1502). If "N" is bigger than "k", the server (5) finishes the operation (1508).
If the pattern is not matched, the server deletes that frame (1510) and obtains a new video frame from the video call (1502).
The text recognition operation consists of two stages, Training and Execution. The training stage is carried out offline, and the principal component space is generated to be used later, during the call. During the training stage, the server (5) is adapted to perform;
- applying median filtering,
- applying normalization,
- detecting lines,
- extracting characters and generating character images,
- applying Radon transform,
- generating observation matrix, X, using the character image and its Radon transform,
- applying eigen-decomposition on the covariance matrix,
- evaluating the first p eigenvalues that include at least 95% of the variance,
- generating the principal component space using the first p eigenvectors.
During the line detection stage of the training, the server (5) applies Otsu thresholding, searching for the threshold that minimizes the intra-class variance, defined as a weighted sum of the variances of the two classes (black and white pixels in the binary image). Then, a skew correction is applied if necessary. On the corrected image, the server (5) obtains a histogram of the black pixels using a horizontal projection of the image, and uses the ends of the histogram to detect the top and bottom of each line, as shown in Picture 1. A similar algorithm is applied to extract the characters at each line of the image.
Picture 1 - An example of detecting the top and bottom of the lines
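A minimal sketch of this line-detection step, assuming OpenCV and numpy as stand-ins for the server's implementation; skew correction is omitted.

```python
import cv2
import numpy as np

def detect_lines(gray: np.ndarray) -> list[tuple[int, int]]:
    # Otsu thresholding: picks the threshold that minimizes the intra-class
    # variance (inverted so that text pixels become non-zero foreground).
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # Horizontal projection: which rows contain any text pixels at all.
    has_ink = binary.sum(axis=1) > 0
    lines, top = [], None
    for i, ink in enumerate(has_ink):
        if ink and top is None:
            top = i                          # top of a new line
        elif not ink and top is not None:
            lines.append((top, i - 1))       # bottom of the current line
            top = None
    if top is not None:
        lines.append((top, len(has_ink) - 1))
    return lines
```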
The server (5) then generates a sample set of character images, each with 16x16 pixels. These character images are used in the Radon transform to generate corresponding Radon transform images, and the transformed images are also interpolated to 16x16 images. It can be assumed that the total number of character samples is n and that there are k different characters to be differentiated from each other (0, 1, 2, ..., 9, a, b, c, ..., z), each providing n_i samples in the total set of n character images.
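A sketch of building one 512-element training vector from a character image and its Radon transform; scikit-image is an assumed stand-in for the server's implementation.

```python
import numpy as np
from skimage.transform import radon, resize

def character_sample(char_img: np.ndarray) -> np.ndarray:
    img16 = resize(char_img, (16, 16))                  # 16x16 character image
    theta = np.linspace(0.0, 180.0, 16, endpoint=False)
    sinogram = radon(img16, theta=theta, circle=False)  # Radon transform
    radon16 = resize(sinogram, (16, 16))                # interpolate to 16x16
    # Vectorize and stack both representations: 256 + 256 = 512 values (m = 512).
    return np.concatenate([img16.ravel(), radon16.ravel()])
```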
$$n = \sum_{i=1}^{k} n_i$$
n1: number of images that correspond to 0,
n2: number of images that correspond to 1,
n3: number of images that correspond to 2,
...
n11: number of images that correspond to a,
n12: number of images that correspond to b, etc.
The server (5) then vectorizes each of these n sample images and their Radon transforms to generate vectors of 256x1x2 (m = 512). The average of the n character vectors is evaluated as:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
The server (5) can combine the n vectorized character images to generate the observation matrix, X, which will be used to generate the covariance matrix:

$$X = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}, \quad X \in \mathbb{R}^{m \times n}$$
The mean-removed observation matrix is evaluated as follows:

$$\tilde{X} = \begin{bmatrix} x_1 - \bar{x} & x_2 - \bar{x} & \cdots & x_n - \bar{x} \end{bmatrix}$$
This matrix, which is composed of the individual vectorized, mean-removed character images and their Radon transforms, is used to generate the covariance matrix, C (here $C = \frac{1}{n} \tilde{X} \tilde{X}^T$). The covariance matrix is then used in the eigen-decomposition $C = Q \Lambda Q^T$ to obtain the eigenvalues and eigenvectors.
Q includes the eigenvectors (combinations of the regular and Radon transformed images) in its columns, and Λ includes the eigenvalues on its diagonal. The first p columns of Q are used to generate the eigenspace, where p is determined as the number of leading eigenvalues (on the diagonal of Λ) that together contain at least 95% of the variance. These p directions, the eigenvectors, are indeed the directions that maximize the variance of the distribution.
Each of the n vectors (character images in the training set) is then projected onto the p eigenvectors with dot product operations, and the character images are represented as points in the principal component (PC) space. These n points belong to k different groups, each representing a different character (a or b or 5, etc.). The centroids of these k groups can be evaluated to represent the average of each character in the principal component space, as shown in Graphic 1.
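Putting the training formulas together, a minimal numpy sketch; the 1/n covariance convention, the function name and the data layout are assumptions.

```python
import numpy as np

# X is the m x n observation matrix whose columns are the 512-element
# character vectors; labels[i] names the character of column i.
def train_pc_space(X: np.ndarray, labels: list, var_keep: float = 0.95):
    mean = X.mean(axis=1, keepdims=True)          # average of the n vectors
    X_tilde = X - mean                            # mean-removed observation matrix
    C = (X_tilde @ X_tilde.T) / X.shape[1]        # covariance matrix C
    eigvals, Q = np.linalg.eigh(C)                # eigen-decomposition (ascending)
    eigvals, Q = eigvals[::-1], Q[:, ::-1]        # reorder to descending
    # first p eigenvalues covering at least 95% of the variance
    p = int(np.searchsorted(np.cumsum(eigvals) / eigvals.sum(), var_keep)) + 1
    pcs = Q[:, :p]                                # the principal component space
    points = pcs.T @ X_tilde                      # project training samples (p x n)
    centroids = {c: points[:, [l == c for l in labels]].mean(axis=1)
                 for c in set(labels)}            # centroid of each character group
    return mean, pcs, centroids
```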
During the execution stage, the server (5) repeats the line detection and character extraction steps as in the training stage. After the transfer of the video content including the text for analysis, the lines and the characters are extracted. The character images are put into vectors of 256x1 format. A particular character image is then projected onto the selected p eigenvectors. This represents the character to be determined as a point in the principal component space.
Graphic 1 - Representation of n character images in the PC space.
The server (5) can then measure the distance from this point to the centroid of each group (candidate character) and determine the group it belongs to by picking the closest distance. The new character image is recognized as the character with the closest centroid.
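Continuing the training sketch above, the execution-stage decision reduces to a nearest-centroid rule in the PC space; this sketch reuses mean, pcs and centroids from train_pc_space.

```python
import numpy as np

def classify_character(char_vec: np.ndarray, mean: np.ndarray,
                       pcs: np.ndarray, centroids: dict) -> str:
    # Project the vectorized character image into the PC space.
    point = (pcs.T @ (char_vec.reshape(-1, 1) - mean)).ravel()
    # Pick the candidate character whose group centroid is closest.
    return min(centroids, key=lambda c: np.linalg.norm(point - centroids[c]))
```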
The same operation is repeated for every extracted character, and the decision is based on the projection of the character image and where it falls in the PC space. The overall processing stages of the method (1000) are summarized in Figure 3. As indicated in this diagram, the same operation can be repeated on the neighboring frames until the same decision is repeated k times (e.g. 3) and it is certain that the character is predicted correctly.
Based on the extracted text phrase or the telephone number, the audio call connection between the caller and the number is established using the Parlay X system. After extracting a phone number to be called from video frames of the video call by using OCR algorithm (1500), server (5) establishes an audio call between mobile phones (2 and 3) (1600).
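The description does not detail the Parlay X interaction. As one hedged illustration only: the Parlay X Third Party Call web service defines a makeCall operation that a server could invoke to bridge the two parties; the WSDL endpoint, the party URIs and the use of the zeep SOAP client below are illustrative assumptions.

```python
from zeep import Client

# Hedged sketch of setting up the audio leg via Parlay X Third Party Call;
# the service URL and telephone numbers are placeholders, not real values.
client = Client("https://operator.example.com/parlayx/third_party_call?wsdl")
call_id = client.service.makeCall(
    callingParty="tel:+905321112233",   # the caller (U1)
    calledParty="tel:+902121234567",    # the phone number recognized by OCR
)
print("Parlay X call identifier:", call_id)
```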
Within the scope of this basic concept, it is possible to develop various embodiments of the inventive system (1) and method (1000) for visually aided telephone calls. The invention cannot be limited to the examples described herein; it is essentially as defined in the claims.

Claims

1. The method (1000) for visually aided telephone calls characterized by the steps of;
- initiating a video call towards a server (5) (1100),
- directing a mobile phone's (2) camera (21) to a scene (1200),
- establishing a connection (1300),
- directing the video call to the server (5) (1400),
- extracting a phone number to be called from video frames of the video call by using OCR algorithm (1500),
- establishing an audio call between mobile phones (2 and 3) (1600).
2. The method (1000) according to claim 1, characterized by the step of extracting a phone number to be called from video frames of the video call by using OCR algorithm (1500) comprising the sub-steps of;
- assigning a zero value to "N", where "N" is the number of matches achieved so far (1501),
- obtaining a video frame from the video call (1502),
- checking whether a text is recognized (1503),
- if the text is recognized, keeping that video frame (1504),
- obtaining the phone number to be called (1505),
- increasing the value of "N" (1506),
- checking whether the value of "N" is bigger than the value of "k", where "k" is a predetermined number of repeated matches required for a correct match (1507),
- if "N" is not bigger than "k", obtaining a new video frame from the video call (1502) (in other words, going to step 1502),
- if "N" is bigger than "k", finishing the operation (1508),
- if the text is not recognized, checking whether a pattern is matched (1509),
- if the pattern is matched, keeping that video frame (1504) (in other words, going to step 1504),
- if the pattern is not matched, deleting that frame (1510),
- obtaining a new video frame from the video call (1502) (in other words, going to step 1502).
3. The method (1000) according to claim 1 or claim 2, characterized in that the text recognition operation consists of two stages, called Training and Execution, wherein the training stage is carried out offline and the principal component space is generated to be used later, during the call.
4. The method (1000) according to claim 3, characterized by training stage which comprises the steps of;
- applying median filtering,
- applying normalization,
- detecting lines,
- extracting characters and generating character images,
- applying Radon transform,
- generating observation matrix, X, using the character image and its Radon transform,
- applying eigen-decomposition on the covariance matrix,
- evaluating the first p eigenvalues that include at least 95% of the variance,
- generating the principal component space using the first p eigenvectors.
5. The method (1000) according to claim 4, characterized by an execution stage which repeats the line detection and character extraction steps as in the training stage.
6. A system (1) for visually aided telephone calls comprising;
- at least one mobile phone (2) which includes at least one camera (21) and which initiates a video call, - at least one mobile phone (3) which takes a call coming from the other mobile phone (2),
- at least one carrier network (4) for providing wireless communication between the mobile phones (2 and 3) and characterized by
- at least one server (5) which takes a video call coming from the mobile phone (2), employs an optical character recognition (OCR) algorithm on the transferred video frames using principal component analysis to discriminate characters and to extract, non-limiting, digits in the phone number format and directs the video call as an audio call to another mobile phone (3) according to the recognized phone number of the video call content.
7. The system (1) according to claim 6, characterized by the mobile phone (2) which has a specific button (22) for initiating a video call towards server (5).
8. The system (1) according to claim 6 or claim 7, characterized by the carrier network (4) which is a UMTS (Universal Mobile Telecommunications System) network.
9. The system (1) according to any one of claims 6 to 8, characterized by
- the mobile phone (2) which is adapted to initiate a video call towards a server,
- the carrier network (4) which is adapted to establish a connection and to direct the video call to the server (5),
- the server (5) which is adapted to extract a phone number to be called from the video frames of the video call by using the OCR algorithm and to establish an audio call between the mobile phones (2 and 3).
10. The system (1) according to any one of claims 6 to 9, characterized by the server (5) which is adapted to perform;
- assigning a zero value to "N", where "N" is the number of matches achieved so far,
- obtaining a video frame from the video call,
- checking whether a text is recognized,
- if the text is recognized, keeping that video frame,
- obtaining the phone number to be called,
- increasing the value of "N",
- checking whether the value of "N" is bigger than the value of "k", where "k" is a predetermined number of repeated matches required for a correct match,
- if "N" is not bigger than "k", obtaining a new video frame from the video call,
- if "N" is bigger than "k", finishing the operation,
- if the text is not recognized, checking whether a pattern is matched,
- if the pattern is matched, keeping that video frame,
- if the pattern is not matched, deleting that frame,
- obtaining a new video frame from the video call.
11. The system (1) according to any one of claims 6 to 10, characterized by the server (5) which is adapted to perform;
- detecting lines,
- extracting characters,
- generating observation matrix, X,
- applying eigen-decomposition on the covariance matrix,
- evaluating the first p eigenvalues that include at least 95% of the variance,
- generating the principal component space using the first p eigenvectors.
PCT/IB2010/056151 2010-12-31 2010-12-31 A system and a method for visually aided telephone calls WO2012090033A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/IB2010/056151 WO2012090033A1 (en) 2010-12-31 2010-12-31 A system and a method for visually aided telephone calls

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/IB2010/056151 WO2012090033A1 (en) 2010-12-31 2010-12-31 A system and a method for visually aided telephone calls

Publications (1)

Publication Number Publication Date
WO2012090033A1 true WO2012090033A1 (en) 2012-07-05

Family

ID=44114529

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2010/056151 WO2012090033A1 (en) 2010-12-31 2010-12-31 A system and a method for visually aided telephone calls

Country Status (1)

Country Link
WO (1) WO2012090033A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2821934A1 (en) * 2013-07-03 2015-01-07 Open Text S.A. System and method for optical character recognition and document searching based on optical character recognition
US9342533B2 (en) 2013-07-02 2016-05-17 Open Text S.A. System and method for feature recognition and document searching based on feature recognition

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06311220A (en) 1993-04-21 1994-11-04 Kyocera Corp Image recognizing dialer
US20030012410A1 (en) 2001-07-10 2003-01-16 Nassir Navab Tracking and pose estimation for augmented reality using real features
US6594503B1 (en) 2000-02-02 2003-07-15 Motorola, Inc. Communication device with dial function using optical character recognition, and method
US6700990B1 (en) 1993-11-18 2004-03-02 Digimarc Corporation Digital watermark decoding method
WO2006002706A1 (en) * 2004-06-25 2006-01-12 Sony Ericsson Mobile Communications Ab Mobile terminals, methods, and program products that generate communication information based on characters recognized in image data
US20060142054A1 (en) * 2004-12-27 2006-06-29 Kongqiao Wang Mobile communications terminal and method therefor
US20060182311A1 (en) 2005-02-15 2006-08-17 Dvpv, Ltd. System and method of user interface and data entry from a video call
US20060233423A1 (en) 2005-04-19 2006-10-19 Hesam Najafi Fast object detection for augmented reality systems
US7227893B1 (en) 2002-08-22 2007-06-05 Xlabs Holdings, Llc Application-specific object-based segmentation and recognition system
US20080094496A1 (en) * 2006-10-24 2008-04-24 Kong Qiao Wang Mobile communication terminal
US20080233999A1 (en) 2007-03-21 2008-09-25 Willigenburg Willem Van Image recognition for placing a call
WO2009114039A1 (en) 2008-03-14 2009-09-17 Sony Ericsson Mobile Communications Ab Enhanced video telephony through augmented reality
US20100008265A1 (en) 2008-07-14 2010-01-14 Carl Johan Freer Augmented reality method and system using logo recognition, wireless application protocol browsing and voice over internet protocol technology

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MIROSLAW MICIAK: "RADON TRANSFORMATION AND PRINCIPAL COMPONENT ANALYSIS METHOD APPLIED IN POSTAL ADDRESS RECOGNITION TASK", INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND APPLICATIONS, vol. 7, no. 3, 2010, pages 33 - 44, XP055000545, Retrieved from the Internet <URL:http://www.tmrfindia.org/ijcsa/v7i34.pdf> [retrieved on 20110615] *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9342533B2 (en) 2013-07-02 2016-05-17 Open Text S.A. System and method for feature recognition and document searching based on feature recognition
US9563690B2 (en) 2013-07-02 2017-02-07 Open Text Sa Ulc System and method for feature recognition and document searching based on feature recognition
US10031924B2 (en) 2013-07-02 2018-07-24 Open Text Sa Ulc System and method for feature recognition and document searching based on feature recognition
US10282374B2 (en) 2013-07-02 2019-05-07 Open Text Sa Ulc System and method for feature recognition and document searching based on feature recognition
EP2821934A1 (en) * 2013-07-03 2015-01-07 Open Text S.A. System and method for optical character recognition and document searching based on optical character recognition

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10809190

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2013/07813

Country of ref document: TR

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10809190

Country of ref document: EP

Kind code of ref document: A1