WO2004057450A1 - Hand pointing apparatus

Hand pointing apparatus

Info

Publication number
WO2004057450A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
pointing
hand
cameras
location
Application number
PCT/EP2002/014739
Other languages
French (fr)
Inventor
Alberto Del Bimbo
Alessandro Valli
Carlo Colombo
Original Assignee
Universita' Degli Studi Di Firenze
Application filed by Universita' Degli Studi Di Firenze filed Critical Universita' Degli Studi Di Firenze
Priority to EP02796729A priority Critical patent/EP1579304A1/en
Priority to PCT/EP2002/014739 priority patent/WO2004057450A1/en
Priority to AU2002361212A priority patent/AU2002361212A1/en
Publication of WO2004057450A1 publication Critical patent/WO2004057450A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/03Arrangements for converting the position or the displacement of a member into a coded form
    • G06F3/0304Detection arrangements using opto-electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/03Arrangements for converting the position or the displacement of a member into a coded form
    • G06F3/041Digitisers, e.g. for touch screens or touch pads, characterised by the transducing means
    • G06F3/042Digitisers, e.g. for touch screens or touch pads, characterised by the transducing means by opto-electronic means
    • G06F3/0425Digitisers, e.g. for touch screens or touch pads, characterised by the transducing means by opto-electronic means using a single imaging device like a video camera for tracking the absolute position of a single or a plurality of objects with respect to an imaged reference surface, e.g. video camera imaging a display or a projection screen, a table or a wall surface, on which a computer generated image is displayed or projected


Abstract

A system for the detection of the hand pointing action of a user is described. The system first performs user presence detection and then determines the user pointing action and the exact location of the object the user is pointing at. The persistence of the user pointing action is detected and interpreted as a selection operated by said user.

Description

HAND POINTING APPARATUS

Field of the invention
This invention refers to a hand pointing detection apparatus for determining a specific location pointed at by the user.

State of the art
In today's computer system design and engineering, human-machine interfaces enabling the transfer of information between the user and the system represent a field of growing importance. Human-machine interfaces enable bi-directional communication: on the one side, input devices allow users to send commands to the system; on the other side, output devices provide users with both responses to commands and feedback about their actions. Considering the standard graphic interface of a personal computer, keyboard, mouse and touch screen are typical input devices, while display, loudspeakers and printers are output devices. An important drawback of the most common input devices derives from the physical contact of the user with some of their mechanical parts, which ends up wearing the device out. Moreover, these input devices require the user to be close to the PC, making it difficult to input data when distant from the computer. Finally, a certain degree of training and familiarity with the device is required of the user for its efficient use.
The latest developments in computer vision technology have provided means to overcome the constraints and limitations of standard interfaces such as the ones described above. Several vision-based interaction approaches have been presented so far and, in this category, vision-based hand pointing systems appear to be particularly promising. These systems are typically based on a number of cameras, a video projector, a screen and a data processing system such as a personal computer. The cameras are located so as to have the user and the screen in view; the system output is displayed by the projector onto the screen, whose locations can be pointed at by the user. The presence of the screen is not necessary: the pointing action can be detected even when it is directed at objects located in a closed space (e.g. appliances in a room) or in an open one (e.g. sites that are part of the landscape in front of the user). Such systems may find interesting fields of application in automatic museum information systems as well as in home automation systems. However, the systems of this kind developed so far have several drawbacks that limit their possibility of being widely and easily used. They need the environment in which they operate to be adapted to their strict requirements, which can be very difficult. They require a large number of cameras whose placement is not flexible but has to follow several constraints, so the overall flexibility is limited. Moreover, the user is normally required not to move while pointing and has to stay away from persons or objects that could cause the system to fail to recognize the pointing action.
More drawbacks of the state-of-the-art systems are related to the need for dedicated hardware components, sometimes difficult to find and very expensive. In addition, these systems require complicated and time-consuming calibration procedures performed with dedicated devices such as marker plates. The present invention overcomes the above drawbacks by introducing a method and an apparatus for the detection of the hand pointing of a user based on standard, low-cost hardware equipment. The method and apparatus are independent of the number of cameras used, the minimum number being two, and no constraints are set on camera placement, save that the user must be in view of at least two cameras. The user is allowed to move freely while pointing, and the system is independent of environmental changes and user position. Moreover, users of the apparatus described in the present invention are not requested to calibrate the system before interacting with it, since self-calibration at run time ensures adaptation to user characteristics such as physical dimensions and pointing style.

Brief description of the drawings
Fig. 1 is an overview of a typical embodiment of the present invention.
Fig. 2 shows how the location of the point P pointed at by the user is calculated as the intersection of the screen plane and the line L described by the user's pointing arm.
Fig. 3 is a block diagram of the algorithm followed by the data processing unit to detect the hand pointing action.
Fig. 4 is the flowchart of the "Background Learning" step of the algorithm.
Fig. 5 is the flowchart of the "Calibration" step of the algorithm.
Fig. 6 is the flowchart of the "User Detection" step of the algorithm.
Fig. 7 is the flowchart of the "Lighting Adaptation" step of the algorithm.
Fig. 8 is the flowchart of the "User Localization" step of the algorithm.
Fig. 9 is the flowchart of the "Re-mapping" step of the algorithm.
Fig. 10 is the flowchart of the "Selection" step of the algorithm.
Fig. 11 is the flowchart of the "Adaptation" step of the algorithm.

Detailed description of the invention

A preferred embodiment of the present invention is depicted in Fig. 1, where we can see the system's components:
A pair of standard cameras (20) placed so as to have in view the user (21) and the screen (22) pointed at by the user.
A personal computer (23) that processes the data received from the cameras and turns them into interaction parameters and then into commands for its graphical interface.
An image projector (24) driven by the graphical interface of the personal computer. The projector illuminates the screen (22) pointed at by the user (21).

Graphic interface operation is based on both spatial and temporal analysis of the user's action. On the spatial side, and referring to Fig. 1 and Fig. 2, the screen location P currently pointed at by the user is continuously evaluated as the intersection of the pointing direction with the screen plane. From each acquisition of the system's cameras, the positions of the head and of the pointing arm of the user are detected and input to the next processing phase, which is based on a stereo triangulation algorithm. On the temporal side, the system monitors persistency: when point P is detected on a limited portion of the screen for an appropriate amount of time, a discrete event similar to a mouse click, i.e. a selection action, is generated for the interface. The overall interaction system behavior is that of a one-button mouse, whose "drags" and "clicks" reflect respectively changes and fixations of interest as communicated by the user through his natural hand pointing actions.

The operation of the hand pointing system described in the present invention can be sketched as in Fig. 3. After the cameras have acquired the images of the user, said images are transferred to the PC, which processes them following three distinct operational steps: Initialization (200), Feature Extraction (201) and Runtime (203), where Feature Extraction is a procedure used by both the other two phases, since it is the one that determines where, in the images, the head and the arm of the user are located.

In detail, referring to Fig. 3, the Initialization is composed of two sub-steps: a phase of Background Learning (A) and a phase of Calibration (B). The Background Learning is described in Fig. 4. A number N of frames is chosen for background modeling. The N frames acquired by the cameras are input to the PC (100) and then, for each chromatic channel, the mean value and the variance are calculated at each pixel (101). At the end of the iteration of N steps (102), the mean value and the variance of the three color channels at each pixel of the background images are calculated (103).
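By way of illustration, a minimal sketch of such a per-pixel background model is given below, assuming each frame arrives as an H x W x 3 array; the function name, the numpy implementation and the small variance floor are assumptions of this sketch, not details taken from the patent.

```python
import numpy as np

def learn_background(frames):
    """Background Learning (Fig. 4): per-pixel, per-channel mean and variance
    accumulated over N frames acquired with no user in view."""
    stack = np.stack([np.asarray(f, dtype=np.float64) for f in frames])  # N x H x W x 3
    mean = stack.mean(axis=0)        # mean value of each color channel at each pixel
    var = stack.var(axis=0) + 1e-6   # variance at each pixel (small floor avoids division by zero)
    return mean, var
```

The same statistics are later refreshed during the Lighting Adaptation phase described further on.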
The Initialization phase proceeds with the Calibration step (Fig. 3 - B), which will be described later on.

The next operational step is called Feature Extraction (201), at the end of which the system will have acquired the information regarding the possible presence of a user in the cameras' field of view and his possible pointing action. The Feature Extraction starts with the phase of User Detection (Fig. 6). The current frame is acquired by the cameras (100), then the background image previously learned is subtracted from the acquired frame (104). At this point the difference image (current frame minus background image) is weighted by the variance (105): at every pixel, for every color channel, the difference between the current frame value and the background value is calculated and weighted by the variance of the same background color channel. The calculated value is then compared to an appropriate threshold value X to decide whether the pixel under consideration belongs to the background (calculated difference less than X) or to the foreground (calculated difference greater than X). As a result of these calculations, the acquired image is binarized into background and foreground. With reference to Fig. 6, after the binarization of the image we have the Lighting Adaptation (D), described in detail in Fig. 7.
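Before moving to Lighting Adaptation, here is an illustrative sketch of the variance-weighted binarization just described; combining the three channels by summing their weighted squared differences, and the single scalar threshold, are assumptions of the example rather than details given in the patent.

```python
import numpy as np

def binarize_frame(frame, mean, var, threshold):
    """User Detection (Fig. 6): subtract the learned background from the
    current frame, weight the per-channel difference by the background
    variance and compare against the threshold X.

    Returns a boolean mask: True = foreground, False = background."""
    diff = (np.asarray(frame, dtype=np.float64) - mean) ** 2 / var  # variance-weighted difference
    score = diff.sum(axis=2)                                        # combine the three color channels
    return score > threshold
```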
Through the Lighting Adaptation, the system updates its parameters based on the light level of the current frame acquired by the cameras. The statistics of the background pixels are thus recalculated in terms of mean value and variance (107). Next, the system updates the thresholds used for the image binarization during the previous step. The number of isolated foreground pixels is computed (108) in order to estimate the noise level of the CCD (charge coupled device) sensor of the camera and consequently update the threshold values (109) used to binarize the image, so as to dynamically adjust system sensitivity. At the end of the Lighting Adaptation phase we have the updated parameters that the system will use to compute the image binarization of the next acquired frame; the binary mask computed at the previous acquisition cycle is then refined by topological filters (Fig. 6 - 110). With the next step (111) the user's presence is classified by his shape and finally detected (112).
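The isolated-pixel noise estimate and threshold update of the Lighting Adaptation phase might look as follows; the 4-connected neighbourhood, the target count and the update gain are assumptions introduced for this sketch.

```python
import numpy as np

def count_isolated_foreground(mask):
    """Lighting Adaptation (Fig. 7): count foreground pixels with no
    4-connected foreground neighbour, as a rough proxy for sensor noise."""
    padded = np.pad(mask, 1, constant_values=False)
    has_neighbour = (padded[:-2, 1:-1] | padded[2:, 1:-1] |
                     padded[1:-1, :-2] | padded[1:-1, 2:])
    return int(np.count_nonzero(mask & ~has_neighbour))

def update_threshold(threshold, isolated_count, target=50, gain=0.01):
    """Nudge the binarization threshold so that the isolated-pixel count
    tracks a target noise level, dynamically adjusting sensitivity."""
    return threshold * (1.0 + gain * np.sign(isolated_count - target))
```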
If a user is detected and he is pointing, the User Localization step (Fig. 8) is started and is carried out through two different, parallel processes. On the one side, the silhouette of the user is estimated by detecting the user's head and arm position (115) through the use of the previously computed binary mask and of geometrical heuristics. On the other side, the system analyzes the color of the detected user shape to determine the zones of exposed skin within it. This process runs through several sub-steps: first the foreground is split up into skin and non-skin parts (116) by applying the previously computed binary mask and a skin color model to the image acquired by the cameras. The detected skin parts are then aggregated into connected blobs (117), and again the user's head and arm are identified by the use of geometrical heuristics (115). The results of the above estimation are filtered by a smoothing filter (118) and a predictor (119) to reach the final estimate of the color-based user localization step. The shape-based estimate and the color-based estimate are then combined (120), and the coefficients of the image line in each single acquired image are finally determined (121), where the image line is the line ideally connecting the head and the hand of the user and represents the pointing direction.

The next step, once the pointing direction for every single frame is determined, is called runtime processing (Fig. 3 - 202). This is the final phase, at the end of which the location the user is pointing at is determined. The first sub-step of this phase is called Remapping (Fig. 9). With reference to Fig. 2, the system described in the present invention determines, in the way described above, as many image lines as the cameras employed (li,dx; li,sx). These lines, together with the points where the cameras are located (Cdx, Csx), determine planes in the real 3D space (πp,dx; πp,sx). Each of these planes determines in turn a screen line (lp,dx; lp,sx) as its intersection with the plane of the screen (Π) pointed at by the user. The point to be determined is thus the intersection P of these screen lines. The remapping phase (Fig. 9) starts with the computation of the screen lines described above (122), one at each iteration (123). Once the screen lines are all determined, the location the user is pointing at is computed as the pseudo-intersection of the screen lines (124).
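The pseudo-intersection of the screen lines can be computed, for example, as the least-squares point closest to all of them. In the sketch below each screen line is represented in screen coordinates by a unit normal n and an offset c (the points x with n·x = c); this parameterisation and the least-squares formulation are assumptions of the example rather than details given in the patent.

```python
import numpy as np

def pseudo_intersection(screen_lines):
    """Remapping (Fig. 9): estimate the pointed screen location P as the
    point minimising the sum of squared distances to all screen lines.

    `screen_lines` is a list of (n, c) pairs, one per camera, with n a unit
    normal in screen coordinates and c the line offset."""
    A = np.array([n for n, _ in screen_lines], dtype=np.float64)  # one row per line normal
    b = np.array([c for _, c in screen_lines], dtype=np.float64)
    P, *_ = np.linalg.lstsq(A, b, rcond=None)                     # least-squares screen point
    return P
```

With exactly two non-parallel screen lines this reduces to their ordinary intersection; with more cameras it averages out the small disagreements between them.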
After remapping, the system enters the phase of Selection (Fig. 3 - G). With reference to Fig. 10, the screen point detected at the end of the previous phase is recorded (125); its position is then periodically checked against a certain radius R (126, 127, 128 and 129) to determine whether the point maintains the same position for a time long enough to indicate a pointing action by the user; in that case the system performs a "clicking" action in response to the persisting pointing action (a minimal sketch of this dwell logic is given at the end of this description).

At the end of the Selection phase, the current screen point (130) represents the input datum for the following phase of Adaptation, by which the system is trained to work with different users. First the selected screen point and the related line coefficients are combined with the screen points and line coefficients from previous calibrations (131), then the system calibration parameters are recomputed by optimization (132).

The Calibration phase is shown in detail in Fig. 5. The PC drives the projector to show on the screen the calibration points (133) that have to be pointed at by the user; the image line coefficients coming from the phase of User Localization are recorded as well (134), and these steps are repeated for each one of the K points chosen for the calibration (135). At the end of the K iterations a new set of optimised system calibration parameters is estimated (136).

The above described system can be implemented in the presence of any kind of actuator and any kind of interface driven by the data processing unit. For example, the present invention can be applied to home automation systems, where the target of the user's pointing action might be a set of appliances and the computer interface might simply be a control board for switching the appliances on and off. In another embodiment of the present invention, the target of the user's pointing action could be the landscape in front of the user; the computer interface, in this case, can be just a driver for an audio playback system providing, for example, information regarding the monuments and locations pointed at by the user.
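As anticipated above, the dwell-based Selection step can be illustrated with a small state holder that turns a persisting pointed location into a click; the class name, the timestamp interface and the re-arming policy after a click are assumptions of this sketch, not details taken from the patent.

```python
import numpy as np

class DwellSelector:
    """Selection (Fig. 10): issue a 'click' when successive screen points stay
    within radius R of the start of the current fixation for long enough."""

    def __init__(self, radius, dwell_time):
        self.radius = radius          # radius R checked in steps 126-129
        self.dwell_time = dwell_time  # persistence time required for a selection
        self.anchor = None            # first point of the current fixation
        self.start = None             # timestamp at which the fixation started

    def update(self, point, t):
        """Feed the current screen point and its timestamp; return True on a click."""
        point = np.asarray(point, dtype=np.float64)
        if self.anchor is None or np.linalg.norm(point - self.anchor) > self.radius:
            self.anchor, self.start = point, t    # movement detected: start a new fixation
            return False
        if t - self.start >= self.dwell_time:
            self.anchor, self.start = None, None  # re-arm after the click
            return True
        return False
```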

Claims

1. A hand pointing apparatus comprising:
- at least two cameras;
- a data processing unit capable of processing the data received by the above said cameras turning them into interaction parameters and commands for an appropriate interface;
2. A hand pointing apparatus according to claim 1 wherein the data processing unit is a personal computer.
3. A hand pointing apparatus according to claims 1 and 2 wherein such data processing unit continuously records images of a defined location in order to detect whether a user is present or not in such defined location and if a user is present determines whether such user is pointing at a target and which is the location of the target he is pointing at.
4. A hand pointing apparatus according to claim 3 wherein the detection of the presence of the user is performed by comparing the previously detected background (without user) and the actual image transmitted by the cameras.
5. A hand pointing apparatus according to claims 3 and 4 wherein the determination of user's pointing action is performed by simultaneous detection of the user's silhouette and the user's skin colour.
6. A hand pointing apparatus according to Claim 5 wherein the data processing unit calculates as many lines ideally connecting the head and the pointing hand of the user as the number of cameras employed and determines the planes containing such lines and the point of the location of the corresponding cameras, determining the user's pointing direction as the line originated by the intersection of such planes and finally determining the location of the target the user is pointing at, as the intersection between such line and the plane on which said target is located.
7. Method for determining the location of the target a user is pointing at wherein a processing unit: - calculates as many lines ideally connecting the head and the pointing hand of the user as the number of cameras employed;
- determines the planes containing such lines and the point of the location of the corresponding cameras,
- determines the user's pointing direction as the line originated by the intersection of such planes
- and determines the location of the target the user is pointing at as the intersection between such line and the plane on which said target is located.
9. A hand pointing apparatus according to claims 1 - 6 wherein the data processing unit drives a graphic interface.
10. A hand pointing apparatus according to claims 1 - 6 and 8 comprising also a projector driven by the data processing unit graphic interface.
11. A hand pointing apparatus according to claim 9 comprising a screen which displays the images beamed by the projector.
PCT/EP2002/014739 2002-12-23 2002-12-23 Hand pointing apparatus WO2004057450A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP02796729A EP1579304A1 (en) 2002-12-23 2002-12-23 Hand pointing apparatus
PCT/EP2002/014739 WO2004057450A1 (en) 2002-12-23 2002-12-23 Hand pointing apparatus
AU2002361212A AU2002361212A1 (en) 2002-12-23 2002-12-23 Hand pointing apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2002/014739 WO2004057450A1 (en) 2002-12-23 2002-12-23 Hand pointing apparatus

Publications (1)

Publication Number Publication Date
WO2004057450A1 (en) 2004-07-08

Family

ID=32668686

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2002/014739 WO2004057450A1 (en) 2002-12-23 2002-12-23 Hand pointing apparatus

Country Status (3)

Country Link
EP (1) EP1579304A1 (en)
AU (1) AU2002361212A1 (en)
WO (1) WO2004057450A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6442416B1 (en) * 1993-04-22 2002-08-27 Image Guided Technologies, Inc. Determination of the position and orientation of at least one object in space
US6101268A (en) * 1996-04-22 2000-08-08 Gilliland; Malcolm T. Method and apparatus for determining the configuration of a workpiece
US6198485B1 (en) * 1998-07-29 2001-03-06 Intel Corporation Method and apparatus for three-dimensional input entry
US6147678A (en) * 1998-12-09 2000-11-14 Lucent Technologies Inc. Video hand image-three-dimensional computer interface with multiple degrees of freedom

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JENNINGS C: "Robust finger tracking with multiple cameras", RECOGNITION, ANALYSIS, AND TRACKING OF FACES AND GESTURES IN REAL-TIME SYSTEMS, 1999. PROCEEDINGS. INTERNATIONAL WORKSHOP ON CORFU, GREECE 26-27 SEPT. 1999, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, US, 26 September 1999 (1999-09-26), pages 152 - 160, XP002229564, ISBN: 0-7695-0378-0 *
JOJIC N ET AL: "Detection and estimation of pointing gestures in dense disparity maps", AUTOMATIC FACE AND GESTURE RECOGNITION, 2000. PROCEEDINGS. FOURTH IEEE INTERNATIONAL CONFERENCE ON GRENOBLE, FRANCE 28-30 MARCH 2000, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, US, 28 March 2000 (2000-03-28), pages 468 - 475, XP010378301, ISBN: 0-7695-0580-5 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102013206569A1 (en) * 2013-04-12 2014-10-16 Siemens Aktiengesellschaft Gesture control with automated calibration
US9880670B2 (en) 2013-04-12 2018-01-30 Siemens Aktiengesellschaft Gesture control having automated calibration
DE102013206569B4 (en) 2013-04-12 2020-08-06 Siemens Healthcare Gmbh Gesture control with automated calibration

Also Published As

Publication number Publication date
AU2002361212A1 (en) 2004-07-14
EP1579304A1 (en) 2005-09-28

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SC SD SE SG SK SL TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LU MC NL PT SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 2002796729

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 2002796729

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP