WO2005017729A2 - Interface method and device between man and machine realised by manipulating virtual objects - Google Patents

Interface method and device between man and machine realised by manipulating virtual objects

Info

Publication number
WO2005017729A2
Authority
WO
WIPO (PCT)
Prior art keywords
user
virtual object
virtual
virtual objects
panoramic image
Prior art date
Application number
PCT/IB2004/002667
Other languages
French (fr)
Other versions
WO2005017729A3 (en)
Inventor
Luigi Giubbolini
Original Assignee
Luigi Giubbolini
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Luigi Giubbolini filed Critical Luigi Giubbolini
Publication of WO2005017729A2 publication Critical patent/WO2005017729A2/en
Publication of WO2005017729A3 publication Critical patent/WO2005017729A3/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/012 Head tracking input arrangements
    • G PHYSICS
    • G02 OPTICS
    • G02B OPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B27/00 Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B27/01 Head-up displays
    • G02B27/017 Head mounted
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G PHYSICS
    • G02 OPTICS
    • G02B OPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B27/00 Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B27/01 Head-up displays
    • G02B27/0101 Head-up displays characterised by optical features
    • G02B2027/014 Head-up displays characterised by optical features comprising information/image processing systems
    • G PHYSICS
    • G02 OPTICS
    • G02B OPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B27/00 Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B27/01 Head-up displays
    • G02B27/0179 Display position adjusting means not related to the information to be displayed
    • G02B2027/0187 Display position adjusting means not related to the information to be displayed slaved to motion of at least a part of the body of the user, e.g. head, eye

Definitions

  • the present invention concerns an interface method and device between man and machine realised by manipulating virtual objects. More precisely, the invention concerns a device comprising a pair of glasses to be worn by a user on his/her head, said glasses allowing to normally see the real scene and at the same time to display the image of virtual objects superimposed on the observed image, and a method using said device. It is known that in any processing system data are usually inputted by means of a keyboard or a mouse or a pointer and so on, and are usually displayed on a screen which, if necessary, may be integrated inside a device having the aspect of a traditional pair of glasses. Said device is also known under the name of "Head Mounted Display".
  • the screen of a computer may be replaced by a so called "Head-Up Display” which allows to display a synthetic image superimposed on a real image.
  • a Head-Up Display may be integrated into the windscreen of a vehicle, thereby allowing to project the image of a virtual dashboard superimposed on the normal real image.
  • the focusing of the virtual image is set at some meters in front of the windscreen and appears fixed with respect to the vehicle: in this way, the user does not have any more the need of adjusting his/her eye at two different focal lengths as it normally occurs when shifting from observing the scene to observing the real dashboard.
  • the miniaturisation of the electronic circuits and of the memory supports allows to realise calculation systems more and more powerful and having a more and more limited size. Therefore the use of a Head Mounted Display integrated inside a pair of glasses allows to realise portable computers having a more limited size since the screen becomes superfluous as it is integrated into the glasses.
  • the need of manufacturing smaller and smaller and more and more portable electronic apparatuses is limited by the physical size of the keyboard which in any case should not have a too much limited size with respect to the average size of one's fingers.
  • the realisation of keyboards having more characters associated to the same physical key involves a slowdown and makes it difficult to input the data.
  • the patent US 6,346,929 describes a device allowing to display virtual images (for instance functional icons) superimposed on the real image and to automatically recognise gestures performed by the user with his/her hands or with other pointing objects which select the virtual icons and activate the controls associated with them.
  • virtual objects described in the patent US 6,346,929 are superimposed on the real scene but are not integral with it. This involves that when the user moves his/her observation point of the real scene by translating or rotating his/her head, the virtual objects move integrally with the Head Mounted Display, that is integrally with the user's head, and therefore said virtual objects move with respect to the observed real scene.
  • a further object of the present invention is that of realising a method and a device for allowing a user to interface himself/ herself with an external processing system by manipulating virtual objects.
  • the above and other objects are achieved by the method and the device as claimed in the hereby appended claims.
  • the particularity of the invention consists in the fact that the displayed objects are kept in a fixed spatial relationship with respect to the observed scene, independently from the user's observation point.
  • the method and the device according to the invention allow to:
  • the virtual objects are set by the user in real physical positions of the environment surrounding the user himself/ herself and indicated by him/her;
  • the view of the virtual objects is made three-dimensional and follows the perspective laws, with respect to the user's eyes, of the physical point in which the virtual object has been set;
  • the view of the virtual objects provides the user with aspects and particulars of the same according to their position and orientation with regard to the user's eyes. In this way, the user does not perceive any difference in the view of virtual objects with respect to the view of real objects.
  • the virtual objects are seen integral, that is "anchored", with the surrounding physical reality and their view follows the same optical laws as the view of the real objects. The user can then feel the following perceptions:
  • the virtual objects disappear from the user's visual field when the real physical space in which they have been set is not observed any more; - the virtual objects reduce/ increase their own visible size when the user moves away from them/ moves close to them;
  • the virtual objects show different views when the user rotates his/her observation point around them;
  • each virtual object is described by means of a three-dimensional model defining its shape and material and allowing the synthesis of different views according to the observation point.
  • each three-dimensional model of each virtual object can be constituted by a plurality of shapes which may be automatically displayed in a sequential way, thereby allowing to display virtual objects of dynamical type, like for instance a metronome, which autonomously change shape and position inside the real scene according to a scheduled sequence, even if they remain set at the physical point defined by the user.
  • the real environment surrounding the user can be fitted out with objects having different geometries or with geometric drawings, for instance on the walls of a room.
  • the visual design with three-dimensional geometric objects advantageously increases the robustness of the anchoring points identified in the objects themselves, thereby improving the functionality of self-localisation of the device with respect to the environment and the stability of the virtual objects perceived by the user, in the sense of maintaining the virtual objects integral with the physical reality in which they have been set.
  • the preliminary process of self-localisation of the device exploits the principle according to which two stereoscopic images of a real scene taken by two cameras can be processed in order to obtain the three-dimensional position of a constellation of points known in literature as "points of interest".
  • these points exhibit the characteristic of being physical points of the real scene which may be easily recognised in an automatic way upon varying both of the lighting conditions and of the angular field of the scene itself.
  • the storage of a constellation of points of interest recognised in a determined environment subsequently allows the self-localisation of the stereoscopic cameras in front of the same scene: the points of interest act in this case as "anchoring points" for the self-localisation.
  • a plurality of bi-dimensional images of the same scene taken with different observation angles may be "merged" for creating a panoramic image with a wider angular field, provided that the edges of adjacent images are partially overlapped in order to favour their alignment and connection.
  • the user can interact with a world of virtual objects, that he/she sees superimposed and anchored to the real scene, able to exchange information with an external processing system.
  • the user can define, besides the number and the type of virtual objects, the real physical positions where they have to be displayed, that is the setting points of the virtual objects in the real scene.
  • it is possible to update in real time the display of the virtual objects according to the user's observation point with respect to them so that they appear fixed with respect to the observed physical scene, that is they remain integral with the real physical points while the user moves his/her observation point both moving his/her eyes and head.
  • the perception of the virtual objects is then made realistic by the invention thanks to the fact that the virtual objects are displayed in stereoscopic way, thereby providing the user with a perception of three-dimensionality, and to the fact that the stereoscopic view of the virtual objects provided to the user is synthesised by the physical position of his/her eyes and by the physical positions in which the virtual objects have been set.
  • the physical positions are defined in relationship with the constellation of anchoring points that the system has recognised in the panoramic image of the surrounding environment.
  • the user's gestural commands are interpreted on the basis of the spatial coincidence of the gestures with the spatial position of the displayed virtual objects. For instance, it is possible to interpret the movement of the fingers of the user's hands with respect to the position of the virtual objects.
  • An applicative example is that of writing on a virtual keyboard.
  • a man-machine interface of virtual type is realised wherein the gestural and voice commands provided by the user are interpreted and decoded both for interacting with the virtual objects themselves and for sending information towards an external processing system; at the same time, information coming from the external processing system may be communicated to the user by means of sound messages or of notification of one or more graphical attributes of the displayed virtual objects like shape, colour, alphanumeric text, icons and so on.
  • An applicative example is the virtual display of a screen on which the text inputted by the user by means of a virtual keyboard is displayed: in this way, the invention allows to replace the keyboard and the screen of the normal computers, thereby allowing the user to maintain the normal interaction with the real scene.
  • FIG. 1 is a schematic view of the device according to the invention.
  • FIG. 2 is a block diagram of the architecture of the device of Fig. 1;
  • - Fig. 3A is a flow-chart of a start-up configuration phase of the device of Fig. 1 aimed to acquire a panoramic image;
  • FIG. 3B is a flow chart showing the normal operation of the device of Fig. 1;
  • FIG. 4 shows an applicative example of the device according to the invention able to replace a screen, a keyboard and a pointing device of a computer.
  • a device 1 according to the invention comprises a display means 1A having the aspect of a pair of glasses, a processing and power system 7 and a real keyboard 5, which allows the user's physical manual interaction with said processing and power system 7.
  • the keyboard 5 is interconnected with the system 7 by means of a cable 6.
  • the device 1 is connected with an external processing system, like a computer (not shown), by means of a data connection cable 9.
  • the display means 1A embodies a display comprising two semi-transparent screens 1D and 1S, the respective geometric centres of which are set at a distance of about 7 centimetres the one from the other and the size of which is such to occupy at least part of the whole visual field of a user once he/she has worn the display means 1A.
  • the suffixes "D" and "S" indicate right and left respectively.
  • the property of semi-transparency of the two screens 1D and 1S allows the normal view of the real scene on which the display means 1A is able to superimpose a stereoscopic image of virtual objects.
  • the correct stereoscopic display of the virtual object provides the user with the three-dimensional perception of the virtual object itself.
  • Two adjusting wheels 1Q, 1W allow to adjust the optical focal length of the display means 1A in order to optimally display the virtual image on the basis of the user's visual ability.
  • the display means 1A is provided with two cameras 2D and 2S, respectively positioned above the two screens 1D and 1S and spaced at about 7 centimetres the one from the other.
  • the two cameras 2D, 2S take the real scene in stereoscopic way and with an observation angle equal to about 140° by 140°, that is similar to the observation angle of the human eye, with respect to an axis 1Y in the two planes respectively identified by the axes 1Y-1X and 1Y-1Z.
  • the display means 1A is provided with two earphones 3D and 3S and with a microphone 4 provided for the voice interaction of the user with the device 1.
  • the earphones 3D and 3S are positioned so that, once the device 1A has been worn, they are in the immediate proximity of the user's ears, while the microphone 4 is in front or near the user's mouth.
  • the earphones 3D and 3S are connected with the side-pieces 11D, 11S of the display means 1A through two short cables 3A and 3B which make it easier to insert the two earphones 3D,3S into the two user's ears.
  • the microphone 4 is integrally connected with the left side-piece 11S of the display means 1A.
  • a stereophonic microphone constituted by two monophonic microphones 4D and 4S housed in the side-pieces 11D, 11S of the display means 1A, allows to acquire the ambient sound.
  • when the display means 1A is not operating or there is no virtual image active, the user normally sees the real scene thanks to the property of semi-transparency of the two screens 1S, 1D which do not deform the real image.
  • the screens 1D and 1S may be two non-transparent screens on which the stereoscopic image taken by the two cameras 2D and 2S is represented, with the synthesised stereoscopic image of the virtual objects electronically superimposed on it.
  • the processing and power system is housed in the part 7, containing also a power feed source, preferably a battery, and is connected with a device 1A through a cable 8 and to the external processing system through a cable 9.
  • the whole device 1 or the only display means 1A may, if necessary, be integrated into a helmet like for instance a crash or safety helmet or the like.
  • the constituting components and the operation of the device 1 of Figure 1 are shown in greater detail.
  • the images acquired by the cameras 2S and 2D are processed by a module 13 for processing stereoscopic images which reconstructs a three-dimensional image of the observed scene in real time.
  • during a start-up configuration phase, which will be described in greater detail in the following, a plurality of three-dimensional images acquired from different angles and points of view are merged for constructing the panoramic image of the real environment surrounding the user.
  • the image so acquired is a three-dimensional panoramic image of the scene surrounding the user taken by a plurality of points of view, said image being stored in a memory 14.
  • the invention makes use of one among the many methods of known type.
  • the three-dimensional image is analysed by a module 15 for detecting and localising the anchoring points, as shown in greater detail in the following, which are compared with the points already recognised in the environmental panoramic image stored in the memory 14 in order to identify the visual angular sector observed by the user.
  • the identified and localised anchoring points are used by the module 18 for processing virtual objects in order to make it possible to synthesise the stereoscopic perspective view of the virtual objects which have been activated and set by the user in the observed visual field.
  • the stereoscopic perspective view is constituted by two perspective views which are stored in two image memories 12D and 12S and displayed to the user through the two semi-transparent screens 1D and 1S which are connected to the respective image memories 12D and 12S.
  • the two perspective views are synthesised starting from the two projection centres set in the user's corneas that are assumed to be set at about one centimetre behind the geometric centres of the two screens 1D and 1S. In this way, a stereoscopic view of the virtual objects is presented to the user's two eyes, said view being the same view which would be produced by a physical object similar to the virtual object and set in the same position. This causes to the user a three-dimensional perception of the displayed virtual objects.
  • the synthesis of the two perspective views and the update in real time of the two memories 12D and 12S are performed by the module 18 for processing virtual objects starting from: a) the three-dimensional models of the virtual objects contained in the memory 16; b) the space co-ordinates, at which the virtual objects have been set by the user, and their orientation are stored in the memory 17 containing the data base of the status of the activated virtual objects; c) the position of the anchoring points acquired through the module 15 for detecting and localising the anchoring points; d) the attributes of the activated virtual objects, stored in the memory 17.
  • the attributes of the virtual objects stored in the memory 17 provide a complete description both of the elementary visual characteristics of the single virtual objects, like for instance the colour, the lighting point, the properties of optical reflection and so on, and of the constituting features of the composite virtual objects, like for instance aspect and shape, relative position of the various component parts and so on, and of all the auxiliary information associated with the virtual objects.
  • the lighting point of the virtual objects may be set a priori or set coincident with the lighting point of the real scene gathered during the panoramic image acquisition process.
  • the processing and power system 7 comprises a control unit 19, a communication unit 19A and a power source 19B.
  • the control unit 19 is in charge of all the processing activity and process synchronisation and communicates through the cable 9 with an external processing system by means of a communication unit 19A.
  • the power source 19B feeds the whole device 1.
  • the real keyboard 5, connected through the cable 6 to the control unit 19, allows a physical intervention on the system during the start-up configuration of the device and in case of fault situations.
  • the external processing system receives the information, contained in the memory 17, relative to the status of the virtual objects activated, if necessary modifies it, and retransmits it to the control unit 19 which provides for updating the memory 17.
  • This operating scheme allows the external system both to know the content of the status memory of the virtual objects and to modify its content in real time. This implements the user's virtual interaction with the external processing system.
  • the module 21 for detecting the user's actions allows to identify the actions performed on the virtual objects by the user's fingers or by a pointing device he/she employs.
  • the module 21 detects the objects in motion in the observed three-dimensional image acquired by the module 13 and acquires their position and velocity. Then, said module 21 checks whether the objects in motion have validly activated a virtual object. The validity check is performed by comparing the position of the physical object in motion and the space occupation of the virtual objects and of their parts. When the space coincidence has been checked, the status of the "touched" object is modified. In order to minimise false alarms, only the actions performed by objects having a velocity included in a certain range predetermined during the start-up configuration phase of the whole device 1 are considered to be valid.
  • the processor 20 is also in charge of the synthesis of the stereophonic audio produced through the two earphones 3D and 3S by opportunely mixing the ambient sound with the voice suggestions and the sounds produced by the virtual objects.
  • by means of voice commands acquired through the microphone 4 and a suitable voice recognition software, or through manual commands acquired through the auxiliary keyboard 5, the user can activate virtual objects, static or dynamic, having an input functionality (e.g. a keyboard, a mouse, a switch, and so on) or an output functionality (e.g. a monitor, a display, a vocabulary, and so on) or simply having an ornamental function (a picture, a carpet, and so on).
  • the user identifies in the real visual field a physical point for setting the virtual object that he/she wishes to activate and shows it to the device by a movement of the pointer object, wherein the pointer object and the type of movement must be defined during the start-up configuration phase: the movement may be, for instance, a rhythmic movement of the right index finger like "a double mouse-click";
  • the stereoscopic system identifies the finger in the scene and localises it with regard to the constellation of anchoring points, stores in the memory 17 the co-ordinates as a setting point of the new virtual object, synthesises and activates the three-dimensional display of the virtual object (the type of which had been defined in the voice command) according to the user's point of view by superimposing said three-dimensional display on the real image.
  • the constellation of the anchoring points detected in the observed scene is compared with the panoramic constellation in order to localise the position of the device 1A with respect to the real scene.
  • for detecting the anchoring points in the observed scene, the invention employs one of the many methods known in literature as recognition of the points of interest in a stereoscopic image, that is those points which have the maximum probability of being automatically identified upon varying the observation point of the scene.
  • the invention employs for the three-dimensional localisation of the anchoring point one among the many stereoscopic methods well known in literature by using the cameras 2D and 2S arranged in the immediate proximity of the eyes and having an observation angular opening equal to that of the human eye.
  • the use of the stereophony allows the user to be provided with sound suggestions coming from different directions and then allows the user's attention to be directed in space, thereby making his/her interaction with the device 1 more intuitive: this is particularly useful in the environmental panoramic acquisition phase. Besides, the same mechanism allows the sounds produced by the virtual objects to be perceived coherently with their physical settings.
  • with reference to Figure 3A, the operation mode concerning the start-up configuration of the device 1, aimed to acquire the panoramic image, is described, starting at step 100A and ending at step 101A.
  • the device 1 performs a stereoscopic image processing loop aimed to acquire a three-dimensional panoramic image of the scene surrounding the user and taken by a plurality of points of view.
  • the image is stored in the memory 14.
  • a stereoscopic image of the real scene is acquired and a corresponding three-dimensional image is constructed.
  • the constructed three-dimensional images are merged in order to obtain a complete panoramic image of the environment surrounding the user.
  • the loop is performed a plurality of times until the device 1 recognises that a complete acquisition of the panoramic image surrounding the user has been completed (step 24).
  • the optical scanning of the environment is performed by the user in a way guided by a voice suggestion process (step 25) which provides the user with indications about moving his/her head like for instance "rotate to the right", "rotate to the left", "rotate upwards", "rotate downwards", "bend to the right", "bend to the left" and so on.
  • the user wearing the display means 1A performs an angular scanning of the environment surrounding him/her around the three axes 1Z, 1X and 1Y, shown in Figure 1, respectively.
  • the three-dimensional environmental panoramic image is acquired by merging the three-dimensional images which partially overlap: the overlapping zones are used for their alignment and connection.
  • the angular amplitude of the panoramic scanning depends on the intended applications of the device; for instance, for the virtual desk-top application, a scanning of the head by 90° around the two axes 1X and 1Z and by 60° around the axis 1Y is considered to be sufficient.
  • a linear scanning of the head equal to 30 centimetres along the three axes 1X, 1Y and 1Z is considered to be sufficient.
  • the detection and the storing (step 25B) of the constellation of anchoring points in the panoramic image is also performed; the constellation is stored together with the panoramic image itself in the memory 14.
  • with reference to Figure 3B, the operation mode called "normal operation" starts at step 100B and ends at step 101B.
  • the device 1 performs a loop of the following procedures: at step 26 a stereoscopic image is acquired and a corresponding three-dimensional image is constructed. At step 27 some anchoring points are detected and localised.
  • at step 28 the memory 17 of the status of the virtual objects is updated; at step 29 the position, the velocity and the shape of objects possibly in motion in the observed scene are detected. If the velocity of the object is considered to be valid (step 30), that is to say if said velocity belongs to the range of allowable values defined during the configuration phase of the device 1, and if the shape of the object is valid, that is to say if the shape of the object belongs to a collection of shapes defined during the start-up configuration phase of the device 1, then the control of the object is considered to be valid. In this case, it is checked (step 37) whether the voice recognition process has recognised a voice command for activating a new virtual object.
  • at step 31 the position of the object in motion is compared with the space occupation of the virtual objects and of their parts: if the space coincidence has been verified, the analysed movement of the real object is considered to be valid and, as a consequence, the status of the "touched" object, or of its constituting part, is modified (step 32).
  • at step 36 the position of the object in motion is compared with the position of the anchoring points of the panoramic constellation. If the space coincidence has been checked, the analysed movement of the real object is considered to be valid and, as a consequence, the "touched" anchoring point is used for setting the new virtual object and the memory 17 is updated.
  • the memories 12S, 12D of the two displays 1S, 1D are updated and the process of communicating the status of the virtual objects to the external processing system is then performed (step 34).
  • the status is received by the external processing system which, after having modified it if necessary, transmits it updated to the device 1 which, in the next loop, will use it for displaying the virtual objects in an updated way.
  • at step 35 a check of the on-off status of the device 1 is performed in order to deactivate it if required.
  • the control is performed by analysing both a virtual switch (if its display has been activated by the user), and a real switch-on key existing in the real keyboard 5.
  • the interface device 60 comprises a device 1 as previously described and a plastic tablet 50, for instance of the type used for filling in paper forms, equipped with coloured adhesive edges.
  • the display means 1A is connected with a standard VGA port of an external processing system 59 constituted by a personal computer of the note-book type.
  • the pair of stereoscopic cameras 2D,2S mounted on the display means 1A are connected with the personal computer 59 by means of standard ports, for instance USB or Video, of the computer.
  • by means of the personal computer 59, three areas 52, 54 and 56 may be defined on the tablet, on which three virtual objects are respectively displayed: a monitor of the touch-screen type, a virtual paper module, of which the personal computer 59 provides the bitmap image for acquiring in the personal computer notes on virtual paper, and a touch-pad.
  • the areas 52 and 56 are analysed for providing the co-ordinates x,y of a pointing means, preferably a pen held by the user, when the pointing means touches the tablet 50.
  • the interface device 60 allows to:
  • the Visual Processing software, preferably written in MATLAB and/or C language in order to allow its compilation in a Windows environment, comprises the following code segments: a) a code segment for acquiring the images from the pair of stereoscopic cameras 2S, 2D provided on the display means 1A; b) a code segment for segmenting the tablet 50 from the background on which it rests; c) a code segment for segmenting the pen on the tablet 50; d) a code segment for segmenting other objects on the tablet like typically the user's hands; e) a code segment for the perspective projection of the virtual screen and of the touch-pad in the respective rectangular areas 52, 56 of the tablet 50, wherein in the virtual display 52 the image normally present on the screen of the computer 59 is displayed, the bitmap of which is acquired through a suitable video redirection driver, and
  • the key pressure will be simulated by a circular movement to the right, or other movement of the tip of the pen, on the key which is considered to be more favourable.
  • the information displayed and acquired by the Visual Processing software must be interfaced to the operating system of the personal computer 59. To this end, the two above-mentioned drivers are provided: the pointing driver and the video redirect driver.
  • the pointing driver provides the operating system, and then the applications, with: the absolute co-ordinates x,y of the tip of the pen held by the user, when the pen is set inside the tablet in the area 52 (a minimal sketch of this mapping is given after this list); the relative movement along x and y of the tip of the pen held by the user, when the pen is set inside the tablet in the area 56; the "pressure" on the three keys displayed in the area 56; the image of the area 54 of the tablet taken by the camera in bitmap format, for possible applicative uses.
  • the video redirect driver sends to the occlusive glasses, through the standard VGA port, the image synthesised by the Visual Processing software according to the previous point f) and sends to the Visual Processing software the screen bitmap generated by the operating system.
  • connection between the processing and power system 7 and the display means 1A can be made via radio instead of via cable.
  • the electric power source for the display means 1A may be housed in the side-pieces 11S, 11D of the display means 1A, while the power source of the processing system will be housed in the part 7.
  • the connection between the processing and feed system 7 of the device 1 and the external processing system will be made wireless as well, so that the external processing system may be also constituted by a processing system of fixed type arranged remotely with respect to the user.
  • the panoramic image acquisition phase may be made superfluous by previously storing the image in the external system and by transferring the same in the device 1 during an automatic start-up phase.
  • the operative condition of the invention is that of a user who, without needing to perform a system configuration, interacts with different virtual objects according to the physical environment where he/she is: the virtual objects are activated by the external processing system according to the objectives predetermined by the system itself.
  • Possible examples of applications are the virtual office, the assistance in the visit of museums and libraries, the assistance in the diagnostics and in the intervention both in the surgery field and in the apparatus and manufactured products field.
  • the display of the virtual images may be shared among many users, each wearing a display 1: this operating mode is obtained by updating simultaneously and identically the memory of the status of the virtual objects 17 existing in the different devices.
  • the update may be performed by the external processing system through the communication channel 9.
  • the shared manipulation of virtual objects may find, for instance, an application in the group work for designing a system, a machine, for the diagnostics, for the didactics and so on.
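
As a rough illustration of the coordinate mapping that a pointing driver of the kind described above could perform for the area 52, the short Python sketch below maps the pen tip detected in the camera image to absolute screen co-ordinates through a homography estimated from the four detected corners of the area. It is only a simplified example: the segmentation of the tablet 50, of its coloured edges and of the pen is not shown, and the function names, screen resolution and pixel values are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def homography(src_pts, dst_pts):
    """Direct linear transform: the 3x3 homography mapping src_pts onto dst_pts
    (both given as four (x, y) corners listed in the same order)."""
    rows = []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)

def pen_to_screen(pen_px, corners_px, screen_w=1280, screen_h=800):
    """What the pointing driver could report for the area 52: the absolute screen
    co-ordinates of the pen tip, given the four corners of the area detected in
    the camera image (top-left, top-right, bottom-right, bottom-left)."""
    screen_corners = [(0, 0), (screen_w, 0), (screen_w, screen_h), (0, screen_h)]
    H = homography(corners_px, screen_corners)
    p = H @ np.array([pen_px[0], pen_px[1], 1.0])
    return p[0] / p[2], p[1] / p[2]

# Example with a slightly skewed view of the area 52 in the camera image.
corners = [(100, 80), (520, 95), (540, 390), (90, 370)]
print(pen_to_screen((310, 230), corners))   # roughly the centre of the screen
```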

Abstract

Interface device (1) with a processing system (59) by manipulating virtual objects, said device comprising display means (1A) wearable on the head of a user and comprising at least one screen (1D, 1S) for making it possible to display at least one virtual object simultaneously with the real scene observed by the user of said device (1), wherein the virtual object is associated to a physical point of the real scene and follows the optical laws of said physical point, in particular appearing or disappearing according to whether the user is observing the physical point associated to the virtual object or not; the invention also concerns a method using said device (1).

Description

INTERFACE METHOD AND DEVICE BETWEEN MAN AND MACHINE REALISED BY MANIPULATING VIRTUAL OBJECTS
The present invention concerns an interface method and device between man and machine realised by manipulating virtual objects. More precisely, the invention concerns a device comprising a pair of glasses to be worn by a user on his/her head, said glasses allowing to normally see the real scene and at the same time to display the image of virtual objects superimposed on the observed image, and a method using said device.

It is known that in any processing system data are usually inputted by means of a keyboard or a mouse or a pointer and so on, and are usually displayed on a screen which, if necessary, may be integrated inside a device having the aspect of a traditional pair of glasses. Said device is also known under the name of "Head Mounted Display". As an alternative, the screen of a computer may be replaced by a so called "Head-Up Display" which allows to display a synthetic image superimposed on a real image.

There are many applications using said devices. For instance, in the car sector, a Head-Up Display may be integrated into the windscreen of a vehicle, thereby allowing to project the image of a virtual dashboard superimposed on the normal real image. The focusing of the virtual image is set at some meters in front of the windscreen and appears fixed with respect to the vehicle: in this way, the user does not have any more the need of adjusting his/her eye at two different focal lengths as it normally occurs when shifting from observing the scene to observing the real dashboard.

The miniaturisation of the electronic circuits and of the memory supports allows to realise calculation systems more and more powerful and having a more and more limited size. Therefore the use of a Head Mounted Display integrated inside a pair of glasses allows to realise portable computers having a more limited size since the screen becomes superfluous as it is integrated into the glasses. However, the need of manufacturing smaller and smaller and more and more portable electronic apparatuses (computers, cellular phones, GPS, and so on) is limited by the physical size of the keyboard which in any case should not have a too much limited size with respect to the average size of one's fingers. The realisation of keyboards having more characters associated to the same physical key (as in the existing cellular phones) involves a slowdown and makes it difficult to input the data.

Devices integrating cameras, earphones, microphone and keyboard into a Head Mounted Display are known as well. For instance, the patent US 6,346,929 describes a device allowing to display virtual images (for instance functional icons) superimposed on the real image and to automatically recognise gestures performed by the user with his/her hands or with other pointing objects which select the virtual icons and activate the controls associated with them. However, the virtual objects described in the patent US 6,346,929 are superimposed on the real scene but are not integral with it. This involves that when the user moves his/her observation point of the real scene by translating or rotating his/her head, the virtual objects move integrally with the Head Mounted Display, that is integrally with the user's head, and therefore said virtual objects move with respect to the observed real scene. This causes to the user a feeling of deviation between the physical reality and the virtual reality.
It is therefore an object of the invention to provide a method and a device able to superimpose on the normal visual perception of the real scene a realistic perception of virtual objects that remain anchored to the real scene also when the user moves his/her head. A further object of the present invention is that of realising a method and a device for allowing a user to interface himself/ herself with an external processing system by manipulating virtual objects. The above and other objects are achieved by the method and the device as claimed in the hereby appended claims. The particularity of the invention consists in the fact that the displayed objects are kept in a fixed spatial relationship with respect to the observed scene, independently from the user's observation point. The method and the device according to the invention allow to:
- display the virtual objects superimposed on the observed scene while maintaining their images anchored to fixed physical points of the real scene;
- detect the user's interaction with the virtual objects;
- communicate to an external processing system the status of the virtual objects and the detected interaction;
- control, modify and use the virtual objects.
The perception of the virtual objects is made realistic by the invention thanks to the fact that:
- the virtual objects are set by the user in real physical positions of the environment surrounding the user himself/ herself and indicated by him/her;
- the view of the virtual objects is made three-dimensional and follows the perspective laws, with respect to the user's eyes, of the physical point in which the virtual object has been set;
- the view of the virtual objects provides the user with aspects and particulars of the same according to their position and orientation with regard to the user's eyes. In this way, the user does not perceive any difference in the view of virtual objects with respect to the view of real objects. The virtual objects are seen integral, that is "anchored", with the surrounding physical reality and their view follows the same optical laws as the view of the real objects. The user can then feel the following perceptions:
- the virtual objects disappear from the user's visual field when the real physical space in which they have been set is not observed any more;
- the virtual objects reduce/increase their own visible size when the user moves away from them/moves close to them;
- the virtual objects show different views when the user rotates his/her observation point around them;
- the aspects of the virtual objects, like visible particulars, reflex lighting, shading and so on, change in relationship with their position relative to the user's eyes and to the lighting direction of the real scene.

The device according to the invention is preferably realised as a common pair of glasses. However, the device can be integrated into a helmet visor. Advantageously, each virtual object is described by means of a three-dimensional model defining its shape and material and allowing the synthesis of different views according to the observation point. Advantageously, each three-dimensional model of each virtual object can be constituted by a plurality of shapes which may be automatically displayed in a sequential way, thereby allowing to display virtual objects of dynamical type, like for instance a metronome, which autonomously change shape and position inside the real scene according to a scheduled sequence, even if they remain set at the physical point defined by the user.

In order to favour a preliminary process of self-localisation of the device in a real environment, the real environment surrounding the user can be fitted out with objects having different geometries or with geometric drawings, for instance on the walls of a room. In fact, the visual design with three-dimensional geometric objects, like for instance cubes, pyramids and so on, or bi-dimensional, like figures or geometric drawings, advantageously increases the robustness of the anchoring points identified in the objects themselves, thereby improving the functionality of self-localisation of the device with respect to the environment and the stability of the virtual objects perceived by the user, in the sense of maintaining the virtual objects integral with the physical reality in which they have been set.

The preliminary process of self-localisation of the device exploits the principle according to which two stereoscopic images of a real scene taken by two cameras can be processed in order to obtain the three-dimensional position of a constellation of points known in literature as "points of interest". These points exhibit the characteristic of being physical points of the real scene which may be easily recognised in an automatic way upon varying both of the lighting conditions and of the angular field of the scene itself. The storage of a constellation of points of interest recognised in a determined environment subsequently allows the self-localisation of the stereoscopic cameras in front of the same scene: the points of interest act in this case as "anchoring points" for the self-localisation.

A plurality of bi-dimensional images of the same scene taken with different observation angles may be "merged" for creating a panoramic image with a wider angular field, provided that the edges of adjacent images are partially overlapped in order to favour their alignment and connection. Likewise, it is possible to "merge" the constellations of anchoring points relative to different angular sectors of the environment around the user, thereby obtaining a panoramic constellation of anchoring points of the environment. If the camera is not simply rotated with respect to its optical centre but undergoes a translation as well, which is surely true for at least one camera in case of rotation of two integral cameras which take the same scene while rotating, in some images appear particulars which are hidden in other images, given the three-dimensional nature of the observed scene.
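To make the principle concrete, the sketch below (Python/NumPy) shows how the three-dimensional position of one point of interest can be recovered from a rectified stereo pair by triangulating its disparity. The focal length, principal point and pixel values are assumptions made for the example; the patent only specifies the roughly 7 centimetre spacing of the two cameras.

```python
import numpy as np

# Illustrative stereo geometry: a rectified pair with a 7 cm baseline, as
# suggested by the spacing of the cameras 2D and 2S; the rest is assumed.
BASELINE_M = 0.07          # distance between the two cameras
FOCAL_PX = 400.0           # focal length in pixels (assumed)
CX, CY = 320.0, 240.0      # principal point (assumed image centre)

def triangulate_point(x_left, y_left, x_right):
    """Recover the 3-D position of a point of interest seen by both cameras.

    For a rectified stereo rig the depth follows from the disparity,
    Z = f * B / (x_left - x_right), and X, Y follow by re-projecting the
    left-image pixel at that depth."""
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("the point must lie in front of both cameras")
    z = FOCAL_PX * BASELINE_M / disparity
    x = (x_left - CX) * z / FOCAL_PX
    y = (y_left - CY) * z / FOCAL_PX
    return np.array([x, y, z])

# Example: a corner of a poster seen 30 pixels apart in the two images.
anchor = triangulate_point(x_left=350.0, y_left=200.0, x_right=320.0)
print("anchoring point (m):", anchor)   # about 0.93 m in front of the cameras
```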
By means of rotations and translations it is therefore possible to advantageously acquire a three-dimensional panoramic image of the scene surrounding the user taken by a plurality of points of view and constituted by the constellation of all the anchoring points which have been observed and recognised during the scanning by rotation and translation.

According to the invention, the user can interact with a world of virtual objects, that he/she sees superimposed and anchored to the real scene, able to exchange information with an external processing system. According to the invention, the user can define, besides the number and the type of virtual objects, the real physical positions where they have to be displayed, that is the setting points of the virtual objects in the real scene. According to the invention, it is possible to update in real time the display of the virtual objects according to the user's observation point with respect to them so that they appear fixed with respect to the observed physical scene, that is they remain integral with the real physical points while the user moves his/her observation point both moving his/her eyes and head.

The perception of the virtual objects is then made realistic by the invention thanks to the fact that the virtual objects are displayed in stereoscopic way, thereby providing the user with a perception of three-dimensionality, and to the fact that the stereoscopic view of the virtual objects provided to the user is synthesised by the physical position of his/her eyes and by the physical positions in which the virtual objects have been set. The physical positions are defined in relationship with the constellation of anchoring points that the system has recognised in the panoramic image of the surrounding environment.

According to the invention, it is possible to detect the user's interaction with the virtual objects by detecting and localising the objects in motion in the observed scene. The user's interaction with the virtual objects observed by the user himself/ herself is made possible by means of the analysis of the movements of the observed real objects. According to the invention, it is possible to interpret the user's gestural commands on the basis of the spatial coincidence of the gestures with the spatial position of the displayed virtual objects. For instance, it is possible to interpret the movement of the fingers of the user's hands with respect to the position of the virtual objects. An applicative example is that of writing on a virtual keyboard.

According to the invention, a man-machine interface of virtual type is realised wherein the gestural and voice commands provided by the user are interpreted and decoded both for interacting with the virtual objects themselves and for sending information towards an external processing system; at the same time, information coming from the external processing system may be communicated to the user by means of sound messages or of notification of one or more graphical attributes of the displayed virtual objects like shape, colour, alphanumeric text, icons and so on. According to the invention, it is possible to communicate in real time to an external processing system the status of the virtual objects, if necessary modified by the user's interaction: in this way, the external processing system receives information from the user. The status may be also updated by the external system; in this way, information coming from the external processing system is provided to the user.
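The exchange of the virtual-object status with the external processing system can be pictured with a toy round trip such as the one below. The record layout, field names and JSON transport are assumptions made for this example and are not specified in the patent; the point is only that the device sends the status database (the memory 17), the external system may modify it, and the modified status is merged back and used at the next display update. This is the virtual keyboard/virtual screen situation given as an applicative example just below.

```python
import json

# Hypothetical, simplified stand-in for the memory 17: one record per activated
# virtual object, with its setting point, orientation and attributes.
memory_17 = {
    "keyboard-1": {"position": [0.2, -0.1, 0.6], "orientation": [0, 0, 0],
                   "attributes": {"last_key": None}},
    "screen-1":   {"position": [0.0,  0.2, 0.8], "orientation": [0, 0, 0],
                   "attributes": {"text": ""}},
}

def send_status(memory):
    """Serialise the status database for transmission over the cable 9 link."""
    return json.dumps(memory)

def external_system_step(payload):
    """What the external processing system might do: read the user's input and
    update an output object (here, echo the pressed key onto the virtual screen)."""
    status = json.loads(payload)
    key = status["keyboard-1"]["attributes"]["last_key"]
    if key is not None:
        status["screen-1"]["attributes"]["text"] += key
    return json.dumps(status)

def merge_status(memory, payload):
    """Overwrite the local status with the version returned by the external system."""
    memory.clear()
    memory.update(json.loads(payload))

# One round trip: the user "touches" a key and the external system echoes it back.
memory_17["keyboard-1"]["attributes"]["last_key"] = "h"
merge_status(memory_17, external_system_step(send_status(memory_17)))
print(memory_17["screen-1"]["attributes"]["text"])   # -> "h"
```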
An applicative example is the virtual display of a screen on which the text inputted by the user by means of a virtual keyboard is displayed: in this way, the invention allows to replace the keyboard and the screen of the normal computers, thereby allowing the user to maintain the normal interaction with the real scene.

The invention will now be described in detail with particular reference to the attached drawings, provided as a non-limitative example, wherein:
- Fig. 1 is a schematic view of the device according to the invention;
- Fig. 2 is a block diagram of the architecture of the device of Fig. 1;
- Fig. 3A is a flow-chart of a start-up configuration phase of the device of Fig. 1 aimed to acquire a panoramic image;
- Fig. 3B is a flow chart showing the normal operation of the device of Fig. 1;
- Fig. 4 shows an applicative example of the device according to the invention able to replace a screen, a keyboard and a pointing device of a computer.

With reference to Figure 1, a device 1 according to the invention is shown, comprising a display means 1A having the aspect of a pair of glasses, a processing and power system 7 and a real keyboard 5, which allows the user's physical manual interaction with said processing and power system 7. The keyboard 5 is interconnected with the system 7 by means of a cable 6. The device 1 is connected with an external processing system, like a
(not shown) computer, by means of a data connection cable 9.

The display means 1A embodies a display comprising two semi-transparent screens 1D and 1S, the respective geometric centres of which are set at a distance of about 7 centimetres the one from the other and the size of which is such to occupy at least part of the whole visual field of a user once he/she has worn the display means 1A. The suffixes "D" and "S" indicate right and left respectively. The property of semi-transparency of the two screens 1D and 1S allows the normal view of the real scene on which the display means 1A is able to superimpose a stereoscopic image of virtual objects. The correct stereoscopic display of the virtual object, as described more in detail in the following, provides the user with the three-dimensional perception of the virtual object itself. Two adjusting wheels 1Q, 1W allow to adjust the optical focal length of the display means 1A in order to optimally display the virtual image on the basis of the user's visual ability.

The display means 1A is provided with two cameras 2D and 2S, respectively positioned above the two screens 1D and 1S and spaced at about 7 centimetres the one from the other. The two cameras 2D, 2S take the real scene in stereoscopic way and with an observation angle equal to about 140° by 140°, that is similar to the observation angle of the human eye, with respect to an axis 1Y in the two planes respectively identified by the axes 1Y-1X and 1Y-1Z. The display means 1A is provided with two earphones 3D and 3S and with a microphone 4 provided for the voice interaction of the user with the device 1. The earphones 3D and 3S are positioned so that, once the device 1A has been worn, they are in the immediate proximity of the user's ears, while the microphone 4 is in front or near the user's mouth.
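A back-of-the-envelope consequence of these figures, under the assumption of a 640-pixel-wide image (the patent does not give the sensor resolution), is that the very wide 140° optics leave a short focal length in pixels and therefore a small stereo disparity at arm's length and beyond:

```python
import math

IMAGE_WIDTH_PX = 640    # assumed sensor width
FOV_DEG = 140.0         # observation angle stated for the cameras 2D, 2S
BASELINE_M = 0.07       # approximate spacing of the two cameras

focal_px = (IMAGE_WIDTH_PX / 2) / math.tan(math.radians(FOV_DEG / 2))
print(f"focal length: {focal_px:.1f} px")            # about 116 px for a 140 deg lens

for depth_m in (0.5, 1.0, 2.0):
    disparity_px = focal_px * BASELINE_M / depth_m
    print(f"disparity at {depth_m:.1f} m: {disparity_px:.1f} px")
```

This is one reason why robust, well-distributed anchoring points matter: with only a few pixels of disparity at a couple of metres, the depth of an individual point is recovered rather coarsely.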
The earphones 3D and 3S are connected with the side-pieces 11D, 11S of the display means 1A through two short cables 3A and 3B which make it easier to insert the two earphones 3D, 3S into the user's ears. The microphone 4 is integrally connected with the left side-piece 11S of the display means 1A. A stereophonic microphone, constituted by two monophonic microphones 4D and 4S housed in the side-pieces 11D, 11S of the display means 1A, allows to acquire the ambient sound. When the display means 1A is not operating or there is no virtual image active, the user normally sees the real scene thanks to the property of semi-transparency of the two screens 1S, 1D which do not deform the real image. The screens 1D and 1S, instead of being semi-transparent, may be two non-transparent screens on which the stereoscopic image taken by the two cameras 2D and 2S is represented, with the synthesised stereoscopic image of the virtual objects electronically superimposed on it. The processing and power system is housed in the part 7, containing also a power feed source, preferably a battery, and is connected with the device 1A through a cable 8 and to the external processing system through a cable 9. The whole device 1 or the only display means 1A may, if necessary, be integrated into a helmet like for instance a crash or safety helmet or the like.

With reference to Figure 2, the constituting components and the operation of the device 1 of Figure 1 are shown in greater detail. The images acquired by the cameras 2S and 2D are processed by a module 13 for processing stereoscopic images which reconstructs a three-dimensional image of the observed scene in real time. During a start-up configuration phase, which will be described in greater detail in the following, a plurality of three-dimensional images acquired from different angles and points of view are merged for constructing the panoramic image of the real environment surrounding the user. The image so acquired is a three-dimensional panoramic image of the scene surrounding the user taken by a plurality of points of view, said image being stored in a memory 14. For constructing the panoramic image, the invention makes use of one among the many methods of known type.

The three-dimensional image is analysed by a module 15 for detecting and localising the anchoring points, as shown in greater detail in the following, which are compared with the points already recognised in the environmental panoramic image stored in the memory 14 in order to identify the visual angular sector observed by the user. The identified and localised anchoring points are used by the module 18 for processing virtual objects in order to make it possible to synthesise the stereoscopic perspective view of the virtual objects which have been activated and set by the user in the observed visual field. The stereoscopic perspective view is constituted by two perspective views which are stored in two image memories 12D and 12S and displayed to the user through the two semi-transparent screens 1D and 1S which are connected to the respective image memories 12D and 12S. The two perspective views are synthesised starting from the two projection centres set in the user's corneas that are assumed to be set at about one centimetre behind the geometric centres of the two screens 1D and 1S.
In this way, a stereoscopic view of the virtual objects is presented to the user's two eyes, said view being the same view which would be produced by a physical object similar to the virtual object and set in the same position. This causes to the user a three-dimensional perception of the displayed virtual objects. The synthesis of the two perspective views and the update in real time of the two memories 12D and 12S are performed by the module 18 for processing virtual objects starting from: a) the three-dimensional models of the virtual objects contained in the memory 16; b) the space co-ordinates, at which the virtual objects have been set by the user, and their orientation, stored in the memory 17 containing the data base of the status of the activated virtual objects; c) the position of the anchoring points acquired through the module 15 for detecting and localising the anchoring points; d) the attributes of the activated virtual objects, stored in the memory 17.

The attributes of the virtual objects stored in the memory 17 provide a complete description both of the elementary visual characteristics of the single virtual objects, like for instance the colour, the lighting point, the properties of optical reflection and so on, and of the constituting features of the composite virtual objects, like for instance aspect and shape, relative position of the various component parts and so on, and of all the auxiliary information associated with the virtual objects. The lighting point of the virtual objects may be set a priori or set coincident with the lighting point of the real scene gathered during the panoramic image acquisition process.

The processing and power system 7 comprises a control unit 19, a communication unit 19A and a power source 19B. The control unit 19 is in charge of all the processing activity and process synchronisation and communicates through the cable 9 with an external processing system by means of a communication unit 19A. The power source 19B feeds the whole device 1. The real keyboard 5, connected through the cable 6 to the control unit 19, allows a physical intervention on the system during the start-up configuration of the device and in case of fault situations. The external processing system receives the information, contained in the memory 17, relative to the status of the virtual objects activated, if necessary modifies it, and retransmits it to the control unit 19 which provides for updating the memory 17. This operating scheme allows the external system both to know the content of the status memory of the virtual objects and to modify its content in real time. This implements the user's virtual interaction with the external processing system.

The module 21 for detecting the user's actions allows to identify the actions performed on the virtual objects by the user's fingers or by a pointing device he/she employs. To this end, the module 21 detects the objects in motion in the observed three-dimensional image acquired by the module 13 and acquires their position and velocity. Then, said module 21 checks whether the objects in motion have validly activated a virtual object. The validity check is performed by comparing the position of the physical object in motion and the space occupation of the virtual objects and of their parts. When the space coincidence has been checked, the status of the "touched" object is modified.
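A minimal sketch of this check, with data structures and threshold values assumed for illustration (the patent does not give them), might look as follows; it also includes the velocity filter discussed in the next paragraph:

```python
import numpy as np

FRAME_DT = 1 / 25                  # assumed camera frame period
V_MIN, V_MAX = 0.02, 1.5           # allowable velocity range in m/s, fixed at start-up

# Space occupation of the activated virtual objects, as axis-aligned boxes
# expressed in the same reference frame as the anchoring points.
virtual_objects = {
    "key-A":  {"min": np.array([0.10, 0.00, 0.50]), "max": np.array([0.12, 0.02, 0.52])},
    "switch": {"min": np.array([0.30, 0.10, 0.60]), "max": np.array([0.33, 0.13, 0.62])},
}

def detect_user_action(prev_pos, curr_pos):
    """Return the name of the virtual object validly "touched" by the moving
    object, or None: the movement is valid only if its velocity lies in the
    configured range and its current position coincides with the space
    occupation of a virtual object."""
    prev_pos, curr_pos = np.asarray(prev_pos), np.asarray(curr_pos)
    velocity = np.linalg.norm(curr_pos - prev_pos) / FRAME_DT
    if not (V_MIN <= velocity <= V_MAX):
        return None                                   # false-alarm filter
    for name, box in virtual_objects.items():
        if np.all(curr_pos >= box["min"]) and np.all(curr_pos <= box["max"]):
            return name                               # this object's status is modified
    return None

print(detect_user_action([0.11, 0.01, 0.49], [0.11, 0.01, 0.51]))   # -> key-A
print(detect_user_action([0.50, 0.50, 0.50], [0.11, 0.01, 0.51]))   # -> None (too fast)
```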
In order to minimise false alarms, only the actions performed by objects having a velocity falling within a range predetermined during the start-up configuration phase of the whole device 1 are considered to be valid. An audio processing unit 20, preferably provided on the display means 1A, makes it possible to recognise the voice commands acquired through the microphone 4 and to acquire the real sound stereophonically through the microphones 4D and 4S. The processor 20 is also in charge of synthesising the stereophonic audio reproduced through the two earphones 3D and 3S, suitably mixing the ambient sound with the voice suggestions and the sounds produced by the virtual objects. By means of voice commands acquired through the microphone 4 and suitable voice recognition software, or through manual commands given on the auxiliary keyboard 5, the user can activate virtual objects, static or dynamic, having an input functionality (e.g. a keyboard, a mouse, a switch, and so on), an output functionality (e.g. a monitor, a display, a vocabulary, and so on) or simply an ornamental function (a picture, a carpet, and so on). During normal operation, when the user wishes to activate a virtual object, he/she proceeds according to the following scheme:
- the user gives a voice command into the microphone 4 for activating the procedure (e.g. "new keyboard", "new screen", "new mouse", and so on) which, when recognised by the module 20, automatically activates the three-dimensional display of the section of the panoramic constellation of anchoring points relative to the observed angular sector, each anchoring point being marked with a cross;
- the user identifies in the real visual field a physical point at which to set the virtual object he/she wishes to activate and shows it to the device by a movement of the pointer object, the pointer object and the type of movement having been defined during the start-up configuration phase: the movement may be, for instance, a rhythmic movement of the right index finger like a "double mouse-click";
- the stereoscopic system identifies the finger in the scene, localises it with respect to the constellation of anchoring points, stores its co-ordinates in the memory 17 as the setting point of the new virtual object, and synthesises and activates the three-dimensional display of the virtual object (whose type had been defined in the voice command) according to the user's point of view, superimposing said three-dimensional display on the real image.

Upon varying the user's observation point, the constellation of anchoring points detected in the observed scene is compared with the panoramic constellation in order to localise the position of the display means 1A with respect to the real scene. For detecting the anchoring points in the observed scene, the invention employs one of the many methods known in the literature for recognising the points of interest in a stereoscopic image, that is, those points which have the maximum probability of being automatically identified when the observation point of the scene varies. For the three-dimensional localisation of the anchoring points, the invention employs one of the many stereoscopic methods well known in the literature, using the cameras 2D and 2S arranged in the immediate proximity of the eyes and having an angular field of view equal to that of the human eye. For recognising the user's voice commands and for synthesising the stereophonic audio that gives suggestions to the user, one of the many voice recognition and synthesis software packages well known in the literature is employed. The use of stereophony makes it possible to provide the user with sound suggestions coming from different directions and thus to direct the user's attention in space, thereby making his/her interaction with the device 1 more intuitive: this is particularly useful during the environmental panoramic acquisition phase. Besides, the same mechanism allows the sounds produced by the virtual objects to be perceived coherently with their physical settings.

With reference to Figure 3A, the operation mode concerning the start-up configuration of the device 1, aimed at acquiring the panoramic image, is described; it starts at step 100A and ends at step 101A. In this operation mode, the device 1 performs a stereoscopic image processing loop aimed at acquiring a three-dimensional panoramic image of the scene surrounding the user, taken from a plurality of points of view. The image is stored in the memory 14. During the loop, the following processes are performed. At step 22 a stereoscopic image of the real scene is acquired and a corresponding three-dimensional image is constructed. At step 23 the constructed three-dimensional images are merged in order to obtain a complete panoramic image of the environment surrounding the user. The loop is performed a plurality of times until the device 1 recognises that the acquisition of the panoramic image surrounding the user has been completed (step 24). The optical scanning of the environment is performed by the user in a way guided by a voice suggestion process (step 25), which provides the user with indications about moving his/her head, for instance "rotate to the right", "rotate to the left", "rotate upwards", "rotate downwards", "bend to the right", "bend to the left" and so on. In this way, the user wearing the display means 1A performs an angular scanning of the environment surrounding him/her around the three axes 1Z, 1X and 1Y, shown in Figure 1, respectively.
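The guided acquisition loop of steps 22 to 25 may be outlined, purely schematically, as follows; the acquisition, merging and completeness-check routines are placeholders standing in for the known stereoscopic methods the description refers to, and all names and values are assumptions.

/* Minimal sketch of the start-up acquisition loop of Figure 3A (steps 22-25).
 * The routines below are placeholders; only the control flow is illustrated. */
#include <stdbool.h>
#include <stdio.h>

typedef struct { int dummy; } view3d;            /* one three-dimensional partial view    */
typedef struct { double coverage; } panorama;    /* panoramic image being built (memory 14) */

static view3d acquire_stereo_view(void)                   { view3d v = {0}; return v; }           /* step 22 */
static void   merge_into_panorama(panorama *p, view3d v)  { (void)v; p->coverage += 0.1; }        /* step 23 */
static bool   panorama_complete(const panorama *p)        { return p->coverage >= 1.0; }          /* step 24 */
static void   voice_prompt(const char *msg)               { printf("prompt: %s\n", msg); }        /* step 25 */

int main(void)
{
    panorama pano = { 0.0 };
    const char *prompts[] = { "rotate to the right", "rotate to the left",
                              "rotate upwards", "rotate downwards" };
    int i = 0;

    while (!panorama_complete(&pano)) {       /* loop until the panorama is complete   */
        view3d v = acquire_stereo_view();     /* step 22: stereo pair -> 3D image      */
        merge_into_panorama(&pano, v);        /* step 23: merge the overlapping views  */
        voice_prompt(prompts[i++ % 4]);       /* step 25: guide the user's head motion */
    }
    /* Step 25B would follow here: detect and store the anchoring-point constellation. */
    return 0;
}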
The three-dimensional environmental panoramic image is acquired by merging the three-dimensional images, which partially overlap: the overlapping zones are used for their alignment and connection. The angular amplitude of the panoramic scanning depends on the application of the device; for instance, for the virtual desk-top application, a scanning of the head by 90° around the two axes 1X and 1Z and by 60° around the axis 1Y is considered to be sufficient. Besides, by means of further voice commands, for instance "move to the right", "move to the left", "move upwards", "move downwards", "move ahead", "move backwards", the user is invited to move the shooting point of the visual field by performing movements of the head along, respectively, the three axes 1X, 1Z and 1Y. In this way the user wearing the device performs a linear scanning of the environment surrounding him/her along the three reference axes of the device. Through the linear scanning, performed after the angular scanning, the three-dimensional panoramic image is enriched with particulars visible only from some points of view and not from others. For the virtual desk-top application, a linear scanning of the head equal to 30 centimetres along the three axes 1X, 1Y and 1Z is considered to be sufficient. At the end of the panoramic image acquisition, the detection and storing (step 25B) of the constellation of anchoring points in the panoramic image is performed; the constellation is stored together with the panoramic image itself in the memory 14.

With reference to Figure 3B, the operation mode called "normal operation" is described; it starts at step 100B and ends at step 101B. During normal operation the device 1 performs a loop of the following procedures. At step 26 a stereoscopic image is acquired and a corresponding three-dimensional image is constructed. At step 27 some anchoring points are detected and localised. At step 28 the memory 17 of the status of the virtual objects is updated. At step 29 the position, the velocity and the shape of objects possibly in motion in the observed scene are detected. If the velocity of the object is considered to be valid (step 30), that is to say if said velocity belongs to the range of allowable values defined during the configuration phase of the device 1, and if the shape of the object is valid, that is to say if the shape belongs to a collection of shapes defined during the start-up configuration phase of the device 1, then the object's action is considered to be valid. In this case, it is checked (step 37) whether the voice recognition process has recognised a voice command for activating a new virtual object. If no new virtual object has been activated, the position of the object in motion is compared (step 31) with the space occupation of the virtual objects and of their parts: if spatial coincidence is verified, the analysed movement of the real object is considered to be valid and, as a consequence, the status of the "touched" object, or of its constituent part, is modified (step 32). If, on the contrary, a new virtual object has been activated at step 37, at step 36 the position of the object in motion is compared with the position of the anchoring points of the panoramic constellation. If spatial coincidence is verified, the analysed movement of the real object is considered to be valid and, as a consequence, the "touched" anchoring point is used as the setting point of the new virtual object and the memory 17 is updated.
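The validity gate of steps 29 and 30 may be illustrated by the following sketch, in which the speed limits and the identifiers of the admissible shapes are assumed values standing in for those fixed during the start-up configuration phase.

/* Sketch of the validity gate of steps 29-30: an action on a virtual object is
 * accepted only if the moving object's speed falls within the configured range
 * and its shape belongs to the configured collection.  All values are assumed. */
#include <stdbool.h>
#include <math.h>
#include <stdio.h>

#define N_VALID_SHAPES 2

static const double v_min = 0.05, v_max = 1.5;             /* m/s, set at configuration  */
static const int valid_shapes[N_VALID_SHAPES] = { 1, 2 };  /* e.g. fingertip, pen tip    */

static bool action_is_valid(double vx, double vy, double vz, int shape_id)
{
    double speed = sqrt(vx * vx + vy * vy + vz * vz);
    if (speed < v_min || speed > v_max)                     /* step 30: velocity gate */
        return false;
    for (int i = 0; i < N_VALID_SHAPES; i++)                /* step 30: shape gate    */
        if (valid_shapes[i] == shape_id)
            return true;
    return false;
}

int main(void)
{
    printf("%d\n", action_is_valid(0.1, 0.0, 0.2, 1));  /* 1: plausible finger tap   */
    printf("%d\n", action_is_valid(3.0, 0.0, 0.0, 1));  /* 0: too fast, likely noise */
    return 0;
}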
The user may adjust the orientation of the virtual object as well. At step 33 the memories 12S, 12D of the two displays 1S, 1D are updated, and the process of communicating the status of the virtual objects to the external processing system is then performed (step 34). The status is received by the external processing system which, after having modified it if necessary, transmits it, updated, to the device 1, which in the next loop will use it for displaying the virtual objects in an updated way. Finally, at step 35, the on-off status of the device 1 is checked in order to deactivate it if required. The check is performed by analysing both a virtual switch (if its display has been activated by the user) and a real on-off key present on the real keyboard 5.

With reference to Figure 4, a scheme of an application example of the invention is now shown, in which a man-machine interface device is used as an alternative to a traditional interface device constituted by a screen, an electro-mechanical keyboard and a touchpad of the type used in existing notebook computers. The interface device 60 comprises a device 1 as previously described and a plastic tablet 50, for instance of the type used for filling in paper forms, equipped with coloured adhesive edges. The display means 1A is connected with a standard VGA port of an external processing system 59 constituted by a notebook-type personal computer. The pair of stereoscopic cameras 2D, 2S mounted on the display means 1A is likewise connected with the personal computer 59 through standard ports of the computer, for instance USB or video ports. During the start-up configuration phase of the interface device, three areas 52, 54 and 56 may be defined on the tablet 50, on which three virtual objects are respectively displayed: a touch-screen-type monitor; a virtual paper form, whose bitmap image is provided by the personal computer 59 so that notes written on the virtual paper can be acquired in the personal computer; and a touch-pad. The areas 52 and 56 are analysed to provide the x, y co-ordinates of a pointing means, preferably a pen held by the user, when the pointing means touches the tablet 50. The interface device 60 makes it possible to:
- view the screen of the computer 59 as a virtual object inside the area 52 of the tablet, in perspective with respect to the user and following any variation of the position of the tablet with respect to the user;
- input data in the area 52 as normally occurs with touch-screens, including the possibility of activating a virtual keyboard: data input is performed by recognising and tracking the movement of a red pen, or of a pen of another suitable colour, held by the user;
- move the pointer by means of the virtual touch-pad, also displayed in perspective in the area 56.

For operating the interface device 60, a Visual Processing software and an Image Processing software are further provided. The Visual Processing software, preferably written in MATLAB and/or C so that it can be compiled in a Windows environment, comprises the following code segments:
a) a code segment for acquiring the images from the pair of stereoscopic cameras 2S, 2D provided on the display means 1A;
b) a code segment for segmenting the tablet 50 from the background on which it rests;
c) a code segment for segmenting the pen on the tablet 50;
d) a code segment for segmenting other objects on the tablet, typically the user's hands;
e) a code segment for the perspective projection of the virtual screen and of the touch-pad in the respective rectangular areas 52, 56 of the tablet 50, wherein in the virtual display 52 the image normally present on the screen of the computer 59 is displayed, its bitmap being acquired through a suitable video redirection driver, together with the superimposed image of the objects lying on the area 52 of the tablet, for instance the pen and/or the operator's hands, while in the area 56 the image of the touch-pad is displayed with the image of the pen, if present, superimposed on it;
f) a code segment for sending the image synthesised at stage e) to the video redirection driver;
g) a code segment for acquiring the x, y co-ordinates of the tip of the pen when it "touches" the tablet in the area 52 or in the area 56;
h) a code segment for sending to the operating system, through a suitable pointing driver, the absolute x, y co-ordinates of the pen in the area 52, the information coming from the virtual touch-pad, such as the relative movement along the x and y axes, and the virtual pressure of one of the three keys. The key pressure is simulated by a circular movement to the right, or another movement of the tip of the pen, on whichever key is considered most convenient.

The information displayed and acquired by the Visual Processing software must be interfaced to the operating system of the personal computer 59. To this end, the two above-mentioned drivers are provided: the pointing driver and the video redirection driver. The pointing driver provides the operating system, and then the applications, with: the absolute x, y co-ordinates of the tip of the pen held by the user, when the pen is set inside the tablet in the area 52; the relative movement along x and y of the tip of the pen held by the user, when the pen is set inside the tablet in the area 56; the "pressure" on the three keys displayed in the area 56; and the image of the area 54 of the tablet taken by the camera in bitmap format, for possible applicative uses. In turn, the video redirection driver sends to the occlusive glasses, through the standard VGA port, the image synthesised by the Visual Processing software according to point f) above, and sends to the Visual Processing software the screen bitmap generated by the operating system.
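The co-ordinate handling of points g) and h) above may be illustrated, purely as an example, by the following C sketch; the tablet geometry, the screen resolution and all names are assumptions made for the illustration and do not reproduce the actual drivers.

/* Illustrative sketch of the mapping performed for the pointing driver: a
 * pen-tip position inside area 52 is converted to absolute screen co-ordinates,
 * while inside area 56 it yields a relative touch-pad displacement. */
#include <stdio.h>

typedef struct { double x, y, w, h; } rect;              /* tablet areas, metres */

static const rect area52 = { 0.00, 0.00, 0.20, 0.15 };   /* virtual screen       */
static const rect area56 = { 0.00, 0.17, 0.08, 0.06 };   /* virtual touch-pad    */
static const int  screen_w = 1024, screen_h = 768;       /* assumed resolution   */

static double last_x = -1.0, last_y = -1.0;              /* previous touch-pad contact */

static int inside(const rect *r, double x, double y)
{
    return x >= r->x && x <= r->x + r->w && y >= r->y && y <= r->y + r->h;
}

/* Turn one pen-tip contact into the event the pointing driver would report. */
static void handle_pen(double px, double py)
{
    if (inside(&area52, px, py)) {
        /* area 52: absolute co-ordinates for the operating-system pointer */
        int sx = (int)((px - area52.x) / area52.w * screen_w);
        int sy = (int)((py - area52.y) / area52.h * screen_h);
        printf("absolute: %d %d\n", sx, sy);
    } else if (inside(&area56, px, py)) {
        /* area 56: relative movement, as a physical touch-pad would report */
        if (last_x >= 0.0)
            printf("relative: %+.4f %+.4f\n", px - last_x, py - last_y);
        last_x = px;
        last_y = py;
    }
}

int main(void)
{
    handle_pen(0.10, 0.05);   /* pen inside the virtual screen, area 52   */
    handle_pen(0.02, 0.20);   /* first contact on the touch-pad, area 56  */
    handle_pen(0.03, 0.21);   /* second contact: relative movement output */
    return 0;
}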
It is clear that the above description is given purely as a non-limiting example and that variants and modifications are possible without departing from the scope of protection of the present invention. For instance, devices other than the user's fingers may be provided for activating virtual objects, for instance the tip of a pen or an automatic mechanism, and also different sequences and velocities of stereotyped movements for recognising the validity of an action on a virtual object. The term "user" therefore does not necessarily denote a human being. The connection between the processing and power system 7 and the display means 1A can be made via radio instead of via cable. In this case the electric power source for the display means 1A may be housed in the side-pieces 11S, 11D of the display means 1A, while the power source of the processing system will be housed in the part 7. Likewise, the connection between the processing and power system 7 of the device 1 and the external processing system may be made wireless as well, so that the external processing system may also be constituted by a fixed processing system arranged remotely with respect to the user. The panoramic image acquisition phase may be made superfluous by previously storing the image in the external system and transferring it to the device 1 during an automatic start-up phase. In this way the operative condition of the invention is that of a user who, without needing to perform any system configuration, interacts with different virtual objects according to the physical environment in which he/she is located: the virtual objects are activated by the external processing system according to the objectives predetermined by the system itself. Possible application examples are the virtual office, assistance during visits to museums and libraries, and assistance in diagnostics and intervention both in the surgical field and in the field of apparatus and manufactured products. The display of the virtual images may be shared among many users, each wearing a device 1: this operating mode is obtained by updating simultaneously and identically the memory 17 of the status of the virtual objects existing in the different devices. The update may be performed by the external processing system through the communication channel 9. The shared manipulation of virtual objects may find application, for instance, in group work for designing a system or a machine, in diagnostics, in teaching and so on.
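The shared operating mode just mentioned may be sketched as follows; the layout of the status record of memory 17 and the update mechanism over the channel 9 are assumptions made only to illustrate that applying the same update on every device keeps the shared view consistent.

/* Sketch of sharing the virtual-object status among several devices 1: the
 * external system sends the same status record and every device overwrites the
 * corresponding entry of its memory 17.  The record layout is an assumption. */
#include <stdio.h>

#define MAX_OBJECTS 32

typedef struct {
    int    id;                 /* which virtual object                     */
    double pos[3];             /* setting point relative to anchoring grid */
    double orient[3];          /* orientation set by the user              */
    int    state;              /* application-defined status               */
} object_status;

typedef struct {
    object_status entries[MAX_OBJECTS];   /* local copy of memory 17 */
} status_memory;

/* Apply an update received from the external system to one device's memory. */
static void apply_update(status_memory *mem, const object_status *upd)
{
    if (upd->id >= 0 && upd->id < MAX_OBJECTS)
        mem->entries[upd->id] = *upd;
}

int main(void)
{
    status_memory device_a = {0}, device_b = {0};     /* two users' devices */
    object_status upd = { 3, {0.4, 0.1, 0.8}, {0.0, 0.0, 0.0}, 1 };

    apply_update(&device_a, &upd);      /* identical update on every device...  */
    apply_update(&device_b, &upd);      /* ...keeps the shared view consistent  */

    printf("device A state: %d, device B state: %d\n",
           device_a.entries[3].state, device_b.entries[3].state);
    return 0;
}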

Claims
1. Interface device with a processing system (59) by manipulating virtual objects, said device comprising: - means for generating at least one virtual object; - display means (1A) wearable on the head of a user and comprising at least one screen (1D, 1S) for making it possible to display said at least one virtual object simultaneously with the scene observed by the user of said device (1); - means for associating said at least one virtual object to a physical point, or anchoring point, of said observed scene; - means (21) for detecting an interaction of said user with said at least one virtual object or information coming from said processing system (59); - means (17) for modifying the status of said at least one virtual object on the basis of said interaction/information; - a control unit (19) for controlling said at least one virtual object so that it is kept in a fixed space relationship with said physical point and follows the same optical laws as said physical point to which it is associated, whereby in particular said at least one virtual object is not displayed if the physical point to which it is associated is not observed by said user.
2. Device according to claim 1, wherein said display means (1A) comprise two cameras (2S, 2D) for acquiring stereoscopic images of said real scene and said device (1) further comprises: - a processing module (13) for processing said stereoscopic images acquired by said two cameras (2S, 2D) and synthesised in a three-dimensional panoramic image storable in an image memory (14); - a module (15) for analysing said panoramic image in order to detect and to localise the anchoring points existing in said panoramic image; - a module (18) for processing virtual objects in order to synthesise the view, on said at least one screen (1D, 1S) of said display means (1A), of possible virtual objects associated with the anchoring points detected in said panoramic image.
3. Device according to claim 2, wherein said display means (1A) have the shape of glasses comprising means (3D, 3S, 4) for receiving and/or imparting voice commands to said device (1), means (4D, 4S) for acquiring the ambient sound surrounding said real scene and an audio processor (20) for processing said voice commands and/or sounds.
4. Device according to claim 3, wherein said at least one virtual object is described by a three-dimensional model stored in a suitable memory (16) of said device, said model describing the attributes of said virtual object, preferably aspect, shape, colour, lighting point, optical reflection properties, relative position of the different parts constituting said virtual object and possible auxiliary information.
5. Device according to claim 4, wherein said device comprises a keyboard (5), connected with said control unit (19), for allowing a physical intervention on the device during a start-up configuration phase of said device (1) and in case of fault situations.
6. Device according to claim 4, wherein said virtual object is a keyboard, a screen (52), a data input area (54), a touch-pad (56) or a switch.
7. Device according to any one of the preceding claims, wherein said device (1) communicates with said processing system (59) by means of a cable or, through a communication unit (19A), wirelessly.
8. Device according to any one of the preceding claims, wherein said display means communicates with said control unit (19) by means of a cable (8) or, through a communication unit (19A), wirelessly.
9. Device according to any one of the preceding claims, wherein said device (1) comprises an electric power source (19B), preferably a battery.
10. Device according to any one of the preceding claims, wherein said device (1) is integrated into a helmet.
11. Interface method with a processing system (59) by manipulating virtual objects, said method comprising the steps of: a) generating at least one virtual object; b) associating said at least one virtual object to a physical point, or anchoring point, of said real scene; c) displaying on at least one screen (1D, 1S) of a display means (1A) wearable on the head of a user said at least one virtual object simultaneously with the scene observed by said user; d) detecting an interaction of said user with said at least one virtual object or information coming from said processing system (59) and, in case an interaction/information has been detected, modifying the status of said at least one virtual object on the basis of said interaction/information; e) controlling said at least one virtual object so that it is kept in a fixed space relationship with said physical point and follows the same optical laws as said physical point to which it is associated, whereby in particular said virtual object is not displayed if the physical point to which it is associated is not observed by the user.
12. Method according to claim 11, wherein said step c) comprises the steps of: a1) acquiring and processing stereoscopic images of the real scene; a2) synthesising said stereoscopic images in a three-dimensional panoramic image and storing it in an image memory; a3) analysing said panoramic image for detecting and localising the anchoring points existing in said panoramic image; a4) synthesising the view of the virtual objects associated with the anchoring points detected in said panoramic image on said at least one screen (1S, 1D).
13. Method according to claim 12, wherein said interaction of said user consists in manipulating said virtual object with the hand or with a pointing device, preferably a pen.
14. Method according to claim 13, wherein said interaction is considered to be valid if the velocity of the entity in motion generating the interaction is comprised in a predetermined range of allowable values and if the shape of the entity belongs to a collection of predefined shapes.
15. Method according to claim 11, wherein said at least one virtual object is generated by means of a voice command or a keyboard (5).
16. Method according to claim 11, wherein, before said step a), the step of acquiring a complete panoramic image of the environment surrounding the user is provided.
17. Method according to any one of claims 11 to 16, wherein said observed scene is a predefined environmental scenario, preferably a museum or library room or an operating theatre.
PCT/IB2004/002667 2003-08-19 2004-08-13 Interface method and device between man and machine realised by manipulating virtual objects WO2005017729A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
ITTO20030640 ITTO20030640A1 (en) 2003-08-19 2003-08-19 MAN INTERFACE SYSTEM - MACHINE USING
ITTO2003A000640 2003-08-19

Publications (2)

Publication Number Publication Date
WO2005017729A2 true WO2005017729A2 (en) 2005-02-24
WO2005017729A3 WO2005017729A3 (en) 2005-06-16

Family

ID=34179350

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2004/002667 WO2005017729A2 (en) 2003-08-19 2004-08-13 Interface method and device between man and machine realised by manipulating virtual objects

Country Status (2)

Country Link
IT (1) ITTO20030640A1 (en)
WO (1) WO2005017729A2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113849112B (en) * 2021-09-30 2024-04-16 西安交通大学 Augmented reality interaction method, device and storage medium suitable for power grid regulation and control

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030032484A1 (en) * 1999-06-11 2003-02-13 Toshikazu Ohshima Game apparatus for mixed reality space, image processing method thereof, and program storage medium
US20020075286A1 (en) * 2000-11-17 2002-06-20 Hiroki Yonezawa Image generating system and method and storage medium
GB2376397A (en) * 2001-06-04 2002-12-11 Hewlett Packard Co Virtual or augmented reality

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MATTHIAS M. WLOKA, BRIAN G. ANDERSON: "Resolving Occlusion in Augmented Reality", 12 April 1995 (1995-04-12), Proceedings of the 1995 Symposium on Interactive 3D Graphics, Monterey, California, USA, XP002324602, pages 5-12 *

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102005046762A1 (en) * 2005-09-29 2007-04-05 Siemens Ag System and method for displaying user information, in particular augmented reality information, with the aid of tracking information stored in RFID data memories
US10474418B2 (en) 2008-01-04 2019-11-12 BlueRadios, Inc. Head worn wireless computer having high-resolution display suitable for use as a mobile internet device
US10579324B2 (en) 2008-01-04 2020-03-03 BlueRadios, Inc. Head worn wireless computer having high-resolution display suitable for use as a mobile internet device
EP2427812A4 (en) * 2009-05-08 2016-06-08 Kopin Corp Remote control of host application using motion and voice commands
US10013976B2 (en) 2010-09-20 2018-07-03 Kopin Corporation Context sensitive overlays in voice controlled headset computer displays
US8781794B2 (en) 2010-10-21 2014-07-15 Lockheed Martin Corporation Methods and systems for creating free space reflective optical surfaces
US9632315B2 (en) 2010-10-21 2017-04-25 Lockheed Martin Corporation Head-mounted display apparatus employing one or more fresnel lenses
US8625200B2 (en) 2010-10-21 2014-01-07 Lockheed Martin Corporation Head-mounted display apparatus employing one or more reflective optical surfaces
US10359545B2 (en) 2010-10-21 2019-07-23 Lockheed Martin Corporation Fresnel lens with reduced draft facet visibility
US10495790B2 (en) 2010-10-21 2019-12-03 Lockheed Martin Corporation Head-mounted display apparatus employing one or more Fresnel lenses
US9720228B2 (en) 2010-12-16 2017-08-01 Lockheed Martin Corporation Collimating display with pixel lenses
WO2012089576A1 (en) * 2010-12-30 2012-07-05 Danmarks Tekniske Universitet System and device with three-dimensional image display
US11947387B2 (en) 2011-05-10 2024-04-02 Kopin Corporation Headset computer that uses motion and voice commands to control information display and remote devices
US11237594B2 (en) 2011-05-10 2022-02-01 Kopin Corporation Headset computer that uses motion and voice commands to control information display and remote devices
US10627860B2 (en) 2011-05-10 2020-04-21 Kopin Corporation Headset computer that uses motion and voice commands to control information display and remote devices
US9507772B2 (en) 2012-04-25 2016-11-29 Kopin Corporation Instant translation system
DE102013206173A1 (en) * 2013-04-09 2014-10-09 Bayerische Motoren Werke Aktiengesellschaft Selection of individual elements for display on data glasses
US10684476B2 (en) 2014-10-17 2020-06-16 Lockheed Martin Corporation Head-wearable ultra-wide field of view display device
US9939650B2 (en) 2015-03-02 2018-04-10 Lockheed Martin Corporation Wearable display system
US10754156B2 (en) 2015-10-20 2020-08-25 Lockheed Martin Corporation Multiple-eye, single-display, ultrawide-field-of-view optical see-through augmented reality system
US9995936B1 (en) 2016-04-29 2018-06-12 Lockheed Martin Corporation Augmented reality systems having a virtual image overlaying an infrared portion of a live scene
US11468111B2 (en) 2016-06-01 2022-10-11 Microsoft Technology Licensing, Llc Online perspective search for 3D components
US11114199B2 (en) 2018-01-25 2021-09-07 Mako Surgical Corp. Workflow systems and methods for enhancing collaboration between participants in a surgical procedure
US11850010B2 (en) 2018-01-25 2023-12-26 Mako Surgical Corp. Workflow systems and methods for enhancing collaboration between participants in a surgical procedure
CN113196209A (en) * 2018-10-05 2021-07-30 奇跃公司 Rendering location-specific virtual content at any location

Also Published As

Publication number Publication date
ITTO20030640A1 (en) 2005-02-20
WO2005017729A3 (en) 2005-06-16

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase