WO2006079951A1 - Mobile telecommunications device - Google Patents

Mobile telecommunications device

Info

Publication number
WO2006079951A1
Authority
WO
WIPO (PCT)
Prior art keywords
optical element
lens
portable device
image sensor
camera
Prior art date
Application number
PCT/IB2006/050201
Other languages
French (fr)
Inventor
Cornelis Janse
Harm Belt
Original Assignee
Koninklijke Philips Electronics N.V.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips Electronics N.V.
Publication of WO2006079951A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/60 Substation equipment, e.g. for use by subscribers including speech amplifiers
    • H04M1/6033 Substation equipment, e.g. for use by subscribers including speech amplifiers for providing handsfree use or a loudspeaker mode in telephone sets
    • H04M1/6041 Portable telephones adapted for handsfree use
    • G PHYSICS
    • G02 OPTICS
    • G02B OPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B13/00 Optical objectives specially designed for the purposes specified below
    • G02B13/001 Miniaturised objectives for electronic devices, e.g. portable telephones, webcams, PDAs, small digital cameras
    • G02B13/0055 Miniaturised objectives for electronic devices, e.g. portable telephones, webcams, PDAs, small digital cameras employing a special optical element
    • G PHYSICS
    • G02 OPTICS
    • G02B OPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B13/00 Optical objectives specially designed for the purposes specified below
    • G02B13/06 Panoramic objectives; So-called "sky lenses" including panoramic objectives having reflecting surfaces
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/141 Systems for two-way working between two video terminals, e.g. videophone
    • H04N7/142 Constructional details of the terminal equipment, e.g. arrangements of the camera and the display
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/02 Constructional features of telephone sets
    • H04M1/0202 Portable telephone sets, e.g. cordless phones, mobile phones or bar type handsets
    • H04M1/0206 Portable telephones comprising a plurality of mechanically joined movable body parts, e.g. hinged housings
    • H04M1/0208 Portable telephones comprising a plurality of mechanically joined movable body parts, e.g. hinged housings characterized by the relative motions of the body parts
    • H04M1/0214 Foldable telephones, i.e. with body parts pivoting to an open position around an axis parallel to the plane they define in closed position
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/02 Constructional features of telephone sets
    • H04M1/0202 Portable telephone sets, e.g. cordless phones, mobile phones or bar type handsets
    • H04M1/0206 Portable telephones comprising a plurality of mechanically joined movable body parts, e.g. hinged housings
    • H04M1/0208 Portable telephones comprising a plurality of mechanically joined movable body parts, e.g. hinged housings characterized by the relative motions of the body parts
    • H04M1/0235 Slidable or telescopic telephones, i.e. with a relative translation movement of the body parts; Telephones using a combination of translation and other relative motions of the body parts
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/02 Constructional features of telephone sets
    • H04M1/03 Constructional features of telephone transmitters or receivers, e.g. telephone hand-sets
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M2250/00 Details of telephonic subscriber devices
    • H04M2250/52 Details of telephonic subscriber devices including functional features of a camera
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/141 Systems for two-way working between two video terminals, e.g. videophone
    • H04N7/142 Constructional details of the terminal equipment, e.g. arrangements of the camera and the display
    • H04N2007/145 Handheld terminals

Definitions

  • This invention relates generally to a mobile telecommunications device and, more particularly, to a mobile telecommunications device with an image capture device for capturing images within its field of view and hands-free functionality to support applications such as teleconferencing, videoconferencing, and the like.
  • the hands-free speech feature of mobile telephones is of increasing importance to the market, and algorithms are known for removing echo and stationary noise in hands-free telecommunications applications, which allow for full-duplex speech communication whereas previously only half-duplex systems have been possible.
  • acoustic echo cancellation and noise suppression are required.
  • Single-microphone noise suppression algorithms based on spectral subtraction and minimum-mean-square error have long been known. Improved noise suppression can be achieved with multi-microphone solutions where spatial selectivity is exploited, by means of which reverberation (reflected sound waves) can also be suppressed.
  • a unique two-microphone noise suppression algorithm for mobile telephones has been proposed, which works when the mobile telephone is used in handset mode, meaning that the user holds the phone against the ear, and typically achieves 15 dB suppression of ambient noises without disturbing distortion of speech.
  • the proposed algorithm is based on an adaptive beamformer as described in WO99/27522 and a specific spectral post-processor.
  • the mobile telephone lies on a surface (e.g. a table or desk) between several people.
  • Signals that need to be suppressed come from all other directions.
  • in the hands-free scenario, one cannot make any assumption about the direction relative to the microphones from which the desired speech comes. A wrong assumption will lead to suppression of desired speech and enhancement of noises and reverberation, while clearly the opposite is desired.
  • an adaptive beamformer for speech enhancement needs to know where the participants are (e.g. under which angle and/or which tilt) in order to spatially enhance desired speech signals and spatially suppress undesired signals (noises and reverberation).
  • There exist techniques to estimate the position of participants using microphone signals (see, for example, WO00/28740). However, these techniques only work when participants speak, and one still cannot distinguish between desired and undesired sound events.
  • To have means for achieving position data of participants from video is beneficial. With such position information, an adaptive beamformer, such as that of WO99/27522, can be given good initial coefficients when a participant starts to talk. With a good set of initial coefficients, an in-beam speech activity detector can work properly, the audio beamformer can further adapt for a larger speech quality improvement, and it can track small movements of the participant by itself.
  • Position information from video can be achieved by several means.
  • an economic method to localize participants is to use a skin color detection algorithm on video pixels combined with a smart clustering algorithm.
  • the skin detection is done by comparison of the color of individual pixels in an image with histogram data for skin and non-skin classes obtained from a large training set.
  • the histogram data can be put in a look-up table (economic for mobile telephone applications), and the colors can also be economically quantized.
  • Alternative or complementary solutions for people localization involve face recognition technologies, object motion estimation, edge detection, still background subtraction, and the like.
  • the difficulty with video capture in the hands-free mobile telephone scenario is that, typically, the camera faces the ceiling and the participants are not located in the camera's field of view. Even if the mobile telephone is positioned on the table in such a way that it can see one of the participants, then in general other participants will not be seen.
  • US Patent Application Publication US2004/0057622 describes the use of a 360-degree view camera to generate images of faces within the camera field of view and identify the location of eyes in the respective faces.
  • this arrangement would not be suitable for use in a mobile telecommunications application, because the resultant image is heavily distorted: it is suitable for determining the location of people within the 360-degree field of view, but not for the normal function of providing images of the subject(s) of interest.
  • although specific processing techniques for at least partly undoing the distortion or "morphing" are known, these processing techniques are less desirable within the limited processing and power consumption provisions in mobile telecommunications applications.
  • a portable device comprising a camera for capturing images within its field of view, the camera comprising an image sensor and a lens for focusing light signals received from within said field of view onto said image sensor so as to generate respective image data, wherein an optical element is provided for collecting and directing onto said image sensor via said lens, light signals derived from several directions within a substantially radial area surrounding said camera, said optical element being mounted relative to said lens so as to be movable selectively to and from an operating position in which it is located between the image sensor and said light signals.
  • the portable device is beneficially a telecommunications device.
  • the telecommunications device can be advantageously used in a hands-free mobile teleconferencing application to provide position (angle and/or tilt) information of participants to a multi-microphone speech enhancement algorithm, leading to a largely increased robustness of the speech enhancement (background noise suppression and dereverberation).
  • said image sensor comprises a plurality of sections, each of which is associated with a unique direction within said area, such that the direction from which a light signal and corresponding image data originates can be determined from the section of said image sensor on which said light signal is incident.
  • image data can be captured in respect of a wide-angled, e.g. 180° or even 360°, field of view, and the direction from which such image data originates can be relatively easily determined.
  • This information can then be passed to another application, such as an application for determining the direction from which speech signals may originate, which information can then be fed to a noise suppression and/or echo cancellation application.
  • the optical element may be movably mounted on the telecommunications device on, for example, a hinged or slidable mounting arrangement, which is movable relative to the lens.
  • the optical element may be mounted on the cover, such that when the flip phone is closed, the optical element is located over the lens, between it and the light signals, whereas when it is open, the optical element is away from the lens, and the camera can be used in a normal mode of operation.
  • the optical element may comprise a light reflecting or diffracting element, such as a conical or parabolic shaped mirror.
  • the optical element may be embedded in a body of light conducting (i.e. transparent) material, the body of light conducting material having substantially the same shape and size as the camera lens. Sliding a device across the camera lens has the advantage that it can be used for protection of the lens (e.g. against water).
  • a specific color filter passing mostly skin colors can be placed in front of the lens to advantageously achieve an improved robustness of the skin color detector (independently of the light conditions in the room).
  • People detected at a large distance (e.g. when using skin color detection, a person far away leads to only a small cluster of skin pixels) can be intentionally ignored by the video localization algorithm.
  • the multi-microphone hands-free solution can be used in office environments where only the speech of people close to the mobile is enhanced and all other signal components (noises, reverberation, speech from people further away) are suppressed.
  • Figure 1 is a schematic front view of a mobile telephone according to an exemplary embodiment of the present invention
  • Figure 2 is a schematic illustration of an optical element for use in a device according to an exemplary embodiment of the present invention
  • Figure 3 is a schematic illustration of an optical element for use in a device according to another exemplary embodiment of the present invention
  • Figure 4 is a schematic illustration of an optical element for use in a device according to yet another exemplary embodiment of the present invention.
  • Figure 5 is a schematic side view of a mobile telephone according to an exemplary embodiment of the present invention.
  • Figure 6 is a schematic diagram illustrating a wide-angled projected image captured on an image sensor in a device according to an exemplary embodiment of the present invention.
  • the present invention involves the provision of a light reflecting and/or diffracting device which, when placed over the lens of a camera incorporated in a mobile telecommunications device, causes light from all relevant directions (e.g. all directions from which desired speech signals may originate, in the case of a videoconferencing application) to be "bundled" onto the camera lens in such a way that each projected point on the camera image sensor corresponds to a unique direction.
  • the light reflecting and/or diffracting device is beneficially designed to provide a viewing angle which is sufficient to capture all potential participants in a proposed video conference. Ideally, a 360° view around the camera would be provided, although obviously, it would suffice to have a more limited view if it is known that all potential participants are, for example, located on one side of the device.
  • a mobile telephone 80 comprises a key entry section 82 which comprises a number of button switches 83 for dial entry and other functions.
  • a display device 85 is disposed above the key entry section 82.
  • An image capture device is incorporated in the mobile telephone 80, only the outer lens 86 of which is visible generally centrally between the key entry section 82 and the display section 85.
  • First and second microphones 88a, 88b, located at opposite ends of the mobile telephone 80, are provided for receiving audio signals from the surrounding area.
  • An optical element 89 comprising a light reflecting and/or diffracting element, is mounted on the mobile telephone 80 by means of a mechanical slider arrangement 90, which can be slidably moved relative to the mobile telephone 80 back and forth in a longitudinal direction, as indicated by arrow A, so as to move the optical element 89 selectively into and away from a position in which it is located over the camera lens 86.
  • the optical element may comprise a conical mirror 89a, the apex of which faces the camera lens 86, in use, which enables light 91 to be captured from all directions in a 360° field of view.
  • a parabolic mirror 89b may be employed to provide the desired 360° field of view.
  • Many other different types of light reflecting and/or diffracting elements are envisaged which could be used to provide the desired circular or partially circular field of view to achieve the object of the present invention, and these will be apparent to a person skilled in the art such that the present invention is not necessarily intended to be limited in this regard.
  • the outer surface 92a, 92b of the respective optical element 89a, 89b may be coated or otherwise provided with a color filter for selective transmission of certain colors, say, skin tones, such that a video person localization algorithm based on the recognition of skin colors becomes relatively more robust against varying lighting conditions.
  • Various types of person localization algorithm will be well known to a person skilled in the art, and one realization thereof, based on the recognition of skin colors, is described in detail in M.J. Jones & J.M. Rehg, "Statistical Color Models with Application to Skin Tone Detection", Int. J. Computer Vision, 46(1):81-96, Jan 2002.
  • the optical element 89b (in this case, a parabolic mirror) may be embedded in a block 93 of light conducting (i.e. transparent) material, which not only protects the optical element 89b, but also enables the entire arrangement to serve as a camera lens protector since it can be shaped and configured to cover the entire camera lens 86, in use.
  • a hinged cover 200 is provided to open and close the telephone 180.
  • the optical element 189 may be mounted on the cover 200 such that, when the cover 200 is closed, the optical element 189 is over the camera lens 186, whereas, when the cover 200 is open, the optical element 189 is away from the camera lens 186, allowing the camera to be used in a normal mode of operation.
  • In Figure 6 of the drawings, four distinct sections 300a, b, c, and d of a camera sensor are depicted, onto which a 360° morphed or distorted image is projected.
  • Two faces 301, 302 can be determined by, for example, using a skin tone detection algorithm and are found, in this case, at two respective angles.
  • the small face 303 in the bottom right-hand sector of the image sensor is intentionally ignored, because due to the head size, the person is judged to be too far away to be a legitimate participant in the videoconference call. Consequently, the audio beamformer referred to above would suppress sounds coming from that direction.
  • a light reflecting/diffracting device is placed directly above the camera lens of a mobile telephone in a multi-microphone, hands-free teleconferencing scenario with one or more participants, such that all near-end participants are visible on the image projected on the camera sensor.
  • This device can be advantageously used in combination with a video-based person localization algorithm in order to optimally initialize the coefficients of an audio adaptive beamformer for speech enhancement when someone starts to speak, and this results in greatly increased robustness of such a speech enhancement algorithm.
  • the present invention has been described above in relation to the improvement in robustness of speech enhancement in hands-free teleconferencing with multiple participants, using a mobile telephone equipped with a camera and (preferably) two or more microphones (to facilitate the preferred noise suppression and echo cancellation technique described above).
  • the present invention is not necessarily intended to be limited in this regard, and many other applications of the inventive combination of features of the various aspects of the present invention are envisaged.
  • a new feature can be provided in respect of mobile telephones having cameras incorporated therein, whereby a wide angle (morphed) picture/movie can be captured, following which specific post-processing (for example, on the user's home personal computer) can be used to at least partially undo the morphing.
  • Another option might be to use the light reflecting/diffracting element in a presence detection application, whereby the mobile telephone monitors its surrounding environment and reacts in response to movement, faces, etc. detected therein. For example, this could be used as a mobile security device in, say, a caravan, whereby an occupant can be alerted in the event that some predetermined event occurs within the area surrounding the device.

Abstract

A mobile telephone (80) having an image capture device incorporated therein, only the outer lens (86) of which is visible. First and second microphones (88a,88b), located at opposite ends of the mobile telephone (80), are provided for receiving audio signals from the surrounding area. An optical element (89), comprising a light reflecting and/or diffracting element, is mounted on the mobile telephone (80) so as to be movable selectively into and away from a position in which it is located over the camera lens (86). The optical element (89,189) is a reflective and/or diffractive element arranged to collect and direct light signals (91) from up to 360° around the telephone (80) onto the image sensor (300) of the image capture device via the lens (86), such that, for example, the location of people within the surrounding area can be determined.

Description

MOBILE TELECOMMUNICATIONS DEVICE
FIELD OF THE INVENTION
This invention relates generally to a mobile telecommunications device and, more particularly, to a mobile telecommunications device with an image capture device for capturing images within its field of view and hands-free functionality to support applications such as teleconferencing, videoconferencing, and the like.
BACKGROUND OF THE INVENTION
The hands-free speech feature of mobile telephones is of increasing importance to the market, and algorithms are known for removing echo and stationary noise in hands-free telecommunications applications, which allow for full-duplex speech communication whereas previously only half-duplex systems have been possible. In order to ensure adequate speech reproduction in hands-free mobile telecommunications applications, acoustic echo cancellation and noise suppression are required. Single-microphone noise suppression algorithms based on spectral subtraction and minimum-mean-square error have long been known. Improved noise suppression can be achieved with multi-microphone solutions where spatial selectivity is exploited, by means of which reverberation (reflected sound waves) can also be suppressed. A unique two-microphone noise suppression algorithm for mobile telephones has been proposed, which works when the mobile telephone is used in handset mode, meaning that the user holds the phone against the ear, and typically achieves 15 dB suppression of ambient noises without disturbing distortion of speech. The proposed algorithm is based on an adaptive beamformer as described in WO99/27522 and a specific spectral post-processor.
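As a rough illustration of the single-microphone spectral subtraction mentioned above, the following minimal sketch subtracts an estimated noise magnitude from each frequency bin of one noisy frame. It is not the two-microphone algorithm of the cited prior art; the function names, the per-bin list representation and the spectral floor value are assumptions made for illustration only.

```python
import math


def spectral_subtract(noisy_mag, noise_mag, floor=0.05):
    """Subtract an estimated noise magnitude spectrum from one noisy frame.

    noisy_mag and noise_mag are per-bin magnitudes of equal length.
    floor keeps a small fraction of the noisy magnitude so that no bin
    becomes negative (an assumed default, not taken from the source).
    """
    if len(noisy_mag) != len(noise_mag):
        raise ValueError("spectra must have the same number of bins")
    return [max(y - n, floor * y) for y, n in zip(noisy_mag, noise_mag)]


def suppression_db(before, after):
    """Attenuation of a magnitude in dB (positive means quieter)."""
    return 20.0 * math.log10(before / after)
```

On this scale, reducing a magnitude by a factor of ten corresponds to 20 dB, so the 15 dB figure quoted above corresponds to a reduction by a factor of roughly 5.6.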
In a hands-free teleconferencing or videoconferencing scenario, the mobile telephone lies on a surface (e.g. a table or desk) between several people. For an adaptive beamforming algorithm, it is important to have knowledge about where the relevant people are relative to the microphones in order to know from which direction to expect desired sounds. Signals that need to be suppressed (noises and reverberation) come from all other directions. In contrast to the handset scenario, where a user holds the phone against the ear, in the hands-free scenario one cannot make any assumption about the direction relative to the microphones from which the desired speech comes. A wrong assumption will lead to suppression of desired speech and enhancement of noises and reverberation, while clearly the opposite is desired. Thus, an adaptive beamformer for speech enhancement needs to know where the participants are (e.g. under which angle and/or which tilt) in order to spatially enhance desired speech signals and spatially suppress undesired signals (noises and reverberation). There exist techniques to estimate the position of participants using microphone signals (see, for example, WO00/28740). However, these techniques only work when participants speak, and one still cannot distinguish between desired and undesired sound events. To have means for achieving position data of participants from video is beneficial. With such position information, an adaptive beamformer, such as that of WO99/27522, can be given good initial coefficients when a participant starts to talk. With a good set of initial coefficients, an in-beam speech activity detector can work properly, the audio beamformer can further adapt for a larger speech quality improvement, and it can track small movements of the participant by itself.
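One way to picture how a known participant angle could seed a beamformer is the steering delay of a simple two-microphone delay-and-sum arrangement. The sketch below is only that: it is not the adaptive beamformer of WO99/27522, and the far-field assumption, the speed of sound value, the angle convention and the function names are all assumptions made for illustration.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees C


def steering_delay_s(azimuth_deg, mic_spacing_m):
    """Arrival-time difference between two microphones for a far-field talker.

    azimuth_deg is measured from the axis joining the microphones
    (0 means the talker is on that axis, 90 means broadside).
    """
    return mic_spacing_m * math.cos(math.radians(azimuth_deg)) / SPEED_OF_SOUND_M_S


def delay_in_samples(azimuth_deg, mic_spacing_m, sample_rate_hz):
    """Same delay rounded to whole samples, usable as an initial filter offset."""
    return round(steering_delay_s(azimuth_deg, mic_spacing_m) * sample_rate_hz)
```

For example, with microphones 0.1 m apart and a 16 kHz sample rate, a talker on the microphone axis gives `delay_in_samples(0, 0.1, 16000) == 5`, while a talker at broadside gives a delay of zero samples.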
Position information from video can be achieved by several means. For mobile telephone applications, an economic method to localize participants is to use a skin color detection algorithm on video pixels combined with a smart clustering algorithm. In one known technique, the skin detection is done by comparison of the color of individual pixels in an image with histogram data for skin and non-skin classes obtained from a large training set. The histogram data can be put in a look-up table (economic for mobile telephone applications), and the colors can also be economically quantized. Alternative or complementary solutions for people localization involve face recognition technologies, object motion estimation, edge detection, still background subtraction, and the like.
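The histogram and look-up table approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the cited technique itself: the 32 bins per channel, the likelihood-ratio threshold and all function names are hypothetical, and a real table would be built from a large training set rather than a handful of pixels.

```python
from collections import Counter

BINS_PER_CHANNEL = 32  # assumed quantization: 256 levels folded into 32 bins


def quantize(rgb):
    """Map an 8-bit (r, g, b) triple to a coarse histogram bin."""
    step = 256 // BINS_PER_CHANNEL
    return tuple(channel // step for channel in rgb)


def build_lookup_table(skin_pixels, non_skin_pixels, threshold=1.0):
    """Return the set of bins where skin is more likely than non-skin.

    The decision compares P(color | skin) with threshold * P(color | non-skin),
    both estimated from normalized histogram counts.
    """
    skin_hist = Counter(quantize(p) for p in skin_pixels)
    non_skin_hist = Counter(quantize(p) for p in non_skin_pixels)
    skin_total = max(len(skin_pixels), 1)
    non_skin_total = max(len(non_skin_pixels), 1)
    table = set()
    for bin_index, count in skin_hist.items():
        p_skin = count / skin_total
        p_non_skin = non_skin_hist[bin_index] / non_skin_total
        if p_skin > threshold * p_non_skin:
            table.add(bin_index)
    return table


def is_skin(rgb, table):
    """Classify one pixel with a single quantization and set lookup."""
    return quantize(rgb) in table
```

Once the table has been built offline, classifying a pixel costs three integer divisions and one lookup, which is the property that makes the approach economic for a mobile telephone.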
The difficulty with video capture in the hands-free mobile telephone scenario is that, typically, the camera faces the ceiling and the participants are not located in the camera's field of view. Even if the mobile telephone is positioned on the table in such a way that it can see one of the participants, then in general other participants will not be seen.
US Patent Application Publication US2004/0057622 describes the use of a 360-degree view camera to generate images of faces within the camera field of view and identify the location of eyes in the respective faces. However, this arrangement would not be suitable for use in a mobile telecommunications application, because the resultant image is heavily distorted: it is suitable for determining the location of people within the 360-degree field of view, but not for the normal function of providing images of the subject(s) of interest. On the other hand, although specific processing techniques for at least partly undoing the distortion or "morphing" are known, these processing techniques are less desirable within the limited processing and power consumption provisions in mobile telecommunications applications.
SUMMARY OF THE INVENTION
Thus, it is an object of the present invention to provide a mobile telecommunications device in which a camera is incorporated, in which means are provided for obtaining a wide-angled field of view in respect of the area surrounding the camera, whilst retaining the ability thereof to function in a normal mode of operation and without significantly increasing the processing and power consumption requirements of the device.
In accordance with the present invention, there is provided a portable device comprising a camera for capturing images within its field of view, the camera comprising an image sensor and a lens for focusing light signals received from within said field of view onto said image sensor so as to generate respective image data, wherein an optical element is provided for collecting and directing onto said image sensor via said lens, light signals derived from several directions within a substantially radial area surrounding said camera, said optical element being mounted relative to said lens so as to be movable selectively to and from an operating position in which it is located between the image sensor and said light signals. The portable device is beneficially a telecommunications device. With an optical element, which may be diffractive or reflective, the telecommunications device can be advantageously used in a hands-free mobile teleconferencing application to provide position (angle and/or tilt) information of participants to a multi-microphone speech enhancement algorithm, leading to a largely increased robustness of the speech enhancement (background noise suppression and dereverberation).
In one exemplary embodiment, said image sensor comprises a plurality of sections, each of which is associated with a unique direction within said area, such that the direction from which a light signal and corresponding image data originates can be determined from the section of said image sensor on which said light signal is incident. Thus, image data can be captured in respect of a wide-angled, e.g. 180° or even 360°, field of view, and the direction from which such image data originates can be relatively easily determined. This information can then be passed to another application, such as an application for determining the direction from which speech signals may originate, which information can then be fed to a noise suppression and/or echo cancellation application. Of course, the wide-angled image projected onto the image sensor will be heavily distorted, but this does not matter merely for the purpose of detecting the presence and/or location within the surrounding area of a person or other subject of interest.
In another exemplary embodiment, the optical element may be movably mounted on the telecommunications device on, for example, a hinged or slidable mounting arrangement, which is movable relative to the lens. Alternatively, in a flip-phone having a hinged or otherwise rotatably mounted cover, the optical element may be mounted on the cover, such that when the flip phone is closed, the optical element is located over the lens, between it and the light signals, whereas when it is open, the optical element is away from the lens, and the camera can be used in a normal mode of operation.
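The association between sensor sections and directions described above can be illustrated with a short sketch that converts a pixel position into an angle around the image center and then into a section index. The equal angular sections, the choice of four sections, the coordinate convention and the function names are assumptions; the true mapping from image angle to room direction depends on the geometry and mounting of the optical element.

```python
import math


def azimuth_deg(x, y, center_x, center_y):
    """Angle of a sensor pixel around the image center, in degrees from 0 to 360."""
    return math.degrees(math.atan2(y - center_y, x - center_x)) % 360.0


def section_index(x, y, center_x, center_y, num_sections=4):
    """Index of the equal angular section that contains the pixel."""
    width = 360.0 / num_sections
    # The final modulo guards against an angle that rounds up to exactly 360.
    return int(azimuth_deg(x, y, center_x, center_y) // width) % num_sections
```

With the center at the origin, a pixel at (5, 5) falls in section 0 and a pixel at (-5, 5) in section 1, so a face detected at a given pixel can be handed to the audio processing as a coarse direction without undoing the distortion of the image.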
As stated above, the optical element may comprise a light reflecting or diffracting element, such as a conical or parabolic shaped mirror. In one exemplary embodiment, the optical element may be embedded in a body of light conducting (i.e. transparent) material, the body of light conducting material having substantially the same shape and size as the camera lens. Sliding a device across the camera lens has the advantage that it can be used for protection of the lens (e.g. against water).
When skin color detection is used for localizing participants, a specific color filter passing mostly skin colors can be placed in front of the lens together with the device, which advantageously improves the robustness of the skin color detector (independently of the light conditions in the room). People detected at a large distance (with skin color detection, a person far away produces only a small cluster of skin pixels) can be intentionally ignored by the video localization algorithm. In this way the multi-microphone hands-free solution can be used in office environments, where only the speech of people close to the mobile device is enhanced and all other signal components (noise, reverberation, speech from people further away) are suppressed.
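By way of illustration only, the rejection of small skin-pixel clusters might be sketched as follows. The sketch assumes that a binary skin mask (a list of rows of 0/1 values) has already been produced by some skin color detector; the function names and the `min_pixels` threshold are hypothetical and are not taken from the present description.

```python
from collections import deque

def skin_clusters(mask):
    """Group skin pixels (value 1) into 4-connected clusters of (row, col) tuples."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    clusters = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                # Breadth-first flood fill starting from an unvisited skin pixel.
                queue = deque([(r, c)])
                seen[r][c] = True
                cluster = []
                while queue:
                    pr, pc = queue.popleft()
                    cluster.append((pr, pc))
                    for nr, nc in ((pr + 1, pc), (pr - 1, pc), (pr, pc + 1), (pr, pc - 1)):
                        if 0 <= nr < rows and 0 <= nc < cols and mask[nr][nc] and not seen[nr][nc]:
                            seen[nr][nc] = True
                            queue.append((nr, nc))
                clusters.append(cluster)
    return clusters

def nearby_clusters(mask, min_pixels):
    """Keep only clusters large enough to be treated as a nearby person (assumed threshold)."""
    return [cluster for cluster in skin_clusters(mask) if len(cluster) >= min_pixels]
```

A cluster that falls below the threshold is simply dropped, so that a person far from the device does not contribute a direction to the localization result.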
These and other aspects of the present invention will be apparent from, and elucidated with reference to, the embodiments described herein.
BRIEF DESCRIPTION OF THE DRAWINGS
Embodiments of the present invention will now be described by way of examples only and with reference to the accompanying drawings, in which:
Figure 1 is a schematic front view of a mobile telephone according to an exemplary embodiment of the present invention;
Figure 2 is a schematic illustration of an optical element for use in a device according to an exemplary embodiment of the present invention;
Figure 3 is a schematic illustration of an optical element for use in a device according to another exemplary embodiment of the present invention;
Figure 4 is a schematic illustration of an optical element for use in a device according to yet another exemplary embodiment of the present invention;
Figure 5 is a schematic side view of a mobile telephone according to an exemplary embodiment of the present invention; and
Figure 6 is a schematic diagram illustrating a wide-angled projected image captured on an image sensor in a device according to an exemplary embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
The present invention involves the provision of a light reflecting and/or diffracting device which, when placed over the lens of a camera incorporated in a mobile telecommunications device, causes light from all relevant directions (e.g. all directions from which desired speech signals may originate, in the case of a videoconferencing application) to be "bundled" onto the camera lens in such a way that each projected point on the camera image sensor corresponds to a unique direction. As stated above, the fact that the projected image will be heavily distorted or "morphed" is not a significant issue, since it is only required to derive position information from the projected image.
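By way of illustration only, the correspondence between a projected point and a direction might be sketched as follows. The sketch assumes that the axis of the optical element projects onto a known centre pixel `(cx, cy)` of the image sensor and that the azimuth is measured counter-clockwise from the positive x axis of the image; these conventions and the function name are assumptions, not part of the present description.

```python
import math

def pixel_to_azimuth(x, y, cx, cy):
    """Return the azimuth in degrees, in [0, 360), of pixel (x, y) about centre (cx, cy)."""
    # Image rows grow downward, so the vertical offset is flipped to make
    # angles increase counter-clockwise as seen in the image.
    angle = math.degrees(math.atan2(cy - y, x - cx))
    return angle % 360.0
```

Only the angle about the centre is used here; the radial distortion of the projected image does not affect it, which is consistent with the observation that the morphing is not a significant issue for deriving position information.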
The light reflecting and/or diffracting device is beneficially designed to provide a viewing angle which is sufficient to capture all potential participants in a proposed video conference. Ideally, a 360° view around the camera would be provided, although obviously, it would suffice to have a more limited view if it is known that all potential participants are, for example, located on one side of the device.
The light reflecting and/or diffracting device is preferably mounted or otherwise disposed relative to the camera lens such that it can be selectively moved between a first position in which it is located directly over the camera lens and a second position in which it is not.
Referring to Figure 1 of the drawings, a mobile telephone 80 according to an exemplary embodiment of the present invention comprises a key entry section 82 which comprises a number of button switches 83 for dial entry and other functions. A display device 85 is disposed above the key entry section 82. An image capture device is incorporated in the mobile telephone 80, only the outer lens 86 of which is visible, generally centrally between the key entry section 82 and the display device 85. First and second microphones 88a, 88b, located at opposite ends of the mobile telephone 80, are provided for receiving audio signals from the surrounding area. An optical element 89, comprising a light reflecting and/or diffracting element, is mounted on the mobile telephone 80 by means of a mechanical slider arrangement 90, which can be slidably moved relative to the mobile telephone 80 back and forth in a longitudinal direction, as indicated by arrow A, so as to move the optical element 89 selectively into and away from a position in which it is located over the camera lens 86.
Referring to Figure 2 of the drawings, the optical element may comprise a conical mirror 89a, the apex of which faces the camera lens 86, in use, which enables light 91 to be captured from all directions in a 360° field of view.
Alternatively, and as illustrated in Figure 3 of the drawings, a parabolic mirror 89b may be employed to provide the desired 360° field of view. Many other different types of light reflecting and/or diffracting elements are envisaged which could be used to provide the desired circular or partially circular field of view to achieve the object of the present invention, and these will be apparent to a person skilled in the art such that the present invention is not necessarily intended to be limited in this regard.
In all cases, the outer surface 92a, 92b of the respective optical element 89a, 89b may be coated or otherwise provided with a color filter for selective transmission of certain colors, say, skin tones, such that a video person localization algorithm based on the recognition of skin colors becomes relatively more robust against varying lighting conditions. Various types of person localization algorithm will be well known to a person skilled in the art, and one realization thereof, based on the recognition of skin colors, is described in detail in M.J. Jones & J.M. Rehg, "Statistical Color Models with Application to Skin Tone Detection", Int. J. Computer Vision, 46(1):81-96, Jan 2002.
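By way of illustration only, a skin color test based on color histograms might be sketched as follows. This is a simplified likelihood-ratio classifier over quantized RGB bins, offered in the general spirit of histogram-based color models; it is not the trained model of the cited reference, and the bin count, the smoothing, the threshold and all function names are assumptions.

```python
from collections import Counter

def quantize(rgb, bins=8):
    """Map an (r, g, b) triple with 0-255 channels onto a coarse histogram bin."""
    step = 256 // bins
    return tuple(channel // step for channel in rgb)

def build_histogram(pixels, bins=8):
    """Count how many labelled example pixels fall into each color bin."""
    return Counter(quantize(p, bins) for p in pixels)

def is_skin(rgb, skin_hist, other_hist, threshold=1.0, bins=8):
    """Classify a pixel as skin when P(color | skin) / P(color | non-skin) exceeds threshold."""
    key = quantize(rgb, bins)
    n_bins = bins ** 3
    # Add-one smoothing keeps both probabilities non-zero for unseen colors.
    p_skin = (skin_hist[key] + 1) / (sum(skin_hist.values()) + n_bins)
    p_other = (other_hist[key] + 1) / (sum(other_hist.values()) + n_bins)
    return p_skin / p_other > threshold
```

In such a scheme the two histograms would be built from labelled example pixels, and the threshold trades off missed skin pixels against false detections.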
Referring to Figure 4 of the drawings, in an exemplary embodiment of the present invention, the optical element 89b (in this case, a parabolic mirror) may be embedded in a block 93 of light conducting (i.e. transparent) material, which not only protects the optical element 89b, but also enables the entire arrangement to serve as a camera lens protector since it can be shaped and configured to cover the entire camera lens 86, in use.
Referring to Figure 5 of the drawings, in a flip-phone type of device 180, a hinged cover 200 is provided to open and close the telephone 180. In this type of device, the optical element 189 may be mounted on the cover 200 such that, when the cover 200 is closed, the optical element 189 is over the camera lens 186, whereas, when the cover 200 is open, the optical element 189 is away from the camera lens 186, allowing the camera to be used in a normal mode of operation.
Referring to Figure 6 of the drawings, four distinct sections 300a, b, c, and d of a camera sensor are depicted, onto which a 360° morphed or distorted image is projected. Two faces 301, 302 can be detected by, for example, using a skin tone detection algorithm and are found, in this case, at respective angles φ and θ. The small face 303 in the bottom right-hand sector of the image sensor is intentionally ignored because, given the small head size, the person is judged to be too far away to be a legitimate participant in the videoconference call. Consequently, the audio beamformer referred to above would suppress sounds coming from that direction.
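By way of illustration only, the assignment of detected faces to sensor sections might be sketched as follows. The sketch assumes that each of the four sections covers one 90° quadrant, in the order 300a to 300d starting from 0°; the present description does not state which section covers which range of angles, and the face-size threshold and function names are likewise hypothetical.

```python
SECTIONS = ("300a", "300b", "300c", "300d")

def section_for_angle(angle_deg):
    """Return the section label for an azimuth, assuming one 90-degree quadrant per section."""
    index = int((angle_deg % 360.0) // 90.0)
    # Guard against floating-point rounding pushing the index to 4.
    return SECTIONS[min(index, len(SECTIONS) - 1)]

def split_sections(faces, min_face_px):
    """faces: list of (angle_deg, face_size_px). Return (enhance, suppress) sets of sections."""
    enhance = {section_for_angle(angle) for angle, size in faces if size >= min_face_px}
    suppress = set(SECTIONS) - enhance
    return enhance, suppress
```

Sections containing only faces below the size threshold end up in the suppress set, mirroring the treatment of the small face 303.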
In summary, in the above-described exemplary embodiment of the present invention, a light reflecting/diffracting device is placed directly above the camera lens of a mobile telephone in a multi-microphone, hands-free teleconferencing scenario with one or more participants, such that all near-end participants are visible on the image projected on the camera sensor. This device can be advantageously used in combination with a video-based person localization algorithm in order to optimally initialize the coefficients of an audio adaptive beamformer for speech enhancement when someone starts to speak, and this results in greatly increased robustness of such a speech enhancement algorithm.
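By way of illustration only, one way of deriving initial beamformer coefficients from a detected direction might be sketched as follows. The sketch uses a fixed delay-and-sum beamformer for two microphones under a far-field assumption, with the angle measured from the axis joining the microphones (0° meaning the sound reaches the first microphone first); it is a swapped-in textbook technique rather than the adaptive beamformer of the present description, and the speed of sound value, the conventions and the function names are assumptions.

```python
import cmath
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate value in air at room temperature

def inter_mic_delay_s(angle_deg, mic_spacing_m):
    """Time by which the second microphone lags the first for a far-field source."""
    return mic_spacing_m * math.cos(math.radians(angle_deg)) / SPEED_OF_SOUND_M_S

def initial_weights(angle_deg, mic_spacing_m, freq_hz):
    """Per-frequency weights (multiplied directly with the microphone spectra)."""
    tau = inter_mic_delay_s(angle_deg, mic_spacing_m)
    # Delay the first microphone by tau so both channels align before averaging.
    w_first = 0.5 * cmath.exp(-2j * math.pi * freq_hz * tau)
    w_second = 0.5 + 0j
    return w_first, w_second
```

For a source in the steered direction the two weighted channels add coherently, whereas sound from other directions adds with a phase mismatch and is attenuated to a degree that depends on frequency and microphone spacing.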
The present invention has been described above in relation to the improvement in robustness of speech enhancement in hands-free teleconferencing with multiple participants, using a mobile telephone equipped with a camera and (preferably) two or more microphones (to facilitate the preferred noise suppression and echo cancellation technique described above). However, it will be appreciated that the present invention is not necessarily intended to be limited in this regard, and many other applications of the inventive combination of features of the various aspects of the present invention are envisaged. For example, with the selective availability of the light reflecting and/or diffracting optical element, a new feature can be provided in respect of mobile telephones having cameras incorporated therein, whereby a wide angle (morphed) picture/movie can be captured, following which specific post-processing (for example, on the user's home personal computer) can be used to at least partially undo the morphing. Another option might be to use the light reflecting/diffracting element in a presence detection application, whereby the mobile telephone monitors its surrounding environment and reacts in response to movement/faces, etc. detected therein. For example, this could be used as a mobile security device in, say, a caravan, whereby an occupant can be alerted in the event that some predetermined event occurs within the area surrounding the device.
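By way of illustration only, post-processing that partially undoes the morphing might be sketched as a polar-to-rectangular unwrapping of the annular image. The sketch assumes a known centre pixel `(cx, cy)`, known inner and outer radii of the useful annulus, and nearest-neighbour sampling; it does not model the actual mirror profile, and the function and parameter names are hypothetical.

```python
import math

def unwrap_panorama(image, cx, cy, r_min, r_max, out_width, out_height):
    """Unwrap an annular image (2D list of pixel values) into a rectangular panorama."""
    panorama = []
    for row in range(out_height):
        # Row 0 samples the outer radius; the last row samples the inner radius.
        radius = r_max - (r_max - r_min) * row / max(out_height - 1, 1)
        line = []
        for col in range(out_width):
            # Each output column corresponds to one azimuth around the centre.
            theta = 2.0 * math.pi * col / out_width
            x = int(round(cx + radius * math.cos(theta)))
            y = int(round(cy - radius * math.sin(theta)))
            inside = 0 <= y < len(image) and 0 <= x < len(image[0])
            line.append(image[y][x] if inside else 0)
        panorama.append(line)
    return panorama
```

Whether the outer radius should map to the top or the bottom of the panorama depends on the optical element used, and a practical implementation would typically interpolate between pixels rather than pick the nearest one.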
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be capable of designing many alternative embodiments without departing from the scope of the invention as defined by the appended claims. In the claims, any reference signs placed in parentheses shall not be construed as limiting the claims. The words "comprising" and "comprises", and the like, do not exclude the presence of elements or steps other than those listed in any claim or the specification as a whole. The singular reference of an element does not exclude the plural reference of such elements and vice versa. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a device claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.

Claims

1. A portable device comprising a camera for capturing images within its field of view, the camera comprising an image sensor (300) and a lens (86,186) for focusing light signals (91) received from within said field of view onto said image sensor (300) so as to generate respective image data, wherein an optical element (89,189) is provided for collecting and directing onto said image sensor (300) via said lens (86,186), light signals (91) derived from several directions within a substantially radial area surrounding said camera, said optical element (89,189) being mounted relative to said lens (86,186) so as to be movable selectively to and from an operating position in which it is located between the image sensor (300) and said light signals (91).
2. A portable device according to claim 1, wherein said image sensor (300) comprises a plurality of sections (300a,300b,300c,300d), each of which is associated with a unique direction within said area, such that the direction from which a light signal (91) and corresponding image data originates can be determined from the section of said image sensor on which said light signal (91) is incident.
3. A portable device according to claim 1, wherein the optical element (89,189) comprises a light reflecting and/or diffracting element.
4. A portable device according to claim 1, wherein said optical element (89,189) comprises a conical or parabolic shaped mirror.
5. A portable device according to claim 1, wherein the optical element (89,189) is embedded in a body (93) of light conducting material.
6. A portable device according to claim 5, wherein said body of light conducting material has substantially the same shape and size as the lens (86,186).
7. A portable device according to claim 1, comprising a mechanical slider arrangement (90) on which the optical element (89) is mounted, which mechanical slider arrangement can be slidably moved relative to the lens (86).
8. A portable device according to claim 1, comprising a rotatably mounted cover (200) on which the optical element (189) is mounted such that, when the rotatably mounted cover is closed, the optical element is over the lens (186), whereas, when the rotatably mounted cover is open, the optical element is away from the lens.
9. A portable device according to claim 1, wherein the outer surface (92a,92b) of the optical element (89a, 89b) is provided with a color filter passing mostly skin colors which is placed in front of the lens (86).
10. A mobile telecommunications device (80,180) including the portable device according to claim 1.
PCT/IB2006/050201 2005-01-25 2006-01-19 Mobile telecommunications device WO2006079951A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP05300058.4 2005-01-25
EP05300058 2005-01-25

Publications (1)

Publication Number Publication Date
WO2006079951A1 WO2006079951A1 (en) 2006-08-03

Family

ID=36577534

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2006/050201 WO2006079951A1 (en) 2005-01-25 2006-01-19 Mobile telecommunications device

Country Status (1)

Country Link
WO (1) WO2006079951A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2354806A1 (en) * 2010-02-01 2011-08-10 Sick Ag Optoelectronic sensor

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020061767A1 (en) * 2000-11-10 2002-05-23 Peter Sladen Mobile imaging
US6532035B1 (en) * 2000-06-29 2003-03-11 Nokia Mobile Phones Ltd. Method and apparatus for implementation of close-up imaging capability in a mobile imaging system
US20030160862A1 (en) * 2002-02-27 2003-08-28 Charlier Michael L. Apparatus having cooperating wide-angle digital camera system and microphone array
US20030164895A1 (en) * 2001-11-16 2003-09-04 Jarkko Viinikanoja Mobile termanal device having camera system
US20040057622A1 (en) * 2002-09-25 2004-03-25 Bradski Gary R. Method, apparatus and system for using 360-degree view cameras to identify facial features

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase

Ref document number: 06710699

Country of ref document: EP

Kind code of ref document: A1

WWW Wipo information: withdrawn in national office

Ref document number: 6710699

Country of ref document: EP