USRE41602E1 - Digital camera with voice recognition annotation - Google Patents

Digital camera with voice recognition annotation

Publication number
USRE41602E1
Authority
US
United States
Prior art keywords
data
voice
digital camera
text
routines
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
US11/392,923
Inventor
Viktors Berstis
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
MediaTek Inc
Original Assignee
MediaTek Inc
Application filed by MediaTek Inc
Priority to US11/392,923
Application granted
Publication of USRE41602E1
Anticipated expiration
Expired - Lifetime

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N9/00 Details of colour television systems
    • H04N9/79 Processing of colour television signals in connection with recording
    • H04N9/80 Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback
    • H04N9/82 Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback the individual colour picture signal components being recorded simultaneously only
    • H04N9/8205 Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback the individual colour picture signal components being recorded simultaneously only involving the multiplexing of an additional signal and the colour video signal
    • H04N9/8233 Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback the individual colour picture signal components being recorded simultaneously only involving the multiplexing of an additional signal and the colour video signal the additional signal being a character code signal
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/4223 Cameras
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/433 Content storage operation, e.g. storage operation in response to a pause request, caching operations
    • H04N21/4334 Recording operations
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/4402 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440236 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by media transcoding, e.g. video is transformed into a slideshow of still pictures, audio is converted into text
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81 Monomedia components thereof
    • H04N21/8146 Monomedia components thereof involving graphical data, e.g. 3D object, 2D graphics
    • H04N21/8153 Monomedia components thereof involving graphical data, e.g. 3D object, 2D graphics comprising still images, e.g. texture, background image
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/76 Television signal recording
    • H04N5/765 Interface circuits between an apparatus for recording and another apparatus
    • H04N5/77 Interface circuits between an apparatus for recording and another apparatus between a recording apparatus and a television camera
    • H04N5/772 Interface circuits between an apparatus for recording and another apparatus between a recording apparatus and a television camera the recording apparatus and the television camera being placed in the same enclosure
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N9/00 Details of colour television systems
    • H04N9/79 Processing of colour television signals in connection with recording
    • H04N9/80 Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback
    • H04N9/804 Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback involving pulse code modulation of the colour picture signal components
    • H04N9/8042 Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback involving pulse code modulation of the colour picture signal components involving data reduction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N9/00 Details of colour television systems
    • H04N9/79 Processing of colour television signals in connection with recording
    • H04N9/80 Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback
    • H04N9/804 Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback involving pulse code modulation of the colour picture signal components
    • H04N9/806 Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback involving pulse code modulation of the colour picture signal components with processing of the sound signal
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N1/00 Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N1/32 Circuits or arrangements for control or supervision between transmitter and receiver or between image input and image output device, e.g. between a still-image camera and its memory or between a still-image camera and a printer device
    • H04N1/32101 Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title
    • H04N1/32128 Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title attached to the image data, e.g. file header, transmitted message header, information on the same page or in the same computer file as the image
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N2101/00 Still video cameras
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N2201/00 Indexing scheme relating to scanning, transmission or reproduction of documents or the like, and to details thereof
    • H04N2201/0077 Types of the still picture apparatus
    • H04N2201/0084 Digital still camera
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N2201/00 Indexing scheme relating to scanning, transmission or reproduction of documents or the like, and to details thereof
    • H04N2201/32 Circuits or arrangements for control or supervision between transmitter and receiver or between image input and image output device, e.g. between a still-image camera and its memory or between a still-image camera and a printer device
    • H04N2201/3201 Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title
    • H04N2201/3261 Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title of multimedia information, e.g. a sound signal
    • H04N2201/3264 Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title of multimedia information, e.g. a sound signal of sound signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N2201/00 Indexing scheme relating to scanning, transmission or reproduction of documents or the like, and to details thereof
    • H04N2201/32 Circuits or arrangements for control or supervision between transmitter and receiver or between image input and image output device, e.g. between a still-image camera and its memory or between a still-image camera and a printer device
    • H04N2201/3201 Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title
    • H04N2201/3261 Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title of multimedia information, e.g. a sound signal
    • H04N2201/3266 Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title of multimedia information, e.g. a sound signal of text or character information, e.g. text accompanying an image
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N2201/00 Indexing scheme relating to scanning, transmission or reproduction of documents or the like, and to details thereof
    • H04N2201/32 Circuits or arrangements for control or supervision between transmitter and receiver or between image input and image output device, e.g. between a still-image camera and its memory or between a still-image camera and a printer device
    • H04N2201/3201 Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title
    • H04N2201/3274 Storage or retrieval of prestored additional information
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/76 Television signal recording
    • H04N5/78 Television signal recording using magnetic recording
    • H04N5/781 Television signal recording using magnetic recording on disks or drums
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/76 Television signal recording
    • H04N5/907 Television signal recording using static stores, e.g. storage tubes or semiconductor memories

Definitions

  • the present invention relates to electronic photography, and in particular to a digital camera that translates recorded voice annotations to text annotations when external power is provided.
  • Digital cameras have become popular for both professional and amateur photography. As digital cameras have become more popular, their sophistication has increased, allowing additional features. For example, some digital cameras allow the user to record voice annotations. However, when the pictures are printed, the voice annotations are lost, since recorded voice cannot be usefully displayed on a printed picture. A need arises for a way in which a voice annotation may be recorded when a picture is taken, but a text annotation is included with the picture when it is printed or transmitted.
  • the present invention is a digital camera which allows voice annotations to be recorded for each picture, but which includes text annotations with each such picture when the picture is transmitted from the camera.
  • the digital camera of the present invention includes an image sensing apparatus operable to receive light comprising an image and output image data representing the image, a first memory operable to store the image data, a sound sensing apparatus operable to receive a sound and output sound data representing the sound, wherein the sound is speech and the sound data is voice data, a second memory operable to store the voice data, a third memory operable to store text data; and a voice recognition apparatus operable to access the second memory, translate the stored voice data into text data and store the text data in the third memory, when the digital camera is provided with external power. Because the voice to text translation process is compute-intensive, and thus, power-consuming, the translation is deferred until external power is provided.
  • the present invention may further include an I/O adapter operable to access the first memory and the third memory and transmit the stored image data and the stored text data, when the digital camera is communicatively connected to an external device.
  • It is preferred that the image data represent a picture, the recorded voice data represent a voice annotation associated with the picture, and the text data is a text annotation associated with the picture.
  • the voice recognition apparatus includes a microprocessor operable to execute image capture routines, voice recording routines and voice recognition routines.
  • the microprocessor may be further operable to execute data transfer routines.
  • external power and communications connections are provided by a cradle assembly.
  • FIG. 1 shows a digital camera system 100 , according to the present invention.
  • FIG. 2 is an exemplary block diagram of a digital camera shown in FIG. 1 .
  • FIG. 3 is a flow diagram of a process of operation of the system shown in FIG. 1 .
  • FIG. 4 is an exemplary format of data stored in a memory shown in FIG. 2 .
  • FIG. 5 is another exemplary format of data stored in a memory shown in FIG. 2 .
  • A digital camera system 100, according to the present invention, is shown in FIG. 1.
  • System 100 includes digital camera 102 and cradle assembly 104 .
  • Cradle assembly 104 includes cradle 106 , which receives camera 102 , allowing attachment of the cradle to the camera.
  • Cradle assembly 104 includes power connector 108 and data connector 110 , which provide power and data connections to camera 102 during the recharging, data transfer and voice recognition processes.
  • Power is supplied to power connector 108 by power supply 112 via power cable 114 .
  • Power supply 112 may be a wall-mounted device, an automotive power adapter, or a battery-powered device.
  • Data may be transferred via data cable 116 , which connects to data connector 110 , and which provides communicative connection to an external device, such as a personal computer 119 , or to a communication device, such as wireless system 120 , cable modem 122 , asymmetric digital subscriber line (ADSL) modem 124 , local area network interface device 126 , integrated services digital network (ISDN) interface device 128 , or voice line modem 130 .
  • Wireless system 120 includes a modem and wireless transceiver communicatively connected to a wireless network. The recharging, data transfer and voice recognition processes are performed when the camera is returned to the cradle after pictures are taken and voice annotations are recorded.
  • In one embodiment, communication devices 120-130 provide direct access to destination computer system or server 132 over the Internet 134.
  • In another embodiment, communication devices 120-130 provide access to an intermediate system 136.
  • the intermediate system may be a server or other computer system and is used to improve the convenience and speed of data transfers from camera 102 .
  • cradle 106 may not be used. Rather, power connector 108 and data connector 110 may be directly attached to camera 102 .
  • the connectors may be attached separately or combined in a single assembly.
  • a digital camera 102 is shown in FIG. 2 .
  • Digital camera 102 includes an image sensing apparatus 201 , which receives light comprising an image and outputs digital image data representing the image.
  • Image sensing apparatus 201 typically includes a lens 202 , which focuses the image onto image sensor 204 .
  • Image sensor 204, which is typically a charge-coupled device (CCD) or a complementary metal oxide semiconductor (CMOS) device, outputs a signal representing the image to A/D converter 206, which converts it to digital image data by digitizing the signal, and outputs the digital image data to microprocessor 208.
  • Digital camera 102 also includes sound sensing apparatus 209, which receives sounds, such as speech, and outputs digital sound data representing the sound.
  • Microphone 210 senses sounds, typically spoken words, and outputs a signal representing the sensed sounds to A/D converter 212 , which digitizes the signal and outputs the digital sound data to microprocessor 208 .
  • Microprocessor 208 stores the digital image and sound data in memory 214 .
  • Memory 214 is typically semiconductor memory, such as RAM or flash memory. Memory 214 may be built-in to camera 102 or memory 214 may be removable and non-volatile, such as flash memory cards, or may also be disk storage, such as a floppy disk or other removable media drive, or a hard drive in or attached to digital camera 102 .
  • Digital camera 102 includes I/O adapter 216 , which includes connector 217 , for transferring data into or out of the camera via data connector 110 and data cable 116 .
  • Digital camera 102 also includes power supply 218 , which includes a battery, regulating and recharging circuitry and connector 219 . This allows digital camera 102 to be powered by power supply 112 via power cable 114 and power connector 108 .
  • Other well-known components, such as viewfinder, shutter switch, etc., are not shown.
  • Microprocessor 208 stores image data for each picture taken in image data block 220 in memory 214 .
  • the image data in block 220 is typically compressed to save memory space.
  • Microprocessor 208 stores the recorded voice (speech) data associated with each stored image in recorded voice data block 222 .
  • the recorded voice data is also compressed.
  • Text data associated with each stored image is also stored in memory 214 in recognized text annotation data block 223 .
  • the stored text data is generated by performing voice recognition on the recorded voice data, as described below.
  • any sound may be recorded and stored by digital camera 102 , not just speech.
  • the recorded sound will be stored in memory 214 in recorded voice data block 222 .
  • the recorded sound will be treated as recorded voice data and voice recognition will be attempted on the recorded sound. In this situation, voice recognition will fail, causing digital camera 102 to recognize that the recorded sound is not voice data.
  • the recorded sound will then be treated not as voice data, but simply as recorded sound data.
  • In one embodiment, the voice recognition is performed by voice recognition unit 224 using voice recognition data 225.
  • voice recognition is performed using a digital signal processor (DSP).
  • In another embodiment, voice recognition unit 224 is not used and voice recognition is performed by microprocessor 208 executing voice recognition routines 226, using voice recognition data 225. This embodiment does not provide real-time recognition, but saves the expense of voice recognition unit 224.
  • the output of the voice recognition process is text data, which is stored in recognized text annotation data block 223 .
  • Digital camera 102 also includes software routines which are executed by microprocessor 208 .
  • Image/voice capture routines 228 control the process of taking digital photographs, recording voice annotations and compressing and storing the data in image data block 220 and recorded voice data block 222.
  • Voice recognition routines 226 control the process of recognizing the voice annotations stored in recorded voice data block 222, generating text annotations and storing them in recognized text annotation data block 223.
  • Data transfer routines 230 control the process of transferring data from digital camera 102 .
  • Voice recognition data 225 is typically stored in RAM built-in to digital camera 102 . However, voice recognition data 225 may be stored in removable memory, so that the camera may be customized to recognize particular voices or languages.
  • Software routines 226 - 230 are typically stored in nonvolatile memory, such as ROM or flash memory.
  • Digital camera system 100 is operated as shown in FIG. 3 .
  • the camera is removed from cradle 106 .
  • the camera is used to take one or more pictures and to record one or more voice annotations.
  • Microprocessor 208 executes image/voice capture routines 228 in order to take each picture, compress the image data, and store the image data in image data block 220 in memory 214 .
  • microprocessor 208 executes image/voice capture routines 228 in order to record each voice annotation, compress the voice data, and store the voice data in recorded voice data block 222 in memory 214 .
  • Camera 102 may be used to take pictures and record voice annotations until the completion of a picture-taking session.
  • a picture-taking session may be completed because memory 214 has become full, because the battery charge has become low, or because the user has taken the desired pictures.
  • camera 102 is placed in cradle 106 , which causes attachment of both power connector 108 and data connector 110 to camera 102 . If cradle 106 is not used, then, at a minimum, power connector 108 must be attached to camera 102 . Typically, data connector 110 is also connected at this time, but that is not required.
  • Microprocessor 208 detects that camera 102 has been provided with external power. The detection may be accomplished by any well-known technique. For example, power supply circuitry 218 may detect the presence of external power on power connector 219 and signal microprocessor 208 . Other well-known techniques may also be used.
  • Upon detecting that camera 102 has been provided with external power, in step 308, microprocessor 208 executes voice recognition routines 226 in order to translate the stored voice annotations to text.
  • the details of the voice recognition routines depend upon the embodiment of digital camera.
  • In the embodiment that uses voice recognition unit 224, microprocessor 208 signals unit 224 to begin voice recognition.
  • Voice recognition unit 224 then translates the stored voice annotations to text using voice recognition data 225 and stores the recognized text in block 223 .
  • voice recognition unit 224 signals completion to microprocessor 208 .
  • In the embodiment in which voice recognition unit 224 is not used, voice recognition routines 226 include code that causes microprocessor 208 to itself perform the translation of the stored voice annotations to text using voice recognition data 225.
  • Microprocessor 208 also stores the recognized text in block 223.
  • microprocessor 208 transfers the stored image and text data to an attached device via data cable 116, if data connector 110 is attached to camera 102. If data connector 110 is not attached, camera 102 can store the image and text data for later transfer. Alternatively, if memory 214 is removable, the image and text data may be transferred by removing memory 214.
  • the attached device is typically a personal computer or workstation, but may be a local or wide-area network, a server, a mainframe or mini-computer, a communication device, etc.
  • Voice recognition annotation may be further enhanced by combination with information that modifies the associated annotation.
  • the modifying information may be specified by the user of the camera by manipulating a menu displayed by the camera or by speaking keywords that are recognized as such by the camera.
  • an annotation may be specified as being a description of the picture associated with the annotation, the name of the place depicted, the time the picture was taken, the names of persons depicted, etc.
  • the user may enter information specifying the name, address, e-mail address, etc. of a recipient for each picture or group of pictures.
  • the user may likewise enter different description, place, name, etc. information for each recipient of each picture or group of pictures.
  • An exemplary format of data stored in memory 214 is shown in FIG. 4.
  • the image data from each picture taken is stored as a block of image data.
  • For example, the image data from picture 1 is stored in block 402 and the image data from picture N is stored in block 404.
  • All blocks of image data 402 - 404 are stored contiguously.
  • the recorded voice data associated with each picture taken is stored as a block of recorded voice data.
  • For example, the recorded voice data from the voice annotation associated with picture 1 is stored in block 406 and the recorded voice data from the voice annotation associated with picture N is stored in block 408. All blocks of recorded voice data 406-408 are stored contiguously.
  • the translated text annotation data associated with each picture taken is stored as a block of text data.
  • For example, the translated text annotation data associated with picture 1 is stored in block 410 and the translated text annotation data associated with picture N is stored in block 412. All blocks of translated text annotation data 410-412 are stored contiguously.
  • Another exemplary format of data stored in memory 214 is shown in FIG. 5.
  • the image data from each picture, the recorded voice data associated with each picture and the translated text annotation data associated with each picture are each stored as blocks of data.
  • For example, the image data from picture 1 is stored as block 502, the recorded voice data associated with picture 1 is stored as block 504, and the translated text data associated with picture 1 is stored as block 506.
  • the image data from a picture is stored contiguously with the recorded voice data and the translated text data associated with the picture.
  • For example, blocks 502, 504 and 506, which are all associated with picture 1, are stored contiguously, and blocks 508, 510 and 512, which are all associated with picture N, are stored contiguously.
  • FIGS. 4 and 5 are only two examples of data storage formats that may be used. Any other format that maintains the association among the image data, the recorded voice data and the translated text data may be used as well. For example, a well-known file system may be used.
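
The two storage formats of FIGS. 4 and 5 can be contrasted with a minimal Python sketch. It is an illustration only, not the format of any actual camera: the names `Picture`, `layout_grouped_by_type`, `layout_grouped_by_picture` and `blocks_for_picture` are hypothetical, and each block is modeled as a simple (kind, picture number, payload) tuple rather than as bytes at a memory address.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical block model: (kind, picture number, payload).
Block = Tuple[str, int, bytes]
KINDS = ("image", "voice", "text")


@dataclass
class Picture:
    image: bytes  # compressed image data
    voice: bytes  # recorded voice annotation data
    text: bytes   # translated text annotation data


def layout_grouped_by_type(pictures: List[Picture]) -> List[Block]:
    """FIG. 4 style: all image blocks, then all voice blocks, then all text blocks."""
    blocks: List[Block] = []
    for kind in KINDS:
        for number, picture in enumerate(pictures, start=1):
            blocks.append((kind, number, getattr(picture, kind)))
    return blocks


def layout_grouped_by_picture(pictures: List[Picture]) -> List[Block]:
    """FIG. 5 style: image, voice and text blocks of one picture stored together."""
    blocks: List[Block] = []
    for number, picture in enumerate(pictures, start=1):
        for kind in KINDS:
            blocks.append((kind, number, getattr(picture, kind)))
    return blocks


def blocks_for_picture(blocks: List[Block], number: int) -> Dict[str, bytes]:
    """Recover the data associated with one picture from either layout."""
    return {kind: payload for kind, n, payload in blocks if n == number}
```

Both layouts carry the picture number with every block, so `blocks_for_picture` returns the same image, voice and text data whichever layout is used. That is the property the text requires of any alternative format: the association among the three kinds of data must be maintained.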

Abstract

A digital camera which allows voice annotations to be recorded for each picture, but which includes text annotations with each such picture when the picture is transmitted from the camera. The digital camera includes an image sensing apparatus operable to receive light comprising an image and output image data representing the image, a first memory operable to store the image data, a sound sensing apparatus operable to receive a sound and output sound data representing the sound, wherein the sound is speech and the sound data is voice data, a second memory operable to store the voice data, a third memory operable to store text data; and a voice recognition apparatus operable to access the second memory, translate the stored voice data into text data and store the text data in the third memory, when the digital camera is provided with external power. In one embodiment, the voice recognition apparatus includes a microprocessor operable to execute image capture routines, voice recording routines and voice recognition routines. The microprocessor may be further operable to execute data transfer routines.

Description

FIELD OF THE INVENTION
The present invention relates to electronic photography, and in particular to a digital camera that translates recorded voice annotations to text annotations when external power is provided.
BACKGROUND OF THE INVENTION
Digital cameras have become popular for both professional and amateur photography. As digital cameras have become more popular, their sophistication has increased, allowing additional features. For example, some digital cameras allow the user to record voice annotations. However, when the pictures are printed, the voice annotations are lost, since recorded voice cannot be usefully displayed on a printed picture. A need arises for a way in which a voice annotation may be recorded when a picture is taken, but a text annotation is included with the picture when it is printed or transmitted.
SUMMARY OF THE INVENTION
The present invention is a digital camera which allows voice annotations to be recorded for each picture, but which includes text annotations with each such picture when the picture is transmitted from the camera. The digital camera of the present invention includes an image sensing apparatus operable to receive light comprising an image and output image data representing the image, a first memory operable to store the image data, a sound sensing apparatus operable to receive a sound and output sound data representing the sound, wherein the sound is speech and the sound data is voice data, a second memory operable to store the voice data, a third memory operable to store text data; and a voice recognition apparatus operable to access the second memory, translate the stored voice data into text data and store the text data in the third memory, when the digital camera is provided with external power. Because the voice to text translation process is compute-intensive, and thus, power-consuming, the translation is deferred until external power is provided.
The present invention may further include an I/O adapter operable to access the first memory and the third memory and transmit the stored image data and the stored text data, when the digital camera is communicatively connected to an external device.
It is preferred that the image data represent a picture, the recorded voice data represent a voice annotation associated with the picture, and the text data is a text annotation associated with the picture.
In one embodiment, the voice recognition apparatus includes a microprocessor operable to execute image capture routines, voice recording routines and voice recognition routines. The microprocessor may be further operable to execute data transfer routines.
In one embodiment, external power and communications connections are provided by a cradle assembly.
BRIEF DESCRIPTION OF THE DRAWINGS
The details of the present invention, both as to its structure and operation, can best be understood by referring to the accompanying drawings, in which like reference numbers and designations refer to like elements.
FIG. 1 shows a digital camera system 100, according to the present invention.
FIG. 2 is an exemplary block diagram of a digital camera shown in FIG. 1.
FIG. 3 is a flow diagram of a process of operation of the system shown in FIG. 1.
FIG. 4 is an exemplary format of data stored in a memory shown in FIG. 2.
FIG. 5 is another exemplary format of data stored in a memory shown in FIG. 2.
DETAILED DESCRIPTION OF THE INVENTION
A digital camera system 100, according to the present invention, is shown in FIG. 1. System 100 includes digital camera 102 and cradle assembly 104. Cradle assembly 104 includes cradle 106, which receives camera 102, allowing attachment of the cradle to the camera. Cradle assembly 104 includes power connector 108 and data connector 110, which provide power and data connections to camera 102 during the recharging, data transfer and voice recognition processes. Power is supplied to power connector 108 by power supply 112 via power cable 114. Power supply 112 may be a wall-mounted device, an automotive power adapter, or a battery-powered device. Data may be transferred via data cable 116, which connects to data connector 110, and which provides communicative connection to an external device, such as a personal computer 119, or to a communication device, such as wireless system 120, cable modem 122, asymmetric digital subscriber line (ADSL) modem 124, local area network interface device 126, integrated services digital network (ISDN) interface device 128, or voice line modem 130. Wireless system 120 includes a modem and wireless transceiver communicatively connected to a wireless network. The recharging, data transfer and voice recognition processes are performed when the camera is returned to the cradle after pictures are taken and voice annotations are recorded.
In one embodiment, communication devices 120-130 provide direct access to destination computer system or server 132 over the Internet 134. In another embodiment, communication devices 120-130 provide access to an intermediate system 136. The intermediate system may be a server or other computer system and is used to improve the convenience and speed of data transfers from camera 102.
Alternatively, cradle 106 may not be used. Rather, power connector 108 and data connector 110 may be directly attached to camera 102. The connectors may be attached separately or combined in a single assembly.
A digital camera 102, according to the present invention, is shown in FIG. 2. Digital camera 102 includes an image sensing apparatus 201, which receives light comprising an image and outputs digital image data representing the image. Image sensing apparatus 201 typically includes a lens 202, which focuses the image onto image sensor 204. Image sensor 204, which is typically a charge-coupled device (CCD) or a complementary metal oxide semiconductor (CMOS) device, outputs a signal representing the image to A/D converter 206, which converts it to digital image data by digitizing the signal, and outputs the digital image data to microprocessor 208. Digital camera 102 also includes sound sensing apparatus 209, which receives sounds, such as speech, and outputs digital sound data representing the sound. Microphone 210 senses sounds, typically spoken words, and outputs a signal representing the sensed sounds to A/D converter 212, which digitizes the signal and outputs the digital sound data to microprocessor 208. Microprocessor 208 stores the digital image and sound data in memory 214. Memory 214 is typically semiconductor memory, such as RAM or flash memory. Memory 214 may be built into camera 102, or it may be removable and non-volatile, such as a flash memory card; it may also be disk storage, such as a floppy disk or other removable media drive, or a hard drive in or attached to digital camera 102.
Digital camera 102 includes I/O adapter 216, which includes connector 217, for transferring data into or out of the camera via data connector 110 and data cable 116. Digital camera 102 also includes power supply 218, which includes a battery, regulating and recharging circuitry and connector 219. This allows digital camera 102 to be powered by power supply 112 via power cable 114 and power connector 108. Other well-known components, such as viewfinder, shutter switch, etc., are not shown.
Microprocessor 208 stores image data for each picture taken in image data block 220 in memory 214. The image data in block 220 is typically compressed to save memory space. Microprocessor 208 stores the recorded voice (speech) data associated with each stored image in recorded voice data block 222. Typically, the recorded voice data is also compressed. Text data associated with each stored image is also stored in memory 214 in recognized text annotation data block 223. The stored text data is generated by performing voice recognition on the recorded voice data, as described below.
It will be seen that any sound may be recorded and stored by digital camera 102, not just speech. The recorded sound will be stored in memory 214 in recorded voice data block 222. The recorded sound will be treated as recorded voice data and voice recognition will be attempted on the recorded sound. In this situation, voice recognition will fail, causing digital camera 102 to recognize that the recorded sound is not voice data. The recorded sound will then be treated not as voice data, but simply as recorded sound data.
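The fallback described above can be sketched as follows. This is a minimal illustration in Python, not the camera's firmware: the `Annotation` record, its field names and the `recognize` callable are hypothetical, and a recognizer that returns `None` stands in for a failed recognition attempt.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Annotation:
    """One recording from the recorded voice data block (hypothetical layout)."""
    samples: bytes
    kind: str = "voice"          # every recording is first assumed to be speech
    text: Optional[str] = None   # filled in only when recognition succeeds


def classify_recording(rec: Annotation,
                       recognize: Callable[[bytes], Optional[str]]) -> Annotation:
    """Attempt recognition; on failure, reclassify the recording as plain sound."""
    result = recognize(rec.samples)
    if result is None:
        # Recognition failed: keep the recording, but no longer treat it as voice.
        rec.kind = "sound"
    else:
        rec.text = result
    return rec
```

In this sketch the recording itself is never discarded; only its classification changes, which matches the statement that the sound is then treated simply as recorded sound data.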
In one embodiment, the voice recognition is performed by voice recognition unit 224 using voice recognition data 225. Typically, voice recognition is performed using a digital signal processor (DSP). Use of a DSP allows real-time or near-real time recognition, at significant expense. However, real-time voice recognition is not necessary in the present invention, since recognition is not performed until the camera has been returned to the cradle. Thus, in another embodiment of the present invention, voice recognition unit 224 is not used and voice recognition is performed by microprocessor 208 executing voice recognition routines 226, using voice recognition data 225. This embodiment does not provide real-time recognition, but saves the expense of voice recognition unit 224.
The output of the voice recognition process is text data, which is stored in recognized text annotation data block 223.
Digital camera 102 also includes software routines which are executed by microprocessor 208. Image/voice capture routines 228 control the process of taking digital photographs, recording voice annotations and compressing and storing the data in image data block 220 and recorded voice data block 222. Voice recognition routines 226 control the process of recognizing the voice annotations stored in recorded voice data block 222, generating text annotations and storing them in recognized text annotation data block 223. Data transfer routines 230 control the process of transferring data from digital camera 102.
Voice recognition data 225 is typically stored in RAM built-in to digital camera 102. However, voice recognition data 225 may be stored in removable memory, so that the camera may be customized to recognize particular voices or languages. Software routines 226-230 are typically stored in nonvolatile memory, such as ROM or flash memory.
Digital camera system 100 is operated as shown in FIG. 3. In step 302, the camera is removed from cradle 106. In step 304, the camera is used to take one or more pictures and to record one or more voice annotations. Microprocessor 208 executes image/voice capture routines 228 in order to take each picture, compress the image data, and store the image data in image data block 220 in memory 214. Likewise, microprocessor 208 executes image/voice capture routines 228 in order to record each voice annotation, compress the voice data, and store the voice data in recorded voice data block 222 in memory 214.
Camera 102 may be used to take pictures and record voice annotations until the completion of a picture-taking session. A picture-taking session may be completed because memory 214 has become full, because the battery charge has become low, or because the user has taken the desired pictures. At the completion of the session, in step 306, camera 102 is placed in cradle 106, which causes attachment of both power connector 108 and data connector 110 to camera 102. If cradle 106 is not used, then, at a minimum, power connector 108 must be attached to camera 102. Typically, data connector 110 is also connected at this time, but that is not required.
Microprocessor 208 detects that camera 102 has been provided with external power. The detection may be accomplished by any well-known technique. For example, power supply circuitry 218 may detect the presence of external power on power connector 219 and signal microprocessor 208. Other well-known techniques may also be used.
Upon detecting that camera 102 has been provided with external power, in step 308, microprocessor 208 executes voice recognition routines 226 in order to translate the stored voice annotations to text. The details of the voice recognition routines depend upon the embodiment of digital camera. In an embodiment that includes voice recognition unit 224, microprocessor 208 signals unit 224 to begin voice recognition. Voice recognition unit 224 then translates the stored voice annotations to text using voice recognition data 225 and stores the recognized text in block 223. When voice recognition is completed, voice recognition unit 224 signals completion to microprocessor 208.
In an embodiment that does not include voice recognition unit 224, voice recognition routines 226 include code that causes microprocessor 208 to itself perform the translation of the stored voice annotations to text using voice recognition data 225. Microprocessor 208 also stores the recognized text in block 223.
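The deferred-recognition flow of step 308 can be sketched as follows, covering both embodiments. This is an illustrative Python model under stated assumptions, not an implementation from the specification: the `Camera` class, its method names and the dictionary-based blocks are hypothetical, and the recognizers are plain callables standing in for voice recognition unit 224 and voice recognition routines 226.

```python
from typing import Callable, Dict, Optional

Recognizer = Callable[[bytes], str]


class Camera:
    """Toy model of the deferred-recognition flow (all names are hypothetical)."""

    def __init__(self, software_recognizer: Recognizer,
                 recognition_unit: Optional[Recognizer] = None) -> None:
        self.voice_block: Dict[int, bytes] = {}   # picture number -> recorded voice data
        self.text_block: Dict[int, str] = {}      # picture number -> recognized text
        self._software = software_recognizer      # stands in for routines 226
        self._unit = recognition_unit             # stands in for unit 224, if fitted

    def record_annotation(self, picture: int, voice: bytes) -> None:
        # On battery power the voice data is only stored, never translated.
        self.voice_block[picture] = voice

    def on_external_power(self) -> int:
        """Run when external power is detected; returns how many annotations were translated."""
        recognizer = self._unit if self._unit is not None else self._software
        translated = 0
        for picture, voice in self.voice_block.items():
            if picture not in self.text_block:     # skip annotations already translated
                self.text_block[picture] = recognizer(voice)
                translated += 1
        return translated
```

The point of the sketch is the trigger: nothing in `record_annotation` touches the recognizer, so the compute-intensive work happens only inside `on_external_power`.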
When voice recognition is completed, in step 310, microprocessor 208 transfers the stored image and text data to an attached device via data cable 116, if data connector 110 is attached to camera 102. If data connector 110 is not attached, camera 102 can store the image and text data for later transfer. Alternatively, if memory 214 is removable, the image and text data may be transferred by removing memory 214. The attached device is typically a personal computer or workstation, but may be a local or wide-area network, a server, a mainframe or mini-computer, a communication device, etc.
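The transfer decision of step 310 can be sketched as follows. This is a minimal Python illustration; the function name, the `Record` tuple and the `send` callable are hypothetical, and the sketch leaves the stored data untouched after sending because the text does not say whether memory is cleared.

```python
from typing import Callable, Dict, List, Tuple

Record = Tuple[bytes, str]  # (image data, text annotation)


def transfer_or_hold(records: Dict[int, Record],
                     data_connector_attached: bool,
                     send: Callable[[int, bytes, str], None]) -> List[int]:
    """Send every stored picture if a data connection exists; otherwise hold them.

    Returns the picture numbers that were sent, in ascending order.
    """
    if not data_connector_attached:
        return []                      # data stays in memory for a later transfer
    sent: List[int] = []
    for picture in sorted(records):
        image, text = records[picture]
        send(picture, image, text)     # image and text travel together
        sent.append(picture)
    return sent
```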
Voice recognition annotation may be further enhanced by combination with information that modifies the associated annotation. The modifying information may be specified by the user of the camera by manipulating a menu displayed by the camera or by speaking keywords that are recognized as such by the camera. For example, an annotation may be specified as being a description of the picture associated with the annotation, the name of the place depicted, the time the picture was taken, the names of persons depicted, etc. The user may enter information specifying the name, address, e-mail address, etc. of a recipient for each picture or group of pictures. The user may likewise enter different description, place, name, etc. information for each recipient of each picture or group of pictures.
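One way the spoken-keyword variant might work is sketched below in Python. The keyword set and the rule that the keyword is the first spoken word are assumptions made for illustration; the text names the annotation categories but does not specify the keywords or where they occur.

```python
from typing import Tuple

# Hypothetical keyword set: the text lists these categories, not the spoken words.
KEYWORDS = {"description", "place", "time", "names"}


def tag_annotation(recognized_text: str, default: str = "description") -> Tuple[str, str]:
    """Split a leading spoken keyword from the annotation text.

    Returns (category, annotation). If the first word is not a known keyword,
    or nothing follows it, the whole text is kept under the default category.
    """
    cleaned = recognized_text.strip()
    first, _, rest = cleaned.partition(" ")
    if first.lower() in KEYWORDS and rest.strip():
        return first.lower(), rest.strip()
    return default, cleaned
```

For example, `tag_annotation("Place Grand Canyon")` yields `("place", "Grand Canyon")`, while text with no leading keyword is kept whole as a description.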
An exemplary format of data stored in memory 214 is shown in FIG. 4. In this example, the image data from each picture taken is stored as a block of image data. For example, the image data from picture 1 is stored in block 402, and the image data from picture N is stored in block 404. All blocks of image data 402-404 are stored contiguously. The recorded voice data associated with each picture taken is stored as a block of recorded voice data. For example, the recorded voice data from the voice annotation associated with picture 1 is stored in block 406, and the recorded voice data from the voice annotation associated with picture N is stored in block 408. All blocks of recorded voice data 406-408 are stored contiguously. The translated text annotation data associated with each picture taken is stored as a block of text data. For example, the translated text annotation data associated with picture 1 is stored in block 410, and the translated text annotation data associated with picture N is stored in block 412. All blocks of translated text annotation data 410-412 are stored contiguously.
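The FIG. 4 arrangement can be sketched as follows. This Python illustration is an assumption-laden model rather than the stored format itself: the function name and the `(offset, length)` index are hypothetical, since the text does not specify how block boundaries are recorded.

```python
from typing import List, Tuple

Picture = Tuple[bytes, bytes, bytes]  # (image data, recorded voice data, text data)


def pack_by_type(pictures: List[Picture]) -> Tuple[bytes, List[Tuple[int, int]]]:
    """Lay out memory as [all image blocks][all voice blocks][all text blocks].

    Returns the packed bytes and an index of (offset, length) pairs ordered
    image 1..N, then voice 1..N, then text 1..N.
    """
    memory = bytearray()
    index: List[Tuple[int, int]] = []
    for field in range(3):                 # 0 = image, 1 = voice, 2 = text
        for picture in pictures:
            block = picture[field]
            index.append((len(memory), len(block)))
            memory += block
    return bytes(memory), index
```

With N pictures, the voice block for picture k (counting from 1) is index entry N + k - 1 in this sketch, which is how the association between a picture and its annotation survives the grouping by type.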
Another exemplary format of data stored in memory 214 is shown in FIG. 5. As in FIG. 4, the image data from each picture, the recorded voice data associated with each picture and the translated text annotation data associated with each picture are each stored as blocks of data. For example, the image data from picture 1 is stored as block 502, the recorded voice data associated with picture 1 is stored as block 504 and the translated text data associated with picture 1 is stored as block 506. However, in this example, the image data from a picture is stored contiguously with the recorded voice data and the translated text data associated with the picture. Thus, blocks 502, 504 and 506, which are all associated with picture 1, are stored contiguously. Likewise, blocks 508, 510 and 512, which are all associated with picture N, are stored contiguously.
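The per-picture arrangement of FIG. 5 can be sketched as follows. As before, this is a hypothetical Python model: the function name and the `(offset, length)` index are assumptions, because the text does not say how block boundaries are recorded.

```python
from typing import List, Tuple

Picture = Tuple[bytes, bytes, bytes]   # (image data, recorded voice data, text data)
Span = Tuple[int, int]                 # (offset, length)


def pack_by_picture(pictures: List[Picture]) -> Tuple[bytes, List[Tuple[Span, Span, Span]]]:
    """Lay out memory as [image 1][voice 1][text 1] ... [image N][voice N][text N].

    Returns the packed bytes and, for each picture, the spans of its three blocks.
    """
    memory = bytearray()
    index: List[Tuple[Span, Span, Span]] = []
    for image, voice, text in pictures:
        spans = []
        for block in (image, voice, text):
            spans.append((len(memory), len(block)))
            memory += block
        index.append((spans[0], spans[1], spans[2]))
    return bytes(memory), index
```

Here everything belonging to one picture occupies a single contiguous run, so a picture and its annotations can be read or transmitted with one sequential pass.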
FIGS. 4 and 5 are only two examples of data storage formats that may be used. Any other format that maintains the association among the image data, the recorded voice data and the translated text data may be used as well. For example, a well-known file system may be used.
Although specific embodiments of the present invention have been described, it will be understood by those of skill in the art that there are other embodiments that are equivalent to the described embodiments. Accordingly, it is to be understood that the invention is not to be limited by the specific illustrated embodiments, but only by the scope of the appended claims.

Claims (27)

1. A digital camera comprising:
an image sensing apparatus operable to receive light comprising an image and output digital image data representing the image as a picture;
a digital memory including first, second, third, and fourth storage areas within the memory;
digital image data stored in the first storage area of the digital memory;
a sound sensing apparatus operable to receive a sound and output sound data representing the sound, wherein the sound is speech and the sound data is voice data;
voice data stored in the second storage area of the digital memory;
text data stored in the third storage area of the digital memory;
a voice recognition apparatus operable to access the second storage area, translate the stored voice data into text data and store the text data in the third storage area, when the digital camera is provided with external power; and
image, voice and text data of a picture stored in contiguous locations in the fourth storage area of the digital memory.
2. The digital camera of claim 1, further comprising
an I/O adapter operable to access the first memory and the third memory and transmit the stored image data and the stored text data, when the digital camera is communicatively connected to an external device.
3. The digital camera of claim 1, wherein the image data represents a picture, the voice data represents a voice annotation associated with the picture, and the text data is a text annotation associated with the picture.
4. The digital camera of claim 3, further comprising information that modifies the text annotation.
5. The digital camera of claim 1, further comprising:
a microprocessor within the camera programmed to perform image capture routines, voice recording routines, voice recognition routines and text routines within the microprocessor.
6. The digital camera of claim 5, wherein the microprocessor is further operable to execute data transfer routines.
7. The digital camera of claim 1, wherein external power and communications connections are provided by a cradle assembly for recharging, initiating voice recognition processes and connections to external networks and systems.
8. A method of operating a digital camera comprising the steps of:
receiving light comprising an image and outputting digital image data representing the image;
storing the image data as a picture in a first storage area of a digital memory;
receiving a sound and outputting sound data representing the sound, wherein the sound is speech and the sound data is voice data;
storing the voice data in a second storage area of the digital memory;
translating the stored voice data into text data, when the digital camera is supplied with external power;
storing the text data in a third storage area of the digital memory; and
storing the image, voice and text data of each picture in contiguous locations in a fourth storage area of the digital memory.
9. The method of claim 8, further comprising the step of:
transmitting the stored image data and the stored text data, when the digital camera is communicatively connected to an external device.
10. The method of claim 8, wherein the image data represents a picture, the voice data represents a voice annotation associated with the picture, and the text data is a text annotation associated with the picture.
11. The method of claim 10, further comprising information that modifies the text annotation.
12. The method of claim 8 further comprising:
performing in a microprocessor within the camera image capture routines, voice recording routines, voice recognition routines and text routines programmed within the microprocessor.
13. The method of claim 12, wherein the microprocessor is further operable to execute data transfer routines.
14. The method of claim 8, further comprising the step of:
providing external power and communications connections with a cradle assembly for recharging, initiating voice recognition processes and connections to external networks and systems.
15. A digital camera comprising:
means for receiving light comprising an image and outputting digital image data representing the image as a picture;
a digital memory having first, second, third and fourth storage areas within the digital memory;
means for storing the image data in the first storage area of the digital memory;
means for receiving a sound and outputting sound data representing the sound, wherein the sound is speech and the sound data is voice data;
means for storing the voice data in the second storage area of the digital memory;
means for translating the stored recorded voice data into text data, when the digital camera is supplied with external power;
means for storing text data in the third storage area of the digital memory; and
means for storing image, voice and text data of each picture in contiguous locations in the fourth storage area of the digital memory.
16. The digital camera of claim 15, further comprising:
means for transmitting the stored image data and the stored text data, when the digital camera is communicatively connected to an external device.
17. The digital camera of claim 15, wherein the image data represents a picture, the voice data represents a voice annotation associated with the picture, and the text data is a text annotation associated with the picture.
18. The digital camera of claim 17, further comprising information that modifies the text annotation.
19. The digital camera of claim 15 comprising:
a microprocessor within the camera programmed to perform image capture routines, voice recording routines, voice recognition routines and text routines within the microprocessor.
20. The digital camera of claim 19, wherein the microprocessor is further operable to execute data transfer routines.
21. The digital camera of claim 15, further comprising:
means for providing external power and communications for recharging, initiating voice recognition processes and connections to external networks and systems.
22. The digital camera of claim 1, wherein the voice recognition apparatus is operable to access the second storage area, translate the stored voice data into text data and store the text data in the third storage area when the digital camera is provided with external power.
23. The digital camera of claim 5, further comprising a ROM or flash memory for storing the image capture routines, voice recording routines, and text routines.
24. The method of claim 8, wherein the stored voice data is translated into text data when the digital camera is supplied with external power.
25. The method of claim 12, further comprising storing the image capture routines, voice recording routines, and text routines in a ROM or flash memory.
26. The digital camera of claim 15, wherein the means for translating translates the stored voice data into text data when the digital camera is provided with external power.
27. The digital camera of claim 19, further comprising a means for storing the image capture routines, voice recording routines, and text routines.

Priority Applications (1)

Application Number Priority Date Filing Date Title
US11/392,923 USRE41602E1 (en) 1998-12-16 2006-03-28 Digital camera with voice recognition annotation

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/213,313 US6721001B1 (en) 1998-12-16 1998-12-16 Digital camera with voice recognition annotation
US11/392,923 USRE41602E1 (en) 1998-12-16 2006-03-28 Digital camera with voice recognition annotation

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US09/213,313 Reissue US6721001B1 (en) 1998-12-16 1998-12-16 Digital camera with voice recognition annotation

Publications (1)

Publication Number Publication Date
USRE41602E1 (en) 2010-08-31

Family

ID=22794600

Family Applications (2)

Application Number Title Priority Date Filing Date
US09/213,313 Ceased US6721001B1 (en) 1998-12-16 1998-12-16 Digital camera with voice recognition annotation
US11/392,923 Expired - Lifetime USRE41602E1 (en) 1998-12-16 2006-03-28 Digital camera with voice recognition annotation

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US09/213,313 Ceased US6721001B1 (en) 1998-12-16 1998-12-16 Digital camera with voice recognition annotation

Country Status (2)

Country Link
US (2) US6721001B1 (en)
JP (1) JP3272336B2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8462231B2 (en) 2011-03-14 2013-06-11 Mark E. Nusbaum Digital camera with real-time picture identification functionality

Families Citing this family (76)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7016595B1 (en) * 1999-05-28 2006-03-21 Nikon Corporation Television set capable of controlling external device and image storage controlled by television set
JP3777922B2 (en) * 1999-12-09 2006-05-24 コニカミノルタフォトイメージング株式会社 Digital imaging apparatus, image processing system including the same, image processing apparatus, digital imaging method, and recording medium
US6542295B2 (en) * 2000-01-26 2003-04-01 Donald R. M. Boys Trinocular field glasses with digital photograph capability and integrated focus function
US8345105B2 (en) * 2000-03-06 2013-01-01 Sony Corporation System and method for accessing and utilizing ancillary data with an electronic camera device
JP2001333378A (en) * 2000-03-13 2001-11-30 Fuji Photo Film Co Ltd Image processor and printer
JP4124402B2 (en) * 2000-03-31 2008-07-23 株式会社リコー Image input device
US6965403B2 (en) * 2000-10-16 2005-11-15 Canon Kabushiki Kaisha External storage device for image pickup apparatus, control method therefor, image pickup apparatus and control method therefor
US7032182B2 (en) * 2000-12-20 2006-04-18 Eastman Kodak Company Graphical user interface adapted to allow scene content annotation of groups of pictures in a picture database to promote efficient database browsing
JP4434502B2 (en) * 2001-01-19 2010-03-17 富士フイルム株式会社 Digital camera
JP2002305677A (en) * 2001-04-06 2002-10-18 Sony Corp Digital camera
US7656426B2 (en) * 2001-04-06 2010-02-02 Sony Corporation Digital camera and data transfer method from a record medium
JP2002359761A (en) * 2001-05-31 2002-12-13 Asahi Optical Co Ltd Cradle for digital camera
US7075579B2 (en) * 2001-06-05 2006-07-11 Eastman Kodak Company Docking station assembly for transmitting digital files
JP4812190B2 (en) * 2001-06-20 2011-11-09 オリンパス株式会社 Image file device
US20040201681A1 (en) * 2001-06-21 2004-10-14 Jack Chen Multimedia data file producer combining image and sound information together in data file
US7158175B2 (en) * 2001-11-30 2007-01-02 Eastman Kodak Company System including a digital camera and a docking unit for coupling to the internet
GB2383247A (en) * 2001-12-13 2003-06-18 Hewlett Packard Co Multi-modal picture allowing verbal interaction between a user and the picture
GB0129787D0 (en) * 2001-12-13 2002-01-30 Hewlett Packard Co Method and system for collecting user-interest information regarding a picture
US20030133015A1 (en) * 2001-12-17 2003-07-17 Jackel Lawrence David Web-connected interactive digital camera
US20030204403A1 (en) * 2002-04-25 2003-10-30 Browning James Vernard Memory module with voice recognition system
US7398209B2 (en) * 2002-06-03 2008-07-08 Voicebox Technologies, Inc. Systems and methods for responding to natural language speech utterance
US8064650B2 (en) 2002-07-10 2011-11-22 Hewlett-Packard Development Company, L.P. File management of digital images using the names of people identified in the images
US7843495B2 (en) * 2002-07-10 2010-11-30 Hewlett-Packard Development Company, L.P. Face recognition in a digital imaging system accessing a database of people
US7693720B2 (en) 2002-07-15 2010-04-06 Voicebox Technologies, Inc. Mobile systems and methods for responding to natural language speech utterance
KR100458642B1 (en) * 2002-09-19 2004-12-03 삼성테크윈 주식회사 Method for managing data files within portable digital apparatus, utilizing representative voice
FR2844935B1 (en) * 2002-09-25 2005-01-28 Canon Kk TRANSCODING DIGITAL DATA
US20040085454A1 (en) * 2002-11-04 2004-05-06 Ming-Zhen Liao Digital camera capable of transforming the audio input to its picture immediately into a readable illustration and transmitting same
US7324943B2 (en) * 2003-10-02 2008-01-29 Matsushita Electric Industrial Co., Ltd. Voice tagging, voice annotation, and speech recognition for portable devices with optional post processing
US20050114131A1 (en) * 2003-11-24 2005-05-26 Kirill Stoimenov Apparatus and method for voice-tagging lexicon
JP4018678B2 (en) * 2004-08-13 2007-12-05 キヤノン株式会社 Data management method and apparatus
US20060092291A1 (en) * 2004-10-28 2006-05-04 Bodie Jeffrey C Digital imaging system
US7627638B1 (en) * 2004-12-20 2009-12-01 Google Inc. Verbal labels for electronic messages
JP4396511B2 (en) * 2004-12-20 2010-01-13 ソニー株式会社 Printing system
JP2006197115A (en) * 2005-01-12 2006-07-27 Fuji Photo Film Co Ltd Imaging device and image output device
FR2881910B1 (en) * 2005-02-09 2007-05-25 Eastman Kodak Co SHOOTING EQUIPMENT AND IMAGE TRANSMISSION METHOD BY LOCAL NETWORK
US7640160B2 (en) 2005-08-05 2009-12-29 Voicebox Technologies, Inc. Systems and methods for responding to natural language speech utterance
US7620549B2 (en) 2005-08-10 2009-11-17 Voicebox Technologies, Inc. System and method of supporting adaptive misrecognition in conversational speech
US7949529B2 (en) 2005-08-29 2011-05-24 Voicebox Technologies, Inc. Mobile systems and methods of supporting natural language human-machine interactions
EP1934971A4 (en) 2005-08-31 2010-10-27 Voicebox Technologies Inc Dynamic speech sharpening
US7529772B2 (en) * 2005-09-27 2009-05-05 Scenera Technologies, Llc Method and system for associating user comments to a scene captured by a digital imaging device
US7697827B2 (en) 2005-10-17 2010-04-13 Konicek Jeffrey C User-friendlier interfaces for a camera
US8467672B2 (en) * 2005-10-17 2013-06-18 Jeffrey C. Konicek Voice recognition and gaze-tracking for a camera
US20070250526A1 (en) * 2006-04-24 2007-10-25 Hanna Michael S Using speech to text functionality to create specific user generated content metadata for digital content files (eg images) during capture, review, and/or playback process
US8375283B2 (en) * 2006-06-20 2013-02-12 Nokia Corporation System, device, method, and computer program product for annotating media files
US8301995B2 (en) * 2006-06-22 2012-10-30 Csr Technology Inc. Labeling and sorting items of digital data by use of attached annotations
US8073681B2 (en) 2006-10-16 2011-12-06 Voicebox Technologies, Inc. System and method for a cooperative conversational voice user interface
US8396280B2 (en) * 2006-11-29 2013-03-12 Honeywell International Inc. Apparatus and method for inspecting assets in a processing or other environment
US7818176B2 (en) 2007-02-06 2010-10-19 Voicebox Technologies, Inc. System and method for selecting and presenting advertisements based on natural language processing of voice-based input
US8438214B2 (en) * 2007-02-23 2013-05-07 Nokia Corporation Method, electronic device, computer program product, system and apparatus for sharing a media object
US20090002497A1 (en) * 2007-06-29 2009-01-01 Davis Joel C Digital Camera Voice Over Feature
US8059882B2 (en) * 2007-07-02 2011-11-15 Honeywell International Inc. Apparatus and method for capturing information during asset inspections in a processing or other environment
JP5144424B2 (en) * 2007-10-25 2013-02-13 キヤノン株式会社 Imaging apparatus and information processing method
US20090107212A1 (en) * 2007-10-30 2009-04-30 Honeywell International Inc. Process field instrument with integrated sensor unit and related system and method
US8140335B2 (en) 2007-12-11 2012-03-20 Voicebox Technologies, Inc. System and method for providing a natural language voice user interface in an integrated voice navigation services environment
US8385588B2 (en) * 2007-12-11 2013-02-26 Eastman Kodak Company Recording audio metadata for stored images
CN101971262A (en) * 2007-12-21 2011-02-09 皇家飞利浦电子股份有限公司 Method and apparatus for playing pictures
US9305548B2 (en) 2008-05-27 2016-04-05 Voicebox Technologies Corporation System and method for an integrated, multi-modal, multi-device natural language voice services environment
US8589161B2 (en) * 2008-05-27 2013-11-19 Voicebox Technologies, Inc. System and method for an integrated, multi-modal, multi-device natural language voice services environment
US9383225B2 (en) * 2008-06-27 2016-07-05 Honeywell International Inc. Apparatus and method for reading gauges and other visual indicators in a process control system or other data collection system
US8941740B2 (en) * 2008-09-05 2015-01-27 Honeywell International Inc. Personnel field device for process control and other systems and related method
US8326637B2 (en) 2009-02-20 2012-12-04 Voicebox Technologies, Inc. System and method for processing multi-modal device interactions in a natural language voice services environment
US9171541B2 (en) 2009-11-10 2015-10-27 Voicebox Technologies Corporation System and method for hybrid processing in a natural language voice services environment
US9502025B2 (en) 2009-11-10 2016-11-22 Voicebox Technologies Corporation System and method for providing a natural language content dedication service
US9247306B2 (en) * 2012-05-21 2016-01-26 Intellectual Ventures Fund 83 Llc Forming a multimedia product using video chat
US20140078331A1 (en) * 2012-09-15 2014-03-20 Soundhound, Inc. Method and system for associating sound data with an image
CN104683683A (en) * 2013-11-29 2015-06-03 英业达科技有限公司 System for shooting images and method thereof
FR3014675A1 (en) * 2013-12-12 2015-06-19 Oreal Method for evaluating at least one clinical face sign
US9984457B2 (en) 2014-03-26 2018-05-29 Sectra Ab Automated grossing image synchronization and related viewers and workstations
US9898459B2 (en) 2014-09-16 2018-02-20 Voicebox Technologies Corporation Integration of domain information into state transitions of a finite state transducer for natural language processing
US9626703B2 (en) 2014-09-16 2017-04-18 Voicebox Technologies Corporation Voice commerce
CN107003999B (en) 2014-10-15 2020-08-21 声钰科技 System and method for subsequent response to a user's prior natural language input
US10431214B2 (en) 2014-11-26 2019-10-01 Voicebox Technologies Corporation System and method of determining a domain and/or an action related to a natural language input
US10614799B2 (en) 2014-11-26 2020-04-07 Voicebox Technologies Corporation System and method of providing intent predictions for an utterance prior to a system detection of an end of the utterance
US10331784B2 (en) 2016-07-29 2019-06-25 Voicebox Technologies Corporation System and method of disambiguating natural language processing requests
US10489633B2 (en) 2016-09-27 2019-11-26 Sectra Ab Viewers and related methods, systems and circuits with patch gallery user interfaces
CN113113043B (en) * 2021-04-09 2023-01-13 中国工商银行股份有限公司 Method and device for converting voice into image

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5546145A (en) * 1994-08-30 1996-08-13 Eastman Kodak Company Camera on-board voice recognition
US5602458A (en) * 1993-08-16 1997-02-11 Eastman Kodak Company Rechargeable camera having operational inhibit of a flash unit power storage circuit during recharging
US5692225A (en) * 1994-08-30 1997-11-25 Eastman Kodak Company Voice recognition of recorded messages for photographic printers
US5737491A (en) * 1996-06-28 1998-04-07 Eastman Kodak Company Electronic imaging system capable of image capture, local wireless transmission and voice recognition
US5940121A (en) * 1997-02-20 1999-08-17 Eastman Kodak Company Hybrid camera system with electronic album control
US6031526A (en) * 1996-08-08 2000-02-29 Apollo Camera, Llc Voice controlled medical text and image reporting system
US6084630A (en) * 1991-03-13 2000-07-04 Canon Kabushiki Kaisha Multimode and audio data compression
US6181883B1 (en) * 1997-06-20 2001-01-30 Picostar, Llc Dual purpose camera for VSC with conventional film and digital image capture modules
US6469738B1 (en) * 1997-02-26 2002-10-22 Sanyo Electric Co., Ltd. Frames allowable to be shot in a digital still camera

Cited By (1)

Publication number Priority date Publication date Assignee Title
US8462231B2 (en) 2011-03-14 2013-06-11 Mark E. Nusbaum Digital camera with real-time picture identification functionality

Also Published As

Publication number Publication date
US6721001B1 (en) 2004-04-13
JP2000184258A (en) 2000-06-30
JP3272336B2 (en) 2002-04-08

Similar Documents

Publication Publication Date Title
USRE41602E1 (en) Digital camera with voice recognition annotation
US5737491A (en) Electronic imaging system capable of image capture, local wireless transmission and voice recognition
US7831598B2 (en) Data recording and reproducing apparatus and method of generating metadata
US20070236583A1 (en) Automated creation of filenames for digital image files using speech-to-text conversion
JP2000232599A (en) Digital camera, operating method, recording medium read by computer, computer system and automatic digital photograph
US8462231B2 (en) Digital camera with real-time picture identification functionality
JP2000194533A (en) Voice command annotating method
JP2005518742A (en) Portable data storage device and image recording device connectable directly to computer USB port
CN111564157A (en) Conference record optimization method, device, equipment and storage medium
JP2006270263A (en) Photographing system
JP2005202651A (en) Information processing apparatus, information processing method, recording medium with program recorded thereon, and control program
JP2004147325A (en) System and method for associating information with captured image
US6804652B1 (en) Method and apparatus for adding captions to photographs
JP2005346259A (en) Information processing device and information processing method
JP2007199908A (en) Emoticon input apparatus
JP2001142760A (en) Semi-conductor storage device
JP2010147716A (en) Terminal, method and program for processing information
JP2003309798A (en) Electronic apparatus provided with imaging function, picture data output system, and picture data output method
JP4295540B2 (en) Audio recording method and apparatus, digital camera, and image reproduction method and apparatus
US7460738B1 (en) Systems, methods and devices for determining and assigning descriptive filenames to digital images
JP2006166434A (en) Portable data storage device and image recording device directly connectable to computer usb port
JPH09200668A (en) Image pickup device
JP2001045178A (en) Image transmission method, image transmission system, electronic camera and image transmitter
JP2003169243A (en) Cradle, and cradle system
JP2010193207A (en) Mobile information terminal, image information management method, and image information management program

Legal Events

Date Code Title Description
FPAY Fee payment Year of fee payment: 8
FPAY Fee payment Year of fee payment: 12