US20130298016A1 - Multi-cursor transcription editing - Google Patents

Multi-cursor transcription editing

Info

Publication number
US20130298016A1
US20130298016A1 (application US 13/934,527)
Authority
US
United States
Prior art keywords
text
audio
cursor
portions
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US13/934,527
Inventor
Benjamin Chigier
Edward A. Brody
Daniel Edward Chernin
Roger S. Zimmerman
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nuance Communications Inc
Original Assignee
Nuance Communications Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nuance Communications Inc filed Critical Nuance Communications Inc
Priority to US13/934,527
Publication of US20130298016A1
Assigned to ESCRIPTION, INC. reassignment ESCRIPTION, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: BRODY, EDWARD, CHERNIN, DANIEL, CHIGIER, BENJAMIN, ZIMMERMAN, ROGER S.
Assigned to NUANCE COMMUNICATIONS, INC. reassignment NUANCE COMMUNICATIONS, INC. MERGER (SEE DOCUMENT FOR DETAILS). Assignors: ESCRIPTION, INC.
Assigned to NUANCE COMMUNICATIONS, INC. reassignment NUANCE COMMUNICATIONS, INC. NUNC PRO TUNC ASSIGNMENT (SEE DOCUMENT FOR DETAILS). Assignors: ESCRIPTION, INC.

Classifications

    • G06F17/24
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/103Formatting, i.e. changing of presentation of documents
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/166Editing, e.g. inserting or deleting
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/221Announcement of recognition results

Definitions

  • the records include patient notes that reflect information that a doctor or other person adds to a patient record after a given diagnosis, patient interaction, lab test or the like.
  • dictation voice mailbox when a doctor completes an interaction with a patient, the doctor calls a dictation voice mailbox, and dictates the records of the interaction with the patient.
  • the voice mailbox is later accessed by a medical transcriptionist who listens to the audio and transcribes the audio into a text record.
  • the playback of the audio data from the voice mailbox may be controlled by the transcriptionist through a set of foot pedals that mimic the action of the “forward”, “play”, and “rewind” buttons on a tape player. Should a transcriptionist hear an unfamiliar word, the standard practice is to stop the audio playback and look up the word in a printed dictionary.
  • the medical transcriptionist's time is less costly for the hospital than the doctor's time, and the medical transcriptionist is typically much more familiar with the computerized record-keeping systems than the doctor is, so this system offers a significant overall cost saving to the hospital.
  • Expedient processing of a doctor's dictation is often desirable so that records can be passed between one part of a healthcare institution and another (such as from Radiology to Surgery), or so that records can be passed to another institution if the next step in a patient's care requires that the patient be moved to another facility.
  • accuracy of medical transcriptions is of paramount importance. A mistake in a medical transcription could mean the difference between life and death.
  • In transcribing doctors' orders for such procedures as chemotherapy and radiation therapy for cancer patients, an elaborate system of double-checking by separate people is standard to mitigate risk.
  • the invention provides a device for use by a transcriptionist in a transcription editing system for editing transcriptions dictated by speakers, the device including, in combination, a monitor configured to display visual text of transcribed dictations, an audio mechanism configured to cause playback of portions of an audio file associated with a dictation, and a cursor-control module coupled to the audio mechanism and to the monitor and configured to cause the monitor to display multiple cursors in the text.
  • the cursor-control module is configured to cause the monitor to display multiple cursors in the text that indicate different functionality.
  • the cursor-control module is configured to cause the monitor to display an audio cursor accentuating a portion of the text, the audio cursor accentuating different text as the audio file is played using the audio mechanism, and a text cursor indicative of a position in the text where editing commands will be implemented.
  • the audio cursor comprises at least one of a rectangular box surrounding text corresponding to a portion of the audio file, a rectangular box surrounding a line of text, a vertical line, an inverse-video portion of the monitor, and bolding of a portion of the text.
  • the cursor-control module is configured to determine where to cause the monitor to display the audio cursor by using a token-alignment file that associates portions of the audio file with portions of the text.
  • the cursor-control module is configured to move at least one of the audio cursor and the text cursor to a location of the other of the text cursor and the audio cursor, respectively.
  • the audio mechanism is configured to determine and play a portion of the audio file corresponding to text at the location of the audio cursor when the audio cursor is moved to the location of the text cursor.
  • the device further includes a change-recording apparatus configured to record changes made to the text and associate the changes with portions of the audio file whereby the recorded changes can be used to adapt speech recognition apparatus in accordance with the changed text and the associated portions of the audio file.
  • the invention provides a computer program product residing on a computer-readable medium and including computer-readable instructions for causing a computer to display visual text of transcribed dictations, cause playback of portions of an audio file associated with a dictation, and cause the monitor to display multiple cursors in the text.
  • Implementations of the invention may include one or more of the following features.
  • the instructions are configured to cause the monitor to display an audio cursor accentuating a portion of the text with the audio cursor accentuating different text as the audio file is played, and a text cursor indicative of a position in the text where editing commands will be implemented.
  • the cursor-control module is configured to determine where to cause the monitor to display the audio cursor by using a token-alignment file that associates portions of the audio file with portions of the text.
  • the computer program product further includes instructions for causing the computer to move at least one of the audio cursor and the text cursor to a location of the other of the text cursor and the audio cursor, respectively.
  • the computer program product further includes instructions for causing the computer to determine and cause playing of a portion of the audio file corresponding to text at the location of the audio cursor when the audio cursor is moved to the location of the text cursor.
  • the computer program product further includes instructions for causing the computer to record changes made to the text and associate the changes with portions of the audio file whereby the recorded changes can be used to adapt speech recognition apparatus in accordance with the changed text and the associated portions of the audio file.
  • the invention provides a method of processing text transcribed from an audio file, the method including displaying text of a transcribed dictation on a monitor, playing portions of an audio file associated with the dictation, displaying an audio cursor in the text on the monitor, the audio cursor accentuating a portion of the text with the audio cursor accentuating different text as the audio file is played, and displaying a text cursor in the text on the monitor, the text cursor being indicative of a position in the text where editing commands will be implemented.
  • Implementations of the invention may include one or more of the following features.
  • the method further includes using a token-alignment file that associates portions of the audio file with portions of the text to determine where to display the audio cursor.
  • the method further includes moving at least one of the audio cursor and the text cursor to a location of the other of the text cursor and the audio cursor, respectively, in response to receiving a corresponding command.
  • the method further includes playing of a portion of the audio file corresponding to text at the location of the audio cursor if the audio cursor is moved to the location of the text cursor.
  • the method further includes recording changes made to the text, and associating the changes with portions of the audio file.
  • the method further includes using the recorded changes to adapt speech recognition apparatus in accordance with the changed text and the associated portions of the audio file.
  • the invention provides a method of processing a recorded dictation, the method including analyzing the recorded dictation in accordance with speech models to convert the recorded dictation to a draft text, storing the draft text, and producing and recording a token-alignment file that associates portions of the draft text with portions of the audio file, the token-alignment file including tokens at least some of which are indicative of portions of the draft text, the tokens indicating beginnings and ends of portions of the recorded dictation associated with the portions of the draft text such that the portions of the recorded dictation are associated with corresponding portions of the draft text even if the corresponding portions of the draft text, if spoken, do not correspond identically to the corresponding portions of the recorded dictation.
  • Implementations of the invention may include one or more of the following features.
  • Producing and recording the token-alignment file includes producing and recording tokens for which there is no corresponding draft text.
  • the method further includes receiving a revised text associated with the recorded dictation, and using indicia of differences between the revised text and the draft text and the associated recorded dictation to modify the speech models for converting other recorded dictations to other draft texts.
  • Various aspects of the invention may provide one or more of the following capabilities.
  • the cost of medical transcription can be reduced and/or the accuracy of medical transcription increased.
  • the expediency and turn-around time of medical transcription can be improved.
  • Editing of transcriptions can be performed faster than with previous techniques.
  • Transcribed text can be edited during playback of transcribed audio. Text other than that associated with audio currently being played can be edited without stopping playback of audio associated with a text document.
  • Transcribed text can be selected and its corresponding audio played, e.g., regardless of a current portion of audio being played or having last been played.
  • Transcriptionist productivity can be improved. Transcriptionist fatigue can be reduced.
  • FIG. 1 is a simplified diagram of a system for transcribing dictations and editing corresponding transcriptions.
  • FIG. 2 is a simplified block diagram of an editing device of the system shown in FIG. 1 .
  • FIGS. 3-5 are portions of a transcribed document showing exemplary embodiments of audio and text cursors.
  • FIG. 6 is a block flow diagram of a process of producing and editing a transcription.
  • FIG. 7 is a block flow diagram of a process of reviewing a draft transcribed document.
  • FIG. 8 is a block flow diagram of a process of editing the draft transcribed document.
  • Embodiments of the invention can provide multiple cursors for use in editing text documents, each of which is associated with a digital audio signal of speech to be transcribed.
  • An audio cursor is provided that highlights text associated with corresponding audio being played.
  • the audio cursor tracks the audio signal to help the transcriptionist follow along visually with the text as the associated audio plays.
  • a text cursor can be manipulated independently of the audio cursor by a transcriptionist.
  • the text cursor indicates the location of editing to the transcribed text, e.g., through a keyboard.
  • the text cursor can be positioned and edits made to the text, and/or the audio cursor can be made to coincide with the text cursor so that the corresponding audio is played.
  • a transcriptionist can process multi-modal inputs and reduce the amount of time the transcriptionist would use to review and revise draft documents using previous techniques. Other embodiments are within the scope of the invention.
  • a system 10 for transcribing audio and editing transcribed audio includes a speaker/person 12 , a communications network 14 , a voice mailbox system 16 , an administrative console 18 , an editing device 20 , a communications network 22 , a database server 24 , a communications network 26 , and an automatic transcription device 30 .
  • the network 14 is preferably a public switched telephone network (PSTN) although other networks, including packet-switched networks could be used, e.g., if the speaker 12 uses an Internet phone for dictation.
  • the network 22 is preferably a packet-switched network such as the global packet-switched network known as the Internet.
  • the network 26 is preferably a packet-switched, local area network (LAN). Other types of networks may be used, however, for the networks 14 , 22 , 26 , or any or all of the networks 14 , 22 , 26 may be eliminated, e.g., if items shown in FIG. 1 are combined or eliminated.
  • the voice mailbox system 16 , the administrative console 18 , and the editing device 20 are situated “off site” from the database server 24 and the automatic transcription device 30 .
  • These systems/devices 16 , 18 , 20 could be located "on site," with communications between them taking place, e.g., over a local area network.
  • the network 14 is configured to convey dictation from the speaker 12 to the voice mailbox system 16 .
  • the speaker 12 dictates into an audio transducer such as a telephone, and the transduced audio is transmitted over the telephone network 14 into the voice mailbox system 16 , such as the Intelliscript™ product made by eScription™ of Needham, Mass.
  • the speaker 12 may, however, use means other than a standard telephone for creating a digital audio file for each dictation.
  • the speaker 12 may dictate into a handheld PDA that includes its own digitization mechanism for storing the audio file.
  • the speaker 12 may use a standard “dictation station,” such as those provided by many vendors. Still other devices may be used by the speaker 12 for dictating, and possibly digitizing the dictation, and sending it to the voice mailbox system 16 .
  • the voice mailbox system 16 is configured to digitize audio from the speaker 12 to produce a digital audio file of the dictation.
  • the system 16 may use the Intelliscript™ product made by eScription.
  • the voice mailbox system 16 is further configured to prompt the speaker 12 to enter an identification code and a worktype code.
  • the speaker 12 can enter the codes, e.g., by pressing buttons on a telephone to send DTMF tones, or by speaking the codes into the telephone.
  • the system 16 may provide speech recognition to convert the spoken codes into a digital identification code and a digital worktype code.
  • the mailbox system 16 is further configured to store the identifying code and the worktype code in association with the dictation.
  • the system 16 preferably prompts the speaker 12 to provide the worktype code at least for each dictation related to the medical field.
  • the worktype code designates a category of work to which the dictation pertains, e.g., for medical applications this could include Office Note, Consultation, Operative Note, Discharge Summary, Radiology report, etc.
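  • The code-entry flow above can be sketched as follows. This is a hypothetical illustration only: the patent names example categories but does not specify the actual digit codes or any data structures, so the table and function names here are assumptions.

```python
# Hypothetical worktype-code table; the speaker enters a code via DTMF
# tones or speech, and the voice mailbox system stores it with the dictation.
WORKTYPES = {
    "1": "Office Note",
    "2": "Consultation",
    "3": "Operative Note",
    "4": "Discharge Summary",
    "5": "Radiology Report",
}

def label_dictation(speaker_id: str, worktype_code: str) -> dict:
    """Associate the identifying code and worktype category with a dictation."""
    return {
        "speaker_id": speaker_id,
        "worktype": WORKTYPES.get(worktype_code, "Unknown"),
    }

print(label_dictation("MD-1017", "4"))
# → {'speaker_id': 'MD-1017', 'worktype': 'Discharge Summary'}
```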
  • the voice mailbox system 16 is further configured to transmit the digital audio file and speaker identification code over the network 22 to the database server 24 for storage. This transmission is accomplished by the system 16 using standard network transmission protocols to communicate with the database server 24 .
  • the database server 24 is configured to store the incoming data from the voice mailbox system 16 , as well as from other sources.
  • the database server 24 may include the EditScript Server™ database product from eScription.
  • Software of the database server is configured to produce a database record for the dictation, including a file pointer to the digital audio data, and a field containing the identification code for the speaker 12 . If the audio and identifying data are stored on a PDA, the PDA may be connected to a computer running the HandiScript™ software product made by eScription that will perform the data transfer and communication with the database server 24 to enable a database record to be produced for the dictation.
  • a “servlet” application 32 that includes an in-memory cached representation of recent database entries.
  • the servlet 32 is configured to service requests from the voice mailbox system 16 , the automatic transcription device, the editing device 20 , and the administrative console 18 , reading from the database when the servlet's cache does not contain the required information.
  • the servlet 32 includes a separate software module that helps ensure that the servlet's cache is synchronized with the contents of the database. This helps off-load much of the real-time data communication from the database and allows the database to grow much larger than otherwise possible. For simplicity, however, the discussion below does not refer to the servlet, but all database access activities may be realized using the servlet application 32 as an intermediary.
  • the automatic transcription device 30 may access the database 40 in the database server 24 over the data network 26 for transcribing the stored dictation.
  • the automatic transcription device 30 uses an automatic speech recognition (ASR) device (e.g., software) to produce a draft transcription for the dictation.
  • An example of ASR technology is the AutoScript™ product made by eScription, which also uses the speaker and, optionally, worktype identifying information to access speaker and speaker-worktype dependent ASR models with which to perform the transcription.
  • the device 30 transmits the draft transcription over the data network 26 to the database server 24 for storage in the database and to be accessed, along with the digital audio file, by the editing device 20 .
  • the device 30 is further configured to affect the presentation of the draft transcription.
  • the device 30 as part of speech recognition or as part of post-processing after speech recognition, can add or change items affecting document presentation such as formats, abbreviations, and other text features.
  • the device 30 includes a speech recognizer and may also include a post-processor for performing operations in addition to the speech recognition, although the speech recognizer itself may perform some or all of these additional functions.
  • the transcription device 30 is further configured to produce a token-alignment file that synchronizes the audio with the corresponding text.
  • This file comprises a set of token records, with each record preferably containing a token, a begin index, and an end index.
  • the token comprises a character or a sequence of characters that are to appear on the screen during a word-processing session, or one or more sounds that may or may not appear as text on a screen.
  • a begin index comprises an array reference into the audio file corresponding to the place in the audio file where the corresponding token begins.
  • the end index comprises an array reference into the digital audio file corresponding to the point in the audio file where the corresponding token ends. As an alternative, the end index may not exist separately, with it being assumed that the starting point of the next token (the next begin index) is also the ending point of the previous token.
  • the transcription device 30 can store the token-alignment file in the database 40 .
  • the token-alignment file may contain further information, such as a display indicator and/or a playback indicator.
  • the display indicator's value indicates whether the corresponding token is to be displayed, e.g., on a computer monitor, while the transcription is being edited.
  • non-displayed tokens can help facilitate editing of the transcription while maintaining synchronization between on-screen tokens and the digital audio file.
  • a speaker may use an alias, e.g., for a heading, and a standard heading (e.g., Physical Examination) may be displayed while the words actually spoken by the speaker (e.g., "On exam today") are audibly played but not displayed as text (hidden).
  • the playback indicator's value indicates whether the corresponding token has audio associated with the token.
  • the playback indicator can also help facilitate editing the transcription while maintaining synchronization between on-screen tokens and the digital audio file.
  • the playback indicator's value may be adjusted dynamically during audio playback, e.g., by input from the transcriptionist. The adjustment may, e.g., cause audio associated with corresponding tokens (e.g., hesitation words) to be skipped partially or entirely, which may help increase the transcriptionist's productivity.
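  • The display and playback indicators can be sketched as extra token fields, using the alias-heading and hesitation-word examples from the text (field names and offsets are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class Token:
    text: str
    begin: int
    end: int
    display: bool = True    # show this token on screen during editing?
    playback: bool = True   # play this token's associated audio?

# The speaker said "On exam today" but the standard heading
# "Physical Examination" is displayed; the spoken words are kept as
# hidden tokens so on-screen text and audio stay synchronized, and a
# hesitation word is marked to be skipped during playback.
tokens = [
    Token("Physical Examination", 0, 0, display=True, playback=False),
    Token("On exam today", 0, 5000, display=False, playback=True),
    Token("uh", 5000, 5600, display=False, playback=False),
    Token("lungs clear", 5600, 11000),
]

screen_text = " ".join(t.text for t in tokens if t.display)
play_list = [(t.begin, t.end) for t in tokens if t.playback]
print(screen_text)  # → "Physical Examination lungs clear"
print(play_list)    # → [(0, 5000), (5600, 11000)]
```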
  • the tokens stored in the token-alignment file may or may not correspond to words.
  • a token may represent one or more characters that appear on a display during editing of the transcription, or sounds that occur in the audio file.
  • the written transcription may have a different form and/or format than the exact words that were spoken by the person 12 .
  • a token may represent conventional words such as “the,” “patient,” or “esophagogastroduodenoscopy,” multiple words, partial words, abbreviations or acronyms, numbers, dates, sounds (e.g., a cough, a yawn, a bell), absence of sound (silence), etc.
  • the speaker 12 may say “USA” and the automatic transcription device 30 may interpret and expand this into “United States of America.”
  • the token is “United States of America” and the begin index would point to the beginning of the audio signal for “USA” and, if the token-alignment file uses end indexes, the end index would point to the end of the audio signal “USA.”
  • the speaker 12 might say “April 2 of last year,” and the text might appear on the display as “04/02/2003.”
  • the tokens can synchronize the text “04/02/2003” with the audio of “April 2 of last year.”
  • the speaker 12 might say “miles per hour” while the text is displayed as “MPH.”
  • Tokens preferably have variable lengths, with different tokens having different lengths.
  • the token-alignment file provides an environment with many features. Items may appear on a screen but not have any audio signal associated with them (e.g., implicit titles and headings). Items may have audio associated with them and may appear on the screen but may not appear as words (e.g., numeric tokens such as “120/88”). Items may have audio associated with them, appear on the screen, and appear as words contained in the audio (e.g., “the patient showed delayed recovery”). Multiple words may appear on the screen corresponding to audio that is an abbreviated form of what appears on the screen (e.g., “United States of America” may be displayed corresponding to audio of “USA”). Items may have audio associated with them but not have corresponding symbols appear on the screen (e.g., a cough, an ending salutation such as “that's all,” commands or instructions to the transcriptionist such as “start a new paragraph,” etc.).
  • the editing device 20 is configured to be used by a transcriptionist to access and edit the draft transcription stored in the database of the database server 24 .
  • the editing device 20 includes a computer (e.g., display, keyboard, mouse, monitor, memory, and a processor, etc.), an attached foot-pedal, and appropriate software such as the EditScript™ software product made by eScription.
  • the transcriptionist can request a dictation job by, e.g., clicking on an on-screen icon.
  • the request is serviced by the database server 24 , which finds the dictation for the transcriptionist and transmits the corresponding audio file and the draft transcription text file.
  • the transcriptionist edits the draft using the editing device 20 and sends the edited transcript back to the database server 24 .
  • the transcriptionist can click on an on-screen icon button to instruct the editing device 20 to send the final edited document to the database server 24 via the network 22 , along with a unique identifier for the transcriptionist.
  • the database in the server 24 contains, for each dictation: a speaker identifier, a transcriptionist identifier, a file pointer to the digital audio signal, and a file pointer to the edited text document.
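  • The per-dictation record enumerated above could be sketched as follows (field names and values are hypothetical; the patent lists the fields but not a schema):

```python
from dataclasses import dataclass

@dataclass
class DictationRecord:
    """One database record per dictation, per the fields listed above."""
    speaker_id: str            # identifier of the dictating speaker
    transcriptionist_id: str   # identifier sent back with the edited document
    audio_path: str            # file pointer to the digital audio signal
    text_path: str             # file pointer to the edited text document

rec = DictationRecord("MD-1017", "MT-204", "/audio/job42.wav", "/text/job42.rtf")
```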
  • the edited text document can be transmitted directly to a customer's medical record system or accessed over the data network 22 from the database by the administrative console 18 .
  • the console 18 may include an administrative console software product such as Emon™ made by eScription.
  • components of the editing device 20 include a database interaction module 40 , a user interface 42 , a word processor module 44 , an audio playback module 46 , an audio file pointer 48 , a cursor module 50 , a monitor 52 , and an audio device 54 .
  • a computer implementing portions of the editing device 20 includes a processor and memory that stores appropriate computer-readable, computer-executable software code instructions that can cause the processor to execute appropriate instructions for performing functions described.
  • the monitor 52 and the audio device 54 (e.g., speakers) are physical components, while the other components shown in FIG. 2 are functional components that may be implemented with software, hardware, etc., or combinations thereof.
  • the audio playback device 46 , such as a SoundBlaster® card, is attached to the audio output transducer 54 , such as speakers or headphones.
  • the transcriptionist can use the audio device 54 (e.g., headphones or a speaker) to listen to audio and can view the monitor 52 to see the corresponding text.
  • the transcriptionist can use the foot pedal 66 , the keyboard 62 , and/or the mouse 64 to control the audio playback.
  • the database interaction, audio playback, and editing of the draft transcription are accomplished by means of appropriate software such as the EditScript Client™ software product made by eScription.
  • the editing software is loaded on the editing device computer 20 and configured appropriately for interaction with other components of the editing device 20 .
  • the editing software can use a standard word processing software library, such as that provided with Microsoft Word®, in order to load, edit and save documents corresponding to each dictation.
  • the editing software includes the database interaction module 40 , the user interface module 42 , the word processing module 44 , the audio playback module 46 , the audio file pointer adjustment module 48 , and the multi-cursor control module 50 .
  • the control module 50 regulates the interaction between the interface module 42 and the word processor 44 , the audio playback module 46 , and the audio file pointer 48 .
  • the control module 50 regulates the flow of actions relating to processing of a transcription, including playing audio and providing cursors in the transcribed text, as discussed below especially with respect to FIG. 7 .
  • the user interface module 42 controls the activity of the other modules and includes keyboard detection 56 , mouse detection 58 , and foot pedal detection 60 sub-modules for processing input from a keyboard 62 , a mouse 64 , and a foot-pedal 66 .
  • the foot pedal 66 is a standard transcription foot pedal and is connected to the editing device computer through the computer's serial port.
  • the foot pedal 66 preferably includes a “fast forward” portion and a “rewind” portion.
  • the transcriptionist can request a job from the database by selecting an on-screen icon with the mouse 64 .
  • the user interface module 42 interprets this mouse click and invokes the database interaction module 40 to request the next job from the database.
  • the database server 24 ( FIG. 1 ) responds by transmitting the audio data file, the draft transcription file, and the token-alignment file to the user interface module 42 .
  • the editing software can initialize a word-processing session by loading the draft text into the word processing module 44 .
  • the audio playback module 46 is configured to play the audio file stored in the database. For initial playback, the module 46 plays the audio file sequentially. The playback module 46 can, however, jump to audio corresponding to an indicated portion of the transcription and begin playback from the indicated location. The location may be indicated by a transcriptionist using appropriate portions of the editing device 20 such as the keyboard 62 , or the mouse 64 as discussed below. For playback that starts at an indicated location, the playback module 46 uses the token-alignment file to determine the location in the audio file corresponding to the indicated transcription text. Since many audio playback programs play audio in fixed-sized sections (called “frames”), the audio playback module 46 may convert the indicated begin index to the nearest preceding frame for playback.
  • an audio device 54 may play only frames of 128 bytes in length.
  • the audio playback module uses the token-alignment file to find the nearest prior starting frame that is a multiple of 128 bytes from the beginning of the audio file.
  • the starting point for audio playback may not correspond precisely to the selected text in the transcription.
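  • The frame-rounding step can be expressed directly. A minimal sketch, assuming the 128-byte frames of the example above (the actual frame size depends on the audio device):

```python
FRAME_SIZE = 128  # bytes per playback frame, per the example above

def nearest_preceding_frame(begin_index: int, frame_size: int = FRAME_SIZE) -> int:
    """Round an audio begin index down to the start of the frame that
    contains it, so playback can start on a frame boundary."""
    return (begin_index // frame_size) * frame_size

print(nearest_preceding_frame(1000))  # → 896 (1000 falls inside the 8th 128-byte frame)
print(nearest_preceding_frame(1024))  # → 1024 (already on a frame boundary)
```

Because of this rounding, playback may begin slightly before the selected text, which is the imprecision the preceding bullet notes.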
  • the transcriptionist can review and edit a document by appropriately controlling portions of the editing device 20 .
  • the transcriptionist can regulate the playback using the foot pedal 66 , and listen to the audio corresponding to the text as played by the playback module 46 and converted to sound by the audio device 54 . Further, the transcriptionist can move a cursor to a desired portion of the display of the monitor 52 using the keyboard 62 and/or mouse 64 , and can make edits at the location of the cursor using the keyboard 62 and/or mouse 64 .
  • the user interface module 42 can service hardware interrupts from all three of its sub-modules 56 , 58 , 60 .
  • the transcriptionist can use the foot pedal 66 to indicate that the audio should be “rewound” or “fast-forwarded” to a different time point in the dictation.
  • These foot-pedal presses are serviced as hardware interrupts by the user interface module 42 .
  • Most standard key presses and on-document mouse-clicks are sent to the word processing module 44 to perform the document editing functions indicated and to update the monitor display.
  • Some user interaction may be directed to the audio-playback oriented modules 46 , 48 , 50 , e.g., cursor control, audio position control, and/or volume control.
  • the transcriptionist may indicate that editing is complete by clicking another icon. In response to such an indication, the final text file is sent through the database interaction module 40 to the database server 24 .
  • the cursor module 50 is configured to provide an audio cursor 70 and a text cursor 72 on the monitor 52 in conjunction with the display of the draft transcription 74 for editing by the transcriptionist.
  • the cursor module 50 provides the cursors 70 and 72 independently.
  • the audio cursor 70 , under the control of the cursor module 50 , tracks the text in the document 74 as the corresponding audio is played, helping the transcriptionist follow along in the text 74 .
  • the audio cursor 70 moves in conjunction with the audio, as linked to the text 74 by the token-alignment file, to help the transcriptionist follow the text 74 corresponding to the currently-played audio.
  • the audio cursor 70 may take a variety of different forms. For example, as shown in FIG. 3 , the audio cursor provides a box 76 around the text of the token corresponding to the audio presently being played.
  • the box 76 may also take a variety of forms to distinguish it from other portions of the document 74 , such as a rectangular outline of the box 76 , and/or a solid box (e.g., inverse video), and may be of a variety of colors such as red against black letters on a white background.
  • the audio cursor 70 may be a box 78 that highlights the entire line (or lines) of text that includes the text of the token corresponding to the audio currently being played.
  • the text cursor 72 could be a box 80 , e.g., of a single character in width.
  • a text cursor 73 indicates other possible features of a text cursor, including that a text cursor can highlight an entire word and can be positioned within text highlighted by the audio cursor 70 .
  • FIG. 4 illustrates that more than two cursors could be provided.
  • the audio cursor 70 could be a vertical line cursor 82 that highlights text, e.g., the beginning of the text of the token currently being played, or the beginning of the line of text including the token currently being played.
  • Other possibilities include using highlighting capabilities or bold characters to transiently emphasize a word, series of words, or line(s) of text.
  • Still other forms of the audio cursor 70 may be used.
  • the audio cursor 70 is precisely aligned with the currently-played audio, but the cursor 70 may approximate the audio, e.g., with groups of words or one or more entire lines of text being indicated by the audio cursor 70 .
  • the text cursor 72 provided by the cursor module 50 indicates the current location for editing in the document 74 .
  • the transcriptionist can manipulate the keyboard 62 and/or mouse 64 to control the location of the text cursor 72 .
  • the cursor 72 indicates where editing will occur, e.g., addition of text through the keyboard 62 , deletion of text, alteration of formatting, insertion of paragraph or page breaks, etc.
  • the transcriptionist can edit the document using the text cursor 72 in standard fashion.
  • the text cursor 72 in combination with the audio cursor 70 provides for multi-tasking by the transcriptionist. To make edits, the transcriptionist positions the text cursor 72 in standard fashion and makes the desired change(s).
  • Edits to the text 74 can be made without losing synchronization with the audio. Changes to the text 74 are tracked, with records being made of which characters or other edits are inserted and where, and which characters or other features (e.g., formatting, page breaks, etc.) are removed.
  • the word processor 44 implements a track-changes feature, maintaining the original document and storing indications of changes.
  • the track-changes feature implemented by the word processor 44 produces a file of changes (e.g., textual, formatting, etc.) to the original text 74 .
  • the information regarding these changes, especially text changes such as different expansions of abbreviations, different spellings, etc., may be used to adapt the speech recognizer 30 .
  • the file of changes provides a useful tool for continuous learning/improvement of speech models used for speech recognition by the automatic transcription device 30 .
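The file of changes described above could be represented as a set of records pairing the recognizer's draft text with the transcriptionist's corrections. The following is a hypothetical sketch; the class and field names are illustrative and not from the patent.

```python
# Sketch: recording edits as draft/edited pairs so they can later be used
# (with their audio, via the token-alignment file) to adapt speech models.

from dataclasses import dataclass, field

@dataclass
class ChangeRecord:
    token_index: int   # position of the edit relative to the original draft
    draft_text: str    # what the recognizer produced ("" for an insertion)
    edited_text: str   # what the transcriptionist wrote ("" for a deletion)

@dataclass
class TrackChanges:
    records: list = field(default_factory=list)

    def record(self, token_index, draft_text, edited_text):
        self.records.append(ChangeRecord(token_index, draft_text, edited_text))

    def substitutions(self):
        """Draft/edited word pairs, e.g. different abbreviation expansions."""
        return [(r.draft_text, r.edited_text)
                for r in self.records if r.draft_text and r.edited_text]

tc = TrackChanges()
tc.record(12, "pt", "patient")           # abbreviation expanded by the editor
tc.record(30, "", "Discharge Summary")   # inserted heading: no draft counterpart
assert tc.substitutions() == [("pt", "patient")]
```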
  • the text cursor 72 may be used to change the location of the audio cursor 70 , and thus the audio currently played, through commands, e.g., from the keyboard 62 and/or the mouse 64 , implemented by the cursor control module 50 . Movement to a different part of the audio is typically implemented by the audio file pointer module 48 by incrementing or decrementing a pointer into the digital audio file. The location of the audio cursor 70 , and thus the current audio for playback, may therefore be changed using the text cursor 72 : the transcriptionist positions the text cursor 72 at the desired portion of the text 74 for audio playback and actuates appropriate commands.
  • the transcriptionist may use one or more hot keys (e.g., a sequence of keys) and/or one or more mouse clicks (e.g., on screen icons) to cause the audio cursor 70 to move to the position of the text cursor 72 , with the audio file pointer being adjusted accordingly.
  • the correct position in the audio file is determined by the audio file pointer module 48 by finding the corresponding token in the token-alignment file.
  • the corresponding token may be a nearest, preferably preceding, token that is associated with text in the document 74 .
  • the audio file pointer module 48 uses track-changes information from the word processor 44 to determine the appropriate token. The module 48 determines that the text at the position of the text cursor 72 is not in the token-alignment file, and finds the token in the token-alignment file that is nearest, and preferably preceding, the inserted text using information regarding the original document from the track-changes information.
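The nearest-preceding-token lookup described above can be sketched as a binary search over the token-alignment entries. The tuple layout and function name below are assumptions for illustration; inserted text simply has no entry, so the search naturally falls back to the nearest preceding aligned token.

```python
# Illustrative lookup: given the character position of the text cursor,
# find the nearest preceding token in the token-alignment file.

import bisect

def nearest_preceding_token(tokens, cursor_pos):
    """tokens: list of (text_start, audio_begin, audio_end), sorted by text_start.
    Returns the token whose text starts at or before cursor_pos."""
    starts = [t[0] for t in tokens]
    i = bisect.bisect_right(starts, cursor_pos) - 1
    if i < 0:
        return tokens[0]  # cursor precedes all aligned text: use the first token
    return tokens[i]

tokens = [(0, 0, 640), (8, 640, 1280), (15, 1280, 2048)]
# A cursor inside transcriptionist-inserted text at position 12 maps back
# to the preceding aligned token, which begins at text offset 8.
assert nearest_preceding_token(tokens, 12) == (8, 640, 1280)
```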
  • the text cursor 72 may also be moved to the position of the audio cursor 70 .
  • one or more hot keys and/or one or more mouse clicks can be used to cause the text cursor 72 to jump from its current position to a position at, adjacent, or near the position of the audio cursor 70 .
  • the transcriptionist can cause the text cursor 72 to jump to the location of the audio cursor 70 to quickly position the text cursor 72 for editing of the desired text.
  • the text cursor 72 can highlight the text highlighted by the audio cursor 70 such that text entered by the transcriptionist will overwrite the highlighted text, obviating deletion of the text by the transcriptionist and thereby saving time.
  • a process 90 for producing and editing a transcription of speech using the system 10 includes the stages shown.
  • the process 90 is exemplary only and not limiting.
  • the process 90 may be altered, e.g., by having stages added, removed, or rearranged.
  • the speaker 12 dictates desired speech to be converted to text.
  • the speaker can use, e.g., a hand-held device such as a personal digital assistant, to dictate audio that is transmitted over the network 14 to the voice mailbox 16 .
  • the audio is stored in the voice mailbox 16 as an audio file.
  • the audio file is transmitted over the network 22 to the database server 24 and is stored in the database 40 .
  • the automatic transcription device 30 transcribes the audio file.
  • the device 30 accesses and retrieves the audio file from the database 40 through the LAN 26 .
  • a speech recognizer of the device 30 analyzes the audio file in accordance with speech models to produce a draft text document 74 from the audio file and store the draft document 74 in the database 40 .
  • the device 30 also produces a corresponding token-alignment file that includes the draft document 74 and associates portions of the audio file with the transcribed text of the document 74 .
  • the device 30 stores the token-alignment file in the database 40 via the LAN 26 .
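Based on the description above, a token-alignment record ties a span of the audio file to a span of the draft text, and some tokens (e.g., a cough) carry no text at all. A minimal sketch, with field names assumed for illustration:

```python
# Sketch of a token-alignment file: each token associates a portion of the
# audio file with a portion of the draft text; non-speech tokens have none.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Token:
    audio_begin: int            # byte offset where this token's audio starts
    audio_end: int              # byte offset where it ends
    text: Optional[str]         # transcribed text, or None for non-speech audio
    text_start: Optional[int]   # character offset of the text in the draft

alignment = [
    Token(0, 512, "The", 0),
    Token(512, 900, "patient", 4),
    Token(900, 1100, None, None),    # non-speech sound: audio but no text
    Token(1100, 1600, "presented", 12),
]

# The draft document is recoverable from the tokens that carry text:
draft = " ".join(t.text for t in alignment if t.text is not None)
assert draft == "The patient presented"
```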
  • the transcriptionist reviews and edits the transcribed draft document 74 as appropriate.
  • the transcriptionist uses the editing device 20 to access the database 40 and retrieve the audio file and the token-alignment file that includes the draft text document 74 .
  • the transcriptionist plays the audio file and reviews the corresponding text as highlighted or otherwise indicated by the audio cursor 70 and makes desired edits using the text cursor 72 .
  • the reviewing of this stage is detailed below with respect to FIG. 7 .
  • the word processor 44 produces and stores track-changes information in response to edits made by the transcriptionist.
  • the track-changes information is provided to the automatic transcription device 30 for use in improving the speech models used by the speech recognizer of the device 30 by analyzing the transcribed draft text and what revisions were made by the transcriptionist.
  • the models can be adjusted so that the next time the speech recognizer analyzes speech that was edited by the transcriptionist, the recognizer will transcribe the same or similar audio to the edited text instead of the draft text previously provided.
  • the word processor provides a final, revised text document as edited by the transcriptionist. This final document can be stored in the database 40 and provided via the network 22 to interested parties, e.g., the speaker that dictated the audio file.
  • a process 110 for reviewing the draft transcribed document 74 , stage 86 of FIG. 6 , using the editing device 20 includes the stages shown.
  • the process 110 is exemplary only and not limiting.
  • the process 110 may be altered, e.g., by having stages added, removed, or rearranged.
  • a token in the token-alignment file is obtained.
  • the next token in the file is obtained in the normal course of audio playback in the absence of transcriptionist input. If, however, the transcriptionist causes a change in the location of the audio cursor, then the token corresponding to the new location of the audio cursor is obtained.
  • the text most nearly associated with the current token is located.
  • This text may be text associated with a token adjacent to the current token, e.g., if the current token does not have text directly associated with it (e.g., a cough). Text entered by the transcriptionist is ignored in determining the most-nearly-associated text.
  • the cursor control module 50 displays the audio cursor 70 to accentuate the text determined to be most nearly associated with the current token.
  • the control module 50 draws the audio cursor 70 to highlight the text, e.g., drawing the cursor 70 around, near, etc., the determined text.
  • the location of the text corresponding to tokens may be determined dynamically as the token-alignment file is stepped through in order to display the audio cursor 70 .
  • the locations (e.g., within a document or on a screen) can be re-calculated for added or removed text (on the fly when the text is changed, after changes are made, in response to a re-determine command, etc.).
  • Other alternatives are also possible.
  • the audio file pointer module 48 determines the position in the audio file corresponding to the current token.
  • the module 48 uses the token-alignment file and the selected token to find the location in the audio file corresponding to the current token.
  • the audio file pointer module 48 selects a portion of the audio file for playback.
  • the module 48 selects a frame of audio associated with the token for submission to the audio playback module 46 .
  • the audio playback module 46 controls playback of the selected audio frame.
  • the module 46 provides control signals to the audio device 54 to audibly play the corresponding audio for the transcriptionist to hear.
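The stages of process 110 above can be sketched as a single loop: obtain a token, locate its nearest associated text, display the audio cursor there, and play the token's audio frame-aligned. This is a hedged sketch under assumed data structures; the dictionary keys and the display/playback stand-ins are not from the patent.

```python
# Sketch of the review loop: step through the token-alignment entries,
# highlight the nearest text for each, and play its frame-aligned audio.

FRAME = 128  # assumed fixed frame size, per the earlier example

def review_loop(tokens, draw_cursor, play_frames):
    for i, tok in enumerate(tokens):
        # Locate the text most nearly associated with the token; a token
        # without text (e.g., a cough) borrows an adjacent token's text.
        text = tok["text"]
        if text is None:
            for adj in tokens[i + 1:]:
                if adj["text"] is not None:
                    text = adj["text"]
                    break
        draw_cursor(text)                        # display the audio cursor
        start = (tok["begin"] // FRAME) * FRAME  # nearest preceding frame
        play_frames(start, tok["end"])           # submit audio for playback

drawn, played = [], []
review_loop(
    [{"begin": 0, "end": 200, "text": "Hello"},
     {"begin": 200, "end": 300, "text": None},
     {"begin": 300, "end": 500, "text": "world"}],
    drawn.append,
    lambda s, e: played.append((s, e)),
)
assert drawn == ["Hello", "world", "world"]
assert played == [(0, 200), (128, 300), (256, 500)]
```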
  • a process 130 for editing the draft transcribed document 74 , stage 86 of FIG. 6 , using the editing device 20 includes the stages shown.
  • the process 130 is exemplary only and not limiting.
  • the process 130 may be altered, e.g., by having stages added, removed, or rearranged.
  • the transcriptionist positions the text cursor 72 as desired for editing of the document 74 .
  • the transcriptionist can move the text cursor 72 independently of the audio cursor 70 , e.g., using the keyboard 62 and/or mouse 64 .
  • the transcriptionist may also, or alternatively, move the text cursor 72 dependent upon the audio cursor 70 by causing the text cursor 72 to move to, or near to, the position of the audio cursor 70 .
  • the audio corresponding to the location of the text cursor 72 is played if the audio cursor 70 is synched to the text cursor 72 . If the transcriptionist causes the audio cursor 70 to move to the location of the text cursor 72 , then the audio for the new location of the audio cursor 70 is preferably played to help the transcriptionist determine whether edits to the text are desired.
  • desired edits to the text 74 at the location of the text cursor 72 are made by the transcriptionist.
  • edits can be made as indicated by the transcriptionist (e.g., using the keyboard 62 ) and implemented by the word processor 44 .
  • the audio may continue to play while changes are being made at the location of the text cursor 72 .
  • the transcriptionist may, however, stop the audio playback using, e.g., the foot pedal 66 , keyboard commands, etc.
  • the audio playback may be managed independently of editing of the text 74 .

Abstract

A device, for use by a transcriptionist in a transcription editing system for editing transcriptions dictated by speakers, includes, in combination, a monitor configured to display visual text of transcribed dictations, an audio mechanism configured to cause playback of portions of an audio file associated with a dictation, and a cursor-control module coupled to the audio mechanism and to the monitor and configured to cause the monitor to display multiple cursors in the text.

Description

    BACKGROUND OF THE INVENTION
  • Healthcare costs in the United States account for a significant share of the GNP. The affordability of healthcare is of great concern to many Americans. Technological innovations offer important leverage for reducing healthcare costs.
  • Many healthcare institutions require doctors to keep accurate and detailed records concerning diagnosis and treatment of patients. Motivations for keeping such records include government regulations (such as Medicare and Medicaid regulations), desire for the best outcome for the patient, and mitigation of liability. The records include patient notes that reflect information that a doctor or other person adds to a patient record after a given diagnosis, patient interaction, lab test, or the like.
  • Record keeping can be a time-consuming task, and the physician's time is valuable. The time required for a physician to hand-write or type patient notes can represent a significant expense. Verbal dictation of patient notes offers significant time savings to physicians, and is becoming increasingly prevalent in modern healthcare organizations.
  • Over time, a significant industry has evolved around the transcription of medical dictation. Several companies produce special-purpose voice mailbox systems for storing medical dictation. These centralized systems hold voice mailboxes for a large number of physicians, each of whom can access a voice mailbox by dialing a phone number and putting in his or her identification code. These dictation voice mailbox systems are typically purchased or shared by healthcare institutions. Prices can be over $100,000 per voice mailbox system. Even at these prices, these centralized systems save healthcare institutions vast sums of money over the cost of maintaining records in a more distributed fashion.
  • Using today's voice mailbox medical dictation systems, when a doctor completes an interaction with a patient, the doctor calls a dictation voice mailbox, and dictates the records of the interaction with the patient. The voice mailbox is later accessed by a medical transcriptionist who listens to the audio and transcribes the audio into a text record. The playback of the audio data from the voice mailbox may be controlled by the transcriptionist through a set of foot pedals that mimic the action of the “forward”, “play”, and “rewind” buttons on a tape player. Should a transcriptionist hear an unfamiliar word, the standard practice is to stop the audio playback and look up the word in a printed dictionary.
  • The medical transcriptionist's time is less costly for the hospital than the doctor's time, and the medical transcriptionist is typically much more familiar with the computerized record-keeping systems than the doctor is, so this system offers a significant overall cost saving to the hospital.
  • Expedient processing of doctor's dictation is often desirable so that records can be passed between one part of a healthcare institution and another (such as from Radiology to Surgery), or so that records can be passed to another institution if the next step in a patient's care requires that the patient be moved to another facility. In addition to being timely, accuracy of medical transcriptions is of paramount importance. A mistake in a medical transcription could mean the difference between life and death. In transcribing doctor's orders for such procedures as chemotherapy and radiation therapy for cancer patients, an elaborate system of double-checking by separate people is standard to mitigate risk.
  • SUMMARY OF THE INVENTION
  • In general, in an aspect, the invention provides a device for use by a transcriptionist in a transcription editing system for editing transcriptions dictated by speakers, the device including, in combination, a monitor configured to display visual text of transcribed dictations, an audio mechanism configured to cause playback of portions of an audio file associated with a dictation, and a cursor-control module coupled to the audio mechanism and to the monitor and configured to cause the monitor to display multiple cursors in the text.
  • Implementations of the invention may include one or more of the following features. The cursor-control module is configured to cause the monitor to display multiple cursors in the text that indicate different functionality. The cursor-control module is configured to cause the monitor to display an audio cursor accentuating a portion of the text, the audio cursor accentuating different text as the audio file is played using the audio mechanism, and a text cursor indicative of a position in the text where editing commands will be implemented. The audio cursor comprises at least one of a rectangular box surrounding text corresponding to a portion of the audio file, a rectangular box surrounding a line of text, a vertical line, an inverse-video portion of the monitor, and bolding of a portion of the text. The cursor-control module is configured to determine where to cause the monitor to display the audio cursor by using a token-alignment file that associates portions of the audio file with portions of the text. The cursor-control module is configured to move at least one of the audio cursor and the text cursor to a location of the other of the text cursor and the audio cursor, respectively. The audio mechanism is configured to determine and play a portion of the audio file corresponding to text at the location of the audio cursor when the audio cursor is moved to the location of the text cursor. The device further includes a change-recording apparatus configured to record changes made to the text and associate the changes with portions of the audio file whereby the recorded changes can be used to adapt speech recognition apparatus in accordance with the changed text and the associated portions of the audio file.
  • In general, in another aspect, the invention provides a computer program product residing on a computer-readable medium and including computer-readable instructions for causing a computer to display visual text of transcribed dictations, cause playback of portions of an audio file associated with a dictation, and cause the monitor to display multiple cursors in the text.
  • Implementations of the invention may include one or more of the following features. The instructions are configured to cause the monitor to display an audio cursor accentuating a portion of the text with the audio cursor accentuating different text as the audio file is played, and a text cursor indicative of a position in the text where editing commands will be implemented. The cursor-control module is configured to determine where to cause the monitor to display the audio cursor by using a token-alignment file that associates portions of the audio file with portions of the text. The computer program product further includes instructions for causing the computer to move at least one of the audio cursor and the text cursor to a location of the other of the text cursor and the audio cursor, respectively. The computer program product further includes instructions for causing the computer to determine and cause playing of a portion of the audio file corresponding to text at the location of the audio cursor when the audio cursor is moved to the location of the text cursor. The computer program product further includes instructions for causing the computer to record changes made to the text and associate the changes with portions of the audio file whereby the recorded changes can be used to adapt speech recognition apparatus in accordance with the changed text and the associated portions of the audio file.
  • In general, in another aspect, the invention provides a method of processing text transcribed from an audio file, the method including displaying text of a transcribed dictation on a monitor, playing portions of an audio file associated with the dictation, displaying an audio cursor in the text on the monitor, the audio cursor accentuating a portion of the text with the audio cursor accentuating different text as the audio file is played, and displaying a text cursor in the text on the monitor, the text cursor being indicative of a position in the text where editing commands will be implemented.
  • Implementations of the invention may include one or more of the following features. The method further includes using a token-alignment file that associates portions of the audio file with portions of the text to determine where to display the audio cursor. The method further includes moving at least one of the audio cursor and the text cursor to a location of the other of the text cursor and the audio cursor, respectively, in response to receiving a corresponding command. The method further includes playing of a portion of the audio file corresponding to text at the location of the audio cursor if the audio cursor is moved to the location of the text cursor. The method further includes recording changes made to the text, and associating the changes with portions of the audio file. The method further includes using the recorded changes to adapt speech recognition apparatus in accordance with the changed text and the associated portions of the audio file.
  • In general, in another aspect, the invention provides a method of processing a recorded dictation, the method including analyzing the recorded dictation in accordance with speech models to convert the recorded dictation to a draft text, storing the draft text, and producing and recording a token-alignment file that associates portions of the draft text with portions of the audio file, the token-alignment file including tokens at least some of which are indicative of portions of the draft text, the tokens indicating beginnings and ends of portions of the recorded dictation associated with the portions of the draft text such that the portions of the recorded dictation are associated with corresponding portions of the draft text even if the corresponding portions of the draft text, if spoken, do not correspond identically to the corresponding portions of the recorded dictation.
  • Implementations of the invention may include one or more of the following features. Producing and recording the token-alignment file includes producing and recording tokens for which there is no corresponding draft text. The method further includes receiving a revised text associated with the recorded dictation, and using indicia of differences between the revised text and the draft text and the associated recorded dictation to modify the speech models for converting other recorded dictations to other draft texts.
  • Various aspects of the invention may provide one or more of the following capabilities. The cost of medical transcription can be reduced and/or the accuracy of medical transcription increased. The expediency and turn-around time of medical transcription can be improved. Editing of transcriptions can be performed faster than with previous techniques. Transcribed text can be edited during playback of transcribed audio. Text other than that associated with audio currently being played can be edited without stopping playback of audio associated with a text document. Transcribed text can be selected and its corresponding audio played, e.g., regardless of a current portion of audio being played or having last been played. Transcriptionist productivity can be improved. Transcriptionist fatigue can be reduced.
  • These and other capabilities of the invention, along with the invention itself, will be more fully understood after a review of the following figures, detailed description, and claims.
  • BRIEF DESCRIPTION OF THE FIGURES
  • FIG. 1 is a simplified diagram of a system for transcribing dictations and editing corresponding transcriptions.
  • FIG. 2 is a simplified block diagram of an editing device of the system shown in FIG. 1.
  • FIGS. 3-5 are portions of a transcribed document showing exemplary embodiments of audio and text cursors.
  • FIG. 6 is a block flow diagram of a process of producing and editing a transcription.
  • FIG. 7 is a block flow diagram of a process of reviewing a draft transcribed document.
  • FIG. 8 is a block flow diagram of a process of editing the draft transcribed document.
  • DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
  • Embodiments of the invention can provide multiple cursors for use in editing text documents each of which is associated with a digital audio signal of speech to be transcribed. An audio cursor is provided that highlights text associated with corresponding audio being played. The audio cursor tracks the audio signal to help the transcriptionist follow along visually with the text as the associated audio plays. A text cursor can be manipulated independently of the audio cursor by a transcriptionist. The text cursor indicates the location of editing to the transcribed text, e.g., through a keyboard. The text cursor can be positioned and edits made to the text, and/or the audio cursor can be made to coincide with the text cursor so that the corresponding audio is played. Using embodiments of the invention, a transcriptionist can process multi-modal inputs and reduce the amount of time the transcriptionist would use to review and revise draft documents using previous techniques. Other embodiments are within the scope of the invention.
  • Referring to FIG. 1, a system 10 for transcribing audio and editing transcribed audio includes a speaker/person 12, a communications network 14, a voice mailbox system 16, an administrative console 18, an editing device 20, a communications network 22, a database server 24, a communications network 26, and an automatic transcription device 30. Here, the network 14 is preferably a public switched telephone network (PSTN) although other networks, including packet-switched networks, could be used, e.g., if the speaker 12 uses an Internet phone for dictation. The network 22 is preferably a packet-switched network such as the global packet-switched network known as the Internet. The network 26 is preferably a packet-switched, local area network (LAN). Other types of networks may be used, however, for the networks 14, 22, 26, or any or all of the networks 14, 22, 26 may be eliminated, e.g., if items shown in FIG. 1 are combined or eliminated.
  • Preferably, the voice mailbox system 16, the administrative console 18, and the editing device 20 are situated “off site” from the database server 24 and the automatic transcription device 30. These systems/devices 16, 18, 20, however, could be located “on site,” with communications between them taking place, e.g., over a local area network. Similarly, it is possible to locate the automatic transcription device 30 off-site and have the device 30 communicate with the database server 24 over the network 22.
  • The network 14 is configured to convey dictation from the speaker 12 to the voice mailbox system 16. Preferably, the speaker 12 dictates into an audio transducer such as a telephone, and the transduced audio is transmitted over the telephone network 14 into the voice mailbox system 16, such as the Intelliscript™ product made by eScription™ of Needham, Mass. The speaker 12 may, however, use means other than a standard telephone for creating a digital audio file for each dictation. For example, the speaker 12 may dictate into a handheld PDA device that includes its own digitization mechanism for storing the audio file. Or, the speaker 12 may use a standard “dictation station,” such as those provided by many vendors. Still other devices may be used by the speaker 12 for dictating, and possibly digitizing the dictation, and sending it to the voice mailbox system 16.
  • The voice mailbox system 16 is configured to digitize audio from the speaker 12 to produce a digital audio file of the dictation. For example, the system 16 may use the Intelliscript™ product made by eScription.
  • The voice mailbox system 16 is further configured to prompt the speaker 12 to enter an identification code and a worktype code. The speaker 12 can enter the codes, e.g., by pressing buttons on a telephone to send DTMF tones, or by speaking the codes into the telephone. The system 16 may provide speech recognition to convert the spoken codes into a digital identification code and a digital worktype code. The mailbox system 16 is further configured to store the identifying code and the worktype code in association with the dictation. The system 16 preferably prompts the speaker 12 to provide the worktype code at least for each dictation related to the medical field. The worktype code designates a category of work to which the dictation pertains, e.g., for medical applications this could include Office Note, Consultation, Operative Note, Discharge Summary, Radiology report, etc.
  • The voice mailbox system 16 is further configured to transmit the digital audio file and speaker identification code over the network 22 to the database server 24 for storage. This transmission is accomplished by the system 16 using standard network transmission protocols to communicate with the database server 24.
  • The database server 24 is configured to store the incoming data from the voice mailbox system 16, as well as from other sources. The database server 24 may include the EditScript Server™ database product from eScription. Software of the database server is configured to produce a database record for the dictation, including a file pointer to the digital audio data, and a field containing the identification code for the speaker 12. If the audio and identifying data are stored on a PDA, the PDA may be connected to a computer running the HandiScript™ software product made by eScription that will perform the data transfer and communication with the database server 24 to enable a database record to be produced for the dictation.
  • Preferably, all communication with the database server 24 is intermediated by a “servlet” application 32 that includes an in-memory cached representation of recent database entries. The servlet 32 is configured to service requests from the voice mailbox system 16, the automatic transcription device 30, the editing device 20, and the administrative console 18, reading from the database when the servlet's cache does not contain the required information. The servlet 32 includes a separate software module that helps ensure that the servlet's cache is synchronized with the contents of the database. This helps off-load much of the real-time data communication from the database and allows the database to grow much larger than would otherwise be possible. For simplicity, however, the below discussion does not refer to the servlet, but all database access activities may be realized using the servlet application 32 as an intermediary.
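The cache-through behavior described for the servlet application 32 can be sketched as follows. This is a minimal illustration, not the actual eScription implementation; the class and method names are assumptions.

```python
# Hypothetical sketch of the servlet's cache-through read path: recent
# database records are served from an in-memory cache, and the database
# is consulted only on a cache miss.
class CachingServlet:
    def __init__(self, database):
        self.database = database   # any object with a fetch(record_id) method
        self.cache = {}            # in-memory copy of recent entries

    def get_record(self, record_id):
        if record_id not in self.cache:
            # Cache miss: read from the database and remember the result.
            self.cache[record_id] = self.database.fetch(record_id)
        return self.cache[record_id]

    def invalidate(self, record_id):
        # Would be called by the synchronization module when the
        # underlying database record changes.
        self.cache.pop(record_id, None)
```

Because repeated reads of a recent record never touch the database, the server is relieved of much of the real-time request load.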
  • The automatic transcription device 30 may access the database 40 in the database server 24 over the data network 26 for transcribing the stored dictation. The automatic transcription device 30 uses an automatic speech recognition (ASR) device (e.g., software) to produce a draft transcription for the dictation. An example of ASR technology is the AutoScript™ product made by eScription, which also uses the speaker and, optionally, worktype identifying information to access speaker and speaker-worktype dependent ASR models with which to perform the transcription. The device 30 transmits the draft transcription over the data network 26 to the database server 24 for storage in the database and to be accessed, along with the digital audio file, by the editing device 20.
  • The device 30 is further configured to affect the presentation of the draft transcription. The device 30, as part of speech recognition or as part of post-processing after speech recognition, can add or change items affecting document presentation such as formats, abbreviations, and other text features. The device 30 includes a speech recognizer and may also include a post-processor for performing operations in addition to the speech recognition, although the speech recognizer itself may perform some or all of these additional functions.
  • The transcription device 30 is further configured to produce a token-alignment file that synchronizes the audio with the corresponding text. This file comprises a set of token records, with each record preferably containing a token, a begin index, and an end index. The token comprises a character or a sequence of characters that are to appear on the screen during a word-processing session, or one or more sounds that may or may not appear as text on a screen. A begin index comprises an array reference into the audio file corresponding to the place in the audio file where the corresponding token begins. The end index comprises an array reference into the digital audio file corresponding to the point in the audio file where the corresponding token ends. As an alternative, the end index may not exist separately, with it being assumed that the starting point of the next token (the next begin index) is also the ending point of the previous token. The transcription device 30 can store the token-alignment file in the database 40.
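The token-record layout just described, including the optional end index that is implied by the next token's begin index, can be sketched as follows. The field names are illustrative assumptions; the actual eScription file format is not specified here.

```python
# Illustrative sketch of a token-alignment record: a token (the
# characters to appear on screen, or a sound label), a begin index into
# the audio file, and an optional end index.
from dataclasses import dataclass
from typing import Optional

@dataclass
class TokenRecord:
    token: str                 # e.g., "United States of America"
    begin: int                 # array reference where the token's audio starts
    end: Optional[int] = None  # may be omitted when implied by the next record

def end_index(records, i):
    """Return the end index for record i, falling back to the next
    record's begin index when no explicit end index is stored."""
    rec = records[i]
    if rec.end is not None:
        return rec.end
    if i + 1 < len(records):
        return records[i + 1].begin
    return None  # last token with no explicit end index
```

For example, if the speaker says “USA” and the recognizer expands it, the record's token would be “United States of America” while begin and end bracket the audio of “USA,” preserving synchronization.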
  • The token-alignment file may contain further information, such as a display indicator and/or a playback indicator. The display indicator's value indicates whether the corresponding token is to be displayed, e.g., on a computer monitor, while the transcription is being edited. Using non-displayed tokens can help facilitate editing of the transcription while maintaining synchronization between on-screen tokens and the digital audio file. For example, a speaker may use an alias, e.g., for a heading, and a standard heading (e.g., Physical Examination) may be displayed while the words actually spoken by the speaker (e.g., “On exam today”) are audibly played but not displayed as text (hidden). The playback indicator's value indicates whether the corresponding token has audio associated with the token. Using the playback indicator can also help facilitate editing the transcription while maintaining synchronization between on-screen tokens and the digital audio file. The playback indicator's value may be adjusted dynamically during audio playback, e.g., by input from the transcriptionist. The adjustment may, e.g., cause audio associated with corresponding tokens (e.g., hesitation words) to be skipped partially or entirely, which may help increase the transcriptionist's productivity.
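How the display and playback indicators might gate what is shown versus what is played can be sketched as below. The dictionary keys (`token`, `display`, `playback`) are hypothetical stand-ins for fields of the token-alignment file.

```python
# Minimal sketch: the display flag controls what appears on screen,
# while the playback flag controls which tokens' audio is played.
def visible_text(records):
    """Concatenate only the tokens whose display flag is set."""
    return " ".join(r["token"] for r in records if r.get("display", True))

def playable(records):
    """Return only the tokens whose playback flag is set, so that
    audio for, e.g., hesitation words can be skipped during editing."""
    return [r for r in records if r.get("playback", True)]
```

In the heading-alias example above, “Physical Examination” would carry a set display flag but a cleared playback flag, while the spoken “On exam today” would carry the opposite pair.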
  • The tokens stored in the token-alignment file may or may not correspond to words. Instead, a token may represent one or more characters that appear on a display during editing of the transcription, or sounds that occur in the audio file. Thus, the written transcription may have a different form and/or format than the exact words that were spoken by the person 12. For example, a token may represent conventional words such as “the,” “patient,” or “esophagogastroduodenoscopy,” multiple words, partial words, abbreviations or acronyms, numbers, dates, sounds (e.g., a cough, a yawn, a bell), absence of sound (silence), etc. For example, the speaker 12 may say “USA” and the automatic transcription device 30 may interpret and expand this into “United States of America.” In this example, the token is “United States of America” and the begin index would point to the beginning of the audio signal for “USA” and, if the token-alignment file uses end indexes, the end index would point to the end of the audio signal “USA.” As another example, the speaker 12 might say “April 2 of last year,” and the text might appear on the display as “04/02/2003.” The tokens, however, can synchronize the text “04/02/2003” with the audio of “April 2 of last year.” As another example, the speaker 12 might say “miles per hour” while the text is displayed as “MPH.” Using the tokens, the speech recognizer 30, or a post-processor in or separate from the device 30, may alter, expand, contract, and/or format the spoken words when converting to text without losing the audio synchronization. Tokens preferably have variable lengths, with different tokens having different lengths.
  • The token-alignment file provides an environment with many features. Items may appear on a screen but not have any audio signal associated with them (e.g., implicit titles and headings). Items may have audio associated with them and may appear on the screen but may not appear as words (e.g., numeric tokens such as “120/88”). Items may have audio associated with them, appear on the screen, and appear as words contained in the audio (e.g., “the patient showed delayed recovery”). Multiple words may appear on the screen corresponding to audio that is an abbreviated form of what appears on the screen (e.g., “United States of America” may be displayed corresponding to audio of “USA”). Items may have audio associated with them but not have corresponding symbols appear on the screen (e.g., a cough, an ending salutation such as “that's all,” commands or instructions to the transcriptionist such as “start a new paragraph,” etc.).
  • The editing device 20 is configured to be used by a transcriptionist to access and edit the draft transcription stored in the database of the database server 24. The editing device 20 includes a computer (e.g., a monitor, keyboard, mouse, memory, a processor, etc.), an attached foot-pedal, and appropriate software such as the EditScript™ software product made by eScription. The transcriptionist can request a dictation job by, e.g., clicking on an on-screen icon. The request is serviced by the database server 24, which finds the dictation for the transcriptionist and transmits the corresponding audio file and the draft transcription text file. The transcriptionist edits the draft using the editing device 20 and sends the edited transcript back to the database server 24. For example, to end the editing the transcriptionist can click on an on-screen icon button to instruct the editing device 20 to send the final edited document to the database server 24 via the network 22, along with a unique identifier for the transcriptionist. With the data sent from the editing device 20, the database in the server 24 contains, for each dictation: a speaker identifier, a transcriptionist identifier, a file pointer to the digital audio signal, and a file pointer to the edited text document.
  • The edited text document can be transmitted directly to a customer's medical record system or accessed over the data network 22 from the database by the administrative console 18. The console 18 may include an administrative console software product such as Emon™ made by eScription.
  • Referring to FIG. 2, components of the editing device 20, e.g., a computer, include a database interaction module 40, a user interface 42, a word processor module 44, an audio playback module 46, an audio file pointer 48, a cursor module 50, a monitor 52, and an audio device 54. A computer implementing portions of the editing device 20 includes a processor and memory that stores appropriate computer-readable, computer-executable software code instructions that can cause the processor to execute appropriate instructions for performing functions described. The monitor 52 and audio device 54, e.g., speakers, are physical components while the other components shown in FIG. 2 are functional components that may be implemented with software, hardware, etc., or combinations thereof. The audio playback module 46, such as a SoundBlaster® card, is attached to the audio output transducer 54 such as speakers or headphones. The transcriptionist can use the audio device 54 (e.g., headphones or a speaker) to listen to audio and can view the monitor 52 to see the corresponding text. The transcriptionist can use the foot pedal 66, the keyboard 62, and/or the mouse 64 to control the audio playback. The database interaction, audio playback, and editing of the draft transcription are accomplished by means of the appropriate software such as the EditScript Client™ software product made by eScription. The editing software is loaded on the editing device computer 20 and configured appropriately for interaction with other components of the editing device 20. The editing software can use a standard word processing software library, such as that provided with Microsoft Word®, in order to load, edit and save documents corresponding to each dictation.
  • The editing software includes the database interaction module 40, the user interface module 42, the word processing module 44, the audio playback module 46, the audio file pointer adjustment module 48, and the multi-cursor control module 50. The control module 50 regulates the interaction between the interface module 42 and the word processor 44, the audio playback module 46, and the audio file pointer 48. The control module 50 regulates the flow of actions relating to processing of a transcription, including playing audio and providing cursors in the transcribed text, as discussed below especially with respect to FIG. 7. The user interface module 42 controls the activity of the other modules and includes keyboard detection 56, mouse detection 58, and foot pedal detection 60 sub-modules for processing input from a keyboard 62, a mouse 64, and a foot-pedal 66. The foot pedal 66 is a standard transcription foot pedal and is connected to the editing device computer through the computer's serial port. The foot pedal 66 preferably includes a “fast forward” portion and a “rewind” portion.
  • The transcriptionist can request a job from the database by selecting an on-screen icon with the mouse 64. The user interface module 42 interprets this mouse click and invokes the database interaction module 40 to request the next job from the database. The database server 24 (FIG. 1) responds by transmitting the audio data file, the draft transcription file, and the token-alignment file to the user interface module 42. With this information, the editing software can initialize a word-processing session by loading the draft text into the word processing module 44.
  • The audio playback module 46 is configured to play the audio file stored in the database. For initial playback, the module 46 plays the audio file sequentially. The playback module 46 can, however, jump to audio corresponding to an indicated portion of the transcription and begin playback from the indicated location. The location may be indicated by a transcriptionist using appropriate portions of the editing device 20 such as the keyboard 62, or the mouse 64 as discussed below. For playback that starts at an indicated location, the playback module 46 uses the token-alignment file to determine the location in the audio file corresponding to the indicated transcription text. Since many audio playback programs play audio in fixed-sized sections (called “frames”), the audio playback module 46 may convert the indicated begin index to the nearest preceding frame for playback. For example, an audio device 54 may play only frames of 128 bytes in length. In this example, the audio playback module uses the token-alignment file to find the nearest prior starting frame that is a multiple of 128 bytes from the beginning of the audio file. Thus, the starting point for audio playback may not correspond precisely to the selected text in the transcription.
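The frame-alignment computation described above, using the 128-byte frame size given in the example, can be sketched as:

```python
# Sketch of snapping a token's begin index to the nearest preceding
# audio frame boundary. The 128-byte frame size comes from the example
# in the text; real audio devices may use other frame sizes.
FRAME_SIZE = 128

def playback_start(begin_index, frame_size=FRAME_SIZE):
    """Return the byte offset of the nearest frame boundary at or
    before begin_index, which is where playback will actually start."""
    return (begin_index // frame_size) * frame_size
```

Because the start is rounded down to a frame boundary, playback may begin slightly before the selected text, which is why the starting point may not correspond precisely to the selection.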
  • The transcriptionist can review and edit a document by appropriately controlling portions of the editing device 20. The transcriptionist can regulate the playback using the foot pedal 66, and listen to the audio corresponding to the text as played by the playback module 46 and converted to sound by the audio device 54. Further, the transcriptionist can move a cursor to a desired portion of the display of the monitor 52 using the keyboard 62 and/or mouse 64, and can make edits at the location of the cursor using the keyboard 62 and/or mouse 64.
  • While the transcriptionist is editing the document, the user interface module 42 can service hardware interrupts from all three of its sub-modules 56, 58, 60. The transcriptionist can use the foot pedal 66 to indicate that the audio should be “rewound,” or “fast-forwarded” to a different time point in the dictation. These foot-pedal presses are serviced as hardware interrupts by the user interface module 42. Most standard key presses and on-document mouse-clicks are sent to the word processing module 44 to perform the document editing functions indicated and to update the monitor display. Some user interaction, however, may be directed to the audio-playback oriented modules 46, 48, 50, e.g., cursor control, audio position control, and/or volume control. The transcriptionist may indicate that editing is complete by clicking another icon. In response to such an indication, the final text file is sent through the database interaction module 40 to the database server 24.
  • Referring also to FIG. 3, the cursor module 50 is configured to provide an audio cursor 70 and a text cursor 72 on the monitor 52 in conjunction with the display of the draft transcription 74 for editing by the transcriptionist. The cursor module 50 provides the cursors 70 and 72 independently.
  • The audio cursor 70, under the control of the cursor module 50, tracks the text in the document 74 as the corresponding audio is played to help the transcriptionist follow along in the text 74 with the corresponding audio. The audio cursor 70 moves in conjunction with the audio, as linked to the text 74 by the token-alignment file, to help the transcriptionist follow the text 74 corresponding to the currently-played audio. In order to highlight the text 74, the audio cursor 70 may take a variety of different forms. For example, as shown in FIG. 3, the audio cursor provides a box 76 around the text of the token corresponding to the audio presently being played. The box 76 may also take a variety of forms to distinguish it from other portions of the document 74, such as a rectangular outline of the box 76, and/or a solid box (e.g., inverse video), and may be of a variety of colors such as red against black letters on a white background. As another example, referring to FIG. 4, the audio cursor 70 may be a box 78 that highlights the entire line (or lines) of text that includes the text of the token corresponding to the audio currently being played. The text cursor 72 could be a box 80, e.g., of a single character in width. A text cursor 73 indicates other possible features of a text cursor, including that a text cursor can highlight an entire word and can be positioned within text highlighted by the audio cursor 70. Further, FIG. 4 illustrates that more than two cursors could be provided. As another example, referring to FIG. 5, the audio cursor 70 could be a vertical line cursor 82 that highlights text, e.g., the beginning of the text of the token currently being played, or the beginning of the line of text including the token currently being played. Other possibilities include using highlighting capabilities or bold characters to transiently emphasize a word, series of words, or line(s) of text. Still other forms of the audio cursor 70 may be used. 
Preferably, the audio cursor 70 is precisely aligned with the currently-played audio, but the cursor 70 may approximate the audio, e.g., with groups of words or one or more entire lines of text being indicated by the audio cursor 70.
  • The text cursor 72 provided by the cursor module 50 indicates the current location for editing in the document 74. The transcriptionist can manipulate the keyboard 62 and/or mouse 64 to control the location of the text cursor 72. The cursor 72 indicates where editing will occur, e.g., addition of text through the keyboard 62, deletion of text, alteration of formatting, insertion of paragraph or page breaks, etc. The transcriptionist can edit the document using the text cursor 72 in standard fashion. The text cursor 72 in combination with the audio cursor 70, however, provides for multi-tasking by the transcriptionist. To make edits, the transcriptionist positions the text cursor 72 in standard fashion and makes the desired change(s).
  • Edits to the text 74 can be made without losing synchronization with the audio. Changes to the text 74 are tracked, with records being made of which characters or other edits are inserted and where, and which characters or other features (e.g., formatting, page breaks, etc.) are removed. Preferably, the word processor 44 implements a track-changes feature, maintaining the original document and storing indications of changes.
  • The track-changes feature implemented by the word processor 44 produces a file of changes (e.g., textual, formatting, etc.) to the original text 74. The information regarding these changes, especially text changes such as different expansions of abbreviations, different spellings, etc., may be used to adapt the speech recognizer 30. In conjunction with the synchronization information provided by the automatic transcription device 30 by means of the token-alignment file, the file of changes provides a useful tool for continuous learning/improvement of speech models used for speech recognition by the automatic transcription device 30.
  • The text cursor 72 may be used to change the location of the audio cursor 70, and thus the audio currently played, through commands (e.g., from the keyboard 62 and/or the mouse 64) implemented by the cursor control module 50. Movement to a different part of the audio is typically implemented by the audio file pointer module 48 by incrementing or decrementing a pointer into the digital audio file. The location of the audio cursor 70, and thus the current audio for playback, however, may be changed using the text cursor 72. The transcriptionist can position the text cursor 72 at the desired portion of the text 74 for audio playback and actuate appropriate commands. For example, the transcriptionist may use one or more hot keys (e.g., a sequence of keys) and/or one or more mouse clicks (e.g., on screen icons) to cause the audio cursor 70 to move to the position of the text cursor 72, with the audio file pointer being adjusted accordingly. The correct position in the audio file is determined by the audio file pointer module 48 by finding the corresponding token in the token-alignment file. The corresponding token may be a nearest, preferably preceding, token that is associated with text in the document 74. Thus, if the transcriptionist attempts to position the audio cursor 70 in text that was added after speech recognition, e.g., added by the transcriptionist, then the audio file pointer module 48 uses track-changes information from the word processor 44 to determine the appropriate token. The module 48 determines that the text at the position of the text cursor 72 is not in the token-alignment file, and finds the token in the token-alignment file that is nearest, and preferably preceding, the inserted text using information regarding the original document from the track-changes information.
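Finding the nearest preceding aligned token for a given text-cursor position can be sketched as below. Representing token positions as sorted character offsets, and the use of a binary search, are assumptions about one possible implementation, not the actual module 48.

```python
# Hedged sketch: given the sorted character offsets (in the original
# draft) at which each aligned token's text begins, find the nearest
# token at or before the cursor.
import bisect

def nearest_preceding_token(token_offsets, cursor_offset):
    """Return the index of the nearest aligned token at or before
    cursor_offset, or None if the cursor precedes all aligned text."""
    i = bisect.bisect_right(token_offsets, cursor_offset) - 1
    return i if i >= 0 else None
```

When the cursor sits in transcriptionist-inserted text, its offset falls between two aligned tokens, and the search returns the preceding one, matching the preference described above.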
  • The text cursor 72 may also be moved to the position of the audio cursor 70. For example, one or more hot keys and/or one or more mouse clicks can be used to cause the text cursor 72 to jump from its current position to a position at, adjacent, or near the position of the audio cursor 70. Thus, for example, if the transcriptionist hears audio and recognizes that the highlighted corresponding text should be edited, then the transcriptionist can cause the text cursor 72 to jump to the location of the audio cursor 70 to quickly position the text cursor 72 for editing of the desired text. Preferably, the text cursor 72 can highlight the text highlighted by the audio cursor 70 such that text entered by the transcriptionist will overwrite the highlighted text, obviating deletion of the text by the transcriptionist and thereby saving time.
  • In operation, referring to FIG. 6, with further reference to FIGS. 1-3, a process 90 for producing and editing a transcription of speech using the system 10 includes the stages shown. The process 90, however, is exemplary only and not limiting. The process 90 may be altered, e.g., by having stages added, removed, or rearranged.
  • At stage 92, the speaker 12 dictates desired speech to be converted to text. The speaker can use, e.g., a hand-held device such as a personal digital assistant, to dictate audio that is transmitted over the network 14 to the voice mailbox 16. The audio is stored in the voice mailbox 16 as an audio file. The audio file is transmitted over the network 22 to the database server 24 and is stored in the database 40.
  • At stage 94, the automatic transcription device 30 transcribes the audio file. The device 30 accesses and retrieves the audio file from the database 40 through the LAN 26. A speech recognizer of the device 30 analyzes the audio file in accordance with speech models to produce a draft text document 74 from the audio file and store the draft document 74 in the database 40. The device 30 also produces a corresponding token-alignment file that includes the draft document 74 and associates portions of the audio file with the transcribed text of the document 74. The device 30 stores the token-alignment file in the database 40 via the LAN 26.
  • At stage 96, the transcriptionist reviews and edits the transcribed draft document 74 as appropriate. The transcriptionist uses the editing device 20 to access the database 40 and retrieve the audio file and the token-alignment file that includes the draft text document 74. The transcriptionist plays the audio file and reviews the corresponding text as highlighted or otherwise indicated by the audio cursor 70 and makes desired edits using the text cursor 72. The reviewing of this stage is detailed below with respect to FIG. 7. The word processor 44 produces and stores track-changes information in response to edits made by the transcriptionist.
  • At stage 98, the track-changes information is provided to the automatic transcription device 30 for use in improving the speech models used by the speech recognizer of the device 30 by analyzing the transcribed draft text and what revisions were made by the transcriptionist. The models can be adjusted so that the next time the speech recognizer analyzes speech that was edited by the transcriptionist, the recognizer will transcribe the same or similar audio to the edited text instead of the draft text previously provided. At stage 100, the word processor provides a final, revised text document as edited by the transcriptionist. This final document can be stored in the database 40 and provided via the network 22 to interested parties, e.g., the speaker that dictated the audio file.
  • Referring to FIG. 7, with further reference to FIGS. 1-3 and 6, a process 110 for reviewing the draft transcribed document 74, stage 96 of FIG. 6, using the editing device 20 includes the stages shown. The process 110, however, is exemplary only and not limiting. The process 110 may be altered, e.g., by having stages added, removed, or rearranged.
  • At stage 112, a token in the token-alignment file is obtained. The next token in the file is obtained in the normal course of audio playback in the absence of transcriptionist input. If, however, the transcriptionist causes a change in the location of the audio cursor, then the token corresponding to the new location of the audio cursor is obtained.
  • At stage 114, the text most nearly associated with the current token is located. This text may be text associated with a token adjacent to the current token, e.g., if the current token does not have text directly associated with it (e.g., a cough). Text entered by the transcriptionist is ignored in determining the most-nearly-associated text.
  • At stage 116, the cursor control module 50 displays the audio cursor 70 to accentuate the text determined to be most nearly associated with the current token. The control module 50 draws the audio cursor 70 to highlight the text, e.g., drawing the cursor 70 around, near, etc., the determined text. The location of the text corresponding to tokens may be determined dynamically as the token-alignment file is stepped through in order to display the audio cursor 70. Alternatively, locations (e.g., within a document or on a screen) for tokens can be determined before stepping through the token-alignment file to play back the audio (e.g., upon loading of the token-alignment file). In this alternative, the locations can be re-calculated for added or removed text (on the fly when the text is changed, after changes are made, in response to a re-determine command, etc.). Other alternatives are also possible.
  • At stage 118, the audio file pointer module 48 determines the position in the audio file corresponding to the current token. The module 48 uses the token-alignment file and the selected token to find the location in the audio file corresponding to the current token.
  • At stage 120, the audio file pointer module 48 selects a portion of the audio file for playback. The module 48 selects a frame of audio associated with the token for submission to the audio playback module 46.
  • At stage 122, the audio playback module 46 controls playback of the selected audio frame. The module 46 provides control signals to the audio device 54 to audibly play the corresponding audio for the transcriptionist to hear.
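The flow of stages 112 through 122 can be sketched as a simple playback loop. The callbacks and record fields below are hypothetical stand-ins for the modules of FIG. 2, and stage 114 is simplified to skip tokens with no associated text rather than borrowing an adjacent token's text.

```python
# Sketch of process 110 as a loop over token-alignment records: locate
# the text for each token, draw the audio cursor there, and play the
# corresponding frame-aligned audio.
def review_loop(tokens, draw_audio_cursor, play_frame, frame_size=128):
    for rec in tokens:                       # stage 112: obtain next token
        if rec.get("text_offset") is None:   # stage 114 (simplified):
            continue                         #   e.g., a cough with no text
        draw_audio_cursor(rec["text_offset"])              # stage 116
        start = (rec["begin"] // frame_size) * frame_size  # stages 118-120
        play_frame(start)                                  # stage 122
```

In practice the loop would be interrupted whenever the transcriptionist repositions the audio cursor, resuming from the token at the new location.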
  • Referring to FIG. 8, with further reference to FIGS. 1-3 and 6-7, a process 130 for editing the draft transcribed document 74, stage 96 of FIG. 6, using the editing device 20 includes the stages shown. The process 130, however, is exemplary only and not limiting. The process 130 may be altered, e.g., by having stages added, removed, or rearranged.
  • At stage 132, the transcriptionist positions the text cursor 72 as desired for editing of the document 74. The transcriptionist can move the text cursor 72 independently of the audio cursor 70, e.g., using the keyboard 62 and/or mouse 64. The transcriptionist may also, or alternatively, move the text cursor 72 dependent upon the audio cursor 70 by causing the text cursor 72 to move to, or near to, the position of the audio cursor 70.
  • At stage 134, the audio corresponding to the location of the text cursor 72 is played if the audio cursor 70 is synched to the text cursor 72. If the transcriptionist causes the audio cursor 70 to move to the location of the text cursor 72, then the audio for the new location of the audio cursor 70 is preferably played to assist the transcriptionist in determining whether edits to the text are desired.
  • At stage 136, desired edits to the text 74 at the location of the text cursor 72 are made by the transcriptionist. With the text cursor 72 placed as desired, edits can be made as indicated by the transcriptionist (e.g., using the keyboard 62) and implemented by the word processor 44. The audio may continue to play while changes are being made at the location of the text cursor 72. The transcriptionist may, however, stop the audio playback using, e.g., the foot pedal 66, keyboard commands, etc. The audio playback may be managed independently of editing of the text 74.
  • Other embodiments are within the scope and spirit of the appended claims. For example, due to the nature of software, functions described above can be implemented using software, hardware, firmware, hardwiring, or combinations of any of these. Features implementing functions may also be physically located at various positions, including being distributed such that portions of functions are implemented at different physical locations. Further, while two cursors were discussed above, more than two cursors could be employed and implemented by the cursor control module 50. For example, there could be an audio cursor and multiple text cursors, e.g., one controlled by the mouse 64 and one controlled by the keyboard 62. Other arrangements and numbers of cursors could be implemented.

Claims (23)

What is claimed is:
1. A device for use by a transcriptionist in a transcription editing system for editing transcriptions dictated by speakers, the device comprising, in combination:
a monitor configured to display visual text of transcribed dictations;
an audio mechanism configured to cause playback of portions of an audio file associated with a dictation; and
a cursor-control module coupled to the audio mechanism and to the monitor and configured to cause the monitor to display multiple cursors in the text.
2. The device of claim 1 wherein the cursor-control module is configured to cause the monitor to display multiple cursors in the text that indicate different functionality.
3. The device of claim 2 wherein the cursor-control module is configured to cause the monitor to display:
an audio cursor accentuating a portion of the text, the audio cursor accentuating different text as the audio file is played using the audio mechanism; and
a text cursor indicative of a position in the text where editing commands will be implemented.
4. The device of claim 3 wherein the audio cursor comprises at least one of a rectangular box surrounding text corresponding to a portion of the audio file, a rectangular box surrounding a line of text, a vertical line, an inverse-video portion of the monitor, and bolding of a portion of the text.
5. The device of claim 3 wherein the cursor-control module is configured to determine where to cause the monitor to display the audio cursor by using a token-alignment file that associates portions of the audio file with portions of the text.
6. The device of claim 3 wherein the cursor-control module is configured to move at least one of the audio cursor and the text cursor to a location of the other of the text cursor and the audio cursor, respectively.
7. The device of claim 6 wherein the audio mechanism is configured to determine and play a portion of the audio file corresponding to text at the location of the audio cursor when the audio cursor is moved to the location of the text cursor.
8. The device of claim 1 further comprising a change-recording apparatus configured to record changes made to the text and associate the changes with portions of the audio file whereby the recorded changes can be used to adapt speech recognition apparatus in accordance with the changed text and the associated portions of the audio file.
9. A computer program product residing on a computer-readable medium and comprising computer-readable instructions for causing a computer to:
display visual text of transcribed dictations;
cause playback of portions of an audio file associated with a dictation; and
cause the monitor to display multiple cursors in the text.
10. The computer program product of claim 9 wherein the instructions are configured to cause the monitor to display:
an audio cursor accentuating a portion of the text with the audio cursor accentuating different text as the audio file is played; and
a text cursor indicative of a position in the text where editing commands will be implemented.
11. The computer program product of claim 10 wherein the cursor-control module is configured to determine where to cause the monitor to display the audio cursor by using a token-alignment file that associates portions of the audio file with portions of the text.
12. The computer program product of claim 10 further comprising instructions for causing the computer to move at least one of the audio cursor and the text cursor to a location of the other of the text cursor and the audio cursor, respectively.
13. The computer program product of claim 12 further comprising instructions for causing the computer to determine and cause playing of a portion of the audio file corresponding to text at the location of the audio cursor when the audio cursor is moved to the location of the text cursor.
14. The computer program product of claim 9 further comprising instructions for causing the computer to record changes made to the text and associate the changes with portions of the audio file whereby the recorded changes can be used to adapt speech recognition apparatus in accordance with the changed text and the associated portions of the audio file.
15. A method of processing text transcribed from an audio file, the method comprising:
displaying text of a transcribed dictation on a monitor;
playing portions of an audio file associated with the dictation;
displaying an audio cursor in the text on the monitor, the audio cursor accentuating a portion of the text with the audio cursor accentuating different text as the audio file is played; and
displaying a text cursor in the text on the monitor, the text cursor being indicative of a position in the text where editing commands will be implemented.
16. The method of claim 15 further comprising using a token-alignment file that associates portions of the audio file with portions of the text to determine where to display the audio cursor.
17. The method of claim 15 further comprising moving at least one of the audio cursor and the text cursor to a location of the other of the text cursor and the audio cursor, respectively, in response to receiving a corresponding command.
18. The method of claim 17 further comprising playing of a portion of the audio file corresponding to text at the location of the audio cursor if the audio cursor is moved to the location of the text cursor.
19. The method of claim 15 further comprising:
recording changes made to the text; and
associating the changes with portions of the audio file.
20. The method of claim 19 further comprising using the recorded changes to adapt speech recognition apparatus in accordance with the changed text and the associated portions of the audio file.
21. A method of processing a recorded dictation, the method comprising:
analyzing the recorded dictation in accordance with speech models to convert the recorded dictation to a draft text;
storing the draft text; and
producing and recording a token-alignment file that associates portions of the draft text with portions of the audio file, the token-alignment file including tokens at least some of which are indicative of portions of the draft text, the tokens indicating beginnings and ends of portions of the recorded dictation associated with the portions of the draft text such that the portions of the recorded dictation are associated with corresponding portions of the draft text even if the corresponding portions of the draft text, if spoken, do not correspond identically to the corresponding portions of the recorded dictation.
22. The method of claim 21 wherein producing and recording the token-alignment file includes producing and recording tokens for which there is no corresponding draft text.
23. The method of claim 21 further comprising:
receiving a revised text associated with the recorded dictation; and
using indicia of differences between the revised text and the draft text and the associated recorded dictation to modify the speech models for converting other recorded dictations to other draft texts.
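The token-alignment file recited in claims 5, 11, 16, and 21-23 can be pictured as a table of audio intervals mapped to draft-text portions, including tokens with no corresponding text (claim 22). The sketch below is illustrative only and not the patent's implementation; all names, timings, and text are hypothetical.

```python
# Hypothetical token-alignment table: each token associates an audio
# interval [start, end) in seconds with a draft-text portion, or with
# None for audio that has no corresponding draft text (e.g., a pause).
from bisect import bisect_right

token_alignment = [
    (0.0, 0.4, "The"),
    (0.4, 0.9, "patient"),
    (0.9, 1.2, None),        # silence token: audio with no draft text
    (1.2, 1.8, "presented"),
]

starts = [t[0] for t in token_alignment]  # sorted interval start times

def text_at(playback_time):
    """Return the draft-text portion the audio cursor should accentuate
    at a given playback time, or None if no text corresponds."""
    i = bisect_right(starts, playback_time) - 1
    if i < 0:
        return None
    start, end, text = token_alignment[i]
    return text if start <= playback_time < end else None

print(text_at(0.5))   # patient
print(text_at(1.0))   # None -- the pause token has no text
```

A lookup like this lets the audio cursor accentuate different text as playback proceeds, and, run in reverse (text position to audio interval), lets playback jump to the audio corresponding to a chosen text location.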
US13/934,527 2004-06-02 2013-07-03 Multi-cursor transcription editing Abandoned US20130298016A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US13/934,527 US20130298016A1 (en) 2004-06-02 2013-07-03 Multi-cursor transcription editing

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US10/859,889 US8504369B1 (en) 2004-06-02 2004-06-02 Multi-cursor transcription editing
US13/934,527 US20130298016A1 (en) 2004-06-02 2013-07-03 Multi-cursor transcription editing

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US10/859,889 Continuation US8504369B1 (en) 2004-06-02 2004-06-02 Multi-cursor transcription editing

Publications (1)

Publication Number Publication Date
US20130298016A1 true US20130298016A1 (en) 2013-11-07

Family

ID=48876463

Family Applications (2)

Application Number Title Priority Date Filing Date
US10/859,889 Active - Reinstated 2027-11-04 US8504369B1 (en) 2004-06-02 2004-06-02 Multi-cursor transcription editing
US13/934,527 Abandoned US20130298016A1 (en) 2004-06-02 2013-07-03 Multi-cursor transcription editing

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US10/859,889 Active - Reinstated 2027-11-04 US8504369B1 (en) 2004-06-02 2004-06-02 Multi-cursor transcription editing

Country Status (1)

Country Link
US (2) US8504369B1 (en)


Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7958443B2 (en) 2003-02-28 2011-06-07 Dictaphone Corporation System and method for structuring speech recognized text into a pre-selected document format
US8200487B2 (en) 2003-11-21 2012-06-12 Nuance Communications Austria Gmbh Text segmentation and label assignment with user interaction by means of topic specific language models and topic-specific label statistics
JP5133678B2 (en) * 2007-12-28 2013-01-30 株式会社ベネッセコーポレーション Video playback system and control method thereof
US11138363B2 (en) * 2009-04-15 2021-10-05 Gary Siegel Computerized method and computer program for displaying and printing markup
CN102314874A (en) * 2010-06-29 2012-01-11 鸿富锦精密工业(深圳)有限公司 Text-to-voice conversion system and method
US9645986B2 (en) 2011-02-24 2017-05-09 Google Inc. Method, medium, and system for creating an electronic book with an umbrella policy
US9141404B2 (en) 2011-10-24 2015-09-22 Google Inc. Extensible framework for ereader tools
US9031493B2 (en) 2011-11-18 2015-05-12 Google Inc. Custom narration of electronic books
US9311286B2 (en) * 2012-01-25 2016-04-12 International Business Machines Corporation Intelligent automatic expansion/contraction of abbreviations in text-based electronic communications
US20130268826A1 (en) * 2012-04-06 2013-10-10 Google Inc. Synchronizing progress in audio and text versions of electronic books
US9047356B2 (en) 2012-09-05 2015-06-02 Google Inc. Synchronizing multiple reading positions in electronic books
WO2015094158A1 (en) 2013-12-16 2015-06-25 Hewlett-Packard Development Company, L.P. Determining preferred communication explanations using record-relevancy tiers
WO2015100172A1 (en) * 2013-12-27 2015-07-02 Kopin Corporation Text editing with gesture control and natural speech
US9953646B2 (en) 2014-09-02 2018-04-24 Belleau Technologies Method and system for dynamic speech recognition and tracking of prewritten script
GB2553960A (en) 2015-03-13 2018-03-21 Trint Ltd Media generating and editing system
US10579743B2 (en) 2016-05-20 2020-03-03 International Business Machines Corporation Communication assistant to bridge incompatible audience
KR20180012464A (en) * 2016-07-27 2018-02-06 삼성전자주식회사 Electronic device and speech recognition method thereof
US10445052B2 (en) 2016-10-04 2019-10-15 Descript, Inc. Platform for producing and delivering media content
US10564817B2 (en) * 2016-12-15 2020-02-18 Descript, Inc. Techniques for creating and presenting media content
US11568231B2 (en) * 2017-12-08 2023-01-31 Raytheon Bbn Technologies Corp. Waypoint detection for a contact center analysis system
JP2022533310A (en) 2019-04-09 2022-07-22 ジャイブワールド, エスピーシー A system and method for simultaneously expressing content in a target language in two forms and improving listening comprehension of the target language
US10839039B1 (en) 2019-12-12 2020-11-17 Capital One Services, Llc Webpage accessibility compliance


Family Cites Families (81)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3676856A (en) 1970-08-11 1972-07-11 Ron Manly Automatic editing system and method
US3648249A (en) 1970-12-08 1972-03-07 Ibm Audio-responsive visual display system incorporating audio and digital information segmentation and coordination
US4701130A (en) 1985-01-11 1987-10-20 Access Learning Technology Corporation Software training system
US4637797A (en) 1985-01-11 1987-01-20 Access Learning Technology Corporation Software training system
US5146439A (en) 1989-01-04 1992-09-08 Pitney Bowes Inc. Records management system having dictation/transcription capability
US5519808A (en) 1993-03-10 1996-05-21 Lanier Worldwide, Inc. Transcription interface for a word processing station
US5369704A (en) 1993-03-24 1994-11-29 Engate Incorporated Down-line transcription system for manipulating real-time testimony
US5602982A (en) 1994-09-23 1997-02-11 Kelly Properties, Inc. Universal automated training and testing software system
US5812882A (en) 1994-10-18 1998-09-22 Lanier Worldwide, Inc. Digital dictation system having a central station that includes component cards for interfacing to dictation stations and transcription stations and for processing and storing digitized dictation segments
US5960447A (en) 1995-11-13 1999-09-28 Holt; Douglas Word tagging and editing system for speech recognition
US5911485A (en) 1995-12-11 1999-06-15 Unwired Planet, Inc. Predictive data entry method for a keypad
US5898830A (en) 1996-10-17 1999-04-27 Network Engineering Software Firewall providing enhanced network security and user transparency
US5748888A (en) 1996-05-29 1998-05-05 Compaq Computer Corporation Method and apparatus for providing secure and private keyboard communications in computer systems
EP0811906B1 (en) 1996-06-07 2003-08-27 Hewlett-Packard Company, A Delaware Corporation Speech segmentation
US5664896A (en) 1996-08-29 1997-09-09 Blumberg; Marvin R. Speed typing apparatus and method
US5875429A (en) 1997-05-20 1999-02-23 Applied Voice Recognition, Inc. Method and apparatus for editing documents through voice recognition
US5974413A (en) 1997-07-03 1999-10-26 Activeword Systems, Inc. Semantic user interface
US6141011A (en) 1997-08-04 2000-10-31 Starfish Software, Inc. User interface methodology supporting light data entry for microprocessor device having limited user input
DE69806780T2 (en) 1997-09-25 2003-03-13 Tegic Communications Inc SYSTEM FOR SUPPRESSING AMBIANCE IN A REDUCED KEYBOARD
US6195637B1 (en) 1998-03-25 2001-02-27 International Business Machines Corp. Marking and deferring correction of misrecognition errors
US6064965A (en) 1998-09-02 2000-05-16 International Business Machines Corporation Combined audio playback in speech recognition proofreader
US6338038B1 (en) 1998-09-02 2002-01-08 International Business Machines Corp. Variable speed audio playback in speech recognition proofreader
US6457031B1 (en) 1998-09-02 2002-09-24 International Business Machines Corp. Method of marking previously dictated text for deferred correction in a speech recognition proofreader
US6374225B1 (en) 1998-10-09 2002-04-16 Enounce, Incorporated Method and apparatus to prepare listener-interest-filtered works
US6122614A (en) 1998-11-20 2000-09-19 Custom Speech Usa, Inc. System and method for automating transcription services
US6363342B2 (en) * 1998-12-18 2002-03-26 Matsushita Electric Industrial Co., Ltd. System for developing word-pronunciation pairs
US6415256B1 (en) * 1998-12-21 2002-07-02 Richard Joseph Ditzik Integrated handwriting and speech recognition systems
US6802041B1 (en) 1999-01-20 2004-10-05 Perfectnotes Corporation Multimedia word processor
US20030004724A1 (en) * 1999-02-05 2003-01-02 Jonathan Kahn Speech recognition program mapping tool to align an audio file to verbatim text
US6961699B1 (en) * 1999-02-19 2005-11-01 Custom Speech Usa, Inc. Automated transcription system and method using two speech converting instances and computer-assisted correction
US6434523B1 (en) 1999-04-23 2002-08-13 Nuance Communications Creating and editing grammars for speech recognition graphically
US6611802B2 (en) 1999-06-11 2003-08-26 International Business Machines Corporation Method and system for proofreading and correcting dictated text
US6370503B1 (en) * 1999-06-30 2002-04-09 International Business Machines Corp. Method and apparatus for improving speech recognition accuracy
JP2001043062A (en) 1999-07-27 2001-02-16 Nec Corp Personal computer, volume control method thereof, and recording medium
US6704709B1 (en) * 1999-07-28 2004-03-09 Custom Speech Usa, Inc. System and method for improving the accuracy of a speech recognition program
US6865258B1 (en) 1999-08-13 2005-03-08 Intervoice Limited Partnership Method and system for enhanced transcription
US6542091B1 (en) 1999-10-01 2003-04-01 Wayne Allen Rasanen Method for encoding key assignments for a data input device
JP2003518266A (en) 1999-12-20 2003-06-03 コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ Speech reproduction for text editing of speech recognition system
US7082615B1 (en) 2000-03-31 2006-07-25 Intel Corporation Protecting software environment in isolated execution
US6912498B2 (en) 2000-05-02 2005-06-28 Scansoft, Inc. Error correction in speech recognition by correcting text around selected area
ATE480100T1 (en) * 2000-06-09 2010-09-15 British Broadcasting Corp GENERATION OF SUBTITLES FOR MOVING IMAGES
US7624356B1 (en) 2000-06-21 2009-11-24 Microsoft Corporation Task-sensitive methods and systems for displaying command sets
US6950994B2 (en) 2000-08-31 2005-09-27 Yahoo! Inc. Data list transmutation and input mapping
US7236932B1 (en) * 2000-09-12 2007-06-26 Avaya Technology Corp. Method of and apparatus for improving productivity of human reviewers of automatically transcribed documents generated by media conversion systems
US6993246B1 (en) 2000-09-15 2006-01-31 Hewlett-Packard Development Company, L.P. Method and system for correlating data streams
US6975985B2 (en) * 2000-11-29 2005-12-13 International Business Machines Corporation Method and system for the automatic amendment of speech recognition vocabularies
CA2328566A1 (en) 2000-12-15 2002-06-15 Ibm Canada Limited - Ibm Canada Limitee System and method for providing language-specific extensions to the compare facility in an edit system
US7735021B2 (en) 2001-02-16 2010-06-08 Microsoft Corporation Shortcut system for use in a mobile electronic device and method thereof
ATE317583T1 (en) 2001-03-29 2006-02-15 Koninkl Philips Electronics Nv TEXT EDITING OF RECOGNIZED LANGUAGE WITH SIMULTANEOUS PLAYBACK
JP5093966B2 (en) 2001-03-29 2012-12-12 ニュアンス コミュニケーションズ オーストリア ゲーエムベーハー Alignment of voice cursor and text cursor during editing
US6834264B2 (en) * 2001-03-29 2004-12-21 Provox Technologies Corporation Method and apparatus for voice dictation and document production
US20030007018A1 (en) 2001-07-09 2003-01-09 Giovanni Seni Handwriting user interface for personal digital assistants and the like
US7152213B2 (en) 2001-10-04 2006-12-19 Infogation Corporation System and method for dynamic key assignment in enhanced user interface
CN1312657C (en) 2001-10-12 2007-04-25 皇家飞利浦电子股份有限公司 Speech recognition device to mark parts of a recognized text
DE60211197T2 (en) 2001-10-31 2007-05-03 Koninklijke Philips Electronics N.V. METHOD AND DEVICE FOR THE CONVERSION OF SPOKEN TEXTS AND CORRECTION OF THE RECOGNIZED TEXTS
US7196691B1 (en) 2001-11-14 2007-03-27 Bruce Martin Zweig Multi-key macros to speed data input
US7292975B2 (en) 2002-05-01 2007-11-06 Nuance Communications, Inc. Systems and methods for evaluating speaker suitability for automatic speech recognition aided transcription
US7236931B2 (en) 2002-05-01 2007-06-26 Usb Ag, Stamford Branch Systems and methods for automatic acoustic speaker adaptation in computer-assisted transcription systems
US6986106B2 (en) 2002-05-13 2006-01-10 Microsoft Corporation Correction widget
EP1514251A2 (en) 2002-05-24 2005-03-16 SMTM Technologies LLC Method and system for skills-based testing and training
US7260529B1 (en) 2002-06-25 2007-08-21 Lengen Nicholas D Command insertion system and method for voice recognition applications
US7137076B2 (en) 2002-07-30 2006-11-14 Microsoft Corporation Correcting recognition results associated with user input
US6763320B2 (en) 2002-08-15 2004-07-13 International Business Machines Corporation Data input device for individuals with limited hand function
US7206938B2 (en) 2002-09-24 2007-04-17 Imagic Software, Inc. Key sequence rhythm recognition system and method
US20080034218A1 (en) 2002-09-24 2008-02-07 Bender Steven L Key sequence rhythm guidance recognition system and method
US7016844B2 (en) 2002-09-26 2006-03-21 Core Mobility, Inc. System and method for online transcription services
US7515903B1 (en) * 2002-10-28 2009-04-07 At&T Mobility Ii Llc Speech to message processing
US7580838B2 (en) 2002-11-22 2009-08-25 Scansoft, Inc. Automatic insertion of non-verbalized punctuation
US7516070B2 (en) * 2003-02-19 2009-04-07 Custom Speech Usa, Inc. Method for simultaneously creating audio-aligned final and verbatim text with the assistance of a speech recognition program as may be useful in form completion using a verbal entry method
US7958443B2 (en) * 2003-02-28 2011-06-07 Dictaphone Corporation System and method for structuring speech recognized text into a pre-selected document format
US7107397B2 (en) * 2003-05-29 2006-09-12 International Business Machines Corporation Magnetic tape data storage system buffer management
GB2405728A (en) 2003-09-03 2005-03-09 Business Integrity Ltd Punctuation of automated documents
WO2005086005A1 (en) 2004-03-05 2005-09-15 Secure Systems Limited Partition access control system and method for controlling partition access
US7382359B2 (en) 2004-06-07 2008-06-03 Research In Motion Limited Smart multi-tap text input
US7508324B2 (en) 2004-08-06 2009-03-24 Daniel Suraqui Finger activated reduced keyboard and a method for performing text input
US20060176283A1 (en) 2004-08-06 2006-08-10 Daniel Suraqui Finger activated reduced keyboard and a method for performing text input
KR100713128B1 (en) 2004-11-08 2007-05-02 주식회사 비젯 Device and System for preventing virus
EP1864455A2 (en) 2005-03-29 2007-12-12 Glowpoint, Inc. Video communication call authorization
FI20050561A0 (en) 2005-05-26 2005-05-26 Nokia Corp Processing of packet data in a communication system
US20070143857A1 (en) 2005-12-19 2007-06-21 Hazim Ansari Method and System for Enabling Computer Systems to Be Responsive to Environmental Changes
US9904809B2 (en) 2006-02-27 2018-02-27 Avago Technologies General Ip (Singapore) Pte. Ltd. Method and system for multi-level security initialization and configuration

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5857212A (en) * 1995-07-06 1999-01-05 Sun Microsystems, Inc. System and method for horizontal alignment of tokens in a structural representation program editor
US5875448A (en) * 1996-10-08 1999-02-23 Boys; Donald R. Data stream editing system including a hand-held voice-editing apparatus having a position-finding enunciator
US6076059A (en) * 1997-08-29 2000-06-13 Digital Equipment Corporation Method for aligning text with audio signals
US20020095291A1 (en) * 2001-01-12 2002-07-18 International Business Machines Corporation Method for incorporating multiple cursors in a speech recognition system
US6963840B2 (en) * 2001-01-12 2005-11-08 International Business Machines Corporation Method for incorporating multiple cursors in a speech recognition system

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150331600A1 (en) * 2014-05-15 2015-11-19 Samsung Electronics Co., Ltd. Operating method using an input control object and electronic device supporting the same
US20170124045A1 (en) * 2015-11-02 2017-05-04 Microsoft Technology Licensing, Llc Generating sound files and transcriptions for use in spreadsheet applications
US9934215B2 (en) * 2015-11-02 2018-04-03 Microsoft Technology Licensing, Llc Generating sound files and transcriptions for use in spreadsheet applications
US11630947B2 (en) 2015-11-02 2023-04-18 Microsoft Technology Licensing, Llc Compound data objects
US11321520B2 (en) 2015-11-02 2022-05-03 Microsoft Technology Licensing, Llc Images on charts
US11106865B2 (en) 2015-11-02 2021-08-31 Microsoft Technology Licensing, Llc Sound on charts
US10503824B2 (en) 2015-11-02 2019-12-10 Microsoft Technology Licensing, Llc Video on charts
US10579724B2 (en) 2015-11-02 2020-03-03 Microsoft Technology Licensing, Llc Rich data types
US11080474B2 (en) 2015-11-02 2021-08-03 Microsoft Technology Licensing, Llc Calculations on sound associated with cells in spreadsheets
US10997364B2 (en) 2015-11-02 2021-05-04 Microsoft Technology Licensing, Llc Operations on sound files associated with cells in spreadsheets
US10990266B2 (en) 2017-10-23 2021-04-27 Google Llc Method and system for generating transcripts of patient-healthcare provider conversations
US10719222B2 (en) * 2017-10-23 2020-07-21 Google Llc Method and system for generating transcripts of patient-healthcare provider conversations
US20190121532A1 (en) * 2017-10-23 2019-04-25 Google Llc Method and System for Generating Transcripts of Patient-Healthcare Provider Conversations
US11442614B2 (en) * 2017-10-23 2022-09-13 Google Llc Method and system for generating transcripts of patient-healthcare provider conversations
US11650732B2 (en) 2017-10-23 2023-05-16 Google Llc Method and system for generating transcripts of patient-healthcare provider conversations
CN107967097A (en) * 2017-12-05 2018-04-27 北京小米移动软件有限公司 Method for editing text and device
US10657202B2 (en) * 2017-12-11 2020-05-19 International Business Machines Corporation Cognitive presentation system and method
US20190179892A1 (en) * 2017-12-11 2019-06-13 International Business Machines Corporation Cognitive presentation system and method
US11443646B2 (en) * 2017-12-22 2022-09-13 Fathom Technologies, LLC E-Reader interface system with audio and highlighting synchronization for digital books
US11657725B2 (en) 2017-12-22 2023-05-23 Fathom Technologies, LLC E-reader interface system with audio and highlighting synchronization for digital books
US11093696B2 (en) * 2018-04-13 2021-08-17 Young Seok HWANG Playable text editor and editing method therefor
US11670291B1 (en) * 2019-02-22 2023-06-06 Suki AI, Inc. Systems, methods, and storage media for providing an interface for textual editing through speech

Also Published As

Publication number Publication date
US8504369B1 (en) 2013-08-06

Similar Documents

Publication Publication Date Title
US8504369B1 (en) Multi-cursor transcription editing
US11704434B2 (en) Transcription data security
US11586808B2 (en) Insertion of standard text in transcription
US11650732B2 (en) Method and system for generating transcripts of patient-healthcare provider conversations
US11894140B2 (en) Interface for patient-provider conversation and auto-generation of note or summary
US9632992B2 (en) Transcription editing
US7516070B2 (en) Method for simultaneously creating audio-aligned final and verbatim text with the assistance of a speech recognition program as may be useful in form completion using a verbal entry method
US8280735B2 (en) Transcription data extraction
US7979281B2 (en) Methods and systems for creating a second generation session file
US20060190249A1 (en) Method for comparing a transcribed text file with a previously created file
US20080255837A1 (en) Method for locating an audio segment within an audio file
US7274775B1 (en) Transcription playback speed setting

Legal Events

Date Code Title Description
AS Assignment

Owner name: ESCRIPTION, INC., MASSACHUSETTS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ZIMMERMAN, ROGER S.;CHEMIN, DANIEL;BRODY, EDWARD;AND OTHERS;REEL/FRAME:035863/0215

Effective date: 20080310

Owner name: NUANCE COMMUNICATIONS, INC., MASSACHUSETTS

Free format text: MERGER;ASSIGNOR:ESCRIPTION, INC.;REEL/FRAME:035944/0507

Effective date: 20080407

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

AS Assignment

Owner name: NUANCE COMMUNICATIONS, INC., MASSACHUSETTS

Free format text: NUNC PRO TUNC ASSIGNMENT;ASSIGNOR:ESCRIPTION, INC.;REEL/FRAME:055042/0246

Effective date: 20210126