Meeting Assistant

Meeting Assistant (MA) is designed to help users in meeting environments. It focuses on collecting meeting speech and applying basic natural language discourse and dialog understanding. Outputs from this component can be integrated with other delivered systems and research efforts for tasks such as (1) indexing the meeting speech for later retrieval or (2) performing further language processing given new processing techniques or additional background knowledge.

 

Overview

Professionals spend a great deal of time in meetings. Unfortunately, much valuable information from meetings is lost due to inadequate information capture during the meetings. Meeting Assistant (MA) can greatly boost productivity by seamlessly capturing and organizing the content of a meeting: what was said, who said it, and what was written. MA captures the audio and other artifacts from a meeting. From this, it creates an integrated transcript that serves as a permanent record, which can be searched and quoted like other textual documents, enabling search-engine analysis of archived meetings. MA benefits both individuals who need a refresher on meetings that they attended and individuals who are seeking insight on meetings that they did not attend.

MA is offered in two configurations:

  • Configuration I (Mercury) is the end-to-end Meeting Assistant system. Mercury records meeting artifacts (such as chat sessions, shared documents, and drawings) and transcribes speech for multiple participants, separated by speaker. The transcribed text is linked to the corresponding audio. The results can be accessed via a web browser.
  • Configuration II (Transcriber) transcribes independently recorded speech files using the Mercury speech recognition APIs. In essence, Transcriber is a stand-alone software service for transcribing speech from meetings. Unlike Mercury (MA Configuration I), Transcriber does not separate its transcriptions by speaker.

Transcripts are generated for off-line perusal; generating them typically takes about four times the duration of the meeting. MA builds on technologies spanning speech recognition, natural language understanding, discourse understanding, and multiparty dialogue understanding.

Prerequisites

Note: MA Configurations I and II require a software module that provides automatic speech recognition (ASR) through the specified speech recognition XML-RPC APIs. To date, MA has been used exclusively with the ASR provided by SRI’s Decipher software. Decipher is proprietary software, which can be bundled with MA via special licensing agreements. Interested users may inquire about a license by contacting support@pal.sri.com. Other ASR software may be used if it is wrapped appropriately to provide similar XML-RPC interfaces.
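As a rough illustration of what wrapping an ASR engine behind an XML-RPC interface can look like, the sketch below uses Python's standard xmlrpc modules. It is not the MA speech recognition API: the method name "transcribe", the binary audio payload, and the placeholder recognizer are all assumptions made for this example. A real wrapper would have to expose the methods and signatures defined in the MA XML-RPC specification.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


def recognize(audio):
    """Placeholder recognizer (hypothetical).

    A real wrapper would hand audio.data to the ASR engine here and
    return the engine's transcript instead of this fixed message.
    """
    return f"placeholder transcript ({len(audio.data)} bytes of audio received)"


def start_server(host="127.0.0.1", port=0):
    """Start the XML-RPC wrapper in a background thread; return (server, url)."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    # "transcribe" is an assumed method name, not taken from the MA API.
    server.register_function(recognize, "transcribe")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    actual_host, actual_port = server.server_address
    return server, f"http://{actual_host}:{actual_port}/"


if __name__ == "__main__":
    server, url = start_server()
    try:
        proxy = xmlrpc.client.ServerProxy(url)
        # Send one second of silent 16 kHz, 16-bit mono audio as a binary value.
        print(proxy.transcribe(xmlrpc.client.Binary(b"\x00" * 32000)))
    finally:
        server.shutdown()
        server.server_close()
```

Port 0 asks the operating system for any free port, which keeps the sketch from colliding with other services. The wrapper has no authentication, so like the MA control API it should only run on a network that provides its own security.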

MA Configuration I (Mercury)

The Mercury Client requires:

    • Windows XP
    • Java SE 6
    • Flash-enabled web browser (to listen to audio from the meeting via the browser)
    • High-speed internet connection to the Mercury Server and Meeting Browser
    • A close-talking microphone for collecting meeting speech (e.g., Sennheiser PC25 or PC35) is strongly suggested

Server prerequisites are outlined in the MA Framework Server Installation Guide.

MA Configuration II (Transcriber)

Speech must be uploaded as a zip file that contains one or more “*.wav” files for processing. The .wav files must be in Microsoft PCM WAVE format: 16 kHz, mono, 16-bit, RIFF (little-endian) data. Other audio formats must be converted to this format before upload.
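Before uploading, it can help to check an archive against the format described above. The following is a minimal sketch using only the Python standard library; the function names check_wav and check_zip are made up for this example and are not part of MA. Python's wave module reads only uncompressed PCM data in RIFF containers, so files in other encodings are reported as unreadable.

```python
import io
import wave
import zipfile

# Format stated in the text: PCM WAVE, 16 kHz, mono, 16-bit.
EXPECTED_RATE_HZ = 16000
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits


def check_wav(data):
    """Return a list of format problems for one WAV file (empty if none)."""
    problems = []
    try:
        with wave.open(io.BytesIO(data), "rb") as wav:
            if wav.getframerate() != EXPECTED_RATE_HZ:
                problems.append(
                    f"sample rate is {wav.getframerate()} Hz, expected {EXPECTED_RATE_HZ}"
                )
            if wav.getnchannels() != EXPECTED_CHANNELS:
                problems.append(
                    f"{wav.getnchannels()} channels, expected mono"
                )
            if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
                problems.append(
                    f"{8 * wav.getsampwidth()}-bit samples, expected 16-bit"
                )
    except (wave.Error, EOFError) as exc:
        # Raised for non-RIFF data, non-PCM encodings, or truncated files.
        problems.append(f"not a readable PCM WAVE file: {exc}")
    return problems


def check_zip(zip_file):
    """Map each .wav member of the archive to its list of problems.

    zip_file may be a path or a binary file-like object.
    """
    report = {}
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".wav"):
                report[name] = check_wav(archive.read(name))
    return report


if __name__ == "__main__":
    import sys

    for member, issues in check_zip(sys.argv[1]).items():
        print(member, "OK" if not issues else "; ".join(issues))
```

Run it with the path to the zip file as the only argument. An archive is ready for upload when every listed member prints OK.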

Limitations

MA Configuration I (Mercury)

  • The Mercury client focuses on close-talking speech recognition. If other speech-collection procedures are used (far-field microphones, for example), then the software performing the speech recognition will likely need to have its models adjusted.
  • The Mercury client software can support only 20 to 25 locations concurrently in a single meeting (using hold-to-talk mode). This is a limit on locations, not on users: many users can be in the same location at once (e.g., “Conference Room”). Because each of them hears the same mix from the server, the overhead of adding more users to a location is small.
  • The Meeting Browser transmits username/password in plain text over HTTP instead of using the more secure HTTPS. Similarly, the XML-RPC control API has no built-in security, so it should run in a network that independently provides security (e.g., a closed network or behind an appropriate firewall).
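The distinction between locations and users in the Mercury limits above can be made concrete with a short sketch. The data layout (pairs of user and location), the function names, and the choice of 20 as a conservative ceiling are assumptions for illustration only; they do not come from the Mercury software.

```python
# Conservative end of the 20 to 25 location range stated in the limitations.
MAX_LOCATIONS = 20


def count_locations(participants):
    """Count distinct locations among (user, location) pairs."""
    return len({location for _user, location in participants})


def fits_in_one_meeting(participants, max_locations=MAX_LOCATIONS):
    """True if the meeting stays within the concurrent-location limit."""
    return count_locations(participants) <= max_locations


if __name__ == "__main__":
    # 40 users sharing 3 rooms: many users, few locations.
    shared_rooms = [(f"user{i}", f"Room {i % 3}") for i in range(40)]
    # 30 users each dialing in from a separate location.
    all_remote = [(f"user{i}", f"Desk {i}") for i in range(30)]

    print(count_locations(shared_rooms), fits_in_one_meeting(shared_rooms))
    print(count_locations(all_remote), fits_in_one_meeting(all_remote))
```

The first meeting has more users than the second but only three locations, so it stays within the limit, while the second exceeds it.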

MA Configuration II (Transcriber)

  • Even if multiple submitted files belong to the same meeting (e.g., each file is a separate user’s audio track), the transcribed output is not merged; each file remains a separate transcript. This is because a “.wav” file alone does not include the timing information needed for proper alignment.

Object code: DISTAR 14459 – Approved for Public Release, Distribution Unlimited
User guide, API and examples: DISTAR 16637 – Approved for Public Release, Distribution Unlimited