Control of ASR and TTS Servers BOF (cats)

Thursday, March 21 at 1300-1500
================================

CHAIRS: Eric Burger
        David Oran

Mailing Lists:
   Send "subscribe mrcp" in the body to "majordomo@snowshore.com".
   Archive: http://flyingfox.snowshore.com/mrcp_archive/maillist.html

Agenda:
   o Agenda Bashing
   o Requirements / Problem Statement / Overview / Q&A
   o Open Issues
   o Scope of the work
   o What are the special security issues for ASR/TTS control?
   o Draft Charter Discussion

Description:

This BOF will examine protocols to support distributed media
processing of audio streams. There are multiple IETF protocols for
establishing and terminating media sessions (SIP, SDP) and for
recording and playing back media (RTSP). The focus of this BOF is to
develop protocols to support Automated Speech Recognition (ASR) and
the rendering of text into audio, a.k.a. Text-to-Speech (TTS). The
BOF will focus only on the distributed control of ASR and TTS
servers.

Many multimedia applications can benefit from having ASR and TTS
processing available as a distributed network resource. To date, a
number of proprietary protocols for ASR and TTS control are in use on
the network, and several IETF drafts have been floated as well. The
existing solutions (under the name Media Resource Control Protocol,
MRCP) have some serious deficiencies. In particular, they mix the
semantics of existing protocols, yet are close enough to other
protocols to confuse implementers. This confusion may reflect a lack
of clarity in the requirements and architecture, so this BOF (and any
resulting working group) will start with a careful discussion of
scope and requirements.

The proposed work will not include distributed speech recognition
(DSR), as exemplified by the ETSI Aurora project. The work proposed
for ASR/TTS is part of a control plane for speech applications, and
hence complements data plane work such as DSR.
DSR seeks to improve recognition performance while reducing bandwidth
requirements compared with conventional vocoders. The control
protocol is in fact insensitive to the choice of data plane encodings
for recognition and speech synthesis, in the same way that signaling
protocols such as SIP and H.323 work identically for any data plane
media encoding.

The proposed working group will develop an informational RFC
detailing the architecture and requirements for distributed ASR and
TTS control. The working group will then examine existing
media-related protocols, especially RTSP, for their suitability to
carry ASR and TTS control. Finally, the working group will propose
extensions to existing protocols or the development of new protocols,
as appropriate, to meet the requirements specified in the
informational RFC.

The protocol will assume RTP carriage of media. Assuming
session-oriented media transport, the protocol will use SDP to
describe the session. The proposed work will not re-create
functionality available in other protocols, such as SIP or SDP.

The working group will bring any requirements for changes to existing
protocols, with the possible exception of RTSP, to the appropriate
IETF working group for consideration. This working group will explore
modifications to RTSP, if required, but must participate in the
current revision work on RTSP to ensure that a reasonable framework
exists for those changes.

The working group will coordinate with the SIPPING, MMUSIC, and AVT
working groups and, more generally, will ensure architectural
consistency with VPIM and other applications working groups. In
addition, the proposed working group will coordinate with
speech-related efforts in the W3C, such as its Multimodal Interaction
activity, to ensure consistency (or at least good dialogue).

The intention is to close the working group within one year of its
chartering.

Required Reading for the BOF:
   draft-burger-mrcp-reqts-00.txt

Additional Strawman Documents:
   draft-shanmugham-mrcp-01.txt
   draft-robinson-mrcp-sip-00.txt