Thursday, April 16, 2009

Skipping Spare Information in Multimodal Inputs during Multimodal Input Fusion

by Sun, Y., Shi, Y., Chen, F. and Chung, V.

Summary:

The article is based on the assumption that multimodal user interfaces incorporating natural communication modalities would enable widespread adoption of new technologies. A technique called Multimodal Input Fusion (MMIF) is described as combining inputs from different modalities and interpreting them semantically. These inputs are symbol strings consisting of individually recognized input elements from the different modalities; taken together, the symbol strings form what the authors define as a multimodal utterance.

The motivation to address the spare-information problem surfaced during the pilot study of Sun et al.'s earlier attempt at a flexible MMIF technique. It was observed that some inputs from different modalities have no semantic relation to other inputs present in the same multimodal utterance. This suggested the presence of spare information in multimodal inputs, which made it difficult for the system to produce semantic interpretations. The authors therefore propose an MMIF approach that leaves out the spare information and semantically interprets the remaining inputs in combination. They go on to explain the structure of the proposed multimodal input parser: the multimodal utterance is decomposed into sub-utterances through a grammar called Multimodal Combinatory Categorial Grammar. After this decomposition, a matrix is created for recombining the sub-utterances, and the recombined results covering the largest number of symbols are given priority when passed to the Dialogue Management Module, a decision module that outputs a meaningful utterance once one is found.
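To make the skip-and-recombine idea concrete, here is a minimal Python sketch of it, not the authors' actual MCCG parser: the `grammar_allows` predicate stands in for whatever the grammar licenses, and the symbol ids and categories are made up for illustration.

```python
# Sketch of skipping spare information: prefer the largest grammar-licensed
# recombination of sub-utterances; uncovered symbols are skipped as spare.
from itertools import combinations

def grammar_allows(combo):
    """Toy stand-in for grammatical derivability: one command plus any
    number of targets forms a complete utterance."""
    cats = [cat for _, cat, _ in combo]
    return cats.count("command") == 1 and all(c in ("command", "target") for c in cats)

def interpret(sub_utterances):
    """Return the meanings of the largest licensed recombination, plus the
    symbols left over (the spare information that gets skipped)."""
    all_symbols = set().union(*(syms for syms, _, _ in sub_utterances))
    # Try larger combinations first, a proxy for "maximum symbols" priority.
    for size in range(len(sub_utterances), 0, -1):
        for combo in combinations(sub_utterances, size):
            if grammar_allows(combo):
                covered = set().union(*(syms for syms, _, _ in combo))
                return [m for _, _, m in combo], all_symbols - covered
    return [], all_symbols

subs = [
    ({"speech:move"}, "command", "MOVE"),
    ({"gesture:point_1"}, "target", "OBJECT(1)"),
    ({"speech:zoom"}, "command", "ZOOM"),   # no semantic fit with the rest
]
print(interpret(subs))  # (['MOVE', 'OBJECT(1)'], {'speech:zoom'})
```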

Discussion:

The authors' approach of skipping spare information by parsing multimodal utterances and recombining them into meaningful ones seems an effective way to derive semantic information, and something of this kind is certainly needed for natural multimodal human-computer interaction. However, when considering the causes of spare information, the authors do not appear to take into account users' preconceptions about the system or the cognitive load imposed on them. Since these have not been investigated thoroughly, it seems that as more modalities are added to the system, spare information would increase as well. This would seriously affect the recombination and decision process, which is derived from a matrix of all sub-utterances recognized by the individual modalities' recognition engines.

Citation:

Sun, Y., Shi, Y., Chen, F. and Chung, V. Skipping Spare Information in Multimodal Inputs during Multimodal Input Fusion. In IUI '09: Proceedings of the International Conference on Intelligent User Interfaces, Sanibel Island, Florida, USA, 2009.

A Multimodal Interface for Road Design

by Blessing, A., Sezgin, T. M., Arandjelovic, R. and Robinson, P.

Summary:

The authors begin the article by stating that designing with a computer can be made more interactive and natural with speech and sketch-based interfaces. They set out to show this in a representative interface for road design, usable either for actual road design or for designing roads for simulators. The part common to both is designing the physical aspects of the road, such as the road itself, the signs, and their placement. It is proposed that these physical aspects can be specified with the sketch-based system, while the traffic behavior within a simulation could be specified through speech.

After conducting interviews with experts on road design, the authors developed the Multimodal Intelligent Road Design Assistant (MIRA) as their speech and sketch-based system. The system's features were mapped to modalities so as to meet the desired affordances identified by the experts. For example, the behavior of an object or a group of objects can be defined with speech, combined with a traditional WIMP modality, while designing the road itself is done with the pen-based modality. Furthermore, the modalities are used in a complementary fashion: while some features could be mapped to either speech or sketching, the authors chose one of the two in order not to overload a given modality. A sketch of such a mapping follows below.
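As a rough illustration of this complementary assignment, the feature-to-modality mapping could be represented as a simple table in code. The feature names and choices below are illustrative guesses, not MIRA's actual design table.

```python
# Hypothetical feature-to-modality mapping in the spirit of the paper's
# complementary assignment; names and choices are illustrative only.
MODALITY_MAP = {
    "draw_road_geometry": "sketch",       # spatial layout suits the pen
    "place_sign": "sketch",               # pointing at a location on the road
    "define_traffic_behavior": "speech",  # behavioral rules are easier to say
    "group_objects": "speech+wimp",       # spoken command plus GUI selection
    "undo": "sketch",                     # remapped from speech after the pilot (see Discussion)
}

def modality_for(feature):
    # Fall back to the traditional GUI for anything not explicitly mapped.
    return MODALITY_MAP.get(feature, "wimp")
```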

After dealing with the mapping of features to modalities, the authors continue by describing the speech and sketch recognition engines. Finally, they present the pilot testing and evaluation and conclude the article with related and future work.

Discussion:

Overall, the article shows that a system with natural modalities such as speech and sketching is capable of making the design process more interactive. One interesting point from the pilot test is that the limited capabilities of speech interfaces to date negatively shape how users try to communicate with the system. A video demonstrating the use of the speech system was shown to overcome this negative effect, and it succeeded.

The article lacks a proper method for assigning modalities to the different capabilities of the system, and this shows in the pilot study. The mapping was done in an intuitive manner, which led to recognition errors during editing operations. Short editing commands such as "undo" and "redo" were misrecognized because of the speech engine's high sensitivity to noise. The authors had to reassign these operations to the pen-based modality in order to prevent errors that could frustrate the end user.

Citation:

Blessing, A., Sezgin, T.M., Arandjelovic, R. and Robinson, P. A Multimodal Interface for Road Design. In IUI '09: Proceedings of the Intelligent User Interfaces Workshop on Sketch Recognition, Sanibel Island, Florida, USA, 2009.