Izidor Mlakar, Zdravko Kačič, Matej Rojc



A Corpus for Analyzing Linguistic and Paralinguistic Features in Multi-Speaker Spontaneous Conversations – EVA Corpus

This study is part of an ongoing effort to empirically investigate, in detail, the relations between verbal and co-verbal behavior expressed during multi-speaker, highly spontaneous, and affective face-to-face conversations. The main motivation is to create natural co-verbal resources for the automatic synthesis of highly natural co-verbal behavior from general un-annotated text, expressed through embodied conversational agents. The study uses a highly multimodal approach that investigates several linguistic levels, such as paragraphs, sentences, sentence types, words, and POS tags; prosodic features, such as phrase breaks, prominence, durations, and F0; and the functional and formal annotation of co-verbal behavior, such as the collocutor’s role (speaker, listener), semiotic classification of behavior, emotions, facial expressions, head movement, gaze, and hand gestures. The EVA corpus thus represents a valuable resource for an algorithm that synthesizes co-verbal behavior, focused primarily on gestures and semiotic intent. In its present form, the EVA corpus is also a rich empirical resource for studying complex conversational phenomena present in highly spontaneous face-to-face conversations, especially those related to the multimodal expression of information and emotions and to the communicative and non-communicative roles of co-verbal expressions. The paper also presents the proposed annotation scheme and annotation procedure. Preliminary studies of emotion-related phenomena in conversations have been conducted on the EVA corpus and are presented in the paper.


multiparty dialog, informal conversation, multimodal corpora, linguistic and paralinguistic features, verbal and non-verbal interaction


