Izidor Mlakar, Zdravko Kačič, Matej Rojc



A Corpus for Analyzing Linguistic and Paralinguistic Features in Multi-Speaker Spontaneous Conversations – EVA Corpus



This study is part of an ongoing effort to investigate empirically, and in detail, the relations between verbal and co-verbal behavior expressed during highly spontaneous and affective multi-speaker face-to-face conversations. The main motivation is to create natural co-verbal resources for the automatic synthesis of highly natural co-verbal behavior from general un-annotated text, expressed through embodied conversational agents. The study takes a highly multimodal approach that investigates several linguistic levels (paragraphs, sentences, sentence types, words, and POS tags), prosodic features (phrase breaks, prominence, durations, and F0), and the functional and formal annotation of co-verbal behavior, including the collocutor’s role (speaker or listener), semiotic classification of behavior, emotions, facial expressions, head movement, gaze, and hand gestures. The EVA corpus thus represents a valuable resource for the algorithm for synthesizing co-verbal behavior, primarily focused on gestures and semiotic intent. In its present form, the EVA corpus is also a rich empirical resource for studying complex conversational phenomena present in highly spontaneous face-to-face conversations, especially those related to the multimodal expression of information and emotions and to the communicative and non-communicative roles of co-verbal expressions. The paper also presents the proposed annotation scheme and annotation procedure. Preliminary studies of emotion-related phenomena in conversations have been conducted on the EVA corpus and are presented in the paper.


multiparty dialog, informal conversation, multimodal corpora, linguistic and paralinguistic features, verbal and non-verbal interaction



Cite this paper

Izidor Mlakar, Zdravko Kačič, Matej Rojc. (2017) A Corpus for Analyzing Linguistic and Paralinguistic Features in Multi-Speaker Spontaneous Conversations – EVA Corpus. International Journal of Computers, 2, 136-145


Copyright © 2017 Author(s) retain the copyright of this article.
This article is published under the terms of the Creative Commons Attribution License 4.0