Unpublished manuscripts

Affective multimedia databases – state of the art (3)

With regard to implementation, affective multimedia databases have very simple structures, usually consisting of a file repository and a manifest file that describes the content of the repository. The manifest is a plain-text, Microsoft Excel or comma-separated values (CSV) formatted file with attributes such as a unique identifier, a semantic descriptor and the eliciting affect for each multimedia document in the repository. Through the combination of unique folder paths and unique file names the stimuli get their Uniform Resource Identifiers (URIs), which allows their retrieval from affective multimedia databases. In the case of IAPS the unique identifier is the name of the stimulus file, e.g. 5200.jpg, 5201.jpg etc. Therefore, contemporary affective multimedia databases are neither relational databases nor XML databases – they are just document repositories with a description document in a provisional format which is sometimes only human-readable.
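The manifest-plus-repository scheme can be sketched in a few lines of Python. The column names, example rows and repository path below are illustrative assumptions, not taken from any actual database:

```python
import csv
import io
from pathlib import PurePosixPath

# Hypothetical manifest in the CSV style described above; the column
# names and example values are illustrative, not from a real database.
MANIFEST = """identifier,semantic_descriptor,affect
5200.jpg,flowers,pleasant
5201.jpg,mountains,pleasant
"""

REPOSITORY_ROOT = PurePosixPath("/data/stimuli")  # assumed repository path

def load_manifest(text):
    """Parse the manifest and resolve each stimulus to a file URI."""
    stimuli = {}
    for row in csv.DictReader(io.StringIO(text)):
        stimuli[row["identifier"]] = {
            "descriptor": row["semantic_descriptor"],
            "affect": row["affect"],
            # Folder path + file name yields the stimulus URI.
            "uri": (REPOSITORY_ROOT / row["identifier"]).as_uri(),
        }
    return stimuli

stimuli = load_manifest(MANIFEST)
print(stimuli["5200.jpg"]["uri"])  # file:///data/stimuli/5200.jpg
```

Note that all retrieval logic lives in the client code; the database itself contributes nothing beyond the flat file and the folder layout.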

Some affective multimedia databases, like GAPED, the BioID faces database, NimStim or KDEF, do not even have a describing file. In these databases the semantic and affective content of the stimuli must be deduced implicitly from the stimuli paths. This is the simplest possible database structure, but it is only minimally sufficient in conveying the stimuli content and meaning to a subject. With such databases the user is expected to look over all stimuli and select them manually. If the database does not contain many stimuli per single level of the semantic taxonomy, manual document retrieval is still manageable. For example, GAPED – one of the latest affective multimedia databases developed – has 730 negative, neutral and positive pictures with up to 159 different stimuli in a single named folder.
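Path-based deduction amounts to treating folder components as labels. The folder layout below is a hypothetical illustration of this convention, not the actual GAPED directory structure:

```python
from pathlib import PurePosixPath

# Hypothetical folder layout for a database without a describing file;
# folder and file names are illustrative, not actual GAPED paths.
paths = [
    "GAPED/Negative/Spiders/Sp001.bmp",
    "GAPED/Negative/Snakes/Sn014.bmp",
    "GAPED/Positive/P052.bmp",
]

def deduce_labels(path):
    """Infer affect and semantic labels from the folder components alone."""
    parts = PurePosixPath(path).parts
    affect = parts[1]              # first folder below the database root
    semantics = list(parts[2:-1])  # any deeper folders, if present
    return {"file": parts[-1], "affect": affect, "semantics": semantics}

for p in paths:
    print(deduce_labels(p))
```

The third path shows the weakness of the scheme: when a folder level is missing, the corresponding label is simply absent and nothing in the database signals the gap.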

In terms of knowledge management, affective multimedia databases contain only data about the stimuli themselves and do not describe the semantics induced by the stimuli. If a stimulus video shows a dog barking, the database does not specify the concepts "animal", "dog", "to bark", "dog barking" etc. Affective multimedia databases do not contain knowledge taxonomies or any other information about the concepts present in stimuli; they only state the most dominant semantic label present in a stimulus. These databases do not provide descriptions of semantic labels, nor do they reference them to knowledge bases such as DBpedia. It is entirely up to the database expert to integrate an affective multimedia database into a larger system and find the most appropriate knowledge base with a reasoning service to infer the meaning of multimedia semantic descriptors.
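The kind of concept expansion the databases themselves omit can be illustrated with a minimal, hand-built hypernym lookup. Every concept and relation below is an illustrative stand-in for an external knowledge base, not part of any real database:

```python
# A tiny, hand-built taxonomy standing in for an external knowledge base;
# all concepts and relations here are illustrative only.
HYPERNYMS = {
    "dog barking": "dog",
    "dog": "animal",
    "animal": "living thing",
}

def expand_descriptor(label):
    """Walk the hypernym chain to enrich a bare semantic label."""
    chain = [label]
    while chain[-1] in HYPERNYMS:
        chain.append(HYPERNYMS[chain[-1]])
    return chain

print(expand_descriptor("dog barking"))
# ['dog barking', 'dog', 'animal', 'living thing']
```

In a real integration, the dictionary would be replaced by queries to a knowledge base with a reasoning service, but the databases themselves supply only the leftmost label in the chain.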

Affective multimedia databases – state of the art (2)

Apart from the databases in the table, two more important sources of affective text corpora exist: WordNet-Affect and SemEval.

These two databases are not specifically designed for emotion elicitation experiments, although they can be indirectly used in this domain. WordNet-Affect is a taxonomy of over two hundred emotions and emotion-related terms. It is aligned with WordNet concepts and could be used to classify text labels in the semantic and affective domains. SemEval is a set of different lexicons for word sense disambiguation, semantic analysis and sentiment evaluation, which were collaboratively developed by different researchers during a series of yearly evaluation workshops.

Some resources, such as SemCor and the previously mentioned SemEval lexicons, can be used to support emotion elicitation with annotated multimedia. It should be noted that advanced sentiment analysis is concerned with the extraction of discrete emotion states such as "happiness", "sadness" or "anger" from text documents. Therefore, sentiment classification could also be considered a subtype of emotion classification.

As mentioned before, multimedia stimuli are described with at least one of two emotion theories: categorical or dimensional. The dimensional theories of emotion propose that affective meaning can be well characterized by a small number of dimensions. Dimensions are chosen for their ability to statistically characterize subjective emotional ratings with the smallest number of dimensions possible. These dimensions generally include one bipolar or two unipolar dimensions that represent positivity and negativity and have been labeled in various ways, such as valence or pleasure. Usually also included is a dimension that captures intensity, arousal or energy level. In contrast to the dimensional theories, categorical theories claim that the dimensional models, particularly those using only two or three dimensions, do not accurately reflect the neural systems underlying emotional responses. Instead, supporters of these theories propose that there are a number of emotions that are universal across cultures and have an evolutionary and biological basis. Which discrete emotions are included in these theories is a point of contention, as is the choice of which dimensions to include in the dimensional models. Most supporters of discrete emotion theories agree that at least the five primary emotions of happiness, sadness, anger, fear and disgust should be included.

Dimensional and categorical theories of affect can both effectively describe emotion in digital systems and are not mutually exclusive. Many researchers who predominantly use the dimensional model regard the positive and negative valence systems as appetitive and defensive systems, with arousal representing the intensity of activation within each system. It has been experimentally shown that visual stimuli from the IAPS produce different responses in skin conductance, startle reflex and heart rate depending on emotion category. Also, some categorical approaches already incorporate intensity or arousal into their models. With these empirical overlaps in theories of emotion, visual stimuli previously characterized according to only a single theory have now been characterized according to the complementary emotion theory, including IAPS, IADS and ANEW. Annotations according to both theories of affect are useful for several reasons, predominantly because they provide a more complete characterization of stimuli affect. Additionally, apart from theories of emotion based on discrete categories or dimensions, numerous other paradigms exist for the description of sentiments, appraisals, action tendencies and the categorization of emotion states.
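A stimulus annotated according to both theories can be represented by a single record. The following sketch is illustrative: the field names, rating scales and example values are assumptions, not actual IAPS annotations:

```python
from dataclasses import dataclass

# Illustrative record combining both emotion theories for one stimulus;
# the scales and example values are assumptions, not real IAPS data.
@dataclass
class Stimulus:
    identifier: str
    descriptor: str
    valence: float   # dimensional annotation, e.g. 1 (negative) to 9 (positive)
    arousal: float   # dimensional annotation, e.g. 1 (calm) to 9 (excited)
    category: str    # categorical annotation: one discrete emotion

    def is_pleasant(self) -> bool:
        # Treat the scale midpoint as the boundary between the defensive
        # (negative) and appetitive (positive) systems.
        return self.valence > 5.0

s = Stimulus("5200.jpg", "flowers", valence=7.4, arousal=3.2, category="happiness")
print(s.is_pleasant())  # True
```

Carrying both annotation sets in one record is what makes the dual characterizations of IAPS, IADS and ANEW directly usable for retrieval by either theory.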

Affective multimedia databases – state of the art (1)

Contemporary affective multimedia databases are simple repositories of audiovisual multimedia documents such as pictures, sounds, text and videos, with described general semantics and emotional content.

Two important features distinguish affective multimedia databases from other multimedia repositories:

  1. the purpose of the multimedia documents; and
  2. the emotion representation of the multimedia documents.

Multimedia documents in affective multimedia databases are aimed at inducing or stimulating emotions in exposed subjects. As such they are usually referred to as stimuli. All multimedia documents (i.e. stimuli) in affective multimedia databases have specified semantic and emotional content. Two predominant theories used to describe emotion are the discrete category model and the dimensional model of affect.

All these databases have been characterized according to at least one of these models. Other distinctive attributes of affective multimedia databases are: prevalent usage in psychological, psychophysiological and neuroscience research; interaction with domain tests and questionnaires; ethical concerns; harder and more diversified demands on document retrieval; the necessity for richer and more accurate semantic descriptors; a solution to the semantic gap; scalability; interoperability; virtual reality hardware support; and an advanced user interface for less computer-proficient users, with separate visualization systems for stimuli exposure supervisors and subjects.

The International Affective Picture System (IAPS) and the International Affective Digital Sounds system (IADS) are two of the most cited tools in the area of affective stimulation.

These databases were created with three goals in mind:

  1. Better experimental control of emotional stimuli;
  2. Increasing the ability of cross-study comparisons of results;
  3. Facilitating direct replication of undertaken studies.

The same standardization principles are shared among other similar affective multimedia databases. Apart from the IAPS and IADS, the most important readily available affective multimedia databases are the Geneva Affective PicturE Database (GAPED), Nencki Affective Pictures System (NAPS), Dataset for Emotion Analysis using EEG, Physiological and video signals (DEAP), NimStim Face Stimulus Set, Pictures of Facial Affect (POFA), Karolinska Directed Emotional Faces (KDEF), Japanese and Caucasian Facial Expressions of Emotion and Neutral Faces (JACFEE and JACNeuF), The CAlifornia Facial expressions of Emotion (CAFE), Yale Face Database, Yale Face Database B, Japanese Female Facial Expression (JAFFE) Database, Facial Expressions and Emotion Database (FEED), Radboud Faces Database (RaFD), Affective Norms for English Words (ANEW), Affective Norms for English Texts (ANET) and SentiWordNet.

Additional audio-visual affective multimedia databases with categorical or dimensional emotion annotations are listed below. As can be seen in Table 1, facial expression databases are by far the most numerous modality among affective multimedia databases. Although facial expression databases are employed in emotion elicitation, they are also often used for face recognition and face detection. All three types of databases are commonly called face databases. A more detailed overview of these databases is available in (Gross2005).

Table 1. The most frequently used collections of audiovisual stimuli. Some datasets had multiple revisions in the designated time frame. Owner refers to the institution that distributes a specific dataset.

| Name | Modality | Owner | Created |
|------|----------|-------|---------|
| IAPS | Picture | University of Florida, The Center for the Study of Emotion and Attention | 1997-2008 |
| GAPED | Picture | University of Geneva, Swiss Center for Affective Sciences | 2011 |
| NAPS | Picture | Nencki Institute of Experimental Biology, Polish Academy of Sciences | 2013 |
| IADS | Sound | University of Florida, The Center for the Study of Emotion and Attention | 1999 |
| DEAP | Video | Queen Mary University of London | 2012 |
| NimStim | Facial expression | The MacArthur Foundation Research Network on Early Experience and Brain Development | 2009 |
| POFA | Facial expression | Paul Ekman Group | 1976-1993 |
| KDEF | Facial expression | Karolinska Institutet, Department of Clinical Neuroscience, Section of Psychology | 1998 |
| JACFEE/JACNeuF | Facial expression | Humintell | 1988 |
| CAFE | Facial expression | University of California | 2001 |
| Yale Face Database | Facial expression | Yale University | 1997 |
| Yale Face Database B | Facial expression | Yale University | 2001 |
| JAFFE | Facial expression | Kyushu University, Psychology Department | 1998 |
| ANEW | Text | University of Florida, The Center for the Study of Emotion and Attention | 1999 |
| ANET | Text | University of Florida, The Center for the Study of Emotion and Attention | 1999-2007 |

Formal representation of emotions in computer systems

From the perspective of computer systems, emotions are difficult to work with. Knowledge about emotion is often uncertain, either because data are unavailable or missing, or because of unreliability caused by errors in the measurement of emotions. Also, computer models of emotion are still being developed and have not yet reached a level of maturity and widespread use that would allow efficient interpretation and processing of data about emotions and decision-making on the basis of set goals.

Today, there are several computer languages that were designed specifically for the annotation of different emotion states present (i.e. that can be perceived) in multimedia files. These languages have different levels of expressivity and formality, and are used for various purposes. The most important and most frequently used are: Synchronized Multimedia Integration Language (SMIL), Speech Synthesis Markup Language (SSML), Extensible MultiModal Annotation Markup Language (EMMA), Emotion Annotation and Representation Language (EARL) and Virtual Human Markup Language (VHML). All these meta-formats for describing emotions are stored in formatted text files that are used to annotate data about the emotions found in other files. None of the existing meta-formats is based on logic. Annotated files may be of any format. The latest emotion language to be developed – in fact, its development is still ongoing – is the Emotion Markup Language, or EmotionML for short. EmotionML is being developed under the umbrella of the W3C, bringing together different partners from academia and industry such as DFKI, Deutsche Telekom, the Fraunhofer Institute, Nuance Communications etc. EmotionML is based on XML, so it is easy to build, parse and maintain, it is not tied to a specific platform, and it can also be read by human experts. It is designed as a "general purpose annotation language" and has the largest vocabulary of all existing emotion languages.

EmotionML was developed as a plug-in language which may be included in a variety of applications in three main areas:

  1. Manual annotation of data;
  2. Automatic recognition of emotion-related states from user behavior; and
  3. Generation of emotion-related system behavior.

In practical terms, EmotionML is very well suited for the annotation of affect in multimedia content. Once the content is annotated, it can be stored in an XML database and retrieved according to various query parameters supported by EmotionML syntax and semantics. For example, it is possible to represent emotion categories (i.e. discrete emotions such as anger, disgust, fear, happiness, sadness and surprise), values of different emotion dimensions, appraisals and action tendencies, expert confidence in annotations, emotion expression modalities (e.g. voice), start and end times of specific emotions in video or sound files, the time course of emotions etc.
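Because EmotionML is plain XML, an annotation can be produced with any standard XML library. The sketch below uses Python's standard library; the namespace and vocabulary URIs follow the W3C EmotionML documents, while the category name and confidence value are illustrative:

```python
import xml.etree.ElementTree as ET

# Sketch of an EmotionML-style annotation built with the standard library.
# The namespace and category-set URIs follow the W3C EmotionML documents;
# the chosen category and confidence value are illustrative.
NS = "http://www.w3.org/2009/10/emotionml"
ET.register_namespace("", NS)

root = ET.Element(f"{{{NS}}}emotionml")
emotion = ET.SubElement(root, f"{{{NS}}}emotion", {
    "category-set": "http://www.w3.org/TR/emotion-voc/xml#big6",
})
# One discrete emotion with an expert confidence annotation.
ET.SubElement(emotion, f"{{{NS}}}category", {"name": "fear", "confidence": "0.8"})

document = ET.tostring(root, encoding="unicode")
print(document)
```

A fragment like this could then be stored in an XML database and queried, for example, for all stimuli annotated with a given category above a confidence threshold.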

As could be expected, EmotionML does not define how to annotate emotion as such, i.e. how to build EmotionML files. The file content may come from subjects' questionnaires, expert interviews, automated physiology-based emotion estimation or any other method suitable for identifying the specific values defined by EmotionML semantics. However, none of these approaches is without drawbacks. The choice of which to use will depend on the circumstances of each case, weighing trade-offs between accuracy, time and implementation cost.

However, although EmotionML may excel in terms of its rich semantics and simple, effective syntax, it is still a purely XML-based language without any capabilities for higher knowledge representation and automatic reasoning, as offered by, for example, RDF, RDFS and OWL. EmotionML is indeed very good for information storage and interchange, but it does not define mechanisms for using the stored information, patterns for what to do with the information, or methods and best practices for implementing custom tools that read, process and write EmotionML statements. All these higher processes and complex tasks are left to individual researchers, their particular requirements and implementation capabilities.