Call for papers

We invite paper submissions for the second workshop on People in Vision, Language, and the Mind (formerly ONION 2020), which discusses how people, their bodies and faces, as well as their mental states, are described in text with associated images and modelled in computational and cognitive terms. We are interested in contributions from diverse areas including language generation, language analysis, cognitive computing, affective computing and multimodal (especially vision and language) modelling.

Detailed workshop goals

The workshop will provide a forum to present and discuss current research focusing on multimodal resources as well as computational and cognitive models aiming to describe people in terms of their bodies and faces, including their affective state as it is reflected physically. Such models might generate textual descriptions of people, generate images corresponding to people’s descriptions, or more generally exploit multimodal representations for different purposes and applications. Knowledge of the way human bodies and faces are perceived, understood and described by humans is key to the creation of such resources and models. The workshop therefore also invites contributions where the human body and face are studied from a cognitive, neurocognitive or multimodal communication perspective.

Human body postures and faces are being studied by researchers from different research communities, including those working with vision and language modelling, natural language generation, cognitive science, cognitive psychology, multimodal communication and embodied conversational agents. The workshop aims to reach out to all these communities to explore the many different aspects of research on the human body and face, including the resources that such research needs, and to foster cross-disciplinary synergy.

The ability to adequately model and describe people in terms of their body and face is interesting for a variety of language technology applications, e.g., conversational agents and interactive narrative generation, as well as forensic applications in which people need to be depicted or their images generated from textual or spoken descriptions.

Such systems need resources and models in which images of human bodies and faces are coupled with linguistic descriptions. The research needed to develop them therefore lies at the interface between vision and language research.

At the same time, this line of research raises important ethical questions, both from the perspective of data collection methodology and from the perspective of bias detection and avoidance in models trained to process and interpret human attributes.

By focusing on the modelling and processing of people, and bringing in relevant insights from the cognitive and neurocognitive fields, the workshop will explore and further develop a particular area within vision and language research.

Relevant topics

We are inviting short and long papers reporting original research, surveys, position papers, and demos. Authors are strongly encouraged to identify and discuss ethical issues arising from their work, insofar as it involves the use of image data or descriptions of people.

Relevant topics include, but are not limited to, the following:

Datasets of facial images, as well as body postures, gestures and their descriptions
Methods for the creation and annotation of multimodal resources dedicated to the description of people
Methods for the validation of multimodal resources for descriptions of people
Experimental studies of facial expression understanding by humans
Models or algorithms for automatic facial description generation
Emotion recognition by humans
Multimodal automatic emotion recognition from images and text
Subjectivity in face perception
Communicative, relational and intentional aspects of head pose and eye-gaze
Collection and annotation methods for facial descriptions
Coding schemes for the annotation of body posture and facial expression
Understanding and description of the human face and body in different contexts, including commercial applications, art, forensics, etc.
Modelling of the human body, face and facial expressions for embodied conversational agents
Generation of full-body images and/or facial images from textual descriptions
Ethical and data protection issues related to the collection and/or automatic description of images of real people
Any form of bias in models which seek to make sense of human physical attributes in language and vision

Submission guidelines

Short paper submissions may consist of up to 4 pages of content, while long papers may have up to 8 pages of content. References and appendices do not count towards these page limits.

All submissions must follow the LREC 2022 style files, which are available for LaTeX (preferred) and MS Word.

Papers must be submitted digitally, in PDF format, and uploaded through the START online submission system.

The authors of accepted papers will be required to submit a camera-ready version to be included in the final proceedings. Further details will be sent to these authors after the notification of acceptance.

When submitting a paper from the START page, authors will be asked to provide essential information about resources (in a broad sense, i.e. also technologies, standards, evaluation kits, etc.) that have been used for the work described in the paper or are a new result of their research.

Moreover, ELRA encourages all LREC authors to share the described LRs (data, tools, services, etc.) to enable their reuse and the replicability of experiments (including evaluation experiments).

P-VLAM 2022