Stephanie M. Lukin

I am a senior computer science researcher at the Army Resesarch Laboratory.

My work covers visual storytelling, narrative intelligence, and multi-modal human-robot dialogue.

selected publications

2024

Multimodal IE
MUMOSA, Interactive Dashboard for MUlti-MOdal Situation Awareness

Stephanie M Lukin, Shawn Bowser, Reece Suchocki, Douglas Summers-Stay, Francis Ferraro, Cynthia Matuszek, and Clare Voss

In Proceedings of the Workshop on the Future of Event Detection (FuturED), 2024

Abs Bib HTML

Information extraction has led the way for event detection from text for many years. Recent advances in neural models, such as Large Language Models (LLMs) and Vision-Language Models (VLMs), have enabled the integration of multiple modalities, providing richer sources of information about events. Concurrently, the development of schema graphs and 3D reconstruction methods has enhanced our ability to visualize and annotate complex events. Building on these innovations, we introduce the MUMOSA (MUlti-MOdal Situation Awareness) interactive dashboard that brings these diverse resources together. MUMOSA aims to provide a comprehensive platform for event situational awareness, offering users a powerful tool for understanding and analyzing complex scenarios across modalities.
@inproceedings{lukin2024mumosa, title = {MUMOSA, Interactive Dashboard for MUlti-MOdal Situation Awareness}, author = {Lukin, Stephanie M and Bowser, Shawn and Suchocki, Reece and Summers-Stay, Douglas and Ferraro, Francis and Matuszek, Cynthia and Voss, Clare}, booktitle = {Proceedings of the Workshop on the Future of Event Detection (FuturED)}, pages = {32--47}, year = {2024}, }
HRI
Human–robot dialogue annotation for multi-modal common ground

Claire Bonial, Stephanie M Lukin, Mitchell Abrams, Anthony Baker, Lucia Donatelli, Ashley Foots, Cory J Hayes, Cassidy Henry, Taylor Hudson, Matthew Marge, and others

Language Resources and Evaluation, 2024

Abs Bib HTML

In this paper, we describe the development of symbolic representations annotated on human–robot dialogue data to make dimensions of meaning accessible to autonomous systems participating in collaborative, natural language dialogue, and to enable common ground with human partners. A particular challenge for establishing common ground arises in remote dialogue (occurring in disaster relief or searchand- rescue tasks), where a human and robot are engaged in a joint navigation and exploration task of an unfamiliar environment, but where the robot cannot immediately share high quality visual information due to limited communication constraints. Engaging in a dialogue provides an effective way to communicate, while on-demand or lower-quality visual information can be supplemented for establishing common ground. Within this paradigm, we capture propositional semantics and the illocutionary force of a single utterance within the dialogue through our Dialogue- AMR annotation, an augmentation of Abstract Meaning Representation. We then capture patterns in how different utterances within and across speaker floors relate to one another in our development of a multi-floor Dialogue Structure annotation schema. Finally, we begin to annotate and analyze the ways in which the visual modalities provide contextual information to the dialogue for overcoming disparities in the collaborators’ understanding of the environment. We conclude by discussing the use-cases, architectures, and systems we have implemented from our annotations that enable physical robots to autonomously engage with humans in bi-directional dialogue and navigation.
@article{bonial2024human, title = {Human--robot dialogue annotation for multi-modal common ground}, author = {Bonial, Claire and Lukin, Stephanie M and Abrams, Mitchell and Baker, Anthony and Donatelli, Lucia and Foots, Ashley and Hayes, Cory J and Henry, Cassidy and Hudson, Taylor and Marge, Matthew and others}, journal = {Language Resources and Evaluation}, pages = {1--51}, year = {2024}, publisher = {Springer}, }

2023

Visual Storytelling
Envisioning Narrative Intelligence: A Creative Visual Storytelling Anthology

Brett A Halperin, and Stephanie M Lukin

In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 2023

Abs Bib HTML

In this paper, we collect an anthology of 100 visual stories from authors who participated in our systematic creative process of improvised story-building based on image sequences. Following close reading and thematic analysis of our anthology, we present five themes that characterize the variations found in this creative visual storytelling process: (1) Narrating What is in Vision vs. Envisioning; (2) Dynamically Characterizing Entities/Objects; (3) Sensing Experiential Information About the Scenery; (4) Modulating the Mood; (5) Encoding Narrative Biases. In understanding the varied ways that people derive stories from images, we offer considerations for collecting story-driven training data to inform automatic story generation. In correspondence with each theme, we envision narrative intelligence criteria for computational visual storytelling as: creative, reliable, expressive, grounded, and responsible. From these criteria, we discuss how to foreground creative expression, account for biases, and operate in the bounds of visual storyworlds.
@inproceedings{halperin2023envisioning, title = {Envisioning Narrative Intelligence: A Creative Visual Storytelling Anthology}, author = {Halperin, Brett A and Lukin, Stephanie M}, booktitle = {Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems}, pages = {1--21}, year = {2023}, dimensions = {true}, }