Stephanie M. Lukin

I am a senior computer science researcher at the Army Resesarch Laboratory.

My work covers visual storytelling, narrative intelligence, and multi-modal human-robot dialogue.

selected publications

2024

HRI
3D Gaussian Splatting for Human-Robot Interaction

Shawn Bowser, and Stephanie M. Lukin

In Interactive AI for Human-Centered Robotics Workshop at RO-MAN. Best Paper Award, 2024

Abs Bib HTML

In order to assist a humans’ ability to make decisions in uncertain and high-stakes scenarios, e.g., disaster relief, we aim to provide an interactive and “smart” visual model of an environment that a human can explore and query. We contribute a method for photorealistic 3D reconstruction of a scene from 2D images using improvements to 3D Gaussian Splatting (3DGS) methods. We showcase our process using a synthetic scene and showing a high level of fidelity between the ground truth synthetic scene and the reconstruction. We visualize the 3D reconstruction through a proof-of-concept web interface with robot ego-centric and exo-centric views, as well as semantic labels of objects within the scene, through which a human can interact. We discuss our ongoing design of one such human-robot collaborative task using this interface.
@inproceedings{bowser20243dgs, title = {3D Gaussian Splatting for Human-Robot Interaction}, author = {Bowser, Shawn and Lukin, Stephanie M.}, booktitle = {Interactive AI for Human-Centered Robotics Workshop at RO-MAN. Best Paper Award}, year = {2024}, address = {Pasadena, California, USA}, publisher = {IEEE}, }
HRI
SCOUT: A Situated and Multi-Modal Human-Robot Dialogue Corpus

Stephanie M. Lukin, Claire Bonial, Matthew Marge, Taylor A. Hudson, Cory J. Hayes, Kimberly Pollard, Anthony Baker, Ashley N. Foots, Ron Artstein, Felix Gervits, Mitchell Abrams, Cassidy Henry, Lucia Donatelli, Anton Leuski, Susan G. Hill, David Traum, and Clare Voss

In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), 2024

Abs Bib HTML

We introduce the Situated Corpus Of Understanding Transactions (SCOUT), a multi-modal collection of human-robot dialogue in the task domain of collaborative exploration. The corpus was constructed from multiple Wizard-of-Oz experiments where human participants gave verbal instructions to a remotely-located robot to move and gather information about its surroundings. SCOUT contains 89,056 utterances and 310,095 words from 278 dialogues averaging 320 utterances per dialogue. The dialogues are aligned with the multi-modal data streams available during the experiments: 5,785 images and 30 maps. The corpus has been annotated with Abstract Meaning Representation and Dialogue-AMR to identify the speaker’s intent and meaning within an utterance, and with Transactional Units and Relations to track relationships between utterances to reveal patterns of the Dialogue Structure. We describe how the corpus and its annotations have been used to develop autonomous human-robot systems and enable research in open questions of how humans speak to robots. We release this corpus to accelerate progress in autonomous, situated, human-robot dialogue, especially in the context of navigation tasks where details about the environment need to be discovered.
@inproceedings{lukin-etal-2024-scout-situated, title = {{SCOUT}: A Situated and Multi-Modal Human-Robot Dialogue Corpus}, author = {Lukin, Stephanie M. and Bonial, Claire and Marge, Matthew and Hudson, Taylor A. and Hayes, Cory J. and Pollard, Kimberly and Baker, Anthony and Foots, Ashley N. and Artstein, Ron and Gervits, Felix and Abrams, Mitchell and Henry, Cassidy and Donatelli, Lucia and Leuski, Anton and Hill, Susan G. and Traum, David and Voss, Clare}, booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)}, year = {2024}, address = {Torino, Italia}, publisher = {ELRA and ICCL}, pages = {14445--14458}, }

2023

Visual Storytelling
Envisioning Narrative Intelligence: A Creative Visual Storytelling Anthology

Brett A Halperin, and Stephanie M Lukin

In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 2023

Abs Bib HTML

In this paper, we collect an anthology of 100 visual stories from authors who participated in our systematic creative process of improvised story-building based on image sequences. Following close reading and thematic analysis of our anthology, we present five themes that characterize the variations found in this creative visual storytelling process: (1) Narrating What is in Vision vs. Envisioning; (2) Dynamically Characterizing Entities/Objects; (3) Sensing Experiential Information About the Scenery; (4) Modulating the Mood; (5) Encoding Narrative Biases. In understanding the varied ways that people derive stories from images, we offer considerations for collecting story-driven training data to inform automatic story generation. In correspondence with each theme, we envision narrative intelligence criteria for computational visual storytelling as: creative, reliable, expressive, grounded, and responsible. From these criteria, we discuss how to foreground creative expression, account for biases, and operate in the bounds of visual storyworlds.
@inproceedings{halperin2023envisioning, title = {Envisioning Narrative Intelligence: A Creative Visual Storytelling Anthology}, author = {Halperin, Brett A and Lukin, Stephanie M}, booktitle = {Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems}, pages = {1--21}, year = {2023}, dimensions = {true}, }
Visual Storytelling
SEE&TELL: Controllable Narrative Generation from Images

Stephanie M Lukin, and Sungmin Eum

In The AAAI-23 Workshop on Creative AI Across Modalities, 2023

Abs Bib HTML

We propose a visual storytelling framework with a distinction between what is present and observable in the visual storyworld, and what story is ultimately told. We implement a model that tells a story from an image using three affordances: 1) a fixed set of visual properties in an image that constitute a holistic representation its contents, 2) a variable stage direction that establishes the story setting, and 3) incremental questions about character goals. The generated narrative plans are then realized as expressive texts using few-shot learning. Following this approach, we generated 64 visual stories and measured the preservation, loss, and gain of visual information throughout the pipeline, and the willingness of a reader to take action to read more. We report different proportions of visual information preserved and lost depending upon the phase of the pipeline and the stage direction’s apparent relatedness to the image, and report 83% of stories were found to be interesting.
@inproceedings{lukin2023see, title = {SEE&TELL: Controllable Narrative Generation from Images}, author = {Lukin, Stephanie M and Eum, Sungmin}, booktitle = {The AAAI-23 Workshop on Creative AI Across Modalities}, year = {2023}, }