ASIST Study 3 Dataset
The ASIST Study 3 dataset was collected in 2022 as part of DARPA's Artificial Social Intelligence for Successful Teams (ASIST) program.
Experiment description
In this experiment, teams of 3 participants conduct urban search-and-rescue missions in Minecraft, with some teams advised by AI agents.
Each team participates in three 'missions' (or 'trials'; we use the two terms interchangeably here): one 'training' mission followed by two 'real' missions.
The preregistration for the experiment describes the motivation for the experiment as well as details on the data collection.
How to access the dataset
Option 1: ASU Dataverse
The dataset is publicly available via the ASU Dataverse Research Data Repository.
However, this option is less than ideal if you need to frequently access different subsets of the data, or work with the data programmatically. This option is recommended only if you do not have SSH access to the copy of this dataset on the lab servers.
Option 2: Access via the lab servers
This dataset is currently hosted on one of the lab's network storage volumes, mule. Specifically, the raw ASIST Study 3 data is located in the following directory:
/media/mule/projects/tomcat/protected/study-3_2022
The mule volume is mounted on the orca, kraken, and leviathan VMs (see Compute and Storage), so you should be able to access the data from any of these VMs that you can SSH into, provided you also have permission to read the data. If you have SSH access to a VM but cannot access the data, please contact Adarsh.
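If you are unsure whether your account can read the data, a quick check like the following may help. This is a minimal Python sketch, assuming Python 3 and the directory path given above; `STUDY3_DIR`, `file_extension`, and `summarize_directory` are hypothetical helper names, not part of the dataset or any lab tooling. It raises an error if the directory is missing or unreadable and otherwise counts the files by extension.

```python
import os
from collections import Counter
from pathlib import Path

# Location of the raw Study 3 data on the mule volume (from the text above).
STUDY3_DIR = Path("/media/mule/projects/tomcat/protected/study-3_2022")


def file_extension(path: Path) -> str:
    """Return the full extension, e.g. '.tar.gz' rather than only '.gz'."""
    return "".join(path.suffixes)


def summarize_directory(root: Path) -> Counter:
    """Count the files under `root` (recursively) by extension.

    Raises FileNotFoundError if `root` is not a directory (for example, if
    the volume is not mounted) and PermissionError if it cannot be read.
    """
    if not root.is_dir():
        raise FileNotFoundError(f"{root} is not a directory; is mule mounted?")
    if not os.access(root, os.R_OK | os.X_OK):
        raise PermissionError(f"no read permission for {root}")
    return Counter(file_extension(p) for p in root.rglob("*") if p.is_file())
```

On orca, kraken, or leviathan, `print(summarize_directory(STUDY3_DIR))` should print one count per extension if the mount and your permissions are in place; a PermissionError is the cue to ask for access.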
File naming conventions
The general naming convention for the files is as follows:
<Completeness>_<Data type>[-Part1 | -Part2]_Trial-<Trial ID>_Team-<Team ID>_Member-<Participant ID>_CondBtwn-<Advisor>_CondWin-na_Vers-<Version number>.<extension>
EBNF Grammar
filename = [validity "_"]
"HSRData_"
remainder
validity = "Missing" | "Terminated"
remainder =
"ClientAudio" [part] trial team member condbtwn condwin "_Vers-1" ".wav"
| "DockerLogs" [part] "_Trial-na" team "_Member-na" condbtwn condwin "_Vers-1" ".tar.gz"
| "MetaData_Study-3.csv"
| "OBVideo" [part] trial team member condbtwn condwin "_Vers-1" ".mp4"
| "QCTrialMessages" trial team "_Member-na" condbtwn condwin version ".txt"
| "Surveys" survey_part [ "_Fulltext" | "_Numeric" ] "_Trial-na_Team-na_Member-na_CondBtwn-na_CondWin-na_Vers-07272022." ("csv" | "sav")
| "TrialMessages" [part] trial team "_Member-na" condbtwn condwin version ".metadata"
| "ZoomAudio" "_Trial-na" team "_Member-na" condbtwn condwin "_Vers-1.m4a"
| "ZoomAudioTranscript" "_Trial-na" team "_Member-na" condbtwn condwin "_Vers-1.vtt"
| "ZoomVideo" "_Trial-na" team "_Member-na" condbtwn condwin "_Vers-1.mp4"
survey_part = "0" | "1" | "2" | "3"
part = "-Part" part_number
part_number = "1" | "2"
trial = "_Trial-" trial_id
trial_id = "Training" | trial_number
trial_number = "T000XXX" | ...
condwin = "_CondWin-na"
version = "_Vers-" version_number
version_number = "1" | "2" | "3" ...
condbtwn = "_CondBtwn-" condition
condition = "none" | "Human-01" | "ASI-" performer "-TA1"
performer = "UAZ" | "DOLL" | "CRA" | "USC" | "SIFT" | "CMURI"
team = "_Team-" team_id
team_id = "TM00323" | ...
member = "_Member-" member_id
member_id = "na" | "HumanAdvisor" | "E000726" | ...
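The naming convention can also be matched programmatically. The following Python sketch is one possible regular-expression reading of it, not an official parser. It assumes that fields are separated by `_`, that field values (such as `ASI-UAZ-TA1`) contain no underscores, and that any Missing or Terminated marker comes before HSRData. Names that do not fit this layout, such as HSRData_MetaData_Study-3.csv and possibly the survey exports, return None. `FILENAME_RE` and `parse_filename` are hypothetical names.

```python
import re
from typing import Dict, Optional

# One reading of the naming convention; field values are assumed to
# contain no underscores (hyphens, as in ASI-UAZ-TA1, are fine).
FILENAME_RE = re.compile(
    r"""
    ^(?:(?P<completeness>Missing|Terminated)_)?  # optional invalid-data marker
    HSRData_
    (?P<data_type>[A-Za-z]+)                     # e.g. TrialMessages, OBVideo
    (?:-Part(?P<part>[12]))?                     # optional -Part1 or -Part2
    _Trial-(?P<trial>[^_]+)
    _Team-(?P<team>[^_]+)
    _Member-(?P<member>[^_]+)
    _CondBtwn-(?P<cond_btwn>[^_]+)
    _CondWin-(?P<cond_win>[^_]+)
    _Vers-(?P<version>[^.]+)
    \.(?P<extension>.+)$
    """,
    re.VERBOSE,
)


def parse_filename(name: str) -> Optional[Dict[str, Optional[str]]]:
    """Split a Study 3 filename into named fields, or return None."""
    match = FILENAME_RE.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    # No Missing/Terminated marker means the Completeness value is HSRData.
    fields["completeness"] = fields["completeness"] or "HSRData"
    return fields
```

In practice you might call `parse_filename(path.name)` for each file under the study directory and skip the names for which it returns None.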
Completeness
The Completeness part of the filename above can take on the following three values:
- HSRData: The data is valid. In general, you should use only files whose names start with HSRData.
- Missing: Missing data.
- Terminated: Data from a trial that was terminated early.
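A minimal Python sketch for sorting files by their Completeness value, for example to keep only the valid HSRData files. It inspects only the first underscore-separated field of each name, so it does not depend on the rest of the name following the convention exactly. `completeness`, `group_by_completeness`, and `valid_files` are hypothetical helper names.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List

COMPLETENESS_VALUES = ("HSRData", "Missing", "Terminated")


def completeness(name: str) -> str:
    """Return the Completeness value a filename starts with, or '' if none."""
    first_field = name.split("_", 1)[0]
    return first_field if first_field in COMPLETENESS_VALUES else ""


def group_by_completeness(paths: Iterable[Path]) -> Dict[str, List[Path]]:
    """Group paths by Completeness value, preserving input order."""
    groups: Dict[str, List[Path]] = defaultdict(list)
    for path in paths:
        groups[completeness(path.name)].append(path)
    return dict(groups)


def valid_files(paths: Iterable[Path]) -> List[Path]:
    """Keep only files whose Completeness value is HSRData."""
    return [path for path in paths if completeness(path.name) == "HSRData"]
```

Files that start with none of the three values end up under the empty-string key, which makes stray files easy to spot.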
Data type
File descriptions
There are multiple types of files in the dataset. They are described below.
Metadata
- HSRData_MetaData_Study-3.csv: Metadata about the experiment, filenames, etc.
Message bus data
- *.metadata: Messages sent on the message bus, one file per trial. Each line of a file is a JSON object. The messages contain information about participant positions, actions, etc., and also include automated transcriptions of the participants' dialog, generated in real time via Google Cloud Speech.
Documentation of the message formats can be found here.
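Because each line is a standalone JSON object, a .metadata file can be streamed line by line instead of being loaded whole. The Python sketch below does this with only the standard library and tallies messages by a chosen field. The default key `topic` and the dotted form `msg.sub_type` are assumptions about the message layout, so check the message format documentation before relying on them; `iter_messages`, `get_field`, and `count_by_field` are hypothetical names.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Any, Iterator


def iter_messages(path: Path) -> Iterator[dict]:
    """Yield one decoded JSON object per non-blank line of a .metadata file."""
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                message = json.loads(line)
            except json.JSONDecodeError as err:
                raise ValueError(f"{path}:{line_number}: invalid JSON") from err
            yield message


def get_field(message: dict, dotted_key: str, default: Any = None) -> Any:
    """Look up a possibly nested field, e.g. 'msg.sub_type'."""
    value: Any = message
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return default
        value = value[part]
    return value


def count_by_field(path: Path, dotted_key: str = "topic") -> Counter:
    """Tally messages by the value of `dotted_key` (an assumed field name)."""
    return Counter(
        get_field(message, dotted_key, "<missing>") for message in iter_messages(path)
    )
```

Calling `count_by_field` with the path to one trial's .metadata file gives a quick overview of which message types that trial contains; messages lacking the field are counted under `<missing>`.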
Video recordings
- *.mp4: Individual and team video recordings of the missions.
Example rsync invocation for downloading all the videos into the current directory (assumes you can SSH into kraken with the command ssh kraken):
rsync \
-avP \
--prune-empty-dirs \
--include "*/" \
--include "HSRData_*.mp4" \
--exclude "*" \
kraken:/media/mule/projects/tomcat/protected/study-3_2022 \
.
Audio recordings
- *.m4a: Zoom audio recordings, one per team.
- *.wav: Individual participant audio recordings. These are captured through the participants' browsers rather than Zoom, to provide real-time, source-separated audio streams for automated speech recognition.
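Before feeding the per-participant .wav files to a speech recognizer, it can help to check their sample rate, channel count, and duration. The sketch below uses Python's standard `wave` module, which only reads uncompressed PCM WAV data; that the browser recordings are PCM is an assumption, and the module cannot read the Zoom .m4a files. `WavInfo` and `wav_info` are hypothetical names.

```python
import wave
from typing import NamedTuple


class WavInfo(NamedTuple):
    channels: int
    sample_rate: int
    sample_width_bytes: int
    duration_seconds: float


def wav_info(path) -> WavInfo:
    """Read basic properties of an uncompressed (PCM) WAV file."""
    # wave.open expects a filename string or a file object, not a Path.
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return WavInfo(wav.getnchannels(), rate, wav.getsampwidth(), frames / rate)
```

If `wave.open` raises `wave.Error` on one of these files, the recording is probably not plain PCM and a more general audio library would be needed.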
Survey data
- HSRData_Surveys*.csv: Data from the Qualtrics surveys completed by participants.
- *.sav: The same Qualtrics survey exports in SPSS (.sav) format.
Other data
- *.tar.gz: Docker logs from the different testbed components, one .tar.gz archive per team.
- *.txt: Quality control reports for the .metadata files.
- *.vtt: Automated transcriptions of the experimental sessions generated by Zoom (one per team).
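The Zoom .vtt transcripts use the WebVTT format, in which each cue has a timing line such as 00:00:01.230 --> 00:00:04.560 followed by one or more lines of text, with blank lines between cues. Below is a minimal Python sketch for extracting cues as start time, end time (in seconds), and text. It ignores cue settings, styling, and other WebVTT features, so it is not a full parser; `Cue`, `to_seconds`, and `parse_vtt` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List

# A timing line, e.g. "00:01:02.345 --> 00:01:04.000" (hours are optional).
TIMING_RE = re.compile(
    r"^(?P<start>(?:\d+:)?\d{2}:\d{2}\.\d{3})\s+-->\s+"
    r"(?P<end>(?:\d+:)?\d{2}:\d{2}\.\d{3})"
)


@dataclass
class Cue:
    start: float  # seconds from the start of the recording
    end: float
    text: str


def to_seconds(timestamp: str) -> float:
    """Convert [hh:]mm:ss.ttt to seconds."""
    parts = timestamp.split(":")
    hours = int(parts[-3]) if len(parts) == 3 else 0
    return hours * 3600 + int(parts[-2]) * 60 + float(parts[-1])


def parse_vtt(text: str) -> List[Cue]:
    """Extract cues; blocks without a timing line (e.g. the header) are skipped."""
    cues = []
    for block in re.split(r"\n\s*\n", text.replace("\r\n", "\n")):
        lines = [line for line in block.strip().split("\n") if line.strip()]
        for i, line in enumerate(lines):
            match = TIMING_RE.match(line.strip())
            if match:
                # Any lines before the timing line are cue identifiers.
                cues.append(Cue(to_seconds(match["start"]),
                                to_seconds(match["end"]),
                                "\n".join(lines[i + 1:])))
                break
    return cues
```

For a transcript file, `parse_vtt(Path(...).read_text(encoding="utf-8"))` returns the cues in file order, which can then be aligned with the message bus timestamps or the audio recordings.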