This documentation reflects AI2-THOR version 2.1.0. For the latest AI2-THOR documentation, visit ai2thor.allenai.org.

Event Metadata

Each call to controller.step() returns an Event object that describes the state of the environment and of each object within it.

import ai2thor.controller
controller = ai2thor.controller.Controller()
controller.start()
# can be any one of the scenes FloorPlan###
controller.reset('FloorPlan28')
event = controller.step(dict(action='Initialize', gridSize=0.25))

Event Object

# return object from controller.step(); 'MoveAhead' stands in for any valid action
event = controller.step(dict(action='MoveAhead'))

Attribute	Type	Description
metadata	dict	all attributes about agent, objects, visibility, etc. See description below for more detailed documentation
screen_width	int	Width of the rendered image; extracted from event.metadata['screenWidth']
screen_height	int	Height of the rendered image; extracted from event.metadata['screenHeight']
frame	Numpy Array	Current RGB image from the agent’s camera. Channels are in RGB order. Shape: (h, w, c) dtype: numpy.uint8
depth_frame	Numpy Array	Depth information in millimeters, with a maximum of 5 meters. Shape: (h, w) dtype: numpy.float32. Only available when renderDepthImage=True is passed to the Initialize action.
cv2img	Numpy Array	Numpy Array suitable for use with OpenCV. Shape: (h, w, c) Channels are in BGR order.
color_to_object_id	dict	Dictionary: key=RGB tuple, value=string that corresponds to either an objectId or object type. This structure is populated only when renderObjectImage is set to True when Initialize is called for a scene.
object_id_to_color	dict	Inverse of the color_to_object_id structure.
instance_segmentation_frame	Numpy Array	Segmentation image by individual object, Shape: (h, w, c) colors correspond to the keys found in color_to_object_id. Only available when renderObjectImage is enabled during Initialize call.
class_segmentation_frame	Numpy Array	Segmentation image by class of object (e.g. all mugs are the same color). Colors correspond to keys found in color_to_object_id. Only available when renderClassImage is enabled during Initialize call.
instance_detections2D	dict	2D bounding boxes of detected objects. Dictionary: key=objectId value=bounding box. bounding box=[start_x, start_y, end_x, end_y]. Only available when renderObjectImage is enabled during Initialize call.
class_detections2D	dict	2D bounding boxes of detected classes. Dictionary: key=object class value=list of bounding boxes. bounding box=[start_x, start_y, end_x, end_y]. Only available when renderObjectImage is enabled during Initialize call.
instance_masks	dict	Dictionary of object masks that can be applied to other images from the event. key=objectId value=Numpy array shape: (h, w) dtype=numpy.bool. Only available when renderObjectImage is enabled during Initialize call.
class_masks	dict	Dictionary of class masks that can be applied to other images from the event. key=object class value=Numpy array shape: (h, w) dtype=numpy.bool. Only available when renderObjectImage is enabled during Initialize call.
third_party_camera_frames	List	List of current RGB images from any third party cameras in the scene. The order of the list corresponds to the order in which the cameras were added. Channels are in RGB order. Shape: (h, w, c) dtype: numpy.uint8
third_party_class_segmentation_frames	List	List of current class segmentation images from any third party cameras in the scene. The order of the list corresponds to the order in which the cameras were added. Each image is segmented by class of object (e.g. all mugs are the same color). Colors correspond to keys found in color_to_object_id. Only available when renderClassImage is enabled during Initialize call.
third_party_instance_segmentation_frames	List	List of current instance segmentation images from any third party cameras in the scene. The order of the list corresponds to the order in which the cameras were added. Each image is segmented by individual object. Shape: (h, w, c). Colors correspond to the keys found in color_to_object_id. Only available when renderObjectImage is enabled during Initialize call.
third_party_depth_frames	List	List of current depth images from any third party cameras in the scene. The order of the list corresponds to the order in which the cameras were added. Each image is a Numpy Array containing depth information in millimeters, with a maximum of 5 meters. Shape: (h, w) dtype: numpy.float32. Only available when renderDepthImage=True is passed to the Initialize action.
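
As a rough illustration of how the segmentation attributes above fit together, the sketch below finds which object lies under a pixel and measures a 2D bounding box. It runs on small hand-made stand-ins rather than a live event. The helper names object_at_pixel and box_size and the sample object ids are hypothetical and not part of ai2thor. With a real event you would pass event.instance_segmentation_frame, event.color_to_object_id, and an entry of event.instance_detections2D instead.

```python
def object_at_pixel(seg_frame, color_to_object_id, x, y):
    """Return the id mapped to the color at column x, row y, or None."""
    # Frames are indexed (row, column) because their shape is (h, w, c).
    color = tuple(int(c) for c in seg_frame[y][x])
    return color_to_object_id.get(color)


def box_size(box):
    """Return (width, height) of a [start_x, start_y, end_x, end_y] box."""
    start_x, start_y, end_x, end_y = box
    return end_x - start_x, end_y - start_y


# Hand-made 2x2 "segmentation image" standing in for a real frame.
seg_frame = [
    [(255, 0, 0), (255, 0, 0)],
    [(0, 255, 0), (0, 0, 0)],
]
# Made-up ids; real keys and values come from event.color_to_object_id.
color_to_object_id = {(255, 0, 0): "Mug|1", (0, 255, 0): "Table|1"}

print(object_at_pixel(seg_frame, color_to_object_id, x=0, y=1))
print(box_size([10, 20, 50, 80]))
```

Converting each channel with int() keeps the lookup key a plain tuple of Python integers, whether the frame is a nested list as here or a Numpy array.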

Multi-Agent Event Object

If the environment has been initialized with more than one agent, a Multi-Agent Event Object will be returned from the step() method.

controller.step(dict(action='Initialize', agentCount=2))
event = controller.step(dict(action='MoveAhead', agentId=0)) # 'MoveAhead' stands in for any valid action; agentId can be 0..N (where N=number of agents - 1)

Attribute	Type	Description
metadata	dict	Metadata for the active agent (agent that received the most recent action). All attributes about agent, objects, visibility, etc. See description below for more detailed documentation
screen_width	int	Width of the rendered image; extracted from event.metadata['screenWidth']
screen_height	int	Height of the rendered image; extracted from event.metadata['screenHeight']
cv2img	Numpy Array	cv2img for the active agent. Numpy Array suitable for use with OpenCV. Shape: (h, w, c) Channels are in BGR order.
events	list	List of Event objects, one per agent. Element 0 corresponds to the first agent, element 1 to the second, and so on.
third_party_camera_frames	List	List of current RGB images from any third party cameras in the scene. The order of the list corresponds to the order in which the cameras were added. Channels are in RGB order. Shape: (h, w, c) dtype: numpy.uint8
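
The events attribute can be walked to read per-agent state. The following is a minimal sketch that collects each agent's position. It assumes each element of events exposes metadata in the same way as a single-agent Event. The helper agent_positions and the fake_event stand-in are hypothetical and exist only so the example runs without a simulator.

```python
from types import SimpleNamespace


def agent_positions(multi_event):
    """Map agent index -> position dict, using one Event per agent."""
    return {
        index: agent_event.metadata["agent"]["position"]
        for index, agent_event in enumerate(multi_event.events)
    }


def fake_event(x, z):
    """Hypothetical stand-in for a per-agent Event (made-up coordinates)."""
    position = {"x": x, "y": 0.9, "z": z}
    return SimpleNamespace(metadata={"agent": {"position": position}})


multi_event = SimpleNamespace(events=[fake_event(0.0, 1.0), fake_event(2.5, -1.0)])
positions = agent_positions(multi_event)
print(positions[1])
```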

Metadata attributes

# retrieved by using the instance variable 'metadata'
event.metadata

Attribute	Type	Description	Example
agent	agent	attributes pertaining to agent’s location, camera position and rotation
errorMessage	string	string explaining why the last action failed (if lastActionSuccess is false)
lastAction	string	The action that was issued to the agent to generate the response	MoveAhead
lastActionSuccess	boolean	True/False whether the last action succeeded	True
objects	array of objects	Array of all objects in the scene
screenHeight	number	Height of the image rendered by Unity	300
screenWidth	number	Width of the image rendered by Unity	300
sequenceId	number	Used to ensure that commands and responses are aligned
thirdPartyCameras	List<thirdPartyCamera>	List of third party camera attributes
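
A common pattern is to check lastActionSuccess after every step and fall back to errorMessage on failure. The sketch below shows one way to do that on plain dictionaries shaped like event.metadata. The helper name last_action_summary and the sample error text are made up for illustration.

```python
def last_action_summary(metadata):
    """Summarize the outcome of the most recent action."""
    action = metadata["lastAction"]
    if metadata["lastActionSuccess"]:
        return "{} succeeded".format(action)
    return "{} failed: {}".format(action, metadata["errorMessage"])


# Hand-made dictionaries standing in for event.metadata.
ok = {"lastAction": "MoveAhead", "lastActionSuccess": True, "errorMessage": ""}
blocked = {
    "lastAction": "MoveAhead",
    "lastActionSuccess": False,
    "errorMessage": "example error text",  # made-up message
}

print(last_action_summary(ok))
print(last_action_summary(blocked))
```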

Agent attributes

event.metadata['agent']

Attribute	Type	Description	Example
cameraHorizon	float	Position of camera relative to the horizon. 0.0 is looking straight ahead, 30.0 degrees is looking down by 30 degrees and 330 is looking up by 30.0 degrees.	0.0
position	vector3	X,Y,Z coordinates of the agent in the world reference frame
rotation	vector3	X,Y,Z rotations of the agent in degrees in global space
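
Because cameraHorizon reports an upward tilt as a value near 360 (330 for looking up by 30 degrees), it can be convenient to convert it to a signed angle. The sketch below is one possible conversion, where positive means looking down and negative means looking up. The function name signed_camera_pitch is hypothetical.

```python
def signed_camera_pitch(camera_horizon):
    """Convert cameraHorizon to a signed angle in degrees.

    Positive values look down, negative values look up.
    """
    horizon = camera_horizon % 360.0
    return horizon - 360.0 if horizon > 180.0 else horizon


for value in (0.0, 30.0, 330.0):
    print(value, signed_camera_pitch(value))
```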

Object attributes

Attribute	Type	Description	Example
distance	float	Distance from centerpoint of object to the agent’s camera	3.541793
name	string	Name of the object in Unity Scene. These names are unique within any individual scene.	Table_akjlis2j
objectId	string	Unique id for the object within the scene	TableTop|-02.08|+00.94|-03.62
position	vector3	X,Y,Z coordinates of the object in global space
rotation	vector3	X,Y,Z rotations of the object in degrees in global space
visible	boolean	Boolean indicating whether the object is visible to the agent	True
pickupable	boolean	Boolean indicating whether the object can be picked up by the agent. It will only be possible to actually pick up the object if it is also reachable by the agent (e.g. a SoapBar seen through a glass shower door is reported as visible, but it cannot be reached through the glass).	True
isPickedUp	boolean	Boolean indicating whether the object is currently picked up by an Agent	True
receptacle	boolean	Boolean indicating whether the object is a receptacle that can contain other objects	True
receptacleObjectIds	array of strings	If the object is a receptacle, this is an array of objectIds that the receptacle contains	Spoon|-02.1|+00.93|2.62, Knife|-01.1|+00.93|4.34
openable	boolean	Boolean indicating whether the object can be opened or closed with the `OpenObject` and `CloseObject` actions	True
isOpen	boolean	Boolean indicating whether the object is open or closed	True
toggleable	boolean	Boolean indicating whether the object can be toggled on or off using the `ToggleObjectOn` and `ToggleObjectOff` actions	True
isToggled	boolean	Boolean indicating whether the object is on or off	True
breakable	boolean	Boolean indicating whether the object can be broken, either with the `BreakObject` action or by a high enough physical force	True
isBroken	boolean	Boolean indicating whether the object is currently broken	True
canFillWithLiquid	boolean	Boolean indicating whether the object can be filled with a liquid using the `FillObjectWithLiquid` action	True
isFilledWithLiquid	boolean	Boolean indicating whether the object is filled with a liquid	True
dirtyable	boolean	Boolean indicating whether the object can be toggled dirty or clean using the `DirtyObject` and `CleanObject` actions	True
isDirty	boolean	Boolean indicating whether the object is dirty	True
cookable	boolean	Boolean indicating whether the object can be cooked	True
isCooked	boolean	Boolean indicating whether the object has been cooked	True
sliceable	boolean	Boolean indicating whether the object can be sliced with the `SliceObject` action	True
isSliced	boolean	Boolean indicating whether the object has been sliced	True
canBeUsedUp	boolean	Boolean indicating whether the object can be used up with the `UseUpObject` action	True
isUsedUp	boolean	Boolean indicating whether the object has been used up	True
ObjectTemperature	string	String that lists the object’s current relative temperature. Valid strings are: Hot, Cold, RoomTemp	Hot
canChangeTempToHot	boolean	Boolean indicating whether the object is a source of Heat and can contextually change other objects’ temperature to Hot	True
canChangeTempToCold	boolean	Boolean indicating whether the object is a source of Cold and can contextually change other objects’ temperature to Cold	True
mass	float	The mass of a Pickupable sim object in Kilograms	0.5
salientMaterials	array of strings	Array of strings listing the salient materials a pickupable object is composed of. Valid strings are: Metal, Wood, Plastic, Glass, Ceramic, Stone, Fabric, Rubber, Food, Paper, Wax, Soap, Sponge, Organic	Metal, Plastic
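
The object attributes are typically used to filter event.metadata['objects']. The sketch below selects objects that are both visible and pickupable and orders them by distance. It runs on a hand-made list with made-up values, and the helper name visible_pickupable is hypothetical. As noted in the table, being visible and pickupable does not guarantee that an object is reachable.

```python
def visible_pickupable(objects):
    """Return visible, pickupable objects, nearest first."""
    candidates = [o for o in objects if o["visible"] and o["pickupable"]]
    return sorted(candidates, key=lambda o: o["distance"])


# Hand-made entries standing in for event.metadata['objects'].
objects = [
    {"objectId": "Mug|1", "visible": True, "pickupable": True, "distance": 2.0},
    {"objectId": "Table|1", "visible": True, "pickupable": False, "distance": 1.0},
    {"objectId": "Spoon|1", "visible": True, "pickupable": True, "distance": 0.5},
    {"objectId": "Knife|1", "visible": False, "pickupable": True, "distance": 0.3},
]

nearest_first = [o["objectId"] for o in visible_pickupable(objects)]
print(nearest_first)
```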

Vector3 attributes

Attribute	Type	Description	Example
x	float
y	float
z	float
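
Since every position is a dictionary with x, y, and z keys, the straight-line distance between any two positions can be computed directly. The sketch below shows one way to do so. The function name vector3_distance is hypothetical, and the sample coordinates are made up.

```python
import math


def vector3_distance(a, b):
    """Euclidean distance between two vector3 dictionaries."""
    return math.sqrt(sum((a[axis] - b[axis]) ** 2 for axis in ("x", "y", "z")))


origin = {"x": 0.0, "y": 0.0, "z": 0.0}
point = {"x": 3.0, "y": 0.0, "z": 4.0}
print(vector3_distance(origin, point))
```

Note that the distance attribute of an object is measured to the agent’s camera, so it will generally differ from the distance between the agent's position and the object's position.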

thirdPartyCamera attributes

event.metadata['thirdPartyCameras']

Attribute	Type	Description	Example
thirdPartyCameraId	int	Id of the camera. Used in conjunction with the UpdateThirdPartyCamera action to change the position/rotation of a camera.	0
position	vector3	X,Y,Z coordinates of the camera in the world reference frame
rotation	vector3	X,Y,Z rotations of the camera in degrees in global space

Next Steps

Continue on to the Examples documentation.