Each call to the controller.step() function returns an Event object containing detailed information about the state of the environment and every object within it.
import ai2thor.controller
controller = ai2thor.controller.Controller()
controller.start()
# can be any one of the scenes FloorPlan###
controller.reset('FloorPlan28')
event = controller.step(dict(action='Initialize', gridSize=0.25))
# return object from controller.step()
event = controller.step(dict(action=<SOME ACTION>))
Attribute | Type | Description |
---|---|---|
metadata | dict | all attributes about agent, objects, visibility, etc. See description below for more detailed documentation |
screen_width | int | Width of the player; extracted from event.metadata['screenWidth'] |
screen_height | int | Height of the player; extracted from event.metadata['screenHeight'] |
frame | Numpy Array | Current RGB image from the agent’s camera. Channels are in RGB order. Shape: (h, w, c) dtype: numpy.uint8 |
depth_frame | Numpy Array | Depth information in millimeters, capped at a maximum of 5 meters. Shape: (h, w) dtype: numpy.float32 |
cv2img | Numpy Array | Numpy Array suitable for use with OpenCV. Shape: (h, w, c) Channels are in BGR order. |
color_to_object_id | dict | Dictionary: key=RGB tuple, value=string that corresponds to either an objectId or object type. This structure is populated only when renderObjectImage is set to True when Initialize is called for a scene. |
object_id_to_color | dict | Inverse of the color_to_object_id structure. |
instance_segmentation_frame | Numpy Array | Segmentation image by individual object, Shape: (h, w, c) colors correspond to the keys found in color_to_object_id. Only available when renderObjectImage is enabled during Initialize call. |
class_segmentation_frame | Numpy Array | Segmentation image by class of object (e.g. all mugs are the same color). Colors correspond to keys found in color_to_object_id. Only available when renderClassImage is enabled during Initialize call. |
instance_detections2D | dict | 2D bounding boxes of detected objects. Dictionary: key=objectId value=bounding box. bounding box=[start_x, start_y, end_x, end_y]. Only available when renderObjectImage is enabled during Initialize call. |
class_detections2D | dict | 2D bounding boxes of detected classes. Dictionary: key=object class value=list of bounding boxes. bounding box=[start_x, start_y, end_x, end_y]. Only available when renderObjectImage is enabled during Initialize call. |
instance_masks | dict | Dictionary of object masks that can be applied to other images from the event. key=objectId value=Numpy array shape: (h, w) dtype=numpy.bool. Only available when renderObjectImage is enabled during Initialize call. |
class_masks | dict | Dictionary of class masks that can be applied to other images from the event. key=object class value=Numpy array shape: (h, w) dtype=numpy.bool. Only available when renderObjectImage is enabled during Initialize call. |
third_party_camera_frames | List | List of current RGB images from any third party cameras in the scene. The order of the list corresponds to the order in which the cameras were added. Channels are in RGB order. Shape: (h, w, c) dtype: numpy.uint8 |
third_party_class_segmentation_frames | List | List of current segmentation images from any third party cameras in the scene. The order of the list corresponds to the order in which the cameras were added. Segmentation image by class of object (e.g. all mugs are the same color). Colors correspond to keys found in color_to_object_id. Only available when renderClassImage is enabled during Initialize call. |
third_party_instance_segmentation_frames | List | List of current segmentation images from any third party cameras in the scene. The order of the list corresponds to the order in which the cameras were added. Segmentation image by individual object. Shape: (h, w, c). Colors correspond to the keys found in color_to_object_id. Only available when renderObjectImage is enabled during Initialize call. |
third_party_depth_frames | List | List of current depth images from any third party cameras in the scene. The order of the list corresponds to the order in which the cameras were added. Each image is a Numpy Array containing depth information in millimeters, capped at a maximum of 5 meters. Shape: (h, w) dtype: numpy.float32. Only available when renderDepthImage=True is passed to the Initialize action. |
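
To illustrate how the segmentation and detection attributes fit together, here is a minimal sketch that looks up which object was rendered at a given pixel and computes the area of a bounding box. The helper names object_id_at_pixel and box_area are hypothetical (not part of ai2thor), and the tiny frame and color table are stand-in data. With a real event you would pass event.instance_segmentation_frame and event.color_to_object_id instead.

```python
def object_id_at_pixel(segmentation_frame, color_to_object_id, x, y):
    """Return the object id (or type) rendered at pixel (x, y), or None."""
    # Frames are indexed row-first: frame[y][x] gives the (r, g, b) pixel.
    r, g, b = segmentation_frame[y][x][:3]
    return color_to_object_id.get((int(r), int(g), int(b)))


def box_area(box):
    """Area in pixels of a [start_x, start_y, end_x, end_y] bounding box."""
    start_x, start_y, end_x, end_y = box
    return max(0, end_x - start_x) * max(0, end_y - start_y)


# Stand-in data (hypothetical values): a 2x2 "image" and its color table.
frame = [
    [(255, 0, 0), (255, 0, 0)],
    [(0, 0, 255), (10, 10, 10)],
]
colors = {(255, 0, 0): "Mug|1", (0, 0, 255): "Table|1"}

print(object_id_at_pixel(frame, colors, x=0, y=1))
print(box_area([10, 20, 50, 40]))
```

Because the lookup indexes the frame as frame[y][x], the same function should work on nested lists and on (h, w, c) Numpy arrays alike.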
controller.step(dict(action='Initialize', agentCount=2))
event = controller.step(dict(action=<SOME ACTION>, agentId=0)) # agentId can be 0..N (where N=number of agents - 1)
Attribute | Type | Description |
---|---|---|
metadata | dict | Metadata for the active agent (agent that received the most recent action). All attributes about agent, objects, visibility, etc. See description below for more detailed documentation |
screen_width | int | Width of the player; extracted from event.metadata['screenWidth'] |
screen_height | int | Height of the player; extracted from event.metadata['screenHeight'] |
cv2img | Numpy Array | cv2img for the active agent. Numpy Array suitable for use with OpenCV. Shape: (h, w, c) Channels are in BGR order. |
events | list | List of Event objects, one per agent. Element 0 corresponds to the first agent, element 1 to the second. |
third_party_camera_frames | List | List of current RGB images from any third party cameras in the scene. The order of the list corresponds to the order in which the cameras were added. Channels are in RGB order. Shape: (h, w, c) dtype: numpy.uint8 |
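
As a rough sketch of how the per-agent events list might be consumed, the snippet below collects each agent's position from a multi-agent event. The function agent_positions is a hypothetical helper, and the SimpleNamespace objects only stand in for real Event objects so the example runs without a simulator.

```python
from types import SimpleNamespace


def agent_positions(event):
    """Map agent index -> position dict, using the per-agent events list."""
    return {
        index: agent_event.metadata["agent"]["position"]
        for index, agent_event in enumerate(event.events)
    }


# Stand-in for a two-agent event (hypothetical values).
fake_event = SimpleNamespace(events=[
    SimpleNamespace(metadata={"agent": {"position": {"x": 0.0, "y": 0.9, "z": 1.0}}}),
    SimpleNamespace(metadata={"agent": {"position": {"x": 2.5, "y": 0.9, "z": -1.0}}}),
])

print(agent_positions(fake_event))
```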
# retrieved by using the instance variable 'metadata'
event.metadata
Attribute | Type | Description | Example |
---|---|---|---|
agent | agent | attributes pertaining to agent’s location, camera position and rotation | |
errorMessage | string | string explaining why the last action failed (if lastActionSuccess is false) | |
lastAction | string | The action that was issued to the agent to generate the response | MoveAhead |
lastActionSuccess | boolean | Whether the last action succeeded | True |
objects | array of objects | Array of all objects in the scene | |
screenHeight | number | Height of the image rendered by Unity | 300 |
screenWidth | number | Width of the image rendered by Unity | 300 |
sequenceId | number | Used to ensure that commands and responses are aligned | |
thirdPartyCameras | List<thirdPartyCamera> | List of third party camera attributes | |
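
A common use of these fields is checking whether an action worked before reading anything else. The sketch below assumes only the keys listed in the table above. describe_last_action is a hypothetical helper, the sample dictionaries are made-up values, and with a real event you would pass event.metadata.

```python
def describe_last_action(metadata):
    """Summarize the outcome of the most recent action as a short string."""
    action = metadata["lastAction"]
    if metadata["lastActionSuccess"]:
        return "{} succeeded".format(action)
    # errorMessage explains the failure when lastActionSuccess is false.
    return "{} failed: {}".format(action, metadata.get("errorMessage", ""))


# Hand-built metadata samples (hypothetical values).
ok = {"lastAction": "MoveAhead", "lastActionSuccess": True, "errorMessage": ""}
blocked = {
    "lastAction": "MoveAhead",
    "lastActionSuccess": False,
    "errorMessage": "path is blocked",
}

print(describe_last_action(ok))
print(describe_last_action(blocked))
```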
event.metadata['agent']
Attribute | Type | Description | Example |
---|---|---|---|
cameraHorizon | float | Position of camera relative to the horizon. 0.0 is looking straight ahead, 30.0 degrees is looking down by 30 degrees and 330 is looking up by 30.0 degrees. | 0.0 |
position | vector3 | X,Y,Z coordinates of the agent in the world reference frame | |
rotation | vector3 | X,Y,Z rotations of the agent in degrees in global space | |
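
The cameraHorizon convention (30 means looking down, 330 means looking up by 30 degrees) can be awkward to compare directly. The following sketch converts it to a signed pitch. The helper name signed_camera_pitch and the sign convention (positive = looking down) are choices made for this example, not something ai2thor defines.

```python
def signed_camera_pitch(camera_horizon):
    """Degrees below (+) or above (-) the horizon for a cameraHorizon value."""
    angle = camera_horizon % 360.0
    # Values past 180 wrap around: 330 is the same direction as -30.
    return angle - 360.0 if angle > 180.0 else angle


for horizon in (0.0, 30.0, 330.0):
    print(horizon, signed_camera_pitch(horizon))
```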
Attribute | Type | Description | Example |
---|---|---|---|
distance | float | Distance from centerpoint of object to the agent’s camera | 3.541793 |
name | string | Name of the object in Unity Scene. These names are unique within any individual scene. | Table_akjlis2j |
objectId | string | Unique id for the object within the scene | TableTop|-02.08|+00.94|-03.62 |
position | vector3 | X,Y,Z coordinates of the object in global space | |
rotation | vector3 | X,Y,Z rotations of the object in degrees in global space | |
visible | boolean | Boolean indicating whether the object is visible to the agent | True |
pickupable | boolean | Boolean indicating whether the object can be picked up by the agent. The object can only actually be picked up if it is also reachable by the agent (e.g. a SoapBar seen through a glass shower door is reported as visible, but it cannot be reached through the glass) | True |
isPickedUp | boolean | Boolean indicating whether the object is currently picked up by an Agent | True |
receptacle | boolean | Boolean indicating whether the object is a receptacle that can contain other objects | True |
receptacleObjectIds | array of strings | If the object is a receptacle, this is an array of objectIds that the receptacle contains | Spoon|-02.1|+00.93|2.62, Knife|-01.1|+00.93|4.34 |
openable | boolean | Boolean indicating whether the object can be opened or closed with the OpenObject and CloseObject actions | True |
isOpen | boolean | Boolean indicating whether the object is open or closed | True |
toggleable | boolean | Boolean indicating whether the object can be toggled on or off using the ToggleObjectOn and ToggleObjectOff actions | True |
isToggled | boolean | Boolean indicating whether the object is on or off | True |
breakable | boolean | Boolean indicating whether the object can be broken, either with the BreakObject action or by a high enough physical force | True |
isBroken | boolean | Boolean indicating whether the object is currently broken | True |
canFillWithLiquid | boolean | Boolean indicating whether the object can be filled with a liquid using the FillObjectWithLiquid action | True |
isFilledWithLiquid | boolean | Boolean indicating whether the object is filled with a liquid | True |
dirtyable | boolean | Boolean indicating whether the object can be toggled dirty or clean using the DirtyObject and CleanObject actions | True |
isDirty | boolean | Boolean indicating whether the object is dirty | True |
cookable | boolean | Boolean indicating whether the object can be cooked | True |
isCooked | boolean | Boolean indicating whether the object has been cooked | True |
sliceable | boolean | Boolean indicating whether the object can be sliced with the SliceObject action | True |
isSliced | boolean | Boolean indicating whether the object has been sliced | True |
canBeUsedUp | boolean | Boolean indicating whether the object can be used up with the UseUpObject action | True |
isUsedUp | boolean | Boolean indicating whether the object has been used up | True |
ObjectTemperature | string | String that lists the object’s current relative temperature. Valid strings are: Hot, Cold, RoomTemp | Hot |
canChangeTempToHot | boolean | Boolean indicating whether the object is a source of heat and can contextually change other objects’ temperature to Hot | True |
canChangeTempToCold | boolean | Boolean indicating whether the object is a source of cold and can contextually change other objects’ temperature to Cold | True |
mass | float | The mass of a Pickupable sim object in Kilograms | 0.5 |
salientMaterials | array of strings | Array of strings listing the salient materials a pickupable object is composed of. Valid strings are: Metal, Wood, Plastic, Glass, Ceramic, Stone, Fabric, Rubber, Food, Paper, Wax, Soap, Sponge, Organic | Metal, Plastic |
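
Since visible and pickupable are separate flags, a typical first step is to filter the object list on both. The sketch below does that and sorts the result by distance. visible_pickupable is a hypothetical helper and the object entries are trimmed, made-up samples. With a real event you would pass event.metadata['objects'].

```python
def visible_pickupable(objects):
    """Objects that are both visible and pickupable, nearest first."""
    candidates = [o for o in objects if o["visible"] and o["pickupable"]]
    return sorted(candidates, key=lambda o: o["distance"])


# Trimmed object entries (hypothetical values, only the keys used here).
sample_objects = [
    {"objectId": "Mug|1", "visible": True, "pickupable": True, "distance": 2.0},
    {"objectId": "Fridge|1", "visible": True, "pickupable": False, "distance": 1.0},
    {"objectId": "Spoon|1", "visible": True, "pickupable": True, "distance": 0.8},
    {"objectId": "Knife|1", "visible": False, "pickupable": True, "distance": 0.5},
]

for obj in visible_pickupable(sample_objects):
    print(obj["objectId"], obj["distance"])
```

As the pickupable description notes, passing this filter does not guarantee the agent can reach the object.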
Attribute | Type | Description | Example |
---|---|---|---|
x | float | | |
y | float | | |
z | float | | |
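
Assuming a vector3 arrives in Python as a dictionary with the keys x, y and z (as the table suggests), the straight-line distance between two positions can be sketched as follows. vector3_distance is a hypothetical helper, not an ai2thor function.

```python
import math


def vector3_distance(a, b):
    """Euclidean distance between two vector3 dicts with x, y, z keys."""
    return math.sqrt(sum((a[axis] - b[axis]) ** 2 for axis in ("x", "y", "z")))


# Hypothetical positions for illustration.
agent_position = {"x": 0.0, "y": 0.0, "z": 0.0}
object_position = {"x": 3.0, "y": 4.0, "z": 0.0}

print(vector3_distance(agent_position, object_position))
```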
event.metadata['thirdPartyCameras']
Attribute | Type | Description | Example |
---|---|---|---|
thirdPartyCameraId | int | Id of the camera. Used in conjunction with the UpdateThirdPartyCamera action to change the position/rotation of a camera. | 0 |
position | vector3 | X,Y,Z coordinates of the camera in the world reference frame | |
rotation | vector3 | X,Y,Z rotations of the camera in degrees in global space | |
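
To look up the attributes of one particular camera, for example before issuing UpdateThirdPartyCamera, a small search over the list is enough. This is a minimal sketch: find_third_party_camera is a hypothetical helper, and the metadata dictionary is hand-built stand-in data using only the keys from the table above.

```python
def find_third_party_camera(metadata, camera_id):
    """Return the camera entry with the given thirdPartyCameraId, or None."""
    for camera in metadata.get("thirdPartyCameras", []):
        if camera["thirdPartyCameraId"] == camera_id:
            return camera
    return None


# Stand-in metadata with two cameras (hypothetical values).
sample_metadata = {
    "thirdPartyCameras": [
        {"thirdPartyCameraId": 0, "position": {"x": 0.0, "y": 2.0, "z": 0.0}},
        {"thirdPartyCameraId": 1, "position": {"x": 1.5, "y": 2.0, "z": -1.0}},
    ]
}

print(find_third_party_camera(sample_metadata, 1))
```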
Continue on to the Examples documentation.