policy

Modules:

Name Description
base_policy

Abstract Policy class to run a policy in the environment for data collection or evaluation.

dummy_policy
learned_policy
random_policy
solvers

base_policy

Abstract Policy class to run a policy in the environment for data collection or evaluation. This class is designed to be subclassed and implemented with specific policies. For example, the policy can be a planner, a human teleop interface, a reinforcement learning agent, or any other type of agent that can interact with the environment to collect data.
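The subclassing pattern described above can be sketched as follows. This is an illustrative stand-in that mirrors the documented interface only; the real `BasePolicy` (in `molmo_spaces/policy/base_policy.py`) also takes an `MlSpacesExpConfig` and an optional task, which are omitted here for brevity.

```python
from abc import ABC, abstractmethod

# Minimal stand-in for the BasePolicy interface (config/task plumbing omitted).
class Policy(ABC):
    @abstractmethod
    def get_action(self, observation):
        """Return the action to take for the given observation."""

    @abstractmethod
    def reset(self):
        """Clear any internal state at the start of an episode."""

class ConstantPolicy(Policy):
    """A trivially simple concrete policy: always emits the same control dict."""

    def __init__(self, ctrl):
        self.ctrl = dict(ctrl)

    def get_action(self, observation):
        return dict(self.ctrl)

    def reset(self):
        pass  # stateless: nothing to clear

policy = ConstantPolicy({"arm": [0.0] * 6})
policy.reset()
print(policy.get_action(observation=None))  # {'arm': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]}
```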

Classes:

Name Description
BasePolicy

Abstract base class for policies.

InferencePolicy
PlannerPolicy
StatefulPolicy

Attributes:

Name Type Description
NONE_PHASE

NONE_PHASE module-attribute

NONE_PHASE = -1

BasePolicy

BasePolicy(config: MlSpacesExpConfig, task: BaseMujocoTask | None = None)

Bases: ABC

Abstract base class for policies.

This class provides a template for policies that can be used to interact with an environment for the purpose of data collection or evaluation. It declares methods that should be implemented by any concrete policy class.

Methods:

Name Description
add_auxiliary_objects

Add auxiliary objects to the scene that might be required for the policy.

get_action

Decide on the action to take based on the current observation of the task or environment.

get_all_phases

Returns:

get_info

Get additional information from the policy. Called after the episode has ended.

get_phase

Returns:

reset

Reset the policy's internal state.

Attributes:

Name Type Description
config
task
Source code in molmo_spaces/policy/base_policy.py
def __init__(self, config: "MlSpacesExpConfig", task: BaseMujocoTask | None = None) -> None:
    self.config = config
    self.task = task
config instance-attribute
config = config
task instance-attribute
task = task
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None

Add auxiliary objects to the scene that might be required for the policy. Args: config: The configuration for the policy. spec: The MuJoCo scene specification.

Source code in molmo_spaces/policy/base_policy.py
@staticmethod
def add_auxiliary_objects(config: "MlSpacesExpConfig", spec: MjSpec) -> None:
    """
    Add auxiliary objects to the scene that might be required for the policy.
    Args:
        config: The configuration for the policy.
        spec: The MuJoCo scene specification.
    """
    return None
get_action abstractmethod
get_action(observation)

Decide on the action to take based on the current observation of the task or environment. Information could be observations, goals in the case of an rl_agent, or it could be the full environment state in the case of a planner.

Parameters:

Name Type Description Default
observation

The current information about the task or environment.

required

Returns:

Type Description

The action to take in response to the information.

Source code in molmo_spaces/policy/base_policy.py
@abstractmethod
def get_action(self, observation):
    """
    Decide on the action to take based on the current observation of the task or environment.
    Information could be observations, goals in the case of an rl_agent, or it could be the full
    environment state in the case of a planner.

    Args:
        observation: The current information about the task or environment.

    Returns:
        The action to take in response to the information.
    """
    pass
get_all_phases
get_all_phases() -> dict[str, int]

Returns:

Type Description
dict[str, int]

A dictionary of all possible policy phases

Source code in molmo_spaces/policy/base_policy.py
def get_all_phases(self) -> dict[str, int]:
    """
    Returns:
        A dictionary of all possible policy phases
    """
    return {"unknown": 0}
get_info
get_info() -> dict

Get additional information from the policy. Called after the episode has ended. This method can be overridden by subclasses to provide extra information about the policy's state. Must be JSON serializable.

Returns:

Type Description
dict

A dictionary containing additional information about the policy.

Source code in molmo_spaces/policy/base_policy.py
def get_info(self) -> dict:
    """
    Get additional information from the policy. Called after the episode has ended. This method
    can be overridden by subclasses to provide extra information about the policy's state.
    Must be JSON serializable.

    Returns:
        A dictionary containing additional information about the policy.
    """
    return {}
get_phase
get_phase() -> str

Returns:

Type Description
str

The policy phase

Source code in molmo_spaces/policy/base_policy.py
def get_phase(self) -> str:
    """
    Returns:
        The policy phase
    """
    return "unknown"
reset abstractmethod
reset()

Reset the policy's internal state.

This method should be implemented by each subclass to reset the policy's internal state. It is typically called at the beginning of each episode or task.

Source code in molmo_spaces/policy/base_policy.py
@abstractmethod
def reset(self):
    """
    Reset the policy's internal state.

    This method should be implemented by each subclass to reset the policy's internal state.
    It is typically called at the beginning of each episode or task.
    """
    pass

InferencePolicy

InferencePolicy(config: MlSpacesExpConfig, task_type)

Bases: BasePolicy

Methods:

Name Description
add_auxiliary_objects

Add auxiliary objects to the scene that might be required for the policy.

get_action
get_all_phases

Returns:

get_info
get_phase

Returns:

inference_model
model_output_to_action
obs_to_model_input
prepare_model
render

Not required to be implemented by subclasses.

reset

Attributes:

Name Type Description
config
current_phase
target_poses
task
task_type
Source code in molmo_spaces/policy/base_policy.py
def __init__(self, config: "MlSpacesExpConfig", task_type) -> None:
    super().__init__(config)
    self.task_type = task_type

    # TODO(max): remove these (added to silence warnings)
    self.target_poses = {"grasp": np.eye(4)}
    self.current_phase = NONE_PHASE
config instance-attribute
config = config
current_phase instance-attribute
current_phase = NONE_PHASE
target_poses instance-attribute
target_poses = {'grasp': eye(4)}
task instance-attribute
task = task
task_type instance-attribute
task_type = task_type
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None

Add auxiliary objects to the scene that might be required for the policy. Args: config: The configuration for the policy. spec: The MuJoCo scene specification.

Source code in molmo_spaces/policy/base_policy.py
@staticmethod
def add_auxiliary_objects(config: "MlSpacesExpConfig", spec: MjSpec) -> None:
    """
    Add auxiliary objects to the scene that might be required for the policy.
    Args:
        config: The configuration for the policy.
        spec: The MuJoCo scene specification.
    """
    return None
get_action
get_action(observation)
Source code in molmo_spaces/policy/base_policy.py
def get_action(self, observation):
    model_input = self.obs_to_model_input(observation)
    model_output = self.inference_model(model_input)
    action = self.model_output_to_action(model_output)
    return action
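The three-stage pipeline in `get_action` above (preprocess the observation, run the model, postprocess into an action) can be exercised end to end with a stand-in "model"; the real `InferencePolicy` leaves all three stages abstract, and the names below are illustrative.

```python
import numpy as np

# Sketch of the obs_to_model_input -> inference_model -> model_output_to_action flow.
class ScaledEchoPolicy:
    def obs_to_model_input(self, obs):
        # Preprocess: pull the proprioceptive state into a flat float array.
        return np.asarray(obs["qpos"], dtype=np.float64)

    def inference_model(self, model_input):
        # Stand-in "model": halve the joint state.
        return 0.5 * model_input

    def model_output_to_action(self, model_output):
        # Postprocess: wrap the raw output into a control dict.
        return {"arm": model_output}

    def get_action(self, observation):
        model_input = self.obs_to_model_input(observation)
        model_output = self.inference_model(model_input)
        return self.model_output_to_action(model_output)

policy = ScaledEchoPolicy()
action = policy.get_action({"qpos": [2.0, 4.0]})
print(action)  # {'arm': array([1., 2.])}
```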
get_all_phases
get_all_phases() -> dict[str, int]

Returns:

Type Description
dict[str, int]

A dictionary of all possible policy phases

Source code in molmo_spaces/policy/base_policy.py
def get_all_phases(self) -> dict[str, int]:
    """
    Returns:
        A dictionary of all possible policy phases
    """
    return {"unknown": 0}
get_info
get_info() -> dict
Source code in molmo_spaces/policy/base_policy.py
def get_info(self) -> dict:
    info = super().get_info()
    info["task_type"] = self.task_type
    info["timestamp"] = time.time()
    return info
get_phase
get_phase() -> str

Returns:

Type Description
str

The policy phase

Source code in molmo_spaces/policy/base_policy.py
def get_phase(self) -> str:
    """
    Returns:
        The policy phase
    """
    return "unknown"
inference_model abstractmethod
inference_model(model_input)
Source code in molmo_spaces/policy/base_policy.py
@abstractmethod
def inference_model(self, model_input):
    raise NotImplementedError
model_output_to_action abstractmethod
model_output_to_action(model_output)
Source code in molmo_spaces/policy/base_policy.py
@abstractmethod
def model_output_to_action(self, model_output):
    raise NotImplementedError
obs_to_model_input abstractmethod
obs_to_model_input(obs)
Source code in molmo_spaces/policy/base_policy.py
@abstractmethod
def obs_to_model_input(self, obs):
    raise NotImplementedError
prepare_model abstractmethod
prepare_model(model_name: str)
Source code in molmo_spaces/policy/base_policy.py
@abstractmethod
def prepare_model(self, model_name: str):
    raise NotImplementedError
render
render(obs)

Not required to be implemented by subclasses. Defaults to noop.

Source code in molmo_spaces/policy/base_policy.py
def render(self, obs):
    """
    Not required to be implemented by subclasses.
    Defaults to noop.
    """
    pass
reset abstractmethod
reset()
Source code in molmo_spaces/policy/base_policy.py
@abstractmethod
def reset(self):
    raise NotImplementedError

PlannerPolicy

PlannerPolicy(config: MlSpacesExpConfig, task: BaseMujocoTask | None = None)

Bases: BasePolicy

Methods:

Name Description
add_auxiliary_objects

Add auxiliary objects to the scene that might be required for the policy.

get_action

Decide on the action to take based on the current observation of the task or environment.

get_all_phases

Returns:

get_info

Get additional information from the policy. Called after the episode has ended.

get_phase

Returns:

planners

Abstract property representing the list or dict of planner instances.

reset

Reset the policy's internal state.

Attributes:

Name Type Description
config
task
Source code in molmo_spaces/policy/base_policy.py
def __init__(self, config: "MlSpacesExpConfig", task: BaseMujocoTask | None = None) -> None:
    self.config = config
    self.task = task
config instance-attribute
config = config
task instance-attribute
task = task
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None

Add auxiliary objects to the scene that might be required for the policy. Args: config: The configuration for the policy. spec: The MuJoCo scene specification.

Source code in molmo_spaces/policy/base_policy.py
@staticmethod
def add_auxiliary_objects(config: "MlSpacesExpConfig", spec: MjSpec) -> None:
    """
    Add auxiliary objects to the scene that might be required for the policy.
    Args:
        config: The configuration for the policy.
        spec: The MuJoCo scene specification.
    """
    return None
get_action abstractmethod
get_action(observation)

Decide on the action to take based on the current observation of the task or environment. Information could be observations, goals in the case of an rl_agent, or it could be the full environment state in the case of a planner.

Parameters:

Name Type Description Default
observation

The current information about the task or environment.

required

Returns:

Type Description

The action to take in response to the information.

Source code in molmo_spaces/policy/base_policy.py
@abstractmethod
def get_action(self, observation):
    """
    Decide on the action to take based on the current observation of the task or environment.
    Information could be observations, goals in the case of an rl_agent, or it could be the full
    environment state in the case of a planner.

    Args:
        observation: The current information about the task or environment.

    Returns:
        The action to take in response to the information.
    """
    pass
get_all_phases
get_all_phases() -> dict[str, int]

Returns:

Type Description
dict[str, int]

A dictionary of all possible policy phases

Source code in molmo_spaces/policy/base_policy.py
def get_all_phases(self) -> dict[str, int]:
    """
    Returns:
        A dictionary of all possible policy phases
    """
    return {"unknown": 0}
get_info
get_info() -> dict

Get additional information from the policy. Called after the episode has ended. This method can be overridden by subclasses to provide extra information about the policy's state. Must be JSON serializable.

Returns:

Type Description
dict

A dictionary containing additional information about the policy.

Source code in molmo_spaces/policy/base_policy.py
def get_info(self) -> dict:
    """
    Get additional information from the policy. Called after the episode has ended. This method
    can be overridden by subclasses to provide extra information about the policy's state.
    Must be JSON serializable.

    Returns:
        A dictionary containing additional information about the policy.
    """
    return {}
get_phase
get_phase() -> str

Returns:

Type Description
str

The policy phase

Source code in molmo_spaces/policy/base_policy.py
def get_phase(self) -> str:
    """
    Returns:
        The policy phase
    """
    return "unknown"
planners abstractmethod
planners() -> None

Abstract property representing the list or dict of planner instances.

Source code in molmo_spaces/policy/base_policy.py
@abstractmethod
def planners(self) -> None:
    """Abstract property representing the list or dict of planner instances."""
    pass
reset abstractmethod
reset()

Reset the policy's internal state.

This method should be implemented by each subclass to reset the policy's internal state. It is typically called at the beginning of each episode or task.

Source code in molmo_spaces/policy/base_policy.py
@abstractmethod
def reset(self):
    """
    Reset the policy's internal state.

    This method should be implemented by each subclass to reset the policy's internal state.
    It is typically called at the beginning of each episode or task.
    """
    pass

StatefulPolicy

StatefulPolicy(config: MlSpacesExpConfig, task: BaseMujocoTask | None = None)

Bases: BasePolicy

Methods:

Name Description
add_auxiliary_objects

Add auxiliary objects to the scene that might be required for the policy.

get_action

Decide on the action to take based on the current observation of the task or environment.

get_all_phases

Returns:

get_info

Get additional information from the policy. Called after the episode has ended.

get_phase

Returns:

get_state
reset

Reset the policy's internal state.

set_state

Attributes:

Name Type Description
config
task
Source code in molmo_spaces/policy/base_policy.py
def __init__(self, config: "MlSpacesExpConfig", task: BaseMujocoTask | None = None) -> None:
    self.config = config
    self.task = task
config instance-attribute
config = config
task instance-attribute
task = task
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None

Add auxiliary objects to the scene that might be required for the policy. Args: config: The configuration for the policy. spec: The MuJoCo scene specification.

Source code in molmo_spaces/policy/base_policy.py
@staticmethod
def add_auxiliary_objects(config: "MlSpacesExpConfig", spec: MjSpec) -> None:
    """
    Add auxiliary objects to the scene that might be required for the policy.
    Args:
        config: The configuration for the policy.
        spec: The MuJoCo scene specification.
    """
    return None
get_action abstractmethod
get_action(observation)

Decide on the action to take based on the current observation of the task or environment. Information could be observations, goals in the case of an rl_agent, or it could be the full environment state in the case of a planner.

Parameters:

Name Type Description Default
observation

The current information about the task or environment.

required

Returns:

Type Description

The action to take in response to the information.

Source code in molmo_spaces/policy/base_policy.py
@abstractmethod
def get_action(self, observation):
    """
    Decide on the action to take based on the current observation of the task or environment.
    Information could be observations, goals in the case of an rl_agent, or it could be the full
    environment state in the case of a planner.

    Args:
        observation: The current information about the task or environment.

    Returns:
        The action to take in response to the information.
    """
    pass
get_all_phases
get_all_phases() -> dict[str, int]

Returns:

Type Description
dict[str, int]

A dictionary of all possible policy phases

Source code in molmo_spaces/policy/base_policy.py
def get_all_phases(self) -> dict[str, int]:
    """
    Returns:
        A dictionary of all possible policy phases
    """
    return {"unknown": 0}
get_info
get_info() -> dict

Get additional information from the policy. Called after the episode has ended. This method can be overridden by subclasses to provide extra information about the policy's state. Must be JSON serializable.

Returns:

Type Description
dict

A dictionary containing additional information about the policy.

Source code in molmo_spaces/policy/base_policy.py
def get_info(self) -> dict:
    """
    Get additional information from the policy. Called after the episode has ended. This method
    can be overridden by subclasses to provide extra information about the policy's state.
    Must be JSON serializable.

    Returns:
        A dictionary containing additional information about the policy.
    """
    return {}
get_phase
get_phase() -> str

Returns:

Type Description
str

The policy phase

Source code in molmo_spaces/policy/base_policy.py
def get_phase(self) -> str:
    """
    Returns:
        The policy phase
    """
    return "unknown"
get_state abstractmethod
get_state()
Source code in molmo_spaces/policy/base_policy.py
@abstractmethod
def get_state(self):
    raise NotImplementedError
reset abstractmethod
reset()

Reset the policy's internal state.

This method should be implemented by each subclass to reset the policy's internal state. It is typically called at the beginning of each episode or task.

Source code in molmo_spaces/policy/base_policy.py
@abstractmethod
def reset(self):
    """
    Reset the policy's internal state.

    This method should be implemented by each subclass to reset the policy's internal state.
    It is typically called at the beginning of each episode or task.
    """
    pass
set_state abstractmethod
set_state(state)
Source code in molmo_spaces/policy/base_policy.py
@abstractmethod
def set_state(self, state):
    raise NotImplementedError
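The `get_state`/`set_state` pair above lets a caller snapshot a policy mid-episode and restore it later (for example, to replay or branch a rollout). A hypothetical subclass might implement the round trip like this:

```python
# Illustrative StatefulPolicy subclass: its only state is a step counter.
class CountingPolicy:
    def __init__(self):
        self.step = 0

    def get_action(self, observation):
        self.step += 1
        return {}  # null action; only the counter matters here

    def reset(self):
        self.step = 0

    def get_state(self):
        # Return a serializable snapshot of the internal state.
        return {"step": self.step}

    def set_state(self, state):
        # Restore a previously captured snapshot.
        self.step = state["step"]

p = CountingPolicy()
p.get_action(None)
p.get_action(None)
snapshot = p.get_state()  # {'step': 2}
p.reset()                 # counter back to 0
p.set_state(snapshot)     # ...and restored
print(p.get_state())      # {'step': 2}
```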

dummy_policy

Classes:

Name Description
BrownianMotionPolicy

Policy that applies Gaussian noise increments over noop control, resulting in Brownian motion.

DummyPolicy

A Dummy Policy that returns null actions.

BrownianMotionPolicy

BrownianMotionPolicy(config: MlSpacesExpConfig, task: BaseMujocoTask | None = None)

Bases: DummyPolicy

Policy that applies Gaussian noise increments over noop control, resulting in Brownian motion.

Methods:

Name Description
add_auxiliary_objects

Add auxiliary objects to the scene that might be required for the policy.

get_action
get_all_phases

Returns:

get_info

Get additional information from the policy. Called after the episode has ended.

get_phase

Returns:

reset

Reset the policy state. No state to reset for DummyPolicy.

Attributes:

Name Type Description
config
current_phase
std
target_poses
task
type
Source code in molmo_spaces/policy/dummy_policy.py
def __init__(self, config: "MlSpacesExpConfig", task: BaseMujocoTask | None = None) -> None:
    super().__init__(config, task)
    self.task = task
    self.std = config.policy_config.std
config instance-attribute
config = config
current_phase instance-attribute
current_phase = NONE_PHASE
std instance-attribute
std = std
target_poses instance-attribute
target_poses = {'grasp': eye(4)}
task instance-attribute
task = task
type property
type
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None

Add auxiliary objects to the scene that might be required for the policy. Args: config: The configuration for the policy. spec: The MuJoCo scene specification.

Source code in molmo_spaces/policy/base_policy.py
@staticmethod
def add_auxiliary_objects(config: "MlSpacesExpConfig", spec: MjSpec) -> None:
    """
    Add auxiliary objects to the scene that might be required for the policy.
    Args:
        config: The configuration for the policy.
        spec: The MuJoCo scene specification.
    """
    return None
get_action
get_action(observation) -> dict
Source code in molmo_spaces/policy/dummy_policy.py
def get_action(self, observation) -> dict:
    robot_view = self.task.env.current_robot.robot_view
    action = robot_view.get_noop_ctrl_dict()
    for mg_id, ctrl in action.items():
        action[mg_id] = ctrl + np.random.normal(loc=0.0, scale=self.std, size=ctrl.shape)
    return action
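The core of the loop above, in isolation: Gaussian noise is added to each motor group's no-op control. Because the no-op control typically holds the current setpoint, resampling fresh noise around it each step makes the setpoint drift like a random walk, hence "Brownian motion." The ctrl-dict layout and `std` value below are made up for the example; in the real policy they come from `robot_view.get_noop_ctrl_dict()` and `config.policy_config.std`.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
std = 0.05  # hypothetical noise scale
noop_ctrl = {"arm": np.zeros(6), "gripper": np.zeros(1)}  # hypothetical noop ctrl dict

# Perturb every motor group's no-op control with Gaussian noise of the same shape.
action = {
    mg_id: ctrl + rng.normal(loc=0.0, scale=std, size=ctrl.shape)
    for mg_id, ctrl in noop_ctrl.items()
}
print({mg_id: a.shape for mg_id, a in action.items()})  # {'arm': (6,), 'gripper': (1,)}
```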
get_all_phases
get_all_phases() -> dict[str, int]

Returns:

Type Description
dict[str, int]

A dictionary of all possible policy phases

Source code in molmo_spaces/policy/base_policy.py
def get_all_phases(self) -> dict[str, int]:
    """
    Returns:
        A dictionary of all possible policy phases
    """
    return {"unknown": 0}
get_info
get_info() -> dict

Get additional information from the policy. Called after the episode has ended. This method can be overridden by subclasses to provide extra information about the policy's state. Must be JSON serializable.

Returns:

Type Description
dict

A dictionary containing additional information about the policy.

Source code in molmo_spaces/policy/base_policy.py
def get_info(self) -> dict:
    """
    Get additional information from the policy. Called after the episode has ended. This method
    can be overridden by subclasses to provide extra information about the policy's state.
    Must be JSON serializable.

    Returns:
        A dictionary containing additional information about the policy.
    """
    return {}
get_phase
get_phase() -> str

Returns:

Type Description
str

The policy phase

Source code in molmo_spaces/policy/base_policy.py
def get_phase(self) -> str:
    """
    Returns:
        The policy phase
    """
    return "unknown"
reset
reset()

Reset the policy state. No state to reset for DummyPolicy.

Source code in molmo_spaces/policy/dummy_policy.py
def reset(self):
    """
    Reset the policy state. No state to reset for DummyPolicy.
    """
    pass

DummyPolicy

DummyPolicy(config: MlSpacesExpConfig, task: BaseMujocoTask | None = None)

Bases: BasePolicy

A Dummy Policy that returns null actions.

Methods:

Name Description
add_auxiliary_objects

Add auxiliary objects to the scene that might be required for the policy.

get_action

Dummy action to take based on the action space.

get_all_phases

Returns:

get_info

Get additional information from the policy. Called after the episode has ended.

get_phase

Returns:

reset

Reset the policy state. No state to reset for DummyPolicy.

Attributes:

Name Type Description
config
current_phase
target_poses
task
type
Source code in molmo_spaces/policy/dummy_policy.py
def __init__(self, config: "MlSpacesExpConfig", task: BaseMujocoTask | None = None) -> None:
    self.config = config
    # Required attributes for sensors that expect a policy with target poses
    self.target_poses = {"grasp": np.eye(4)}
    self.current_phase = NONE_PHASE
config instance-attribute
config = config
current_phase instance-attribute
current_phase = NONE_PHASE
target_poses instance-attribute
target_poses = {'grasp': eye(4)}
task instance-attribute
task = task
type property
type
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None

Add auxiliary objects to the scene that might be required for the policy. Args: config: The configuration for the policy. spec: The MuJoCo scene specification.

Source code in molmo_spaces/policy/base_policy.py
@staticmethod
def add_auxiliary_objects(config: "MlSpacesExpConfig", spec: MjSpec) -> None:
    """
    Add auxiliary objects to the scene that might be required for the policy.
    Args:
        config: The configuration for the policy.
        spec: The MuJoCo scene specification.
    """
    return None
get_action
get_action(observation)

Dummy action to take based on the action space.

Parameters:

Name Type Description Default
observation

The current information about the task or environment (not used).

required

Returns:

Type Description

An empty action dictionary.

Source code in molmo_spaces/policy/dummy_policy.py
def get_action(self, observation):
    """
    Dummy action to take based on the action space.

    Args:
        observation: The current information about the task or environment (not used).

    Returns:
        An empty action dictionary.
    """
    return dict()
get_all_phases
get_all_phases() -> dict[str, int]

Returns:

Type Description
dict[str, int]

A dictionary of all possible policy phases

Source code in molmo_spaces/policy/base_policy.py
def get_all_phases(self) -> dict[str, int]:
    """
    Returns:
        A dictionary of all possible policy phases
    """
    return {"unknown": 0}
get_info
get_info() -> dict

Get additional information from the policy. Called after the episode has ended. This method can be overridden by subclasses to provide extra information about the policy's state. Must be JSON serializable.

Returns:

Type Description
dict

A dictionary containing additional information about the policy.

Source code in molmo_spaces/policy/base_policy.py
def get_info(self) -> dict:
    """
    Get additional information from the policy. Called after the episode has ended. This method
    can be overridden by subclasses to provide extra information about the policy's state.
    Must be JSON serializable.

    Returns:
        A dictionary containing additional information about the policy.
    """
    return {}
get_phase
get_phase() -> str

Returns:

Type Description
str

The policy phase

Source code in molmo_spaces/policy/base_policy.py
def get_phase(self) -> str:
    """
    Returns:
        The policy phase
    """
    return "unknown"
reset
reset()

Reset the policy state. No state to reset for DummyPolicy.

Source code in molmo_spaces/policy/dummy_policy.py
def reset(self):
    """
    Reset the policy state. No state to reset for DummyPolicy.
    """
    pass

learned_policy

Modules:

Name Description
bimanual_yam_pi_policy

Bimanual YAM Pi0.5 Policy using LeRobot gRPC inference.

cap_policy
dreamzero_policy
keyboard_policy
lerobot_grpc_client

LeRobot gRPC client for remote policy inference.

phone_policy
pi_policy
rum_client
spacemouse_policy
utils
websocket_policy

bimanual_yam_pi_policy

Bimanual YAM Pi0.5 Policy using LeRobot gRPC inference.

This policy connects to a LeRobot async inference server to run Pi0.5 for bimanual manipulation with the YAM robot.

Classes:

Name Description
BimanualYamPiPolicy

Policy for bimanual YAM robot using Pi0.5 via LeRobot gRPC server.

Attributes:

Name Type Description
log
log module-attribute
log = getLogger(__name__)
BimanualYamPiPolicy
BimanualYamPiPolicy(exp_config: MlSpacesExpConfig)

Bases: InferencePolicy

Policy for bimanual YAM robot using Pi0.5 via LeRobot gRPC server.

This policy:
  • Connects to a LeRobot async inference server
  • Sends observations (3 cameras + 14-dim state + task prompt)
  • Receives 14-dim action chunks (left arm, left gripper, right arm, right gripper)
  • Buffers actions to avoid calling the model every step

Expected observation format from MuJoCo
  • "left_wrist_camera": RGB image from left wrist
  • "right_wrist_camera": RGB image from right wrist
  • "exo_camera": RGB image from top-down exo camera
  • "qpos": {"left_arm": (6,), "right_arm": (6,), "left_gripper": (N,), "right_gripper": (N,)}
Action format to robot
  • "left_arm": (6,) joint positions
  • "right_arm": (6,) joint positions
  • "left_gripper": (1,) gripper command
  • "right_gripper": (1,) gripper command

Initialize the policy.

Parameters:

Name Type Description Default
exp_config MlSpacesExpConfig

Experiment configuration containing policy config

required

Methods:

Name Description
__del__

Cleanup on deletion.

add_auxiliary_objects

Add auxiliary objects to the scene that might be required for the policy.

get_action
get_all_phases

Returns:

get_info

Get policy information for logging.

get_phase

Returns:

inference_model

Run inference with action buffering.

model_output_to_action

Convert model output to bimanual robot action dictionary.

obs_to_model_input

Transform MuJoCo observations to raw format for LeRobot's build_dataset_frame.

prepare_model

Connect to the LeRobot gRPC server and load the policy.

render

Display camera views for debugging.

reset

Reset the policy state for a new episode.

Attributes:

Name Type Description
GRIPPER_MAX
STATE_NAMES
actions_buffer list[ndarray] | None
buffer_length
camera_mapping
checkpoint_path
client LeRobotGRPCClient | None
config
current_buffer_index int
current_phase
grasping_type
remote_config
starting_time float | None
target_poses
task
task_type
Source code in molmo_spaces/policy/learned_policy/bimanual_yam_pi_policy.py
def __init__(self, exp_config: MlSpacesExpConfig) -> None:
    """Initialize the policy.

    Args:
        exp_config: Experiment configuration containing policy config
    """
    super().__init__(exp_config, exp_config.task_type)

    policy_config = exp_config.policy_config
    self.remote_config = policy_config.remote_config
    self.checkpoint_path = policy_config.checkpoint_path
    self.grasping_type = getattr(policy_config, "grasping_type", "binary")
    self.buffer_length = 50

    # Camera mapping: MuJoCo camera name -> LeRobot feature key
    self.camera_mapping = getattr(
        policy_config,
        "camera_mapping",
        {
            "left_wrist_camera": "observation.images.left",
            "right_wrist_camera": "observation.images.right",
            "exo_camera": "observation.images.top",
        },
    )

    # Will be initialized in prepare_model()
    self.client: LeRobotGRPCClient | None = None
    self.actions_buffer: list[np.ndarray] | None = None
    self.current_buffer_index: int = 0
    self.starting_time: float | None = None
GRIPPER_MAX class-attribute instance-attribute
GRIPPER_MAX = 0.041
STATE_NAMES class-attribute instance-attribute
STATE_NAMES = ['left_joint_0.pos', 'left_joint_1.pos', 'left_joint_2.pos', 'left_joint_3.pos', 'left_joint_4.pos', 'left_joint_5.pos', 'left_gripper.pos', 'right_joint_0.pos', 'right_joint_1.pos', 'right_joint_2.pos', 'right_joint_3.pos', 'right_joint_4.pos', 'right_joint_5.pos', 'right_gripper.pos']
actions_buffer instance-attribute
actions_buffer: list[ndarray] | None = None
buffer_length instance-attribute
buffer_length = 50
camera_mapping instance-attribute
camera_mapping = getattr(policy_config, 'camera_mapping', {'left_wrist_camera': 'observation.images.left', 'right_wrist_camera': 'observation.images.right', 'exo_camera': 'observation.images.top'})
checkpoint_path instance-attribute
checkpoint_path = checkpoint_path
client instance-attribute
client: LeRobotGRPCClient | None = None
config instance-attribute
config = config
current_buffer_index instance-attribute
current_buffer_index: int = 0
current_phase instance-attribute
current_phase = NONE_PHASE
grasping_type instance-attribute
grasping_type = getattr(policy_config, 'grasping_type', 'binary')
remote_config instance-attribute
remote_config = remote_config
starting_time instance-attribute
starting_time: float | None = None
target_poses instance-attribute
target_poses = {'grasp': eye(4)}
task instance-attribute
task = task
task_type instance-attribute
task_type = task_type
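The action-buffering scheme tracked by `actions_buffer`, `current_buffer_index`, and `buffer_length` can be sketched in isolation: query the model once for a chunk of actions, replay them step by step, and only call the model again when the chunk is exhausted. The class and the stand-in model below are illustrative, not the library's implementation.

```python
# Hypothetical sketch of chunked action buffering around a slow model call.
class BufferedPolicy:
    def __init__(self, model, buffer_length=50):
        self.model = model
        self.buffer_length = buffer_length
        self.actions_buffer = None
        self.current_buffer_index = 0

    def inference_model(self, model_input):
        if self.actions_buffer is None or self.current_buffer_index >= len(self.actions_buffer):
            # Buffer empty or exhausted: request a fresh chunk from the model.
            self.actions_buffer = self.model(model_input)[: self.buffer_length]
            self.current_buffer_index = 0
        action = self.actions_buffer[self.current_buffer_index]
        self.current_buffer_index += 1
        return action

calls = []

def fake_model(model_input):
    calls.append(model_input)          # count how often the "model" runs
    return list(range(100))            # a long chunk of dummy actions

p = BufferedPolicy(fake_model, buffer_length=3)
out = [p.inference_model(None) for _ in range(5)]
print(out, len(calls))  # [0, 1, 2, 0, 1] 2  -> 5 steps, only 2 model calls
```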
__del__
__del__()

Cleanup on deletion.

Source code in molmo_spaces/policy/learned_policy/bimanual_yam_pi_policy.py
def __del__(self):
    """Cleanup on deletion."""
    if hasattr(self, "client") and self.client is not None:
        self.client.close()
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None

Add auxiliary objects to the scene that might be required for the policy. Args: config: The configuration for the policy. spec: The MuJoCo scene specification to modify.

Source code in molmo_spaces/policy/base_policy.py
@staticmethod
def add_auxiliary_objects(config: "MlSpacesExpConfig", spec: MjSpec) -> None:
    """
    Add auxiliary objects to the scene that might be required for the policy.
    Args:
        config: The configuration for the policy.
        spec: The MuJoCo scene specification to modify.
    """
    return None
get_action
get_action(observation)
Source code in molmo_spaces/policy/base_policy.py
def get_action(self, observation):
    model_input = self.obs_to_model_input(observation)
    model_output = self.inference_model(model_input)
    action = self.model_output_to_action(model_output)
    return action
get_all_phases
get_all_phases() -> dict[str, int]

Returns:

Type Description
dict[str, int]

A dictionary of all possible policy phases

Source code in molmo_spaces/policy/base_policy.py
def get_all_phases(self) -> dict[str, int]:
    """
    Returns:
        A dictionary of all possible policy phases
    """
    return {"unknown": 0}
get_info
get_info() -> dict

Get policy information for logging.

Returns:

Type Description
dict

Dictionary with policy metadata

Source code in molmo_spaces/policy/learned_policy/bimanual_yam_pi_policy.py
def get_info(self) -> dict:
    """Get policy information for logging.

    Returns:
        Dictionary with policy metadata
    """
    info = super().get_info()
    info["policy_name"] = "bimanual_yam_pi"
    info["policy_checkpoint"] = self.model_name
    info["policy_buffer_length"] = self.buffer_length
    info["policy_grasping_type"] = self.grasping_type
    info["prompt"] = self.task.get_task_description()
    info["time_spent"] = time.time() - self.starting_time if self.starting_time else None
    info["timestamp"] = time.time()
    return info
get_phase
get_phase() -> str

Returns:

Type Description
str

The policy phase

Source code in molmo_spaces/policy/base_policy.py
def get_phase(self) -> str:
    """
    Returns:
        The policy phase
    """
    return "unknown"
inference_model
inference_model(model_input: dict) -> ndarray

Run inference with action buffering.

Only calls the remote model when the buffer is empty or exhausted.

Parameters:

Name Type Description Default
model_input dict

Preprocessed observation in LeRobot format

required

Returns:

Type Description
ndarray

Single action from the buffer

Source code in molmo_spaces/policy/learned_policy/bimanual_yam_pi_policy.py
def inference_model(self, model_input: dict) -> np.ndarray:
    """Run inference with action buffering.

    Only calls the remote model when the buffer is empty or exhausted.

    Args:
        model_input: Preprocessed observation in LeRobot format

    Returns:
        Single action from the buffer
    """
    if self.starting_time is None:
        self.starting_time = time.time()

    # Refill buffer if needed
    if self.actions_buffer is None or self.current_buffer_index >= len(self.actions_buffer):
        # Debug: check for NaN/inf in observation before sending to server
        for key, value in model_input.items():
            if isinstance(value, np.ndarray):
                if np.isnan(value).any():
                    log.error(f"NaN detected in observation key '{key}'")
                if np.isinf(value).any():
                    log.error(f"Inf detected in observation key '{key}'")
            elif isinstance(value, float) and (np.isnan(value) or np.isinf(value)):
                log.error(f"NaN/Inf detected in observation key '{key}': {value}")

        self.actions_buffer = self.client.infer(model_input)
        log.info(f"Got action buffer as output from inference: {self.actions_buffer}")
        self.current_buffer_index = 0

        if not self.actions_buffer:
            log.warning("Received empty action buffer from server")
            # Return zeros as fallback
            return np.zeros(14, dtype=np.float32)

    # Get next action from buffer
    action = self.actions_buffer[self.current_buffer_index]
    self.current_buffer_index += 1

    return action
model_output_to_action
model_output_to_action(model_output: ndarray) -> dict

Convert model output to bimanual robot action dictionary.

Model output format (14 dims): [0:6] - left arm joint positions [6] - left gripper [7:13] - right arm joint positions [13] - right gripper

Parameters:

Name Type Description Default
model_output ndarray

14-dim action array from model

required

Returns:

Type Description
dict

Dictionary with keys: left_arm, right_arm, left_gripper, right_gripper

Source code in molmo_spaces/policy/learned_policy/bimanual_yam_pi_policy.py
def model_output_to_action(self, model_output: np.ndarray) -> dict:
    """Convert model output to bimanual robot action dictionary.

    Model output format (14 dims):
        [0:6]  - left arm joint positions
        [6]    - left gripper
        [7:13] - right arm joint positions
        [13]   - right gripper

    Args:
        model_output: 14-dim action array from model

    Returns:
        Dictionary with keys: left_arm, right_arm, left_gripper, right_gripper
    """
    # Extract components from model output
    left_arm = model_output[0:6].astype(np.float32)
    left_gripper_val = float(model_output[6])
    right_arm = model_output[7:13].astype(np.float32)
    right_gripper_val = float(model_output[13])

    action = {
        "left_arm": left_arm,
        "right_arm": right_arm,
        "left_gripper": self._scale_gripper_action(left_gripper_val),
        "right_gripper": self._scale_gripper_action(right_gripper_val),
    }

    log.info(
        f"[ACTION] step={self.current_buffer_index} "
        f"left_arm={np.array2string(left_arm, precision=3, separator=', ')} "
        f"left_grip={left_gripper_val:.3f} "
        f"right_arm={np.array2string(right_arm, precision=3, separator=', ')} "
        f"right_grip={right_gripper_val:.3f}"
    )

    return action
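As a standalone illustration of the 14-dim layout described above, a minimal sketch (the helper name `split_bimanual_action` is hypothetical, and unlike the real method it skips the `_scale_gripper_action` rescaling):

```python
import numpy as np

def split_bimanual_action(model_output):
    """Split a 14-dim model output per the documented layout:
    [0:6] left arm, [6] left gripper, [7:13] right arm, [13] right gripper.
    Hypothetical helper; the real method also rescales gripper values."""
    a = np.asarray(model_output, dtype=np.float32)
    return {
        "left_arm": a[0:6],
        "left_gripper": float(a[6]),
        "right_arm": a[7:13],
        "right_gripper": float(a[13]),
    }

parts = split_bimanual_action(np.arange(14.0))
```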
obs_to_model_input
obs_to_model_input(obs: dict) -> dict

Transform MuJoCo observations to raw format for LeRobot's build_dataset_frame.

The server's build_dataset_frame expects: - Individual state values with keys matching STATE_NAMES - Image keys without "observation.images." prefix (e.g., "left", "right", "top") - "task" key for the text prompt

Parameters:

Name Type Description Default
obs dict

MuJoCo observation dictionary with cameras and joint states

required

Returns:

Type Description
dict

Dictionary in raw format that build_dataset_frame will transform

Source code in molmo_spaces/policy/learned_policy/bimanual_yam_pi_policy.py
def obs_to_model_input(self, obs: dict) -> dict:
    """Transform MuJoCo observations to raw format for LeRobot's build_dataset_frame.

    The server's build_dataset_frame expects:
    - Individual state values with keys matching STATE_NAMES
    - Image keys without "observation.images." prefix (e.g., "left", "right", "top")
    - "task" key for the text prompt

    Args:
        obs: MuJoCo observation dictionary with cameras and joint states

    Returns:
        Dictionary in raw format that build_dataset_frame will transform
    """
    model_input = {}

    # Process camera images - use short keys (build_dataset_frame strips "observation.images." prefix)
    camera_short_keys = {
        "left_wrist_camera": "left",
        "right_wrist_camera": "right",
        "exo_camera": "top",
    }
    for mj_cam, short_key in camera_short_keys.items():
        if mj_cam in obs:
            model_input[short_key] = obs[mj_cam]

    # Build state values as individual keys matching STATE_NAMES
    # Order: [left_arm(6), left_gripper(1), right_arm(6), right_gripper(1)]
    qpos = obs.get("qpos", {})

    left_arm = np.array(qpos.get("left_arm", np.zeros(6)))[:6]
    right_arm = np.array(qpos.get("right_arm", np.zeros(6)))[:6]

    left_grip = self._normalize_gripper(qpos.get("left_gripper", np.array([0.0])))
    right_grip = self._normalize_gripper(qpos.get("right_gripper", np.array([0.0])))

    # Combine into 14-dim array
    state_values = np.concatenate(
        [
            left_arm,  # [0:6]
            np.array([left_grip]),  # [6]
            right_arm,  # [7:13]
            np.array([right_grip]),  # [13]
        ]
    ).astype(np.float32)

    # Add individual state values with keys matching STATE_NAMES
    for i, name in enumerate(self.STATE_NAMES):
        model_input[name] = float(state_values[i])

    # Add task prompt
    prompt = self.task.get_task_description()
    model_input["task"] = prompt

    return model_input
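The state packing above can be sketched in isolation; `pack_state` is a hypothetical helper that reproduces the `[left_arm(6), left_gripper(1), right_arm(6), right_gripper(1)]` ordering keyed by `STATE_NAMES`:

```python
import numpy as np

# Same ordering as the class attribute STATE_NAMES above.
STATE_NAMES = [
    "left_joint_0.pos", "left_joint_1.pos", "left_joint_2.pos",
    "left_joint_3.pos", "left_joint_4.pos", "left_joint_5.pos",
    "left_gripper.pos",
    "right_joint_0.pos", "right_joint_1.pos", "right_joint_2.pos",
    "right_joint_3.pos", "right_joint_4.pos", "right_joint_5.pos",
    "right_gripper.pos",
]

def pack_state(left_arm, left_grip, right_arm, right_grip):
    """Pack joint states into the 14-dim vector, then into a
    name-keyed dict of floats as obs_to_model_input does."""
    state = np.concatenate(
        [left_arm[:6], [left_grip], right_arm[:6], [right_grip]]
    ).astype(np.float32)
    return {name: float(v) for name, v in zip(STATE_NAMES, state)}

packed = pack_state(np.zeros(6), 0.5, np.ones(6), 1.0)
```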
prepare_model
prepare_model() -> None

Connect to the LeRobot gRPC server and load the policy.

Source code in molmo_spaces/policy/learned_policy/bimanual_yam_pi_policy.py
def prepare_model(self) -> None:
    """Connect to the LeRobot gRPC server and load the policy."""
    self.model_name = os.path.basename(self.checkpoint_path)

    host = self.remote_config.get("host", "localhost")
    port = self.remote_config.get("port", 8080)

    self.client = LeRobotGRPCClient(host=host, port=port)

    policy_type = self.remote_config.get("policy_type", "pi05")
    device = self.remote_config.get("device", "cuda")

    # Define lerobot_features matching the training dataset format (Jiafei1224/c)
    # These features tell build_dataset_frame how to transform raw observations
    lerobot_features = {
        "observation.state": {
            "dtype": "float32",
            "shape": [14],
            "names": self.STATE_NAMES,
        },
        "observation.images.left": {
            "dtype": "video",
            "shape": [360, 640, 3],
            "names": ["height", "width", "channels"],
        },
        "observation.images.right": {
            "dtype": "video",
            "shape": [360, 640, 3],
            "names": ["height", "width", "channels"],
        },
        "observation.images.top": {
            "dtype": "video",
            "shape": [360, 640, 3],
            "names": ["height", "width", "channels"],
        },
    }

    self.client.connect(
        pretrained_name_or_path=self.checkpoint_path,
        policy_type=policy_type,
        device=device,
        actions_per_chunk=self.buffer_length,
        lerobot_features=lerobot_features,
    )

    log.info(f"BimanualYamPiPolicy connected to {host}:{port}")
render
render(obs: dict) -> None

Display camera views for debugging.

Parameters:

Name Type Description Default
obs dict

Observation dictionary with camera images

required
Source code in molmo_spaces/policy/learned_policy/bimanual_yam_pi_policy.py
def render(self, obs: dict) -> None:
    """Display camera views for debugging.

    Args:
        obs: Observation dictionary with camera images
    """
    images = []
    for mj_cam in self.camera_mapping:
        if mj_cam in obs:
            images.append(obs[mj_cam])

    if images:
        combined = np.concatenate(images, axis=1)
        cv2.imshow("BimanualYam Views", cv2.cvtColor(combined, cv2.COLOR_RGB2BGR))
        cv2.waitKey(1)
reset
reset() -> None

Reset the policy state for a new episode.

Source code in molmo_spaces/policy/learned_policy/bimanual_yam_pi_policy.py
def reset(self) -> None:
    """Reset the policy state for a new episode."""
    self.actions_buffer = None
    self.current_buffer_index = 0
    self.starting_time = None

    # Reset client timestep counter
    if self.client is not None:
        self.client.reset()

cap_policy

Classes:

Name Description
CAP_Policy

Functions:

Name Description
action_tensor_to_matrix

Attributes:

Name Type Description
log
log module-attribute
log = getLogger(__name__)
CAP_Policy
CAP_Policy(exp_config: MlSpacesExpConfig, task_type: str)

Bases: InferencePolicy

Methods:

Name Description
add_auxiliary_objects

Add auxiliary objects to the scene that might be required for the policy.

get_action
get_all_phases

Returns:

get_info
get_phase

Returns:

inference_model
model_output_to_action
obs_to_model_input
prepare_model
render
reset

Attributes:

Name Type Description
config
current_phase
grasping_threshold
grasping_type
model
remote_config
target_poses
task
task_type
use_exo
use_vlm
Source code in molmo_spaces/policy/learned_policy/cap_policy.py
def __init__(
    self,
    exp_config: MlSpacesExpConfig,
    task_type: str,
) -> None:
    super().__init__(exp_config, exp_config.task_type)
    self.task_type = task_type
    self.remote_config = exp_config.policy_config.remote_config
    self.grasping_type = exp_config.policy_config.grasping_type
    self.grasping_threshold = exp_config.policy_config.grasping_threshold
    self.use_vlm = exp_config.policy_config.use_vlm
    self.use_exo = exp_config.policy_config.exo_vlm
    self.model = None
config instance-attribute
config = config
current_phase instance-attribute
current_phase = NONE_PHASE
grasping_threshold instance-attribute
grasping_threshold = grasping_threshold
grasping_type instance-attribute
grasping_type = grasping_type
model instance-attribute
model = None
remote_config instance-attribute
remote_config = remote_config
target_poses instance-attribute
target_poses = {'grasp': eye(4)}
task instance-attribute
task = task
task_type instance-attribute
task_type = task_type
use_exo instance-attribute
use_exo = exo_vlm
use_vlm instance-attribute
use_vlm = use_vlm
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None

Add auxiliary objects to the scene that might be required for the policy. Args: config: The configuration for the policy. spec: The MuJoCo scene specification to modify.

Source code in molmo_spaces/policy/base_policy.py
@staticmethod
def add_auxiliary_objects(config: "MlSpacesExpConfig", spec: MjSpec) -> None:
    """
    Add auxiliary objects to the scene that might be required for the policy.
    Args:
        config: The configuration for the policy.
        spec: The MuJoCo scene specification to modify.
    """
    return None
get_action
get_action(observation)
Source code in molmo_spaces/policy/base_policy.py
def get_action(self, observation):
    model_input = self.obs_to_model_input(observation)
    model_output = self.inference_model(model_input)
    action = self.model_output_to_action(model_output)
    return action
get_all_phases
get_all_phases() -> dict[str, int]

Returns:

Type Description
dict[str, int]

A dictionary of all possible policy phases

Source code in molmo_spaces/policy/base_policy.py
def get_all_phases(self) -> dict[str, int]:
    """
    Returns:
        A dictionary of all possible policy phases
    """
    return {"unknown": 0}
get_info
get_info() -> dict
Source code in molmo_spaces/policy/learned_policy/cap_policy.py
def get_info(self) -> dict:
    info = super().get_info()

    info["policy_checkpoint"] = self.model_name
    info["policy_grasping_threshold"] = self.grasping_threshold
    info["policy_grasping_type"] = self.grasping_type
    info["time_spent"] = time.time() - self.starting_time if self.starting_time else None
    info["timestamp"] = time.time()
    return info
get_phase
get_phase() -> str

Returns:

Type Description
str

The policy phase

Source code in molmo_spaces/policy/base_policy.py
def get_phase(self) -> str:
    """
    Returns:
        The policy phase
    """
    return "unknown"
inference_model
inference_model(model_input)
Source code in molmo_spaces/policy/learned_policy/cap_policy.py
def inference_model(self, model_input):
    self.step_counter += 1

    model_output = self.model.infer(model_input)
    model_output[0][:3] = np.array(
        [-model_output[0][0], -model_output[0][2], -model_output[0][1]]
    )
    model_output[0][3:6] = np.array(
        [-model_output[0][3], -model_output[0][5], -model_output[0][4]]
    )
    self.is_grasping = max(self.is_grasping, model_output[0][6] < self.grasping_threshold)
    delta_pose_mat = action_tensor_to_matrix(model_output[0][:6], "euler")
    T_world_ego = self.T_world_camera @ delta_pose_mat
    self.T_world_rum = T_world_ego

    goal_pose_7d = np.array(
        list(self.T_world_rum[:3, 3])
        + list(R.from_matrix(self.T_world_rum[:3, :3]).as_quat(scalar_first=True))
    )

    goal_pose_homogeneous = np.eye(4)
    goal_pose_homogeneous[:3, 3] = goal_pose_7d[:3]
    goal_pose_homogeneous[:3, :3] = R.from_quat(
        goal_pose_7d[3:7], scalar_first=True
    ).as_matrix()

    kinematics = self.task.env.current_robot.kinematics
    robot_view = self.task.env.current_robot.robot_view
    gripper_mgs = set(robot_view.get_gripper_movegroup_ids())
    mgs_except_gripper = [x for x in robot_view.move_group_ids() if x not in gripper_mgs]
    new_pose = goal_pose_homogeneous.copy()

    jp = kinematics.ik(
        "arm",
        new_pose,
        mgs_except_gripper,
        robot_view.get_qpos_dict(),
        robot_view.base.pose,
        rel_to_base=False,
    )
    action = robot_view.get_ctrl_dict()
    if jp is not None:
        action.update({mg_id: jp[mg_id] for mg_id in mgs_except_gripper})

    if self.grasping_type == "binary":
        if self.is_grasping:
            action["gripper"] = np.array([-255.0])
        else:
            action["gripper"] = np.array([0.0])
    else:
        action["gripper"] = (1 - model_output[0][6]) * np.array([-255.0])
        if self.is_grasping:
            action["gripper"] = np.array([-255.0])
    return action
model_output_to_action
model_output_to_action(model_output)
Source code in molmo_spaces/policy/learned_policy/cap_policy.py
def model_output_to_action(self, model_output):
    return model_output
obs_to_model_input
obs_to_model_input(obs)
Source code in molmo_spaces/policy/learned_policy/cap_policy.py
def obs_to_model_input(self, obs):
    if self.model is None:
        self.prepare_model()

    if not hasattr(self, "T_world_object") or self.T_world_object is None:
        if not self.use_vlm:
            T_base_object = obs["object_poses"][self.task.config.task_config.pickup_obj_name]
            T_world_base = np.eye(4)
            T_world_base[:3, 3] = obs["robot_base_pose"][:3]
            T_world_base[:3, :3] = R.from_quat(
                obs["robot_base_pose"][3:7], scalar_first=True
            ).as_matrix()
            self.T_world_object = T_world_base @ T_base_object
        else:
            exo_depth = obs["exo_camera_1_depth"]
            exo_rgb = obs["exo_camera_1"]
            ego_depth = obs["wrist_camera_depth"]
            ego_rgb = obs["wrist_camera"]
            point_norm = self.model.infer_point(
                rgb=exo_rgb if self.use_exo else ego_rgb,
                object_name=self.task.config.task_config.referral_expressions[
                    "pickup_obj_name"
                ],
                task=self.task_type,
            )
            if self.use_exo:
                K = np.array(obs["sensor_param_exo_camera_1"]["intrinsic_cv"])
            else:
                K = np.array(obs["sensor_param_wrist_camera"]["intrinsic_cv"])
            fovy = 2 * np.arctan((2 * K[1, 2]) / (2 * K[1, 1]))
            x_norm, y_norm = point_norm
            width, height = (
                (exo_rgb.shape[1], exo_rgb.shape[0])
                if self.use_exo
                else (ego_rgb.shape[1], ego_rgb.shape[0])
            )
            x = int(x_norm * width)
            y = int(y_norm * height)
            depth_value = exo_depth[y, x] + 0.03 if self.use_exo else ego_depth[y, x] + 0.03
            f = height / (2 * np.tan(fovy / 2))
            cam_mat = np.array([[f, 0, width / 2], [0, f, height / 2], [0, 0, 1]])
            cx = cam_mat[0, 2]
            cy = cam_mat[1, 2]
            fx = cam_mat[0, 0]
            fy = cam_mat[1, 1]
            z_cam = -depth_value
            x_cam = -(x - cx) * z_cam / fx
            y_cam = -(cy - y) * z_cam / fy
            p_cam = np.array([x_cam, y_cam, z_cam])
            T_corr = np.eye(4)
            T_corr[:3, :3] = np.diag([1, -1, -1])
            if self.use_exo:
                camera_pose = (
                    np.array(obs["sensor_param_exo_camera_1"]["cam2world_gl"].copy()) @ T_corr
                )
            else:
                camera_pose = (
                    np.array(obs["sensor_param_wrist_camera"]["cam2world_gl"].copy()) @ T_corr
                )
            p_world = camera_pose[:3, :3] @ p_cam + camera_pose[:3, 3]
            self.T_world_object = np.eye(4)
            self.T_world_object[:3, 3] = p_world

    T_base_ego = np.eye(4)
    T_base_ego[:3, 3] = obs["tcp_pose"][:3]
    T_base_ego[:3, :3] = R.from_quat(obs["tcp_pose"][3:7], scalar_first=True).as_matrix()

    T_world_base = np.eye(4)
    T_world_base[:3, 3] = obs["robot_base_pose"][:3]
    T_world_base[:3, :3] = R.from_quat(
        obs["robot_base_pose"][3:7], scalar_first=True
    ).as_matrix()

    T_world_ego = T_world_base @ T_base_ego
    self.T_world_camera = T_world_ego
    T_camera_object = np.linalg.inv(T_world_ego) @ self.T_world_object
    object_3d_position = T_camera_object[:3, 3]
    object_3d_position = np.array(
        [-T_camera_object[0, 3], -T_camera_object[2, 3], -T_camera_object[1, 3]]
    )

    if self.is_grasping:
        object_3d_position = np.array([0.00, 0.18, 0.04])
    return {
        "rgb_ego": cv2.resize(obs["wrist_camera"], (224, 224)),
        "object_3d_position": object_3d_position,
    }
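The VLM branch above back-projects the picked pixel into 3-D using the camera intrinsics and depth. A minimal sketch of the underlying pinhole back-projection in the standard OpenCV convention (the policy itself additionally flips axes for MuJoCo's OpenGL camera frame):

```python
import numpy as np

def backproject_pixel(u, v, depth, K):
    """Pixel (u, v) with metric depth -> 3-D point in the camera frame,
    OpenCV convention (+z forward). Hypothetical helper for illustration."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

# A pixel at the principal point maps straight down the optical axis.
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 180.0],
              [0.0, 0.0, 1.0]])
p = backproject_pixel(320, 180, 1.5, K)
```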
prepare_model
prepare_model()
Source code in molmo_spaces/policy/learned_policy/cap_policy.py
def prepare_model(self):
    host = "localhost"
    port = 8765

    max_retries = 3
    for attempt in range(max_retries):
        try:
            self.model = RUMClient(host=host, port=port)
            metadata = self.model.get_server_metadata()
            self.model_name = metadata.get("checkpoint", "rum")
            log.info(f"Connected to RUM server at {host}:{port}")
            break
        except Exception as e:
            if attempt < max_retries - 1:
                log.warning(f"Connection attempt {attempt + 1} failed: {e}. Retrying...")
                time.sleep(1)
            else:
                log.error(f"Failed to connect to RUM server after {max_retries} attempts")
                raise
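The retry loop in `prepare_model` generalizes to any connection factory. A minimal sketch, with `connect_with_retries` as a hypothetical helper (the injectable `sleep` only exists to make the sketch testable):

```python
import time

def connect_with_retries(connect_fn, max_retries=3, delay_secs=1.0, sleep=time.sleep):
    """Call connect_fn, retrying on failure; re-raise after the last attempt."""
    for attempt in range(max_retries):
        try:
            return connect_fn()
        except Exception:
            if attempt == max_retries - 1:
                raise
            sleep(delay_secs)

attempts = []
def flaky():
    """Fails twice, then succeeds, to exercise the retry path."""
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("server not ready")
    return "client"

result = connect_with_retries(flaky, max_retries=3, delay_secs=0.0)
```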
render
render(obs)
Source code in molmo_spaces/policy/learned_policy/cap_policy.py
def render(self, obs):
    views = np.concatenate([obs["wrist_camera"], obs["exo_camera_1"]], axis=1)
    cv2.imshow("views", cv2.cvtColor(views, cv2.COLOR_RGB2BGR))
    cv2.waitKey(1)
reset
reset()
Source code in molmo_spaces/policy/learned_policy/cap_policy.py
def reset(self):
    self.starting_time = None
    self.T_world_object = None
    self.T_world_camera = None
    self.T_world_rum = None
    self.is_grasping = False
    self.step_counter = 0
action_tensor_to_matrix
action_tensor_to_matrix(action_tensor, rot_unit)
Source code in molmo_spaces/policy/learned_policy/cap_policy.py
def action_tensor_to_matrix(action_tensor, rot_unit):
    affine = np.eye(4)
    if rot_unit == "euler":
        r = R.from_euler("xyz", action_tensor[3:6], degrees=False)
    elif rot_unit == "axis":
        r = R.from_rotvec(action_tensor[3:6])
    else:
        raise NotImplementedError
    affine[:3, :3] = r.as_matrix()
    affine[:3, -1] = action_tensor[:3]
    return affine
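For the `rot_unit == "axis"` branch, the rotation can also be built without SciPy via Rodrigues' formula. A numpy-only sketch (not the code the policy ships, which uses `scipy.spatial.transform.Rotation`):

```python
import numpy as np

def rotvec_to_matrix(rv):
    """Rodrigues' formula: axis-angle vector -> 3x3 rotation matrix
    (a numpy-only stand-in for scipy's R.from_rotvec)."""
    theta = np.linalg.norm(rv)
    if theta < 1e-12:
        return np.eye(3)
    k = rv / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def pose6_to_matrix(action_tensor):
    """Pack [x, y, z, rx, ry, rz] (axis-angle) into a 4x4 homogeneous matrix,
    mirroring action_tensor_to_matrix(..., "axis") above."""
    affine = np.eye(4)
    affine[:3, :3] = rotvec_to_matrix(np.asarray(action_tensor[3:6], dtype=float))
    affine[:3, 3] = action_tensor[:3]
    return affine

# 90-degree rotation about z plus a translation:
T = pose6_to_matrix([0.1, 0.2, 0.3, 0.0, 0.0, np.pi / 2])
```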

dreamzero_policy

Classes:

Name Description
DreamZeroWebsocketClient

Websocket client that adds endpoint field for DreamZero server.

DreamZero_Policy

Attributes:

Name Type Description
PING_INTERVAL_SECS
PING_TIMEOUT_SECS
log
PING_INTERVAL_SECS module-attribute
PING_INTERVAL_SECS = 60
PING_TIMEOUT_SECS module-attribute
PING_TIMEOUT_SECS = 600
log module-attribute
log = getLogger(__name__)
DreamZeroWebsocketClient
DreamZeroWebsocketClient(host: str = '0.0.0.0', port: int = 8000)

Websocket client that adds endpoint field for DreamZero server.

Methods:

Name Description
infer
reset
Source code in molmo_spaces/policy/learned_policy/dreamzero_policy.py
def __init__(self, host: str = "0.0.0.0", port: int = 8000) -> None:
    self._uri = f"ws://{host}:{port}"
    self._packer = msgpack_numpy.Packer()
    self._ws, self._server_metadata = self._wait_for_server()
    # store the URI that actually worked so reconnects reuse it
    self._connected_uri = self._uri
infer
infer(obs: dict) -> dict
Source code in molmo_spaces/policy/learned_policy/dreamzero_policy.py
def infer(self, obs: dict) -> dict:
    obs["endpoint"] = "infer"
    data = self._packer.pack(obs)
    try:
        self._ws.send(data)
        response = self._ws.recv()
    except websockets.exceptions.ConnectionClosedError:
        logging.warning("ConnectionClosedError during infer. Reconnecting and retrying...")
        self._reconnect()
        self._ws.send(data)
        response = self._ws.recv()
    if isinstance(response, str):
        raise RuntimeError(f"Error in inference server:\n{response}")
    return msgpack_numpy.unpackb(response)
reset
reset(reset_info: dict | None = None)
Source code in molmo_spaces/policy/learned_policy/dreamzero_policy.py
def reset(self, reset_info: dict | None = None):
    if reset_info is None:
        reset_info = {}
    reset_info["endpoint"] = "reset"
    data = self._packer.pack(reset_info)
    try:
        self._ws.send(data)
        response = self._ws.recv()
    except websockets.exceptions.ConnectionClosedError:
        logging.warning("ConnectionClosedError during reset. Reconnecting and retrying...")
        self._reconnect()
        self._ws.send(data)
        response = self._ws.recv()
    return response
DreamZero_Policy
DreamZero_Policy(exp_config: MlSpacesExpConfig, task_type: str)

Bases: InferencePolicy

Methods:

Name Description
add_auxiliary_objects

Add auxiliary objects to the scene that might be required for the policy.

get_action
get_all_phases

Returns:

get_info
get_phase

Returns:

inference_model
model_output_to_action
obs_to_model_input
prepare_model
render
reset

Attributes:

Name Type Description
checkpoint_path
chunk_size
config
current_phase
grasping_threshold
grasping_type
model
remote_config
session_id
target_poses
task
task_type
Source code in molmo_spaces/policy/learned_policy/dreamzero_policy.py
def __init__(
    self,
    exp_config: MlSpacesExpConfig,
    task_type: str,
) -> None:
    super().__init__(exp_config, exp_config.task_type)
    self.remote_config = exp_config.policy_config.remote_config
    self.checkpoint_path = exp_config.policy_config.checkpoint_path
    self.grasping_type = exp_config.policy_config.grasping_type
    self.chunk_size = exp_config.policy_config.chunk_size
    self.grasping_threshold = exp_config.policy_config.grasping_threshold
    self.model = None
    self.session_id = None
checkpoint_path instance-attribute
checkpoint_path = checkpoint_path
chunk_size instance-attribute
chunk_size = chunk_size
config instance-attribute
config = config
current_phase instance-attribute
current_phase = NONE_PHASE
grasping_threshold instance-attribute
grasping_threshold = grasping_threshold
grasping_type instance-attribute
grasping_type = grasping_type
model instance-attribute
model = None
remote_config instance-attribute
remote_config = remote_config
session_id instance-attribute
session_id = None
target_poses instance-attribute
target_poses = {'grasp': eye(4)}
task instance-attribute
task = task
task_type instance-attribute
task_type = task_type
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None

Add auxiliary objects to the scene that might be required for the policy. Args: config: The configuration for the policy. spec: The MuJoCo scene specification to modify.

Source code in molmo_spaces/policy/base_policy.py
@staticmethod
def add_auxiliary_objects(config: "MlSpacesExpConfig", spec: MjSpec) -> None:
    """
    Add auxiliary objects to the scene that might be required for the policy.
    Args:
        config: The configuration for the policy.
        spec: The MuJoCo scene specification to modify.
    """
    return None
get_action
get_action(observation)
Source code in molmo_spaces/policy/base_policy.py
def get_action(self, observation):
    model_input = self.obs_to_model_input(observation)
    model_output = self.inference_model(model_input)
    action = self.model_output_to_action(model_output)
    return action
get_all_phases
get_all_phases() -> dict[str, int]

Returns:

Type Description
dict[str, int]

A dictionary of all possible policy phases

Source code in molmo_spaces/policy/base_policy.py
def get_all_phases(self) -> dict[str, int]:
    """
    Returns:
        A dictionary of all possible policy phases
    """
    return {"unknown": 0}
get_info
get_info() -> dict
Source code in molmo_spaces/policy/learned_policy/dreamzero_policy.py
def get_info(self) -> dict:
    info = super().get_info()
    info["policy_name"] = "dreamzero"
    info["policy_checkpoint"] = self.model_name
    info["policy_buffer_length"] = self.chunk_size
    info["policy_grasping_threshold"] = self.grasping_threshold
    info["policy_grasping_type"] = self.grasping_type
    info["prompt"] = self.task.get_task_description()
    info["session_id"] = self.session_id
    info["time_spent"] = time.time() - self.starting_time if self.starting_time else None
    info["timestamp"] = time.time()
    return info
get_phase
get_phase() -> str

Returns:

Type Description
str

The policy phase

Source code in molmo_spaces/policy/base_policy.py
def get_phase(self) -> str:
    """
    Returns:
        The policy phase
    """
    return "unknown"
inference_model
inference_model(model_input)
Source code in molmo_spaces/policy/learned_policy/dreamzero_policy.py
def inference_model(self, model_input):
    if self.model is None:
        self.prepare_model()
    if self.starting_time is None:
        self.starting_time = time.time()
    if self.actions_buffer is None or self.current_buffer_index >= self.chunk_size:
        result = self.model.infer(model_input)
        self.actions_buffer = result["actions"]
        self.current_buffer_index = 0
    model_output = self.actions_buffer[self.current_buffer_index]
    self.current_buffer_index += 1
    return model_output
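The chunked-inference pattern above (query the model once, then replay `chunk_size` buffered actions before the next query) can be isolated in a small class; `ActionChunkBuffer` is a hypothetical name for illustration:

```python
class ActionChunkBuffer:
    """Minimal sketch of the action-buffering loop in inference_model:
    the expensive model call happens only when the chunk is exhausted."""

    def __init__(self, model_fn, chunk_size):
        self.model_fn = model_fn  # returns a sequence of chunk_size actions
        self.chunk_size = chunk_size
        self.buffer = None
        self.index = 0

    def next_action(self, observation):
        # Refill only when empty or exhausted, mirroring the policy above.
        if self.buffer is None or self.index >= self.chunk_size:
            self.buffer = self.model_fn(observation)
            self.index = 0
        action = self.buffer[self.index]
        self.index += 1
        return action

calls = []
def fake_model(obs):
    """Stand-in for the remote model; records how often it is queried."""
    calls.append(obs)
    return [0, 1, 2]  # pretend chunk of 3 actions

buf = ActionChunkBuffer(fake_model, chunk_size=3)
actions = [buf.next_action("obs") for _ in range(7)]
```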
model_output_to_action
model_output_to_action(model_output)
Source code in molmo_spaces/policy/learned_policy/dreamzero_policy.py
def model_output_to_action(self, model_output):
    gripper_pos = np.clip(model_output[7], 0.0, 1.0)

    if self.grasping_type == "binary":
        gripper_pos = (
            np.array([255.0]) if gripper_pos >= self.grasping_threshold else np.array([0.0])
        )
    elif self.grasping_type == "semi_binary":
        gripper_pos = (
            gripper_pos * np.array([255.0])
            if gripper_pos <= self.grasping_threshold
            else np.array([255.0])
        )
    elif self.grasping_type == "continuous":
        gripper_pos = gripper_pos * np.array([255.0])
    else:
        raise ValueError(f"Invalid grasping type: {self.grasping_type}")

    arm_output = model_output[:7].reshape(
        7,
    )
    action = {
        "arm": arm_output,
        "gripper": gripper_pos,
    }
    return action
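The three grasping modes can be sketched as a standalone function (hypothetical `scale_gripper` helper; 255.0 is assumed to be the fully-closed gripper command, as in the block above):

```python
import numpy as np

def scale_gripper(gripper_pos, grasping_type, threshold=0.5):
    """Map a normalized gripper value in [0, 1] to a command, mirroring
    the binary / semi_binary / continuous branches above."""
    gripper_pos = float(np.clip(gripper_pos, 0.0, 1.0))
    if grasping_type == "binary":
        # Fully close past the threshold, otherwise fully open.
        return 255.0 if gripper_pos >= threshold else 0.0
    if grasping_type == "semi_binary":
        # Proportional below the threshold, saturated above it.
        return gripper_pos * 255.0 if gripper_pos <= threshold else 255.0
    if grasping_type == "continuous":
        return gripper_pos * 255.0
    raise ValueError(f"Invalid grasping type: {grasping_type}")
```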
obs_to_model_input
obs_to_model_input(obs)
Source code in molmo_spaces/policy/learned_policy/dreamzero_policy.py
def obs_to_model_input(self, obs):
    # self.render(obs)
    prompt = self.task.get_task_description()
    grip = np.clip(obs["qpos"]["gripper"][0] / 0.824033, 0, 1)
    if grip < 0.1:
        grip = 0.00
    exo_camera_key = (
        "droid_shoulder_light_randomization"
        if "droid_shoulder_light_randomization" in obs
        else "exo_camera_1"
    )
    wrist_camera_key = (
        "wrist_camera_zed_mini" if "wrist_camera_zed_mini" in obs else "wrist_camera"
    )

    model_input = {
        "observation/exterior_image_0_left": resize_with_pad(obs[exo_camera_key], 180, 320),
        "observation/exterior_image_1_left": resize_with_pad(obs[exo_camera_key], 180, 320),
        "observation/wrist_image_left": resize_with_pad(obs[wrist_camera_key], 180, 320),
        "observation/joint_position": np.array(
            obs["qpos"]["arm"][:7], dtype=np.float64
        ).reshape(
            7,
        ),
        "observation/cartesian_position": np.zeros((6,), dtype=np.float64),
        "observation/gripper_position": np.array(grip, dtype=np.float64).reshape(
            1,
        ),
        "prompt": prompt,
        "session_id": self.session_id,
    }
    return model_input
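`resize_with_pad` is imported from elsewhere in the codebase. A minimal nearest-neighbor sketch of the behavior its name and call sites suggest (scale to fit inside the target while preserving aspect ratio, then zero-pad to exactly the target size) — an assumption, not the actual implementation:

```python
import numpy as np

def resize_with_pad(image: np.ndarray, height: int, width: int) -> np.ndarray:
    """Nearest-neighbor resize preserving aspect ratio, zero-padded and centered."""
    h, w = image.shape[:2]
    scale = min(height / h, width / w)
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    # Nearest-neighbor sampling via integer index maps.
    rows = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = image[rows][:, cols]
    # Center the resized image on a zero canvas of the target size.
    out = np.zeros((height, width) + image.shape[2:], dtype=image.dtype)
    top, left = (height - new_h) // 2, (width - new_w) // 2
    out[top:top + new_h, left:left + new_w] = resized
    return out
```

A square 100x100 input resized to 180x320 scales by 1.8 to 180x180 and is padded with 70-pixel black bars on the left and right.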
prepare_model
prepare_model()
Source code in molmo_spaces/policy/learned_policy/dreamzero_policy.py
def prepare_model(self):
    self.model_name = (
        os.path.basename(self.checkpoint_path) if self.checkpoint_path else "dreamzero"
    )
    if self.remote_config is not None:
        self._prepare_remote_model()
    else:
        raise NotImplementedError("DreamZero policy only supports remote model inference")
render
render(obs)
Source code in molmo_spaces/policy/learned_policy/dreamzero_policy.py
def render(self, obs):
    views = np.concatenate([obs["wrist_camera"], obs["exo_camera_1"]], axis=1)
    cv2.imshow("views", cv2.cvtColor(views, cv2.COLOR_RGB2BGR))
    cv2.waitKey(1)
reset
reset()
Source code in molmo_spaces/policy/learned_policy/dreamzero_policy.py
def reset(self):
    self.actions_buffer = None
    self.current_buffer_index = 0
    self.starting_time = None
    self.session_id = str(uuid.uuid4())

keyboard_policy

Classes:

Name Description
Keyboard_Policy

Attributes:

Name Type Description
log
log module-attribute
log = getLogger(__name__)
Keyboard_Policy
Keyboard_Policy(exp_config: MlSpacesExpConfig, task_type: str)

Bases: InferencePolicy

Methods:

Name Description
add_auxiliary_objects

Add auxiliary objects to the scene that might be required for the policy.

get_action
get_all_phases

Returns:

get_info
get_phase

Returns:

inference_model
model_output_to_action
obs_to_model_input
prepare_model
render
reset

Attributes:

Name Type Description
config
current_phase
robot_type
rot_step
step_size
target_poses
task
task_type
Source code in molmo_spaces/policy/learned_policy/keyboard_policy.py
def __init__(
    self,
    exp_config: MlSpacesExpConfig,
    task_type: str,
) -> None:
    super().__init__(exp_config, task_type)
    from pynput import keyboard

    self.robot_type = exp_config.robot_config.name
    self.step_size = exp_config.policy_config.step_size
    self.rot_step = exp_config.policy_config.rot_step
    self._pressed = set()
    self._gripper_open = True
    self._listener = keyboard.Listener(
        on_press=self._on_press,
        on_release=self._on_release,
    )
    self._listener.start()
    log.info(
        "Keyboard policy started. "
        "w/s: up/down (z), arrows: x/y. "
        "a/d: yaw, e/r: pitch, z/c: roll. "
        "Space: toggle gripper. q: pause."
    )
config instance-attribute
config = config
current_phase instance-attribute
current_phase = NONE_PHASE
robot_type instance-attribute
robot_type = name
rot_step instance-attribute
rot_step = rot_step
step_size instance-attribute
step_size = step_size
target_poses instance-attribute
target_poses = {'grasp': eye(4)}
task instance-attribute
task = task
task_type instance-attribute
task_type = task_type
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None

Add auxiliary objects to the scene that might be required for the policy. Args: config: The configuration for the policy. spec: The MuJoCo scene specification to modify.

Source code in molmo_spaces/policy/base_policy.py
@staticmethod
def add_auxiliary_objects(config: "MlSpacesExpConfig", spec: MjSpec) -> None:
    """
    Add auxiliary objects to the scene that might be required for the policy.
    Args:
        config: The configuration for the policy.
        spec: The MuJoCo scene specification to modify.
    """
    return None
get_action
get_action(observation)
Source code in molmo_spaces/policy/base_policy.py
def get_action(self, observation):
    model_input = self.obs_to_model_input(observation)
    model_output = self.inference_model(model_input)
    action = self.model_output_to_action(model_output)
    return action
get_all_phases
get_all_phases() -> dict[str, int]

Returns:

Type Description
dict[str, int]

A dictionary of all possible policy phases

Source code in molmo_spaces/policy/base_policy.py
def get_all_phases(self) -> dict[str, int]:
    """
    Returns:
        A dictionary of all possible policy phases
    """
    return {"unknown": 0}
get_info
get_info() -> dict
Source code in molmo_spaces/policy/learned_policy/keyboard_policy.py
def get_info(self) -> dict:
    info = super().get_info()
    info["policy_name"] = "keyboard"
    info["timestamp"] = time.time()
    return info
get_phase
get_phase() -> str

Returns:

Type Description
str

The policy phase

Source code in molmo_spaces/policy/base_policy.py
def get_phase(self) -> str:
    """
    Returns:
        The policy phase
    """
    return "unknown"
inference_model
inference_model(model_input)
Source code in molmo_spaces/policy/learned_policy/keyboard_policy.py
def inference_model(self, model_input):
    if self._is_paused():
        return None

    self.current_position += self.current_rotation @ self._get_delta_position()
    self.current_rotation = self._get_delta_rotation() @ self.current_rotation

    if self.robot_type == "floating_rum":
        goal_pose = self.init_robot_pose.copy()
        goal_pose[:3, 3] = self.current_position
        goal_pose[:3, :3] = self.current_rotation

        goal_pose_7d = np.array(
            list(goal_pose[:3, 3])
            + list(R.from_matrix(goal_pose[:3, :3]).as_quat(scalar_first=True))
        )
        return {
            "base": goal_pose_7d,
            "gripper": np.array([0.0]) if self._gripper_open else np.array([-255.0]),
        }
    else:
        kinematics = self.task.env.current_robot.kinematics
        robot_view = self.task.env.current_robot.robot_view

        gripper_mgs = set(robot_view.get_gripper_movegroup_ids())
        mgs_except_gripper = [x for x in robot_view.move_group_ids() if x not in gripper_mgs]

        new_pose = np.eye(4)
        new_pose[:3, 3] = self.current_position
        new_pose[:3, :3] = self.current_rotation

        jp = kinematics.ik(
            "arm",
            new_pose,
            mgs_except_gripper,
            robot_view.get_qpos_dict(),
            robot_view.base.pose,
            rel_to_base=True,
        )
        action = robot_view.get_ctrl_dict()
        if jp is not None:
            action.update({mg_id: jp[mg_id] for mg_id in mgs_except_gripper})

        action["gripper"] = np.array([0.0]) if self._gripper_open else np.array([255.0])
        return action
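Note that the translation update applies the keyboard delta in the tool frame (`self.current_rotation @ self._get_delta_position()`), so "forward" tracks the gripper's current orientation rather than a fixed world axis. A minimal numeric check of that frame change:

```python
import numpy as np

# Tool yawed 90 degrees about z: the tool-frame +x axis points along world +y.
Rz90 = np.array([[0.0, -1.0, 0.0],
                 [1.0,  0.0, 0.0],
                 [0.0,  0.0, 1.0]])

position = np.zeros(3)
delta = np.array([0.1, 0.0, 0.0])  # a "forward" key press, expressed in the tool frame
position = position + Rz90 @ delta
# The end effector moves along world +y, not world +x.
```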
model_output_to_action
model_output_to_action(model_output)
Source code in molmo_spaces/policy/learned_policy/keyboard_policy.py
def model_output_to_action(self, model_output):
    return model_output
obs_to_model_input
obs_to_model_input(obs)
Source code in molmo_spaces/policy/learned_policy/keyboard_policy.py
def obs_to_model_input(self, obs):
    self.render(obs)

    robot_pose = obs["robot_base_pose"]
    T_world_robot = np.eye(4)
    T_world_robot[:3, :3] = R.from_quat(robot_pose[3:], scalar_first=True).as_matrix()
    T_world_robot[:3, 3] = robot_pose[:3]

    if self.robot_type == "floating_rum":
        if self.init_robot_pose is None:
            self.init_robot_pose = T_world_robot
            self.current_position = T_world_robot[:3, 3].copy()
            self.current_rotation = T_world_robot[:3, :3].copy()
        return {"T_world_robot": T_world_robot}

    tcp_pose = obs["tcp_pose"]
    qpos = obs["qpos"]["arm"]

    T_robot_tcp = np.eye(4)
    T_robot_tcp[:3, :3] = R.from_quat(tcp_pose[3:], scalar_first=True).as_matrix()
    T_robot_tcp[:3, 3] = tcp_pose[:3]

    if self.init_tcp_pose is None:
        self.init_tcp_pose = T_robot_tcp.copy()
        self.current_position = T_robot_tcp[:3, 3].copy()
        self.current_rotation = T_robot_tcp[:3, :3].copy()

    T_world_tcp = T_world_robot @ T_robot_tcp
    return {
        "qpos": qpos,
        "T_world_robot": T_world_robot,
        "T_world_tcp": T_world_tcp,
        "T_robot_tcp": T_robot_tcp,
    }
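The 7-D poses (position plus wxyz quaternion) are converted to 4x4 homogeneous transforms and chained as `T_world_tcp = T_world_robot @ T_robot_tcp`. The conversion and composition can be checked in isolation (the helper name is illustrative; the quaternion is rolled to xyzw order for SciPy versions without `scalar_first`):

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def pose7d_to_mat(pose: np.ndarray) -> np.ndarray:
    """[x, y, z, qw, qx, qy, qz] -> 4x4 homogeneous transform."""
    T = np.eye(4)
    # Roll wxyz to xyzw, SciPy's default quaternion order.
    T[:3, :3] = R.from_quat(np.roll(np.asarray(pose[3:], dtype=float), -1)).as_matrix()
    T[:3, 3] = pose[:3]
    return T

# Robot base at x = 1 m, yawed 90 degrees about z; TCP 0.5 m along the base x-axis.
half = np.sqrt(0.5)
T_world_robot = pose7d_to_mat(np.array([1.0, 0.0, 0.0, half, 0.0, 0.0, half]))
T_robot_tcp = pose7d_to_mat(np.array([0.5, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]))
T_world_tcp = T_world_robot @ T_robot_tcp
# The 90-degree yaw rotates the TCP offset onto world +y: TCP sits at [1.0, 0.5, 0.0].
```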
prepare_model abstractmethod
prepare_model(model_name: str)
Source code in molmo_spaces/policy/base_policy.py
@abstractmethod
def prepare_model(self, model_name: str):
    raise NotImplementedError
render
render(obs)
Source code in molmo_spaces/policy/learned_policy/keyboard_policy.py
def render(self, obs):
    if self.robot_type == "floating_rum":
        try:
            window_exists = cv2.getWindowProperty("views", cv2.WND_PROP_VISIBLE) >= 0
        except cv2.error:
            window_exists = False

        if not window_exists:
            screen_res = (1920, 1080)
            aspect_ratio = obs["wrist_camera"].shape[0] / obs["wrist_camera"].shape[1]
            new_width = int(screen_res[0] * 0.9)
            new_height = int(new_width * aspect_ratio)
            cv2.namedWindow("views", cv2.WINDOW_NORMAL)
            cv2.resizeWindow("views", new_width, new_height)
            cv2.moveWindow(
                "views", (screen_res[0] - new_width) // 2, (screen_res[1] - new_height) // 2
            )

        cv2.imshow("views", cv2.cvtColor(obs["wrist_camera"], cv2.COLOR_RGB2BGR))
        cv2.waitKey(1)
    else:
        views = np.concatenate(
            [np.flip(np.flip(obs["wrist_camera"], axis=0), axis=1), obs["exo_camera_1"]], axis=1
        )
        try:
            window_exists = cv2.getWindowProperty("views", cv2.WND_PROP_VISIBLE) >= 0
        except cv2.error:
            window_exists = False

        if not window_exists:
            screen_res = (1920, 1080)
            aspect_ratio = views.shape[0] / views.shape[1]
            new_width = int(screen_res[0] * 0.9)
            new_height = int(new_width * aspect_ratio)
            cv2.namedWindow("views", cv2.WINDOW_NORMAL)
            cv2.resizeWindow("views", new_width, new_height)
            cv2.moveWindow(
                "views", (screen_res[0] - new_width) // 2, (screen_res[1] - new_height) // 2
            )

        cv2.imshow("views", cv2.cvtColor(views, cv2.COLOR_RGB2BGR))
        cv2.waitKey(1)
reset
reset()
Source code in molmo_spaces/policy/learned_policy/keyboard_policy.py
def reset(self):
    self.init_robot_pose = None
    self.init_tcp_pose = None
    self.current_position = None
    self.current_rotation = None
    self._gripper_open = True

lerobot_grpc_client

LeRobot gRPC client for remote policy inference.

This client connects to a LeRobot async inference server and handles the gRPC protocol for sending observations and receiving actions.

Classes:

Name Description
LeRobotGRPCClient

gRPC client for LeRobot async inference server.

Attributes:

Name Type Description
log
log module-attribute
log = getLogger(__name__)
LeRobotGRPCClient
LeRobotGRPCClient(host: str = 'localhost', port: int = 8080)

gRPC client for LeRobot async inference server.

This client handles the communication protocol with a LeRobot policy server, including connection setup, observation streaming, and action retrieval.

Initialize the gRPC client.

Parameters:

Name Type Description Default
host str

Server hostname

'localhost'
port int

Server port

8080

Methods:

Name Description
close

Close the gRPC channel.

connect

Connect to the server and initialize the policy.

infer

Send observation and receive action chunk.

reset

Reset the client state (timestep counter).

Attributes:

Name Type Description
channel
connected
host
port
stub
timestep
Source code in molmo_spaces/policy/learned_policy/lerobot_grpc_client.py
def __init__(self, host: str = "localhost", port: int = 8080):
    """Initialize the gRPC client.

    Args:
        host: Server hostname
        port: Server port
    """
    self.host = host
    self.port = port
    self.channel = None
    self.stub = None
    self.timestep = 0
    self.connected = False

    # Will be set during connect() - imported from lerobot for pickle compatibility
    self._services_pb2 = None
    self._services_pb2_grpc = None
    self._RemotePolicyConfig = None
    self._TimedObservation = None
channel instance-attribute
channel = None
connected instance-attribute
connected = False
host instance-attribute
host = host
port instance-attribute
port = port
stub instance-attribute
stub = None
timestep instance-attribute
timestep = 0
close
close() -> None

Close the gRPC channel.

Source code in molmo_spaces/policy/learned_policy/lerobot_grpc_client.py
def close(self) -> None:
    """Close the gRPC channel."""
    if self.channel:
        self.channel.close()
        self.channel = None
        self.stub = None
    self.connected = False
    log.info("Disconnected from LeRobot server")
connect
connect(pretrained_name_or_path: str, policy_type: str = 'pi05', device: str = 'cuda', actions_per_chunk: int = 50, lerobot_features: dict | None = None, rename_map: dict | None = None, max_retries: int = 5, retry_delay: float = 1.0) -> None

Connect to the server and initialize the policy.

Parameters:

Name Type Description Default
pretrained_name_or_path str

HuggingFace model ID or local path

required
policy_type str

Type of policy (e.g., "pi05", "act", "smolvla")

'pi05'
device str

Device to run inference on ("cuda", "cpu")

'cuda'
actions_per_chunk int

Number of actions per inference call

50
lerobot_features dict | None

Feature definitions for observations

None
rename_map dict | None

Optional mapping to rename observation keys

None
max_retries int

Number of connection retry attempts

5
retry_delay float

Delay between retries in seconds

1.0
Source code in molmo_spaces/policy/learned_policy/lerobot_grpc_client.py
def connect(
    self,
    pretrained_name_or_path: str,
    policy_type: str = "pi05",
    device: str = "cuda",
    actions_per_chunk: int = 50,
    lerobot_features: dict | None = None,
    rename_map: dict | None = None,
    max_retries: int = 5,
    retry_delay: float = 1.0,
) -> None:
    """Connect to the server and initialize the policy.

    Args:
        pretrained_name_or_path: HuggingFace model ID or local path
        policy_type: Type of policy (e.g., "pi05", "act", "smolvla")
        device: Device to run inference on ("cuda", "cpu")
        actions_per_chunk: Number of actions per inference call
        lerobot_features: Feature definitions for observations
        rename_map: Optional mapping to rename observation keys
        max_retries: Number of connection retry attempts
        retry_delay: Delay between retries in seconds
    """
    # Import from lerobot to ensure pickle compatibility with server
    try:
        from lerobot.async_inference.helpers import RemotePolicyConfig, TimedObservation
        from lerobot.transport import services_pb2, services_pb2_grpc
    except ImportError as e:
        log.error(
            "lerobot package with transport module is required. "
            "Install it with: pip install -e '.[async]' from lerobot repo"
        )
        raise e

    self._services_pb2 = services_pb2
    self._services_pb2_grpc = services_pb2_grpc
    self._RemotePolicyConfig = RemotePolicyConfig
    self._TimedObservation = TimedObservation

    # Create gRPC channel with large message limits for images
    self.channel = grpc.insecure_channel(
        f"{self.host}:{self.port}",
        options=[
            ("grpc.max_send_message_length", 100 * 1024 * 1024),  # 100MB
            ("grpc.max_receive_message_length", 100 * 1024 * 1024),
        ],
    )
    self.stub = services_pb2_grpc.AsyncInferenceStub(self.channel)

    # Retry connection
    for attempt in range(max_retries):
        try:
            # Signal ready
            self.stub.Ready(services_pb2.Empty())

            # Build and send policy config using lerobot's class for pickle compatibility
            policy_config = self._RemotePolicyConfig(
                policy_type=policy_type,
                pretrained_name_or_path=pretrained_name_or_path,
                lerobot_features=lerobot_features or {},
                actions_per_chunk=actions_per_chunk,
                device=device,
                rename_map=rename_map or {},
            )

            config_bytes = pickle.dumps(policy_config)
            self.stub.SendPolicyInstructions(services_pb2.PolicySetup(data=config_bytes))

            self.connected = True
            log.info(f"Connected to LeRobot server at {self.host}:{self.port}")
            log.info(f"Loaded policy: {pretrained_name_or_path} ({policy_type})")
            return

        except grpc.RpcError as e:
            if attempt < max_retries - 1:
                log.warning(f"Connection attempt {attempt + 1} failed: {e}. Retrying...")
                time.sleep(retry_delay)
            else:
                log.error(f"Failed to connect after {max_retries} attempts")
                raise
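The connection loop above is a standard bounded-retry pattern; factored out as a generic helper (a sketch, not part of the client):

```python
import time

def retry(fn, max_retries=5, retry_delay=1.0, exceptions=(Exception,)):
    """Call `fn` until it succeeds; re-raise after `max_retries` failed
    attempts, sleeping `retry_delay` seconds between tries."""
    for attempt in range(max_retries):
        try:
            return fn()
        except exceptions:
            if attempt == max_retries - 1:
                raise
            time.sleep(retry_delay)
```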
infer
infer(observation: dict[str, Any]) -> list[ndarray]

Send observation and receive action chunk.

Parameters:

Name Type Description Default
observation dict[str, Any]

Dictionary with observation data matching LeRobot format: - "observation.state": np.ndarray of robot state - "observation.images.<camera>": np.ndarray images (H, W, C) - "task": optional text prompt

required

Returns:

Type Description
list[ndarray]

List of action arrays (one per timestep in the chunk)

Source code in molmo_spaces/policy/learned_policy/lerobot_grpc_client.py
def infer(self, observation: dict[str, Any]) -> list[np.ndarray]:
    """Send observation and receive action chunk.

    Args:
        observation: Dictionary with observation data matching LeRobot format:
            - "observation.state": np.ndarray of robot state
            - "observation.images.<camera>": np.ndarray images (H, W, C)
            - "task": optional text prompt

    Returns:
        List of action arrays (one per timestep in the chunk)
    """
    if not self.connected:
        raise RuntimeError("Not connected. Call connect() first.")

    services_pb2 = self._services_pb2

    # Wrap observation in TimedObservation using lerobot's class for pickle compatibility
    timed_obs = self._TimedObservation(
        timestamp=time.time(),
        timestep=self.timestep,
        observation=observation,
        must_go=True,  # Force processing
    )
    self.timestep += 1

    # Serialize and send
    obs_bytes = pickle.dumps(timed_obs)
    self.stub.SendObservations(self._chunk_observation(obs_bytes))

    # Get action chunk
    response = self.stub.GetActions(services_pb2.Empty())

    if response.data:
        timed_actions = pickle.loads(response.data)
        # Extract action tensors from TimedAction objects
        return [
            ta.action.numpy() if hasattr(ta.action, "numpy") else np.array(ta.action)
            for ta in timed_actions
        ]

    return []
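The transport pickles the whole observation dict, so everything in it must round-trip losslessly. A sketch of an observation in the expected LeRobot format (keys and shapes are illustrative and must match the `lerobot_features` passed to `connect`):

```python
import pickle
import numpy as np

# Observation in LeRobot format (keys and shapes are illustrative).
observation = {
    "observation.state": np.zeros(8, dtype=np.float32),
    "observation.images.exterior": np.zeros((180, 320, 3), dtype=np.uint8),
    "task": "pick up the mug",
}

# The client wraps this in a TimedObservation and pickles it before sending;
# the round-trip must preserve every key and array exactly.
payload = pickle.dumps(observation)
restored = pickle.loads(payload)
```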
reset
reset() -> None

Reset the client state (timestep counter).

Source code in molmo_spaces/policy/learned_policy/lerobot_grpc_client.py
def reset(self) -> None:
    """Reset the client state (timestep counter)."""
    self.timestep = 0

phone_policy

Classes:

Name Description
Phone_Policy

Attributes:

Name Type Description
log
log module-attribute
log = getLogger(__name__)
Phone_Policy
Phone_Policy(exp_config: MlSpacesExpConfig, task_type: str)

Bases: InferencePolicy

Methods:

Name Description
add_auxiliary_objects

Add auxiliary objects to the scene that might be required for the policy.

get_action
get_all_phases

Returns:

get_info
get_phase

Returns:

inference_model
model_output_to_action
obs_to_model_input
prepare_model
render
reset

Attributes:

Name Type Description
config
current_phase
robot_type
session
target_poses
task
task_type
Source code in molmo_spaces/policy/learned_policy/phone_policy.py
def __init__(
    self,
    exp_config: MlSpacesExpConfig,
    task_type: str,
) -> None:
    super().__init__(exp_config, task_type)
    self.robot_type = exp_config.robot_config.name
    self.session = teledex.Session()
    self.session.start()
    while self.session.get_latest_data()["position"] is None:
        time.sleep(0.1)
    log.info(f"Teledex session connected. Robot type: {self.robot_type}")
config instance-attribute
config = config
current_phase instance-attribute
current_phase = NONE_PHASE
robot_type instance-attribute
robot_type = name
session instance-attribute
session = Session()
target_poses instance-attribute
target_poses = {'grasp': eye(4)}
task instance-attribute
task = task
task_type instance-attribute
task_type = task_type
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None

Add auxiliary objects to the scene that might be required for the policy. Args: config: The configuration for the policy. spec: The MuJoCo scene specification to modify.

Source code in molmo_spaces/policy/base_policy.py
@staticmethod
def add_auxiliary_objects(config: "MlSpacesExpConfig", spec: MjSpec) -> None:
    """
    Add auxiliary objects to the scene that might be required for the policy.
    Args:
        config: The configuration for the policy.
        spec: The MuJoCo scene specification to modify.
    """
    return None
get_action
get_action(observation)
Source code in molmo_spaces/policy/base_policy.py
def get_action(self, observation):
    model_input = self.obs_to_model_input(observation)
    model_output = self.inference_model(model_input)
    action = self.model_output_to_action(model_output)
    return action
get_all_phases
get_all_phases() -> dict[str, int]

Returns:

Type Description
dict[str, int]

A dictionary of all possible policy phases

Source code in molmo_spaces/policy/base_policy.py
def get_all_phases(self) -> dict[str, int]:
    """
    Returns:
        A dictionary of all possible policy phases
    """
    return {"unknown": 0}
get_info
get_info() -> dict
Source code in molmo_spaces/policy/learned_policy/phone_policy.py
def get_info(self) -> dict:
    info = super().get_info()
    info["policy_name"] = "mujoco_ar"
    info["timestamp"] = time.time()
    return info
get_phase
get_phase() -> str

Returns:

Type Description
str

The policy phase

Source code in molmo_spaces/policy/base_policy.py
def get_phase(self) -> str:
    """
    Returns:
        The policy phase
    """
    return "unknown"
inference_model
inference_model(model_input)
Source code in molmo_spaces/policy/learned_policy/phone_policy.py
def inference_model(self, model_input):
    if self.session.get_latest_data()["button"]:
        return None
    if self.robot_type == "floating_rum":
        phone_pose = np.eye(4)
        phone_pose[:3, :3] = self.session.get_latest_data()["rotation"].reshape(3, 3)
        phone_pose[:3, 3] = self.session.get_latest_data()["position"].reshape(3)

        if self.init_phone_pose is None:
            self.init_phone_pose = phone_pose

        transform = np.eye(4)
        transform[:3, :3] = R.from_euler("xyz", [0, 0, 0], degrees=True).as_matrix()
        delta_phone_pose = (
            transform
            @ np.linalg.inv(self.init_phone_pose)
            @ phone_pose
            @ np.linalg.inv(transform)
        )
        delta_phone_pose[:3, 3] *= 3  # sensitivity

        goal_pose = self.init_robot_pose @ delta_phone_pose

        goal_pose_7d = np.array(
            list(goal_pose[:3, 3])
            + list(R.from_matrix(goal_pose[:3, :3]).as_quat(scalar_first=True))
        )
        action = {
            "base": goal_pose_7d,
            "gripper": np.array([0.0])
            if self.session.get_latest_data()["toggle"] == self.current_grasp
            else np.array([-255.0]),
        }
        return action
    else:
        kinematics = self.task.env.current_robot.kinematics
        robot_view = self.task.env.current_robot.robot_view

        gripper_mgs = set(robot_view.get_gripper_movegroup_ids())
        mgs_except_gripper = [x for x in robot_view.move_group_ids() if x not in gripper_mgs]

        phone_pose = np.eye(4)
        phone_pose[:3, :3] = self.session.get_latest_data()["rotation"].reshape(3, 3)
        phone_pose[:3, 3] = self.session.get_latest_data()["position"].reshape(3)

        if self.init_phone_pose is None:
            self.init_phone_pose = phone_pose

        delta_phone_pose = self.init_phone_pose @ np.linalg.inv(phone_pose)
        delta_phone_position = -delta_phone_pose[:3, 3] * 2  # Negate to flip directions
        delta_phone_rotation = delta_phone_pose[:3, :3].T  # Transpose to invert rotation
        new_pose = np.eye(4)
        new_pose[:3, :3] = delta_phone_rotation @ self.init_tcp_pose[:3, :3]
        new_pose[:3, 3] = self.init_tcp_pose[:3, 3] + delta_phone_position

        jp = kinematics.ik(
            "arm",
            new_pose,
            mgs_except_gripper,
            robot_view.get_qpos_dict(),
            robot_view.base.pose,
            rel_to_base=True,
        )
        action = robot_view.get_ctrl_dict()
        if jp is not None:
            action.update({mg_id: jp[mg_id] for mg_id in mgs_except_gripper})

        if self.session.get_latest_data()["toggle"] != self.current_grasp:
            action["gripper"] = np.array([255.0])
        else:
            action["gripper"] = np.array([0.0])
        return action
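In the floating-base branch, the phone's motion is expressed as a delta relative to its initial pose and replayed on the robot's initial pose, with translation scaled by the sensitivity factor. With the conjugating `transform` equal to identity (as configured above), this reduces to:

```python
import numpy as np

def apply_phone_delta(init_robot_pose, init_phone_pose, phone_pose, sensitivity=3.0):
    # Delta of the phone relative to where it started, translation amplified.
    delta = np.linalg.inv(init_phone_pose) @ phone_pose
    delta[:3, 3] *= sensitivity
    return init_robot_pose @ delta

init_robot = np.eye(4)
init_phone = np.eye(4)
moved_phone = np.eye(4)
moved_phone[:3, 3] = [0.01, 0.0, 0.02]  # phone moved 1 cm in x, 2 cm in z

goal = apply_phone_delta(init_robot, init_phone, moved_phone)
# Translation is amplified 3x: the goal position becomes [0.03, 0.0, 0.06].
```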
model_output_to_action
model_output_to_action(model_output)
Source code in molmo_spaces/policy/learned_policy/phone_policy.py
def model_output_to_action(self, model_output):
    return model_output
obs_to_model_input
obs_to_model_input(obs)
Source code in molmo_spaces/policy/learned_policy/phone_policy.py
def obs_to_model_input(self, obs):
    if self.robot_type == "floating_rum":
        self.render(obs)

        T_world_robot = np.eye(4)
        T_world_robot[:3, 3] = obs["robot_base_pose"][:3]
        T_world_robot[:3, :3] = R.from_quat(
            obs["robot_base_pose"][3:7], scalar_first=True
        ).as_matrix()

        if self.init_robot_pose is None:
            self.init_robot_pose = T_world_robot

        return {
            "T_world_robot": T_world_robot,
        }
    else:
        self.render(obs)
        robot_pose = obs["robot_base_pose"]
        tcp_pose = obs["tcp_pose"]
        qpos = obs["qpos"]["arm"]

        T_world_robot = np.eye(4)
        T_world_robot[:3, :3] = R.from_quat(robot_pose[3:], scalar_first=True).as_matrix()
        T_world_robot[:3, 3] = robot_pose[:3]

        T_robot_tcp = np.eye(4)
        T_robot_tcp[:3, :3] = R.from_quat(tcp_pose[3:], scalar_first=True).as_matrix()
        T_robot_tcp[:3, 3] = tcp_pose[:3]

        if self.init_tcp_pose is None:
            self.init_tcp_pose = T_robot_tcp

        T_world_tcp = T_world_robot @ T_robot_tcp

        return {
            "qpos": qpos,
            "T_world_robot": T_world_robot,
            "T_world_tcp": T_world_tcp,
            "T_robot_tcp": T_robot_tcp,
        }
prepare_model abstractmethod
prepare_model(model_name: str)
Source code in molmo_spaces/policy/base_policy.py
@abstractmethod
def prepare_model(self, model_name: str):
    raise NotImplementedError
render
render(obs)
Source code in molmo_spaces/policy/learned_policy/phone_policy.py
def render(self, obs):
    if self.robot_type == "floating_rum":
        views = np.concatenate([obs["wrist_camera"], obs["exo_camera_1"]], axis=1)

        try:
            window_exists = cv2.getWindowProperty("views", cv2.WND_PROP_VISIBLE) >= 0
        except cv2.error:
            window_exists = False
        if not window_exists:
            screen_res = (1920, 1080)
            aspect_ratio = views.shape[0] / views.shape[1]
            new_width = int(screen_res[0] * 0.9)
            new_height = int(new_width * aspect_ratio)
            cv2.namedWindow("views", cv2.WINDOW_NORMAL)
            cv2.resizeWindow("views", new_width, new_height)
            cv2.moveWindow(
                "views", (screen_res[0] - new_width) // 2, (screen_res[1] - new_height) // 2
            )
        cv2.imshow("views", cv2.cvtColor(views, cv2.COLOR_RGB2BGR))
        cv2.waitKey(1)
    else:
        views = np.concatenate(
            [np.flip(np.flip(obs["wrist_camera"], axis=0), axis=1), obs["exo_camera_1"]], axis=1
        )

        try:
            window_exists = cv2.getWindowProperty("views", cv2.WND_PROP_VISIBLE) >= 0
        except cv2.error:
            window_exists = False
        if not window_exists:
            screen_res = (1920, 1080)
            aspect_ratio = views.shape[0] / views.shape[1]
            new_width = int(screen_res[0] * 0.9)
            new_height = int(new_width * aspect_ratio)
            cv2.namedWindow("views", cv2.WINDOW_NORMAL)
            cv2.resizeWindow("views", new_width, new_height)
            cv2.moveWindow(
                "views", (screen_res[0] - new_width) // 2, (screen_res[1] - new_height) // 2
            )
        cv2.imshow("views", cv2.cvtColor(views, cv2.COLOR_RGB2BGR))
        cv2.waitKey(1)
reset
reset()
Source code in molmo_spaces/policy/learned_policy/phone_policy.py
def reset(self):
    self.init_robot_pose = None
    self.init_phone_pose = None
    self.init_tcp_pose = None
    self.current_grasp = self.session.get_latest_data()["toggle"]

pi_policy

Classes:

Name Description
PI_Policy
PI_PolicyState

Attributes:

Name Type Description
log
log module-attribute
log = getLogger(__name__)
PI_Policy
PI_Policy(exp_config: MlSpacesExpConfig, task_type: str)

Bases: InferencePolicy, StatefulPolicy

Methods:

Name Description
add_auxiliary_objects

Add auxiliary objects to the scene that might be required for the policy.

get_action
get_all_phases

Returns:

get_info
get_phase

Returns:

get_state
inference_model
model_output_to_action
obs_to_model_input
prepare_model
render
reset
set_state

Attributes:

Name Type Description
checkpoint_path
chunk_size
config
current_phase
grasping_threshold
grasping_type
model
remote_config
target_poses
task
task_type
Source code in molmo_spaces/policy/learned_policy/pi_policy.py
def __init__(
    self,
    exp_config: MlSpacesExpConfig,
    task_type: str,
) -> None:
    super().__init__(exp_config, exp_config.task_type)
    self.remote_config = exp_config.policy_config.remote_config
    self.checkpoint_path = exp_config.policy_config.checkpoint_path
    self.grasping_type = exp_config.policy_config.grasping_type
    self.chunk_size = exp_config.policy_config.chunk_size
    self.grasping_threshold = exp_config.policy_config.grasping_threshold
    self.model = None  # don't init model till inference to allow multiprocessing
checkpoint_path instance-attribute
checkpoint_path = checkpoint_path
chunk_size instance-attribute
chunk_size = chunk_size
config instance-attribute
config = config
current_phase instance-attribute
current_phase = NONE_PHASE
grasping_threshold instance-attribute
grasping_threshold = grasping_threshold
grasping_type instance-attribute
grasping_type = grasping_type
model instance-attribute
model = None
remote_config instance-attribute
remote_config = remote_config
target_poses instance-attribute
target_poses = {'grasp': eye(4)}
task instance-attribute
task = task
task_type instance-attribute
task_type = task_type
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None

Add auxiliary objects to the scene that might be required for the policy. Args: config: The configuration for the policy. spec: The MuJoCo scene specification to modify.

Source code in molmo_spaces/policy/base_policy.py
@staticmethod
def add_auxiliary_objects(config: "MlSpacesExpConfig", spec: MjSpec) -> None:
    """
    Add auxiliary objects to the scene that might be required for the policy.
    Args:
        config: The configuration for the policy.
        spec: The MuJoCo scene specification to modify.
    """
    return None
get_action
get_action(observation)
Source code in molmo_spaces/policy/base_policy.py
def get_action(self, observation):
    model_input = self.obs_to_model_input(observation)
    model_output = self.inference_model(model_input)
    action = self.model_output_to_action(model_output)
    return action
get_all_phases
get_all_phases() -> dict[str, int]

Returns:

Type Description
dict[str, int]

A dictionary of all possible policy phases

Source code in molmo_spaces/policy/base_policy.py
def get_all_phases(self) -> dict[str, int]:
    """
    Returns:
        A dictionary of all possible policy phases
    """
    return {"unknown": 0}
get_info
get_info() -> dict
Source code in molmo_spaces/policy/learned_policy/pi_policy.py
def get_info(self) -> dict:
    info = super().get_info()
    info["policy_name"] = (
        self.model.get_server_metadata().get("policy_name", "pi")
        if hasattr(self.model, "get_server_metadata")
        else "pi"
    )
    info["policy_checkpoint"] = self.model_name
    info["policy_buffer_length"] = self.chunk_size
    info["policy_grasping_threshold"] = self.grasping_threshold
    info["policy_grasping_type"] = self.grasping_type
    info["prompt"] = self.task.get_task_description()
    log.info(f"Current prompt: {info['prompt']}")

    info["time_spent"] = time.time() - self.starting_time if self.starting_time else None
    info["timestamp"] = time.time()
    return info
get_phase
get_phase() -> str

Returns:

Type Description
str

The policy phase

Source code in molmo_spaces/policy/base_policy.py
def get_phase(self) -> str:
    """
    Returns:
        The policy phase
    """
    return "unknown"
get_state
get_state()
Source code in molmo_spaces/policy/learned_policy/pi_policy.py
def get_state(self):
    return PI_PolicyState(
        actions_buffer=self.actions_buffer,
        current_buffer_index=self.current_buffer_index,
        starting_time=self.starting_time,
    )
inference_model
inference_model(model_input)
Source code in molmo_spaces/policy/learned_policy/pi_policy.py
def inference_model(self, model_input):
    if self.model is None:
        self.prepare_model()
    if self.starting_time is None:
        self.starting_time = time.time()
    if self.actions_buffer is None or self.current_buffer_index >= self.chunk_size:
        import websockets

        try:
            self.actions_buffer = self.model.infer(model_input)["actions"]
        except websockets.exceptions.ConnectionClosedError:
            log.error("Connection closed error. Attempting to reset connection...")
            self.prepare_model()
            log.info("Sleeping 5s...")
            time.sleep(5)
            log.info("Retrying inference...")
            self.actions_buffer = self.model.infer(model_input)["actions"]
        self.current_buffer_index = 0
    model_output = self.actions_buffer[self.current_buffer_index]
    self.current_buffer_index += 1
    return model_output
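`inference_model` queries the model only when the local action buffer is exhausted, then replays one action per step until the chunk runs out. A minimal, self-contained sketch of that buffering pattern (the `ActionChunkBuffer` class and `infer_fn` callback are hypothetical stand-ins for the policy's remote model call):

```python
class ActionChunkBuffer:
    """Replays a chunk of actions, refilling from infer_fn when exhausted.

    Sketch of the chunked-inference pattern used by inference_model.
    """

    def __init__(self, infer_fn, chunk_size):
        self.infer_fn = infer_fn
        self.chunk_size = chunk_size
        self.buffer = None
        self.index = 0

    def next_action(self, observation):
        if self.buffer is None or self.index >= self.chunk_size:
            self.buffer = self.infer_fn(observation)  # one model call per chunk
            self.index = 0
        action = self.buffer[self.index]
        self.index += 1
        return action


calls = []

def fake_infer(obs):
    calls.append(obs)
    return [f"a{len(calls)}_{i}" for i in range(3)]

buf = ActionChunkBuffer(fake_infer, chunk_size=3)
actions = [buf.next_action("obs") for _ in range(5)]
# 5 steps consume one full chunk plus two actions of a second chunk,
# so only two inference calls are made.
```

Amortizing one (potentially slow, remote) inference call over `chunk_size` control steps is the point of this design.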
model_output_to_action
model_output_to_action(model_output)
Source code in molmo_spaces/policy/learned_policy/pi_policy.py
def model_output_to_action(self, model_output):
    if self.grasping_type == "continuous":
        gripper_pos = model_output[7] * np.array([255.0])
    else:  # binary
        gripper_pos = (
            np.array([255.0]) if model_output[7] > self.grasping_threshold else np.array([0.0])
        )

    arm_output = model_output[:7].reshape(
        7,
    )
    action = {
        "arm": arm_output,
        "gripper": gripper_pos,
    }
    return action
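The gripper mapping above supports two modes: continuous (proportional closure) and binary (snap open/closed around a threshold). A standalone sketch of that mapping; the default threshold here is illustrative, the real value comes from `policy_config.grasping_threshold`:

```python
import numpy as np

def gripper_command(raw, grasping_type="binary", threshold=0.5):
    """Map a model's [0, 1] gripper output to a [0, 255] actuator command.

    Sketch of the mapping in model_output_to_action.
    """
    if grasping_type == "continuous":
        return raw * np.array([255.0])  # proportional closure
    # binary: fully close above the threshold, fully open at or below it
    return np.array([255.0]) if raw > threshold else np.array([0.0])
```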
obs_to_model_input
obs_to_model_input(obs)
Source code in molmo_spaces/policy/learned_policy/pi_policy.py
def obs_to_model_input(self, obs):
    # self.render(obs)
    # For local eval: unwrap a singleton batch
    if isinstance(obs, list | tuple):
        if len(obs) > 1:
            log.warning(
                "obs has %d elements but only the first one is used; "
                "this may indicate a batching issue - a single observation was expected.",
                len(obs),
            )
        obs = obs[0]
    prompt = self.task.get_task_description()

    grip = np.clip(obs["qpos"]["gripper"][0] / 0.824033, 0, 1)
    exo_camera_key = (
        "droid_shoulder_light_randomization"
        if "droid_shoulder_light_randomization" in obs
        else "exo_camera_1"
    )
    wrist_camera_key = (
        "wrist_camera_zed_mini" if "wrist_camera_zed_mini" in obs else "wrist_camera"
    )
    model_input = {
        "observation/exterior_image_1_left": resize_with_pad(obs[exo_camera_key], 224, 224),
        "observation/wrist_image_left": resize_with_pad(obs[wrist_camera_key], 224, 224),
        "observation/joint_position": np.array(obs["qpos"]["arm"][:7]).reshape(
            7,
        ),
        "observation/gripper_position": np.array(grip).reshape(
            1,
        ),
        "prompt": prompt.lower(),
    }
    return model_input
prepare_model
prepare_model()
Source code in molmo_spaces/policy/learned_policy/pi_policy.py
def prepare_model(self):
    self.model_name = os.path.basename(self.checkpoint_path)
    if self.remote_config is not None:
        self._prepare_remote_model(self.checkpoint_path)
    else:
        self._prepare_local_model(self.checkpoint_path)
render
render(obs)
Source code in molmo_spaces/policy/learned_policy/pi_policy.py
def render(self, obs):
    views = np.concatenate([obs["wrist_camera"], obs["exo_camera_1"]], axis=1)
    cv2.imshow("views", cv2.cvtColor(views, cv2.COLOR_RGB2BGR))
    cv2.waitKey(1)
reset
reset()
Source code in molmo_spaces/policy/learned_policy/pi_policy.py
def reset(self):
    self.actions_buffer = None
    self.current_buffer_index = 0
    self.starting_time = None
set_state
set_state(state: PI_PolicyState)
Source code in molmo_spaces/policy/learned_policy/pi_policy.py
def set_state(self, state: PI_PolicyState):
    self.actions_buffer = state.actions_buffer
    self.current_buffer_index = state.current_buffer_index
    self.starting_time = state.starting_time
PI_PolicyState dataclass
PI_PolicyState(actions_buffer: list[ndarray] | None = None, current_buffer_index: int = 0, starting_time: float | None = None)

Attributes:

Name Type Description
actions_buffer list[ndarray] | None
current_buffer_index int
starting_time float | None
actions_buffer class-attribute instance-attribute
actions_buffer: list[ndarray] | None = None
current_buffer_index class-attribute instance-attribute
current_buffer_index: int = 0
starting_time class-attribute instance-attribute
starting_time: float | None = None

rum_client

Classes:

Name Description
RUMClient

Attributes:

Name Type Description
log
log module-attribute
log = getLogger(__name__)
RUMClient
RUMClient(host: str = 'localhost', port: int = 8765, max_size: int = 50 * 1024 * 1024)

Methods:

Name Description
close
get_server_metadata
infer
infer_point
infer_point_molmo
reset

Attributes:

Name Type Description
uri
websocket
Source code in molmo_spaces/policy/learned_policy/rum_client.py
def __init__(self, host: str = "localhost", port: int = 8765, max_size: int = 50 * 1024 * 1024):
    if ws_client is None:
        raise ImportError("websockets package required. Install with: pip install websockets")
    self.uri = f"ws://{host}:{port}"
    self.websocket = ws_client.connect(self.uri, max_size=max_size)
uri instance-attribute
uri = f'ws://{host}:{port}'
websocket instance-attribute
websocket = connect(uri, max_size=max_size)
close
close()
Source code in molmo_spaces/policy/learned_policy/rum_client.py
def close(self):
    if self.websocket:
        self.websocket.close()
        self.websocket = None
get_server_metadata
get_server_metadata() -> dict
Source code in molmo_spaces/policy/learned_policy/rum_client.py
def get_server_metadata(self) -> dict:
    response = self._send({"action": "metadata"})
    return {
        "policy_name": response.get("policy_name"),
        "checkpoint": response.get("checkpoint"),
    }
infer
infer(observation: dict) -> ndarray
Source code in molmo_spaces/policy/learned_policy/rum_client.py
def infer(self, observation: dict) -> np.ndarray:
    request = {
        "action": "infer",
        "observation": {
            "rgb_ego": observation["rgb_ego"].tolist(),
            "object_3d_position": observation["object_3d_position"].tolist(),
        },
    }
    return np.array(self._send(request)["result"], dtype=np.float32)
infer_point
infer_point(rgb: ndarray, object_name: str, task: str) -> ndarray
Source code in molmo_spaces/policy/learned_policy/rum_client.py
def infer_point(self, rgb: np.ndarray, object_name: str, task: str) -> np.ndarray:
    try:
        import google.genai as genai
        from google.genai import types
    except ImportError as e:
        raise ImportError(
            "google-genai package is not installed. Please install it with `pip install google-genai`."
        ) from e

    if not hasattr(self, "_gemini_client"):
        api_key = os.getenv("GEMINI_API_KEY")
        if not api_key:
            raise ValueError(
                "GEMINI_API_KEY environment variable not set. "
                "Please export GEMINI_API_KEY=your_api_key"
            )
        self._gemini_client = genai.Client(api_key=api_key)

    if rgb.dtype == np.float32 or rgb.dtype == np.float64:
        rgb = (rgb * 255).astype(np.uint8)
    image = Image.fromarray(rgb)

    if task == "pick":
        prompt = f"""Get the best point matching the following object: {object_name}. The label returned should be an identifying name for the object detected in the image, and the point should be at a point to pick that object.
The answer should follow the json format: [{{"point": [y, x], "label": "{object_name}"}}...]. The points are in [y, x] format normalized to 0-1000."""
    elif task == "open":
        prompt = f"""Get the best point to be grasped to open the {object_name} (e.g. a handle). The label returned should be an identifying name for the object detected in the image, and the point should be the best point to open that object.
The answer should follow the json format: [{{"point": [y, x], "label": "{object_name}"}}...]. The points are in [y, x] format normalized to 0-1000."""
    elif task == "close":
        prompt = f"""Get the best point to be grasped to close the {object_name} (e.g. a handle). The label returned should be an identifying name for the object detected in the image, and the point should be the best point to close that object.
The answer should follow the json format: [{{"point": [y, x], "label": "{object_name}"}}...]. The points are in [y, x] format normalized to 0-1000."""
    else:
        raise ValueError(f"Unsupported task: {task}")

    try:
        buf = io.BytesIO()
        image.save(buf, format="PNG")
        image_bytes = buf.getvalue()

        contents = [
            types.Part.from_bytes(
                data=image_bytes,
                mime_type="image/png",
            ),
            prompt,
        ]

        config = types.GenerateContentConfig(
            temperature=0.0, thinking_config=types.ThinkingConfig(thinking_budget=0)
        )

        response = self._call_gemini_api_with_retry(
            model="gemini-robotics-er-1.5-preview", contents=contents, config=config
        )
        generated_text = response.text

        json_point = self._parse_json_point_yx(generated_text)
        if json_point:
            y_pos, x_pos = json_point
            x_norm = float(x_pos) / 1000.0
            y_norm = float(y_pos) / 1000.0

            x_norm = max(0.0, min(1.0, x_norm))
            y_norm = max(0.0, min(1.0, y_norm))

            return np.array([x_norm, y_norm], dtype=np.float32)

        return np.array([0.5, 0.5], dtype=np.float32)

    except Exception as e:
        log.error(f"Gemini API error: {e}")
        return np.array([0.5, 0.5], dtype=np.float32)
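The prompts above ask the model for points in `[y, x]` order on a 0-1000 grid; `infer_point` then swaps the axes, rescales to `[0, 1]`, and clamps. That post-processing in isolation (a sketch mirroring the source, with the clamping made explicit):

```python
import numpy as np

def normalize_point_yx(y_pos, x_pos):
    """Convert a [y, x] point on a 0-1000 grid to a clamped [x, y] in [0, 1].

    Mirrors the post-processing in infer_point; note the axis swap.
    """
    x_norm = min(1.0, max(0.0, float(x_pos) / 1000.0))
    y_norm = min(1.0, max(0.0, float(y_pos) / 1000.0))
    return np.array([x_norm, y_norm], dtype=np.float32)
```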
infer_point_molmo
infer_point_molmo(rgb: ndarray, object_name: str | None = None, prompt: str | None = None, task: str = 'pick') -> ndarray
Source code in molmo_spaces/policy/learned_policy/rum_client.py
def infer_point_molmo(
    self,
    rgb: np.ndarray,
    object_name: str | None = None,
    prompt: str | None = None,
    task: str = "pick",
) -> np.ndarray:
    if object_name is None and prompt is None:
        raise ValueError("Either 'object_name' or 'prompt' must be provided")
    if object_name is not None and prompt is not None:
        raise ValueError("Provide either 'object_name' or 'prompt', not both")

    request = {
        "action": "infer_point",
        "rgb": rgb.tolist(),
    }

    if object_name is not None:
        request["object_name"] = object_name
        request["task"] = task
    else:
        request["prompt"] = prompt

    response = self._send(request)
    return np.array(response["point"], dtype=np.float32)
reset
reset()
Source code in molmo_spaces/policy/learned_policy/rum_client.py
def reset(self):
    self._send({"action": "reset"})

spacemouse_policy

Classes:

Name Description
SpaceMouse_Policy

Attributes:

Name Type Description
DOUBLE_CLICK_TIME
VENDOR_ID
log
DOUBLE_CLICK_TIME module-attribute
DOUBLE_CLICK_TIME = 0.3
VENDOR_ID module-attribute
VENDOR_ID = 9583
log module-attribute
log = getLogger(__name__)
SpaceMouse_Policy
SpaceMouse_Policy(exp_config: MlSpacesExpConfig, task_type: str)

Bases: InferencePolicy

Methods:

Name Description
add_auxiliary_objects

Add auxiliary objects to the scene that might be required for the policy.

get_action
get_all_phases

Returns:

get_info
get_phase

Returns:

inference_model
model_output_to_action
obs_to_model_input
prepare_model
render
reset

Attributes:

Name Type Description
config
current_phase
pos_sensitivity
robot_type
rot_sensitivity
target_poses
task
task_type
Source code in molmo_spaces/policy/learned_policy/spacemouse_policy.py
def __init__(
    self,
    exp_config: MlSpacesExpConfig,
    task_type: str,
) -> None:
    super().__init__(exp_config, task_type)
    self.robot_type = exp_config.robot_config.name
    self.pos_sensitivity = exp_config.policy_config.pos_sensitivity
    self.rot_sensitivity = exp_config.policy_config.rot_sensitivity
    self._mouse = _SpaceMouseReader(exp_config.policy_config.product_id)
    self._pressed = set()
    self._listener = keyboard.Listener(
        on_press=self._on_press,
        on_release=self._on_release,
    )
    self._listener.start()
    log.info(
        "SpaceMouse ready. Right click: engage. Left click: toggle gripper. Right double-click: disengage. q: pause."
    )
config instance-attribute
config = config
current_phase instance-attribute
current_phase = NONE_PHASE
pos_sensitivity instance-attribute
pos_sensitivity = pos_sensitivity
robot_type instance-attribute
robot_type = name
rot_sensitivity instance-attribute
rot_sensitivity = rot_sensitivity
target_poses instance-attribute
target_poses = {'grasp': eye(4)}
task instance-attribute
task = task
task_type instance-attribute
task_type = task_type
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None

Add auxiliary objects to the scene that might be required for the policy. Args: config: The configuration for the policy. spec: The MuJoCo scene specification to add the objects to.

Source code in molmo_spaces/policy/base_policy.py
@staticmethod
def add_auxiliary_objects(config: "MlSpacesExpConfig", spec: MjSpec) -> None:
    """
    Add auxiliary objects to the scene that might be required for the policy.
    Args:
        config: The configuration for the policy.
        spec: The MuJoCo scene specification to add the objects to.
    """
    return None
get_action
get_action(observation)
Source code in molmo_spaces/policy/base_policy.py
def get_action(self, observation):
    model_input = self.obs_to_model_input(observation)
    model_output = self.inference_model(model_input)
    action = self.model_output_to_action(model_output)
    return action
get_all_phases
get_all_phases() -> dict[str, int]

Returns:

Type Description
dict[str, int]

A dictionary of all possible policy phases

Source code in molmo_spaces/policy/base_policy.py
def get_all_phases(self) -> dict[str, int]:
    """
    Returns:
        A dictionary of all possible policy phases
    """
    return {"unknown": 0}
get_info
get_info() -> dict
Source code in molmo_spaces/policy/learned_policy/spacemouse_policy.py
def get_info(self) -> dict:
    info = super().get_info()
    info["policy_name"] = "spacemouse"
    info["timestamp"] = time.time()
    return info
get_phase
get_phase() -> str

Returns:

Type Description
str

The policy phase

Source code in molmo_spaces/policy/base_policy.py
def get_phase(self) -> str:
    """
    Returns:
        The policy phase
    """
    return "unknown"
inference_model
inference_model(model_input)
Source code in molmo_spaces/policy/learned_policy/spacemouse_policy.py
def inference_model(self, model_input):
    if self._is_paused():
        return None

    ctrl = self._mouse.control if self._mouse.engaged else [0.0] * 6
    delta_pos = np.array(ctrl[:3]) * self.pos_sensitivity
    delta_rot = R.from_euler("xyz", np.array(ctrl[3:]) * self.rot_sensitivity).as_matrix()

    new_position = self.current_position + self.current_rotation @ delta_pos
    new_rotation = delta_rot @ self.current_rotation

    gripper_open = self._mouse.gripper_open

    if self.robot_type == "floating_rum":
        self.current_position = new_position
        self.current_rotation = new_rotation
        goal_pose = self.init_robot_pose.copy()
        goal_pose[:3, 3] = self.current_position
        goal_pose[:3, :3] = self.current_rotation

        goal_pose_7d = np.array(
            list(goal_pose[:3, 3])
            + list(R.from_matrix(goal_pose[:3, :3]).as_quat(scalar_first=True))
        )
        return {
            "base": goal_pose_7d,
            "gripper": np.array([0.0]) if gripper_open else np.array([-255.0]),
        }
    else:
        kinematics = self.task.env.current_robot.kinematics
        robot_view = self.task.env.current_robot.robot_view

        gripper_mgs = set(robot_view.get_gripper_movegroup_ids())
        mgs_except_gripper = [x for x in robot_view.move_group_ids() if x not in gripper_mgs]

        new_pose = np.eye(4)
        new_pose[:3, 3] = new_position
        new_pose[:3, :3] = new_rotation

        jp = kinematics.ik(
            "arm",
            new_pose,
            mgs_except_gripper,
            robot_view.get_qpos_dict(),
            robot_view.base.pose,
            rel_to_base=True,
        )
        action = robot_view.get_ctrl_dict()
        if jp is not None:
            self.current_position = new_position
            self.current_rotation = new_rotation
            action.update({mg_id: jp[mg_id] for mg_id in mgs_except_gripper})

        action["gripper"] = np.array([0.0]) if gripper_open else np.array([255.0])
        return action
model_output_to_action
model_output_to_action(model_output)
Source code in molmo_spaces/policy/learned_policy/spacemouse_policy.py
def model_output_to_action(self, model_output):
    return model_output
obs_to_model_input
obs_to_model_input(obs)
Source code in molmo_spaces/policy/learned_policy/spacemouse_policy.py
def obs_to_model_input(self, obs):
    self.render(obs)

    robot_pose = obs["robot_base_pose"]
    T_world_robot = np.eye(4)
    T_world_robot[:3, :3] = R.from_quat(robot_pose[3:], scalar_first=True).as_matrix()
    T_world_robot[:3, 3] = robot_pose[:3]

    if self.robot_type == "floating_rum":
        if self.init_robot_pose is None:
            self.init_robot_pose = T_world_robot
            self.current_position = T_world_robot[:3, 3].copy()
            self.current_rotation = T_world_robot[:3, :3].copy()
        return {"T_world_robot": T_world_robot}

    tcp_pose = obs["tcp_pose"]
    qpos = obs["qpos"]["arm"]

    T_robot_tcp = np.eye(4)
    T_robot_tcp[:3, :3] = R.from_quat(tcp_pose[3:], scalar_first=True).as_matrix()
    T_robot_tcp[:3, 3] = tcp_pose[:3]

    if self.init_tcp_pose is None:
        self.init_tcp_pose = T_robot_tcp.copy()
        self.current_position = T_robot_tcp[:3, 3].copy()
        self.current_rotation = T_robot_tcp[:3, :3].copy()

    T_world_tcp = T_world_robot @ T_robot_tcp
    return {
        "qpos": qpos,
        "T_world_robot": T_world_robot,
        "T_world_tcp": T_world_tcp,
        "T_robot_tcp": T_robot_tcp,
    }
prepare_model abstractmethod
prepare_model(model_name: str)
Source code in molmo_spaces/policy/base_policy.py
@abstractmethod
def prepare_model(self, model_name: str):
    raise NotImplementedError
render
render(obs)
Source code in molmo_spaces/policy/learned_policy/spacemouse_policy.py
def render(self, obs):
    if self.robot_type == "floating_rum":
        try:
            window_exists = cv2.getWindowProperty("views", cv2.WND_PROP_VISIBLE) >= 0
        except cv2.error:
            window_exists = False

        if not window_exists:
            screen_res = (1920, 1080)
            aspect_ratio = obs["wrist_camera"].shape[0] / obs["wrist_camera"].shape[1]
            new_width = int(screen_res[0] * 0.9)
            new_height = int(new_width * aspect_ratio)
            cv2.namedWindow("views", cv2.WINDOW_NORMAL)
            cv2.resizeWindow("views", new_width, new_height)
            cv2.moveWindow(
                "views", (screen_res[0] - new_width) // 2, (screen_res[1] - new_height) // 2
            )

        cv2.imshow("views", cv2.cvtColor(obs["wrist_camera"], cv2.COLOR_RGB2BGR))
        cv2.waitKey(1)
    else:
        views = np.concatenate(
            [np.flip(np.flip(obs["wrist_camera"], axis=0), axis=1), obs["exo_camera_1"]], axis=1
        )
        try:
            window_exists = cv2.getWindowProperty("views", cv2.WND_PROP_VISIBLE) >= 0
        except cv2.error:
            window_exists = False

        if not window_exists:
            screen_res = (1920, 1080)
            aspect_ratio = views.shape[0] / views.shape[1]
            new_width = int(screen_res[0] * 0.9)
            new_height = int(new_width * aspect_ratio)
            cv2.namedWindow("views", cv2.WINDOW_NORMAL)
            cv2.resizeWindow("views", new_width, new_height)
            cv2.moveWindow(
                "views", (screen_res[0] - new_width) // 2, (screen_res[1] - new_height) // 2
            )

        cv2.imshow("views", cv2.cvtColor(views, cv2.COLOR_RGB2BGR))
        cv2.waitKey(1)
reset
reset()
Source code in molmo_spaces/policy/learned_policy/spacemouse_policy.py
def reset(self):
    self.init_robot_pose = None
    self.init_tcp_pose = None
    self.current_position = None
    self.current_rotation = None
    self._mouse.reset_grasp()

utils

Classes:

Name Description
PromptSampler

Functions:

Name Description
generate_object_hash
resize_with_pad

Attributes:

Name Type Description
log
log module-attribute
log = getLogger(__name__)
PromptSampler
PromptSampler(task_type: str = 'pick', prompt_templates: list[str] | None = None, prompt_object_word_num: int = 1, disambiguate_distractors_by_pos: bool = False)

Parameters:

Name Type Description Default
task_type str

The type of task to sample prompts for.

'pick'
prompt_templates list[str]

A list of prompt templates to sample from. If None, the default templates for the task type will be used.

None
prompt_object_word_num int

The number of words to use for the object name in the prompt.

1
disambiguate_distractors_by_pos bool

Whether to disambiguate distractors by position in the prompt. This relies on functionality only present when using a frozen config.

False

Methods:

Name Description
clean_object_name
get_prompt
get_short_description
get_state
get_target_object_uid
next
set_state

Attributes:

Name Type Description
DEFAULT_TEMPLATES_BY_TASK
current_index
prompt_object_word_num
prompt_templates
task_type
Source code in molmo_spaces/policy/learned_policy/utils.py
def __init__(
    self,
    task_type: str = "pick",
    prompt_templates: list[str] | None = None,
    prompt_object_word_num: int = 1,
    disambiguate_distractors_by_pos: bool = False,
) -> None:
    """
    Args:
        task_type: The type of task to sample prompts for.
        prompt_templates: A list of prompt templates to sample from. If None, the default templates for the task type will be used.
        prompt_object_word_num: The number of words to use for the object name in the prompt.
        disambiguate_distractors_by_pos: Whether to disambiguate distractors by position in the prompt.
            This relies on functionality only present when using a frozen config.
    """
    if prompt_templates is not None and task_type in ["pick", "pick_and_place"]:
        self.prompt_templates = prompt_templates
    elif task_type in self.DEFAULT_TEMPLATES_BY_TASK:
        self.prompt_templates = self.DEFAULT_TEMPLATES_BY_TASK[task_type]
    else:
        raise ValueError(
            f"Unknown task_type '{task_type}'. "
            f"Available task types: {list(self.DEFAULT_TEMPLATES_BY_TASK.keys())}"
        )
    self.task_type = task_type
    self.current_index = -1
    self.prompt_object_word_num = prompt_object_word_num
    self._cached_prompt = None
    self._disambiguate_distractors_by_pos = disambiguate_distractors_by_pos
DEFAULT_TEMPLATES_BY_TASK class-attribute instance-attribute
DEFAULT_TEMPLATES_BY_TASK = {'pick': ['pick up the {}.'], 'open': ['open the {}.'], 'pick_and_place': ['pick up the {} and place it on the {}.'], 'packing': ['pack container.'], 'close': ['close the {}.']}
current_index instance-attribute
current_index = -1
prompt_object_word_num instance-attribute
prompt_object_word_num = prompt_object_word_num
prompt_templates instance-attribute
prompt_templates = prompt_templates
task_type instance-attribute
task_type = task_type
clean_object_name
clean_object_name(task: BaseMujocoTask) -> str
Source code in molmo_spaces/policy/learned_policy/utils.py
def clean_object_name(self, task: BaseMujocoTask) -> str:
    return self.get_short_description(self.get_target_object_uid(task))[0]
get_prompt
get_prompt(task: BaseMujocoTask) -> str
Source code in molmo_spaces/policy/learned_policy/utils.py
def get_prompt(self, task: BaseMujocoTask) -> str:
    if self._cached_prompt is not None:
        return self._cached_prompt

    object_uid = self.get_target_object_uid(task)
    target_name = task.env.config.task_config.pickup_obj_name

    # Check if this is a custom object with a provided name
    eval_params = task.env.config.eval_runtime_params
    if (
        eval_params
        and eval_params.custom_object_name
        and target_name.startswith("custom_object/")
    ):
        # Use the provided custom object name directly
        object_name = eval_params.custom_object_name.lower()
    else:
        # Standard object handling
        short_descriptions: list[str] = ObjectMeta.short_descriptions(object_uid)
        target_category = "_".join(target_name.split("_")[0:1])

        if not short_descriptions:
            object_name = target_category
        elif self.prompt_object_word_num == 0:
            description = short_descriptions[3].lower()
            object_name = short_descriptions[0].lower()
            object_name = description.replace(object_name, "object")
        else:
            object_name = short_descriptions[self.prompt_object_word_num - 1].lower()

    if self._disambiguate_distractors_by_pos and self.task_type in ["pick", "pick_and_place"]:
        # TODO: this should pull from metadata or something, since object_poses is not guaranteed to be set
        target_pose = task.env.config.task_config.object_poses[target_name]
        robot_pose = task.env.config.task_config.robot_base_pose
        T_world_robot = np.eye(4)
        T_world_robot[:3, 3] = robot_pose[:3]
        T_world_robot[:3, :3] = R.from_quat(robot_pose[3:7], scalar_first=True).as_matrix()
        T_world_target = np.eye(4)
        T_world_target[:3, 3] = target_pose[:3]
        T_world_target[:3, :3] = R.from_quat(target_pose[3:7], scalar_first=True).as_matrix()
        T_robot_target = np.linalg.inv(T_world_robot) @ T_world_target
        target_pos = T_robot_target[:3, 3]

        distractors_pos = []

        for (
            distractor_name,
            distractor_pose,
        ) in task.env.config.task_config.object_poses.items():
            if (
                distractor_name == target_name
                or "_".join(distractor_name.split("_")[0:1]) != target_category
            ):
                continue
            T_world_distractor = np.eye(4)
            T_world_distractor[:3, 3] = distractor_pose[:3]
            T_world_distractor[:3, :3] = R.from_quat(
                distractor_pose[3:7], scalar_first=True
            ).as_matrix()
            T_robot_distractor = np.linalg.inv(T_world_robot) @ T_world_distractor
            if (
                np.linalg.norm(T_robot_distractor[:3, 3] - T_robot_target[:3, 3]) > 1.0
                or np.linalg.norm(T_robot_distractor[:3, 3]) > 1.0
            ):
                continue
            distractors_pos.append(T_robot_distractor[:3, 3])

        if len(distractors_pos) > 0:
            distractors_array = np.array(distractors_pos)

            deltas = target_pos - distractors_array
            abs_deltas = np.abs(deltas)
            min_indices = np.argmin(abs_deltas, axis=0)
            min_components = np.array(
                [
                    deltas[min_indices[0], 0],
                    deltas[min_indices[1], 1],
                    deltas[min_indices[2], 2],
                ]
            )
            max_component_index = np.argmax(np.abs(min_components))
            min_component_value = min_components[max_component_index]
            if max_component_index == 1:
                object_name += " on the left" if min_component_value > 0 else " on the right"
            elif max_component_index == 0:
                object_name += " in the back" if min_component_value > 0 else " in front"
            else:
                object_name += " above" if min_component_value > 0 else " below"

    if self.task_type == "pick_and_place":
        # Get place receptacle name from config (format: "place_receptacle/<uid>")
        place_receptacle_full_name = task.env.config.task_config.place_receptacle_name
        if place_receptacle_full_name:
            receptacle_uid = place_receptacle_full_name.split("/")[-1]
            receptacle_short_descriptions: list[str] = ObjectMeta.short_descriptions(
                receptacle_uid
            )

            if not receptacle_short_descriptions:
                log.warning(
                    "No receptacle short descriptions found, defaulting to 'receptacle'"
                )
                receptacle_name = "receptacle"
            elif self.prompt_object_word_num == 0:
                description = receptacle_short_descriptions[3].lower()
                base_name = receptacle_short_descriptions[0].lower()
                receptacle_name = description.replace(base_name, "object")
            else:
                receptacle_name = receptacle_short_descriptions[
                    self.prompt_object_word_num - 1
                ].lower()
        else:
            log.warning("No place receptacle found in config, defaulting to 'receptacle'")
            receptacle_name = "receptacle"

        self._cached_prompt = self.prompt_templates[self.current_index].format(
            object_name, receptacle_name
        )
    else:
        self._cached_prompt = self.prompt_templates[self.current_index].format(object_name)

    log.info(f"The prompt is: {self._cached_prompt}")
    return self._cached_prompt
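The disambiguation step in `get_prompt` is the least obvious part: for each axis it takes the smallest per-axis gap between the target and any same-category distractor, then describes the axis where that gap is widest. A standalone sketch of just that selection (the robot-frame convention of x forward, y left, z up is an assumption inferred from the phrase choices in the source):

```python
import numpy as np

def direction_suffix(target_pos, distractor_positions):
    """Pick a directional phrase separating the target from nearby distractors.

    Sketch of the disambiguation step in PromptSampler.get_prompt.
    """
    deltas = target_pos - np.asarray(distractor_positions)  # (N, 3)
    # per axis: the signed gap to the distractor closest on that axis
    min_components = deltas[np.argmin(np.abs(deltas), axis=0), [0, 1, 2]]
    # describe the axis where even the closest distractor is furthest away
    axis = int(np.argmax(np.abs(min_components)))
    value = min_components[axis]
    if axis == 1:
        return " on the left" if value > 0 else " on the right"
    if axis == 0:
        return " in the back" if value > 0 else " in front"
    return " above" if value > 0 else " below"
```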
get_short_description
get_short_description(object_uid)
Source code in molmo_spaces/policy/learned_policy/utils.py
def get_short_description(self, object_uid):
    return ObjectMeta.get_short_description(object_uid)
get_state
get_state()
Source code in molmo_spaces/policy/learned_policy/utils.py
def get_state(self):
    return {
        "current_index": self.current_index,
        "cached_prompt": self._cached_prompt,
    }
get_target_object_uid
get_target_object_uid(task)
Source code in molmo_spaces/policy/learned_policy/utils.py
def get_target_object_uid(self, task):
    return ObjectMeta.get_target_object_uid(task)
next
next() -> None
Source code in molmo_spaces/policy/learned_policy/utils.py
def next(self) -> None:
    self.current_index = (self.current_index + 1) % len(self.prompt_templates)
    self._cached_prompt = None
set_state
set_state(state)
Source code in molmo_spaces/policy/learned_policy/utils.py
def set_state(self, state):
    self.current_index = state["current_index"]
    self._cached_prompt = state["cached_prompt"]
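Together, `next`, `get_state`, and `set_state` implement a resumable round-robin over prompt templates with a cache that is invalidated on each advance. A minimal standalone sketch (the `PromptCycler` name is illustrative, not from the source):

```python
class PromptCycler:
    """Round-robin over prompt templates; the cached prompt resets on advance."""

    def __init__(self, templates: list[str]) -> None:
        self.templates = templates
        self.current_index = 0
        self._cached_prompt: str | None = None

    def format(self, *names: str) -> str:
        # Cache so repeated calls within an episode return the same prompt.
        if self._cached_prompt is None:
            self._cached_prompt = self.templates[self.current_index].format(*names)
        return self._cached_prompt

    def next(self) -> None:
        self.current_index = (self.current_index + 1) % len(self.templates)
        self._cached_prompt = None

    def get_state(self) -> dict:
        return {"current_index": self.current_index, "cached_prompt": self._cached_prompt}

    def set_state(self, state: dict) -> None:
        self.current_index = state["current_index"]
        self._cached_prompt = state["cached_prompt"]
```

Restoring state also restores the cached prompt, so a resumed episode keeps its exact wording.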
generate_object_hash
generate_object_hash(asset_id: str) -> str
Source code in molmo_spaces/policy/learned_policy/utils.py
def generate_object_hash(asset_id: str) -> str:
    hasher = hashlib.md5()
    hasher.update(asset_id.encode())
    return hasher.hexdigest()
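MD5 is used here as a stable identifier, not for security; unlike Python's builtin `hash`, which is salted per process for strings, the digest is identical across runs. An equivalent one-liner:

```python
import hashlib

def object_hash(asset_id: str) -> str:
    # Same computation as generate_object_hash above, written as a one-liner.
    return hashlib.md5(asset_id.encode()).hexdigest()
```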
resize_with_pad
resize_with_pad(images, height, width)
Source code in molmo_spaces/policy/learned_policy/utils.py
def resize_with_pad(images, height, width):
    if images.shape[-3:-1] == (height, width):
        return images

    original_shape = images.shape
    images = images.reshape(-1, *original_shape[-3:])
    resized = np.stack(
        [
            _resize_with_pad_pil(Image.fromarray(im), height, width, method=Image.BILINEAR)
            for im in images
        ]
    )
    return resized.reshape(*original_shape[:-3], *resized.shape[-3:])
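For intuition, here is a numpy-only letterbox sketch of what the per-image helper does: scale to fit while preserving aspect ratio, then center on a zero canvas. This uses nearest-neighbor resampling for self-containedness; the real `_resize_with_pad_pil` helper (not shown above) uses PIL bilinear resampling:

```python
import numpy as np

def letterbox_nearest(image: np.ndarray, height: int, width: int) -> np.ndarray:
    """Resize an (H, W, C) image to fit inside (height, width), then zero-pad to size."""
    h, w = image.shape[:2]
    ratio = min(height / h, width / w)
    new_h, new_w = max(1, round(h * ratio)), max(1, round(w * ratio))

    # Nearest-neighbor resampling via integer index maps.
    rows = (np.arange(new_h) * h / new_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    resized = image[rows][:, cols]

    # Center the resized image on a zero (black) canvas of the target size.
    out = np.zeros((height, width, image.shape[2]), dtype=image.dtype)
    top, left = (height - new_h) // 2, (width - new_w) // 2
    out[top:top + new_h, left:left + new_w] = resized
    return out
```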

websocket_policy

Classes:

Name Description
WebsocketPolicy

Implements the Policy interface by communicating with a server over websocket.

Attributes:

Name Type Description
logger
logger module-attribute
logger = getLogger(__name__)
WebsocketPolicy
WebsocketPolicy(config: MlSpacesExpConfig, model_name: str, host: str = '127.0.0.1', port: int | None = None, connection_timeout: float | None = None)

Bases: InferencePolicy

Implements the Policy interface by communicating with a server over websocket.

See WebsocketPolicyServer for a corresponding server implementation.

Methods:

Name Description
add_auxiliary_objects

Add auxiliary objects to the scene that might be required for the policy.

close
get_action
get_all_phases

Returns:

get_info
get_phase

Returns:

get_server_metadata
infer
inference_model
model_output_to_action
obs_to_model_input
prepare_model
render

Not required to be implemented by subclasses.

reset

Attributes:

Name Type Description
config
current_phase
model_name
target_poses
task
task_type
Source code in molmo_spaces/policy/learned_policy/websocket_policy.py
def __init__(
    self,
    config: MlSpacesExpConfig,
    model_name: str,
    host: str = "127.0.0.1",
    port: int | None = None,
    connection_timeout: float | None = None,
):
    task_type = getattr(config, "task_type", "pick_and_place")
    super().__init__(config, task_type)
    self.model_name = model_name
    self._last_prompt: str | None = None

    if host.startswith("ws"):
        self._uri = host
    else:
        self._uri = f"ws://{host}"
    if port is not None:
        self._uri += f":{port}"
    self._ws = None
    self._server_metadata = None
    self._prepared = False
    self._connection_timeout = connection_timeout
config instance-attribute
config = config
current_phase instance-attribute
current_phase = NONE_PHASE
model_name instance-attribute
model_name = model_name
target_poses instance-attribute
target_poses = {'grasp': eye(4)}
task instance-attribute
task = task
task_type instance-attribute
task_type = task_type
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None

Add auxiliary objects to the scene that might be required for the policy. Args: config: The configuration for the policy. spec: The MuJoCo scene specification.

Source code in molmo_spaces/policy/base_policy.py
@staticmethod
def add_auxiliary_objects(config: "MlSpacesExpConfig", spec: MjSpec) -> None:
    """
    Add auxiliary objects to the scene that might be required for the policy.
    Args:
        config: The configuration for the policy.
        spec: The MuJoCo scene specification.
    """
    return None
close
close()
Source code in molmo_spaces/policy/learned_policy/websocket_policy.py
def close(self):
    if self._ws is not None:
        logger.info("Closing websocket connection")
        self._ws.close()
        self._ws = None
get_action
get_action(observation)
Source code in molmo_spaces/policy/base_policy.py
def get_action(self, observation):
    model_input = self.obs_to_model_input(observation)
    model_output = self.inference_model(model_input)
    action = self.model_output_to_action(model_output)
    return action
get_all_phases
get_all_phases() -> dict[str, int]

Returns:

Type Description
dict[str, int]

A dictionary of all possible policy phases

Source code in molmo_spaces/policy/base_policy.py
def get_all_phases(self) -> dict[str, int]:
    """
    Returns:
        A dictionary of all possible policy phases
    """
    return {"unknown": 0}
get_info
get_info() -> dict
Source code in molmo_spaces/policy/learned_policy/websocket_policy.py
def get_info(self) -> dict:
    info = super().get_info()
    info["policy_name"] = "websocket"
    info["policy_model_name"] = self.model_name
    info["prompt"] = self.task.get_task_description()
    return info
get_phase
get_phase() -> str

Returns:

Type Description
str

The policy phase

Source code in molmo_spaces/policy/base_policy.py
def get_phase(self) -> str:
    """
    Returns:
        The policy phase
    """
    return "unknown"
get_server_metadata
get_server_metadata() -> dict
Source code in molmo_spaces/policy/learned_policy/websocket_policy.py
def get_server_metadata(self) -> dict:
    return self._server_metadata
infer
infer(obs: dict) -> dict
Source code in molmo_spaces/policy/learned_policy/websocket_policy.py
def infer(self, obs: dict) -> dict:
    data = msgpack_numpy.packb(obs)
    self._ws.send(data)
    response = self._ws.recv(timeout=10)
    if isinstance(response, str):
        # we're expecting bytes; if the server sends a string, it's an error.
        raise RuntimeError(f"Error in inference server:\n{response}")
    return msgpack_numpy.unpackb(response)
inference_model
inference_model(model_input)
Source code in molmo_spaces/policy/learned_policy/websocket_policy.py
def inference_model(self, model_input):
    self.prepare_model()
    data = msgpack_numpy.packb(model_input)
    self._ws.send(data)
    response = self._ws.recv()
    if isinstance(response, str):
        # we're expecting bytes; if the server sends a string, it's an error.
        raise RuntimeError(f"Error in inference server:\n{response}")
    return msgpack_numpy.unpackb(response)
model_output_to_action
model_output_to_action(model_output)
Source code in molmo_spaces/policy/learned_policy/websocket_policy.py
def model_output_to_action(self, model_output):
    action = {
        "arm": model_output["arm"],
        "gripper": model_output["gripper"],
    }
    return action
obs_to_model_input
obs_to_model_input(obs)
Source code in molmo_spaces/policy/learned_policy/websocket_policy.py
def obs_to_model_input(self, obs):
    if isinstance(obs, list):
        if len(obs) > 1:
            logger.warning(
                "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!\n"
                "WARNING: obs list has %d elements but only using the first one!\n"
                "This may indicate a batching issue - expected single observation.\n"
                "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!",
                len(obs),
            )
        obs = obs[0]
    model_input = {**obs}
    prompt = self.task.get_task_description()

    if self._last_prompt is None:
        self._last_prompt = prompt
    model_input["task"] = prompt
    return model_input
prepare_model
prepare_model() -> None
Source code in molmo_spaces/policy/learned_policy/websocket_policy.py
def prepare_model(self) -> None:
    if not self._prepared:
        self._ws, self._server_metadata = self._wait_for_server()
        self._prepared = True
render
render(obs)

Not required to be implemented by subclasses. Defaults to noop.

Source code in molmo_spaces/policy/base_policy.py
def render(self, obs):
    """
    Not required to be implemented by subclasses.
    Defaults to noop.
    """
    pass
reset
reset() -> None
Source code in molmo_spaces/policy/learned_policy/websocket_policy.py
def reset(self) -> None:
    self.close()
    self._prepared = False
    self.prepare_model()

random_policy

Classes:

Name Description
RandomPolicy

A Random Policy that selects actions randomly.

RandomPolicy

RandomPolicy(config: MlSpacesExpConfig, task: BaseMujocoTask | None = None)

Bases: BasePolicy

A Random Policy that selects actions randomly.

Methods:

Name Description
add_auxiliary_objects

Add auxiliary objects to the scene that might be required for the policy.

get_action

Decide on a random action to take based on the action space.

get_all_phases

Returns:

get_info

Get additional information from the policy. Called after episode ended. This method can be

get_phase

Returns:

reset

Reset the policy state. No state to reset for RandomPolicy.

Attributes:

Name Type Description
action_space
config
task
type str
Source code in molmo_spaces/policy/random_policy.py
def __init__(self, config: "MlSpacesExpConfig", task: BaseMujocoTask | None = None) -> None:
    raise NotImplementedError  # TODO(snehal): please fix
    self.config = config
    self.task = task
    self.action_space = action_space
action_space instance-attribute
action_space = action_space
config instance-attribute
config = config
task instance-attribute
task = task
type property
type: str
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None

Add auxiliary objects to the scene that might be required for the policy. Args: config: The configuration for the policy. spec: The MuJoCo scene specification.

Source code in molmo_spaces/policy/base_policy.py
@staticmethod
def add_auxiliary_objects(config: "MlSpacesExpConfig", spec: MjSpec) -> None:
    """
    Add auxiliary objects to the scene that might be required for the policy.
    Args:
        config: The configuration for the policy.
        spec: The MuJoCo scene specification.
    """
    return None
get_action
get_action(observation)

Decide on a random action to take based on the action space.

Parameters:

Name Type Description Default
observation

The current observation about the task or environment (not used).

required

Returns:

Type Description

A random action from the action space.

Source code in molmo_spaces/policy/random_policy.py
def get_action(self, observation):
    """
    Decide on a random action to take based on the action space.

    Args:
        observation: The current observation about the task or environment (not used).

    Returns:
        A random action from the action space.
    """
    return self.action_space.sample()
get_all_phases
get_all_phases() -> dict[str, int]

Returns:

Type Description
dict[str, int]

A dictionary of all possible policy phases

Source code in molmo_spaces/policy/base_policy.py
def get_all_phases(self) -> dict[str, int]:
    """
    Returns:
        A dictionary of all possible policy phases
    """
    return {"unknown": 0}
get_info
get_info() -> dict

Get additional information from the policy. Called after episode ended. This method can be overridden by subclasses to provide extra information about the policy's state. Must be json serializable.

Returns:

Type Description
dict

A dictionary containing additional information about the policy.

Source code in molmo_spaces/policy/base_policy.py
def get_info(self) -> dict:
    """
    Get additional information from the policy. Called after episode ended. This method can be
    overridden by subclasses to provide extra information about the policy's state.
    Must be json serializable.

    Returns:
        A dictionary containing additional information about the policy.
    """
    return {}
get_phase
get_phase() -> str

Returns:

Type Description
str

The policy phase

Source code in molmo_spaces/policy/base_policy.py
def get_phase(self) -> str:
    """
    Returns:
        The policy phase
    """
    return "unknown"
reset
reset() -> None

Reset the policy state. No state to reset for RandomPolicy.

Source code in molmo_spaces/policy/random_policy.py
def reset(self) -> None:
    """
    Reset the policy state. No state to reset for RandomPolicy.
    """
    pass

solvers

Modules:

Name Description
curobo_planner_policy
move_solver
navigation
object_manipulation
opening_solver

curobo_planner_policy

Classes:

Name Description
CuroboPlannerPolicy

Base class for Curobo-based planner policies.

Attributes:

Name Type Description
log
log module-attribute
log = getLogger(__name__)
CuroboPlannerPolicy
CuroboPlannerPolicy(config: MlSpacesExpConfig, task: BaseMujocoTask | None = None)

Bases: PlannerPolicy

Base class for Curobo-based planner policies.

This class provides common functionality for motion planning using Curobo, including trajectory execution, coordinate frame transformations, and gripper control. Subclasses should implement task-specific planning logic.

Methods:

Name Description
add_auxiliary_objects

Add auxiliary objects to the scene that might be required for the policy.

batch_plan_trajectory

Plan trajectory using batch motion planning.

clip_to_velocity_constraint

Clip action to respect velocity constraints.

get_action

Decide on the action to take based on the current observation of the task or environment.

get_all_phases

Return dictionary mapping phase names to values.

get_info

Get additional information from the policy. Called after episode ended. This method can be

get_look_at_action

Get action to look at a target.

get_phase

Return the current phase.

reset

Reset the policy state.

select_arm

Select which arm to use based on distance to pickup object.

solve_ik

Solve inverse kinematics for a target pose and create interpolated trajectory.

visualize_world_config_mesh

Visualize the world configuration as a mesh file.

Attributes:

Name Type Description
arm_end_idx int
arm_side str | None
arm_start_idx int
config
current_gripper_command dict[str, float]
is_done bool

Property to expose completion state for task checking.

planned_trajectory list[list[float]] | None
planner CuroboPlanner | None
planner_joint_ranges dict[str, tuple[int, int]]
planners dict[str, CuroboPlanner]

Return dictionary of planner instances.

profiler
retry_count int
steps_spent_in_waypoint int
task
trajectory_index int
Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
def __init__(self, config: MlSpacesExpConfig, task: BaseMujocoTask | None = None) -> None:
    super().__init__(config, task)
    self.config = config

    # Planner instances (to be set by subclasses)
    self.planner: CuroboPlanner | None = None
    self.planner_joint_ranges: dict[str, tuple[int, int]] = {}

    # Trajectory state
    self.planned_trajectory: list[list[float]] | None = None
    self.trajectory_index: int = 0
    self.steps_spent_in_waypoint: int = 0
    self.retry_count: int = 0

    # Arm selection
    self.arm_side: str | None = None
    self.arm_start_idx: int = 0
    self.arm_end_idx: int = 0

    # Gripper state
    self.current_gripper_command: dict[str, float] = {}

    # Profiler
    self.profiler = Profiler()
arm_end_idx instance-attribute
arm_end_idx: int = 0
arm_side instance-attribute
arm_side: str | None = None
arm_start_idx instance-attribute
arm_start_idx: int = 0
config instance-attribute
config = config
current_gripper_command instance-attribute
current_gripper_command: dict[str, float] = {}
is_done abstractmethod property
is_done: bool

Property to expose completion state for task checking.

planned_trajectory instance-attribute
planned_trajectory: list[list[float]] | None = None
planner instance-attribute
planner: CuroboPlanner | None = None
planner_joint_ranges instance-attribute
planner_joint_ranges: dict[str, tuple[int, int]] = {}
planners abstractmethod property
planners: dict[str, CuroboPlanner]

Return dictionary of planner instances.

profiler instance-attribute
profiler = Profiler()
retry_count instance-attribute
retry_count: int = 0
steps_spent_in_waypoint instance-attribute
steps_spent_in_waypoint: int = 0
task instance-attribute
task = task
trajectory_index instance-attribute
trajectory_index: int = 0
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None

Add auxiliary objects to the scene that might be required for the policy. Args: config: The configuration for the policy. spec: The MuJoCo scene specification.

Source code in molmo_spaces/policy/base_policy.py
@staticmethod
def add_auxiliary_objects(config: "MlSpacesExpConfig", spec: MjSpec) -> None:
    """
    Add auxiliary objects to the scene that might be required for the policy.
    Args:
        config: The configuration for the policy.
        spec: The MuJoCo scene specification.
    """
    return None
batch_plan_trajectory
batch_plan_trajectory() -> None

Plan trajectory using batch motion planning.

Uses the current phase to determine goal poses and plans trajectories in batches for efficiency. Sets self.planned_trajectory to the best trajectory found.

Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
def batch_plan_trajectory(self) -> None:
    """Plan trajectory using batch motion planning.

    Uses the current phase to determine goal poses and plans trajectories
    in batches for efficiency. Sets self.planned_trajectory to the best
    trajectory found.
    """
    init_config = np.concatenate(
        [
            np.zeros(3),
            self.task.env.current_robot.robot_view.get_move_group(
                f"{self.arm_side}_arm"
            ).joint_pos,
        ]
    )

    # Setup collision avoidance if enabled
    if getattr(self.config.policy_config, "enable_collision_avoidance", False):
        if hasattr(self, "_setup_collision_avoidance_config"):
            self._setup_collision_avoidance_config()

    # Get goal poses based on current phase (to be provided by subclass)
    goal_poses = self._get_batch_goal_poses()
    if goal_poses is None or len(goal_poses) == 0:
        log.warning("[BATCH PLAN] No goal poses available")
        return

    total = goal_poses.shape[0]
    batch_size = getattr(self.config.policy_config, "batch_size", 8)
    max_batches = getattr(self.config.policy_config, "max_batch_plan_attempts", 4)
    num_poses = min(max_batches * batch_size, total)
    num_batches = (num_poses + batch_size - 1) // batch_size

    all_successful_trajectories = []
    current_phase = getattr(self, "current_phase", "unknown")

    for batch_start in range(0, num_poses, batch_size):
        batch_end = min(batch_start + batch_size, num_poses)
        batch = goal_poses[batch_start:batch_end]

        if hasattr(self, "_show_poses"):
            self._show_poses(batch, style="tcp")
        if self.task.viewer:
            self.task.viewer.sync()

        log.info(
            f"Processing batch {batch_start // batch_size + 1}/{num_batches}: "
            f"poses {batch_start}-{batch_end - 1}"
        )

        # Transform poses to base frame and convert to 7D format
        batch_base_frame_7d = []
        for pose in batch:
            pose_base = self._target_pose_to_base_frame(pose)
            batch_base_frame_7d.append(pose_base.tolist())

        # Prepare batch inputs for planner
        batch_len = len(batch_base_frame_7d)
        start_states = [init_config.tolist()] * batch_len

        # Plan batch
        log.info(
            f"Planning batch of {batch_len} {current_phase} poses with {self.arm_side} arm"
        )
        result = self.planner.plan_batch(
            start_states,
            batch_base_frame_7d,
            verbose=False,
        )

        # Check for successes
        successes = result.success.cpu().numpy()
        log.info(
            f"{self.arm_side.capitalize()} arm: {np.sum(successes)}/{batch_len} "
            f"{current_phase} planning successes"
        )

        # Process successful trajectories
        if np.any(successes):
            optimized_plan = result.optimized_plan
            position = optimized_plan.position
            if position.ndim == 2:
                position = position.unsqueeze(0)

            for i in range(batch_len):
                if not successes[i]:
                    continue

                trajectory = []
                for t in range(position.shape[1]):
                    waypoint = position[i, t].cpu().tolist()
                    trajectory.append(waypoint)

                self._transform_traj_to_world_frame(trajectory)
                all_successful_trajectories.append(trajectory)

        if all_successful_trajectories:
            log.info(
                f"Found {len(all_successful_trajectories)} successful trajectories, "
                "skipping remaining batches"
            )
            break

    if all_successful_trajectories:
        self.planned_trajectory = self._select_best_trajectory(all_successful_trajectories)
    else:
        log.warning("[BATCH PLAN] No successful trajectories found across all batches")
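The batch bookkeeping above, which caps the number of attempted poses at `max_batch_plan_attempts * batch_size` and then walks them in `batch_size` chunks, can be isolated as a sketch (`plan_batches` is an illustrative name):

```python
def plan_batches(total: int, batch_size: int, max_batches: int) -> list[tuple[int, int]]:
    """Return (start, end) slices covering at most max_batches * batch_size poses."""
    num_poses = min(max_batches * batch_size, total)
    return [(start, min(start + batch_size, num_poses))
            for start in range(0, num_poses, batch_size)]
```

The early `break` in the method means later slices are only planned if every earlier batch failed.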
clip_to_velocity_constraint
clip_to_velocity_constraint(action: dict[str, Any]) -> dict[str, Any]

Clip action to respect velocity constraints.

Parameters:

Name Type Description Default
action dict[str, Any]

Dictionary of commanded joint positions by move group.

required

Returns:

Type Description
dict[str, Any]

Clipped action dictionary.

Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
def clip_to_velocity_constraint(self, action: dict[str, Any]) -> dict[str, Any]:
    """Clip action to respect velocity constraints.

    Args:
        action: Dictionary of commanded joint positions by move group.

    Returns:
        Clipped action dictionary.
    """
    velocity_constraints = getattr(self.config.policy_config, "velocity_constraints", {})
    clipped_action = action.copy()

    for move_group, commanded_action in action.items():
        if move_group in velocity_constraints:
            max_velocity_rad_per_s = velocity_constraints[move_group]
            move_group_view = self.task.env.robots[0].robot_view.get_move_group(move_group)
            move_group_joint_pos = move_group_view.joint_pos.copy()

            # Calculate difference, handling angular wraparound for base theta
            diff = commanded_action - move_group_joint_pos
            if move_group == "base":
                # Normalize theta difference to [-π, π]
                diff[2] = np.arctan2(np.sin(diff[2]), np.cos(diff[2]))

            velocity_rad_per_s = diff * 10.0

            # Clip velocity and recalculate commanded action
            clipped_velocity = np.clip(
                velocity_rad_per_s, -max_velocity_rad_per_s, max_velocity_rad_per_s
            )
            commanded_action = move_group_joint_pos + clipped_velocity / 10.0

            clipped_action[move_group] = commanded_action

    return clipped_action
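The wraparound handling above matters when the commanded base heading is nearly ±π away from the current one. Isolated as a sketch (hypothetical helper; the `rate_hz=10.0` default mirrors the `* 10.0` factor in the source, which implies a 10 Hz control rate):

```python
import numpy as np

def clip_joint_step(commanded: np.ndarray, current: np.ndarray,
                    max_vel: float, rate_hz: float = 10.0,
                    wrap_indices: tuple[int, ...] = ()) -> np.ndarray:
    """Limit a position command so the implied joint velocity stays within max_vel."""
    diff = commanded - current
    for i in wrap_indices:
        # Shortest angular path for wraparound joints (e.g. base theta).
        diff[i] = np.arctan2(np.sin(diff[i]), np.cos(diff[i]))
    velocity = diff * rate_hz
    clipped = np.clip(velocity, -max_vel, max_vel)
    return current + clipped / rate_hz
```

Without the normalization, a command from theta = 3.0 rad to -3.0 rad would drive the base the long way around through -6.0 rad instead of the short ~0.28 rad turn.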
get_action abstractmethod
get_action(observation)

Decide on the action to take based on the current observation of the task or environment. Information could be observations, goals in the case of an rl_agent, or it could be the full environment state in the case of a planner.

Parameters:

Name Type Description Default
observation

The current information about the task or environment.

required

Returns:

Type Description

The action to take in response to the information.

Source code in molmo_spaces/policy/base_policy.py
@abstractmethod
def get_action(self, observation):
    """
    Decide on the action to take based on the current observation of the task or environment.
    Information could be observations, goals in the case of an rl_agent, or it could be the full
    environment state in the case of a planner.

    Args:
        observation: The current information about the task or environment.

    Returns:
        The action to take in response to the information.
    """
    pass
get_all_phases abstractmethod
get_all_phases() -> dict[str, int]

Return dictionary mapping phase names to values.

Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
@abstractmethod
def get_all_phases(self) -> dict[str, int]:
    """Return dictionary mapping phase names to values."""
    pass
get_info
get_info() -> dict

Get additional information from the policy. Called after episode ended. This method can be overridden by subclasses to provide extra information about the policy's state. Must be json serializable.

Returns:

Type Description
dict

A dictionary containing additional information about the policy.

Source code in molmo_spaces/policy/base_policy.py
def get_info(self) -> dict:
    """
    Get additional information from the policy. Called after episode ended. This method can be
    overridden by subclasses to provide extra information about the policy's state.
    Must be json serializable.

    Returns:
        A dictionary containing additional information about the policy.
    """
    return {}
get_look_at_action
get_look_at_action() -> dict[str, Any]

Get action to look at a target.

Subclasses can override this to implement head tracking.

Returns:

Type Description
dict[str, Any]

Dictionary with head control commands, or empty dict.

Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
def get_look_at_action(self) -> dict[str, Any]:
    """Get action to look at a target.

    Subclasses can override this to implement head tracking.

    Returns:
        Dictionary with head control commands, or empty dict.
    """
    return {}
get_phase abstractmethod
get_phase() -> Any

Return the current phase.

Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
@abstractmethod
def get_phase(self) -> Any:
    """Return the current phase."""
    pass
reset
reset() -> None

Reset the policy state.

Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
def reset(self) -> None:
    """Reset the policy state."""
    self.planned_trajectory = None
    self.trajectory_index = 0
    self.steps_spent_in_waypoint = 0
    self.retry_count = 0
    self.current_gripper_command = {}
select_arm
select_arm() -> None

Select which arm to use based on distance to pickup object.

Also instantiates the motion planner for the selected arm. This lazy initialization saves ~11GB of GPU memory by only loading one arm's planner.

Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
def select_arm(self) -> None:
    """Select which arm to use based on distance to pickup object.

    Also instantiates the motion planner for the selected arm.
    This lazy initialization saves ~11GB of GPU memory by only loading one arm's planner.
    """
    task_config = self.config.task_config
    om = self.task.env.object_managers[self.task.env.current_batch_index]
    pickup_obj = om.get_object_by_name(task_config.pickup_obj_name)
    pickup_obj_pos = pickup_obj.position

    left_tcp_pose = self.task.env.current_robot.robot_view.get_move_group(
        "left_gripper"
    ).leaf_frame_to_world
    right_tcp_pose = self.task.env.current_robot.robot_view.get_move_group(
        "right_gripper"
    ).leaf_frame_to_world

    left_tcp_pos = left_tcp_pose[:3, 3]
    right_tcp_pos = right_tcp_pose[:3, 3]

    # Compute distances
    left_dist = np.linalg.norm(left_tcp_pos - pickup_obj_pos)
    right_dist = np.linalg.norm(right_tcp_pos - pickup_obj_pos)

    selected_arm = "left" if left_dist < right_dist else "right"
    log.info(
        f"Selected {selected_arm} arm (left dist: {left_dist:.3f}m, right dist: {right_dist:.3f}m)"
    )

    self.arm_side = selected_arm

    # Instantiate the planner for the selected arm only
    log.info(f"Instantiating motion planner for {selected_arm} arm")
    if selected_arm == "left":
        self.planner = CuroboPlanner(
            config=self.config.policy_config.left_curobo_planner_config
        )
        self.planner_joint_ranges = self.config.policy_config.left_planner_joint_ranges
    else:
        self.planner = CuroboPlanner(
            config=self.config.policy_config.right_curobo_planner_config
        )
        self.planner_joint_ranges = self.config.policy_config.right_planner_joint_ranges

    self.arm_start_idx = self.planner_joint_ranges[f"{self.arm_side}_arm"][0]
    self.arm_end_idx = self.planner_joint_ranges[f"{self.arm_side}_arm"][1]
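The selection rule itself is a plain nearest-TCP comparison; as a standalone function (illustrative name, separated from the planner instantiation):

```python
import numpy as np

def select_closer_arm(left_tcp_pos: np.ndarray, right_tcp_pos: np.ndarray,
                      target_pos: np.ndarray) -> str:
    """Pick the arm whose tool center point (TCP) is closer to the target."""
    left_dist = np.linalg.norm(left_tcp_pos - target_pos)
    right_dist = np.linalg.norm(right_tcp_pos - target_pos)
    return "left" if left_dist < right_dist else "right"
```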
solve_ik
solve_ik(target_pose: ndarray) -> None

Solve inverse kinematics for a target pose and create interpolated trajectory.

Parameters:

Name Type Description Default
target_pose ndarray

4x4 transformation matrix for target end-effector pose in world frame.

required

Raises:

Type Description
ValueError

If IK solution cannot be found.

Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
def solve_ik(self, target_pose: np.ndarray) -> None:
    """Solve inverse kinematics for a target pose and create interpolated trajectory.

    Args:
        target_pose: 4x4 transformation matrix for target end-effector pose in world frame.

    Raises:
        ValueError: If IK solution cannot be found.
    """
    init_config = np.concatenate(
        [
            np.zeros(3),
            self.task.env.current_robot.robot_view.get_move_group(
                f"{self.arm_side}_arm"
            ).joint_pos,
        ]
    )

    target_pose_7d = self._target_pose_to_base_frame(target_pose)
    joint_config, _ = self.planner.ik_solve(
        goal_pose=target_pose_7d.tolist(),
        seed_config=init_config.tolist(),
        disable_collision=True,
    )

    current_phase = getattr(self, "current_phase", "unknown")
    if joint_config is None:
        raise ValueError(f"Could not solve {current_phase} phase IK.")

    trajectory = self._interpolate_joint_trajectory(
        init_config,
        np.array(joint_config),
        num_steps=10,
    )
    self._transform_traj_to_world_frame(trajectory)
    self.planned_trajectory = trajectory
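A possible shape for the joint-space interpolation the last step relies on (the private `_interpolate_joint_trajectory` helper is not shown in the source; this sketch assumes uniform linear spacing that excludes the start configuration and includes the goal):

```python
import numpy as np

def interpolate_joint_trajectory(start: np.ndarray, goal: np.ndarray,
                                 num_steps: int = 10) -> list[list[float]]:
    """Linearly interpolate between two joint configurations."""
    # Skip alpha = 0 so the first waypoint already moves off the start pose.
    alphas = np.linspace(0.0, 1.0, num_steps + 1)[1:]
    return [((1.0 - a) * start + a * goal).tolist() for a in alphas]
```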
visualize_world_config_mesh
visualize_world_config_mesh(world_cfg: WorldConfig) -> None

Visualize the world configuration as a mesh file.

Parameters:

Name Type Description Default
world_cfg WorldConfig

Curobo WorldConfig to visualize.

required
Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
def visualize_world_config_mesh(self, world_cfg: WorldConfig) -> None:
    """Visualize the world configuration as a mesh file.

    Args:
        world_cfg: Curobo WorldConfig to visualize.
    """
    from datetime import datetime

    import trimesh
    from curobo.geom.types import WorldConfig

    current_phase = getattr(self, "current_phase", "unknown")
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    file_name = f"world_config_mesh_{current_phase}_{timestamp}"
    obj_path = f"{file_name}.obj"
    initial_joint_configuration = np.array(self._get_current_joint_positions())

    try:
        # Create a combined scene with robot and world obstacles
        combined_scene = trimesh.scene.scene.Scene(base_frame="world_origin")

        # Add robot mesh if we have a joint configuration
        if initial_joint_configuration is not None and self.planner is not None:
            try:
                config_for_mesh = np.array(initial_joint_configuration).copy()
                # Zero out base joints to place robot at origin
                config_for_mesh[:3] = 0.0
                q_tensor = torch.tensor(config_for_mesh).unsqueeze(0).float().cuda()

                robot_spheres_batch = self.planner.motion_gen.kinematics.get_robot_as_spheres(
                    q_tensor
                )
                if robot_spheres_batch and len(robot_spheres_batch) > 0:
                    robot_spheres = robot_spheres_batch[0]
                    for i, sphere in enumerate(robot_spheres):
                        sphere_mesh = trimesh.creation.icosphere(radius=sphere.radius)
                        sphere_transform = np.eye(4)
                        sphere_transform[:3, 3] = sphere.pose[:3]
                        combined_scene.add_geometry(
                            sphere_mesh,
                            geom_name=f"robot_sphere_{i}",
                            parent_node_name="world_origin",
                            transform=sphere_transform,
                        )
                    log.info(
                        f"Added {len(robot_spheres)} robot collision spheres to visualization"
                    )

            except Exception as e:
                log.warning(f"Could not add robot mesh: {e}")

        # Add world obstacles to the scene
        try:
            world_scene = WorldConfig.get_scene_graph(world_cfg, process_color=True)
            for geom_name, geom in world_scene.geometry.items():
                transform = world_scene.graph.get(geom_name)[0]
                combined_scene.add_geometry(
                    geom,
                    geom_name=geom_name,
                    parent_node_name="world_origin",
                    transform=transform,
                )
        except Exception as e:
            log.warning(f"Could not add world obstacles: {e}")

        # Export combined scene
        if len(combined_scene.geometry) > 0:
            combined_scene.export(obj_path)
            log.info(f"Successfully saved combined robot + world mesh to {obj_path}")
        else:
            log.debug("Skipping mesh export - scene is empty")

    except ValueError as e:
        if "empty scene" in str(e).lower():
            log.debug("Skipping mesh export - world config is empty")
        else:
            raise

move_solver

navigation

Modules:

Name Description
astar_planner_policy
astar_planner_policy

Classes:

Name Description
AStarPlannerPolicy
AStarSmoothPlannerPolicy

Attributes:

Name Type Description
log

This planner policy relies on the computation of a sparse set of (x,y) waypoints

log module-attribute
log = getLogger(__name__)

This planner policy relies on the computation of a sparse set of (x,y) waypoints from the AStarPlanner. It then builds a navigation plan by interleaving rotation and translation phases with interpolated waypoints (either by slerp for rotation phases, or by linear interpolation for translation ones).

Note: Even though rotation phases could ideally be skipped for colinear waypoints (i.e., those interpolated between two spatially distant planner waypoints), since the agent orientation stays constant along them, we still allow a correction, because the controller can deviate from the ideal plan. We introduce corrections by means of two mechanisms:
  1. Enforcing some intermediate waypoints between spatially distant ones, regardless of Euclidean distance (via path_interpolation_density, which can be kept low, e.g. 1).
  2. Enforcing intermediate waypoints that keep consecutive ones under a limit distance (via path_max_inter_waypoint_dist, which should be kept low enough to prevent excessive overshooting by the controller).

Note 2: For rotation, we limit the maximal arc length via path_max_inter_waypoint_angle, set by default to 10 degrees. Combined with a fix in the holonomic base control that expresses the current yaw relative to the intended motion direction to prevent wrapping errors, this leads to smooth rotations.

Note 3: The current replanning heuristic is brittle, so it can simply be switched off, e.g. by setting plan_fail_after_waypoint_steps to a value larger than the task horizon.

Some TODOs
  • Use joint_pos_rel to detect failure to complete the previous action
  • Select the nearest plannable random location near the target instead of the first one with a valid plan
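The interleaving described above can be sketched end to end. The snippet below is a minimal, self-contained illustration rather than the library implementation: it densifies a sparse (x, y) path with linear interpolation for translation phases and wrapped angle interpolation (the planar equivalent of slerp) for rotation phases. The helper names (wrap, rotation_phase, translation_phase, build_plan) and the max_step / max_arc parameters are hypothetical stand-ins for path_max_inter_waypoint_dist and path_max_inter_waypoint_angle.

```python
import numpy as np

def wrap(a):
    """Wrap an angle to [-pi, pi)."""
    return (a + np.pi) % (2 * np.pi) - np.pi

def rotation_phase(pos, a0, a1, max_arc):
    """Yield (x, y, theta) waypoints rotating in place from a0 to a1."""
    delta = wrap(a1 - a0)
    n = max(int(np.ceil(abs(delta) / max_arc)), 1)
    for t in np.linspace(0.0, 1.0, n + 1)[1:]:
        yield np.array([pos[0], pos[1], wrap(a0 + t * delta)])

def translation_phase(p0, p1, theta, max_step):
    """Yield (x, y, theta) waypoints translating from p0 to p1 at fixed heading."""
    n = max(int(np.ceil(np.linalg.norm(p1 - p0) / max_step)), 1)
    for t in np.linspace(0.0, 1.0, n + 1)[1:]:
        p = p0 + t * (p1 - p0)
        yield np.array([p[0], p[1], theta])

def build_plan(waypoints, start_theta, max_step=0.2, max_arc=np.deg2rad(10)):
    """Interleave rotation and translation phases along sparse (x, y) waypoints."""
    plan, theta = [], start_theta
    for p0, p1 in zip(waypoints[:-1], waypoints[1:]):
        heading = np.arctan2(p1[1] - p0[1], p1[0] - p0[0])
        plan += list(rotation_phase(p0, theta, heading, max_arc))
        plan += list(translation_phase(p0, p1, heading, max_step))
        theta = heading
    return np.array(plan)

plan = build_plan(np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]), start_theta=0.0)
```

Translation steps never exceed max_step, and rotations are split into arcs of at most max_arc, matching the two correction mechanisms above.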
AStarPlannerPolicy
AStarPlannerPolicy(config: MlSpacesExpConfig, task: BaseMujocoTask | None = None)

Bases: PlannerPolicy

Methods:

Name Description
add_auxiliary_objects

Add auxiliary objects to the scene that might be required for the policy.

build_policy_plan
current_waypoint
get_action
get_all_phases

Returns:

get_info

Get additional information from the policy. Called after episode ended. This method can be

get_phase

Returns:

interpolate_waypoints

Interpolate waypoints between each pair of waypoints.

max_angle_waypoints
max_dist_waypoints
planners
reset
skip_candidate
stop_plan

Attributes:

Name Type Description
candidate_objs list[MlSpacesObject]
config
nav_goal_sampler NavGoalSampler
nav_plan
nav_planner
robot_view
target_object MlSpacesObject

Get the nearest navigation target object for the current batch.

target_pos_quat
task
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def __init__(self, config: MlSpacesExpConfig, task: BaseMujocoTask | None = None) -> None:
    super().__init__(config, task)

    self._target_pos_quat = None
    self._nav_goal_sampler = None

    self.config.policy_config.planner_config.agent_radius = (
        self.config.task_sampler_config.robot_safety_radius
    )

    self.nav_planner = AStarPlanner(
        self.config.policy_config.planner_config, self.task.env.current_model_path
    )

    self._current_waypoint = 0
    self._reached_waypoints = 0
    self._nav_plan = None
    self._target_pos_quat = None
    self._dists_to_waypoint = []
    self._retries_left = self.config.policy_config.plan_max_retries
    self._target_object = None
    self._candidate_objs = None
    self._skipped_candidates = set()
    self._replan_after = None

    self.robot_view = task.env.current_robot.robot_view
candidate_objs property
candidate_objs: list[MlSpacesObject]
config instance-attribute
config = config
nav_goal_sampler property
nav_goal_sampler: NavGoalSampler
nav_plan property
nav_plan
nav_planner instance-attribute
nav_planner = AStarPlanner(planner_config, current_model_path)
robot_view instance-attribute
robot_view = robot_view
target_object property
target_object: MlSpacesObject

Get the nearest navigation target object for the current batch.

target_pos_quat property
target_pos_quat
task instance-attribute
task = task
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None

Add auxiliary objects to the scene that might be required for the policy. Args: config: The configuration for the policy. spec: The MuJoCo scene specification to modify.

Source code in molmo_spaces/policy/base_policy.py
@staticmethod
def add_auxiliary_objects(config: "MlSpacesExpConfig", spec: MjSpec) -> None:
    """
    Add auxiliary objects to the scene that might be required for the policy.
    Args:
        config: The configuration for the policy.
        spec: The MuJoCo scene specification to modify.
    """
    return None
build_policy_plan
build_policy_plan(world_waypoints)
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def build_policy_plan(self, world_waypoints):
    world_waypoints = self.stop_plan(world_waypoints)

    # the first difference computes theta from first to second waypoint
    pos_deltas = world_waypoints[1:] - world_waypoints[:-1]
    # Here we have the thetas from waypoint i to i+1
    thetas = np.arctan2(pos_deltas[:, 1], pos_deltas[:, 0])[:, None]

    combined_waypoints = []

    # First, we orient toward the 1st waypoint from the 0-th waypoint
    start_theta = self.robot_view.get_noop_ctrl_dict(["base"])["base"][2]
    for theta in self.max_angle_waypoints(np.stack([[start_theta], thetas[0]])):
        combined_waypoints.append(np.concatenate((world_waypoints[0], theta)))

    for i in range(1, len(world_waypoints) - 1):
        # We move towards i-th waypoint from the (i-1)-th waypoint
        for waypoint in self.max_dist_waypoints(world_waypoints[i - 1 : i + 1]):
            combined_waypoints.append(np.concatenate((waypoint, thetas[i - 1])))

        # We orient towards (i+1)-th waypoint from the i-th waypoint
        for theta in self.max_angle_waypoints(thetas[i - 1 : i + 1]):
            combined_waypoints.append(np.concatenate((world_waypoints[i], theta)))

    # First arrive at final position with last movement direction
    for waypoint in self.max_dist_waypoints(world_waypoints[-2:]):
        combined_waypoints.append(np.concatenate((waypoint, thetas[-1])))

    # Then rotate to face the target
    final_pos = world_waypoints[-1]
    target_pos = self.target_object.position[:2]
    final_theta = np.arctan2(target_pos[1] - final_pos[1], target_pos[0] - final_pos[0])
    for theta in self.max_angle_waypoints(np.stack([thetas[-1], [final_theta]])):
        combined_waypoints.append(np.concatenate((final_pos, theta)))

    return np.array(combined_waypoints)
current_waypoint
current_waypoint()
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def current_waypoint(self):
    if self._current_waypoint < len(self.nav_plan):
        cur_distance = self.robot_view.distance_to(
            ["base"], self.nav_plan[self._current_waypoint]
        )
        pose = self.robot_view.base.pose
        angle = R.from_matrix(pose[:3, :3]).as_euler("xyz")[2:3]
        delta = pose[:2, 3]
        pose = np.concatenate((delta, angle))

        log.debug(
            f"Steps {self.task.num_steps_taken()}"
            f" Retries left {self._retries_left}"
            f" Waypoint {self._reached_waypoints}/{len(self.nav_plan) + self._reached_waypoints - self._current_waypoint}"
            f" {np.round(self.nav_plan[self._current_waypoint], 3)}"
            f" Pose {np.round(pose, 3)}"
        f" Dist {cur_distance:.3f}"
        )

        if self.robot_view.is_close_to(["base"], self.nav_plan[self._current_waypoint]):
            self._reached_waypoints += 1
            self._dists_to_waypoint = []

            if self._replan_after is None:
                self._current_waypoint += 1
            else:
                if self._replan_after <= 1:
                    self._replan_after = None
                    self._nav_plan = None
                    self._target_pos_quat = None
                    if self.nav_plan is None:
                        log.warning("Terminating due to failure to replan.")
                        return None
                else:
                    self._replan_after -= 1
                    self._current_waypoint -= 1

        elif (
            len(self._dists_to_waypoint)
            > self.config.policy_config.plan_fail_after_waypoint_steps
            and min(self._dists_to_waypoint) - cur_distance
            <= self.config.policy_config.plan_fail_max_dist_delta
        ):
            self.nav_planner.blacklist.append(self.robot_view.base.pose[:3, 3].copy())
            self.nav_planner.apply_black_list()
            if self._retries_left > 0:
                if self._replan_after is None:

                    def waypoints_until_different_location():
                        back = 0
                        while (
                            np.linalg.norm(
                                self.nav_plan[self._current_waypoint][:2]
                                - self.nav_plan[self._current_waypoint - back][:2]
                            )
                            < 0.25  # TODO make this a config param
                        ):
                            if self._current_waypoint - back > 0:
                                back += 1
                            else:
                                break

                        return back

                    self._retries_left -= 1
                    self._replan_after = waypoints_until_different_location()
                    self._dists_to_waypoint = []
                    self._current_waypoint = max(self._current_waypoint - 1, 0)
                    log.warning(
                        f"Replanning requested after returning to the previous {self._replan_after} waypoints"
                        f" due to failure to progress with distance {cur_distance:.3f}"
                        f" to waypoint with {self._retries_left} retries left."
                    )
                else:
                    log.warning(
                        f"Terminating due to failure to return to previous waypoint"
                        f" with {self._replan_after} missing return waypoints"
                    )
                    return None
            else:
                log.warning(
                    f"Terminating due to failure to progress with distance {cur_distance:.3f} to waypoint"
                    f" and no plan retries left."
                )
                return None

        else:
            self._dists_to_waypoint.append(cur_distance)

    if self._current_waypoint < len(self.nav_plan):
        return self.nav_plan[self._current_waypoint]

    return None
get_action
get_action(observation)
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def get_action(self, observation):
    if self.nav_plan is None:
        # No plan possible, finish task immediately
        log.warning(
            f"[A* DONE] Planning failed - terminating episode at step {self.task.num_steps_taken()}"
            f" with {self._reached_waypoints} reached waypoints."
            f" Reason: A* could not find a valid path (see earlier PLAN FAIL logs for details)"
        )
        return self._build_done_action()

    # get next waypoint in the planned trajectory
    waypoint = self.current_waypoint()

    if waypoint is None:
        # All waypoints reached - navigation complete
        log.info(
            f"[A* DONE] Navigation complete - reached {self._reached_waypoints} waypoints"
            f" in {self.task.num_steps_taken()} steps."
        )
        return self._build_done_action()

    # Still navigating - return action to reach next waypoint
    return self._build_navigation_action(waypoint)
get_all_phases
get_all_phases() -> dict[str, int]

Returns:

Type Description
dict[str, int]

A dictionary of all possible policy phases

Source code in molmo_spaces/policy/base_policy.py
def get_all_phases(self) -> dict[str, int]:
    """
    Returns:
        A dictionary of all possible policy phases
    """
    return {"unknown": 0}
get_info
get_info() -> dict

Get additional information from the policy. Called after the episode has ended. This method can be overridden by subclasses to provide extra information about the policy's state. Must be JSON serializable.

Returns:

Type Description
dict

A dictionary containing additional information about the policy.

Source code in molmo_spaces/policy/base_policy.py
def get_info(self) -> dict:
    """
    Get additional information from the policy. Called after the episode has ended. This method
    can be overridden by subclasses to provide extra information about the policy's state.
    Must be JSON serializable.

    Returns:
        A dictionary containing additional information about the policy.
    """
    return {}
get_phase
get_phase() -> str

Returns:

Type Description
str

The policy phase

Source code in molmo_spaces/policy/base_policy.py
def get_phase(self) -> str:
    """
    Returns:
        The policy phase
    """
    return "unknown"
interpolate_waypoints
interpolate_waypoints(waypoints: ndarray) -> ndarray

Interpolate waypoints between each pair of waypoints.

Parameters:

Name Type Description Default
waypoints ndarray

original waypoints array, shape (N, 2)

required

Returns:

Type Description
ndarray

interpolated waypoints array

Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def interpolate_waypoints(self, waypoints: np.ndarray) -> np.ndarray:
    """
    Interpolate waypoints between each pair of waypoints.

    Args:
        waypoints: original waypoints array, shape (N, 2)

    Returns:
        interpolated waypoints array
    """
    density = self.config.policy_config.path_interpolation_density
    if density <= 0 or waypoints is None or len(waypoints) <= 1:
        return waypoints

    # Use np.linspace for faster vectorized interpolation
    segments = []
    for i in range(len(waypoints) - 1):
        # Create density+2 points from waypoints[i] to waypoints[i+1], excluding endpoint
        t = np.linspace(0, 1, density + 2, endpoint=False)[1:]  # exclude start point
        segment = waypoints[i] + t[:, np.newaxis] * (waypoints[i + 1] - waypoints[i])
        segments.append(segment)
    segments.append(waypoints[-1:])  # add final waypoint

    return np.vstack([waypoints[0:1]] + segments)
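Standalone, the linspace-based densification above can be reproduced with plain NumPy. This is an illustrative re-implementation under the same scheme: each segment contributes density + 1 points (its start excluded, via endpoint=False), and the final waypoint is appended once.

```python
import numpy as np

def densify(waypoints, density):
    """Insert density + 1 evenly spaced points per segment (segment start
    excluded via endpoint=False), then append the final waypoint."""
    if density <= 0 or len(waypoints) <= 1:
        return waypoints
    segments = []
    for a, b in zip(waypoints[:-1], waypoints[1:]):
        t = np.linspace(0.0, 1.0, density + 2, endpoint=False)[1:]
        segments.append(a + t[:, None] * (b - a))
    segments.append(waypoints[-1:])
    return np.vstack([waypoints[:1]] + segments)

dense = densify(np.array([[0.0, 0.0], [1.0, 0.0]]), density=1)
```

For a single unit segment with density=1 this produces four points at x = 0, 1/3, 2/3, and 1.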
max_angle_waypoints
max_angle_waypoints(angles: ndarray) -> ndarray
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def max_angle_waypoints(self, angles: np.ndarray) -> np.ndarray:
    assert angles.shape == (2, 1)

    angle = float(abs(normalize_ang_error(angles[1] - angles[0])))
    num_points = int(np.ceil(angle / self.config.policy_config.path_max_inter_waypoint_angle))
    if num_points <= 1:
        # Enforce at least one orientation correction
        return angles[1:]

    steps = np.linspace(0, 1, num_points + 1)[1:]
    r0 = R.from_euler("z", angles[0], degrees=False)
    r1 = R.from_euler("z", angles[1], degrees=False)
    rots = Slerp([0, 1], R.concatenate([r0, r1]))(steps)
    new_angles = rots.as_euler("xyz", degrees=False)[:, 2:]

    return new_angles
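For pure z-axis rotations, the slerp used above reduces to linear interpolation of the wrapped angle difference. The hypothetical helper below sketches that reduction and shows that subdividing a rotation from 170° to -170° correctly crosses the ±180° seam instead of sweeping back through 0°.

```python
import numpy as np

def wrap(a):
    """Wrap an angle to [-pi, pi)."""
    return (a + np.pi) % (2 * np.pi) - np.pi

def subdivide_angle(a0, a1, max_arc):
    """Subdivide the shortest rotation from a0 to a1 into arcs of at most
    max_arc. For pure z-axis rotations this matches slerp between the yaws."""
    delta = wrap(a1 - a0)
    n = max(int(np.ceil(abs(delta) / max_arc)), 1)
    return wrap(a0 + np.linspace(0.0, 1.0, n + 1)[1:] * delta)

angles = subdivide_angle(np.deg2rad(170), np.deg2rad(-170), np.deg2rad(10))
```

All intermediate yaws stay near ±180°, confirming the rotation takes the 20° shortcut through the seam.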
max_dist_waypoints
max_dist_waypoints(waypoints: ndarray) -> ndarray
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def max_dist_waypoints(self, waypoints: np.ndarray) -> np.ndarray:
    assert waypoints.shape == (2, 2)

    direction = waypoints[-1] - waypoints[0]

    dist = np.linalg.norm(direction)
    num_points = int(np.ceil(dist / self.config.policy_config.path_max_inter_waypoint_dist))
    if num_points <= 1:
        return waypoints[1:]

    stops = np.linspace(0, 1, num_points + 1)[1:]
    return waypoints[:1] + direction[None, :] * stops[:, None]
planners
planners()
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def planners(self):
    return self.nav_planner
reset
reset()
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def reset(self):
    self._current_waypoint = 0
    self._reached_waypoints = 0
    self._nav_plan = None
    self._target_pos_quat = None
    self._dists_to_waypoint = []
    self._retries_left = self.config.policy_config.plan_max_retries
    self._target_object = None
    self._candidate_objs = None
    self._skipped_candidates = set()
    self._replan_after = None
    self.nav_planner.blacklist.clear()
skip_candidate
skip_candidate(obj_name)
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def skip_candidate(self, obj_name):
    self._skipped_candidates.add(obj_name)
stop_plan
stop_plan(waypoints: ndarray) -> ndarray
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def stop_plan(self, waypoints: np.ndarray) -> np.ndarray:
    r = self.config.policy_config.path_min_dist_to_target_center

    if r == 0.0:
        return waypoints

    cc = self.target_object.position[:2]

    # 1. find first waypoint entering circle and not leaving again
    last_out = None
    for i in reversed(range(len(waypoints))):
        if np.linalg.norm(waypoints[i] - cc) > r:
            last_out = i
            break
    else:
        # all waypoints were under r, so use the first two waypoints only
        return waypoints[:2]

    # if the final waypoint is already outside the circle, keep them all
    if last_out == len(waypoints) - 1:
        return waypoints

    # If not, intersect circumference around object center and last segment
    # (x-c)^T(x-c) = r^2
    # with x = s + alpha d
    # resulting in alpha^2 * (d^Td) + alpha * [2 d^T(s-c)] +[(s-c)^T(s-c) - r^2] = 0
    # for convenience, we make d a unitary direction

    segment = waypoints[last_out : last_out + 2]
    s = segment[0]

    d = segment[1] - s
    d /= np.linalg.norm(d)  # so a == 1 in the 2nd order equation

    sc = s - cc
    b = 2 * d @ sc
    c = np.linalg.norm(sc) ** 2 - r**2

    # Discriminant should always be positive, as the two points differ
    # in their inclusion in the circle with given radius
    # (we enforce it's at least non-negative)
    disc = max(b**2 - 4 * c, 0)

    # We keep the smallest (entering) solution (minus sign)
    # the relative displacement from the last waypoint outside along the
    # direction to the first one inside needs to be positive
    # (we enforce it's at least non-negative)
    alpha = max((-b - np.sqrt(disc)) / 2, 0)

    intersection = s + alpha * d
    return np.concatenate([waypoints[: last_out + 1], intersection[None, :]])
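The quadratic derived in the comments above can be checked in isolation. The sketch below (with a hypothetical helper name and made-up numbers) clips a segment at the circle boundary; the resulting point should lie exactly at distance r from the target center.

```python
import numpy as np

def clip_at_circle(s, e, center, r):
    """Return the point where segment s -> e first enters the circle
    (center, r), assuming s lies outside and e inside."""
    d = e - s
    d = d / np.linalg.norm(d)          # unit direction, so a == 1 in the quadratic
    sc = s - center
    b = 2.0 * d @ sc
    c = sc @ sc - r ** 2
    disc = max(b ** 2 - 4.0 * c, 0.0)  # non-negative since s and e straddle the boundary
    alpha = max((-b - np.sqrt(disc)) / 2.0, 0.0)  # smaller root = entry point
    return s + alpha * d

p = clip_at_circle(np.array([0.0, 0.0]), np.array([2.0, 0.0]),
                   center=np.array([2.0, 0.0]), r=0.5)
```

For a segment from (0, 0) to (2, 0) and a circle of radius 0.5 around (2, 0), the entry point is (1.5, 0), at exactly distance r from the center.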
AStarSmoothPlannerPolicy
AStarSmoothPlannerPolicy(config: MlSpacesExpConfig, task: BaseMujocoTask | None = None)

Bases: AStarPlannerPolicy

Methods:

Name Description
add_auxiliary_objects

Add auxiliary objects to the scene that might be required for the policy.

build_policy_plan
current_waypoint
get_action
get_all_phases

Returns:

get_info

Get additional information from the policy. Called after episode ended. This method can be

get_phase

Returns:

interpolate_waypoints

Interpolate waypoints between each pair of waypoints.

max_angle_waypoints
max_dist_waypoints
planners
reset
skip_candidate
stop_plan

Attributes:

Name Type Description
candidate_objs list[MlSpacesObject]
config
nav_goal_sampler NavGoalSampler
nav_plan
nav_planner
robot_view
target_object MlSpacesObject

Get the nearest navigation target object for the current batch.

target_pos_quat
task
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def __init__(self, config: MlSpacesExpConfig, task: BaseMujocoTask | None = None) -> None:
    super().__init__(config, task)

    self._target_pos_quat = None
    self._nav_goal_sampler = None

    self.config.policy_config.planner_config.agent_radius = (
        self.config.task_sampler_config.robot_safety_radius
    )

    self.nav_planner = AStarPlanner(
        self.config.policy_config.planner_config, self.task.env.current_model_path
    )

    self._current_waypoint = 0
    self._reached_waypoints = 0
    self._nav_plan = None
    self._target_pos_quat = None
    self._dists_to_waypoint = []
    self._retries_left = self.config.policy_config.plan_max_retries
    self._target_object = None
    self._candidate_objs = None
    self._skipped_candidates = set()
    self._replan_after = None

    self.robot_view = task.env.current_robot.robot_view
candidate_objs property
candidate_objs: list[MlSpacesObject]
config instance-attribute
config = config
nav_goal_sampler property
nav_goal_sampler: NavGoalSampler
nav_plan property
nav_plan
nav_planner instance-attribute
nav_planner = AStarPlanner(planner_config, current_model_path)
robot_view instance-attribute
robot_view = robot_view
target_object property
target_object: MlSpacesObject

Get the nearest navigation target object for the current batch.

target_pos_quat property
target_pos_quat
task instance-attribute
task = task
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None

Add auxiliary objects to the scene that might be required for the policy. Args: config: The configuration for the policy. spec: The MuJoCo scene specification to modify.

Source code in molmo_spaces/policy/base_policy.py
@staticmethod
def add_auxiliary_objects(config: "MlSpacesExpConfig", spec: MjSpec) -> None:
    """
    Add auxiliary objects to the scene that might be required for the policy.
    Args:
        config: The configuration for the policy.
        spec: The MuJoCo scene specification to modify.
    """
    return None
build_policy_plan
build_policy_plan(world_waypoints)
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def build_policy_plan(self, world_waypoints):
    world_waypoints = self.stop_plan(world_waypoints)

    plan_length = sum(
        np.linalg.norm(world_waypoints[it] - world_waypoints[it - 1])
        for it in range(1, len(world_waypoints))
    )
    num_points = 2 * int(
        np.ceil(plan_length / self.config.policy_config.path_max_inter_waypoint_dist)
    )

    tck, u = splprep(world_waypoints.transpose(), s=1e-5)
    u_new = np.linspace(0, 1, num_points)
    x_new, y_new = splev(u_new, tck)

    # First derivatives
    dx_du, dy_du = splev(u_new, tck, der=1)
    # Tangent angle (radians)
    thetas = np.arctan2(dy_du, dx_du)
    # TODO handle large theta deltas

    combined_waypoints = []

    # First, we orient toward the 1st waypoint from the 0-th waypoint
    start_theta = self.robot_view.get_noop_ctrl_dict(["base"])["base"][2]
    for theta in self.max_angle_waypoints(np.stack([start_theta, thetas[0]])[:, None]):
        combined_waypoints.append(np.concatenate((world_waypoints[0], theta)))

    for cur_x, cur_y, cur_theta in zip(x_new, y_new, thetas):
        combined_waypoints.append(np.stack((cur_x, cur_y, cur_theta)))

    # Then rotate to face the target
    final_pos = world_waypoints[-1]
    target_pos = self.target_object.position[:2]
    final_theta = np.arctan2(target_pos[1] - final_pos[1], target_pos[0] - final_pos[0])
    for theta in self.max_angle_waypoints(np.stack([thetas[-1], final_theta])[:, None]):
        combined_waypoints.append(np.concatenate((final_pos, theta)))

    return np.array(combined_waypoints)
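The spline fitting and tangent extraction above can be reproduced standalone with SciPy. This sketch uses made-up waypoints: splprep fits a smoothing parametric spline (a cubic needs at least four points), splev resamples it, and the heading is recovered from the first derivative, mirroring build_policy_plan.

```python
import numpy as np
from scipy.interpolate import splprep, splev

# A few illustrative planar waypoints (at least k + 1 = 4 for a cubic spline)
pts = np.array([[0.0, 0.0], [1.0, 0.2], [2.0, 0.0], [3.0, -0.2], [4.0, 0.0]])

# Fit a near-interpolating parametric spline through the points
tck, u = splprep(pts.T, s=1e-5)

# Resample densely and recover the heading from the first derivative
u_new = np.linspace(0.0, 1.0, 50)
x_new, y_new = splev(u_new, tck)
dx, dy = splev(u_new, tck, der=1)
thetas = np.arctan2(dy, dx)
```

With such a small smoothing factor the curve passes essentially through the input points, so the resampled path starts at the first waypoint while thetas varies smoothly along it.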
current_waypoint
current_waypoint()
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def current_waypoint(self):
    if self._current_waypoint < len(self.nav_plan):
        cur_distance = self.robot_view.distance_to(
            ["base"], self.nav_plan[self._current_waypoint]
        )
        pose = self.robot_view.base.pose
        angle = R.from_matrix(pose[:3, :3]).as_euler("xyz")[2:3]
        delta = pose[:2, 3]
        pose = np.concatenate((delta, angle))

        log.debug(
            f"Steps {self.task.num_steps_taken()}"
            f" Retries left {self._retries_left}"
            f" Waypoint {self._reached_waypoints}/{len(self.nav_plan) + self._reached_waypoints - self._current_waypoint}"
            f" {np.round(self.nav_plan[self._current_waypoint], 3)}"
            f" Pose {np.round(pose, 3)}"
        f" Dist {cur_distance:.3f}"
        )

        if self.robot_view.is_close_to(["base"], self.nav_plan[self._current_waypoint]):
            self._reached_waypoints += 1
            self._dists_to_waypoint = []

            if self._replan_after is None:
                self._current_waypoint += 1
            else:
                if self._replan_after <= 1:
                    self._replan_after = None
                    self._nav_plan = None
                    self._target_pos_quat = None
                    if self.nav_plan is None:
                        log.warning("Terminating due to failure to replan.")
                        return None
                else:
                    self._replan_after -= 1
                    self._current_waypoint -= 1

        elif (
            len(self._dists_to_waypoint)
            > self.config.policy_config.plan_fail_after_waypoint_steps
            and min(self._dists_to_waypoint) - cur_distance
            <= self.config.policy_config.plan_fail_max_dist_delta
        ):
            self.nav_planner.blacklist.append(self.robot_view.base.pose[:3, 3].copy())
            self.nav_planner.apply_black_list()
            if self._retries_left > 0:
                if self._replan_after is None:

                    def waypoints_until_different_location():
                        back = 0
                        while (
                            np.linalg.norm(
                                self.nav_plan[self._current_waypoint][:2]
                                - self.nav_plan[self._current_waypoint - back][:2]
                            )
                            < 0.25  # TODO make this a config param
                        ):
                            if self._current_waypoint - back > 0:
                                back += 1
                            else:
                                break

                        return back

                    self._retries_left -= 1
                    self._replan_after = waypoints_until_different_location()
                    self._dists_to_waypoint = []
                    self._current_waypoint = max(self._current_waypoint - 1, 0)
                    log.warning(
                        f"Replanning requested after returning to the previous {self._replan_after} waypoints"
                        f" due to failure to progress with distance {cur_distance:.3f}"
                        f" to waypoint with {self._retries_left} retries left."
                    )
                else:
                    log.warning(
                        f"Terminating due to failure to return to previous waypoint"
                        f" with {self._replan_after} missing return waypoints"
                    )
                    return None
            else:
                log.warning(
                    f"Terminating due to failure to progress with distance {cur_distance:.3f} to waypoint"
                    f" and no plan retries left."
                )
                return None

        else:
            self._dists_to_waypoint.append(cur_distance)

    if self._current_waypoint < len(self.nav_plan):
        return self.nav_plan[self._current_waypoint]

    return None
get_action
get_action(observation)
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def get_action(self, observation):
    if self.nav_plan is None:
        # No plan possible, finish task immediately
        log.warning(
            f"[A* DONE] Planning failed - terminating episode at step {self.task.num_steps_taken()}"
            f" with {self._reached_waypoints} reached waypoints."
            f" Reason: A* could not find a valid path (see earlier PLAN FAIL logs for details)"
        )
        return self._build_done_action()

    # get next waypoint in the planned trajectory
    waypoint = self.current_waypoint()

    if waypoint is None:
        # All waypoints reached - navigation complete
        log.info(
            f"[A* DONE] Navigation complete - reached {self._reached_waypoints} waypoints"
            f" in {self.task.num_steps_taken()} steps."
        )
        return self._build_done_action()

    # Still navigating - return action to reach next waypoint
    return self._build_navigation_action(waypoint)
get_all_phases
get_all_phases() -> dict[str, int]

Returns:

Type Description
dict[str, int]

A dictionary of all possible policy phases

Source code in molmo_spaces/policy/base_policy.py
def get_all_phases(self) -> dict[str, int]:
    """
    Returns:
        A dictionary of all possible policy phases
    """
    return {"unknown": 0}
get_info
get_info() -> dict

Get additional information from the policy. Called after the episode has ended. This method can be overridden by subclasses to provide extra information about the policy's state. Must be JSON serializable.

Returns:

Type Description
dict

A dictionary containing additional information about the policy.

Source code in molmo_spaces/policy/base_policy.py
def get_info(self) -> dict:
    """
    Get additional information from the policy. Called after the episode has ended. This method
    can be overridden by subclasses to provide extra information about the policy's state.
    Must be JSON serializable.

    Returns:
        A dictionary containing additional information about the policy.
    """
    return {}
get_phase
get_phase() -> str

Returns:

Type Description
str

The policy phase

Source code in molmo_spaces/policy/base_policy.py
def get_phase(self) -> str:
    """
    Returns:
        The policy phase
    """
    return "unknown"
interpolate_waypoints
interpolate_waypoints(waypoints: ndarray) -> ndarray

Interpolate waypoints between each pair of waypoints.

Parameters:

Name Type Description Default
waypoints ndarray

original waypoints array, shape (N, 2)

required

Returns:

Type Description
ndarray

interpolated waypoints array

Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def interpolate_waypoints(self, waypoints: np.ndarray) -> np.ndarray:
    """
    Interpolate waypoints between each pair of waypoints.

    Args:
        waypoints: original waypoints array, shape (N, 2)

    Returns:
        interpolated waypoints array
    """
    density = self.config.policy_config.path_interpolation_density
    if density <= 0 or waypoints is None or len(waypoints) <= 1:
        return waypoints

    # Use np.linspace for faster vectorized interpolation
    segments = []
    for i in range(len(waypoints) - 1):
        # Sample density+2 points on [0, 1) with endpoint=False, then drop t=0,
        # leaving density+1 evenly spaced points from waypoints[i] toward waypoints[i+1]
        t = np.linspace(0, 1, density + 2, endpoint=False)[1:]
        segment = waypoints[i] + t[:, np.newaxis] * (waypoints[i + 1] - waypoints[i])
        segments.append(segment)
    segments.append(waypoints[-1:])  # add final waypoint

    return np.vstack([waypoints[0:1]] + segments)
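The interpolation logic above can be exercised outside the policy class. The following is a minimal standalone sketch of the same linspace slicing (the `density` parameter stands in for `config.policy_config.path_interpolation_density`):

```python
import numpy as np

def interpolate_waypoints(waypoints: np.ndarray, density: int) -> np.ndarray:
    """Insert density+1 evenly spaced points between each consecutive
    waypoint pair, mirroring the source's endpoint=False slicing."""
    if density <= 0 or waypoints is None or len(waypoints) <= 1:
        return waypoints
    segments = []
    for i in range(len(waypoints) - 1):
        # density+2 samples on [0, 1); dropping t=0 leaves density+1 points
        t = np.linspace(0, 1, density + 2, endpoint=False)[1:]
        segments.append(waypoints[i] + t[:, None] * (waypoints[i + 1] - waypoints[i]))
    segments.append(waypoints[-1:])  # add final waypoint
    return np.vstack([waypoints[:1]] + segments)

wps = np.array([[0.0, 0.0], [1.0, 0.0]])
out = interpolate_waypoints(wps, 1)  # 0, 1/3, 2/3, 1 along x
```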
max_angle_waypoints
max_angle_waypoints(angles: ndarray) -> ndarray
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def max_angle_waypoints(self, angles: np.ndarray) -> np.ndarray:
    assert angles.shape == (2, 1)

    angle = float(abs(normalize_ang_error(angles[1] - angles[0])))
    num_points = int(np.ceil(angle / self.config.policy_config.path_max_inter_waypoint_angle))
    if num_points <= 1:
        # Always enforce at least one orientation correction
        return angles[1:]

    steps = np.linspace(0, 1, num_points + 1)[1:]
    r0 = R.from_euler("z", angles[0], degrees=False)
    r1 = R.from_euler("z", angles[1], degrees=False)
    rots = Slerp([0, 1], R.concatenate([r0, r1]))(steps)
    new_angles = rots.as_euler("xyz", degrees=False)[:, 2:]

    return new_angles
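Because the rotations here are pure z-axis rotations, the Slerp above reduces to linear interpolation of the wrapped angle difference. The following standalone sketch (with a hypothetical `normalize_ang_error` matching the one used in the source) shows that reduction without the quaternion machinery:

```python
import numpy as np

def normalize_ang_error(a: float) -> float:
    """Wrap an angle difference into [-pi, pi)."""
    return (a + np.pi) % (2 * np.pi) - np.pi

def split_rotation(a0: float, a1: float, max_step: float) -> np.ndarray:
    """Split a planar rotation from a0 to a1 into steps no larger than
    max_step, always emitting at least one correction."""
    err = normalize_ang_error(a1 - a0)
    num_points = max(1, int(np.ceil(abs(err) / max_step)))
    steps = np.linspace(0, 1, num_points + 1)[1:]
    return a0 + err * steps

angles = split_rotation(0.0, np.pi / 2, np.pi / 4)  # two steps of pi/4
```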
max_dist_waypoints
max_dist_waypoints(waypoints: ndarray) -> ndarray
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def max_dist_waypoints(self, waypoints: np.ndarray) -> np.ndarray:
    assert waypoints.shape == (2, 2)

    direction = waypoints[-1] - waypoints[0]

    dist = np.linalg.norm(direction)
    num_points = int(np.ceil(dist / self.config.policy_config.path_max_inter_waypoint_dist))
    if num_points <= 1:
        return waypoints[1:]

    stops = np.linspace(0, 1, num_points + 1)[1:]
    return waypoints[:1] + direction[None, :] * stops[:, None]
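The translation counterpart works the same way: subdivide a segment so no two consecutive points are farther apart than a threshold. A minimal sketch (with `max_dist` standing in for `path_max_inter_waypoint_dist`):

```python
import numpy as np

def split_segment(start: np.ndarray, end: np.ndarray, max_dist: float) -> np.ndarray:
    """Subdivide start->end so consecutive points are at most max_dist apart.

    Returns the intermediate points plus the endpoint; the start point is
    excluded, mirroring max_dist_waypoints.
    """
    direction = end - start
    dist = np.linalg.norm(direction)
    num_points = int(np.ceil(dist / max_dist))
    if num_points <= 1:
        return end[None, :]
    stops = np.linspace(0, 1, num_points + 1)[1:]
    return start[None, :] + direction[None, :] * stops[:, None]

pts = split_segment(np.array([0.0, 0.0]), np.array([3.0, 0.0]), 1.0)
```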
planners
planners()
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def planners(self):
    return self.nav_planner
reset
reset()
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def reset(self):
    self._current_waypoint = 0
    self._reached_waypoints = 0
    self._nav_plan = None
    self._target_pos_quat = None
    self._dists_to_waypoint = []
    self._retries_left = self.config.policy_config.plan_max_retries
    self._target_object = None
    self._candidate_objs = None
    self._skipped_candidates = set()
    self._replan_after = None
    self.nav_planner.blacklist.clear()
skip_candidate
skip_candidate(obj_name)
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def skip_candidate(self, obj_name):
    self._skipped_candidates.add(obj_name)
stop_plan
stop_plan(waypoints: ndarray) -> ndarray
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def stop_plan(self, waypoints: np.ndarray) -> np.ndarray:
    r = self.config.policy_config.path_min_dist_to_target_center

    if r == 0.0:
        return waypoints

    cc = self.target_object.position[:2]

    # 1. find the last waypoint outside the circle; the path stays inside after it
    last_out = None
    for i in reversed(range(len(waypoints))):
        if np.linalg.norm(waypoints[i] - cc) > r:
            last_out = i
            break
    else:
        # every waypoint lies within r of the target center, so keep only the first two
        return waypoints[:2]

    # the final waypoint is already outside r, so keep the whole path
    if last_out == len(waypoints) - 1:
        return waypoints

    # If not, intersect circumference around object center and last segment
    # (x-c)^T(x-c) = r^2
    # with x = s + alpha d
    # resulting in alpha^2 * (d^Td) + alpha * [2 d^T(s-c)] +[(s-c)^T(s-c) - r^2] = 0
    # for convenience, we make d a unitary direction

    segment = waypoints[last_out : last_out + 2]
    s = segment[0]

    d = segment[1] - s
    d /= np.linalg.norm(d)  # so a == 1 in the 2nd order equation

    sc = s - cc
    b = 2 * d @ sc
    c = np.linalg.norm(sc) ** 2 - r**2

    # Discriminant should always be positive, as the two points differ
    # in their inclusion in the circle with given radius
    # (we enforce it's at least non-negative)
    disc = max(b**2 - 4 * c, 0)

    # We keep the smallest (entering) solution (minus sign)
    # the relative displacement from the last waypoint outside along the
    # direction to the first one inside needs to be positive
    # (we enforce it's at least non-negative)
    alpha = max((-b - np.sqrt(disc)) / 2, 0)

    intersection = s + alpha * d
    return np.concatenate([waypoints[: last_out + 1], intersection[None, :]])
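The quadratic in `stop_plan` can be checked in isolation. With the direction `d` normalized, the equation `(x-c)^T(x-c) = r^2` along `x = s + alpha*d` has leading coefficient 1, and the smaller root is the entry point into the circle. A standalone sketch:

```python
import numpy as np

def circle_entry_point(s: np.ndarray, e: np.ndarray, center: np.ndarray, r: float) -> np.ndarray:
    """Entry point of segment s->e into the circle of radius r around center.

    Solves ||s + alpha*d - center||^2 = r^2 with d normalized (so a == 1);
    the smaller root (minus sign) is the entering intersection.
    """
    d = e - s
    d = d / np.linalg.norm(d)
    sc = s - center
    b = 2 * d @ sc
    c = sc @ sc - r**2
    disc = max(b**2 - 4 * c, 0.0)       # clamp for numerical safety
    alpha = max((-b - np.sqrt(disc)) / 2, 0.0)
    return s + alpha * d

# Segment from (-2, 0) to the origin enters the unit circle at (-1, 0)
pt = circle_entry_point(np.array([-2.0, 0.0]), np.array([0.0, 0.0]),
                        np.array([0.0, 0.0]), 1.0)
```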

object_manipulation

Modules:

Name Description
base_object_manipulation_planner_policy
curobo_open_close_planner_policy
curobo_pick_and_place_planner_policy
open_close_planner_policy
pick_and_place_color_planner_policy
pick_and_place_next_to_planner_policy
pick_and_place_planner_policy
pick_planner_policy
base_object_manipulation_planner_policy

Classes:

Name Description
ActionPrimitive
BaseObjectManipulationPlannerPolicy
GripperAction
JointMoveSegment
JointMoveSequence
MoveSegment
MoveSequence
NoopAction
TCPMoveSegment
TCPMoveSequence

Attributes:

Name Type Description
log
log module-attribute
log = getLogger(__name__)
ActionPrimitive
ActionPrimitive(robot_view: RobotView, duration: float)

Bases: ABC

Methods:

Name Description
check_failure
elapsed_time
execute
get_current_action
get_current_phase
reset

Attributes:

Name Type Description
duration
robot_view
start_time
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def __init__(self, robot_view: RobotView, duration: float):
    self.robot_view = robot_view
    self.duration = duration
    self.start_time = None
duration instance-attribute
duration = duration
robot_view instance-attribute
robot_view = robot_view
start_time instance-attribute
start_time = None
check_failure
check_failure() -> bool
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def check_failure(self) -> bool:
    return False
elapsed_time
elapsed_time() -> float
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def elapsed_time(self) -> float:
    return self.robot_view.mj_data.time - self.start_time
execute abstractmethod
execute() -> bool
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
@abstractmethod
def execute(self) -> bool:
    pass
get_current_action abstractmethod
get_current_action() -> dict[str, Any]
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
@abstractmethod
def get_current_action(self) -> dict[str, Any]:
    raise NotImplementedError
get_current_phase
get_current_phase() -> str
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_current_phase(self) -> str:
    return "unknown"
reset
reset()
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def reset(self):
    self.start_time = None
BaseObjectManipulationPlannerPolicy
BaseObjectManipulationPlannerPolicy(config: MlSpacesExpConfig, task: BaseMujocoTask)

Bases: PlannerPolicy

Methods:

Name Description
add_auxiliary_objects
check_feasible_ik
get_action
get_all_phases
get_info

Get additional information from the policy. Called after episode ended. This method can be

get_phase
reset

Attributes:

Name Type Description
action_idx
action_primitives
config
ik_warmed_up
planners
policy_config
retry_count
robot_view
sequential_ik_failures
target_poses
task
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def __init__(self, config: MlSpacesExpConfig, task: BaseMujocoTask) -> None:
    assert isinstance(config.policy_config, ObjectManipulationPlannerPolicyConfig)
    super().__init__(config, task)
    self.policy_config = config.policy_config
    self.robot_view = task.env.current_robot.robot_view

    self.action_primitives = []
    self.action_idx = 0
    self.retry_count = 0
    self.target_poses = {}
    self.ik_warmed_up = False
    self.sequential_ik_failures = 0
action_idx instance-attribute
action_idx = 0
action_primitives instance-attribute
action_primitives = []
config instance-attribute
config = config
ik_warmed_up instance-attribute
ik_warmed_up = False
planners property
planners
policy_config instance-attribute
policy_config = policy_config
retry_count instance-attribute
retry_count = 0
robot_view instance-attribute
robot_view = robot_view
sequential_ik_failures instance-attribute
sequential_ik_failures = 0
target_poses instance-attribute
target_poses = {}
task instance-attribute
task = task
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
@staticmethod
def add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None:
    PlannerPolicy.add_auxiliary_objects(config, spec)
    policy_config = config.policy_config
    assert isinstance(policy_config, ObjectManipulationPlannerPolicyConfig)
    if policy_config.filter_colliding_grasps:
        add_grasp_collision_bodies(
            spec,
            policy_config.grasp_collision_batch_size,
            policy_config.grasp_width,
            policy_config.grasp_length,
            policy_config.grasp_height,
            np.array(policy_config.grasp_base_pos),
        )
check_feasible_ik
check_feasible_ik(pose: ndarray) -> bool | ndarray
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def check_feasible_ik(self, pose: np.ndarray) -> bool | np.ndarray:
    if pose.ndim > 2:
        assert pose.shape[1:] == (4, 4)
        batch_size = pose.shape[0]
        assert batch_size <= self.policy_config.grasp_feasibility_batch_size

        if batch_size == 0:
            return np.array([], dtype=bool)

        # pad to the batch size to avoid triggering recompilation of the IK solver
        if batch_size < self.policy_config.grasp_feasibility_batch_size:
            n_pad = self.policy_config.grasp_feasibility_batch_size - batch_size
            pose = np.concatenate([pose, np.broadcast_to(pose[-1:], (n_pad, 4, 4))])

        robot_view = self.task.env.current_robot.robot_view
        parallel_kinematics = self.task._env.robots[0].parallel_kinematics
        jp_dicts = parallel_kinematics.ik(
            pose,
            robot_view.get_qpos_dict(),
            robot_view.base.pose,
            rel_to_base=False,
            posture_weight=0.0,
        )
        return np.array([jp_dict is not None for jp_dict in jp_dicts[:batch_size]])
    else:
        assert pose.shape == (4, 4)
        robot_view = self.task.env.current_robot.robot_view
        kinematics = self.task.env.current_robot.kinematics
        gripper_mg_id = robot_view.get_gripper_movegroup_ids()[0]
        jp_dict = kinematics.ik(
            gripper_mg_id,
            pose,
            robot_view.move_group_ids(),
            robot_view.get_qpos_dict(),
            base_pose=robot_view.base.pose,
        )
        return jp_dict is not None
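The padding trick in the batched branch, repeating the last pose so a JIT-compiled solver always sees the same batch shape, can be sketched on its own (the fixed `batch_size` stands in for `grasp_feasibility_batch_size`):

```python
import numpy as np

def pad_pose_batch(poses: np.ndarray, batch_size: int) -> np.ndarray:
    """Pad an (N, 4, 4) pose batch to (batch_size, 4, 4) by repeating the
    last pose, avoiding recompilation of a shape-specialized IK solver."""
    n = poses.shape[0]
    assert 0 < n <= batch_size
    if n == batch_size:
        return poses
    n_pad = batch_size - n
    return np.concatenate([poses, np.broadcast_to(poses[-1:], (n_pad, 4, 4))])

poses = np.stack([np.eye(4)] * 3)
padded = pad_pose_batch(poses, 8)
```

After solving, only the first `n` results are kept, exactly as `jp_dicts[:batch_size]` does above.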
get_action
get_action(info: dict[str, Any]) -> dict[str, Any]
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_action(self, info: dict[str, Any]) -> dict[str, Any]:
    if self._check_for_failures():
        return self._handle_failure()

    # get the action from the current action primitive, absorbing 0-length segments as necessary
    while self.action_idx < len(self.action_primitives):  # guaranteed to terminate
        act_prim = self.action_primitives[self.action_idx]
        proceed = act_prim.execute()
        # cannot have a 0-length segment that doesn't proceed
        assert proceed or act_prim.duration > 0.0
        if proceed:
            self.action_idx += 1
        if not proceed or act_prim.duration > 0.0:
            break
    if self.action_idx < len(self.action_primitives):
        action = act_prim.get_current_action()
    else:
        action = self.action_primitives[-1].get_current_action()
        action["done"] = True
    return action
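The loop in `get_action` advances past finished primitives while absorbing zero-duration ones in a single call. A simplified standalone sketch, modeling each primitive as a hypothetical `(duration, finished)` pair instead of a stateful `execute()`:

```python
def advance_primitives(primitives: list[tuple[float, bool]], idx: int) -> int:
    """Advance idx past finished primitives, absorbing zero-duration ones.

    Mirrors get_action's loop: stop at an unfinished primitive, or just
    after the first finished primitive with nonzero duration.
    """
    while idx < len(primitives):  # guaranteed to terminate
        duration, finished = primitives[idx]
        # cannot have a 0-length segment that doesn't proceed
        assert finished or duration > 0.0
        if finished:
            idx += 1
        if not finished or duration > 0.0:
            break
    return idx
```

Two zero-length finished primitives followed by an active one are skipped in one call, landing on index 2.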
get_all_phases
get_all_phases()
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_all_phases(self):
    phases = super().get_all_phases()
    # Collect all possible phases here, even those not used for a particular trajectory computation
    new_phases = {
        "gripper-open": 1,
        "pregrasp": 2,
        "grasp": 3,
        "gripper-close": 4,
        "lift": 5,
        "preplace": 6,
        "place": 7,
        "retreat": 8,
        "go_home": 9,
    }
    phases.update(new_phases)
    return phases
get_info
get_info() -> dict

Get additional information from the policy. Called after the episode has ended. This method can be overridden by subclasses to provide extra information about the policy's state. Must be JSON-serializable.

Returns:

Type Description
dict

A dictionary containing additional information about the policy.

Source code in molmo_spaces/policy/base_policy.py
def get_info(self) -> dict:
    """
    Get additional information from the policy. Called after the episode has ended.
    This method can be overridden by subclasses to provide extra information about
    the policy's state. Must be JSON-serializable.

    Returns:
        A dictionary containing additional information about the policy.
    """
    return {}
get_phase
get_phase() -> str
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_phase(self) -> str:
    if self.action_idx < len(self.action_primitives):
        act_prim = self.action_primitives[self.action_idx]
    else:
        act_prim = self.action_primitives[-1]
    return act_prim.get_current_phase()
reset
reset(reset_retries: bool = True)
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def reset(self, reset_retries: bool = True):
    if not self.ik_warmed_up:
        with Timer() as warmup_time:
            self.task.env.current_robot.parallel_kinematics.warmup_ik(
                self.policy_config.grasp_feasibility_batch_size
            )
        self.ik_warmed_up = True
        log.info(f"Warmed up parallel IK solver in {warmup_time.value:.3f}s")

    self.action_primitives = self._compute_trajectory()

    self.action_idx = 0
    if reset_retries:
        self.retry_count = 0

    self.target_poses = {}
    for action_primitive in self.action_primitives:
        if isinstance(action_primitive, TCPMoveSequence):
            for move_segment in action_primitive.move_segments:
                self.target_poses[move_segment.name] = move_segment.end_pose
        elif isinstance(action_primitive, JointMoveSequence):
            gripper_mg_id = self.robot_view.get_gripper_movegroup_ids()[0]
            kinematics = self.task.env.current_robot.kinematics
            for move_segment in action_primitive.move_segments:
                end_qpos = move_segment.end_qpos
                base_pose = self.robot_view.base.pose
                self.target_poses[move_segment.name] = kinematics.fk(end_qpos, base_pose)[
                    gripper_mg_id
                ]
GripperAction
GripperAction(robot_view: RobotView, target_open: bool, duration: float)

Bases: ActionPrimitive

Methods:

Name Description
check_failure
elapsed_time
execute
get_current_action
get_current_phase
reset

Attributes:

Name Type Description
duration
robot_view
start_time
target_open
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def __init__(self, robot_view: RobotView, target_open: bool, duration: float) -> None:
    super().__init__(robot_view, duration)
    self.target_open = target_open
duration instance-attribute
duration = duration
robot_view instance-attribute
robot_view = robot_view
start_time instance-attribute
start_time = None
target_open instance-attribute
target_open = target_open
check_failure
check_failure() -> bool
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def check_failure(self) -> bool:
    return False
elapsed_time
elapsed_time() -> float
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def elapsed_time(self) -> float:
    return self.robot_view.mj_data.time - self.start_time
execute
execute() -> bool
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def execute(self) -> bool:
    if self.start_time is None:
        self.start_time = self.robot_view.mj_data.time
        if self.target_open:
            log.info("Opening gripper...")
        else:
            log.info("Closing gripper...")

        mg_id = self.robot_view.get_gripper_movegroup_ids()[0]
        gripper = self.robot_view.get_gripper(mg_id)
        gripper.set_gripper_ctrl_open(self.target_open)

    return self.elapsed_time() >= self.duration
get_current_action
get_current_action() -> dict[str, Any]
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_current_action(self) -> dict[str, Any]:
    return self.robot_view.get_ctrl_dict()
get_current_phase
get_current_phase()
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_current_phase(self):
    if self.target_open:
        return "gripper-open"
    else:
        return "gripper-close"
reset
reset()
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def reset(self):
    self.start_time = None
JointMoveSegment dataclass
JointMoveSegment(name: str, start_qpos: dict[str, ndarray | list[float]] | None, end_qpos: dict[str, ndarray | list[float]], duration_s: float)

Bases: MoveSegment

Attributes:

Name Type Description
duration float
duration_s float
end_qpos dict[str, ndarray | list[float]]
name str
start_qpos dict[str, ndarray | list[float]] | None
duration property
duration: float
duration_s instance-attribute
duration_s: float
end_qpos instance-attribute
end_qpos: dict[str, ndarray | list[float]]
name instance-attribute
name: str
start_qpos instance-attribute
start_qpos: dict[str, ndarray | list[float]] | None
JointMoveSequence
JointMoveSequence(robot_view: RobotView, settle_time: float, move_segments: list[JointMoveSegment], is_holding_object: bool = False, gripper_empty_threshold: float = 0.0)

Bases: MoveSequence

Methods:

Name Description
check_failure
elapsed_time
execute
get_current_action
get_current_phase
reset

Attributes:

Name Type Description
duration
gripper_empty_threshold
is_holding_object
move_seg_idx
move_seg_start_time
move_segments list[JointMoveSegment]
robot_view
settle_time
start_time
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def __init__(
    self,
    robot_view: RobotView,
    settle_time: float,
    move_segments: list[JointMoveSegment],
    is_holding_object: bool = False,
    gripper_empty_threshold: float = 0.0,
) -> None:
    super().__init__(
        robot_view,
        settle_time,
        move_segments,
        is_holding_object,
        gripper_empty_threshold,
    )
duration instance-attribute
duration = duration
gripper_empty_threshold instance-attribute
gripper_empty_threshold = gripper_empty_threshold
is_holding_object instance-attribute
is_holding_object = is_holding_object
move_seg_idx instance-attribute
move_seg_idx = None
move_seg_start_time instance-attribute
move_seg_start_time = None
move_segments property
move_segments: list[JointMoveSegment]
robot_view instance-attribute
robot_view = robot_view
settle_time instance-attribute
settle_time = settle_time
start_time instance-attribute
start_time = None
check_failure
check_failure() -> bool
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def check_failure(self) -> bool:
    if super().check_failure():
        return True

    if self.is_holding_object:
        gripper_mg_id = self.robot_view.get_gripper_movegroup_ids()[0]
        gripper = self.robot_view.get_gripper(gripper_mg_id)
        if (
            gripper.inter_finger_dist
            < gripper.inter_finger_dist_range[0] + self.gripper_empty_threshold
        ):
            log.info(
                f"Object is not in grasp! {gripper.inter_finger_dist:.05f} < {gripper.inter_finger_dist_range[0] + self.gripper_empty_threshold:.05f}"
            )
            return True

    return False
elapsed_time
elapsed_time() -> float
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def elapsed_time(self) -> float:
    return self.robot_view.mj_data.time - self.start_time
execute
execute() -> bool
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def execute(self) -> bool:
    ret = super().execute()

    assert self.move_seg_idx is not None
    if self.move_seg_idx < len(self.move_segments):
        move_seg = self.move_segments[self.move_seg_idx]
        if move_seg.start_qpos is None:
            move_seg.start_qpos = self.robot_view.get_qpos_dict()

    return ret
get_current_action
get_current_action() -> dict[str, Any]
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_current_action(self) -> dict[str, Any]:
    elapsed_time = self.robot_view.mj_data.time - self.move_seg_start_time
    assert self.move_seg_idx is not None
    if self.move_seg_idx < len(self.move_segments):
        move_seg = self.move_segments[self.move_seg_idx]
        t = min(1.0, elapsed_time / move_seg.duration)

        curr_target_qpos = {**move_seg.start_qpos}
        for k in move_seg.end_qpos:
            q0 = np.asarray(move_seg.start_qpos[k])
            q1 = np.asarray(move_seg.end_qpos[k])
            curr_target_qpos[k] = q0 + (q1 - q0) * t
    else:
        move_seg = self.move_segments[-1]
        curr_target_qpos = {**move_seg.end_qpos}

    for mg_id in self.robot_view.get_gripper_movegroup_ids():
        if mg_id in curr_target_qpos:
            del curr_target_qpos[mg_id]
    return curr_target_qpos
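The per-key linear interpolation of joint targets above can be isolated into a small helper. A minimal sketch, with `start_qpos`/`end_qpos` as plain dicts mapping move-group names to joint arrays:

```python
import numpy as np

def interp_qpos(start_qpos: dict, end_qpos: dict, t: float) -> dict:
    """Linearly interpolate per-move-group joint targets at fraction t.

    Keys present only in start_qpos are carried over unchanged; t is
    clamped to [0, 1], matching the min(1.0, ...) in the source.
    """
    t = min(1.0, max(0.0, t))
    target = {**start_qpos}
    for k in end_qpos:
        q0 = np.asarray(start_qpos[k], dtype=float)
        q1 = np.asarray(end_qpos[k], dtype=float)
        target[k] = q0 + (q1 - q0) * t
    return target

mid = interp_qpos({"arm": [0.0, 0.0]}, {"arm": [1.0, 2.0]}, 0.5)
```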
get_current_phase
get_current_phase() -> str
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_current_phase(self) -> str:
    if self.move_seg_idx is None:
        move_seg = self.move_segments[0]
    elif self.move_seg_idx < len(self.move_segments):
        move_seg = self.move_segments[self.move_seg_idx]
    else:
        move_seg = self.move_segments[-1]
    return move_seg.name
reset
reset()
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def reset(self):
    super().reset()
    self.move_seg_idx = None
    self.move_seg_start_time = None
MoveSegment dataclass
MoveSegment(name: str)

Attributes:

Name Type Description
duration float
name str
duration property
duration: float
name instance-attribute
name: str
MoveSequence
MoveSequence(robot_view: RobotView, settle_time: float, move_segments: list[MoveSegment], is_holding_object: bool = False, gripper_empty_threshold: float = 0.0)

Bases: ActionPrimitive

Methods:

Name Description
check_failure
elapsed_time
execute
get_current_action
get_current_phase
reset

Attributes:

Name Type Description
duration
gripper_empty_threshold
is_holding_object
move_seg_idx
move_seg_start_time
robot_view
settle_time
start_time
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def __init__(
    self,
    robot_view: RobotView,
    settle_time: float,
    move_segments: list[MoveSegment],
    is_holding_object: bool = False,
    gripper_empty_threshold: float = 0.0,
) -> None:
    super().__init__(robot_view, sum(seg.duration for seg in move_segments))
    self._move_segments = move_segments
    self.move_seg_idx = None
    self.move_seg_start_time = None
    self.settle_time = settle_time
    self.is_holding_object = is_holding_object
    self.gripper_empty_threshold = gripper_empty_threshold
duration instance-attribute
duration = duration
gripper_empty_threshold instance-attribute
gripper_empty_threshold = gripper_empty_threshold
is_holding_object instance-attribute
is_holding_object = is_holding_object
move_seg_idx instance-attribute
move_seg_idx = None
move_seg_start_time instance-attribute
move_seg_start_time = None
robot_view instance-attribute
robot_view = robot_view
settle_time instance-attribute
settle_time = settle_time
start_time instance-attribute
start_time = None
check_failure
check_failure() -> bool
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def check_failure(self) -> bool:
    if super().check_failure():
        return True

    if self.is_holding_object:
        gripper_mg_id = self.robot_view.get_gripper_movegroup_ids()[0]
        gripper = self.robot_view.get_gripper(gripper_mg_id)
        if (
            gripper.inter_finger_dist
            < gripper.inter_finger_dist_range[0] + self.gripper_empty_threshold
        ):
            log.info(
                f"Object is not in grasp! {gripper.inter_finger_dist:.05f} < {gripper.inter_finger_dist_range[0] + self.gripper_empty_threshold:.05f}"
            )
            return True

    return False
elapsed_time
elapsed_time() -> float
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def elapsed_time(self) -> float:
    return self.robot_view.mj_data.time - self.start_time
execute
execute() -> bool
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def execute(self) -> bool:
    if self.start_time is None:
        self.start_time = self.robot_view.mj_data.time
        self.move_seg_idx = None
        self.move_seg_start_time = self.start_time

    duration_cumsum = np.cumsum([0] + [seg.duration for seg in self._move_segments])
    idx = np.searchsorted(duration_cumsum, self.elapsed_time(), side="right").item() - 1
    if idx != self.move_seg_idx:
        if idx < len(self._move_segments):
            log.info(f"Moving to {self._move_segments[idx].name}")
        self.move_seg_idx = idx
        self.move_seg_start_time = self.robot_view.mj_data.time

    return self.elapsed_time() >= self.duration + self.settle_time
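The cumsum-plus-searchsorted lookup in `execute` maps an elapsed time to the index of the currently active segment. A standalone sketch of that computation:

```python
import numpy as np

def active_segment(durations: list[float], elapsed: float) -> int:
    """Index of the segment active at `elapsed` seconds.

    Returns len(durations) once every segment has completed (the sequence
    is then in its settle phase), matching the searchsorted logic above.
    """
    cumsum = np.cumsum([0.0] + list(durations))
    return np.searchsorted(cumsum, elapsed, side="right").item() - 1
```

With `side="right"`, an elapsed time that lands exactly on a segment boundary is attributed to the next segment, so each segment owns the half-open interval `[start, start + duration)`.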
get_current_action
get_current_action() -> dict[str, Any]
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_current_action(self) -> dict[str, Any]:
    elapsed_time = self.robot_view.mj_data.time - self.move_seg_start_time
    if self.move_seg_idx < len(self.move_segments):
        move_seg = self.move_segments[self.move_seg_idx]
        t = min(1.0, elapsed_time / move_seg.duration)

        lin_vel, ang_vel = transform_to_twist(
            np.linalg.inv(move_seg.start_pose) @ move_seg.end_pose
        )
        curr_target_pose = move_seg.start_pose @ twist_to_transform(lin_vel * t, ang_vel * t)
    else:
        move_seg = self.move_segments[-1]
        curr_target_pose = move_seg.end_pose

    # Solve IK for current target pose
    mg_id = self.robot_view.get_gripper_movegroup_ids()[0]
    return self.tcp_to_jp_fn(mg_id, curr_target_pose)
get_current_phase
get_current_phase() -> str
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_current_phase(self) -> str:
    if self.move_seg_idx is None:
        move_seg = self.move_segments[0]
    elif self.move_seg_idx < len(self.move_segments):
        move_seg = self.move_segments[self.move_seg_idx]
    else:
        move_seg = self.move_segments[-1]
    return move_seg.name
reset
reset()
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def reset(self):
    super().reset()
    self.move_seg_idx = None
    self.move_seg_start_time = None
NoopAction
NoopAction(robot_view: RobotView, duration: float)

Bases: ActionPrimitive

Methods:

Name Description
check_failure
elapsed_time
execute
get_current_action
get_current_phase
reset

Attributes:

Name Type Description
duration
robot_view
start_time
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def __init__(self, robot_view: RobotView, duration: float) -> None:
    super().__init__(robot_view, duration)
    self._action = None
duration instance-attribute
duration = duration
robot_view instance-attribute
robot_view = robot_view
start_time instance-attribute
start_time = None
check_failure
check_failure() -> bool
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def check_failure(self) -> bool:
    return False
elapsed_time
elapsed_time() -> float
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def elapsed_time(self) -> float:
    return self.robot_view.mj_data.time - self.start_time
execute
execute() -> bool
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def execute(self) -> bool:
    if self.start_time is None:
        self.start_time = self.robot_view.mj_data.time
    return self.elapsed_time() >= self.duration
get_current_action
get_current_action() -> dict[str, Any]
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_current_action(self) -> dict[str, Any]:
    if self._action is None:
        self._action = self.robot_view.get_noop_ctrl_dict()
    return self._action
get_current_phase
get_current_phase() -> str
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_current_phase(self) -> str:
    return "unknown"
reset
reset()
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def reset(self):
    self.start_time = None
TCPMoveSegment dataclass
TCPMoveSegment(name: str, start_pose: ndarray, end_pose: ndarray, speed: float)

Bases: MoveSegment

Attributes:

Name Type Description
duration float
end_pose ndarray
name str
speed float
start_pose ndarray
duration property
duration: float
end_pose instance-attribute
end_pose: ndarray
name instance-attribute
name: str
speed instance-attribute
speed: float
start_pose instance-attribute
start_pose: ndarray
TCPMoveSequence
TCPMoveSequence(robot_view: RobotView, tcp_to_jp_fn: Callable[[str, ndarray], dict[str, Any]], settle_time: float, move_segments: list[TCPMoveSegment], is_holding_object: bool = False, gripper_empty_threshold: float = 0.0, tcp_pos_err_threshold: float = inf, tcp_rot_err_threshold: float = inf)

Bases: MoveSequence

Methods:

Name Description
check_failure
elapsed_time
execute
get_current_action
get_current_phase
get_current_target_pose
reset

Attributes:

Name Type Description
duration
gripper_empty_threshold
is_holding_object
move_seg_idx
move_seg_start_time
move_segments list[TCPMoveSegment]
robot_view
settle_time
start_time
tcp_pos_err_threshold
tcp_rot_err_threshold
tcp_to_jp_fn
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def __init__(
    self,
    robot_view: RobotView,
    tcp_to_jp_fn: Callable[[str, np.ndarray], dict[str, Any]],
    settle_time: float,
    move_segments: list[TCPMoveSegment],
    is_holding_object: bool = False,
    gripper_empty_threshold: float = 0.0,
    tcp_pos_err_threshold: float = np.inf,
    tcp_rot_err_threshold: float = np.inf,
) -> None:
    super().__init__(
        robot_view,
        settle_time,
        move_segments,
        is_holding_object,
        gripper_empty_threshold,
    )
    self.tcp_to_jp_fn = tcp_to_jp_fn
    self.tcp_pos_err_threshold = tcp_pos_err_threshold
    self.tcp_rot_err_threshold = tcp_rot_err_threshold
duration instance-attribute
duration = duration
gripper_empty_threshold instance-attribute
gripper_empty_threshold = gripper_empty_threshold
is_holding_object instance-attribute
is_holding_object = is_holding_object
move_seg_idx instance-attribute
move_seg_idx = None
move_seg_start_time instance-attribute
move_seg_start_time = None
move_segments property
move_segments: list[TCPMoveSegment]
robot_view instance-attribute
robot_view = robot_view
settle_time instance-attribute
settle_time = settle_time
start_time instance-attribute
start_time = None
tcp_pos_err_threshold instance-attribute
tcp_pos_err_threshold = tcp_pos_err_threshold
tcp_rot_err_threshold instance-attribute
tcp_rot_err_threshold = tcp_rot_err_threshold
tcp_to_jp_fn instance-attribute
tcp_to_jp_fn = tcp_to_jp_fn
check_failure
check_failure() -> bool
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def check_failure(self) -> bool:
    if super().check_failure():
        return True
    elif self.move_seg_idx is None:
        # we haven't started moving yet
        return False

    curr_target_pose = self.get_current_target_pose()
    gripper_mg_id = self.robot_view.get_gripper_movegroup_ids()[0]
    gripper = self.robot_view.get_gripper(gripper_mg_id)

    trf = np.linalg.inv(gripper.leaf_frame_to_world) @ curr_target_pose
    pos_err = np.linalg.norm(trf[:3, 3])
    rot_err = R.from_matrix(trf[:3, :3]).magnitude()
    return pos_err > self.tcp_pos_err_threshold or rot_err > self.tcp_rot_err_threshold
elapsed_time
elapsed_time() -> float
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def elapsed_time(self) -> float:
    return self.robot_view.mj_data.time - self.start_time
execute
execute() -> bool
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def execute(self) -> bool:
    if self.start_time is None:
        self.start_time = self.robot_view.mj_data.time
        self.move_seg_idx = None
        self.move_seg_start_time = self.start_time

    duration_cumsum = np.cumsum([0] + [seg.duration for seg in self._move_segments])
    idx = np.searchsorted(duration_cumsum, self.elapsed_time(), side="right").item() - 1
    if idx != self.move_seg_idx:
        if idx < len(self._move_segments):
            log.info(f"Moving to {self._move_segments[idx].name}")
        self.move_seg_idx = idx
        self.move_seg_start_time = self.robot_view.mj_data.time

    return self.elapsed_time() >= self.duration + self.settle_time
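The segment lookup in `execute` above picks the active move segment from cumulative durations with `np.searchsorted`. A minimal standalone sketch of that pattern, using hypothetical durations not taken from any real config:

```python
import numpy as np

# Hypothetical per-segment durations in seconds (illustrative only).
durations = [2.0, 1.0, 3.0]

# Cumulative start times of each segment: [0.0, 2.0, 3.0, 6.0].
duration_cumsum = np.cumsum([0] + durations)

def segment_index(elapsed: float) -> int:
    """Index of the segment active at `elapsed`; equals len(durations)
    once all segments have finished (the settle phase)."""
    return np.searchsorted(duration_cumsum, elapsed, side="right").item() - 1
```

With `side="right"`, an elapsed time exactly on a boundary advances to the next segment, matching the behavior of the source above.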
get_current_action
get_current_action() -> dict[str, Any]
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_current_action(self) -> dict[str, Any]:
    curr_target_pose = self.get_current_target_pose()

    # Solve IK for current target pose
    mg_id = self.robot_view.get_gripper_movegroup_ids()[0]
    return self.tcp_to_jp_fn(mg_id, curr_target_pose)
get_current_phase
get_current_phase() -> str
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_current_phase(self) -> str:
    if self.move_seg_idx is None:
        move_seg = self.move_segments[0]
    elif self.move_seg_idx < len(self.move_segments):
        move_seg = self.move_segments[self.move_seg_idx]
    else:
        move_seg = self.move_segments[-1]
    return move_seg.name
get_current_target_pose
get_current_target_pose() -> ndarray
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_current_target_pose(self) -> np.ndarray:
    elapsed_time = self.robot_view.mj_data.time - self.move_seg_start_time
    assert self.move_seg_idx is not None
    if self.move_seg_idx < len(self.move_segments):
        move_seg = self.move_segments[self.move_seg_idx]
        t = min(1.0, elapsed_time / move_seg.duration)

        lin_vel, ang_vel = transform_to_twist(
            np.linalg.inv(move_seg.start_pose) @ move_seg.end_pose
        )
        curr_target_pose = move_seg.start_pose @ twist_to_transform(lin_vel * t, ang_vel * t)
    else:
        move_seg = self.move_segments[-1]
        curr_target_pose = move_seg.end_pose
    return curr_target_pose
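`get_current_target_pose` interpolates on SE(3) by converting the relative start-to-end transform into a twist and scaling it by the time fraction. The codebase's `transform_to_twist` / `twist_to_transform` helpers are not shown here, so the stand-ins below are assumptions that decouple translation and rotation for simplicity (a true se(3) exponential couples them); the interpolation structure is the same:

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def transform_to_twist(trf: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    # Simplified stand-in: translation and rotation vector, decoupled.
    return trf[:3, 3], R.from_matrix(trf[:3, :3]).as_rotvec()

def twist_to_transform(lin: np.ndarray, ang: np.ndarray) -> np.ndarray:
    trf = np.eye(4)
    trf[:3, :3] = R.from_rotvec(ang).as_matrix()
    trf[:3, 3] = lin
    return trf

def interpolate_pose(start_pose: np.ndarray, end_pose: np.ndarray, t: float) -> np.ndarray:
    """Pose at fraction t in [0, 1] along the start->end motion."""
    lin, ang = transform_to_twist(np.linalg.inv(start_pose) @ end_pose)
    return start_pose @ twist_to_transform(lin * t, ang * t)
```

At `t=0` this returns the start pose exactly and at `t=1` the end pose, which is why the source clamps `t = min(1.0, elapsed_time / move_seg.duration)`.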
reset
reset()
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def reset(self):
    super().reset()
    self.move_seg_idx = None
    self.move_seg_start_time = None
curobo_open_close_planner_policy

Classes:

Name Description
CuroboOpenClosePlannerPolicy

Curobo-based planner policy for open/close articulated object tasks.

OpenClosePhase

Attributes:

Name Type Description
log
log module-attribute
log = getLogger(__name__)
CuroboOpenClosePlannerPolicy
CuroboOpenClosePlannerPolicy(config: MlSpacesExpConfig, task: BaseMujocoTask)

Bases: CuroboPlannerPolicy, OpenClosePlannerPolicy

Curobo-based planner policy for open/close articulated object tasks.

Inherits common motion planning functionality from CuroboPlannerPolicy and task-specific functionality from OpenClosePlannerPolicy. Depending on configuration, planning runs on a local planner or is delegated to a remote CuroboPlanner server via CuroboClient.

Methods:

Name Description
add_auxiliary_objects
batch_plan_trajectory

Plan trajectory in batches.

check_feasible_ik
clip_to_velocity_constraint

Clip action to respect velocity constraints.

get_action
get_all_phases

Return a mapping from phase name to phase index.

get_info

Get additional information from the policy; called after the episode has ended.

get_look_at_action

Get action to look at a target.

get_phase
reset
select_arm

Select which arm to use and set up the planner (local or remote).

solve_ik

Solve IK and create an interpolated trajectory.

visualize_world_config_mesh

Visualize the world configuration as a mesh file.

Attributes:

Name Type Description
action_idx
action_primitives
arm_end_idx int
arm_side str | None
arm_start_idx int
articulation_pose_index
client CuroboClient | None
config
current_gripper_command dict[str, float]
current_phase
grasping_timesteps
height_adjustment_steps
ik_warmed_up
is_done bool

Property to expose completion state for task checking.

opening_timesteps
planned_trajectory list[list[float]] | None
planner CuroboPlanner | None
planner_joint_ranges dict[str, tuple[int, int]]
planners dict
policy_config
pre_grasp_poses
profiler
retry_count int
robot_view
sequential_ik_failures
settle_steps
steps_spent_in_waypoint int
target_poses
task
trajectory_index int
Source code in molmo_spaces/policy/solvers/object_manipulation/curobo_open_close_planner_policy.py
def __init__(self, config: MlSpacesExpConfig, task: BaseMujocoTask) -> None:
    # Initialize both parent classes
    CuroboPlannerPolicy.__init__(self, config, task)
    OpenClosePlannerPolicy.__init__(self, config, task)

    self.client: CuroboClient | None = None
    self._base_lock_joints: dict = {}  # fetched from server in select_arm()

    self.select_arm()
    self.pre_grasp_poses = None  # computed after height adjustment

    self.current_phase = OpenClosePhase.HEIGHT_SELECTION

    self.articulation_pose_index = 0
    self.grasping_timesteps = 0
    self.opening_timesteps = 0
    self.settle_steps = 0
    self.height_adjustment_steps = 0
    self._height_initial: float | None = None
    self._height_target: float | None = None
action_idx instance-attribute
action_idx = 0
action_primitives instance-attribute
action_primitives = []
arm_end_idx instance-attribute
arm_end_idx: int = 0
arm_side instance-attribute
arm_side: str | None = None
arm_start_idx instance-attribute
arm_start_idx: int = 0
articulation_pose_index instance-attribute
articulation_pose_index = 0
client instance-attribute
client: CuroboClient | None = None
config instance-attribute
config = config
current_gripper_command instance-attribute
current_gripper_command: dict[str, float] = {}
current_phase instance-attribute
current_phase = HEIGHT_SELECTION
grasping_timesteps instance-attribute
grasping_timesteps = 0
height_adjustment_steps instance-attribute
height_adjustment_steps = 0
ik_warmed_up instance-attribute
ik_warmed_up = False
is_done property
is_done: bool

Property to expose completion state for task checking.

opening_timesteps instance-attribute
opening_timesteps = 0
planned_trajectory instance-attribute
planned_trajectory: list[list[float]] | None = None
planner instance-attribute
planner: CuroboPlanner | None = None
planner_joint_ranges instance-attribute
planner_joint_ranges: dict[str, tuple[int, int]] = {}
planners property
planners: dict
policy_config instance-attribute
policy_config = policy_config
pre_grasp_poses instance-attribute
pre_grasp_poses = None
profiler instance-attribute
profiler = Profiler()
retry_count instance-attribute
retry_count: int = 0
robot_view instance-attribute
robot_view = robot_view
sequential_ik_failures instance-attribute
sequential_ik_failures = 0
settle_steps instance-attribute
settle_steps = 0
steps_spent_in_waypoint instance-attribute
steps_spent_in_waypoint: int = 0
target_poses instance-attribute
target_poses = {}
task instance-attribute
task = task
trajectory_index instance-attribute
trajectory_index: int = 0
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
@staticmethod
def add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None:
    PlannerPolicy.add_auxiliary_objects(config, spec)
    policy_config = config.policy_config
    assert isinstance(policy_config, ObjectManipulationPlannerPolicyConfig)
    if policy_config.filter_colliding_grasps:
        add_grasp_collision_bodies(
            spec,
            policy_config.grasp_collision_batch_size,
            policy_config.grasp_width,
            policy_config.grasp_length,
            policy_config.grasp_height,
            np.array(policy_config.grasp_base_pos),
        )
batch_plan_trajectory
batch_plan_trajectory() -> None

Plan trajectory in batches.

Uses local planner or remote server depending on configuration. When using the server, lock joints and obstacles are passed atomically with each planning request to avoid races when multiple workers share the same server.

Source code in molmo_spaces/policy/solvers/object_manipulation/curobo_open_close_planner_policy.py
def batch_plan_trajectory(self) -> None:
    """Plan trajectory in batches.

    Uses local planner or remote server depending on configuration.
    When using the server, lock joints and obstacles are passed atomically
    with each planning request to avoid races when multiple workers share
    the same server.
    """
    if self._use_local_planner:
        return CuroboPlannerPolicy.batch_plan_trajectory(self)

    init_config = np.concatenate(
        [
            np.zeros(3),
            self.task.env.current_robot.robot_view.get_move_group(
                f"{self.arm_side}_arm"
            ).joint_pos,
        ]
    )

    # Build obstacle list for atomic world update during planning
    obstacles: list[dict] | None = None
    if getattr(self.config.policy_config, "enable_collision_avoidance", False):
        cuboids = self._get_nearby_object_collision_geometry()
        obstacles = self._cuboids_to_obstacle_list(cuboids)
        log.debug(
            f"Adding {len(cuboids)} cuboids to collision avoidance for phase: {self.current_phase}"
        )

    # Sample lock_joints atomically with the planning request
    lock_joints = self._get_lock_joints()

    goal_poses = self._get_batch_goal_poses()
    if goal_poses is None or len(goal_poses) == 0:
        log.warning("[BATCH PLAN] No goal poses available")
        return

    total = goal_poses.shape[0]
    batch_size = getattr(self.config.policy_config, "batch_size", 8)
    max_batches = getattr(self.config.policy_config, "max_batch_plan_attempts", 4)
    num_poses = min(max_batches * batch_size, total)
    num_batches = (num_poses + batch_size - 1) // batch_size

    all_successful_trajectories = []
    current_phase = getattr(self, "current_phase", "unknown")

    for batch_start in range(0, num_poses, batch_size):
        batch_end = min(batch_start + batch_size, num_poses)
        batch = goal_poses[batch_start:batch_end]

        if hasattr(self, "_show_poses"):
            self._show_poses(batch, style="tcp")
        if self.task.viewer:
            self.task.viewer.sync()

        log.info(
            f"Processing batch {batch_start // batch_size + 1}/{num_batches}: "
            f"poses {batch_start}-{batch_end - 1}"
        )

        batch_base_frame_7d = []
        for pose in batch:
            pose_base = self._target_pose_to_base_frame(pose)
            batch_base_frame_7d.append(pose_base.tolist())

        batch_len = len(batch_base_frame_7d)
        start_states = [init_config.tolist()] * batch_len

        log.info(
            f"Planning batch of {batch_len} {current_phase} poses with {self.arm_side} arm"
        )
        trajectories, successes = self.client.motion_plan_batch(
            joint_positions=start_states,
            goal_poses=batch_base_frame_7d,
            obstacles=obstacles,
            lock_joints=lock_joints,
        )

        num_successes = sum(successes)
        log.info(
            f"{self.arm_side.capitalize()} arm: {num_successes}/{batch_len} "
            f"{current_phase} planning successes"
        )

        for success, trajectory in zip(successes, trajectories):
            if not success or not trajectory:
                continue
            self._transform_traj_to_world_frame(trajectory)
            all_successful_trajectories.append(trajectory)

        if all_successful_trajectories:
            log.info(
                f"Found {len(all_successful_trajectories)} successful trajectories, "
                "skipping remaining batches"
            )
            break

    if all_successful_trajectories:
        self.planned_trajectory = self._select_best_trajectory(all_successful_trajectories)
    else:
        log.warning("[BATCH PLAN] No successful trajectories found across all batches")
check_feasible_ik
check_feasible_ik(pose: ndarray) -> bool
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def check_feasible_ik(self, pose: np.ndarray) -> bool:
    if pose.ndim > 2:
        assert pose.shape[1:] == (4, 4)
        batch_size = pose.shape[0]
        assert batch_size <= self.policy_config.grasp_feasibility_batch_size

        if batch_size == 0:
            return np.array([], dtype=bool)

        # pad to the batch size to avoid triggering recompilation of the IK solver
        if batch_size < self.policy_config.grasp_feasibility_batch_size:
            n_pad = self.policy_config.grasp_feasibility_batch_size - batch_size
            pose = np.concatenate([pose, np.broadcast_to(pose[-1:], (n_pad, 4, 4))])

        robot_view = self.task.env.current_robot.robot_view
        parallel_kinematics = self.task._env.robots[0].parallel_kinematics
        jp_dicts = parallel_kinematics.ik(
            pose,
            robot_view.get_qpos_dict(),
            robot_view.base.pose,
            rel_to_base=False,
            posture_weight=0.0,
        )
        return np.array([jp_dict is not None for jp_dict in jp_dicts[:batch_size]])
    else:
        assert pose.shape == (4, 4)
        robot_view = self.task.env.current_robot.robot_view
        kinematics = self.task.env.current_robot.kinematics
        gripper_mg_id = robot_view.get_gripper_movegroup_ids()[0]
        jp_dict = kinematics.ik(
            gripper_mg_id,
            pose,
            robot_view.move_group_ids(),
            robot_view.get_qpos_dict(),
            base_pose=robot_view.base.pose,
        )
        return jp_dict is not None
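A notable detail in the batched branch of `check_feasible_ik` above is padding the pose batch to a fixed size, so a shape-specialized (JIT-compiled) IK solver never sees a new batch shape and therefore never recompiles. A self-contained sketch of just that padding pattern:

```python
import numpy as np

def pad_batch(poses: np.ndarray, batch_size: int) -> np.ndarray:
    """Pad a (N, 4, 4) pose batch to (batch_size, 4, 4) by repeating the
    last pose. Results computed for the padded entries are discarded by
    the caller (only the first N outputs are kept)."""
    n = poses.shape[0]
    assert 0 < n <= batch_size
    if n < batch_size:
        pad = np.broadcast_to(poses[-1:], (batch_size - n, 4, 4))
        poses = np.concatenate([poses, pad])
    return poses
```

`np.broadcast_to` avoids materializing the repeated poses until `np.concatenate` copies them, which keeps the padding cheap for large batch sizes.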
clip_to_velocity_constraint
clip_to_velocity_constraint(action: dict[str, Any]) -> dict[str, Any]

Clip action to respect velocity constraints.

Parameters:

Name Type Description Default
action dict[str, Any]

Dictionary of commanded joint positions by move group.

required

Returns:

Type Description
dict[str, Any]

Clipped action dictionary.

Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
def clip_to_velocity_constraint(self, action: dict[str, Any]) -> dict[str, Any]:
    """Clip action to respect velocity constraints.

    Args:
        action: Dictionary of commanded joint positions by move group.

    Returns:
        Clipped action dictionary.
    """
    velocity_constraints = getattr(self.config.policy_config, "velocity_constraints", {})
    clipped_action = action.copy()

    for move_group, commanded_action in action.items():
        if move_group in velocity_constraints:
            max_velocity_rad_per_s = velocity_constraints[move_group]
            move_group_view = self.task.env.robots[0].robot_view.get_move_group(move_group)
            move_group_joint_pos = move_group_view.joint_pos.copy()

            # Calculate difference, handling angular wraparound for base theta
            diff = commanded_action - move_group_joint_pos
            if move_group == "base":
                # Normalize theta difference to [-π, π]
                diff[2] = np.arctan2(np.sin(diff[2]), np.cos(diff[2]))

            velocity_rad_per_s = diff * 10.0

            # Clip velocity and recalculate commanded action
            clipped_velocity = np.clip(
                velocity_rad_per_s, -max_velocity_rad_per_s, max_velocity_rad_per_s
            )
            commanded_action = move_group_joint_pos + clipped_velocity / 10.0

            clipped_action[move_group] = commanded_action

    return clipped_action
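Two details of `clip_to_velocity_constraint` above are worth isolating: the `arctan2(sin, cos)` trick that wraps an angular difference into [-π, π] so the clip acts on the short way around, and the conversion between a position step and a velocity. The `* 10.0` factor appears to assume a fixed control rate; the sketch below names that assumption explicitly:

```python
import numpy as np

CTRL_HZ = 10.0  # assumed control rate implied by the `diff * 10.0` factor above

def clip_joint_step(current, commanded, max_vel, wrap_theta_idx=None):
    """Clip a commanded position step so the implied velocity stays within
    max_vel; optionally wrap one angular dimension into [-pi, pi]."""
    diff = np.asarray(commanded, dtype=float) - np.asarray(current, dtype=float)
    if wrap_theta_idx is not None:
        # Normalize the angular difference so e.g. +3.1 -> -3.1 rad is a
        # small forward step, not a near-full rotation backwards.
        d = diff[wrap_theta_idx]
        diff[wrap_theta_idx] = np.arctan2(np.sin(d), np.cos(d))
    vel = diff * CTRL_HZ
    return np.asarray(current, dtype=float) + np.clip(vel, -max_vel, max_vel) / CTRL_HZ
```

Without the wrap, a base heading crossing the ±π boundary would be clipped toward the long way around the circle.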
get_action
get_action(info: dict[str, Any]) -> dict[str, Any]
Source code in molmo_spaces/policy/solvers/object_manipulation/curobo_open_close_planner_policy.py
def get_action(self, info: dict[str, Any]) -> dict[str, Any]:
    action_cmd = self.task.env.current_robot.robot_view.get_noop_ctrl_dict()
    if self.current_phase == OpenClosePhase.HEIGHT_SELECTION:
        action_cmd.update(self._execute_height_selection_phase())
    elif self.current_phase == OpenClosePhase.PREGRASP:
        action_cmd.update(self._execute_pre_grasp_phase())
    elif self.current_phase == OpenClosePhase.GRASP:
        action_cmd.update(self._execute_grasp_phase())
    elif self.current_phase == OpenClosePhase.ARTICULATE:
        action_cmd.update(self._execute_articulate_phase())
    elif self.current_phase == OpenClosePhase.POSTARTICULATE:
        action_cmd.update(self._execute_postarticulate_phase())
    elif self.current_phase == OpenClosePhase.DONE:
        return {"done": True}
    else:
        raise ValueError(f"Unknown phase: {self.current_phase}")

    action_cmd["torso"] = np.array([self.current_height])
    action_cmd = self.clip_to_velocity_constraint(action_cmd)
    return action_cmd
get_all_phases
get_all_phases() -> dict[str, int]

Return a mapping from phase name to phase index.

Source code in molmo_spaces/policy/solvers/object_manipulation/curobo_open_close_planner_policy.py
def get_all_phases(self) -> dict[str, int]:
    """Return list of all phase names."""
    return {phase.value: i for i, phase in enumerate(OpenClosePhase)}
get_info
get_info() -> dict

Get additional information from the policy. Called after the episode has ended. This method can be overridden by subclasses to provide extra information about the policy's state. Must be JSON serializable.

Returns:

Type Description
dict

A dictionary containing additional information about the policy.

Source code in molmo_spaces/policy/base_policy.py
def get_info(self) -> dict:
    """
    Get additional information from the policy. Called after the episode has ended. This method can be
    overridden by subclasses to provide extra information about the policy's state.
    Must be json serializable.

    Returns:
        A dictionary containing additional information about the policy.
    """
    return {}
get_look_at_action
get_look_at_action() -> dict[str, Any]

Get action to look at a target.

Subclasses can override this to implement head tracking.

Returns:

Type Description
dict[str, Any]

Dictionary with head control commands, or empty dict.

Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
def get_look_at_action(self) -> dict[str, Any]:
    """Get action to look at a target.

    Subclasses can override this to implement head tracking.

    Returns:
        Dictionary with head control commands, or empty dict.
    """
    return {}
get_phase
get_phase() -> OpenClosePhase
Source code in molmo_spaces/policy/solvers/object_manipulation/curobo_open_close_planner_policy.py
def get_phase(self) -> OpenClosePhase:
    return self.current_phase
reset
reset(reset_retries: bool = True)
Source code in molmo_spaces/policy/solvers/object_manipulation/curobo_open_close_planner_policy.py
def reset(self, reset_retries: bool = True):
    # Call parent reset
    CuroboPlannerPolicy.reset(self)

    self.current_arm = None
    self.current_phase = OpenClosePhase.HEIGHT_SELECTION
    self.grasping_timesteps = 0
    self.opening_timesteps = 0
    self.articulation_pose_index = 0
    self.height_adjustment_steps = 0
    self.current_height = 0.0
    self._height_initial = None
    self._height_target = None
    self.pre_grasp_poses = None
    dummy_pose = np.eye(4)
    self.target_poses = {
        "grasp": dummy_pose,
        "pregrasp": dummy_pose,
        "place": dummy_pose,
    }
    self.settle_steps = 0
    from molmo_spaces.policy.solvers.object_manipulation.base_object_manipulation_planner_policy import (
        NoopAction,
    )

    self.action_primitives = [NoopAction(self.task.env.current_robot.robot_view, 0.0)]
select_arm
select_arm() -> None

Select which arm to use and set up the planner (local or remote).

Source code in molmo_spaces/policy/solvers/object_manipulation/curobo_open_close_planner_policy.py
def select_arm(self) -> None:
    """Select which arm to use and set up the planner (local or remote)."""
    task_config = self.config.task_config
    om = self.task.env.object_managers[self.task.env.current_batch_index]
    pickup_obj = om.get_object_by_name(task_config.pickup_obj_name)
    pickup_obj_pos = pickup_obj.position

    left_tcp_pose = self.task.env.current_robot.robot_view.get_move_group(
        "left_gripper"
    ).leaf_frame_to_world
    right_tcp_pose = self.task.env.current_robot.robot_view.get_move_group(
        "right_gripper"
    ).leaf_frame_to_world

    left_dist = np.linalg.norm(left_tcp_pose[:3, 3] - pickup_obj_pos)
    right_dist = np.linalg.norm(right_tcp_pose[:3, 3] - pickup_obj_pos)

    selected_arm = "left" if left_dist < right_dist else "right"
    log.info(
        f"Selected {selected_arm} arm (left dist: {left_dist:.3f}m, right dist: {right_dist:.3f}m)"
    )

    self.arm_side = selected_arm

    if self._use_local_planner:
        self._select_arm_local(selected_arm)
    else:
        server_url = random.choice(self.config.policy_config.server_urls)
        log.info(f"Connecting to CuroboPlanner server at {server_url} (arm={selected_arm})")
        server_timeout = getattr(self.config.policy_config, "server_timeout", 120.0)
        self.client = CuroboClient(
            base_url=server_url, arm=selected_arm, timeout=server_timeout
        )

        # Fetch base lock_joints from server once; merged with live torso values per-request.
        self._base_lock_joints = self.client.get_lock_joints()
        log.info(f"Base lock_joints from server: {self._base_lock_joints}")

    if selected_arm == "left":
        self.planner_joint_ranges = self.config.policy_config.left_planner_joint_ranges
    else:
        self.planner_joint_ranges = self.config.policy_config.right_planner_joint_ranges

    self.arm_start_idx = self.planner_joint_ranges[f"{self.arm_side}_arm"][0]
    self.arm_end_idx = self.planner_joint_ranges[f"{self.arm_side}_arm"][1]
solve_ik
solve_ik(target_pose: ndarray) -> None

Solve IK and create an interpolated trajectory.

Uses local planner or remote server depending on configuration. When using the server, lock joints and obstacles are passed atomically with the request to avoid races when multiple workers share the same server.

Parameters:

Name Type Description Default
target_pose ndarray

4x4 transformation matrix for target end-effector pose in world frame.

required
Source code in molmo_spaces/policy/solvers/object_manipulation/curobo_open_close_planner_policy.py
def solve_ik(self, target_pose: np.ndarray) -> None:
    """Solve IK and create an interpolated trajectory.

    Uses local planner or remote server depending on configuration.
    When using the server, lock joints and obstacles are passed atomically
    with the request to avoid races when multiple workers share the same server.

    Args:
        target_pose: 4x4 transformation matrix for target end-effector pose in world frame.
    """
    if self._use_local_planner:
        return CuroboPlannerPolicy.solve_ik(self, target_pose)

    init_config = np.concatenate(
        [
            np.zeros(3),
            self.task.env.current_robot.robot_view.get_move_group(
                f"{self.arm_side}_arm"
            ).joint_pos,
        ]
    )

    target_pose_7d = self._target_pose_to_base_frame(target_pose)
    lock_joints = self._get_lock_joints()
    joint_config, success = self.client.ik(
        goal_pose=target_pose_7d.tolist(),
        seed_config=init_config.tolist(),
        disable_collision=True,
        lock_joints=lock_joints,
    )

    current_phase = getattr(self, "current_phase", "unknown")
    if not success:
        raise ValueError(f"Could not solve {current_phase} phase IK.")

    trajectory = self._interpolate_joint_trajectory(
        init_config,
        np.array(joint_config),
        num_steps=10,
    )
    self._transform_traj_to_world_frame(trajectory)
    self.planned_trajectory = trajectory
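After the server returns a single IK solution, `solve_ik` above turns it into a trajectory via `_interpolate_joint_trajectory`, whose implementation is not shown here. A plausible stand-in is plain linear joint-space interpolation between the start and goal configurations; the helper below is a hypothetical illustration, not the codebase's actual method:

```python
import numpy as np

def interpolate_joint_trajectory(start, goal, num_steps: int = 10) -> list[list[float]]:
    """Linearly interpolate between two joint configurations, returning
    num_steps waypoints including both endpoints."""
    start = np.asarray(start, dtype=float)
    goal = np.asarray(goal, dtype=float)
    ts = np.linspace(0.0, 1.0, num_steps)
    return [((1.0 - t) * start + t * goal).tolist() for t in ts]
```

Linear joint-space interpolation is only safe here because the IK goal is close to the current configuration; for larger motions the batch planner, not this helper, produces the trajectory.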
visualize_world_config_mesh
visualize_world_config_mesh(world_cfg: WorldConfig) -> None

Visualize the world configuration as a mesh file.

Parameters:

Name Type Description Default
world_cfg WorldConfig

Curobo WorldConfig to visualize.

required
Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
def visualize_world_config_mesh(self, world_cfg: WorldConfig) -> None:
    """Visualize the world configuration as a mesh file.

    Args:
        world_cfg: Curobo WorldConfig to visualize.
    """
    from datetime import datetime

    import trimesh
    from curobo.geom.types import WorldConfig

    current_phase = getattr(self, "current_phase", "unknown")
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    file_name = f"world_config_mesh_{current_phase}_{timestamp}"
    obj_path = f"{file_name}.obj"
    initial_joint_configuration = np.array(self._get_current_joint_positions())

    try:
        # Create a combined scene with robot and world obstacles
        combined_scene = trimesh.scene.scene.Scene(base_frame="world_origin")

        # Add robot mesh if we have a joint configuration
        if initial_joint_configuration is not None and self.planner is not None:
            try:
                config_for_mesh = np.array(initial_joint_configuration).copy()
                # Zero out base joints to place robot at origin
                config_for_mesh[:3] = 0.0
                q_tensor = torch.tensor(config_for_mesh).unsqueeze(0).float().cuda()

                robot_spheres_batch = self.planner.motion_gen.kinematics.get_robot_as_spheres(
                    q_tensor
                )
                if robot_spheres_batch and len(robot_spheres_batch) > 0:
                    robot_spheres = robot_spheres_batch[0]
                    for i, sphere in enumerate(robot_spheres):
                        sphere_mesh = trimesh.creation.icosphere(radius=sphere.radius)
                        sphere_transform = np.eye(4)
                        sphere_transform[:3, 3] = sphere.pose[:3]
                        combined_scene.add_geometry(
                            sphere_mesh,
                            geom_name=f"robot_sphere_{i}",
                            parent_node_name="world_origin",
                            transform=sphere_transform,
                        )
                    log.info(
                        f"Added {len(robot_spheres)} robot collision spheres to visualization"
                    )

            except Exception as e:
                log.warning(f"Could not add robot mesh: {e}")

        # Add world obstacles to the scene
        try:
            world_scene = WorldConfig.get_scene_graph(world_cfg, process_color=True)
            for geom_name, geom in world_scene.geometry.items():
                transform = world_scene.graph.get(geom_name)[0]
                combined_scene.add_geometry(
                    geom,
                    geom_name=geom_name,
                    parent_node_name="world_origin",
                    transform=transform,
                )
        except Exception as e:
            log.warning(f"Could not add world obstacles: {e}")

        # Export combined scene
        if len(combined_scene.geometry) > 0:
            combined_scene.export(obj_path)
            log.info(f"Successfully saved combined robot + world mesh to {obj_path}")
        else:
            log.debug("Skipping mesh export - scene is empty")

    except ValueError as e:
        if "empty scene" in str(e).lower():
            log.debug("Skipping mesh export - world config is empty")
        else:
            raise
OpenClosePhase

Bases: str, Enum

Attributes:

Name Type Description
ARTICULATE
DONE
GRASP
HEIGHT_SELECTION
POSTARTICULATE
PREGRASP
ARTICULATE class-attribute instance-attribute
ARTICULATE = 'articulate'
DONE class-attribute instance-attribute
DONE = 'done'
GRASP class-attribute instance-attribute
GRASP = 'grasp'
HEIGHT_SELECTION class-attribute instance-attribute
HEIGHT_SELECTION = 'height_selection'
POSTARTICULATE class-attribute instance-attribute
POSTARTICULATE = 'place'
PREGRASP class-attribute instance-attribute
PREGRASP = 'pregrasp'
curobo_pick_and_place_planner_policy

Classes:

Name Description
CuroboPickAndPlacePlannerPolicy

Curobo-based planner policy for pick and place tasks.

PickAndPlacePhase

Attributes:

Name Type Description
log
log module-attribute
log = getLogger(__name__)
CuroboPickAndPlacePlannerPolicy
CuroboPickAndPlacePlannerPolicy(config: MlSpacesExpConfig, task: BaseMujocoTask)

Bases: CuroboPlannerPolicy, PickAndPlacePlannerPolicy

Curobo-based planner policy for pick and place tasks.

Inherits common motion planning functionality from CuroboPlannerPolicy and task-specific functionality from PickAndPlacePlannerPolicy. Depending on configuration, planning runs on a local planner or is delegated to a remote CuroboPlanner server via CuroboClient.

Methods:

Name Description
add_auxiliary_objects
batch_plan_trajectory

Plan trajectory in batches.

check_feasible_ik
clip_to_velocity_constraint

Clip action to respect velocity constraints.

get_action
get_all_phases

Return a mapping from phase name to phase index.

get_info

Get additional information from the policy; called after the episode has ended.

get_look_at_action

Get action to look at a target.

get_phase
reset
select_arm

Select which arm to use and set up the planner (local or remote).

solve_ik

Solve IK and create an interpolated trajectory.

visualize_world_config_mesh

Visualize the world configuration as a mesh file.

Attributes:

Name Type Description
action_idx
action_primitives
arm_end_idx int
arm_side str | None
arm_start_idx int
client CuroboClient | None
config
current_gripper_command dict[str, float]
current_phase
grasping_timesteps
ik_warmed_up
is_done bool

Property to expose completion state for task checking.

opening_timesteps
place_poses
planned_trajectory list[list[float]] | None
planner CuroboPlanner | None
planner_joint_ranges dict[str, tuple[int, int]]
planners dict
policy_config
pre_grasp_poses
profiler
retry_count int
robot_view
sequential_ik_failures
settle_steps
steps_spent_in_waypoint int
target_poses
task
trajectory_index int
Source code in molmo_spaces/policy/solvers/object_manipulation/curobo_pick_and_place_planner_policy.py
def __init__(self, config: MlSpacesExpConfig, task: BaseMujocoTask) -> None:
    # Initialize both parent classes
    CuroboPlannerPolicy.__init__(self, config, task)
    PickAndPlacePlannerPolicy.__init__(self, config, task)

    self.client: CuroboClient | None = None

    self.select_arm()
    self.pre_grasp_poses = self._get_pregrasp_poses()
    if isinstance(self.task, PickAndPlaceTask):
        self.place_poses = self._get_place_poses()

    self.current_phase = PickAndPlacePhase.PREGRASP

    self.grasping_timesteps = 0
    self.opening_timesteps = 0
    self.settle_steps = 0
action_idx instance-attribute
action_idx = 0
action_primitives instance-attribute
action_primitives = []
arm_end_idx instance-attribute
arm_end_idx: int = 0
arm_side instance-attribute
arm_side: str | None = None
arm_start_idx instance-attribute
arm_start_idx: int = 0
client instance-attribute
client: CuroboClient | None = None
config instance-attribute
config = config
current_gripper_command instance-attribute
current_gripper_command: dict[str, float] = {}
current_phase instance-attribute
current_phase = PREGRASP
grasping_timesteps instance-attribute
grasping_timesteps = 0
ik_warmed_up instance-attribute
ik_warmed_up = False
is_done property
is_done: bool

Property to expose completion state for task checking.

opening_timesteps instance-attribute
opening_timesteps = 0
place_poses instance-attribute
place_poses = _get_place_poses()
planned_trajectory instance-attribute
planned_trajectory: list[list[float]] | None = None
planner instance-attribute
planner: CuroboPlanner | None = None
planner_joint_ranges instance-attribute
planner_joint_ranges: dict[str, tuple[int, int]] = {}
planners property
planners: dict
policy_config instance-attribute
policy_config = policy_config
pre_grasp_poses instance-attribute
pre_grasp_poses = _get_pregrasp_poses()
profiler instance-attribute
profiler = Profiler()
retry_count instance-attribute
retry_count: int = 0
robot_view instance-attribute
robot_view = robot_view
sequential_ik_failures instance-attribute
sequential_ik_failures = 0
settle_steps instance-attribute
settle_steps = 0
steps_spent_in_waypoint instance-attribute
steps_spent_in_waypoint: int = 0
target_poses instance-attribute
target_poses = {}
task instance-attribute
task = task
trajectory_index instance-attribute
trajectory_index: int = 0
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
@staticmethod
def add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None:
    PlannerPolicy.add_auxiliary_objects(config, spec)
    policy_config = config.policy_config
    assert isinstance(policy_config, ObjectManipulationPlannerPolicyConfig)
    if policy_config.filter_colliding_grasps:
        add_grasp_collision_bodies(
            spec,
            policy_config.grasp_collision_batch_size,
            policy_config.grasp_width,
            policy_config.grasp_length,
            policy_config.grasp_height,
            np.array(policy_config.grasp_base_pos),
        )
batch_plan_trajectory
batch_plan_trajectory() -> None

Plan trajectory in batches.

Uses local planner or remote server depending on configuration. When using the server, obstacles are passed atomically with each planning request to avoid races when multiple workers share the same server.

Source code in molmo_spaces/policy/solvers/object_manipulation/curobo_pick_and_place_planner_policy.py
def batch_plan_trajectory(self) -> None:
    """Plan trajectory in batches.

    Uses local planner or remote server depending on configuration.
    When using the server, obstacles are passed atomically with each planning
    request to avoid races when multiple workers share the same server.
    """
    if self._use_local_planner:
        return CuroboPlannerPolicy.batch_plan_trajectory(self)

    init_config = np.concatenate(
        [
            np.zeros(3),
            self.task.env.current_robot.robot_view.get_move_group(
                f"{self.arm_side}_arm"
            ).joint_pos,
        ]
    )

    # Build obstacle list for atomic world update during planning
    obstacles: list[dict] | None = None
    if getattr(self.config.policy_config, "enable_collision_avoidance", False):
        cuboids = self._get_collision_cuboids()
        obstacles = self._cuboids_to_obstacle_list(cuboids)
        log.debug(
            f"Adding {len(cuboids)} cuboids to collision avoidance for phase: {self.current_phase}"
        )

    # Attach pickup object to EE link for place phase if configured
    if self.current_phase == PickAndPlacePhase.PLACE and self.config.policy_config.attach_obj:
        pickup_obj_name = self.config.task_config.pickup_obj_name
        joint_pos_for_attach = np.array(self._get_current_joint_positions()).copy()
        base_range = self.planner_joint_ranges["base"]
        joint_pos_for_attach[base_range[0] : base_range[1]] = 0.0
        self.client.attach_object(
            object_names=[pickup_obj_name],
            joint_position=joint_pos_for_attach.tolist(),
            attach_link_names=[f"attached_object_{self.arm_side}"],
        )
        log.info(
            f"Attached object '{pickup_obj_name}' to link 'attached_object_{self.arm_side}'"
        )

    # Get goal poses based on current phase
    goal_poses = self._get_batch_goal_poses()
    if goal_poses is None or len(goal_poses) == 0:
        log.warning("[BATCH PLAN] No goal poses available")
        return

    total = goal_poses.shape[0]
    batch_size = getattr(self.config.policy_config, "batch_size", 8)
    max_batches = getattr(self.config.policy_config, "max_batch_plan_attempts", 4)
    num_poses = min(max_batches * batch_size, total)
    num_batches = (num_poses + batch_size - 1) // batch_size

    all_successful_trajectories = []
    current_phase = getattr(self, "current_phase", "unknown")

    for batch_start in range(0, num_poses, batch_size):
        batch_end = min(batch_start + batch_size, num_poses)
        batch = goal_poses[batch_start:batch_end]

        if hasattr(self, "_show_poses"):
            self._show_poses(batch, style="tcp")
        if self.task.viewer:
            self.task.viewer.sync()

        log.info(
            f"Processing batch {batch_start // batch_size + 1}/{num_batches}: "
            f"poses {batch_start}-{batch_end - 1}"
        )

        # Transform poses to base frame and convert to 7D format
        batch_base_frame_7d = []
        for pose in batch:
            pose_base = self._target_pose_to_base_frame(pose)
            batch_base_frame_7d.append(pose_base.tolist())

        batch_len = len(batch_base_frame_7d)
        start_states = [init_config.tolist()] * batch_len

        log.info(
            f"Planning batch of {batch_len} {current_phase} poses with {self.arm_side} arm"
        )
        trajectories, successes = self.client.motion_plan_batch(
            joint_positions=start_states,
            goal_poses=batch_base_frame_7d,
            obstacles=obstacles,
        )

        num_successes = sum(successes)
        log.info(
            f"{self.arm_side.capitalize()} arm: {num_successes}/{batch_len} "
            f"{current_phase} planning successes"
        )

        for success, trajectory in zip(successes, trajectories):
            if not success or not trajectory:
                continue
            self._transform_traj_to_world_frame(trajectory)
            all_successful_trajectories.append(trajectory)

        if all_successful_trajectories:
            log.info(
                f"Found {len(all_successful_trajectories)} successful trajectories, "
                "skipping remaining batches"
            )
            break

    if all_successful_trajectories:
        self.planned_trajectory = self._select_best_trajectory(all_successful_trajectories)
    else:
        log.warning("[BATCH PLAN] No successful trajectories found across all batches")
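The batching arithmetic above (cap the number of poses at `max_batches * batch_size`, chunk with ceiling division, and stop at the first batch that produces any successful trajectory) can be isolated into a small, self-contained sketch. The planner call here is a stand-in, not the real `CuroboClient` API:

```python
def plan_in_batches(total, batch_size, max_batches, plan_batch):
    """Chunk up to max_batches * batch_size goal poses and stop at the
    first batch that yields any successful trajectory. plan_batch is a
    stand-in for the remote planning call; it returns the list of
    successful trajectories for the slice [start, end)."""
    num_poses = min(max_batches * batch_size, total)
    successes = []
    for start in range(0, num_poses, batch_size):
        end = min(start + batch_size, num_poses)
        successes.extend(plan_batch(start, end))
        if successes:  # early exit: later batches are never planned
            break
    return successes

# Only the second batch succeeds, so batches 3 and 4 are skipped entirely.
calls = []
def fake_planner(start, end):
    calls.append((start, end))
    return ["traj"] if start >= 8 else []

result = plan_in_batches(total=40, batch_size=8, max_batches=4, plan_batch=fake_planner)
```

The early exit trades trajectory diversity for latency: once one batch succeeds, `_select_best_trajectory` only ranks trajectories from that batch.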
check_feasible_ik
check_feasible_ik(pose: ndarray) -> bool
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def check_feasible_ik(self, pose: np.ndarray) -> bool:
    if pose.ndim > 2:
        assert pose.shape[1:] == (4, 4)
        batch_size = pose.shape[0]
        assert batch_size <= self.policy_config.grasp_feasibility_batch_size

        if batch_size == 0:
            return np.array([], dtype=bool)

        # pad to the batch size to avoid triggering recompilation of the IK solver
        if batch_size < self.policy_config.grasp_feasibility_batch_size:
            n_pad = self.policy_config.grasp_feasibility_batch_size - batch_size
            pose = np.concatenate([pose, np.broadcast_to(pose[-1:], (n_pad, 4, 4))])

        robot_view = self.task.env.current_robot.robot_view
        parallel_kinematics = self.task._env.robots[0].parallel_kinematics
        jp_dicts = parallel_kinematics.ik(
            pose,
            robot_view.get_qpos_dict(),
            robot_view.base.pose,
            rel_to_base=False,
            posture_weight=0.0,
        )
        return np.array([jp_dict is not None for jp_dict in jp_dicts[:batch_size]])
    else:
        assert pose.shape == (4, 4)
        robot_view = self.task.env.current_robot.robot_view
        kinematics = self.task.env.current_robot.kinematics
        gripper_mg_id = robot_view.get_gripper_movegroup_ids()[0]
        jp_dict = kinematics.ik(
            gripper_mg_id,
            pose,
            robot_view.move_group_ids(),
            robot_view.get_qpos_dict(),
            base_pose=robot_view.base.pose,
        )
        return jp_dict is not None
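The padding trick in the batched branch above keeps the pose batch at a fixed size so a compiled batched IK solver never sees a new shape and recompiles. A minimal sketch of just that mechanism (the solver itself is faked):

```python
import numpy as np

def pad_pose_batch(poses, fixed_batch):
    """Pad an (N, 4, 4) pose batch up to fixed_batch rows by repeating
    the last pose, so a shape-specialized solver is not recompiled.
    Returns the padded batch and the original count for slicing results."""
    n = poses.shape[0]
    if n < fixed_batch:
        pad = np.broadcast_to(poses[-1:], (fixed_batch - n, 4, 4))
        poses = np.concatenate([poses, pad])
    return poses, n

batch = np.stack([np.eye(4)] * 5)           # 5 real poses
padded, n = pad_pose_batch(batch, fixed_batch=8)
results = [True] * padded.shape[0]           # pretend the solver ran on all 8
real_results = results[:n]                   # discard results for the padded tail
```

Slicing the results back to the original count, as `check_feasible_ik` does with `jp_dicts[:batch_size]`, ensures the duplicated padding poses never influence the caller.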
clip_to_velocity_constraint
clip_to_velocity_constraint(action: dict[str, Any]) -> dict[str, Any]

Clip action to respect velocity constraints.

Parameters:

Name Type Description Default
action dict[str, Any]

Dictionary of commanded joint positions by move group.

required

Returns:

Type Description
dict[str, Any]

Clipped action dictionary.

Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
def clip_to_velocity_constraint(self, action: dict[str, Any]) -> dict[str, Any]:
    """Clip action to respect velocity constraints.

    Args:
        action: Dictionary of commanded joint positions by move group.

    Returns:
        Clipped action dictionary.
    """
    velocity_constraints = getattr(self.config.policy_config, "velocity_constraints", {})
    clipped_action = action.copy()

    for move_group, commanded_action in action.items():
        if move_group in velocity_constraints:
            max_velocity_rad_per_s = velocity_constraints[move_group]
            move_group_view = self.task.env.robots[0].robot_view.get_move_group(move_group)
            move_group_joint_pos = move_group_view.joint_pos.copy()

            # Calculate difference, handling angular wraparound for base theta
            diff = commanded_action - move_group_joint_pos
            if move_group == "base":
                # Normalize theta difference to [-π, π]
                diff[2] = np.arctan2(np.sin(diff[2]), np.cos(diff[2]))

            velocity_rad_per_s = diff * 10.0

            # Clip velocity and recalculate commanded action
            clipped_velocity = np.clip(
                velocity_rad_per_s, -max_velocity_rad_per_s, max_velocity_rad_per_s
            )
            commanded_action = move_group_joint_pos + clipped_velocity / 10.0

            clipped_action[move_group] = commanded_action

    return clipped_action
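The base-theta wraparound handling above is worth seeing in isolation: without the `arctan2(sin, cos)` normalization, a command just across the ±π boundary would be clipped toward the long way around. A self-contained sketch (the `dt=0.1` mirrors the hard-coded `10.0` factor in the source; the numbers are made up):

```python
import numpy as np

def clip_with_wraparound(commanded, current, max_vel, dt=0.1):
    """Clip a commanded base-theta step to a velocity limit, normalizing
    the angular difference to [-pi, pi] before computing the velocity."""
    diff = commanded - current
    diff = np.arctan2(np.sin(diff), np.cos(diff))  # wraparound-safe difference
    clipped_vel = np.clip(diff / dt, -max_vel, max_vel)
    return current + clipped_vel * dt

# Commanding +3.1 rad from -3.1 rad: the short way crosses +/-pi and is only
# ~0.083 rad in the negative direction, so the clipped step moves past -pi
# rather than sweeping 6.2 rad the long way.
step = clip_with_wraparound(3.1, -3.1, max_vel=0.5)  # about -3.15
```

Only the base's theta component gets this treatment in the source; prismatic and revolute arm joints have no wraparound and use the raw difference.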
get_action
get_action(info: dict[str, Any]) -> dict[str, Any]
Source code in molmo_spaces/policy/solvers/object_manipulation/curobo_pick_and_place_planner_policy.py
def get_action(self, info: dict[str, Any]) -> dict[str, Any]:
    action_cmd = self.task.env.current_robot.robot_view.get_noop_ctrl_dict()
    if self.current_phase == PickAndPlacePhase.PREGRASP:
        action_cmd.update(self._execute_pre_grasp_phase())
    elif self.current_phase == PickAndPlacePhase.GRASP:
        action_cmd.update(self._execute_grasp_phase())
    elif self.current_phase == PickAndPlacePhase.LIFT:
        action_cmd.update(self._execute_lift_phase())
    elif self.current_phase == PickAndPlacePhase.PLACE:
        action_cmd.update(self._execute_place_phase())
    elif self.current_phase == PickAndPlacePhase.POSTPLACE:
        action_cmd.update(self._execute_postplace_phase())
    elif self.current_phase == PickAndPlacePhase.DONE:
        return {"done": True}
    else:
        raise ValueError(f"Unknown phase: {self.current_phase}")
    action_cmd = self.clip_to_velocity_constraint(action_cmd)
    return action_cmd
get_all_phases
get_all_phases() -> dict[str, int]

Return a mapping from phase name to phase index.

Source code in molmo_spaces/policy/solvers/object_manipulation/curobo_pick_and_place_planner_policy.py
def get_all_phases(self) -> dict[str, int]:
    """Return a mapping from phase name to phase index."""
    return {phase.value: i for i, phase in enumerate(PickAndPlacePhase)}
get_info
get_info() -> dict

Get additional information from the policy. Called after the episode has ended. This method can be overridden by subclasses to provide extra information about the policy's state. The returned dictionary must be JSON serializable.

Returns:

Type Description
dict

A dictionary containing additional information about the policy.

Source code in molmo_spaces/policy/base_policy.py
def get_info(self) -> dict:
    """
    Get additional information from the policy. Called after episode ended. This method can be
    overridden by subclasses to provide extra information about the policy's state.
    Must be json serializable.

    Returns:
        A dictionary containing additional information about the policy.
    """
    return {}
get_look_at_action
get_look_at_action() -> dict[str, Any]

Get action to look at a target.

Subclasses can override this to implement head tracking.

Returns:

Type Description
dict[str, Any]

Dictionary with head control commands, or empty dict.

Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
def get_look_at_action(self) -> dict[str, Any]:
    """Get action to look at a target.

    Subclasses can override this to implement head tracking.

    Returns:
        Dictionary with head control commands, or empty dict.
    """
    return {}
get_phase
get_phase() -> PickAndPlacePhase
Source code in molmo_spaces/policy/solvers/object_manipulation/curobo_pick_and_place_planner_policy.py
def get_phase(self) -> PickAndPlacePhase:
    return self.current_phase
reset
reset(reset_retries: bool = True)
Source code in molmo_spaces/policy/solvers/object_manipulation/curobo_pick_and_place_planner_policy.py
def reset(self, reset_retries: bool = True):
    # Call parent reset
    CuroboPlannerPolicy.reset(self)

    self.current_arm = None
    self.current_phase = PickAndPlacePhase.PREGRASP
    self.grasping_timesteps = 0
    self.opening_timesteps = 0
    dummy_pose = np.eye(4)
    self.target_poses = {
        "grasp": dummy_pose,
        "pregrasp": dummy_pose,
        "place": dummy_pose,
    }
    self.settle_steps = 0
    from molmo_spaces.policy.solvers.object_manipulation.base_object_manipulation_planner_policy import (
        NoopAction,
    )

    self.action_primitives = [NoopAction(self.task.env.current_robot.robot_view, 0.0)]
select_arm
select_arm() -> None

Select which arm to use and set up the planner (local or remote).

Source code in molmo_spaces/policy/solvers/object_manipulation/curobo_pick_and_place_planner_policy.py
def select_arm(self) -> None:
    """Select which arm to use and set up the planner (local or remote)."""
    task_config = self.config.task_config
    om = self.task.env.object_managers[self.task.env.current_batch_index]
    pickup_obj = om.get_object_by_name(task_config.pickup_obj_name)
    pickup_obj_pos = pickup_obj.position

    left_tcp_pose = self.task.env.current_robot.robot_view.get_move_group(
        "left_gripper"
    ).leaf_frame_to_world
    right_tcp_pose = self.task.env.current_robot.robot_view.get_move_group(
        "right_gripper"
    ).leaf_frame_to_world

    left_dist = np.linalg.norm(left_tcp_pose[:3, 3] - pickup_obj_pos)
    right_dist = np.linalg.norm(right_tcp_pose[:3, 3] - pickup_obj_pos)

    selected_arm = "left" if left_dist < right_dist else "right"
    log.info(
        f"Selected {selected_arm} arm (left dist: {left_dist:.3f}m, right dist: {right_dist:.3f}m)"
    )

    self.arm_side = selected_arm

    if self._use_local_planner:
        self._select_arm_local(selected_arm)
    else:
        server_url = random.choice(self.config.policy_config.server_urls)
        log.info(f"Connecting to CuroboPlanner server at {server_url} (arm={selected_arm})")
        server_timeout = getattr(self.config.policy_config, "server_timeout", 120.0)
        self.client = CuroboClient(
            base_url=server_url, arm=selected_arm, timeout=server_timeout
        )

    if selected_arm == "left":
        self.planner_joint_ranges = self.config.policy_config.left_planner_joint_ranges
    else:
        self.planner_joint_ranges = self.config.policy_config.right_planner_joint_ranges

    self.arm_start_idx = self.planner_joint_ranges[f"{self.arm_side}_arm"][0]
    self.arm_end_idx = self.planner_joint_ranges[f"{self.arm_side}_arm"][1]
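The arm-selection heuristic above reduces to a nearest-TCP comparison. A minimal sketch under that assumption (poses and the target position are made-up values, and only the translation component matters here):

```python
import numpy as np

def select_nearer_arm(left_tcp_pos, right_tcp_pos, target_pos):
    """Pick the arm whose TCP (tool center point) is closer to the target,
    breaking ties in favor of the right arm as the source does."""
    left_dist = np.linalg.norm(np.asarray(left_tcp_pos) - target_pos)
    right_dist = np.linalg.norm(np.asarray(right_tcp_pos) - target_pos)
    return "left" if left_dist < right_dist else "right"

# A target slightly to the robot's left selects the left arm.
arm = select_nearer_arm([0.3, 0.4, 1.0], [0.3, -0.4, 1.0], np.array([0.5, 0.3, 0.9]))
```

In the source the positions come from the `[:3, 3]` translation column of each gripper's `leaf_frame_to_world` 4x4 transform, so the same Euclidean comparison applies.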
solve_ik
solve_ik(target_pose: ndarray) -> None

Solve IK and create an interpolated trajectory.

Uses local planner or remote server depending on configuration.

Parameters:

Name Type Description Default
target_pose ndarray

4x4 transformation matrix for target end-effector pose in world frame.

required
Source code in molmo_spaces/policy/solvers/object_manipulation/curobo_pick_and_place_planner_policy.py
def solve_ik(self, target_pose: np.ndarray) -> None:
    """Solve IK and create an interpolated trajectory.

    Uses local planner or remote server depending on configuration.

    Args:
        target_pose: 4x4 transformation matrix for target end-effector pose in world frame.
    """
    if self._use_local_planner:
        return CuroboPlannerPolicy.solve_ik(self, target_pose)

    init_config = np.concatenate(
        [
            np.zeros(3),
            self.task.env.current_robot.robot_view.get_move_group(
                f"{self.arm_side}_arm"
            ).joint_pos,
        ]
    )

    target_pose_7d = self._target_pose_to_base_frame(target_pose)
    joint_config, success = self.client.ik(
        goal_pose=target_pose_7d.tolist(),
        seed_config=init_config.tolist(),
        disable_collision=True,
    )

    current_phase = getattr(self, "current_phase", "unknown")
    if not success:
        raise ValueError(f"Could not solve {current_phase} phase IK.")

    trajectory = self._interpolate_joint_trajectory(
        init_config,
        np.array(joint_config),
        num_steps=10,
    )
    self._transform_traj_to_world_frame(trajectory)
    self.planned_trajectory = trajectory
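`_interpolate_joint_trajectory` is not shown in this reference. A plausible sketch of such a helper, assuming simple linear interpolation in joint space (the name and `num_steps` parameter mirror the call above, but the body is an assumption, not the actual implementation):

```python
import numpy as np

def interpolate_joint_trajectory(start, goal, num_steps=10):
    """Linearly interpolate between two joint configurations, returning
    num_steps waypoints that exclude the start and end exactly at the goal."""
    start, goal = np.asarray(start, float), np.asarray(goal, float)
    alphas = np.linspace(0.0, 1.0, num_steps + 1)[1:]  # skip the start pose
    return [((1 - a) * start + a * goal).tolist() for a in alphas]

traj = interpolate_joint_trajectory([0.0, 0.0], [1.0, -2.0], num_steps=4)
```

Note that because IK is solved with `disable_collision=True` and the interpolation is straight-line in joint space, the resulting trajectory is not collision-checked, unlike the output of `batch_plan_trajectory`.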
visualize_world_config_mesh
visualize_world_config_mesh(world_cfg: WorldConfig) -> None

Visualize the world configuration as a mesh file.

Parameters:

Name Type Description Default
world_cfg WorldConfig

Curobo WorldConfig to visualize.

required
Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
def visualize_world_config_mesh(self, world_cfg: WorldConfig) -> None:
    """Visualize the world configuration as a mesh file.

    Args:
        world_cfg: Curobo WorldConfig to visualize.
    """
    from datetime import datetime

    import trimesh
    from curobo.geom.types import WorldConfig

    current_phase = getattr(self, "current_phase", "unknown")
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    file_name = f"world_config_mesh_{current_phase}_{timestamp}"
    obj_path = f"{file_name}.obj"
    initial_joint_configuration = np.array(self._get_current_joint_positions())

    try:
        # Create a combined scene with robot and world obstacles
        combined_scene = trimesh.scene.scene.Scene(base_frame="world_origin")

        # Add robot mesh if we have a joint configuration
        if initial_joint_configuration is not None and self.planner is not None:
            try:
                config_for_mesh = np.array(initial_joint_configuration).copy()
                # Zero out base joints to place robot at origin
                config_for_mesh[:3] = 0.0
                q_tensor = torch.tensor(config_for_mesh).unsqueeze(0).float().cuda()

                robot_spheres_batch = self.planner.motion_gen.kinematics.get_robot_as_spheres(
                    q_tensor
                )
                if robot_spheres_batch and len(robot_spheres_batch) > 0:
                    robot_spheres = robot_spheres_batch[0]
                    for i, sphere in enumerate(robot_spheres):
                        sphere_mesh = trimesh.creation.icosphere(radius=sphere.radius)
                        sphere_transform = np.eye(4)
                        sphere_transform[:3, 3] = sphere.pose[:3]
                        combined_scene.add_geometry(
                            sphere_mesh,
                            geom_name=f"robot_sphere_{i}",
                            parent_node_name="world_origin",
                            transform=sphere_transform,
                        )
                    log.info(
                        f"Added {len(robot_spheres)} robot collision spheres to visualization"
                    )

            except Exception as e:
                log.warning(f"Could not add robot mesh: {e}")

        # Add world obstacles to the scene
        try:
            world_scene = WorldConfig.get_scene_graph(world_cfg, process_color=True)
            for geom_name, geom in world_scene.geometry.items():
                transform = world_scene.graph.get(geom_name)[0]
                combined_scene.add_geometry(
                    geom,
                    geom_name=geom_name,
                    parent_node_name="world_origin",
                    transform=transform,
                )
        except Exception as e:
            log.warning(f"Could not add world obstacles: {e}")

        # Export combined scene
        if len(combined_scene.geometry) > 0:
            combined_scene.export(obj_path)
            log.info(f"Successfully saved combined robot + world mesh to {obj_path}")
        else:
            log.debug("Skipping mesh export - scene is empty")

    except ValueError as e:
        if "empty scene" in str(e).lower():
            log.debug("Skipping mesh export - world config is empty")
        else:
            raise
PickAndPlacePhase

Bases: str, Enum

Attributes:

Name Type Description
DONE
GRASP
LIFT
PLACE
POSTPLACE
PREGRASP
DONE class-attribute instance-attribute
DONE = 'done'
GRASP class-attribute instance-attribute
GRASP = 'grasp'
LIFT class-attribute instance-attribute
LIFT = 'lift'
PLACE class-attribute instance-attribute
PLACE = 'place'
POSTPLACE class-attribute instance-attribute
POSTPLACE = 'postplace'
PREGRASP class-attribute instance-attribute
PREGRASP = 'pregrasp'
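Because the phase enums mix in `str`, members compare equal to their string values, which is what makes phase names safe to log, serialize, and use as dictionary keys. A minimal sketch with a hypothetical three-phase enum:

```python
from enum import Enum

class Phase(str, Enum):
    PREGRASP = "pregrasp"
    GRASP = "grasp"
    DONE = "done"

# str mixin: members ARE strings, so equality and lookup work naturally
assert Phase.GRASP == "grasp"
assert Phase("done") is Phase.DONE

# the same pattern get_all_phases uses to map phase names to indices
phase_index = {p.value: i for i, p in enumerate(Phase)}
```

Iteration order follows definition order, so the indices produced by this comprehension are stable across runs.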
open_close_planner_policy

Classes:

Name Description
OpenClosePlannerPolicy

Attributes:

Name Type Description
log
log module-attribute
log = getLogger(__name__)
OpenClosePlannerPolicy
OpenClosePlannerPolicy(config: MlSpacesExpConfig, task: BaseMujocoTask)

Bases: BaseObjectManipulationPlannerPolicy

Methods:

Name Description
add_auxiliary_objects
check_feasible_ik
get_action
get_all_phases
get_info

Get additional information from the policy; called after the episode has ended. Can be overridden by subclasses.

get_phase
reset

Attributes:

Name Type Description
action_idx
action_primitives
config
ik_warmed_up
planners
policy_config
retry_count
robot_view
sequential_ik_failures
target_poses
task
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def __init__(self, config: MlSpacesExpConfig, task: BaseMujocoTask) -> None:
    assert isinstance(config.policy_config, ObjectManipulationPlannerPolicyConfig)
    super().__init__(config, task)
    self.policy_config = config.policy_config
    self.robot_view = task.env.current_robot.robot_view

    self.action_primitives = []
    self.action_idx = 0
    self.retry_count = 0
    self.target_poses = {}
    self.ik_warmed_up = False
    self.sequential_ik_failures = 0
action_idx instance-attribute
action_idx = 0
action_primitives instance-attribute
action_primitives = []
config instance-attribute
config = config
ik_warmed_up instance-attribute
ik_warmed_up = False
planners property
planners
policy_config instance-attribute
policy_config = policy_config
retry_count instance-attribute
retry_count = 0
robot_view instance-attribute
robot_view = robot_view
sequential_ik_failures instance-attribute
sequential_ik_failures = 0
target_poses instance-attribute
target_poses = {}
task instance-attribute
task = task
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
@staticmethod
def add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None:
    PlannerPolicy.add_auxiliary_objects(config, spec)
    policy_config = config.policy_config
    assert isinstance(policy_config, ObjectManipulationPlannerPolicyConfig)
    if policy_config.filter_colliding_grasps:
        add_grasp_collision_bodies(
            spec,
            policy_config.grasp_collision_batch_size,
            policy_config.grasp_width,
            policy_config.grasp_length,
            policy_config.grasp_height,
            np.array(policy_config.grasp_base_pos),
        )
check_feasible_ik
check_feasible_ik(pose: ndarray) -> bool
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def check_feasible_ik(self, pose: np.ndarray) -> bool:
    if pose.ndim > 2:
        assert pose.shape[1:] == (4, 4)
        batch_size = pose.shape[0]
        assert batch_size <= self.policy_config.grasp_feasibility_batch_size

        if batch_size == 0:
            return np.array([], dtype=bool)

        # pad to the batch size to avoid triggering recompilation of the IK solver
        if batch_size < self.policy_config.grasp_feasibility_batch_size:
            n_pad = self.policy_config.grasp_feasibility_batch_size - batch_size
            pose = np.concatenate([pose, np.broadcast_to(pose[-1:], (n_pad, 4, 4))])

        robot_view = self.task.env.current_robot.robot_view
        parallel_kinematics = self.task._env.robots[0].parallel_kinematics
        jp_dicts = parallel_kinematics.ik(
            pose,
            robot_view.get_qpos_dict(),
            robot_view.base.pose,
            rel_to_base=False,
            posture_weight=0.0,
        )
        return np.array([jp_dict is not None for jp_dict in jp_dicts[:batch_size]])
    else:
        assert pose.shape == (4, 4)
        robot_view = self.task.env.current_robot.robot_view
        kinematics = self.task.env.current_robot.kinematics
        gripper_mg_id = robot_view.get_gripper_movegroup_ids()[0]
        jp_dict = kinematics.ik(
            gripper_mg_id,
            pose,
            robot_view.move_group_ids(),
            robot_view.get_qpos_dict(),
            base_pose=robot_view.base.pose,
        )
        return jp_dict is not None
get_action
get_action(info: dict[str, Any]) -> dict[str, Any]
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_action(self, info: dict[str, Any]) -> dict[str, Any]:
    if self._check_for_failures():
        return self._handle_failure()

    # get the action from the current action primitive, absorbing 0-length segments as necessary
    while self.action_idx < len(self.action_primitives):  # guaranteed to terminate
        act_prim = self.action_primitives[self.action_idx]
        proceed = act_prim.execute()
        # cannot have a 0-length segment that doesn't proceed
        assert proceed or act_prim.duration > 0.0
        if proceed:
            self.action_idx += 1
        if not proceed or act_prim.duration > 0.0:
            break
    if self.action_idx < len(self.action_primitives):
        action = act_prim.get_current_action()
    else:
        action = self.action_primitives[-1].get_current_action()
        action["done"] = True
    return action
get_all_phases
get_all_phases()
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_all_phases(self):
    phases = super().get_all_phases()
    # Collect all possible phases here, even those not used for a particular trajectory computation
    new_phases = {
        "gripper-open": 1,
        "pregrasp": 2,
        "grasp": 3,
        "gripper-close": 4,
        "lift": 5,
        "preplace": 6,
        "place": 7,
        "retreat": 8,
        "go_home": 9,
    }
    phases.update(new_phases)
    return phases
get_info
get_info() -> dict

Get additional information from the policy. Called after the episode has ended. This method can be overridden by subclasses to provide extra information about the policy's state. The returned dictionary must be JSON serializable.

Returns:

Type Description
dict

A dictionary containing additional information about the policy.

Source code in molmo_spaces/policy/base_policy.py
def get_info(self) -> dict:
    """
    Get additional information from the policy. Called after episode ended. This method can be
    overridden by subclasses to provide extra information about the policy's state.
    Must be json serializable.

    Returns:
        A dictionary containing additional information about the policy.
    """
    return {}
get_phase
get_phase() -> str
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_phase(self) -> str:
    if self.action_idx < len(self.action_primitives):
        act_prim = self.action_primitives[self.action_idx]
    else:
        act_prim = self.action_primitives[-1]
    return act_prim.get_current_phase()
reset
reset(reset_retries: bool = True)
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def reset(self, reset_retries: bool = True):
    if not self.ik_warmed_up:
        with Timer() as warmup_time:
            self.task.env.current_robot.parallel_kinematics.warmup_ik(
                self.policy_config.grasp_feasibility_batch_size
            )
        self.ik_warmed_up = True
        log.info(f"Warmed up parallel IK solver in {warmup_time.value:.3f}s")

    self.action_primitives = self._compute_trajectory()

    self.action_idx = 0
    if reset_retries:
        self.retry_count = 0

    self.target_poses = {}
    for action_primitive in self.action_primitives:
        if isinstance(action_primitive, TCPMoveSequence):
            for move_segment in action_primitive.move_segments:
                self.target_poses[move_segment.name] = move_segment.end_pose
        elif isinstance(action_primitive, JointMoveSequence):
            gripper_mg_id = self.robot_view.get_gripper_movegroup_ids()[0]
            kinematics = self.task.env.current_robot.kinematics
            for move_segment in action_primitive.move_segments:
                end_qpos = move_segment.end_qpos
                base_pose = self.robot_view.base.pose
                self.target_poses[move_segment.name] = kinematics.fk(end_qpos, base_pose)[
                    gripper_mg_id
                ]
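The last loop in `reset` collects a name-to-pose mapping for later monitoring. A minimal sketch of that bookkeeping, using hypothetical stubs for the move sequences (TCP segments carry an end pose directly; joint-space segments would additionally need forward kinematics, omitted here):

```python
# Hypothetical stand-ins for move segments and TCP move sequences.
class StubSegment:
    def __init__(self, name: str, end_pose) -> None:
        self.name = name
        self.end_pose = end_pose

class StubTCPMoveSequence:
    def __init__(self, segments) -> None:
        self.move_segments = segments

action_primitives = [
    StubTCPMoveSequence([StubSegment("pregrasp", "pose_A"), StubSegment("grasp", "pose_B")]),
]

# Mirror of the target_poses bookkeeping in reset for the TCP case.
target_poses = {}
for prim in action_primitives:
    for seg in prim.move_segments:
        target_poses[seg.name] = seg.end_pose
```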
pick_and_place_color_planner_policy

Classes:

Name Description
PickAndPlaceColorPlannerPolicy

Planner policy for pick and place color task.

PickAndPlaceColorPlannerPolicy
PickAndPlaceColorPlannerPolicy(config: MlSpacesExpConfig, task: BaseMujocoTask)

Bases: PickAndPlacePlannerPolicy

Planner policy for pick and place color task.

Methods:

Name Description
add_auxiliary_objects
check_feasible_ik
get_action
get_all_phases
get_info

Get additional information from the policy. Called after episode ended. This method can be

get_phase
reset

Attributes:

Name Type Description
action_idx
action_primitives
config
ik_warmed_up
planners
policy_config
retry_count
robot_view
sequential_ik_failures
target_poses
task
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def __init__(self, config: MlSpacesExpConfig, task: BaseMujocoTask) -> None:
    assert isinstance(config.policy_config, ObjectManipulationPlannerPolicyConfig)
    super().__init__(config, task)
    self.policy_config = config.policy_config
    self.robot_view = task.env.current_robot.robot_view

    self.action_primitives = []
    self.action_idx = 0
    self.retry_count = 0
    self.target_poses = {}
    self.ik_warmed_up = False
    self.sequential_ik_failures = 0
action_idx instance-attribute
action_idx = 0
action_primitives instance-attribute
action_primitives = []
config instance-attribute
config = config
ik_warmed_up instance-attribute
ik_warmed_up = False
planners property
planners
policy_config instance-attribute
policy_config = policy_config
retry_count instance-attribute
retry_count = 0
robot_view instance-attribute
robot_view = robot_view
sequential_ik_failures instance-attribute
sequential_ik_failures = 0
target_poses instance-attribute
target_poses = {}
task instance-attribute
task = task
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
@staticmethod
def add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None:
    PlannerPolicy.add_auxiliary_objects(config, spec)
    policy_config = config.policy_config
    assert isinstance(policy_config, ObjectManipulationPlannerPolicyConfig)
    if policy_config.filter_colliding_grasps:
        add_grasp_collision_bodies(
            spec,
            policy_config.grasp_collision_batch_size,
            policy_config.grasp_width,
            policy_config.grasp_length,
            policy_config.grasp_height,
            np.array(policy_config.grasp_base_pos),
        )
check_feasible_ik
check_feasible_ik(pose: ndarray) -> bool | ndarray
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def check_feasible_ik(self, pose: np.ndarray) -> bool | np.ndarray:
    if pose.ndim > 2:
        assert pose.shape[1:] == (4, 4)
        batch_size = pose.shape[0]
        assert batch_size <= self.policy_config.grasp_feasibility_batch_size

        if batch_size == 0:
            return np.array([], dtype=bool)

        # pad to the batch size to avoid triggering recompilation of the IK solver
        if batch_size < self.policy_config.grasp_feasibility_batch_size:
            n_pad = self.policy_config.grasp_feasibility_batch_size - batch_size
            pose = np.concatenate([pose, np.broadcast_to(pose[-1:], (n_pad, 4, 4))])

        robot_view = self.task.env.current_robot.robot_view
        parallel_kinematics = self.task._env.robots[0].parallel_kinematics
        jp_dicts = parallel_kinematics.ik(
            pose,
            robot_view.get_qpos_dict(),
            robot_view.base.pose,
            rel_to_base=False,
            posture_weight=0.0,
        )
        return np.array([jp_dict is not None for jp_dict in jp_dicts[:batch_size]])
    else:
        assert pose.shape == (4, 4)
        robot_view = self.task.env.current_robot.robot_view
        kinematics = self.task.env.current_robot.kinematics
        gripper_mg_id = robot_view.get_gripper_movegroup_ids()[0]
        jp_dict = kinematics.ik(
            gripper_mg_id,
            pose,
            robot_view.move_group_ids(),
            robot_view.get_qpos_dict(),
            base_pose=robot_view.base.pose,
        )
        return jp_dict is not None
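The padding step in the batched branch keeps the IK solver's input shape fixed so it is never recompiled. That step can be sketched on its own; the batch size of 8 below is a hypothetical stand-in for `policy_config.grasp_feasibility_batch_size`.

```python
import numpy as np

GRASP_FEASIBILITY_BATCH_SIZE = 8  # hypothetical value for illustration

def pad_to_batch(pose: np.ndarray) -> np.ndarray:
    # Expect a batch of 4x4 homogeneous transforms, no larger than the batch size.
    assert pose.ndim == 3 and pose.shape[1:] == (4, 4)
    assert pose.shape[0] <= GRASP_FEASIBILITY_BATCH_SIZE
    n_pad = GRASP_FEASIBILITY_BATCH_SIZE - pose.shape[0]
    if n_pad > 0:
        # Repeat the last pose to fill the batch, as in check_feasible_ik.
        pose = np.concatenate([pose, np.broadcast_to(pose[-1:], (n_pad, 4, 4))])
    return pose

poses = np.stack([np.eye(4)] * 3)  # batch of 3 poses
padded = pad_to_batch(poses)       # padded up to the fixed batch size
```

Results for the padded tail are discarded afterwards by slicing back to the original batch size, which is why `check_feasible_ik` returns `jp_dicts[:batch_size]`.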
get_action
get_action(info: dict[str, Any]) -> dict[str, Any]
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_action(self, info: dict[str, Any]) -> dict[str, Any]:
    if self._check_for_failures():
        return self._handle_failure()

    # get the action from the current action primitive, absorbing 0-length segments as necessary
    while self.action_idx < len(self.action_primitives):  # guaranteed to terminate
        act_prim = self.action_primitives[self.action_idx]
        proceed = act_prim.execute()
        # cannot have a 0-length segment that doesn't proceed
        assert proceed or act_prim.duration > 0.0
        if proceed:
            self.action_idx += 1
        if not proceed or act_prim.duration > 0.0:
            break
    if self.action_idx < len(self.action_primitives):
        action = act_prim.get_current_action()
    else:
        action = self.action_primitives[-1].get_current_action()
        action["done"] = True
    return action
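The absorption loop above can be exercised with hypothetical stub primitives: a primitive whose `execute()` returns `True` is finished, zero-duration primitives are skipped within a single call, and the first primitive with positive duration that has not finished supplies the action.

```python
# Hypothetical stand-in for an action primitive.
class StubPrimitive:
    def __init__(self, name: str, duration: float, finished: bool) -> None:
        self.name = name
        self.duration = duration
        self._finished = finished

    def execute(self) -> bool:
        return self._finished

    def get_current_action(self) -> dict:
        return {"primitive": self.name}

action_primitives = [
    StubPrimitive("gripper-open", 0.0, True),  # zero-length, absorbed
    StubPrimitive("pregrasp", 1.0, False),     # still running, supplies the action
]
action_idx = 0

# Same control flow as get_action's inner loop.
while action_idx < len(action_primitives):
    act_prim = action_primitives[action_idx]
    proceed = act_prim.execute()
    assert proceed or act_prim.duration > 0.0
    if proceed:
        action_idx += 1
    if not proceed or act_prim.duration > 0.0:
        break

if action_idx < len(action_primitives):
    action = act_prim.get_current_action()
else:
    action = action_primitives[-1].get_current_action()
    action["done"] = True
```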
get_all_phases
get_all_phases()
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_all_phases(self):
    phases = super().get_all_phases()
    # Collect all possible phases here, even those not used for a particular trajectory computation
    new_phases = {
        "gripper-open": 1,
        "pregrasp": 2,
        "grasp": 3,
        "gripper-close": 4,
        "lift": 5,
        "preplace": 6,
        "place": 7,
        "retreat": 8,
        "go_home": 9,
    }
    phases.update(new_phases)
    return phases
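One plausible use of this phase table is inverting it to turn logged integer phase ids back into names. The `"none"` label for the `NONE_PHASE` sentinel below is a hypothetical choice, not defined by the module.

```python
NONE_PHASE = -1  # module-level sentinel from base_policy

phases = {
    "gripper-open": 1, "pregrasp": 2, "grasp": 3, "gripper-close": 4,
    "lift": 5, "preplace": 6, "place": 7, "retreat": 8, "go_home": 9,
}

# Invert name -> id into id -> name for labelling logged phases.
id_to_name = {v: k for k, v in phases.items()}
id_to_name[NONE_PHASE] = "none"  # hypothetical label for "no phase"
```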
get_info
get_info() -> dict

Get additional information from the policy. Called after the episode has ended. This method can be overridden by subclasses to provide extra information about the policy's state. The returned dictionary must be JSON serializable.

Returns:

Type Description
dict

A dictionary containing additional information about the policy.

Source code in molmo_spaces/policy/base_policy.py
def get_info(self) -> dict:
    """
    Get additional information from the policy. Called after episode ended. This method can be
    overridden by subclasses to provide extra information about the policy's state.
    Must be json serializable.

    Returns:
        A dictionary containing additional information about the policy.
    """
    return {}
get_phase
get_phase() -> str
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_phase(self) -> str:
    if self.action_idx < len(self.action_primitives):
        act_prim = self.action_primitives[self.action_idx]
    else:
        act_prim = self.action_primitives[-1]
    return act_prim.get_current_phase()
reset
reset(reset_retries: bool = True)
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def reset(self, reset_retries: bool = True):
    if not self.ik_warmed_up:
        with Timer() as warmup_time:
            self.task.env.current_robot.parallel_kinematics.warmup_ik(
                self.policy_config.grasp_feasibility_batch_size
            )
        self.ik_warmed_up = True
        log.info(f"Warmed up parallel IK solver in {warmup_time.value:.3f}s")

    self.action_primitives = self._compute_trajectory()

    self.action_idx = 0
    if reset_retries:
        self.retry_count = 0

    self.target_poses = {}
    for action_primitive in self.action_primitives:
        if isinstance(action_primitive, TCPMoveSequence):
            for move_segment in action_primitive.move_segments:
                self.target_poses[move_segment.name] = move_segment.end_pose
        elif isinstance(action_primitive, JointMoveSequence):
            gripper_mg_id = self.robot_view.get_gripper_movegroup_ids()[0]
            kinematics = self.task.env.current_robot.kinematics
            for move_segment in action_primitive.move_segments:
                end_qpos = move_segment.end_qpos
                base_pose = self.robot_view.base.pose
                self.target_poses[move_segment.name] = kinematics.fk(end_qpos, base_pose)[
                    gripper_mg_id
                ]
pick_and_place_next_to_planner_policy

Classes:

Name Description
PickAndPlaceNextToPlannerPolicy

Planner policy for pick and place next to task.

PickAndPlaceNextToPlannerPolicy
PickAndPlaceNextToPlannerPolicy(config: MlSpacesExpConfig, task: BaseMujocoTask)

Bases: PickAndPlacePlannerPolicy

Planner policy for pick and place next to task.

Methods:

Name Description
add_auxiliary_objects
check_feasible_ik
get_action
get_all_phases
get_info

Get additional information from the policy. Called after episode ended. This method can be

get_phase
reset

Attributes:

Name Type Description
action_idx
action_primitives
config
ik_warmed_up
planners
policy_config
retry_count
robot_view
sequential_ik_failures
target_poses
task
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def __init__(self, config: MlSpacesExpConfig, task: BaseMujocoTask) -> None:
    assert isinstance(config.policy_config, ObjectManipulationPlannerPolicyConfig)
    super().__init__(config, task)
    self.policy_config = config.policy_config
    self.robot_view = task.env.current_robot.robot_view

    self.action_primitives = []
    self.action_idx = 0
    self.retry_count = 0
    self.target_poses = {}
    self.ik_warmed_up = False
    self.sequential_ik_failures = 0
action_idx instance-attribute
action_idx = 0
action_primitives instance-attribute
action_primitives = []
config instance-attribute
config = config
ik_warmed_up instance-attribute
ik_warmed_up = False
planners property
planners
policy_config instance-attribute
policy_config = policy_config
retry_count instance-attribute
retry_count = 0
robot_view instance-attribute
robot_view = robot_view
sequential_ik_failures instance-attribute
sequential_ik_failures = 0
target_poses instance-attribute
target_poses = {}
task instance-attribute
task = task
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
@staticmethod
def add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None:
    PlannerPolicy.add_auxiliary_objects(config, spec)
    policy_config = config.policy_config
    assert isinstance(policy_config, ObjectManipulationPlannerPolicyConfig)
    if policy_config.filter_colliding_grasps:
        add_grasp_collision_bodies(
            spec,
            policy_config.grasp_collision_batch_size,
            policy_config.grasp_width,
            policy_config.grasp_length,
            policy_config.grasp_height,
            np.array(policy_config.grasp_base_pos),
        )
check_feasible_ik
check_feasible_ik(pose: ndarray) -> bool | ndarray
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def check_feasible_ik(self, pose: np.ndarray) -> bool | np.ndarray:
    if pose.ndim > 2:
        assert pose.shape[1:] == (4, 4)
        batch_size = pose.shape[0]
        assert batch_size <= self.policy_config.grasp_feasibility_batch_size

        if batch_size == 0:
            return np.array([], dtype=bool)

        # pad to the batch size to avoid triggering recompilation of the IK solver
        if batch_size < self.policy_config.grasp_feasibility_batch_size:
            n_pad = self.policy_config.grasp_feasibility_batch_size - batch_size
            pose = np.concatenate([pose, np.broadcast_to(pose[-1:], (n_pad, 4, 4))])

        robot_view = self.task.env.current_robot.robot_view
        parallel_kinematics = self.task._env.robots[0].parallel_kinematics
        jp_dicts = parallel_kinematics.ik(
            pose,
            robot_view.get_qpos_dict(),
            robot_view.base.pose,
            rel_to_base=False,
            posture_weight=0.0,
        )
        return np.array([jp_dict is not None for jp_dict in jp_dicts[:batch_size]])
    else:
        assert pose.shape == (4, 4)
        robot_view = self.task.env.current_robot.robot_view
        kinematics = self.task.env.current_robot.kinematics
        gripper_mg_id = robot_view.get_gripper_movegroup_ids()[0]
        jp_dict = kinematics.ik(
            gripper_mg_id,
            pose,
            robot_view.move_group_ids(),
            robot_view.get_qpos_dict(),
            base_pose=robot_view.base.pose,
        )
        return jp_dict is not None
get_action
get_action(info: dict[str, Any]) -> dict[str, Any]
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_action(self, info: dict[str, Any]) -> dict[str, Any]:
    if self._check_for_failures():
        return self._handle_failure()

    # get the action from the current action primitive, absorbing 0-length segments as necessary
    while self.action_idx < len(self.action_primitives):  # guaranteed to terminate
        act_prim = self.action_primitives[self.action_idx]
        proceed = act_prim.execute()
        # cannot have a 0-length segment that doesn't proceed
        assert proceed or act_prim.duration > 0.0
        if proceed:
            self.action_idx += 1
        if not proceed or act_prim.duration > 0.0:
            break
    if self.action_idx < len(self.action_primitives):
        action = act_prim.get_current_action()
    else:
        action = self.action_primitives[-1].get_current_action()
        action["done"] = True
    return action
get_all_phases
get_all_phases()
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_all_phases(self):
    phases = super().get_all_phases()
    # Collect all possible phases here, even those not used for a particular trajectory computation
    new_phases = {
        "gripper-open": 1,
        "pregrasp": 2,
        "grasp": 3,
        "gripper-close": 4,
        "lift": 5,
        "preplace": 6,
        "place": 7,
        "retreat": 8,
        "go_home": 9,
    }
    phases.update(new_phases)
    return phases
get_info
get_info() -> dict

Get additional information from the policy. Called after the episode has ended. This method can be overridden by subclasses to provide extra information about the policy's state. The returned dictionary must be JSON serializable.

Returns:

Type Description
dict

A dictionary containing additional information about the policy.

Source code in molmo_spaces/policy/base_policy.py
def get_info(self) -> dict:
    """
    Get additional information from the policy. Called after episode ended. This method can be
    overridden by subclasses to provide extra information about the policy's state.
    Must be json serializable.

    Returns:
        A dictionary containing additional information about the policy.
    """
    return {}
get_phase
get_phase() -> str
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_phase(self) -> str:
    if self.action_idx < len(self.action_primitives):
        act_prim = self.action_primitives[self.action_idx]
    else:
        act_prim = self.action_primitives[-1]
    return act_prim.get_current_phase()
reset
reset(reset_retries: bool = True)
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def reset(self, reset_retries: bool = True):
    if not self.ik_warmed_up:
        with Timer() as warmup_time:
            self.task.env.current_robot.parallel_kinematics.warmup_ik(
                self.policy_config.grasp_feasibility_batch_size
            )
        self.ik_warmed_up = True
        log.info(f"Warmed up parallel IK solver in {warmup_time.value:.3f}s")

    self.action_primitives = self._compute_trajectory()

    self.action_idx = 0
    if reset_retries:
        self.retry_count = 0

    self.target_poses = {}
    for action_primitive in self.action_primitives:
        if isinstance(action_primitive, TCPMoveSequence):
            for move_segment in action_primitive.move_segments:
                self.target_poses[move_segment.name] = move_segment.end_pose
        elif isinstance(action_primitive, JointMoveSequence):
            gripper_mg_id = self.robot_view.get_gripper_movegroup_ids()[0]
            kinematics = self.task.env.current_robot.kinematics
            for move_segment in action_primitive.move_segments:
                end_qpos = move_segment.end_qpos
                base_pose = self.robot_view.base.pose
                self.target_poses[move_segment.name] = kinematics.fk(end_qpos, base_pose)[
                    gripper_mg_id
                ]
pick_and_place_planner_policy

Classes:

Name Description
PickAndPlacePlannerPolicy

Attributes:

Name Type Description
log
log module-attribute
log = getLogger(__name__)
PickAndPlacePlannerPolicy
PickAndPlacePlannerPolicy(config: MlSpacesExpConfig, task: BaseMujocoTask)

Bases: BaseObjectManipulationPlannerPolicy

Methods:

Name Description
add_auxiliary_objects
check_feasible_ik
get_action
get_all_phases
get_info

Get additional information from the policy. Called after episode ended. This method can be

get_phase
reset

Attributes:

Name Type Description
action_idx
action_primitives
config
ik_warmed_up
planners
policy_config
retry_count
robot_view
sequential_ik_failures
target_poses
task
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def __init__(self, config: MlSpacesExpConfig, task: BaseMujocoTask) -> None:
    assert isinstance(config.policy_config, ObjectManipulationPlannerPolicyConfig)
    super().__init__(config, task)
    self.policy_config = config.policy_config
    self.robot_view = task.env.current_robot.robot_view

    self.action_primitives = []
    self.action_idx = 0
    self.retry_count = 0
    self.target_poses = {}
    self.ik_warmed_up = False
    self.sequential_ik_failures = 0
action_idx instance-attribute
action_idx = 0
action_primitives instance-attribute
action_primitives = []
config instance-attribute
config = config
ik_warmed_up instance-attribute
ik_warmed_up = False
planners property
planners
policy_config instance-attribute
policy_config = policy_config
retry_count instance-attribute
retry_count = 0
robot_view instance-attribute
robot_view = robot_view
sequential_ik_failures instance-attribute
sequential_ik_failures = 0
target_poses instance-attribute
target_poses = {}
task instance-attribute
task = task
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
@staticmethod
def add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None:
    PlannerPolicy.add_auxiliary_objects(config, spec)
    policy_config = config.policy_config
    assert isinstance(policy_config, ObjectManipulationPlannerPolicyConfig)
    if policy_config.filter_colliding_grasps:
        add_grasp_collision_bodies(
            spec,
            policy_config.grasp_collision_batch_size,
            policy_config.grasp_width,
            policy_config.grasp_length,
            policy_config.grasp_height,
            np.array(policy_config.grasp_base_pos),
        )
check_feasible_ik
check_feasible_ik(pose: ndarray) -> bool | ndarray
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def check_feasible_ik(self, pose: np.ndarray) -> bool | np.ndarray:
    if pose.ndim > 2:
        assert pose.shape[1:] == (4, 4)
        batch_size = pose.shape[0]
        assert batch_size <= self.policy_config.grasp_feasibility_batch_size

        if batch_size == 0:
            return np.array([], dtype=bool)

        # pad to the batch size to avoid triggering recompilation of the IK solver
        if batch_size < self.policy_config.grasp_feasibility_batch_size:
            n_pad = self.policy_config.grasp_feasibility_batch_size - batch_size
            pose = np.concatenate([pose, np.broadcast_to(pose[-1:], (n_pad, 4, 4))])

        robot_view = self.task.env.current_robot.robot_view
        parallel_kinematics = self.task._env.robots[0].parallel_kinematics
        jp_dicts = parallel_kinematics.ik(
            pose,
            robot_view.get_qpos_dict(),
            robot_view.base.pose,
            rel_to_base=False,
            posture_weight=0.0,
        )
        return np.array([jp_dict is not None for jp_dict in jp_dicts[:batch_size]])
    else:
        assert pose.shape == (4, 4)
        robot_view = self.task.env.current_robot.robot_view
        kinematics = self.task.env.current_robot.kinematics
        gripper_mg_id = robot_view.get_gripper_movegroup_ids()[0]
        jp_dict = kinematics.ik(
            gripper_mg_id,
            pose,
            robot_view.move_group_ids(),
            robot_view.get_qpos_dict(),
            base_pose=robot_view.base.pose,
        )
        return jp_dict is not None
get_action
get_action(info: dict[str, Any]) -> dict[str, Any]
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_action(self, info: dict[str, Any]) -> dict[str, Any]:
    if self._check_for_failures():
        return self._handle_failure()

    # get the action from the current action primitive, absorbing 0-length segments as necessary
    while self.action_idx < len(self.action_primitives):  # guaranteed to terminate
        act_prim = self.action_primitives[self.action_idx]
        proceed = act_prim.execute()
        # cannot have a 0-length segment that doesn't proceed
        assert proceed or act_prim.duration > 0.0
        if proceed:
            self.action_idx += 1
        if not proceed or act_prim.duration > 0.0:
            break
    if self.action_idx < len(self.action_primitives):
        action = act_prim.get_current_action()
    else:
        action = self.action_primitives[-1].get_current_action()
        action["done"] = True
    return action
get_all_phases
get_all_phases()
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_all_phases(self):
    phases = super().get_all_phases()
    # Collect all possible phases here, even those not used for a particular trajectory computation
    new_phases = {
        "gripper-open": 1,
        "pregrasp": 2,
        "grasp": 3,
        "gripper-close": 4,
        "lift": 5,
        "preplace": 6,
        "place": 7,
        "retreat": 8,
        "go_home": 9,
    }
    phases.update(new_phases)
    return phases
get_info
get_info() -> dict

Get additional information from the policy. Called after the episode has ended. This method can be overridden by subclasses to provide extra information about the policy's state. The returned dictionary must be JSON serializable.

Returns:

Type Description
dict

A dictionary containing additional information about the policy.

Source code in molmo_spaces/policy/base_policy.py
def get_info(self) -> dict:
    """
    Get additional information from the policy. Called after episode ended. This method can be
    overridden by subclasses to provide extra information about the policy's state.
    Must be json serializable.

    Returns:
        A dictionary containing additional information about the policy.
    """
    return {}
get_phase
get_phase() -> str
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_phase(self) -> str:
    if self.action_idx < len(self.action_primitives):
        act_prim = self.action_primitives[self.action_idx]
    else:
        act_prim = self.action_primitives[-1]
    return act_prim.get_current_phase()
reset
reset(reset_retries: bool = True)
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def reset(self, reset_retries: bool = True):
    if not self.ik_warmed_up:
        with Timer() as warmup_time:
            self.task.env.current_robot.parallel_kinematics.warmup_ik(
                self.policy_config.grasp_feasibility_batch_size
            )
        self.ik_warmed_up = True
        log.info(f"Warmed up parallel IK solver in {warmup_time.value:.3f}s")

    self.action_primitives = self._compute_trajectory()

    self.action_idx = 0
    if reset_retries:
        self.retry_count = 0

    self.target_poses = {}
    for action_primitive in self.action_primitives:
        if isinstance(action_primitive, TCPMoveSequence):
            for move_segment in action_primitive.move_segments:
                self.target_poses[move_segment.name] = move_segment.end_pose
        elif isinstance(action_primitive, JointMoveSequence):
            gripper_mg_id = self.robot_view.get_gripper_movegroup_ids()[0]
            kinematics = self.task.env.current_robot.kinematics
            for move_segment in action_primitive.move_segments:
                end_qpos = move_segment.end_qpos
                base_pose = self.robot_view.base.pose
                self.target_poses[move_segment.name] = kinematics.fk(end_qpos, base_pose)[
                    gripper_mg_id
                ]
pick_planner_policy

Classes:

Name Description
PickPlannerPolicy

Attributes:

Name Type Description
log
log module-attribute
log = getLogger(__name__)
PickPlannerPolicy
PickPlannerPolicy(config: MlSpacesExpConfig, task: BaseMujocoTask)

Bases: BaseObjectManipulationPlannerPolicy

Methods:

Name Description
add_auxiliary_objects
check_feasible_ik
get_action
get_all_phases
get_info

Get additional information from the policy. Called after episode ended. This method can be

get_phase
reset

Attributes:

Name Type Description
action_idx
action_primitives
config
ik_warmed_up
planners
policy_config
retry_count
robot_view
sequential_ik_failures
target_poses
task
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def __init__(self, config: MlSpacesExpConfig, task: BaseMujocoTask) -> None:
    assert isinstance(config.policy_config, ObjectManipulationPlannerPolicyConfig)
    super().__init__(config, task)
    self.policy_config = config.policy_config
    self.robot_view = task.env.current_robot.robot_view

    self.action_primitives = []
    self.action_idx = 0
    self.retry_count = 0
    self.target_poses = {}
    self.ik_warmed_up = False
    self.sequential_ik_failures = 0
action_idx instance-attribute
action_idx = 0
action_primitives instance-attribute
action_primitives = []
config instance-attribute
config = config
ik_warmed_up instance-attribute
ik_warmed_up = False
planners property
planners
policy_config instance-attribute
policy_config = policy_config
retry_count instance-attribute
retry_count = 0
robot_view instance-attribute
robot_view = robot_view
sequential_ik_failures instance-attribute
sequential_ik_failures = 0
target_poses instance-attribute
target_poses = {}
task instance-attribute
task = task
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
@staticmethod
def add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None:
    PlannerPolicy.add_auxiliary_objects(config, spec)
    policy_config = config.policy_config
    assert isinstance(policy_config, ObjectManipulationPlannerPolicyConfig)
    if policy_config.filter_colliding_grasps:
        add_grasp_collision_bodies(
            spec,
            policy_config.grasp_collision_batch_size,
            policy_config.grasp_width,
            policy_config.grasp_length,
            policy_config.grasp_height,
            np.array(policy_config.grasp_base_pos),
        )
check_feasible_ik
check_feasible_ik(pose: ndarray) -> bool | ndarray
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def check_feasible_ik(self, pose: np.ndarray) -> bool | np.ndarray:
    if pose.ndim > 2:
        assert pose.shape[1:] == (4, 4)
        batch_size = pose.shape[0]
        assert batch_size <= self.policy_config.grasp_feasibility_batch_size

        if batch_size == 0:
            return np.array([], dtype=bool)

        # pad to the batch size to avoid triggering recompilation of the IK solver
        if batch_size < self.policy_config.grasp_feasibility_batch_size:
            n_pad = self.policy_config.grasp_feasibility_batch_size - batch_size
            pose = np.concatenate([pose, np.broadcast_to(pose[-1:], (n_pad, 4, 4))])

        robot_view = self.task.env.current_robot.robot_view
        parallel_kinematics = self.task._env.robots[0].parallel_kinematics
        jp_dicts = parallel_kinematics.ik(
            pose,
            robot_view.get_qpos_dict(),
            robot_view.base.pose,
            rel_to_base=False,
            posture_weight=0.0,
        )
        return np.array([jp_dict is not None for jp_dict in jp_dicts[:batch_size]])
    else:
        assert pose.shape == (4, 4)
        robot_view = self.task.env.current_robot.robot_view
        kinematics = self.task.env.current_robot.kinematics
        gripper_mg_id = robot_view.get_gripper_movegroup_ids()[0]
        jp_dict = kinematics.ik(
            gripper_mg_id,
            pose,
            robot_view.move_group_ids(),
            robot_view.get_qpos_dict(),
            base_pose=robot_view.base.pose,
        )
        return jp_dict is not None
get_action
get_action(info: dict[str, Any]) -> dict[str, Any]
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_action(self, info: dict[str, Any]) -> dict[str, Any]:
    if self._check_for_failures():
        return self._handle_failure()

    # get the action from the current action primitive, absorbing 0-length segments as necessary
    while self.action_idx < len(self.action_primitives):  # guaranteed to terminate
        act_prim = self.action_primitives[self.action_idx]
        proceed = act_prim.execute()
        # cannot have a 0-length segment that doesn't proceed
        assert proceed or act_prim.duration > 0.0
        if proceed:
            self.action_idx += 1
        if not proceed or act_prim.duration > 0.0:
            break
    if self.action_idx < len(self.action_primitives):
        action = act_prim.get_current_action()
    else:
        action = self.action_primitives[-1].get_current_action()
        action["done"] = True
    return action
get_all_phases
get_all_phases()
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_all_phases(self):
    phases = super().get_all_phases()
    # Collect all possible phases here, even those not used for a particular trajectory computation
    new_phases = {
        "gripper-open": 1,
        "pregrasp": 2,
        "grasp": 3,
        "gripper-close": 4,
        "lift": 5,
        "preplace": 6,
        "place": 7,
        "retreat": 8,
        "go_home": 9,
    }
    phases.update(new_phases)
    return phases
get_info
get_info() -> dict

Get additional information from the policy. Called after the episode has ended. This method can be overridden by subclasses to provide extra information about the policy's state. The returned dictionary must be JSON serializable.

Returns:

Type Description
dict

A dictionary containing additional information about the policy.

Source code in molmo_spaces/policy/base_policy.py
def get_info(self) -> dict:
    """
    Get additional information from the policy. Called after the episode has ended. This
    method can be overridden by subclasses to provide extra information about the policy's
    state. Must be JSON-serializable.

    Returns:
        A dictionary containing additional information about the policy.
    """
    return {}
get_phase
get_phase() -> str

Return the phase name of the current (or final) action primitive.

Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_phase(self) -> str:
    if self.action_idx < len(self.action_primitives):
        act_prim = self.action_primitives[self.action_idx]
    else:
        act_prim = self.action_primitives[-1]
    return act_prim.get_current_phase()
reset
reset(reset_retries: bool = True)

Warm up the IK solver on first use, recompute the action-primitive trajectory, reset execution indices, and cache each move segment's target pose.

Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def reset(self, reset_retries: bool = True):
    if not self.ik_warmed_up:
        with Timer() as warmup_time:
            self.task.env.current_robot.parallel_kinematics.warmup_ik(
                self.policy_config.grasp_feasibility_batch_size
            )
        self.ik_warmed_up = True
        log.info(f"Warmed up parallel IK solver in {warmup_time.value:.3f}s")

    self.action_primitives = self._compute_trajectory()

    self.action_idx = 0
    if reset_retries:
        self.retry_count = 0

    self.target_poses = {}
    for action_primitive in self.action_primitives:
        if isinstance(action_primitive, TCPMoveSequence):
            for move_segment in action_primitive.move_segments:
                self.target_poses[move_segment.name] = move_segment.end_pose
        elif isinstance(action_primitive, JointMoveSequence):
            gripper_mg_id = self.robot_view.get_gripper_movegroup_ids()[0]
            kinematics = self.task.env.current_robot.kinematics
            for move_segment in action_primitive.move_segments:
                end_qpos = move_segment.end_qpos
                base_pose = self.robot_view.base.pose
                self.target_poses[move_segment.name] = kinematics.fk(end_qpos, base_pose)[
                    gripper_mg_id
                ]

opening_solver

Classes:

Name Description
DoorOpeningPhase

Enumeration of door opening phases.

DoorOpeningPlannerPolicy

Planner policy for RBY1 door opening tasks using motion planning.

Attributes:

Name Type Description
log
log module-attribute
log = getLogger(__name__)
DoorOpeningPhase

Bases: IntEnum

Enumeration of door opening phases.

Attributes:

Name Type Description
COMPLETE
GRASP_HANDLE
NAVIGATE_TO_DOOR
OPEN_DOOR
REACH_GRASP
REACH_PRE_GRASP
RECOVERY
RELEASE_HANDLE
COMPLETE class-attribute instance-attribute
COMPLETE = 6
GRASP_HANDLE class-attribute instance-attribute
GRASP_HANDLE = 3
NAVIGATE_TO_DOOR class-attribute instance-attribute
NAVIGATE_TO_DOOR = 0
OPEN_DOOR class-attribute instance-attribute
OPEN_DOOR = 4
REACH_GRASP class-attribute instance-attribute
REACH_GRASP = 2
REACH_PRE_GRASP class-attribute instance-attribute
REACH_PRE_GRASP = 1
RECOVERY class-attribute instance-attribute
RECOVERY = 7
RELEASE_HANDLE class-attribute instance-attribute
RELEASE_HANDLE = 5
DoorOpeningPlannerPolicy
DoorOpeningPlannerPolicy(config: MlSpacesExpConfig, task: DoorOpeningTask | None = None)

Bases: CuroboPlannerPolicy

Planner policy for RBY1 door opening tasks using motion planning.

Inherits common motion planning functionality from CuroboPlannerPolicy.

Methods:

Name Description
add_auxiliary_objects

Add auxiliary objects to the scene that might be required for the policy.

batch_plan_trajectory

Plan trajectory using batch motion planning.

clip_to_velocity_constraint

Clip action to respect velocity constraints.

get_action

Get the next action based on current task state and phase.

get_all_phases

Return a dictionary mapping all phase names to their enum values.

get_info

Get additional information from the policy; called after the episode has ended.

get_look_at_action
get_phase

Return the current phase name as a string.

reset

Reset the policy state.

select_arm

Select which arm to use based on distance to pickup object.

select_arm_for_opening

Select the arm for opening the door.

solve_ik

Solve inverse kinematics for a target pose and create interpolated trajectory.

visualize_world_config_mesh

Visualize the world configuration as a mesh file.

Attributes:

Name Type Description
arm_end_idx int
arm_side
arm_start_idx int
articulate_deltas
config
curr_articulation_step
current_gripper_command dict[str, float]
current_phase
first_pushing_articulation_deltas
grasping_timesteps
is_done bool

Property to expose completion state for task checking.

joint_position_tolerance
left_motion_planner
max_steps_per_waypoint
num_steps_for_articulation
phase_name str

Property to expose current phase name for task tracking.

phase_value int

Property to expose the current phase value for task tracking.

planned_trajectory list[list[float]] | None
planner CuroboPlanner | None
planner_joint_ranges dict[str, tuple[int, int]]
planners dict[str, CuroboPlanner]
planning_failures
pre_grasp_distance
profiler
recovery_step_count
retry_count int
right_motion_planner
steps_spent_in_waypoint int
task
trajectory_index int
Source code in molmo_spaces/policy/solvers/opening_solver.py
def __init__(
    self,
    config: MlSpacesExpConfig,
    task: DoorOpeningTask | None = None,
) -> None:
    super().__init__(config, task)
    self.task = task  # Added to help IDE typing

    # This policy uses both planners (unlike pick-and-place which lazy-loads one)
    self.left_motion_planner = CuroboPlanner(
        config=config.policy_config.left_curobo_planner_config
    )
    self.right_motion_planner = CuroboPlanner(
        config=config.policy_config.right_curobo_planner_config
    )

    # Planner policy state
    self.arm_side = None
    self.current_phase = DoorOpeningPhase.REACH_PRE_GRASP
    self.planning_failures = 0

    # Trajectory execution parameters
    self.max_steps_per_waypoint = self.config.policy_config.max_steps_per_waypoint
    self.joint_position_tolerance = self.config.policy_config.joint_position_tolerance

    # Door opening parameters
    self.pre_grasp_distance = self.config.policy_config.pre_grasp_distance
    self.articulate_deltas = self.config.policy_config.articulation_deltas
    self.first_pushing_articulation_deltas = (
        self.config.policy_config.first_pushing_articulation_deltas
    )

    # Phase-specific state
    self.grasping_timesteps = 0
    self.recovery_step_count = 0

    # Articulation tracking
    self.curr_articulation_step = 0
    self.num_steps_for_articulation = len(self.articulate_deltas)
arm_end_idx instance-attribute
arm_end_idx: int = 0
arm_side instance-attribute
arm_side = None
arm_start_idx instance-attribute
arm_start_idx: int = 0
articulate_deltas instance-attribute
articulate_deltas = articulation_deltas
config instance-attribute
config = config
curr_articulation_step instance-attribute
curr_articulation_step = 0
current_gripper_command instance-attribute
current_gripper_command: dict[str, float] = {}
current_phase instance-attribute
current_phase = REACH_PRE_GRASP
first_pushing_articulation_deltas instance-attribute
first_pushing_articulation_deltas = first_pushing_articulation_deltas
grasping_timesteps instance-attribute
grasping_timesteps = 0
is_done property
is_done: bool

Property to expose completion state for task checking.

joint_position_tolerance instance-attribute
joint_position_tolerance = joint_position_tolerance
left_motion_planner instance-attribute
left_motion_planner = CuroboPlanner(config=left_curobo_planner_config)
max_steps_per_waypoint instance-attribute
max_steps_per_waypoint = max_steps_per_waypoint
num_steps_for_articulation instance-attribute
num_steps_for_articulation = len(articulate_deltas)
phase_name property
phase_name: str

Property to expose current phase name for task tracking.

phase_value property
phase_value: int

Property to expose the current phase value for task tracking.

planned_trajectory instance-attribute
planned_trajectory: list[list[float]] | None = None
planner instance-attribute
planner: CuroboPlanner | None = None
planner_joint_ranges instance-attribute
planner_joint_ranges: dict[str, tuple[int, int]] = {}
planners property
planners: dict[str, CuroboPlanner]
planning_failures instance-attribute
planning_failures = 0
pre_grasp_distance instance-attribute
pre_grasp_distance = pre_grasp_distance
profiler instance-attribute
profiler = Profiler()
recovery_step_count instance-attribute
recovery_step_count = 0
retry_count instance-attribute
retry_count: int = 0
right_motion_planner instance-attribute
right_motion_planner = CuroboPlanner(config=right_curobo_planner_config)
steps_spent_in_waypoint instance-attribute
steps_spent_in_waypoint: int = 0
task instance-attribute
task = task
trajectory_index instance-attribute
trajectory_index: int = 0
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None

Add auxiliary objects to the scene that might be required for the policy. Args: config: The experiment configuration for the policy. spec: The MuJoCo scene specification to extend.

Source code in molmo_spaces/policy/base_policy.py
@staticmethod
def add_auxiliary_objects(config: "MlSpacesExpConfig", spec: MjSpec) -> None:
    """
    Add auxiliary objects to the scene that might be required for the policy.
    Args:
        config: The experiment configuration for the policy.
        spec: The MuJoCo scene specification to extend.
    """
    return None
batch_plan_trajectory
batch_plan_trajectory() -> None

Plan trajectory using batch motion planning.

Uses the current phase to determine goal poses and plans trajectories in batches for efficiency. Sets self.planned_trajectory to the best trajectory found.

Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
def batch_plan_trajectory(self) -> None:
    """Plan trajectory using batch motion planning.

    Uses the current phase to determine goal poses and plans trajectories
    in batches for efficiency. Sets self.planned_trajectory to the best
    trajectory found.
    """
    init_config = np.concatenate(
        [
            np.zeros(3),
            self.task.env.current_robot.robot_view.get_move_group(
                f"{self.arm_side}_arm"
            ).joint_pos,
        ]
    )

    # Setup collision avoidance if enabled
    if getattr(self.config.policy_config, "enable_collision_avoidance", False):
        if hasattr(self, "_setup_collision_avoidance_config"):
            self._setup_collision_avoidance_config()

    # Get goal poses based on current phase (to be provided by subclass)
    goal_poses = self._get_batch_goal_poses()
    if goal_poses is None or len(goal_poses) == 0:
        log.warning("[BATCH PLAN] No goal poses available")
        return

    total = goal_poses.shape[0]
    batch_size = getattr(self.config.policy_config, "batch_size", 8)
    max_batches = getattr(self.config.policy_config, "max_batch_plan_attempts", 4)
    num_poses = min(max_batches * batch_size, total)
    num_batches = (num_poses + batch_size - 1) // batch_size

    all_successful_trajectories = []
    current_phase = getattr(self, "current_phase", "unknown")

    for batch_start in range(0, num_poses, batch_size):
        batch_end = min(batch_start + batch_size, num_poses)
        batch = goal_poses[batch_start:batch_end]

        if hasattr(self, "_show_poses"):
            self._show_poses(batch, style="tcp")
        if self.task.viewer:
            self.task.viewer.sync()

        log.info(
            f"Processing batch {batch_start // batch_size + 1}/{num_batches}: "
            f"poses {batch_start}-{batch_end - 1}"
        )

        # Transform poses to base frame and convert to 7D format
        batch_base_frame_7d = []
        for pose in batch:
            pose_base = self._target_pose_to_base_frame(pose)
            batch_base_frame_7d.append(pose_base.tolist())

        # Prepare batch inputs for planner
        batch_len = len(batch_base_frame_7d)
        start_states = [init_config.tolist()] * batch_len

        # Plan batch
        log.info(
            f"Planning batch of {batch_len} {current_phase} poses with {self.arm_side} arm"
        )
        result = self.planner.plan_batch(
            start_states,
            batch_base_frame_7d,
            verbose=False,
        )

        # Check for successes
        successes = result.success.cpu().numpy()
        log.info(
            f"{self.arm_side.capitalize()} arm: {np.sum(successes)}/{batch_len} "
            f"{current_phase} planning successes"
        )

        # Process successful trajectories
        if np.any(successes):
            optimized_plan = result.optimized_plan
            position = optimized_plan.position
            if position.ndim == 2:
                position = position.unsqueeze(0)

            for i in range(batch_len):
                if not successes[i]:
                    continue

                trajectory = []
                for t in range(position.shape[1]):
                    waypoint = position[i, t].cpu().tolist()
                    trajectory.append(waypoint)

                self._transform_traj_to_world_frame(trajectory)
                all_successful_trajectories.append(trajectory)

        if all_successful_trajectories:
            log.info(
                f"Found {len(all_successful_trajectories)} successful trajectories, "
                "skipping remaining batches"
            )
            break

    if all_successful_trajectories:
        self.planned_trajectory = self._select_best_trajectory(all_successful_trajectories)
    else:
        log.warning("[BATCH PLAN] No successful trajectories found across all batches")
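The batch-sizing arithmetic above (cap the candidate poses at `max_batches * batch_size`, then ceil-divide into batches) can be checked in isolation; the totals here are illustrative, and the defaults mirror the `getattr` fallbacks in the listing:

```python
# Illustrative check of the batch-sizing arithmetic used in batch_plan_trajectory.
def batch_plan_sizes(total, batch_size=8, max_batches=4):
    num_poses = min(max_batches * batch_size, total)          # cap candidate poses
    num_batches = (num_poses + batch_size - 1) // batch_size  # ceiling division
    return num_poses, num_batches

print(batch_plan_sizes(100))  # -> (32, 4): capped at max_batches * batch_size
print(batch_plan_sizes(11))   # -> (11, 2): the last batch holds the remainder of 3
```

Note that the loop may also exit early: as soon as one batch yields any successful trajectory, the remaining batches are skipped.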
clip_to_velocity_constraint
clip_to_velocity_constraint(action: dict[str, Any]) -> dict[str, Any]

Clip action to respect velocity constraints.

Parameters:

Name Type Description Default
action dict[str, Any]

Dictionary of commanded joint positions by move group.

required

Returns:

Type Description
dict[str, Any]

Clipped action dictionary.

Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
def clip_to_velocity_constraint(self, action: dict[str, Any]) -> dict[str, Any]:
    """Clip action to respect velocity constraints.

    Args:
        action: Dictionary of commanded joint positions by move group.

    Returns:
        Clipped action dictionary.
    """
    velocity_constraints = getattr(self.config.policy_config, "velocity_constraints", {})
    clipped_action = action.copy()

    for move_group, commanded_action in action.items():
        if move_group in velocity_constraints:
            max_velocity_rad_per_s = velocity_constraints[move_group]
            move_group_view = self.task.env.robots[0].robot_view.get_move_group(move_group)
            move_group_joint_pos = move_group_view.joint_pos.copy()

            # Calculate difference, handling angular wraparound for base theta
            diff = commanded_action - move_group_joint_pos
            if move_group == "base":
                # Normalize theta difference to [-π, π]
                diff[2] = np.arctan2(np.sin(diff[2]), np.cos(diff[2]))

            velocity_rad_per_s = diff * 10.0  # per-step delta -> rad/s, assuming a 10 Hz control step

            # Clip velocity and recalculate commanded action
            clipped_velocity = np.clip(
                velocity_rad_per_s, -max_velocity_rad_per_s, max_velocity_rad_per_s
            )
            commanded_action = move_group_joint_pos + clipped_velocity / 10.0

            clipped_action[move_group] = commanded_action

    return clipped_action
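The clipping logic above can be sketched for a single move group. The `control_hz=10.0` factor is an assumption matching the hard-coded `10.0` in the listing (presumably a 10 Hz control step); `clip_step` and `wrap_idx` are hypothetical names:

```python
import numpy as np

# Sketch of clip_to_velocity_constraint for one move group, assuming a 10 Hz
# control step. wrap_idx marks an angular component (e.g. base theta) that
# must be normalized to [-pi, pi] before clipping.
def clip_step(current, commanded, max_vel, control_hz=10.0, wrap_idx=None):
    diff = np.asarray(commanded, float) - np.asarray(current, float)
    if wrap_idx is not None:
        # quadrant-aware normalization: take the short way around the circle
        diff[wrap_idx] = np.arctan2(np.sin(diff[wrap_idx]), np.cos(diff[wrap_idx]))
    vel = np.clip(diff * control_hz, -max_vel, max_vel)
    return np.asarray(current, float) + vel / control_hz

current = np.array([0.0, 0.0, 3.1])
commanded = np.array([0.0, 0.0, -3.1])  # naive theta diff would be -6.2 rad
clipped = clip_step(current, commanded, max_vel=0.5, wrap_idx=2)
# theta moves the short way round (about +0.083 rad) and is velocity-clipped
# to a 0.05 rad step: clipped[2] == 3.15
```

Without the wraparound normalization, the base would swing nearly a full turn in the wrong direction instead of taking the 0.083 rad shortcut.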
get_action
get_action(info: dict[str, Any]) -> dict[str, Any]

Get the next action based on current task state and phase.

Source code in molmo_spaces/policy/solvers/opening_solver.py
def get_action(self, info: dict[str, Any]) -> dict[str, Any]:
    """Get the next action based on current task state and phase."""
    self._print_distance_to_handle()

    if self.arm_side is None:
        self.arm_side = self.select_arm_for_opening()
        if self.config.policy_config.verbose:
            log.info(f"[ARM] Selected {self.arm_side} arm for door opening")

    action_cmd = {}
    if self.current_phase == DoorOpeningPhase.NAVIGATE_TO_DOOR:
        action_cmd = self._execute_navigate_phase()
    elif self.current_phase == DoorOpeningPhase.REACH_PRE_GRASP:
        action_cmd = self._execute_reach_pre_grasp_phase()
    elif self.current_phase == DoorOpeningPhase.REACH_GRASP:
        action_cmd = self._execute_reach_grasp_phase()
    elif self.current_phase == DoorOpeningPhase.GRASP_HANDLE:
        action_cmd = self._execute_grasp_handle_phase()
    elif self.current_phase == DoorOpeningPhase.OPEN_DOOR:
        action_cmd = self._execute_open_phase()
    elif self.current_phase == DoorOpeningPhase.RELEASE_HANDLE:
        action_cmd = self._execute_release_phase()
    elif self.current_phase == DoorOpeningPhase.COMPLETE:
        action_cmd = self._execute_complete_phase()
    elif self.current_phase == DoorOpeningPhase.RECOVERY:
        action_cmd = self._execute_recovery_phase()
    else:
        raise ValueError(f"Unknown phase: {self.current_phase}")

    action_cmd.update(self.get_look_at_action())
    action_cmd = self.clip_to_velocity_constraint(action_cmd)

    return action_cmd
get_all_phases
get_all_phases() -> dict[str, int]

Return a dictionary mapping all phase names to their enum values.

Source code in molmo_spaces/policy/solvers/opening_solver.py
def get_all_phases(self) -> dict[str, int]:
    """Return a dictionary mapping all phase names to their enum values."""
    return {phase.name: phase.value for phase in DoorOpeningPhase}
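The mapping above can be exercised with a standalone replica of `DoorOpeningPhase`; the values below are copied from the attribute listing earlier in this section:

```python
from enum import IntEnum

# Standalone replica of DoorOpeningPhase (values from the attribute listing above).
class DoorOpeningPhase(IntEnum):
    NAVIGATE_TO_DOOR = 0
    REACH_PRE_GRASP = 1
    REACH_GRASP = 2
    GRASP_HANDLE = 3
    OPEN_DOOR = 4
    RELEASE_HANDLE = 5
    COMPLETE = 6
    RECOVERY = 7

# get_all_phases builds the name -> value mapping with a dict comprehension:
phases = {phase.name: phase.value for phase in DoorOpeningPhase}
print(phases["REACH_PRE_GRASP"])  # -> 1
print(DoorOpeningPhase(6).name)   # -> COMPLETE
```

Because the class is an `IntEnum`, phases compare and serialize as plain integers, which is what makes the name-to-value mapping JSON-friendly.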
get_info
get_info() -> dict

Get additional information from the policy. Called after the episode has ended. This method can be overridden by subclasses to provide extra information about the policy's state. Must be JSON-serializable.

Returns:

Type Description
dict

A dictionary containing additional information about the policy.

Source code in molmo_spaces/policy/base_policy.py
def get_info(self) -> dict:
    """
    Get additional information from the policy. Called after the episode has ended. This
    method can be overridden by subclasses to provide extra information about the policy's
    state. Must be JSON-serializable.

    Returns:
        A dictionary containing additional information about the policy.
    """
    return {}
get_look_at_action
get_look_at_action() -> dict[str, Any]

Return a head action that keeps the door handle in view.

Source code in molmo_spaces/policy/solvers/opening_solver.py
def get_look_at_action(self) -> dict[str, Any]:
    target_pos = self.task.get_door_handle_position()
    return {"head": self.task.get_target_head_pose(target_pos)}
get_phase
get_phase() -> str

Return the current phase name as a string.

Source code in molmo_spaces/policy/solvers/opening_solver.py
def get_phase(self) -> str:
    """Return the current phase name as a string."""
    return self.current_phase.name
reset
reset() -> None

Reset the policy state.

Source code in molmo_spaces/policy/solvers/opening_solver.py
def reset(self) -> None:
    """Reset the policy state."""
    super().reset()
    log.debug("[RESET] Resetting DoorOpeningPlannerPolicy state")
    self.arm_side = None
    self.current_phase = DoorOpeningPhase.REACH_PRE_GRASP
    self.grasping_timesteps = 0
    self.planning_failures = 0
    self.curr_articulation_step = 0
    self.recovery_step_count = 0

    # Reset motion planners
    self.left_motion_planner.reset()
    self.right_motion_planner.reset()
select_arm
select_arm() -> None

Select which arm to use based on distance to pickup object.

Also instantiates the motion planner for the selected arm. This lazy initialization saves ~11GB of GPU memory by only loading one arm's planner.

Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
def select_arm(self) -> None:
    """Select which arm to use based on distance to pickup object.

    Also instantiates the motion planner for the selected arm.
    This lazy initialization saves ~11GB of GPU memory by only loading one arm's planner.
    """
    task_config = self.config.task_config
    om = self.task.env.object_managers[self.task.env.current_batch_index]
    pickup_obj = om.get_object_by_name(task_config.pickup_obj_name)
    pickup_obj_pos = pickup_obj.position

    left_tcp_pose = self.task.env.current_robot.robot_view.get_move_group(
        "left_gripper"
    ).leaf_frame_to_world
    right_tcp_pose = self.task.env.current_robot.robot_view.get_move_group(
        "right_gripper"
    ).leaf_frame_to_world

    left_tcp_pos = left_tcp_pose[:3, 3]
    right_tcp_pos = right_tcp_pose[:3, 3]

    # Compute distances
    left_dist = np.linalg.norm(left_tcp_pos - pickup_obj_pos)
    right_dist = np.linalg.norm(right_tcp_pos - pickup_obj_pos)

    selected_arm = "left" if left_dist < right_dist else "right"
    log.info(
        f"Selected {selected_arm} arm (left dist: {left_dist:.3f}m, right dist: {right_dist:.3f}m)"
    )

    self.arm_side = selected_arm

    # Instantiate the planner for the selected arm only
    log.info(f"Instantiating motion planner for {selected_arm} arm")
    if selected_arm == "left":
        self.planner = CuroboPlanner(
            config=self.config.policy_config.left_curobo_planner_config
        )
        self.planner_joint_ranges = self.config.policy_config.left_planner_joint_ranges
    else:
        self.planner = CuroboPlanner(
            config=self.config.policy_config.right_curobo_planner_config
        )
        self.planner_joint_ranges = self.config.policy_config.right_planner_joint_ranges

    self.arm_start_idx = self.planner_joint_ranges[f"{self.arm_side}_arm"][0]
    self.arm_end_idx = self.planner_joint_ranges[f"{self.arm_side}_arm"][1]
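The TCP-to-object distance comparison that drives `select_arm` can be reproduced in isolation; `nearest_arm` is a hypothetical helper name:

```python
import numpy as np

# The distance comparison behind select_arm: pick the arm whose TCP is
# closest to the pickup object (ties go to the right arm, as in the source).
def nearest_arm(left_tcp_pos, right_tcp_pos, obj_pos):
    left_dist = np.linalg.norm(np.subtract(left_tcp_pos, obj_pos))
    right_dist = np.linalg.norm(np.subtract(right_tcp_pos, obj_pos))
    return "left" if left_dist < right_dist else "right"

print(nearest_arm([0.3, 0.4, 1.0], [0.3, -0.4, 1.0], [0.6, 0.3, 0.9]))  # -> left
```

In the real method the TCP positions are the translation columns (`pose[:3, 3]`) of each gripper move group's `leaf_frame_to_world` transform.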
select_arm_for_opening
select_arm_for_opening() -> str

Select the arm for opening the door.

Also sets self.arm_side and self.planner_joint_ranges for compatibility with base class methods.

Source code in molmo_spaces/policy/solvers/opening_solver.py
def select_arm_for_opening(self) -> str:
    """Select the arm for opening the door.

    Also sets self.arm_side and self.planner_joint_ranges for compatibility
    with base class methods.
    """
    robot_base_pose_tf = self.task.env.robots[0].get_world_pose_tf_mat()
    door_joint_pos_world = self.task.get_door_joint_position()
    door_handle_pos_world = self.task.get_door_handle_position()

    log.debug("\n[ARM SELECTION] Analyzing door geometry:")
    log.debug(f"  - Robot base position: {robot_base_pose_tf[:3, 3]}")
    log.debug(f"  - Door joint position: {door_joint_pos_world}")
    log.debug(f"  - Door handle position: {door_handle_pos_world}")

    door_joint_pos_robot = np.linalg.inv(robot_base_pose_tf) @ np.array(
        [*door_joint_pos_world, 1.0]
    )
    door_handle_pos_robot = np.linalg.inv(robot_base_pose_tf) @ np.array(
        [*door_handle_pos_world, 1.0]
    )

    selected_arm = "left" if door_joint_pos_robot[1] > door_handle_pos_robot[1] else "right"

    log.debug(f"  - Door joint in robot frame: {door_joint_pos_robot[:3]}")
    log.debug(f"  - Door handle in robot frame: {door_handle_pos_robot[:3]}")
    log.debug(
        f"  - Selected arm: {selected_arm} (joint Y: {door_joint_pos_robot[1]:.3f}, "
        f"handle Y: {door_handle_pos_robot[1]:.3f})"
    )

    # Set base class properties for compatibility with shared methods
    self.arm_side = selected_arm
    self.planner_joint_ranges = self._get_planner_joint_ranges()

    return selected_arm
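The homogeneous-coordinate geometry above (transform world points into the robot base frame, then compare lateral Y offsets of hinge and handle) can be sketched as follows; `world_to_robot` is a hypothetical helper name:

```python
import numpy as np

# Sketch of the arm-selection geometry in select_arm_for_opening: express the
# door hinge and handle in the robot base frame, then compare Y components.
def world_to_robot(base_pose_tf, point_world):
    # inverse base pose times homogeneous world point -> point in robot frame
    return (np.linalg.inv(base_pose_tf) @ np.array([*point_world, 1.0]))[:3]

# Robot at the origin facing +X; hinge to the robot's left of the handle.
base_pose = np.eye(4)
hinge = world_to_robot(base_pose, [1.0, 0.5, 1.0])
handle = world_to_robot(base_pose, [1.0, -0.3, 1.0])
arm = "left" if hinge[1] > handle[1] else "right"
print(arm)  # -> left
```

Intuitively: if the hinge sits to the robot's left of the handle, the door swings open toward the left arm's workspace, so the left arm is selected.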
solve_ik
solve_ik(target_pose: ndarray) -> None

Solve inverse kinematics for a target pose and create interpolated trajectory.

Parameters:

Name Type Description Default
target_pose ndarray

4x4 transformation matrix for target end-effector pose in world frame.

required

Raises:

Type Description
ValueError

If IK solution cannot be found.

Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
def solve_ik(self, target_pose: np.ndarray) -> None:
    """Solve inverse kinematics for a target pose and create interpolated trajectory.

    Args:
        target_pose: 4x4 transformation matrix for target end-effector pose in world frame.

    Raises:
        ValueError: If IK solution cannot be found.
    """
    init_config = np.concatenate(
        [
            np.zeros(3),
            self.task.env.current_robot.robot_view.get_move_group(
                f"{self.arm_side}_arm"
            ).joint_pos,
        ]
    )

    target_pose_7d = self._target_pose_to_base_frame(target_pose)
    joint_config, _ = self.planner.ik_solve(
        goal_pose=target_pose_7d.tolist(),
        seed_config=init_config.tolist(),
        disable_collision=True,
    )

    current_phase = getattr(self, "current_phase", "unknown")
    if joint_config is None:
        raise ValueError(f"Could not solve {current_phase} phase IK.")

    trajectory = self._interpolate_joint_trajectory(
        init_config,
        np.array(joint_config),
        num_steps=10,
    )
    self._transform_traj_to_world_frame(trajectory)
    self.planned_trajectory = trajectory
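The interpolation step at the end of `solve_ik` calls the private `_interpolate_joint_trajectory` helper, whose body is not shown here; a plausible linear version, under that assumption, looks like this:

```python
import numpy as np

# A plausible linear sketch of _interpolate_joint_trajectory (the real helper
# is private, so this is an assumption): num_steps waypoints stepping from
# just past the start configuration up to and including the goal.
def interpolate_joint_trajectory(start, goal, num_steps=10):
    start, goal = np.asarray(start, float), np.asarray(goal, float)
    return [
        (start + (goal - start) * (t / num_steps)).tolist()
        for t in range(1, num_steps + 1)
    ]

traj = interpolate_joint_trajectory([0.0, 0.0], [1.0, -2.0], num_steps=4)
print(len(traj))  # -> 4
print(traj[-1])   # -> [1.0, -2.0]
```

Linear joint-space interpolation is a reasonable fallback when full motion planning is skipped (collision checking is also disabled in the IK call above), but it does not avoid obstacles along the way.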
visualize_world_config_mesh
visualize_world_config_mesh(world_cfg: WorldConfig) -> None

Visualize the world configuration as a mesh file.

Parameters:

Name Type Description Default
world_cfg WorldConfig

Curobo WorldConfig to visualize.

required
Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
def visualize_world_config_mesh(self, world_cfg: WorldConfig) -> None:
    """Visualize the world configuration as a mesh file.

    Args:
        world_cfg: Curobo WorldConfig to visualize.
    """
    from datetime import datetime

    import trimesh
    from curobo.geom.types import WorldConfig

    current_phase = getattr(self, "current_phase", "unknown")
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    file_name = f"world_config_mesh_{current_phase}_{timestamp}"
    obj_path = f"{file_name}.obj"
    initial_joint_configuration = np.array(self._get_current_joint_positions())

    try:
        # Create a combined scene with robot and world obstacles
        combined_scene = trimesh.scene.scene.Scene(base_frame="world_origin")

        # Add robot mesh if we have a joint configuration
        if initial_joint_configuration is not None and self.planner is not None:
            try:
                config_for_mesh = np.array(initial_joint_configuration).copy()
                # Zero out base joints to place robot at origin
                config_for_mesh[:3] = 0.0
                q_tensor = torch.tensor(config_for_mesh).unsqueeze(0).float().cuda()

                robot_spheres_batch = self.planner.motion_gen.kinematics.get_robot_as_spheres(
                    q_tensor
                )
                if robot_spheres_batch and len(robot_spheres_batch) > 0:
                    robot_spheres = robot_spheres_batch[0]
                    for i, sphere in enumerate(robot_spheres):
                        sphere_mesh = trimesh.creation.icosphere(radius=sphere.radius)
                        sphere_transform = np.eye(4)
                        sphere_transform[:3, 3] = sphere.pose[:3]
                        combined_scene.add_geometry(
                            sphere_mesh,
                            geom_name=f"robot_sphere_{i}",
                            parent_node_name="world_origin",
                            transform=sphere_transform,
                        )
                    log.info(
                        f"Added {len(robot_spheres)} robot collision spheres to visualization"
                    )

            except Exception as e:
                log.warning(f"Could not add robot mesh: {e}")

        # Add world obstacles to the scene
        try:
            world_scene = WorldConfig.get_scene_graph(world_cfg, process_color=True)
            for geom_name, geom in world_scene.geometry.items():
                transform = world_scene.graph.get(geom_name)[0]
                combined_scene.add_geometry(
                    geom,
                    geom_name=geom_name,
                    parent_node_name="world_origin",
                    transform=transform,
                )
        except Exception as e:
            log.warning(f"Could not add world obstacles: {e}")

        # Export combined scene
        if len(combined_scene.geometry) > 0:
            combined_scene.export(obj_path)
            log.info(f"Successfully saved combined robot + world mesh to {obj_path}")
        else:
            log.debug("Skipping mesh export - scene is empty")

    except ValueError as e:
        if "empty scene" in str(e).lower():
            log.debug("Skipping mesh export - world config is empty")
        else:
            raise