policy

Modules:

Name Description
base_policy

Abstract Policy class to run a policy in the environment for data collection or evaluation.

dummy_policy
learned_policy
random_policy
solvers

base_policy

Abstract Policy class to run a policy in the environment for data collection or evaluation. This class is designed to be subclassed and implemented with specific policies. For example, the policy can be a planner, a human teleop interface, a reinforcement learning agent, or any other type of agent that can interact with the environment to collect data.
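The subclassing pattern described above can be sketched as follows. This is an illustrative stand-in that mirrors the documented interface only; the real `BasePolicy` (in `molmo_spaces/policy/base_policy.py`) also takes an `MlSpacesExpConfig` and an optional task, which are omitted here for brevity.

```python
from abc import ABC, abstractmethod

# Minimal stand-in for the BasePolicy interface (config/task plumbing omitted).
class Policy(ABC):
    @abstractmethod
    def get_action(self, observation):
        """Return the action to take for the given observation."""

    @abstractmethod
    def reset(self):
        """Clear any internal state at the start of an episode."""

class ConstantPolicy(Policy):
    """A trivially simple concrete policy: always emits the same control dict."""

    def __init__(self, ctrl):
        self.ctrl = dict(ctrl)

    def get_action(self, observation):
        return dict(self.ctrl)

    def reset(self):
        pass  # stateless: nothing to clear

policy = ConstantPolicy({"arm": [0.0] * 6})
policy.reset()
print(policy.get_action(observation=None))  # {'arm': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]}
```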

Classes:

Name Description
BasePolicy

Abstract base class for policies.

InferencePolicy
PlannerPolicy
StatefulPolicy

Attributes:

Name Type Description
NONE_PHASE

NONE_PHASE module-attribute

NONE_PHASE = -1

BasePolicy

BasePolicy(config: MlSpacesExpConfig, task: BaseMujocoTask | None = None)

Bases: ABC

Abstract base class for policies.

This class provides a template for policies that can be used to interact with an environment for the purpose of data collection or evaluation. It declares methods that should be implemented by any concrete policy class.

Methods:

Name Description
add_auxiliary_objects

Add auxiliary objects to the scene that might be required for the policy.

get_action

Decide on the action to take based on the current observation of the task or environment.

get_all_phases

Returns:

get_info

Get additional information from the policy. Called after the episode has ended.

get_phase

Returns:

reset

Reset the policy's internal state.

Attributes:

Name Type Description
config
task
Source code in molmo_spaces/policy/base_policy.py
def __init__(self, config: "MlSpacesExpConfig", task: BaseMujocoTask | None = None) -> None:
    self.config = config
    self.task = task
config instance-attribute
config = config
task instance-attribute
task = task
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None

Add auxiliary objects to the scene that might be required for the policy. Args: config: The configuration for the policy. spec: The MuJoCo scene specification.

Source code in molmo_spaces/policy/base_policy.py
@staticmethod
def add_auxiliary_objects(config: "MlSpacesExpConfig", spec: MjSpec) -> None:
    """
    Add auxiliary objects to the scene that might be required for the policy.
    Args:
        config: The configuration for the policy.
        spec: The MuJoCo scene specification.
    """
    return None
get_action abstractmethod
get_action(observation)

Decide on the action to take based on the current observation of the task or environment. Information could be observations, goals in the case of an rl_agent, or it could be the full environment state in the case of a planner.

Parameters:

Name Type Description Default
observation

The current information about the task or environment.

required

Returns:

Type Description

The action to take in response to the information.

Source code in molmo_spaces/policy/base_policy.py
@abstractmethod
def get_action(self, observation):
    """
    Decide on the action to take based on the current observation of the task or environment.
    Information could be observations, goals in the case of an rl_agent, or it could be the full
    environment state in the case of a planner.

    Args:
        observation: The current information about the task or environment.

    Returns:
        The action to take in response to the information.
    """
    pass
get_all_phases
get_all_phases() -> dict[str, int]

Returns:

Type Description
dict[str, int]

A dictionary of all possible policy phases

Source code in molmo_spaces/policy/base_policy.py
def get_all_phases(self) -> dict[str, int]:
    """
    Returns:
        A dictionary of all possible policy phases
    """
    return {"unknown": 0}
get_info
get_info() -> dict

Get additional information from the policy. Called after the episode has ended. This method can be overridden by subclasses to provide extra information about the policy's state. Must be JSON serializable.

Returns:

Type Description
dict

A dictionary containing additional information about the policy.

Source code in molmo_spaces/policy/base_policy.py
def get_info(self) -> dict:
    """
    Get additional information from the policy. Called after the episode has ended. This method
    can be overridden by subclasses to provide extra information about the policy's state.
    Must be JSON serializable.

    Returns:
        A dictionary containing additional information about the policy.
    """
    return {}
get_phase
get_phase() -> str

Returns:

Type Description
str

The policy phase

Source code in molmo_spaces/policy/base_policy.py
def get_phase(self) -> str:
    """
    Returns:
        The policy phase
    """
    return "unknown"
reset abstractmethod
reset()

Reset the policy's internal state.

This method should be implemented by each subclass to reset the policy's internal state. It is typically called at the beginning of each episode or task.

Source code in molmo_spaces/policy/base_policy.py
@abstractmethod
def reset(self):
    """
    Reset the policy's internal state.

    This method should be implemented by each subclass to reset the policy's internal state.
    It is typically called at the beginning of each episode or task.
    """
    pass

InferencePolicy

InferencePolicy(config: MlSpacesExpConfig, task_type)

Bases: BasePolicy

Methods:

Name Description
add_auxiliary_objects

Add auxiliary objects to the scene that might be required for the policy.

get_action
get_all_phases

Returns:

get_info
get_phase

Returns:

inference_model
model_output_to_action
obs_to_model_input
prepare_model
render

Not required to be implemented by subclasses.

reset

Attributes:

Name Type Description
config
current_phase
target_poses
task
task_type
Source code in molmo_spaces/policy/base_policy.py
def __init__(self, config: "MlSpacesExpConfig", task_type) -> None:
    super().__init__(config)
    self.task_type = task_type

    # TODO(max): remove these (added to silence warnings)
    self.target_poses = {"grasp": np.eye(4)}
    self.current_phase = NONE_PHASE
config instance-attribute
config = config
current_phase instance-attribute
current_phase = NONE_PHASE
target_poses instance-attribute
target_poses = {'grasp': eye(4)}
task instance-attribute
task = task
task_type instance-attribute
task_type = task_type
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None

Add auxiliary objects to the scene that might be required for the policy. Args: config: The configuration for the policy. spec: The MuJoCo scene specification.

Source code in molmo_spaces/policy/base_policy.py
@staticmethod
def add_auxiliary_objects(config: "MlSpacesExpConfig", spec: MjSpec) -> None:
    """
    Add auxiliary objects to the scene that might be required for the policy.
    Args:
        config: The configuration for the policy.
        spec: The MuJoCo scene specification.
    """
    return None
get_action
get_action(observation)
Source code in molmo_spaces/policy/base_policy.py
def get_action(self, observation):
    model_input = self.obs_to_model_input(observation)
    model_output = self.inference_model(model_input)
    action = self.model_output_to_action(model_output)
    return action
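The three-stage pipeline in `get_action` above (preprocess the observation, run the model, postprocess into an action) can be exercised end to end with a stand-in "model"; the real `InferencePolicy` leaves all three stages abstract, and the names below are illustrative.

```python
import numpy as np

# Sketch of the obs_to_model_input -> inference_model -> model_output_to_action flow.
class ScaledEchoPolicy:
    def obs_to_model_input(self, obs):
        # Preprocess: pull the proprioceptive state into a flat float array.
        return np.asarray(obs["qpos"], dtype=np.float64)

    def inference_model(self, model_input):
        # Stand-in "model": halve the joint state.
        return 0.5 * model_input

    def model_output_to_action(self, model_output):
        # Postprocess: wrap the raw output into a control dict.
        return {"arm": model_output}

    def get_action(self, observation):
        model_input = self.obs_to_model_input(observation)
        model_output = self.inference_model(model_input)
        return self.model_output_to_action(model_output)

policy = ScaledEchoPolicy()
action = policy.get_action({"qpos": [2.0, 4.0]})
print(action)  # {'arm': array([1., 2.])}
```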
get_all_phases
get_all_phases() -> dict[str, int]

Returns:

Type Description
dict[str, int]

A dictionary of all possible policy phases

Source code in molmo_spaces/policy/base_policy.py
def get_all_phases(self) -> dict[str, int]:
    """
    Returns:
        A dictionary of all possible policy phases
    """
    return {"unknown": 0}
get_info
get_info() -> dict
Source code in molmo_spaces/policy/base_policy.py
def get_info(self) -> dict:
    info = super().get_info()
    info["task_type"] = self.task_type
    info["timestamp"] = time.time()
    return info
get_phase
get_phase() -> str

Returns:

Type Description
str

The policy phase

Source code in molmo_spaces/policy/base_policy.py
def get_phase(self) -> str:
    """
    Returns:
        The policy phase
    """
    return "unknown"
inference_model abstractmethod
inference_model(model_input)
Source code in molmo_spaces/policy/base_policy.py
@abstractmethod
def inference_model(self, model_input):
    raise NotImplementedError
model_output_to_action abstractmethod
model_output_to_action(model_output)
Source code in molmo_spaces/policy/base_policy.py
@abstractmethod
def model_output_to_action(self, model_output):
    raise NotImplementedError
obs_to_model_input abstractmethod
obs_to_model_input(obs)
Source code in molmo_spaces/policy/base_policy.py
@abstractmethod
def obs_to_model_input(self, obs):
    raise NotImplementedError
prepare_model abstractmethod
prepare_model(model_name: str)
Source code in molmo_spaces/policy/base_policy.py
@abstractmethod
def prepare_model(self, model_name: str):
    raise NotImplementedError
render
render(obs)

Not required to be implemented by subclasses. Defaults to noop.

Source code in molmo_spaces/policy/base_policy.py
def render(self, obs):
    """
    Not required to be implemented by subclasses.
    Defaults to noop.
    """
    pass
reset abstractmethod
reset()
Source code in molmo_spaces/policy/base_policy.py
@abstractmethod
def reset(self):
    raise NotImplementedError

PlannerPolicy

PlannerPolicy(config: MlSpacesExpConfig, task: BaseMujocoTask | None = None)

Bases: BasePolicy

Methods:

Name Description
add_auxiliary_objects

Add auxiliary objects to the scene that might be required for the policy.

get_action

Decide on the action to take based on the current observation of the task or environment.

get_all_phases

Returns:

get_info

Get additional information from the policy. Called after the episode has ended.

get_phase

Returns:

planners

Abstract property representing the list or dict of planner instances.

reset

Reset the policy's internal state.

Attributes:

Name Type Description
config
task
Source code in molmo_spaces/policy/base_policy.py
def __init__(self, config: "MlSpacesExpConfig", task: BaseMujocoTask | None = None) -> None:
    self.config = config
    self.task = task
config instance-attribute
config = config
task instance-attribute
task = task
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None

Add auxiliary objects to the scene that might be required for the policy. Args: config: The configuration for the policy. spec: The MuJoCo scene specification.

Source code in molmo_spaces/policy/base_policy.py
@staticmethod
def add_auxiliary_objects(config: "MlSpacesExpConfig", spec: MjSpec) -> None:
    """
    Add auxiliary objects to the scene that might be required for the policy.
    Args:
        config: The configuration for the policy.
        spec: The MuJoCo scene specification.
    """
    return None
get_action abstractmethod
get_action(observation)

Decide on the action to take based on the current observation of the task or environment. Information could be observations, goals in the case of an rl_agent, or it could be the full environment state in the case of a planner.

Parameters:

Name Type Description Default
observation

The current information about the task or environment.

required

Returns:

Type Description

The action to take in response to the information.

Source code in molmo_spaces/policy/base_policy.py
@abstractmethod
def get_action(self, observation):
    """
    Decide on the action to take based on the current observation of the task or environment.
    Information could be observations, goals in the case of an rl_agent, or it could be the full
    environment state in the case of a planner.

    Args:
        observation: The current information about the task or environment.

    Returns:
        The action to take in response to the information.
    """
    pass
get_all_phases
get_all_phases() -> dict[str, int]

Returns:

Type Description
dict[str, int]

A dictionary of all possible policy phases

Source code in molmo_spaces/policy/base_policy.py
def get_all_phases(self) -> dict[str, int]:
    """
    Returns:
        A dictionary of all possible policy phases
    """
    return {"unknown": 0}
get_info
get_info() -> dict

Get additional information from the policy. Called after the episode has ended. This method can be overridden by subclasses to provide extra information about the policy's state. Must be JSON serializable.

Returns:

Type Description
dict

A dictionary containing additional information about the policy.

Source code in molmo_spaces/policy/base_policy.py
def get_info(self) -> dict:
    """
    Get additional information from the policy. Called after the episode has ended. This method
    can be overridden by subclasses to provide extra information about the policy's state.
    Must be JSON serializable.

    Returns:
        A dictionary containing additional information about the policy.
    """
    return {}
get_phase
get_phase() -> str

Returns:

Type Description
str

The policy phase

Source code in molmo_spaces/policy/base_policy.py
def get_phase(self) -> str:
    """
    Returns:
        The policy phase
    """
    return "unknown"
planners abstractmethod
planners() -> None

Abstract property representing the list or dict of planner instances.

Source code in molmo_spaces/policy/base_policy.py
@abstractmethod
def planners(self) -> None:
    """Abstract property representing the list or dict of planner instances."""
    pass
reset abstractmethod
reset()

Reset the policy's internal state.

This method should be implemented by each subclass to reset the policy's internal state. It is typically called at the beginning of each episode or task.

Source code in molmo_spaces/policy/base_policy.py
@abstractmethod
def reset(self):
    """
    Reset the policy's internal state.

    This method should be implemented by each subclass to reset the policy's internal state.
    It is typically called at the beginning of each episode or task.
    """
    pass

StatefulPolicy

StatefulPolicy(config: MlSpacesExpConfig, task: BaseMujocoTask | None = None)

Bases: BasePolicy

Methods:

Name Description
add_auxiliary_objects

Add auxiliary objects to the scene that might be required for the policy.

get_action

Decide on the action to take based on the current observation of the task or environment.

get_all_phases

Returns:

get_info

Get additional information from the policy. Called after the episode has ended.

get_phase

Returns:

get_state
reset

Reset the policy's internal state.

set_state

Attributes:

Name Type Description
config
task
Source code in molmo_spaces/policy/base_policy.py
def __init__(self, config: "MlSpacesExpConfig", task: BaseMujocoTask | None = None) -> None:
    self.config = config
    self.task = task
config instance-attribute
config = config
task instance-attribute
task = task
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None

Add auxiliary objects to the scene that might be required for the policy. Args: config: The configuration for the policy. spec: The MuJoCo scene specification.

Source code in molmo_spaces/policy/base_policy.py
@staticmethod
def add_auxiliary_objects(config: "MlSpacesExpConfig", spec: MjSpec) -> None:
    """
    Add auxiliary objects to the scene that might be required for the policy.
    Args:
        config: The configuration for the policy.
        spec: The MuJoCo scene specification.
    """
    return None
get_action abstractmethod
get_action(observation)

Decide on the action to take based on the current observation of the task or environment. Information could be observations, goals in the case of an rl_agent, or it could be the full environment state in the case of a planner.

Parameters:

Name Type Description Default
observation

The current information about the task or environment.

required

Returns:

Type Description

The action to take in response to the information.

Source code in molmo_spaces/policy/base_policy.py
@abstractmethod
def get_action(self, observation):
    """
    Decide on the action to take based on the current observation of the task or environment.
    Information could be observations, goals in the case of an rl_agent, or it could be the full
    environment state in the case of a planner.

    Args:
        observation: The current information about the task or environment.

    Returns:
        The action to take in response to the information.
    """
    pass
get_all_phases
get_all_phases() -> dict[str, int]

Returns:

Type Description
dict[str, int]

A dictionary of all possible policy phases

Source code in molmo_spaces/policy/base_policy.py
def get_all_phases(self) -> dict[str, int]:
    """
    Returns:
        A dictionary of all possible policy phases
    """
    return {"unknown": 0}
get_info
get_info() -> dict

Get additional information from the policy. Called after the episode has ended. This method can be overridden by subclasses to provide extra information about the policy's state. Must be JSON serializable.

Returns:

Type Description
dict

A dictionary containing additional information about the policy.

Source code in molmo_spaces/policy/base_policy.py
def get_info(self) -> dict:
    """
    Get additional information from the policy. Called after the episode has ended. This method
    can be overridden by subclasses to provide extra information about the policy's state.
    Must be JSON serializable.

    Returns:
        A dictionary containing additional information about the policy.
    """
    return {}
get_phase
get_phase() -> str

Returns:

Type Description
str

The policy phase

Source code in molmo_spaces/policy/base_policy.py
def get_phase(self) -> str:
    """
    Returns:
        The policy phase
    """
    return "unknown"
get_state abstractmethod
get_state()
Source code in molmo_spaces/policy/base_policy.py
@abstractmethod
def get_state(self):
    raise NotImplementedError
reset abstractmethod
reset()

Reset the policy's internal state.

This method should be implemented by each subclass to reset the policy's internal state. It is typically called at the beginning of each episode or task.

Source code in molmo_spaces/policy/base_policy.py
@abstractmethod
def reset(self):
    """
    Reset the policy's internal state.

    This method should be implemented by each subclass to reset the policy's internal state.
    It is typically called at the beginning of each episode or task.
    """
    pass
set_state abstractmethod
set_state(state)
Source code in molmo_spaces/policy/base_policy.py
@abstractmethod
def set_state(self, state):
    raise NotImplementedError
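The `get_state`/`set_state` pair above lets a caller snapshot a policy mid-episode and restore it later (for example, to replay or branch a rollout). A hypothetical subclass might implement the round trip like this:

```python
# Illustrative StatefulPolicy subclass: its only state is a step counter.
class CountingPolicy:
    def __init__(self):
        self.step = 0

    def get_action(self, observation):
        self.step += 1
        return {}  # null action; only the counter matters here

    def reset(self):
        self.step = 0

    def get_state(self):
        # Return a serializable snapshot of the internal state.
        return {"step": self.step}

    def set_state(self, state):
        # Restore a previously captured snapshot.
        self.step = state["step"]

p = CountingPolicy()
p.get_action(None)
p.get_action(None)
snapshot = p.get_state()  # {'step': 2}
p.reset()                 # counter back to 0
p.set_state(snapshot)     # ...and restored
print(p.get_state())      # {'step': 2}
```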

dummy_policy

Classes:

Name Description
BrownianMotionPolicy

Policy that applies Gaussian noise increments over noop control, resulting in Brownian motion.

DummyPolicy

A Dummy Policy that returns null actions.

BrownianMotionPolicy

BrownianMotionPolicy(config: MlSpacesExpConfig, task: BaseMujocoTask | None = None)

Bases: DummyPolicy

Policy that applies Gaussian noise increments over noop control, resulting in Brownian motion.

Methods:

Name Description
add_auxiliary_objects

Add auxiliary objects to the scene that might be required for the policy.

get_action
get_all_phases

Returns:

get_info

Get additional information from the policy. Called after the episode has ended.

get_phase

Returns:

reset

Reset the policy state. No state to reset for DummyPolicy.

Attributes:

Name Type Description
config
current_phase
std
target_poses
task
type
Source code in molmo_spaces/policy/dummy_policy.py
def __init__(self, config: "MlSpacesExpConfig", task: BaseMujocoTask | None = None) -> None:
    super().__init__(config, task)
    self.task = task
    self.std = config.policy_config.std
config instance-attribute
config = config
current_phase instance-attribute
current_phase = NONE_PHASE
std instance-attribute
std = std
target_poses instance-attribute
target_poses = {'grasp': eye(4)}
task instance-attribute
task = task
type property
type
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None

Add auxiliary objects to the scene that might be required for the policy. Args: config: The configuration for the policy. spec: The MuJoCo scene specification.

Source code in molmo_spaces/policy/base_policy.py
@staticmethod
def add_auxiliary_objects(config: "MlSpacesExpConfig", spec: MjSpec) -> None:
    """
    Add auxiliary objects to the scene that might be required for the policy.
    Args:
        config: The configuration for the policy.
        spec: The MuJoCo scene specification.
    """
    return None
get_action
get_action(observation) -> dict
Source code in molmo_spaces/policy/dummy_policy.py
def get_action(self, observation) -> dict:
    robot_view = self.task.env.current_robot.robot_view
    action = robot_view.get_noop_ctrl_dict()
    for mg_id, ctrl in action.items():
        action[mg_id] = ctrl + np.random.normal(loc=0.0, scale=self.std, size=ctrl.shape)
    return action
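The core of the loop above, in isolation: Gaussian noise is added to each motor group's no-op control. Because the no-op control typically holds the current setpoint, resampling fresh noise around it each step makes the setpoint drift like a random walk, hence "Brownian motion." The ctrl-dict layout and `std` value below are made up for the example; in the real policy they come from `robot_view.get_noop_ctrl_dict()` and `config.policy_config.std`.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
std = 0.05  # hypothetical noise scale
noop_ctrl = {"arm": np.zeros(6), "gripper": np.zeros(1)}  # hypothetical noop ctrl dict

# Perturb every motor group's no-op control with Gaussian noise of the same shape.
action = {
    mg_id: ctrl + rng.normal(loc=0.0, scale=std, size=ctrl.shape)
    for mg_id, ctrl in noop_ctrl.items()
}
print({mg_id: a.shape for mg_id, a in action.items()})  # {'arm': (6,), 'gripper': (1,)}
```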
get_all_phases
get_all_phases() -> dict[str, int]

Returns:

Type Description
dict[str, int]

A dictionary of all possible policy phases

Source code in molmo_spaces/policy/base_policy.py
def get_all_phases(self) -> dict[str, int]:
    """
    Returns:
        A dictionary of all possible policy phases
    """
    return {"unknown": 0}
get_info
get_info() -> dict

Get additional information from the policy. Called after the episode has ended. This method can be overridden by subclasses to provide extra information about the policy's state. Must be JSON serializable.

Returns:

Type Description
dict

A dictionary containing additional information about the policy.

Source code in molmo_spaces/policy/base_policy.py
def get_info(self) -> dict:
    """
    Get additional information from the policy. Called after the episode has ended. This method
    can be overridden by subclasses to provide extra information about the policy's state.
    Must be JSON serializable.

    Returns:
        A dictionary containing additional information about the policy.
    """
    return {}
get_phase
get_phase() -> str

Returns:

Type Description
str

The policy phase

Source code in molmo_spaces/policy/base_policy.py
def get_phase(self) -> str:
    """
    Returns:
        The policy phase
    """
    return "unknown"
reset
reset()

Reset the policy state. No state to reset for DummyPolicy.

Source code in molmo_spaces/policy/dummy_policy.py
def reset(self):
    """
    Reset the policy state. No state to reset for DummyPolicy.
    """
    pass

DummyPolicy

DummyPolicy(config: MlSpacesExpConfig, task: BaseMujocoTask | None = None)

Bases: BasePolicy

A Dummy Policy that returns null actions.

Methods:

Name Description
add_auxiliary_objects

Add auxiliary objects to the scene that might be required for the policy.

get_action

Dummy action to take based on the action space.

get_all_phases

Returns:

get_info

Get additional information from the policy. Called after the episode has ended.

get_phase

Returns:

reset

Reset the policy state. No state to reset for DummyPolicy.

Attributes:

Name Type Description
config
current_phase
target_poses
task
type
Source code in molmo_spaces/policy/dummy_policy.py
def __init__(self, config: "MlSpacesExpConfig", task: BaseMujocoTask | None = None) -> None:
    self.config = config
    # Required attributes for sensors that expect a policy with target poses
    self.target_poses = {"grasp": np.eye(4)}
    self.current_phase = NONE_PHASE
config instance-attribute
config = config
current_phase instance-attribute
current_phase = NONE_PHASE
target_poses instance-attribute
target_poses = {'grasp': eye(4)}
task instance-attribute
task = task
type property
type
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None

Add auxiliary objects to the scene that might be required for the policy. Args: config: The configuration for the policy. spec: The MuJoCo scene specification.

Source code in molmo_spaces/policy/base_policy.py
@staticmethod
def add_auxiliary_objects(config: "MlSpacesExpConfig", spec: MjSpec) -> None:
    """
    Add auxiliary objects to the scene that might be required for the policy.
    Args:
        config: The configuration for the policy.
        spec: The MuJoCo scene specification.
    """
    return None
get_action
get_action(observation)

Dummy action to take based on the action space.

Parameters:

Name Type Description Default
observation

The current information about the task or environment (not used).

required

Returns:

Type Description

An empty action dictionary.

Source code in molmo_spaces/policy/dummy_policy.py
def get_action(self, observation):
    """
    Dummy action to take based on the action space.

    Args:
        observation: The current information about the task or environment (not used).

    Returns:
        An empty action dictionary.
    """
    return dict()
get_all_phases
get_all_phases() -> dict[str, int]

Returns:

Type Description
dict[str, int]

A dictionary of all possible policy phases

Source code in molmo_spaces/policy/base_policy.py
def get_all_phases(self) -> dict[str, int]:
    """
    Returns:
        A dictionary of all possible policy phases
    """
    return {"unknown": 0}
get_info
get_info() -> dict

Get additional information from the policy. Called after the episode has ended. This method can be overridden by subclasses to provide extra information about the policy's state. Must be JSON serializable.

Returns:

Type Description
dict

A dictionary containing additional information about the policy.

Source code in molmo_spaces/policy/base_policy.py
def get_info(self) -> dict:
    """
    Get additional information from the policy. Called after the episode has ended. This method
    can be overridden by subclasses to provide extra information about the policy's state.
    Must be JSON serializable.

    Returns:
        A dictionary containing additional information about the policy.
    """
    return {}
get_phase
get_phase() -> str

Returns:

Type Description
str

The policy phase

Source code in molmo_spaces/policy/base_policy.py
def get_phase(self) -> str:
    """
    Returns:
        The policy phase
    """
    return "unknown"
reset
reset()

Reset the policy state. No state to reset for DummyPolicy.

Source code in molmo_spaces/policy/dummy_policy.py
def reset(self):
    """
    Reset the policy state. No state to reset for DummyPolicy.
    """
    pass

learned_policy

Modules:

Name Description
bimanual_yam_pi_policy

Bimanual YAM Pi0.5 Policy using LeRobot gRPC inference.

cap_policy
dreamzero_policy
keyboard_policy
lerobot_grpc_client

LeRobot gRPC client for remote policy inference.

phone_policy
pi_policy
rum_client
spacemouse_policy
utils
websocket_policy

bimanual_yam_pi_policy

Bimanual YAM Pi0.5 Policy using LeRobot gRPC inference.

This policy connects to a LeRobot async inference server to run Pi0.5 for bimanual manipulation with the YAM robot.

Classes:

Name Description
BimanualYamPiPolicy

Policy for bimanual YAM robot using Pi0.5 via LeRobot gRPC server.

Attributes:

Name Type Description
log
log module-attribute
log = getLogger(__name__)
BimanualYamPiPolicy
BimanualYamPiPolicy(exp_config: MlSpacesExpConfig)

Bases: InferencePolicy

Policy for bimanual YAM robot using Pi0.5 via LeRobot gRPC server.

This policy:
  • Connects to a LeRobot async inference server
  • Sends observations (3 cameras + 14-dim state + task prompt)
  • Receives 14-dim action chunks (left arm, left gripper, right arm, right gripper)
  • Buffers actions to avoid calling the model every step

Expected observation format from MuJoCo
  • "left_wrist_camera": RGB image from left wrist
  • "right_wrist_camera": RGB image from right wrist
  • "exo_camera": RGB image from top-down exo camera
  • "qpos": {"left_arm": (6,), "right_arm": (6,), "left_gripper": (N,), "right_gripper": (N,)}
Action format to robot
  • "left_arm": (6,) joint positions
  • "right_arm": (6,) joint positions
  • "left_gripper": (1,) gripper command
  • "right_gripper": (1,) gripper command

Initialize the policy.

Parameters:

Name Type Description Default
exp_config MlSpacesExpConfig

Experiment configuration containing policy config

required

Methods:

Name Description
__del__

Cleanup on deletion.

add_auxiliary_objects

Add auxiliary objects to the scene that might be required for the policy.

get_action
get_all_phases

Returns:

get_info

Get policy information for logging.

get_phase

Returns:

inference_model

Run inference with action buffering.

model_output_to_action

Convert model output to bimanual robot action dictionary.

obs_to_model_input

Transform MuJoCo observations to raw format for LeRobot's build_dataset_frame.

prepare_model

Connect to the LeRobot gRPC server and load the policy.

render

Display camera views for debugging.

reset

Reset the policy state for a new episode.

Attributes:

Name Type Description
GRIPPER_MAX
STATE_NAMES
actions_buffer list[ndarray] | None
buffer_length
camera_mapping
checkpoint_path
client LeRobotGRPCClient | None
config
current_buffer_index int
current_phase
grasping_type
remote_config
starting_time float | None
target_poses
task
task_type
Source code in molmo_spaces/policy/learned_policy/bimanual_yam_pi_policy.py
def __init__(self, exp_config: MlSpacesExpConfig) -> None:
    """Initialize the policy.

    Args:
        exp_config: Experiment configuration containing policy config
    """
    super().__init__(exp_config, exp_config.task_type)

    policy_config = exp_config.policy_config
    self.remote_config = policy_config.remote_config
    self.checkpoint_path = policy_config.checkpoint_path
    self.grasping_type = getattr(policy_config, "grasping_type", "binary")
    self.buffer_length = 50

    # Camera mapping: MuJoCo camera name -> LeRobot feature key
    self.camera_mapping = getattr(
        policy_config,
        "camera_mapping",
        {
            "left_wrist_camera": "observation.images.left",
            "right_wrist_camera": "observation.images.right",
            "exo_camera": "observation.images.top",
        },
    )

    # Will be initialized in prepare_model()
    self.client: LeRobotGRPCClient | None = None
    self.actions_buffer: list[np.ndarray] | None = None
    self.current_buffer_index: int = 0
    self.starting_time: float | None = None
GRIPPER_MAX class-attribute instance-attribute
GRIPPER_MAX = 0.041
STATE_NAMES class-attribute instance-attribute
STATE_NAMES = ['left_joint_0.pos', 'left_joint_1.pos', 'left_joint_2.pos', 'left_joint_3.pos', 'left_joint_4.pos', 'left_joint_5.pos', 'left_gripper.pos', 'right_joint_0.pos', 'right_joint_1.pos', 'right_joint_2.pos', 'right_joint_3.pos', 'right_joint_4.pos', 'right_joint_5.pos', 'right_gripper.pos']
actions_buffer instance-attribute
actions_buffer: list[ndarray] | None = None
buffer_length instance-attribute
buffer_length = 50
camera_mapping instance-attribute
camera_mapping = getattr(policy_config, 'camera_mapping', {'left_wrist_camera': 'observation.images.left', 'right_wrist_camera': 'observation.images.right', 'exo_camera': 'observation.images.top'})
checkpoint_path instance-attribute
checkpoint_path = checkpoint_path
client instance-attribute
client: LeRobotGRPCClient | None = None
config instance-attribute
config = config
current_buffer_index instance-attribute
current_buffer_index: int = 0
current_phase instance-attribute
current_phase = NONE_PHASE
grasping_type instance-attribute
grasping_type = getattr(policy_config, 'grasping_type', 'binary')
remote_config instance-attribute
remote_config = remote_config
starting_time instance-attribute
starting_time: float | None = None
target_poses instance-attribute
target_poses = {'grasp': eye(4)}
task instance-attribute
task = task
task_type instance-attribute
task_type = task_type
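The action-buffering scheme tracked by `actions_buffer`, `current_buffer_index`, and `buffer_length` can be sketched in isolation: query the model once for a chunk of actions, replay them step by step, and only call the model again when the chunk is exhausted. The class and the stand-in model below are illustrative, not the library's implementation.

```python
# Hypothetical sketch of chunked action buffering around a slow model call.
class BufferedPolicy:
    def __init__(self, model, buffer_length=50):
        self.model = model
        self.buffer_length = buffer_length
        self.actions_buffer = None
        self.current_buffer_index = 0

    def inference_model(self, model_input):
        if self.actions_buffer is None or self.current_buffer_index >= len(self.actions_buffer):
            # Buffer empty or exhausted: request a fresh chunk from the model.
            self.actions_buffer = self.model(model_input)[: self.buffer_length]
            self.current_buffer_index = 0
        action = self.actions_buffer[self.current_buffer_index]
        self.current_buffer_index += 1
        return action

calls = []

def fake_model(model_input):
    calls.append(model_input)          # count how often the "model" runs
    return list(range(100))            # a long chunk of dummy actions

p = BufferedPolicy(fake_model, buffer_length=3)
out = [p.inference_model(None) for _ in range(5)]
print(out, len(calls))  # [0, 1, 2, 0, 1] 2  -> 5 steps, only 2 model calls
```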
__del__
__del__()

Cleanup on deletion.

Source code in molmo_spaces/policy/learned_policy/bimanual_yam_pi_policy.py
def __del__(self):
    """Cleanup on deletion."""
    if hasattr(self, "client") and self.client is not None:
        self.client.close()
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None

Add auxiliary objects to the scene that might be required for the policy. Args: config: The configuration for the policy. spec: The MuJoCo scene specification to modify.

Source code in molmo_spaces/policy/base_policy.py
@staticmethod
def add_auxiliary_objects(config: "MlSpacesExpConfig", spec: MjSpec) -> None:
    """
    Add auxiliary objects to the scene that might be required for the policy.
    Args:
        config: The configuration for the policy.
        spec: The MuJoCo scene specification to modify.
    """
    return None
get_action
get_action(observation)
Source code in molmo_spaces/policy/base_policy.py
def get_action(self, observation):
    model_input = self.obs_to_model_input(observation)
    model_output = self.inference_model(model_input)
    action = self.model_output_to_action(model_output)
    return action
get_all_phases
get_all_phases() -> dict[str, int]

Returns:

Type Description
dict[str, int]

A dictionary of all possible policy phases

Source code in molmo_spaces/policy/base_policy.py
def get_all_phases(self) -> dict[str, int]:
    """
    Returns:
        A dictionary of all possible policy phases
    """
    return {"unknown": 0}
get_info
get_info() -> dict

Get policy information for logging.

Returns:

Type Description
dict

Dictionary with policy metadata

Source code in molmo_spaces/policy/learned_policy/bimanual_yam_pi_policy.py
def get_info(self) -> dict:
    """Get policy information for logging.

    Returns:
        Dictionary with policy metadata
    """
    info = super().get_info()
    info["policy_name"] = "bimanual_yam_pi"
    info["policy_checkpoint"] = self.model_name
    info["policy_buffer_length"] = self.buffer_length
    info["policy_grasping_type"] = self.grasping_type
    info["prompt"] = self.task.get_task_description()
    info["time_spent"] = time.time() - self.starting_time if self.starting_time else None
    info["timestamp"] = time.time()
    return info
get_phase
get_phase() -> str

Returns:

Type Description
str

The policy phase

Source code in molmo_spaces/policy/base_policy.py
def get_phase(self) -> str:
    """
    Returns:
        The policy phase
    """
    return "unknown"
inference_model
inference_model(model_input: dict) -> ndarray

Run inference with action buffering.

Only calls the remote model when the buffer is empty or exhausted.

Parameters:

Name Type Description Default
model_input dict

Preprocessed observation in LeRobot format

required

Returns:

Type Description
ndarray

Single action from the buffer

Source code in molmo_spaces/policy/learned_policy/bimanual_yam_pi_policy.py
def inference_model(self, model_input: dict) -> np.ndarray:
    """Run inference with action buffering.

    Only calls the remote model when the buffer is empty or exhausted.

    Args:
        model_input: Preprocessed observation in LeRobot format

    Returns:
        Single action from the buffer
    """
    if self.starting_time is None:
        self.starting_time = time.time()

    # Refill buffer if needed
    if self.actions_buffer is None or self.current_buffer_index >= len(self.actions_buffer):
        # Debug: check for NaN/inf in observation before sending to server
        for key, value in model_input.items():
            if isinstance(value, np.ndarray):
                if np.isnan(value).any():
                    log.error(f"NaN detected in observation key '{key}'")
                if np.isinf(value).any():
                    log.error(f"Inf detected in observation key '{key}'")
            elif isinstance(value, float) and (np.isnan(value) or np.isinf(value)):
                log.error(f"NaN/Inf detected in observation key '{key}': {value}")

        self.actions_buffer = self.client.infer(model_input)
        log.info(f"Got action buffer as output from inference: {self.actions_buffer}")
        self.current_buffer_index = 0

        if not self.actions_buffer:
            log.warning("Received empty action buffer from server")
            # Return zeros as fallback
            return np.zeros(14, dtype=np.float32)

    # Get next action from buffer
    action = self.actions_buffer[self.current_buffer_index]
    self.current_buffer_index += 1

    return action
model_output_to_action
model_output_to_action(model_output: ndarray) -> dict

Convert model output to bimanual robot action dictionary.

Model output format (14 dims): [0:6] - left arm joint positions [6] - left gripper [7:13] - right arm joint positions [13] - right gripper

Parameters:

Name Type Description Default
model_output ndarray

14-dim action array from model

required

Returns:

Type Description
dict

Dictionary with keys: left_arm, right_arm, left_gripper, right_gripper

Source code in molmo_spaces/policy/learned_policy/bimanual_yam_pi_policy.py
def model_output_to_action(self, model_output: np.ndarray) -> dict:
    """Convert model output to bimanual robot action dictionary.

    Model output format (14 dims):
        [0:6]  - left arm joint positions
        [6]    - left gripper
        [7:13] - right arm joint positions
        [13]   - right gripper

    Args:
        model_output: 14-dim action array from model

    Returns:
        Dictionary with keys: left_arm, right_arm, left_gripper, right_gripper
    """
    # Extract components from model output
    left_arm = model_output[0:6].astype(np.float32)
    left_gripper_val = float(model_output[6])
    right_arm = model_output[7:13].astype(np.float32)
    right_gripper_val = float(model_output[13])

    action = {
        "left_arm": left_arm,
        "right_arm": right_arm,
        "left_gripper": self._scale_gripper_action(left_gripper_val),
        "right_gripper": self._scale_gripper_action(right_gripper_val),
    }

    log.info(
        f"[ACTION] step={self.current_buffer_index} "
        f"left_arm={np.array2string(left_arm, precision=3, separator=', ')} "
        f"left_grip={left_gripper_val:.3f} "
        f"right_arm={np.array2string(right_arm, precision=3, separator=', ')} "
        f"right_grip={right_gripper_val:.3f}"
    )

    return action
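As a standalone illustration of the 14-dim layout described above, a minimal sketch (the helper name `split_bimanual_action` is hypothetical, and unlike the real method it skips the `_scale_gripper_action` rescaling):

```python
import numpy as np

def split_bimanual_action(model_output):
    """Split a 14-dim model output per the documented layout:
    [0:6] left arm, [6] left gripper, [7:13] right arm, [13] right gripper.
    Hypothetical helper; the real method also rescales gripper values."""
    a = np.asarray(model_output, dtype=np.float32)
    return {
        "left_arm": a[0:6],
        "left_gripper": float(a[6]),
        "right_arm": a[7:13],
        "right_gripper": float(a[13]),
    }

parts = split_bimanual_action(np.arange(14.0))
```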
obs_to_model_input
obs_to_model_input(obs: dict) -> dict

Transform MuJoCo observations to raw format for LeRobot's build_dataset_frame.

The server's build_dataset_frame expects: - Individual state values with keys matching STATE_NAMES - Image keys without "observation.images." prefix (e.g., "left", "right", "top") - "task" key for the text prompt

Parameters:

Name Type Description Default
obs dict

MuJoCo observation dictionary with cameras and joint states

required

Returns:

Type Description
dict

Dictionary in raw format that build_dataset_frame will transform

Source code in molmo_spaces/policy/learned_policy/bimanual_yam_pi_policy.py
def obs_to_model_input(self, obs: dict) -> dict:
    """Transform MuJoCo observations to raw format for LeRobot's build_dataset_frame.

    The server's build_dataset_frame expects:
    - Individual state values with keys matching STATE_NAMES
    - Image keys without "observation.images." prefix (e.g., "left", "right", "top")
    - "task" key for the text prompt

    Args:
        obs: MuJoCo observation dictionary with cameras and joint states

    Returns:
        Dictionary in raw format that build_dataset_frame will transform
    """
    model_input = {}

    # Process camera images - use short keys (build_dataset_frame strips "observation.images." prefix)
    camera_short_keys = {
        "left_wrist_camera": "left",
        "right_wrist_camera": "right",
        "exo_camera": "top",
    }
    for mj_cam, short_key in camera_short_keys.items():
        if mj_cam in obs:
            model_input[short_key] = obs[mj_cam]

    # Build state values as individual keys matching STATE_NAMES
    # Order: [left_arm(6), left_gripper(1), right_arm(6), right_gripper(1)]
    qpos = obs.get("qpos", {})

    left_arm = np.array(qpos.get("left_arm", np.zeros(6)))[:6]
    right_arm = np.array(qpos.get("right_arm", np.zeros(6)))[:6]

    left_grip = self._normalize_gripper(qpos.get("left_gripper", np.array([0.0])))
    right_grip = self._normalize_gripper(qpos.get("right_gripper", np.array([0.0])))

    # Combine into 14-dim array
    state_values = np.concatenate(
        [
            left_arm,  # [0:6]
            np.array([left_grip]),  # [6]
            right_arm,  # [7:13]
            np.array([right_grip]),  # [13]
        ]
    ).astype(np.float32)

    # Add individual state values with keys matching STATE_NAMES
    for i, name in enumerate(self.STATE_NAMES):
        model_input[name] = float(state_values[i])

    # Add task prompt
    prompt = self.task.get_task_description()
    model_input["task"] = prompt

    return model_input
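The state packing above can be sketched in isolation; `pack_state` is a hypothetical helper that reproduces the `[left_arm(6), left_gripper(1), right_arm(6), right_gripper(1)]` ordering keyed by `STATE_NAMES`:

```python
import numpy as np

# Same ordering as the class attribute STATE_NAMES above.
STATE_NAMES = [
    "left_joint_0.pos", "left_joint_1.pos", "left_joint_2.pos",
    "left_joint_3.pos", "left_joint_4.pos", "left_joint_5.pos",
    "left_gripper.pos",
    "right_joint_0.pos", "right_joint_1.pos", "right_joint_2.pos",
    "right_joint_3.pos", "right_joint_4.pos", "right_joint_5.pos",
    "right_gripper.pos",
]

def pack_state(left_arm, left_grip, right_arm, right_grip):
    """Pack joint states into the 14-dim vector, then into a
    name-keyed dict of floats as obs_to_model_input does."""
    state = np.concatenate(
        [left_arm[:6], [left_grip], right_arm[:6], [right_grip]]
    ).astype(np.float32)
    return {name: float(v) for name, v in zip(STATE_NAMES, state)}

packed = pack_state(np.zeros(6), 0.5, np.ones(6), 1.0)
```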
prepare_model
prepare_model() -> None

Connect to the LeRobot gRPC server and load the policy.

Source code in molmo_spaces/policy/learned_policy/bimanual_yam_pi_policy.py
def prepare_model(self) -> None:
    """Connect to the LeRobot gRPC server and load the policy."""
    self.model_name = os.path.basename(self.checkpoint_path)

    host = self.remote_config.get("host", "localhost")
    port = self.remote_config.get("port", 8080)

    self.client = LeRobotGRPCClient(host=host, port=port)

    policy_type = self.remote_config.get("policy_type", "pi05")
    device = self.remote_config.get("device", "cuda")

    # Define lerobot_features matching the training dataset format (Jiafei1224/c)
    # These features tell build_dataset_frame how to transform raw observations
    lerobot_features = {
        "observation.state": {
            "dtype": "float32",
            "shape": [14],
            "names": self.STATE_NAMES,
        },
        "observation.images.left": {
            "dtype": "video",
            "shape": [360, 640, 3],
            "names": ["height", "width", "channels"],
        },
        "observation.images.right": {
            "dtype": "video",
            "shape": [360, 640, 3],
            "names": ["height", "width", "channels"],
        },
        "observation.images.top": {
            "dtype": "video",
            "shape": [360, 640, 3],
            "names": ["height", "width", "channels"],
        },
    }

    self.client.connect(
        pretrained_name_or_path=self.checkpoint_path,
        policy_type=policy_type,
        device=device,
        actions_per_chunk=self.buffer_length,
        lerobot_features=lerobot_features,
    )

    log.info(f"BimanualYamPiPolicy connected to {host}:{port}")
render
render(obs: dict) -> None

Display camera views for debugging.

Parameters:

Name Type Description Default
obs dict

Observation dictionary with camera images

required
Source code in molmo_spaces/policy/learned_policy/bimanual_yam_pi_policy.py
def render(self, obs: dict) -> None:
    """Display camera views for debugging.

    Args:
        obs: Observation dictionary with camera images
    """
    images = []
    for mj_cam in self.camera_mapping:
        if mj_cam in obs:
            images.append(obs[mj_cam])

    if images:
        combined = np.concatenate(images, axis=1)
        cv2.imshow("BimanualYam Views", cv2.cvtColor(combined, cv2.COLOR_RGB2BGR))
        cv2.waitKey(1)
reset
reset() -> None

Reset the policy state for a new episode.

Source code in molmo_spaces/policy/learned_policy/bimanual_yam_pi_policy.py
def reset(self) -> None:
    """Reset the policy state for a new episode."""
    self.actions_buffer = None
    self.current_buffer_index = 0
    self.starting_time = None

    # Reset client timestep counter
    if self.client is not None:
        self.client.reset()

cap_policy

Classes:

Name Description
CAP_Policy

Functions:

Name Description
action_tensor_to_matrix

Attributes:

Name Type Description
log
log module-attribute
log = getLogger(__name__)
CAP_Policy
CAP_Policy(exp_config: MlSpacesExpConfig, task_type: str)

Bases: InferencePolicy

Methods:

Name Description
add_auxiliary_objects

Add auxiliary objects to the scene that might be required for the policy.

get_action
get_all_phases

Returns:

get_info
get_phase

Returns:

inference_model
model_output_to_action
obs_to_model_input
prepare_model
render
reset

Attributes:

Name Type Description
config
current_phase
grasping_threshold
grasping_type
model
remote_config
target_poses
task
task_type
use_exo
use_vlm
Source code in molmo_spaces/policy/learned_policy/cap_policy.py
def __init__(
    self,
    exp_config: MlSpacesExpConfig,
    task_type: str,
) -> None:
    super().__init__(exp_config, exp_config.task_type)
    self.task_type = task_type
    self.remote_config = exp_config.policy_config.remote_config
    self.grasping_type = exp_config.policy_config.grasping_type
    self.grasping_threshold = exp_config.policy_config.grasping_threshold
    self.use_vlm = exp_config.policy_config.use_vlm
    self.use_exo = exp_config.policy_config.exo_vlm
    self.model = None
config instance-attribute
config = config
current_phase instance-attribute
current_phase = NONE_PHASE
grasping_threshold instance-attribute
grasping_threshold = grasping_threshold
grasping_type instance-attribute
grasping_type = grasping_type
model instance-attribute
model = None
remote_config instance-attribute
remote_config = remote_config
target_poses instance-attribute
target_poses = {'grasp': eye(4)}
task instance-attribute
task = task
task_type instance-attribute
task_type = task_type
use_exo instance-attribute
use_exo = exo_vlm
use_vlm instance-attribute
use_vlm = use_vlm
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None

Add auxiliary objects to the scene that might be required for the policy. Args: config: The configuration for the policy. spec: The MuJoCo scene specification to modify.

Source code in molmo_spaces/policy/base_policy.py
@staticmethod
def add_auxiliary_objects(config: "MlSpacesExpConfig", spec: MjSpec) -> None:
    """
    Add auxiliary objects to the scene that might be required for the policy.
    Args:
        config: The configuration for the policy.
        spec: The MuJoCo scene specification to modify.
    """
    return None
get_action
get_action(observation)
Source code in molmo_spaces/policy/base_policy.py
def get_action(self, observation):
    model_input = self.obs_to_model_input(observation)
    model_output = self.inference_model(model_input)
    action = self.model_output_to_action(model_output)
    return action
get_all_phases
get_all_phases() -> dict[str, int]

Returns:

Type Description
dict[str, int]

A dictionary of all possible policy phases

Source code in molmo_spaces/policy/base_policy.py
def get_all_phases(self) -> dict[str, int]:
    """
    Returns:
        A dictionary of all possible policy phases
    """
    return {"unknown": 0}
get_info
get_info() -> dict
Source code in molmo_spaces/policy/learned_policy/cap_policy.py
def get_info(self) -> dict:
    info = super().get_info()

    info["policy_checkpoint"] = self.model_name
    info["policy_grasping_threshold"] = self.grasping_threshold
    info["policy_grasping_type"] = self.grasping_type
    info["time_spent"] = time.time() - self.starting_time if self.starting_time else None
    info["timestamp"] = time.time()
    return info
get_phase
get_phase() -> str

Returns:

Type Description
str

The policy phase

Source code in molmo_spaces/policy/base_policy.py
def get_phase(self) -> str:
    """
    Returns:
        The policy phase
    """
    return "unknown"
inference_model
inference_model(model_input)
Source code in molmo_spaces/policy/learned_policy/cap_policy.py
def inference_model(self, model_input):
    self.step_counter += 1

    model_output = self.model.infer(model_input)
    model_output[0][:3] = np.array(
        [-model_output[0][0], -model_output[0][2], -model_output[0][1]]
    )
    model_output[0][3:6] = np.array(
        [-model_output[0][3], -model_output[0][5], -model_output[0][4]]
    )
    self.is_grasping = max(self.is_grasping, model_output[0][6] < self.grasping_threshold)
    delta_pose_mat = action_tensor_to_matrix(model_output[0][:6], "euler")
    T_world_ego = self.T_world_camera @ delta_pose_mat
    self.T_world_rum = T_world_ego

    goal_pose_7d = np.array(
        list(self.T_world_rum[:3, 3])
        + list(R.from_matrix(self.T_world_rum[:3, :3]).as_quat(scalar_first=True))
    )

    goal_pose_homogeneous = np.eye(4)
    goal_pose_homogeneous[:3, 3] = goal_pose_7d[:3]
    goal_pose_homogeneous[:3, :3] = R.from_quat(
        goal_pose_7d[3:7], scalar_first=True
    ).as_matrix()

    kinematics = self.task.env.current_robot.kinematics
    robot_view = self.task.env.current_robot.robot_view
    gripper_mgs = set(robot_view.get_gripper_movegroup_ids())
    mgs_except_gripper = [x for x in robot_view.move_group_ids() if x not in gripper_mgs]
    new_pose = goal_pose_homogeneous.copy()

    jp = kinematics.ik(
        "arm",
        new_pose,
        mgs_except_gripper,
        robot_view.get_qpos_dict(),
        robot_view.base.pose,
        rel_to_base=False,
    )
    action = robot_view.get_ctrl_dict()
    if jp is not None:
        action.update({mg_id: jp[mg_id] for mg_id in mgs_except_gripper})

    if self.grasping_type == "binary":
        if self.is_grasping:
            action["gripper"] = np.array([-255.0])
        else:
            action["gripper"] = np.array([0.0])
    else:
        action["gripper"] = (1 - model_output[0][6]) * np.array([-255.0])
        if self.is_grasping:
            action["gripper"] = np.array([-255.0])
    return action
model_output_to_action
model_output_to_action(model_output)
Source code in molmo_spaces/policy/learned_policy/cap_policy.py
def model_output_to_action(self, model_output):
    return model_output
obs_to_model_input
obs_to_model_input(obs)
Source code in molmo_spaces/policy/learned_policy/cap_policy.py
def obs_to_model_input(self, obs):
    if self.model is None:
        self.prepare_model()

    if not hasattr(self, "T_world_object") or self.T_world_object is None:
        if not self.use_vlm:
            T_base_object = obs["object_poses"][self.task.config.task_config.pickup_obj_name]
            T_world_base = np.eye(4)
            T_world_base[:3, 3] = obs["robot_base_pose"][:3]
            T_world_base[:3, :3] = R.from_quat(
                obs["robot_base_pose"][3:7], scalar_first=True
            ).as_matrix()
            self.T_world_object = T_world_base @ T_base_object
        else:
            exo_depth = obs["exo_camera_1_depth"]
            exo_rgb = obs["exo_camera_1"]
            ego_depth = obs["wrist_camera_depth"]
            ego_rgb = obs["wrist_camera"]
            point_norm = self.model.infer_point(
                rgb=exo_rgb if self.use_exo else ego_rgb,
                object_name=self.task.config.task_config.referral_expressions[
                    "pickup_obj_name"
                ],
                task=self.task_type,
            )
            if self.use_exo:
                K = np.array(obs["sensor_param_exo_camera_1"]["intrinsic_cv"])
            else:
                K = np.array(obs["sensor_param_wrist_camera"]["intrinsic_cv"])
            fovy = 2 * np.arctan((2 * K[1, 2]) / (2 * K[1, 1]))
            x_norm, y_norm = point_norm
            width, height = (
                (exo_rgb.shape[1], exo_rgb.shape[0])
                if self.use_exo
                else (ego_rgb.shape[1], ego_rgb.shape[0])
            )
            x = int(x_norm * width)
            y = int(y_norm * height)
            depth_value = exo_depth[y, x] + 0.03 if self.use_exo else ego_depth[y, x] + 0.03
            f = height / (2 * np.tan(fovy / 2))
            cam_mat = np.array([[f, 0, width / 2], [0, f, height / 2], [0, 0, 1]])
            cx = cam_mat[0, 2]
            cy = cam_mat[1, 2]
            fx = cam_mat[0, 0]
            fy = cam_mat[1, 1]
            z_cam = -depth_value
            x_cam = -(x - cx) * z_cam / fx
            y_cam = -(cy - y) * z_cam / fy
            p_cam = np.array([x_cam, y_cam, z_cam])
            T_corr = np.eye(4)
            T_corr[:3, :3] = np.diag([1, -1, -1])
            if self.use_exo:
                camera_pose = (
                    np.array(obs["sensor_param_exo_camera_1"]["cam2world_gl"].copy()) @ T_corr
                )
            else:
                camera_pose = (
                    np.array(obs["sensor_param_wrist_camera"]["cam2world_gl"].copy()) @ T_corr
                )
            p_world = camera_pose[:3, :3] @ p_cam + camera_pose[:3, 3]
            self.T_world_object = np.eye(4)
            self.T_world_object[:3, 3] = p_world

    T_base_ego = np.eye(4)
    T_base_ego[:3, 3] = obs["tcp_pose"][:3]
    T_base_ego[:3, :3] = R.from_quat(obs["tcp_pose"][3:7], scalar_first=True).as_matrix()

    T_world_base = np.eye(4)
    T_world_base[:3, 3] = obs["robot_base_pose"][:3]
    T_world_base[:3, :3] = R.from_quat(
        obs["robot_base_pose"][3:7], scalar_first=True
    ).as_matrix()

    T_world_ego = T_world_base @ T_base_ego
    self.T_world_camera = T_world_ego
    T_camera_object = np.linalg.inv(T_world_ego) @ self.T_world_object
    object_3d_position = T_camera_object[:3, 3]
    object_3d_position = np.array(
        [-T_camera_object[0, 3], -T_camera_object[2, 3], -T_camera_object[1, 3]]
    )

    if self.is_grasping:
        object_3d_position = np.array([0.00, 0.18, 0.04])
    return {
        "rgb_ego": cv2.resize(obs["wrist_camera"], (224, 224)),
        "object_3d_position": object_3d_position,
    }
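The VLM branch above back-projects the picked pixel into 3-D using the camera intrinsics and depth. A minimal sketch of the underlying pinhole back-projection in the standard OpenCV convention (the policy itself additionally flips axes for MuJoCo's OpenGL camera frame):

```python
import numpy as np

def backproject_pixel(u, v, depth, K):
    """Pixel (u, v) with metric depth -> 3-D point in the camera frame,
    OpenCV convention (+z forward). Hypothetical helper for illustration."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

# A pixel at the principal point maps straight down the optical axis.
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 180.0],
              [0.0, 0.0, 1.0]])
p = backproject_pixel(320, 180, 1.5, K)
```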
prepare_model
prepare_model()
Source code in molmo_spaces/policy/learned_policy/cap_policy.py
def prepare_model(self):
    host = "localhost"
    port = 8765

    max_retries = 3
    for attempt in range(max_retries):
        try:
            self.model = RUMClient(host=host, port=port)
            metadata = self.model.get_server_metadata()
            self.model_name = metadata.get("checkpoint", "rum")
            log.info(f"Connected to RUM server at {host}:{port}")
            break
        except Exception as e:
            if attempt < max_retries - 1:
                log.warning(f"Connection attempt {attempt + 1} failed: {e}. Retrying...")
                time.sleep(1)
            else:
                log.error(f"Failed to connect to RUM server after {max_retries} attempts")
                raise
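The retry loop in `prepare_model` generalizes to any connection factory. A minimal sketch, with `connect_with_retries` as a hypothetical helper (the injectable `sleep` only exists to make the sketch testable):

```python
import time

def connect_with_retries(connect_fn, max_retries=3, delay_secs=1.0, sleep=time.sleep):
    """Call connect_fn, retrying on failure; re-raise after the last attempt."""
    for attempt in range(max_retries):
        try:
            return connect_fn()
        except Exception:
            if attempt == max_retries - 1:
                raise
            sleep(delay_secs)

attempts = []
def flaky():
    """Fails twice, then succeeds, to exercise the retry path."""
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("server not ready")
    return "client"

result = connect_with_retries(flaky, max_retries=3, delay_secs=0.0)
```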
render
render(obs)
Source code in molmo_spaces/policy/learned_policy/cap_policy.py
def render(self, obs):
    views = np.concatenate([obs["wrist_camera"], obs["exo_camera_1"]], axis=1)
    cv2.imshow("views", cv2.cvtColor(views, cv2.COLOR_RGB2BGR))
    cv2.waitKey(1)
reset
reset()
Source code in molmo_spaces/policy/learned_policy/cap_policy.py
def reset(self):
    self.starting_time = None
    self.T_world_object = None
    self.T_world_camera = None
    self.T_world_rum = None
    self.is_grasping = False
    self.step_counter = 0
action_tensor_to_matrix
action_tensor_to_matrix(action_tensor, rot_unit)
Source code in molmo_spaces/policy/learned_policy/cap_policy.py
def action_tensor_to_matrix(action_tensor, rot_unit):
    affine = np.eye(4)
    if rot_unit == "euler":
        r = R.from_euler("xyz", action_tensor[3:6], degrees=False)
    elif rot_unit == "axis":
        r = R.from_rotvec(action_tensor[3:6])
    else:
        raise NotImplementedError
    affine[:3, :3] = r.as_matrix()
    affine[:3, -1] = action_tensor[:3]
    return affine
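For the `rot_unit == "axis"` branch, the rotation can also be built without SciPy via Rodrigues' formula. A numpy-only sketch (not the code the policy ships, which uses `scipy.spatial.transform.Rotation`):

```python
import numpy as np

def rotvec_to_matrix(rv):
    """Rodrigues' formula: axis-angle vector -> 3x3 rotation matrix
    (a numpy-only stand-in for scipy's R.from_rotvec)."""
    theta = np.linalg.norm(rv)
    if theta < 1e-12:
        return np.eye(3)
    k = rv / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def pose6_to_matrix(action_tensor):
    """Pack [x, y, z, rx, ry, rz] (axis-angle) into a 4x4 homogeneous matrix,
    mirroring action_tensor_to_matrix(..., "axis") above."""
    affine = np.eye(4)
    affine[:3, :3] = rotvec_to_matrix(np.asarray(action_tensor[3:6], dtype=float))
    affine[:3, 3] = action_tensor[:3]
    return affine

# 90-degree rotation about z plus a translation:
T = pose6_to_matrix([0.1, 0.2, 0.3, 0.0, 0.0, np.pi / 2])
```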

dreamzero_policy

Classes:

Name Description
DreamZeroWebsocketClient

Websocket client that adds endpoint field for DreamZero server.

DreamZero_Policy

Attributes:

Name Type Description
PING_INTERVAL_SECS
PING_TIMEOUT_SECS
log
PING_INTERVAL_SECS module-attribute
PING_INTERVAL_SECS = 60
PING_TIMEOUT_SECS module-attribute
PING_TIMEOUT_SECS = 600
log module-attribute
log = getLogger(__name__)
DreamZeroWebsocketClient
DreamZeroWebsocketClient(host: str = '0.0.0.0', port: int = 8000)

Websocket client that adds endpoint field for DreamZero server.

Methods:

Name Description
infer
reset
Source code in molmo_spaces/policy/learned_policy/dreamzero_policy.py
def __init__(self, host: str = "0.0.0.0", port: int = 8000) -> None:
    self._uri = f"ws://{host}:{port}"
    self._packer = msgpack_numpy.Packer()
    self._ws, self._server_metadata = self._wait_for_server()
    # store the URI that actually worked so reconnects reuse it
    self._connected_uri = self._uri
infer
infer(obs: dict) -> dict
Source code in molmo_spaces/policy/learned_policy/dreamzero_policy.py
def infer(self, obs: dict) -> dict:
    obs["endpoint"] = "infer"
    data = self._packer.pack(obs)
    try:
        self._ws.send(data)
        response = self._ws.recv()
    except websockets.exceptions.ConnectionClosedError:
        logging.warning("ConnectionClosedError during infer. Reconnecting and retrying...")
        self._reconnect()
        self._ws.send(data)
        response = self._ws.recv()
    if isinstance(response, str):
        raise RuntimeError(f"Error in inference server:\n{response}")
    return msgpack_numpy.unpackb(response)
reset
reset(reset_info: dict | None = None)
Source code in molmo_spaces/policy/learned_policy/dreamzero_policy.py
def reset(self, reset_info: dict | None = None):
    if reset_info is None:
        reset_info = {}
    reset_info["endpoint"] = "reset"
    data = self._packer.pack(reset_info)
    try:
        self._ws.send(data)
        response = self._ws.recv()
    except websockets.exceptions.ConnectionClosedError:
        logging.warning("ConnectionClosedError during reset. Reconnecting and retrying...")
        self._reconnect()
        self._ws.send(data)
        response = self._ws.recv()
    return response
DreamZero_Policy
DreamZero_Policy(exp_config: MlSpacesExpConfig, task_type: str)

Bases: InferencePolicy

Methods:

Name Description
add_auxiliary_objects

Add auxiliary objects to the scene that might be required for the policy.

get_action
get_all_phases

Returns:

get_info
get_phase

Returns:

inference_model
model_output_to_action
obs_to_model_input
prepare_model
render
reset

Attributes:

Name Type Description
checkpoint_path
chunk_size
config
current_phase
grasping_threshold
grasping_type
model
remote_config
session_id
target_poses
task
task_type
Source code in molmo_spaces/policy/learned_policy/dreamzero_policy.py
def __init__(
    self,
    exp_config: MlSpacesExpConfig,
    task_type: str,
) -> None:
    super().__init__(exp_config, exp_config.task_type)
    self.remote_config = exp_config.policy_config.remote_config
    self.checkpoint_path = exp_config.policy_config.checkpoint_path
    self.grasping_type = exp_config.policy_config.grasping_type
    self.chunk_size = exp_config.policy_config.chunk_size
    self.grasping_threshold = exp_config.policy_config.grasping_threshold
    self.model = None
    self.session_id = None
checkpoint_path instance-attribute
checkpoint_path = checkpoint_path
chunk_size instance-attribute
chunk_size = chunk_size
config instance-attribute
config = config
current_phase instance-attribute
current_phase = NONE_PHASE
grasping_threshold instance-attribute
grasping_threshold = grasping_threshold
grasping_type instance-attribute
grasping_type = grasping_type
model instance-attribute
model = None
remote_config instance-attribute
remote_config = remote_config
session_id instance-attribute
session_id = None
target_poses instance-attribute
target_poses = {'grasp': eye(4)}
task instance-attribute
task = task
task_type instance-attribute
task_type = task_type
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None

Add auxiliary objects to the scene that might be required for the policy. Args: config: The configuration for the policy. spec: The MuJoCo scene specification to modify.

Source code in molmo_spaces/policy/base_policy.py
@staticmethod
def add_auxiliary_objects(config: "MlSpacesExpConfig", spec: MjSpec) -> None:
    """
    Add auxiliary objects to the scene that might be required for the policy.
    Args:
        config: The configuration for the policy.
        spec: The MuJoCo scene specification to modify.
    """
    return None
get_action
get_action(observation)
Source code in molmo_spaces/policy/base_policy.py
def get_action(self, observation):
    model_input = self.obs_to_model_input(observation)
    model_output = self.inference_model(model_input)
    action = self.model_output_to_action(model_output)
    return action
get_all_phases
get_all_phases() -> dict[str, int]

Returns:

Type Description
dict[str, int]

A dictionary of all possible policy phases

Source code in molmo_spaces/policy/base_policy.py
def get_all_phases(self) -> dict[str, int]:
    """
    Returns:
        A dictionary of all possible policy phases
    """
    return {"unknown": 0}
get_info
get_info() -> dict
Source code in molmo_spaces/policy/learned_policy/dreamzero_policy.py
def get_info(self) -> dict:
    info = super().get_info()
    info["policy_name"] = "dreamzero"
    info["policy_checkpoint"] = self.model_name
    info["policy_buffer_length"] = self.chunk_size
    info["policy_grasping_threshold"] = self.grasping_threshold
    info["policy_grasping_type"] = self.grasping_type
    info["prompt"] = self.task.get_task_description()
    info["session_id"] = self.session_id
    info["time_spent"] = time.time() - self.starting_time if self.starting_time else None
    info["timestamp"] = time.time()
    return info
get_phase
get_phase() -> str

Returns:

Type Description
str

The policy phase

Source code in molmo_spaces/policy/base_policy.py
def get_phase(self) -> str:
    """
    Returns:
        The policy phase
    """
    return "unknown"
inference_model
inference_model(model_input)
Source code in molmo_spaces/policy/learned_policy/dreamzero_policy.py
def inference_model(self, model_input):
    if self.model is None:
        self.prepare_model()
    if self.starting_time is None:
        self.starting_time = time.time()
    if self.actions_buffer is None or self.current_buffer_index >= self.chunk_size:
        result = self.model.infer(model_input)
        self.actions_buffer = result["actions"]
        self.current_buffer_index = 0
    model_output = self.actions_buffer[self.current_buffer_index]
    self.current_buffer_index += 1
    return model_output
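The chunked-inference pattern above (query the model once, then replay `chunk_size` buffered actions before the next query) can be isolated in a small class; `ActionChunkBuffer` is a hypothetical name for illustration:

```python
class ActionChunkBuffer:
    """Minimal sketch of the action-buffering loop in inference_model:
    the expensive model call happens only when the chunk is exhausted."""

    def __init__(self, model_fn, chunk_size):
        self.model_fn = model_fn  # returns a sequence of chunk_size actions
        self.chunk_size = chunk_size
        self.buffer = None
        self.index = 0

    def next_action(self, observation):
        # Refill only when empty or exhausted, mirroring the policy above.
        if self.buffer is None or self.index >= self.chunk_size:
            self.buffer = self.model_fn(observation)
            self.index = 0
        action = self.buffer[self.index]
        self.index += 1
        return action

calls = []
def fake_model(obs):
    """Stand-in for the remote model; records how often it is queried."""
    calls.append(obs)
    return [0, 1, 2]  # pretend chunk of 3 actions

buf = ActionChunkBuffer(fake_model, chunk_size=3)
actions = [buf.next_action("obs") for _ in range(7)]
```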
model_output_to_action
model_output_to_action(model_output)
Source code in molmo_spaces/policy/learned_policy/dreamzero_policy.py
def model_output_to_action(self, model_output):
    gripper_pos = np.clip(model_output[7], 0.0, 1.0)

    if self.grasping_type == "binary":
        gripper_pos = (
            np.array([255.0]) if gripper_pos >= self.grasping_threshold else np.array([0.0])
        )
    elif self.grasping_type == "semi_binary":
        gripper_pos = (
            gripper_pos * np.array([255.0])
            if gripper_pos <= self.grasping_threshold
            else np.array([255.0])
        )
    elif self.grasping_type == "continuous":
        gripper_pos = gripper_pos * np.array([255.0])
    else:
        raise ValueError(f"Invalid grasping type: {self.grasping_type}")

    arm_output = model_output[:7].reshape(
        7,
    )
    action = {
        "arm": arm_output,
        "gripper": gripper_pos,
    }
    return action
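The three grasping modes can be sketched as a standalone function (hypothetical `scale_gripper` helper; 255.0 is assumed to be the fully-closed gripper command, as in the block above):

```python
import numpy as np

def scale_gripper(gripper_pos, grasping_type, threshold=0.5):
    """Map a normalized gripper value in [0, 1] to a command, mirroring
    the binary / semi_binary / continuous branches above."""
    gripper_pos = float(np.clip(gripper_pos, 0.0, 1.0))
    if grasping_type == "binary":
        # Fully close past the threshold, otherwise fully open.
        return 255.0 if gripper_pos >= threshold else 0.0
    if grasping_type == "semi_binary":
        # Proportional below the threshold, saturated above it.
        return gripper_pos * 255.0 if gripper_pos <= threshold else 255.0
    if grasping_type == "continuous":
        return gripper_pos * 255.0
    raise ValueError(f"Invalid grasping type: {grasping_type}")
```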
obs_to_model_input
obs_to_model_input(obs)
Source code in molmo_spaces/policy/learned_policy/dreamzero_policy.py
def obs_to_model_input(self, obs):
    # self.render(obs)
    prompt = self.task.get_task_description()
    grip = np.clip(obs["qpos"]["gripper"][0] / 0.824033, 0, 1)
    if grip < 0.1:
        grip = 0.00
    exo_camera_key = (
        "droid_shoulder_light_randomization"
        if "droid_shoulder_light_randomization" in obs
        else "exo_camera_1"
    )
    wrist_camera_key = (
        "wrist_camera_zed_mini" if "wrist_camera_zed_mini" in obs else "wrist_camera"
    )

    model_input = {
        "observation/exterior_image_0_left": resize_with_pad(obs[exo_camera_key], 180, 320),
        "observation/exterior_image_1_left": resize_with_pad(obs[exo_camera_key], 180, 320),
        "observation/wrist_image_left": resize_with_pad(obs[wrist_camera_key], 180, 320),
        "observation/joint_position": np.array(
            obs["qpos"]["arm"][:7], dtype=np.float64
        ).reshape(
            7,
        ),
        "observation/cartesian_position": np.zeros((6,), dtype=np.float64),
        "observation/gripper_position": np.array(grip, dtype=np.float64).reshape(
            1,
        ),
        "prompt": prompt,
        "session_id": self.session_id,
    }
    return model_input
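`resize_with_pad` is imported from elsewhere in the codebase. A minimal nearest-neighbor sketch of the behavior its name and call sites suggest (scale to fit inside the target while preserving aspect ratio, then zero-pad to exactly the target size) — an assumption, not the actual implementation:

```python
import numpy as np

def resize_with_pad(image: np.ndarray, height: int, width: int) -> np.ndarray:
    """Nearest-neighbor resize preserving aspect ratio, zero-padded and centered."""
    h, w = image.shape[:2]
    scale = min(height / h, width / w)
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    # Nearest-neighbor sampling via integer index maps.
    rows = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = image[rows][:, cols]
    # Center the resized image on a zero canvas of the target size.
    out = np.zeros((height, width) + image.shape[2:], dtype=image.dtype)
    top, left = (height - new_h) // 2, (width - new_w) // 2
    out[top:top + new_h, left:left + new_w] = resized
    return out
```

A square 100x100 input resized to 180x320 scales by 1.8 to 180x180 and is padded with 70-pixel black bars on the left and right.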
prepare_model
prepare_model()
Source code in molmo_spaces/policy/learned_policy/dreamzero_policy.py
def prepare_model(self):
    self.model_name = (
        os.path.basename(self.checkpoint_path) if self.checkpoint_path else "dreamzero"
    )
    if self.remote_config is not None:
        self._prepare_remote_model()
    else:
        raise NotImplementedError("DreamZero policy only supports remote model inference")
render
render(obs)
Source code in molmo_spaces/policy/learned_policy/dreamzero_policy.py
def render(self, obs):
    views = np.concatenate([obs["wrist_camera"], obs["exo_camera_1"]], axis=1)
    cv2.imshow("views", cv2.cvtColor(views, cv2.COLOR_RGB2BGR))
    cv2.waitKey(1)
reset
reset()
Source code in molmo_spaces/policy/learned_policy/dreamzero_policy.py
def reset(self):
    self.actions_buffer = None
    self.current_buffer_index = 0
    self.starting_time = None
    self.session_id = str(uuid.uuid4())

keyboard_policy

Classes:

Name Description
Keyboard_Policy

Attributes:

Name Type Description
log
log module-attribute
log = getLogger(__name__)
Keyboard_Policy
Keyboard_Policy(exp_config: MlSpacesExpConfig, task_type: str)

Bases: InferencePolicy

Methods:

Name Description
add_auxiliary_objects

Add auxiliary objects to the scene that might be required for the policy.

get_action
get_all_phases

Returns:

get_info
get_phase

Returns:

inference_model
model_output_to_action
obs_to_model_input
prepare_model
render
reset

Attributes:

Name Type Description
config
current_phase
robot_type
rot_step
step_size
target_poses
task
task_type
Source code in molmo_spaces/policy/learned_policy/keyboard_policy.py
def __init__(
    self,
    exp_config: MlSpacesExpConfig,
    task_type: str,
) -> None:
    super().__init__(exp_config, task_type)
    from pynput import keyboard

    self.robot_type = exp_config.robot_config.name
    self.step_size = exp_config.policy_config.step_size
    self.rot_step = exp_config.policy_config.rot_step
    self._pressed = set()
    self._gripper_open = True
    self._listener = keyboard.Listener(
        on_press=self._on_press,
        on_release=self._on_release,
    )
    self._listener.start()
    log.info(
        "Keyboard policy started. "
        "w/s: up/down (z), arrows: x/y. "
        "a/d: yaw, e/r: pitch, z/c: roll. "
        "Space: toggle gripper. q: pause."
    )
config instance-attribute
config = config
current_phase instance-attribute
current_phase = NONE_PHASE
robot_type instance-attribute
robot_type = name
rot_step instance-attribute
rot_step = rot_step
step_size instance-attribute
step_size = step_size
target_poses instance-attribute
target_poses = {'grasp': eye(4)}
task instance-attribute
task = task
task_type instance-attribute
task_type = task_type
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None

Add auxiliary objects to the scene that might be required for the policy. Args: config: The configuration for the policy. spec: The MuJoCo scene specification to modify.

Source code in molmo_spaces/policy/base_policy.py
@staticmethod
def add_auxiliary_objects(config: "MlSpacesExpConfig", spec: MjSpec) -> None:
    """
    Add auxiliary objects to the scene that might be required for the policy.
    Args:
        config: The configuration for the policy.
        spec: The MuJoCo scene specification to modify.
    """
    return None
get_action
get_action(observation)
Source code in molmo_spaces/policy/base_policy.py
def get_action(self, observation):
    model_input = self.obs_to_model_input(observation)
    model_output = self.inference_model(model_input)
    action = self.model_output_to_action(model_output)
    return action
get_all_phases
get_all_phases() -> dict[str, int]

Returns:

Type Description
dict[str, int]

A dictionary of all possible policy phases

Source code in molmo_spaces/policy/base_policy.py
def get_all_phases(self) -> dict[str, int]:
    """
    Returns:
        A dictionary of all possible policy phases
    """
    return {"unknown": 0}
get_info
get_info() -> dict
Source code in molmo_spaces/policy/learned_policy/keyboard_policy.py
def get_info(self) -> dict:
    info = super().get_info()
    info["policy_name"] = "keyboard"
    info["timestamp"] = time.time()
    return info
get_phase
get_phase() -> str

Returns:

Type Description
str

The policy phase

Source code in molmo_spaces/policy/base_policy.py
def get_phase(self) -> str:
    """
    Returns:
        The policy phase
    """
    return "unknown"
inference_model
inference_model(model_input)
Source code in molmo_spaces/policy/learned_policy/keyboard_policy.py
def inference_model(self, model_input):
    if self._is_paused():
        return None

    self.current_position += self.current_rotation @ self._get_delta_position()
    self.current_rotation = self._get_delta_rotation() @ self.current_rotation

    if self.robot_type == "floating_rum":
        goal_pose = self.init_robot_pose.copy()
        goal_pose[:3, 3] = self.current_position
        goal_pose[:3, :3] = self.current_rotation

        goal_pose_7d = np.array(
            list(goal_pose[:3, 3])
            + list(R.from_matrix(goal_pose[:3, :3]).as_quat(scalar_first=True))
        )
        return {
            "base": goal_pose_7d,
            "gripper": np.array([0.0]) if self._gripper_open else np.array([-255.0]),
        }
    else:
        kinematics = self.task.env.current_robot.kinematics
        robot_view = self.task.env.current_robot.robot_view

        gripper_mgs = set(robot_view.get_gripper_movegroup_ids())
        mgs_except_gripper = [x for x in robot_view.move_group_ids() if x not in gripper_mgs]

        new_pose = np.eye(4)
        new_pose[:3, 3] = self.current_position
        new_pose[:3, :3] = self.current_rotation

        jp = kinematics.ik(
            "arm",
            new_pose,
            mgs_except_gripper,
            robot_view.get_qpos_dict(),
            robot_view.base.pose,
            rel_to_base=True,
        )
        action = robot_view.get_ctrl_dict()
        if jp is not None:
            action.update({mg_id: jp[mg_id] for mg_id in mgs_except_gripper})

        action["gripper"] = np.array([0.0]) if self._gripper_open else np.array([255.0])
        return action
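Note that the translation update applies the keyboard delta in the tool frame (`self.current_rotation @ self._get_delta_position()`), so "forward" tracks the gripper's current orientation rather than a fixed world axis. A minimal numeric check of that frame change:

```python
import numpy as np

# Tool yawed 90 degrees about z: the tool-frame +x axis points along world +y.
Rz90 = np.array([[0.0, -1.0, 0.0],
                 [1.0,  0.0, 0.0],
                 [0.0,  0.0, 1.0]])

position = np.zeros(3)
delta = np.array([0.1, 0.0, 0.0])  # a "forward" key press, expressed in the tool frame
position = position + Rz90 @ delta
# The end effector moves along world +y, not world +x.
```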
model_output_to_action
model_output_to_action(model_output)
Source code in molmo_spaces/policy/learned_policy/keyboard_policy.py
def model_output_to_action(self, model_output):
    return model_output
obs_to_model_input
obs_to_model_input(obs)
Source code in molmo_spaces/policy/learned_policy/keyboard_policy.py
def obs_to_model_input(self, obs):
    self.render(obs)

    robot_pose = obs["robot_base_pose"]
    T_world_robot = np.eye(4)
    T_world_robot[:3, :3] = R.from_quat(robot_pose[3:], scalar_first=True).as_matrix()
    T_world_robot[:3, 3] = robot_pose[:3]

    if self.robot_type == "floating_rum":
        if self.init_robot_pose is None:
            self.init_robot_pose = T_world_robot
            self.current_position = T_world_robot[:3, 3].copy()
            self.current_rotation = T_world_robot[:3, :3].copy()
        return {"T_world_robot": T_world_robot}

    tcp_pose = obs["tcp_pose"]
    qpos = obs["qpos"]["arm"]

    T_robot_tcp = np.eye(4)
    T_robot_tcp[:3, :3] = R.from_quat(tcp_pose[3:], scalar_first=True).as_matrix()
    T_robot_tcp[:3, 3] = tcp_pose[:3]

    if self.init_tcp_pose is None:
        self.init_tcp_pose = T_robot_tcp.copy()
        self.current_position = T_robot_tcp[:3, 3].copy()
        self.current_rotation = T_robot_tcp[:3, :3].copy()

    T_world_tcp = T_world_robot @ T_robot_tcp
    return {
        "qpos": qpos,
        "T_world_robot": T_world_robot,
        "T_world_tcp": T_world_tcp,
        "T_robot_tcp": T_robot_tcp,
    }
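The 7-D poses (position plus wxyz quaternion) are converted to 4x4 homogeneous transforms and chained as `T_world_tcp = T_world_robot @ T_robot_tcp`. The conversion and composition can be checked in isolation (the helper name is illustrative; the quaternion is rolled to xyzw order for SciPy versions without `scalar_first`):

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def pose7d_to_mat(pose: np.ndarray) -> np.ndarray:
    """[x, y, z, qw, qx, qy, qz] -> 4x4 homogeneous transform."""
    T = np.eye(4)
    # Roll wxyz to xyzw, SciPy's default quaternion order.
    T[:3, :3] = R.from_quat(np.roll(np.asarray(pose[3:], dtype=float), -1)).as_matrix()
    T[:3, 3] = pose[:3]
    return T

# Robot base at x = 1 m, yawed 90 degrees about z; TCP 0.5 m along the base x-axis.
half = np.sqrt(0.5)
T_world_robot = pose7d_to_mat(np.array([1.0, 0.0, 0.0, half, 0.0, 0.0, half]))
T_robot_tcp = pose7d_to_mat(np.array([0.5, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]))
T_world_tcp = T_world_robot @ T_robot_tcp
# The 90-degree yaw rotates the TCP offset onto world +y: TCP sits at [1.0, 0.5, 0.0].
```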
prepare_model abstractmethod
prepare_model(model_name: str)
Source code in molmo_spaces/policy/base_policy.py
@abstractmethod
def prepare_model(self, model_name: str):
    raise NotImplementedError
render
render(obs)
Source code in molmo_spaces/policy/learned_policy/keyboard_policy.py
def render(self, obs):
    if self.robot_type == "floating_rum":
        try:
            window_exists = cv2.getWindowProperty("views", cv2.WND_PROP_VISIBLE) >= 0
        except cv2.error:
            window_exists = False

        if not window_exists:
            screen_res = (1920, 1080)
            aspect_ratio = obs["wrist_camera"].shape[0] / obs["wrist_camera"].shape[1]
            new_width = int(screen_res[0] * 0.9)
            new_height = int(new_width * aspect_ratio)
            cv2.namedWindow("views", cv2.WINDOW_NORMAL)
            cv2.resizeWindow("views", new_width, new_height)
            cv2.moveWindow(
                "views", (screen_res[0] - new_width) // 2, (screen_res[1] - new_height) // 2
            )

        cv2.imshow("views", cv2.cvtColor(obs["wrist_camera"], cv2.COLOR_RGB2BGR))
        cv2.waitKey(1)
    else:
        views = np.concatenate(
            [np.flip(np.flip(obs["wrist_camera"], axis=0), axis=1), obs["exo_camera_1"]], axis=1
        )
        try:
            window_exists = cv2.getWindowProperty("views", cv2.WND_PROP_VISIBLE) >= 0
        except cv2.error:
            window_exists = False

        if not window_exists:
            screen_res = (1920, 1080)
            aspect_ratio = views.shape[0] / views.shape[1]
            new_width = int(screen_res[0] * 0.9)
            new_height = int(new_width * aspect_ratio)
            cv2.namedWindow("views", cv2.WINDOW_NORMAL)
            cv2.resizeWindow("views", new_width, new_height)
            cv2.moveWindow(
                "views", (screen_res[0] - new_width) // 2, (screen_res[1] - new_height) // 2
            )

        cv2.imshow("views", cv2.cvtColor(views, cv2.COLOR_RGB2BGR))
        cv2.waitKey(1)
reset
reset()
Source code in molmo_spaces/policy/learned_policy/keyboard_policy.py
def reset(self):
    self.init_robot_pose = None
    self.init_tcp_pose = None
    self.current_position = None
    self.current_rotation = None
    self._gripper_open = True

lerobot_grpc_client

LeRobot gRPC client for remote policy inference.

This client connects to a LeRobot async inference server and handles the gRPC protocol for sending observations and receiving actions.

Classes:

Name Description
LeRobotGRPCClient

gRPC client for LeRobot async inference server.

Attributes:

Name Type Description
log
log module-attribute
log = getLogger(__name__)
LeRobotGRPCClient
LeRobotGRPCClient(host: str = 'localhost', port: int = 8080)

gRPC client for LeRobot async inference server.

This client handles the communication protocol with a LeRobot policy server, including connection setup, observation streaming, and action retrieval.

Initialize the gRPC client.

Parameters:

Name Type Description Default
host str

Server hostname

'localhost'
port int

Server port

8080

Methods:

Name Description
close

Close the gRPC channel.

connect

Connect to the server and initialize the policy.

infer

Send observation and receive action chunk.

reset

Reset the client state (timestep counter).

Attributes:

Name Type Description
channel
connected
host
port
stub
timestep
Source code in molmo_spaces/policy/learned_policy/lerobot_grpc_client.py
def __init__(self, host: str = "localhost", port: int = 8080):
    """Initialize the gRPC client.

    Args:
        host: Server hostname
        port: Server port
    """
    self.host = host
    self.port = port
    self.channel = None
    self.stub = None
    self.timestep = 0
    self.connected = False

    # Will be set during connect() - imported from lerobot for pickle compatibility
    self._services_pb2 = None
    self._services_pb2_grpc = None
    self._RemotePolicyConfig = None
    self._TimedObservation = None
channel instance-attribute
channel = None
connected instance-attribute
connected = False
host instance-attribute
host = host
port instance-attribute
port = port
stub instance-attribute
stub = None
timestep instance-attribute
timestep = 0
close
close() -> None

Close the gRPC channel.

Source code in molmo_spaces/policy/learned_policy/lerobot_grpc_client.py
def close(self) -> None:
    """Close the gRPC channel."""
    if self.channel:
        self.channel.close()
        self.channel = None
        self.stub = None
    self.connected = False
    log.info("Disconnected from LeRobot server")
connect
connect(pretrained_name_or_path: str, policy_type: str = 'pi05', device: str = 'cuda', actions_per_chunk: int = 50, lerobot_features: dict | None = None, rename_map: dict | None = None, max_retries: int = 5, retry_delay: float = 1.0) -> None

Connect to the server and initialize the policy.

Parameters:

Name Type Description Default
pretrained_name_or_path str

HuggingFace model ID or local path

required
policy_type str

Type of policy (e.g., "pi05", "act", "smolvla")

'pi05'
device str

Device to run inference on ("cuda", "cpu")

'cuda'
actions_per_chunk int

Number of actions per inference call

50
lerobot_features dict | None

Feature definitions for observations

None
rename_map dict | None

Optional mapping to rename observation keys

None
max_retries int

Number of connection retry attempts

5
retry_delay float

Delay between retries in seconds

1.0
Source code in molmo_spaces/policy/learned_policy/lerobot_grpc_client.py
def connect(
    self,
    pretrained_name_or_path: str,
    policy_type: str = "pi05",
    device: str = "cuda",
    actions_per_chunk: int = 50,
    lerobot_features: dict | None = None,
    rename_map: dict | None = None,
    max_retries: int = 5,
    retry_delay: float = 1.0,
) -> None:
    """Connect to the server and initialize the policy.

    Args:
        pretrained_name_or_path: HuggingFace model ID or local path
        policy_type: Type of policy (e.g., "pi05", "act", "smolvla")
        device: Device to run inference on ("cuda", "cpu")
        actions_per_chunk: Number of actions per inference call
        lerobot_features: Feature definitions for observations
        rename_map: Optional mapping to rename observation keys
        max_retries: Number of connection retry attempts
        retry_delay: Delay between retries in seconds
    """
    # Import from lerobot to ensure pickle compatibility with server
    try:
        from lerobot.async_inference.helpers import RemotePolicyConfig, TimedObservation
        from lerobot.transport import services_pb2, services_pb2_grpc
    except ImportError as e:
        log.error(
            "lerobot package with transport module is required. "
            "Install it with: pip install -e '.[async]' from lerobot repo"
        )
        raise e

    self._services_pb2 = services_pb2
    self._services_pb2_grpc = services_pb2_grpc
    self._RemotePolicyConfig = RemotePolicyConfig
    self._TimedObservation = TimedObservation

    # Create gRPC channel with large message limits for images
    self.channel = grpc.insecure_channel(
        f"{self.host}:{self.port}",
        options=[
            ("grpc.max_send_message_length", 100 * 1024 * 1024),  # 100MB
            ("grpc.max_receive_message_length", 100 * 1024 * 1024),
        ],
    )
    self.stub = services_pb2_grpc.AsyncInferenceStub(self.channel)

    # Retry connection
    for attempt in range(max_retries):
        try:
            # Signal ready
            self.stub.Ready(services_pb2.Empty())

            # Build and send policy config using lerobot's class for pickle compatibility
            policy_config = self._RemotePolicyConfig(
                policy_type=policy_type,
                pretrained_name_or_path=pretrained_name_or_path,
                lerobot_features=lerobot_features or {},
                actions_per_chunk=actions_per_chunk,
                device=device,
                rename_map=rename_map or {},
            )

            config_bytes = pickle.dumps(policy_config)
            self.stub.SendPolicyInstructions(services_pb2.PolicySetup(data=config_bytes))

            self.connected = True
            log.info(f"Connected to LeRobot server at {self.host}:{self.port}")
            log.info(f"Loaded policy: {pretrained_name_or_path} ({policy_type})")
            return

        except grpc.RpcError as e:
            if attempt < max_retries - 1:
                log.warning(f"Connection attempt {attempt + 1} failed: {e}. Retrying...")
                time.sleep(retry_delay)
            else:
                log.error(f"Failed to connect after {max_retries} attempts")
                raise
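The connection loop above is a standard bounded-retry pattern; factored out as a generic helper (a sketch, not part of the client):

```python
import time

def retry(fn, max_retries=5, retry_delay=1.0, exceptions=(Exception,)):
    """Call `fn` until it succeeds; re-raise after `max_retries` failed
    attempts, sleeping `retry_delay` seconds between tries."""
    for attempt in range(max_retries):
        try:
            return fn()
        except exceptions:
            if attempt == max_retries - 1:
                raise
            time.sleep(retry_delay)
```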
infer
infer(observation: dict[str, Any]) -> list[ndarray]

Send observation and receive action chunk.

Parameters:

Name Type Description Default
observation dict[str, Any]

Dictionary with observation data matching LeRobot format: - "observation.state": np.ndarray of robot state - "observation.images.<camera>": np.ndarray images (H, W, C) - "task": optional text prompt

required

Returns:

Type Description
list[ndarray]

List of action arrays (one per timestep in the chunk)

Source code in molmo_spaces/policy/learned_policy/lerobot_grpc_client.py
def infer(self, observation: dict[str, Any]) -> list[np.ndarray]:
    """Send observation and receive action chunk.

    Args:
        observation: Dictionary with observation data matching LeRobot format:
            - "observation.state": np.ndarray of robot state
            - "observation.images.<camera>": np.ndarray images (H, W, C)
            - "task": optional text prompt

    Returns:
        List of action arrays (one per timestep in the chunk)
    """
    if not self.connected:
        raise RuntimeError("Not connected. Call connect() first.")

    services_pb2 = self._services_pb2

    # Wrap observation in TimedObservation using lerobot's class for pickle compatibility
    timed_obs = self._TimedObservation(
        timestamp=time.time(),
        timestep=self.timestep,
        observation=observation,
        must_go=True,  # Force processing
    )
    self.timestep += 1

    # Serialize and send
    obs_bytes = pickle.dumps(timed_obs)
    self.stub.SendObservations(self._chunk_observation(obs_bytes))

    # Get action chunk
    response = self.stub.GetActions(services_pb2.Empty())

    if response.data:
        timed_actions = pickle.loads(response.data)
        # Extract action tensors from TimedAction objects
        return [
            ta.action.numpy() if hasattr(ta.action, "numpy") else np.array(ta.action)
            for ta in timed_actions
        ]

    return []
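The transport pickles the whole observation dict, so everything in it must round-trip losslessly. A sketch of an observation in the expected LeRobot format (keys and shapes are illustrative and must match the `lerobot_features` passed to `connect`):

```python
import pickle
import numpy as np

# Observation in LeRobot format (keys and shapes are illustrative).
observation = {
    "observation.state": np.zeros(8, dtype=np.float32),
    "observation.images.exterior": np.zeros((180, 320, 3), dtype=np.uint8),
    "task": "pick up the mug",
}

# The client wraps this in a TimedObservation and pickles it before sending;
# the round-trip must preserve every key and array exactly.
payload = pickle.dumps(observation)
restored = pickle.loads(payload)
```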
reset
reset() -> None

Reset the client state (timestep counter).

Source code in molmo_spaces/policy/learned_policy/lerobot_grpc_client.py
def reset(self) -> None:
    """Reset the client state (timestep counter)."""
    self.timestep = 0

phone_policy

Classes:

Name Description
Phone_Policy

Attributes:

Name Type Description
log
log module-attribute
log = getLogger(__name__)
Phone_Policy
Phone_Policy(exp_config: MlSpacesExpConfig, task_type: str)

Bases: InferencePolicy

Methods:

Name Description
add_auxiliary_objects

Add auxiliary objects to the scene that might be required for the policy.

get_action
get_all_phases

Returns:

get_info
get_phase

Returns:

inference_model
model_output_to_action
obs_to_model_input
prepare_model
render
reset

Attributes:

Name Type Description
config
current_phase
robot_type
session
target_poses
task
task_type
Source code in molmo_spaces/policy/learned_policy/phone_policy.py
def __init__(
    self,
    exp_config: MlSpacesExpConfig,
    task_type: str,
) -> None:
    super().__init__(exp_config, task_type)
    self.robot_type = exp_config.robot_config.name
    self.session = teledex.Session()
    self.session.start()
    while self.session.get_latest_data()["position"] is None:
        time.sleep(0.1)
    log.info(f"Teledex session connected. Robot type: {self.robot_type}")
config instance-attribute
config = config
current_phase instance-attribute
current_phase = NONE_PHASE
robot_type instance-attribute
robot_type = name
session instance-attribute
session = Session()
target_poses instance-attribute
target_poses = {'grasp': eye(4)}
task instance-attribute
task = task
task_type instance-attribute
task_type = task_type
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None

Add auxiliary objects to the scene that might be required for the policy. Args: config: The configuration for the policy. spec: The MuJoCo scene specification to modify.

Source code in molmo_spaces/policy/base_policy.py
@staticmethod
def add_auxiliary_objects(config: "MlSpacesExpConfig", spec: MjSpec) -> None:
    """
    Add auxiliary objects to the scene that might be required for the policy.
    Args:
        config: The configuration for the policy.
        spec: The MuJoCo scene specification to modify.
    """
    return None
get_action
get_action(observation)
Source code in molmo_spaces/policy/base_policy.py
def get_action(self, observation):
    model_input = self.obs_to_model_input(observation)
    model_output = self.inference_model(model_input)
    action = self.model_output_to_action(model_output)
    return action
get_all_phases
get_all_phases() -> dict[str, int]

Returns:

Type Description
dict[str, int]

A dictionary of all possible policy phases

Source code in molmo_spaces/policy/base_policy.py
def get_all_phases(self) -> dict[str, int]:
    """
    Returns:
        A dictionary of all possible policy phases
    """
    return {"unknown": 0}
get_info
get_info() -> dict
Source code in molmo_spaces/policy/learned_policy/phone_policy.py
def get_info(self) -> dict:
    info = super().get_info()
    info["policy_name"] = "mujoco_ar"
    info["timestamp"] = time.time()
    return info
get_phase
get_phase() -> str

Returns:

Type Description
str

The policy phase

Source code in molmo_spaces/policy/base_policy.py
def get_phase(self) -> str:
    """
    Returns:
        The policy phase
    """
    return "unknown"
inference_model
inference_model(model_input)
Source code in molmo_spaces/policy/learned_policy/phone_policy.py
def inference_model(self, model_input):
    if self.session.get_latest_data()["button"]:
        return None
    if self.robot_type == "floating_rum":
        phone_pose = np.eye(4)
        phone_pose[:3, :3] = self.session.get_latest_data()["rotation"].reshape(3, 3)
        phone_pose[:3, 3] = self.session.get_latest_data()["position"].reshape(3)

        if self.init_phone_pose is None:
            self.init_phone_pose = phone_pose

        transform = np.eye(4)
        transform[:3, :3] = R.from_euler("xyz", [0, 0, 0], degrees=True).as_matrix()
        delta_phone_pose = (
            transform
            @ np.linalg.inv(self.init_phone_pose)
            @ phone_pose
            @ np.linalg.inv(transform)
        )
        delta_phone_pose[:3, 3] *= 3  # sensitivity

        goal_pose = self.init_robot_pose @ delta_phone_pose

        goal_pose_7d = np.array(
            list(goal_pose[:3, 3])
            + list(R.from_matrix(goal_pose[:3, :3]).as_quat(scalar_first=True))
        )
        action = {
            "base": goal_pose_7d,
            "gripper": np.array([0.0])
            if self.session.get_latest_data()["toggle"] == self.current_grasp
            else np.array([-255.0]),
        }
        return action
    else:
        kinematics = self.task.env.current_robot.kinematics
        robot_view = self.task.env.current_robot.robot_view

        gripper_mgs = set(robot_view.get_gripper_movegroup_ids())
        mgs_except_gripper = [x for x in robot_view.move_group_ids() if x not in gripper_mgs]

        phone_pose = np.eye(4)
        phone_pose[:3, :3] = self.session.get_latest_data()["rotation"].reshape(3, 3)
        phone_pose[:3, 3] = self.session.get_latest_data()["position"].reshape(3)

        if self.init_phone_pose is None:
            self.init_phone_pose = phone_pose

        delta_phone_pose = self.init_phone_pose @ np.linalg.inv(phone_pose)
        delta_phone_position = -delta_phone_pose[:3, 3] * 2  # Negate to flip directions
        delta_phone_rotation = delta_phone_pose[:3, :3].T  # Transpose to invert rotation
        new_pose = np.eye(4)
        new_pose[:3, :3] = delta_phone_rotation @ self.init_tcp_pose[:3, :3]
        new_pose[:3, 3] = self.init_tcp_pose[:3, 3] + delta_phone_position

        jp = kinematics.ik(
            "arm",
            new_pose,
            mgs_except_gripper,
            robot_view.get_qpos_dict(),
            robot_view.base.pose,
            rel_to_base=True,
        )
        action = robot_view.get_ctrl_dict()
        if jp is not None:
            action.update({mg_id: jp[mg_id] for mg_id in mgs_except_gripper})

        if self.session.get_latest_data()["toggle"] != self.current_grasp:
            action["gripper"] = np.array([255.0])
        else:
            action["gripper"] = np.array([0.0])
        return action
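In the floating-base branch, the phone's motion is expressed as a delta relative to its initial pose and replayed on the robot's initial pose, with translation scaled by the sensitivity factor. With the conjugating `transform` equal to identity (as configured above), this reduces to:

```python
import numpy as np

def apply_phone_delta(init_robot_pose, init_phone_pose, phone_pose, sensitivity=3.0):
    # Delta of the phone relative to where it started, translation amplified.
    delta = np.linalg.inv(init_phone_pose) @ phone_pose
    delta[:3, 3] *= sensitivity
    return init_robot_pose @ delta

init_robot = np.eye(4)
init_phone = np.eye(4)
moved_phone = np.eye(4)
moved_phone[:3, 3] = [0.01, 0.0, 0.02]  # phone moved 1 cm in x, 2 cm in z

goal = apply_phone_delta(init_robot, init_phone, moved_phone)
# Translation is amplified 3x: the goal position becomes [0.03, 0.0, 0.06].
```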
model_output_to_action
model_output_to_action(model_output)
Source code in molmo_spaces/policy/learned_policy/phone_policy.py
def model_output_to_action(self, model_output):
    return model_output
obs_to_model_input
obs_to_model_input(obs)
Source code in molmo_spaces/policy/learned_policy/phone_policy.py
def obs_to_model_input(self, obs):
    if self.robot_type == "floating_rum":
        self.render(obs)

        T_world_robot = np.eye(4)
        T_world_robot[:3, 3] = obs["robot_base_pose"][:3]
        T_world_robot[:3, :3] = R.from_quat(
            obs["robot_base_pose"][3:7], scalar_first=True
        ).as_matrix()

        if self.init_robot_pose is None:
            self.init_robot_pose = T_world_robot

        return {
            "T_world_robot": T_world_robot,
        }
    else:
        self.render(obs)
        robot_pose = obs["robot_base_pose"]
        tcp_pose = obs["tcp_pose"]
        qpos = obs["qpos"]["arm"]

        T_world_robot = np.eye(4)
        T_world_robot[:3, :3] = R.from_quat(robot_pose[3:], scalar_first=True).as_matrix()
        T_world_robot[:3, 3] = robot_pose[:3]

        T_robot_tcp = np.eye(4)
        T_robot_tcp[:3, :3] = R.from_quat(tcp_pose[3:], scalar_first=True).as_matrix()
        T_robot_tcp[:3, 3] = tcp_pose[:3]

        if self.init_tcp_pose is None:
            self.init_tcp_pose = T_robot_tcp

        T_world_tcp = T_world_robot @ T_robot_tcp

        return {
            "qpos": qpos,
            "T_world_robot": T_world_robot,
            "T_world_tcp": T_world_tcp,
            "T_robot_tcp": T_robot_tcp,
        }
prepare_model abstractmethod
prepare_model(model_name: str)
Source code in molmo_spaces/policy/base_policy.py
@abstractmethod
def prepare_model(self, model_name: str):
    raise NotImplementedError
render
render(obs)
Source code in molmo_spaces/policy/learned_policy/phone_policy.py
def render(self, obs):
    if self.robot_type == "floating_rum":
        views = np.concatenate([obs["wrist_camera"], obs["exo_camera_1"]], axis=1)

        try:
            window_exists = cv2.getWindowProperty("views", cv2.WND_PROP_VISIBLE) >= 0
        except cv2.error:
            window_exists = False
        if not window_exists:
            screen_res = (1920, 1080)
            aspect_ratio = views.shape[0] / views.shape[1]
            new_width = int(screen_res[0] * 0.9)
            new_height = int(new_width * aspect_ratio)
            cv2.namedWindow("views", cv2.WINDOW_NORMAL)
            cv2.resizeWindow("views", new_width, new_height)
            cv2.moveWindow(
                "views", (screen_res[0] - new_width) // 2, (screen_res[1] - new_height) // 2
            )
        cv2.imshow("views", cv2.cvtColor(views, cv2.COLOR_RGB2BGR))
        cv2.waitKey(1)
    else:
        views = np.concatenate(
            [np.flip(np.flip(obs["wrist_camera"], axis=0), axis=1), obs["exo_camera_1"]], axis=1
        )

        try:
            window_exists = cv2.getWindowProperty("views", cv2.WND_PROP_VISIBLE) >= 0
        except cv2.error:
            window_exists = False
        if not window_exists:
            screen_res = (1920, 1080)
            aspect_ratio = views.shape[0] / views.shape[1]
            new_width = int(screen_res[0] * 0.9)
            new_height = int(new_width * aspect_ratio)
            cv2.namedWindow("views", cv2.WINDOW_NORMAL)
            cv2.resizeWindow("views", new_width, new_height)
            cv2.moveWindow(
                "views", (screen_res[0] - new_width) // 2, (screen_res[1] - new_height) // 2
            )
        cv2.imshow("views", cv2.cvtColor(views, cv2.COLOR_RGB2BGR))
        cv2.waitKey(1)
reset
reset()
Source code in molmo_spaces/policy/learned_policy/phone_policy.py
def reset(self):
    self.init_robot_pose = None
    self.init_phone_pose = None
    self.init_tcp_pose = None
    self.current_grasp = self.session.get_latest_data()["toggle"]

pi_policy

Classes:

Name Description
PI_Policy
PI_PolicyState

Attributes:

Name Type Description
log
log module-attribute
log = getLogger(__name__)
PI_Policy
PI_Policy(exp_config: MlSpacesExpConfig, task_type: str)

Bases: InferencePolicy, StatefulPolicy

Methods:

Name Description
add_auxiliary_objects

Add auxiliary objects to the scene that might be required for the policy.

get_action
get_all_phases

Returns:

get_info
get_phase

Returns:

get_state
inference_model
model_output_to_action
obs_to_model_input
prepare_model
render
reset
set_state

Attributes:

Name Type Description
checkpoint_path
chunk_size
config
current_phase
grasping_threshold
grasping_type
model
remote_config
target_poses
task
task_type
Source code in molmo_spaces/policy/learned_policy/pi_policy.py
def __init__(
    self,
    exp_config: MlSpacesExpConfig,
    task_type: str,
) -> None:
    super().__init__(exp_config, exp_config.task_type)
    self.remote_config = exp_config.policy_config.remote_config
    self.checkpoint_path = exp_config.policy_config.checkpoint_path
    self.grasping_type = exp_config.policy_config.grasping_type
    self.chunk_size = exp_config.policy_config.chunk_size
    self.grasping_threshold = exp_config.policy_config.grasping_threshold
    self.model = None  # don't init model till inference to allow multiprocessing
checkpoint_path instance-attribute
checkpoint_path = checkpoint_path
chunk_size instance-attribute
chunk_size = chunk_size
config instance-attribute
config = config
current_phase instance-attribute
current_phase = NONE_PHASE
grasping_threshold instance-attribute
grasping_threshold = grasping_threshold
grasping_type instance-attribute
grasping_type = grasping_type
model instance-attribute
model = None
remote_config instance-attribute
remote_config = remote_config
target_poses instance-attribute
target_poses = {'grasp': eye(4)}
task instance-attribute
task = task
task_type instance-attribute
task_type = task_type
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None

Add auxiliary objects to the scene that might be required for the policy. Args: config: The configuration for the policy. spec: The MuJoCo scene specification to modify.

Source code in molmo_spaces/policy/base_policy.py
@staticmethod
def add_auxiliary_objects(config: "MlSpacesExpConfig", spec: MjSpec) -> None:
    """
    Add auxiliary objects to the scene that might be required for the policy.
    Args:
        config: The configuration for the policy.
        spec: The MuJoCo scene specification to modify.
    """
    return None
get_action
get_action(observation)
Source code in molmo_spaces/policy/base_policy.py
def get_action(self, observation):
    model_input = self.obs_to_model_input(observation)
    model_output = self.inference_model(model_input)
    action = self.model_output_to_action(model_output)
    return action
get_all_phases
get_all_phases() -> dict[str, int]

Returns:

Type Description
dict[str, int]

A dictionary of all possible policy phases

Source code in molmo_spaces/policy/base_policy.py
def get_all_phases(self) -> dict[str, int]:
    """
    Returns:
        A dictionary of all possible policy phases
    """
    return {"unknown": 0}
get_info
get_info() -> dict
Source code in molmo_spaces/policy/learned_policy/pi_policy.py
def get_info(self) -> dict:
    info = super().get_info()
    info["policy_name"] = (
        self.model.get_server_metadata().get("policy_name", "pi")
        if hasattr(self.model, "get_server_metadata")
        else "pi"
    )
    info["policy_checkpoint"] = self.model_name
    info["policy_buffer_length"] = self.chunk_size
    info["policy_grasping_threshold"] = self.grasping_threshold
    info["policy_grasping_type"] = self.grasping_type
    info["prompt"] = self.task.get_task_description()
    log.info(f"Current prompt: {info['prompt']}")

    info["time_spent"] = time.time() - self.starting_time if self.starting_time else None
    info["timestamp"] = time.time()
    return info
get_phase
get_phase() -> str

Returns:

Type Description
str

The policy phase

Source code in molmo_spaces/policy/base_policy.py
def get_phase(self) -> str:
    """
    Returns:
        The policy phase
    """
    return "unknown"
get_state
get_state()
Source code in molmo_spaces/policy/learned_policy/pi_policy.py
def get_state(self):
    return PI_PolicyState(
        actions_buffer=self.actions_buffer,
        current_buffer_index=self.current_buffer_index,
        starting_time=self.starting_time,
    )
inference_model
inference_model(model_input)
Source code in molmo_spaces/policy/learned_policy/pi_policy.py
def inference_model(self, model_input):
    if self.model is None:
        self.prepare_model()
    if self.starting_time is None:
        self.starting_time = time.time()
    if self.actions_buffer is None or self.current_buffer_index >= self.chunk_size:
        import websockets

        try:
            self.actions_buffer = self.model.infer(model_input)["actions"]
        except websockets.exceptions.ConnectionClosedError:
            log.error("Connection closed error. Attempting to reset connection...")
            self.prepare_model()
            log.info("Sleeping 5s...")
            time.sleep(5)
            log.info("Retrying inference...")
            self.actions_buffer = self.model.infer(model_input)["actions"]
        self.current_buffer_index = 0
    model_output = self.actions_buffer[self.current_buffer_index]
    self.current_buffer_index += 1
    return model_output
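`inference_model` queries the model only when the local action buffer is exhausted, then replays one action per step until the chunk runs out. A minimal, self-contained sketch of that buffering pattern (the `ActionChunkBuffer` class and `infer_fn` callback are hypothetical stand-ins for the policy's remote model call):

```python
class ActionChunkBuffer:
    """Replays a chunk of actions, refilling from infer_fn when exhausted.

    Sketch of the chunked-inference pattern used by inference_model.
    """

    def __init__(self, infer_fn, chunk_size):
        self.infer_fn = infer_fn
        self.chunk_size = chunk_size
        self.buffer = None
        self.index = 0

    def next_action(self, observation):
        if self.buffer is None or self.index >= self.chunk_size:
            self.buffer = self.infer_fn(observation)  # one model call per chunk
            self.index = 0
        action = self.buffer[self.index]
        self.index += 1
        return action


calls = []

def fake_infer(obs):
    calls.append(obs)
    return [f"a{len(calls)}_{i}" for i in range(3)]

buf = ActionChunkBuffer(fake_infer, chunk_size=3)
actions = [buf.next_action("obs") for _ in range(5)]
# 5 steps consume one full chunk plus two actions of a second chunk,
# so only two inference calls are made.
```

Amortizing one (potentially slow, remote) inference call over `chunk_size` control steps is the point of this design.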
model_output_to_action
model_output_to_action(model_output)
Source code in molmo_spaces/policy/learned_policy/pi_policy.py
def model_output_to_action(self, model_output):
    if self.grasping_type == "continuous":
        gripper_pos = model_output[7] * np.array([255.0])
    else:  # binary
        gripper_pos = (
            np.array([255.0]) if model_output[7] > self.grasping_threshold else np.array([0.0])
        )

    arm_output = model_output[:7].reshape(
        7,
    )
    action = {
        "arm": arm_output,
        "gripper": gripper_pos,
    }
    return action
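The gripper mapping above supports two modes: continuous (proportional closure) and binary (snap open/closed around a threshold). A standalone sketch of that mapping; the default threshold here is illustrative, the real value comes from `policy_config.grasping_threshold`:

```python
import numpy as np

def gripper_command(raw, grasping_type="binary", threshold=0.5):
    """Map a model's [0, 1] gripper output to a [0, 255] actuator command.

    Sketch of the mapping in model_output_to_action.
    """
    if grasping_type == "continuous":
        return raw * np.array([255.0])  # proportional closure
    # binary: fully close above the threshold, fully open at or below it
    return np.array([255.0]) if raw > threshold else np.array([0.0])
```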
obs_to_model_input
obs_to_model_input(obs)
Source code in molmo_spaces/policy/learned_policy/pi_policy.py
def obs_to_model_input(self, obs):
    # self.render(obs)
    # For local eval: unwrap a singleton batch
    if isinstance(obs, list | tuple):
        if len(obs) > 1:
            log.warning(
                "obs has %d elements but only the first one is used; "
                "this may indicate a batching issue - a single observation was expected.",
                len(obs),
            )
        obs = obs[0]
    prompt = self.task.get_task_description()

    grip = np.clip(obs["qpos"]["gripper"][0] / 0.824033, 0, 1)
    exo_camera_key = (
        "droid_shoulder_light_randomization"
        if "droid_shoulder_light_randomization" in obs
        else "exo_camera_1"
    )
    wrist_camera_key = (
        "wrist_camera_zed_mini" if "wrist_camera_zed_mini" in obs else "wrist_camera"
    )
    model_input = {
        "observation/exterior_image_1_left": resize_with_pad(obs[exo_camera_key], 224, 224),
        "observation/wrist_image_left": resize_with_pad(obs[wrist_camera_key], 224, 224),
        "observation/joint_position": np.array(obs["qpos"]["arm"][:7]).reshape(
            7,
        ),
        "observation/gripper_position": np.array(grip).reshape(
            1,
        ),
        "prompt": prompt.lower(),
    }
    return model_input
prepare_model
prepare_model()
Source code in molmo_spaces/policy/learned_policy/pi_policy.py
def prepare_model(self):
    self.model_name = os.path.basename(self.checkpoint_path)
    if self.remote_config is not None:
        self._prepare_remote_model(self.checkpoint_path)
    else:
        self._prepare_local_model(self.checkpoint_path)
render
render(obs)
Source code in molmo_spaces/policy/learned_policy/pi_policy.py
def render(self, obs):
    views = np.concatenate([obs["wrist_camera"], obs["exo_camera_1"]], axis=1)
    cv2.imshow("views", cv2.cvtColor(views, cv2.COLOR_RGB2BGR))
    cv2.waitKey(1)
reset
reset()
Source code in molmo_spaces/policy/learned_policy/pi_policy.py
def reset(self):
    self.actions_buffer = None
    self.current_buffer_index = 0
    self.starting_time = None
set_state
set_state(state: PI_PolicyState)
Source code in molmo_spaces/policy/learned_policy/pi_policy.py
def set_state(self, state: PI_PolicyState):
    self.actions_buffer = state.actions_buffer
    self.current_buffer_index = state.current_buffer_index
    self.starting_time = state.starting_time
PI_PolicyState dataclass
PI_PolicyState(actions_buffer: list[ndarray] | None = None, current_buffer_index: int = 0, starting_time: float | None = None)

Attributes:

Name Type Description
actions_buffer list[ndarray] | None
current_buffer_index int
starting_time float | None
actions_buffer class-attribute instance-attribute
actions_buffer: list[ndarray] | None = None
current_buffer_index class-attribute instance-attribute
current_buffer_index: int = 0
starting_time class-attribute instance-attribute
starting_time: float | None = None

rum_client

Classes:

Name Description
RUMClient

Attributes:

Name Type Description
log
log module-attribute
log = getLogger(__name__)
RUMClient
RUMClient(host: str = 'localhost', port: int = 8765, max_size: int = 50 * 1024 * 1024)

Methods:

Name Description
close
get_server_metadata
infer
infer_point
infer_point_molmo
reset

Attributes:

Name Type Description
uri
websocket
Source code in molmo_spaces/policy/learned_policy/rum_client.py
def __init__(self, host: str = "localhost", port: int = 8765, max_size: int = 50 * 1024 * 1024):
    if ws_client is None:
        raise ImportError("websockets package required. Install with: pip install websockets")
    self.uri = f"ws://{host}:{port}"
    self.websocket = ws_client.connect(self.uri, max_size=max_size)
uri instance-attribute
uri = f'ws://{host}:{port}'
websocket instance-attribute
websocket = connect(uri, max_size=max_size)
close
close()
Source code in molmo_spaces/policy/learned_policy/rum_client.py
def close(self):
    if self.websocket:
        self.websocket.close()
        self.websocket = None
get_server_metadata
get_server_metadata() -> dict
Source code in molmo_spaces/policy/learned_policy/rum_client.py
def get_server_metadata(self) -> dict:
    response = self._send({"action": "metadata"})
    return {
        "policy_name": response.get("policy_name"),
        "checkpoint": response.get("checkpoint"),
    }
infer
infer(observation: dict) -> ndarray
Source code in molmo_spaces/policy/learned_policy/rum_client.py
def infer(self, observation: dict) -> np.ndarray:
    request = {
        "action": "infer",
        "observation": {
            "rgb_ego": observation["rgb_ego"].tolist(),
            "object_3d_position": observation["object_3d_position"].tolist(),
        },
    }
    return np.array(self._send(request)["result"], dtype=np.float32)
infer_point
infer_point(rgb: ndarray, object_name: str, task: str) -> ndarray
Source code in molmo_spaces/policy/learned_policy/rum_client.py
def infer_point(self, rgb: np.ndarray, object_name: str, task: str) -> np.ndarray:
    try:
        import google.genai as genai
        from google.genai import types
    except ImportError as e:
        raise ImportError(
            "google-genai package is not installed. Please install it with `pip install google-genai`."
        ) from e

    if not hasattr(self, "_gemini_client"):
        api_key = os.getenv("GEMINI_API_KEY")
        if not api_key:
            raise ValueError(
                "GEMINI_API_KEY environment variable not set. "
                "Please export GEMINI_API_KEY=your_api_key"
            )
        self._gemini_client = genai.Client(api_key=api_key)

    if rgb.dtype == np.float32 or rgb.dtype == np.float64:
        rgb = (rgb * 255).astype(np.uint8)
    image = Image.fromarray(rgb)

    if task == "pick":
        prompt = f"""Get the best point matching the following object: {object_name}. The label returned should be an identifying name for the object detected in the image, and the point should be at a point to pick that object.
The answer should follow the json format: [{{"point": [y, x], "label": "{object_name}"}}...]. The points are in [y, x] format normalized to 0-1000."""
    elif task == "open":
        prompt = f"""Get the best point to be grasped to open the {object_name} (e.g. a handle). The label returned should be an identifying name for the object detected in the image, and the point should be the best point to open that object.
The answer should follow the json format: [{{"point": [y, x], "label": "{object_name}"}}...]. The points are in [y, x] format normalized to 0-1000."""
    elif task == "close":
        prompt = f"""Get the best point to be grasped to close the {object_name} (e.g. a handle). The label returned should be an identifying name for the object detected in the image, and the point should be the best point to close that object.
The answer should follow the json format: [{{"point": [y, x], "label": "{object_name}"}}...]. The points are in [y, x] format normalized to 0-1000."""
    else:
        raise ValueError(f"Unsupported task: {task}")

    try:
        buf = io.BytesIO()
        image.save(buf, format="PNG")
        image_bytes = buf.getvalue()

        contents = [
            types.Part.from_bytes(
                data=image_bytes,
                mime_type="image/png",
            ),
            prompt,
        ]

        config = types.GenerateContentConfig(
            temperature=0.0, thinking_config=types.ThinkingConfig(thinking_budget=0)
        )

        response = self._call_gemini_api_with_retry(
            model="gemini-robotics-er-1.5-preview", contents=contents, config=config
        )
        generated_text = response.text

        json_point = self._parse_json_point_yx(generated_text)
        if json_point:
            y_pos, x_pos = json_point
            x_norm = float(x_pos) / 1000.0
            y_norm = float(y_pos) / 1000.0

            x_norm = max(0.0, min(1.0, x_norm))
            y_norm = max(0.0, min(1.0, y_norm))

            return np.array([x_norm, y_norm], dtype=np.float32)

        return np.array([0.5, 0.5], dtype=np.float32)

    except Exception as e:
        log.error(f"Gemini API error: {e}")
        return np.array([0.5, 0.5], dtype=np.float32)
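The prompts above ask the model for points in `[y, x]` order on a 0-1000 grid; `infer_point` then swaps the axes, rescales to `[0, 1]`, and clamps. That post-processing in isolation (a sketch mirroring the source, with the clamping made explicit):

```python
import numpy as np

def normalize_point_yx(y_pos, x_pos):
    """Convert a [y, x] point on a 0-1000 grid to a clamped [x, y] in [0, 1].

    Mirrors the post-processing in infer_point; note the axis swap.
    """
    x_norm = min(1.0, max(0.0, float(x_pos) / 1000.0))
    y_norm = min(1.0, max(0.0, float(y_pos) / 1000.0))
    return np.array([x_norm, y_norm], dtype=np.float32)
```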
infer_point_molmo
infer_point_molmo(rgb: ndarray, object_name: str | None = None, prompt: str | None = None, task: str = 'pick') -> ndarray
Source code in molmo_spaces/policy/learned_policy/rum_client.py
def infer_point_molmo(
    self,
    rgb: np.ndarray,
    object_name: str | None = None,
    prompt: str | None = None,
    task: str = "pick",
) -> np.ndarray:
    if object_name is None and prompt is None:
        raise ValueError("Either 'object_name' or 'prompt' must be provided")
    if object_name is not None and prompt is not None:
        raise ValueError("Provide either 'object_name' or 'prompt', not both")

    request = {
        "action": "infer_point",
        "rgb": rgb.tolist(),
    }

    if object_name is not None:
        request["object_name"] = object_name
        request["task"] = task
    else:
        request["prompt"] = prompt

    response = self._send(request)
    return np.array(response["point"], dtype=np.float32)
reset
reset()
Source code in molmo_spaces/policy/learned_policy/rum_client.py
def reset(self):
    self._send({"action": "reset"})

spacemouse_policy

Classes:

Name Description
SpaceMouse_Policy

Attributes:

Name Type Description
DOUBLE_CLICK_TIME
VENDOR_ID
log
DOUBLE_CLICK_TIME module-attribute
DOUBLE_CLICK_TIME = 0.3
VENDOR_ID module-attribute
VENDOR_ID = 9583
log module-attribute
log = getLogger(__name__)
SpaceMouse_Policy
SpaceMouse_Policy(exp_config: MlSpacesExpConfig, task_type: str)

Bases: InferencePolicy

Methods:

Name Description
add_auxiliary_objects

Add auxiliary objects to the scene that might be required for the policy.

get_action
get_all_phases

Returns:

get_info
get_phase

Returns:

inference_model
model_output_to_action
obs_to_model_input
prepare_model
render
reset

Attributes:

Name Type Description
config
current_phase
pos_sensitivity
robot_type
rot_sensitivity
target_poses
task
task_type
Source code in molmo_spaces/policy/learned_policy/spacemouse_policy.py
def __init__(
    self,
    exp_config: MlSpacesExpConfig,
    task_type: str,
) -> None:
    super().__init__(exp_config, task_type)
    self.robot_type = exp_config.robot_config.name
    self.pos_sensitivity = exp_config.policy_config.pos_sensitivity
    self.rot_sensitivity = exp_config.policy_config.rot_sensitivity
    self._mouse = _SpaceMouseReader(exp_config.policy_config.product_id)
    self._pressed = set()
    self._listener = keyboard.Listener(
        on_press=self._on_press,
        on_release=self._on_release,
    )
    self._listener.start()
    log.info(
        "SpaceMouse ready. Right click: engage. Left click: toggle gripper. Right double-click: disengage. q: pause."
    )
config instance-attribute
config = config
current_phase instance-attribute
current_phase = NONE_PHASE
pos_sensitivity instance-attribute
pos_sensitivity = pos_sensitivity
robot_type instance-attribute
robot_type = name
rot_sensitivity instance-attribute
rot_sensitivity = rot_sensitivity
target_poses instance-attribute
target_poses = {'grasp': eye(4)}
task instance-attribute
task = task
task_type instance-attribute
task_type = task_type
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None

Add auxiliary objects to the scene that might be required for the policy. Args: config: The configuration for the policy. spec: The MuJoCo scene specification to add the objects to.

Source code in molmo_spaces/policy/base_policy.py
@staticmethod
def add_auxiliary_objects(config: "MlSpacesExpConfig", spec: MjSpec) -> None:
    """
    Add auxiliary objects to the scene that might be required for the policy.
    Args:
        config: The configuration for the policy.
        spec: The MuJoCo scene specification to add the objects to.
    """
    return None
get_action
get_action(observation)
Source code in molmo_spaces/policy/base_policy.py
def get_action(self, observation):
    model_input = self.obs_to_model_input(observation)
    model_output = self.inference_model(model_input)
    action = self.model_output_to_action(model_output)
    return action
get_all_phases
get_all_phases() -> dict[str, int]

Returns:

Type Description
dict[str, int]

A dictionary of all possible policy phases

Source code in molmo_spaces/policy/base_policy.py
def get_all_phases(self) -> dict[str, int]:
    """
    Returns:
        A dictionary of all possible policy phases
    """
    return {"unknown": 0}
get_info
get_info() -> dict
Source code in molmo_spaces/policy/learned_policy/spacemouse_policy.py
def get_info(self) -> dict:
    info = super().get_info()
    info["policy_name"] = "spacemouse"
    info["timestamp"] = time.time()
    return info
get_phase
get_phase() -> str

Returns:

Type Description
str

The policy phase

Source code in molmo_spaces/policy/base_policy.py
def get_phase(self) -> str:
    """
    Returns:
        The policy phase
    """
    return "unknown"
inference_model
inference_model(model_input)
Source code in molmo_spaces/policy/learned_policy/spacemouse_policy.py
def inference_model(self, model_input):
    if self._is_paused():
        return None

    ctrl = self._mouse.control if self._mouse.engaged else [0.0] * 6
    delta_pos = np.array(ctrl[:3]) * self.pos_sensitivity
    delta_rot = R.from_euler("xyz", np.array(ctrl[3:]) * self.rot_sensitivity).as_matrix()

    new_position = self.current_position + self.current_rotation @ delta_pos
    new_rotation = delta_rot @ self.current_rotation

    gripper_open = self._mouse.gripper_open

    if self.robot_type == "floating_rum":
        self.current_position = new_position
        self.current_rotation = new_rotation
        goal_pose = self.init_robot_pose.copy()
        goal_pose[:3, 3] = self.current_position
        goal_pose[:3, :3] = self.current_rotation

        goal_pose_7d = np.array(
            list(goal_pose[:3, 3])
            + list(R.from_matrix(goal_pose[:3, :3]).as_quat(scalar_first=True))
        )
        return {
            "base": goal_pose_7d,
            "gripper": np.array([0.0]) if gripper_open else np.array([-255.0]),
        }
    else:
        kinematics = self.task.env.current_robot.kinematics
        robot_view = self.task.env.current_robot.robot_view

        gripper_mgs = set(robot_view.get_gripper_movegroup_ids())
        mgs_except_gripper = [x for x in robot_view.move_group_ids() if x not in gripper_mgs]

        new_pose = np.eye(4)
        new_pose[:3, 3] = new_position
        new_pose[:3, :3] = new_rotation

        jp = kinematics.ik(
            "arm",
            new_pose,
            mgs_except_gripper,
            robot_view.get_qpos_dict(),
            robot_view.base.pose,
            rel_to_base=True,
        )
        action = robot_view.get_ctrl_dict()
        if jp is not None:
            self.current_position = new_position
            self.current_rotation = new_rotation
            action.update({mg_id: jp[mg_id] for mg_id in mgs_except_gripper})

        action["gripper"] = np.array([0.0]) if gripper_open else np.array([255.0])
        return action
model_output_to_action
model_output_to_action(model_output)
Source code in molmo_spaces/policy/learned_policy/spacemouse_policy.py
def model_output_to_action(self, model_output):
    return model_output
obs_to_model_input
obs_to_model_input(obs)
Source code in molmo_spaces/policy/learned_policy/spacemouse_policy.py
def obs_to_model_input(self, obs):
    self.render(obs)

    robot_pose = obs["robot_base_pose"]
    T_world_robot = np.eye(4)
    T_world_robot[:3, :3] = R.from_quat(robot_pose[3:], scalar_first=True).as_matrix()
    T_world_robot[:3, 3] = robot_pose[:3]

    if self.robot_type == "floating_rum":
        if self.init_robot_pose is None:
            self.init_robot_pose = T_world_robot
            self.current_position = T_world_robot[:3, 3].copy()
            self.current_rotation = T_world_robot[:3, :3].copy()
        return {"T_world_robot": T_world_robot}

    tcp_pose = obs["tcp_pose"]
    qpos = obs["qpos"]["arm"]

    T_robot_tcp = np.eye(4)
    T_robot_tcp[:3, :3] = R.from_quat(tcp_pose[3:], scalar_first=True).as_matrix()
    T_robot_tcp[:3, 3] = tcp_pose[:3]

    if self.init_tcp_pose is None:
        self.init_tcp_pose = T_robot_tcp.copy()
        self.current_position = T_robot_tcp[:3, 3].copy()
        self.current_rotation = T_robot_tcp[:3, :3].copy()

    T_world_tcp = T_world_robot @ T_robot_tcp
    return {
        "qpos": qpos,
        "T_world_robot": T_world_robot,
        "T_world_tcp": T_world_tcp,
        "T_robot_tcp": T_robot_tcp,
    }
prepare_model abstractmethod
prepare_model(model_name: str)
Source code in molmo_spaces/policy/base_policy.py
@abstractmethod
def prepare_model(self, model_name: str):
    raise NotImplementedError
render
render(obs)
Source code in molmo_spaces/policy/learned_policy/spacemouse_policy.py
def render(self, obs):
    if self.robot_type == "floating_rum":
        try:
            window_exists = cv2.getWindowProperty("views", cv2.WND_PROP_VISIBLE) >= 0
        except cv2.error:
            window_exists = False

        if not window_exists:
            screen_res = (1920, 1080)
            aspect_ratio = obs["wrist_camera"].shape[0] / obs["wrist_camera"].shape[1]
            new_width = int(screen_res[0] * 0.9)
            new_height = int(new_width * aspect_ratio)
            cv2.namedWindow("views", cv2.WINDOW_NORMAL)
            cv2.resizeWindow("views", new_width, new_height)
            cv2.moveWindow(
                "views", (screen_res[0] - new_width) // 2, (screen_res[1] - new_height) // 2
            )

        cv2.imshow("views", cv2.cvtColor(obs["wrist_camera"], cv2.COLOR_RGB2BGR))
        cv2.waitKey(1)
    else:
        views = np.concatenate(
            [np.flip(np.flip(obs["wrist_camera"], axis=0), axis=1), obs["exo_camera_1"]], axis=1
        )
        try:
            window_exists = cv2.getWindowProperty("views", cv2.WND_PROP_VISIBLE) >= 0
        except cv2.error:
            window_exists = False

        if not window_exists:
            screen_res = (1920, 1080)
            aspect_ratio = views.shape[0] / views.shape[1]
            new_width = int(screen_res[0] * 0.9)
            new_height = int(new_width * aspect_ratio)
            cv2.namedWindow("views", cv2.WINDOW_NORMAL)
            cv2.resizeWindow("views", new_width, new_height)
            cv2.moveWindow(
                "views", (screen_res[0] - new_width) // 2, (screen_res[1] - new_height) // 2
            )

        cv2.imshow("views", cv2.cvtColor(views, cv2.COLOR_RGB2BGR))
        cv2.waitKey(1)
reset
reset()
Source code in molmo_spaces/policy/learned_policy/spacemouse_policy.py
def reset(self):
    self.init_robot_pose = None
    self.init_tcp_pose = None
    self.current_position = None
    self.current_rotation = None
    self._mouse.reset_grasp()

utils

Classes:

Name Description
PromptSampler

Functions:

Name Description
generate_object_hash
resize_with_pad

Attributes:

Name Type Description
log
log module-attribute
log = getLogger(__name__)
PromptSampler
PromptSampler(task_type: str = 'pick', prompt_templates: list[str] | None = None, prompt_object_word_num: int = 1, disambiguate_distractors_by_pos: bool = False)

Parameters:

Name Type Description Default
task_type str

The type of task to sample prompts for.

'pick'
prompt_templates list[str]

A list of prompt templates to sample from. If None, the default templates for the task type will be used.

None
prompt_object_word_num int

The number of words to use for the object name in the prompt.

1
disambiguate_distractors_by_pos bool

Whether to disambiguate distractors by position in the prompt. This relies on functionality only present when using a frozen config.

False

Methods:

Name Description
clean_object_name
get_prompt
get_short_description
get_state
get_target_object_uid
next
set_state

Attributes:

Name Type Description
DEFAULT_TEMPLATES_BY_TASK
current_index
prompt_object_word_num
prompt_templates
task_type
Source code in molmo_spaces/policy/learned_policy/utils.py
def __init__(
    self,
    task_type: str = "pick",
    prompt_templates: list[str] | None = None,
    prompt_object_word_num: int = 1,
    disambiguate_distractors_by_pos: bool = False,
) -> None:
    """
    Args:
        task_type: The type of task to sample prompts for.
        prompt_templates: A list of prompt templates to sample from. If None, the default templates for the task type will be used.
        prompt_object_word_num: The number of words to use for the object name in the prompt.
        disambiguate_distractors_by_pos: Whether to disambiguate distractors by position in the prompt.
            This relies on functionality only present when using a frozen config.
    """
    if prompt_templates is not None and task_type in ["pick", "pick_and_place"]:
        self.prompt_templates = prompt_templates
    elif task_type in self.DEFAULT_TEMPLATES_BY_TASK:
        self.prompt_templates = self.DEFAULT_TEMPLATES_BY_TASK[task_type]
    else:
        raise ValueError(
            f"Unknown task_type '{task_type}'. "
            f"Available task types: {list(self.DEFAULT_TEMPLATES_BY_TASK.keys())}"
        )
    self.task_type = task_type
    self.current_index = -1
    self.prompt_object_word_num = prompt_object_word_num
    self._cached_prompt = None
    self._disambiguate_distractors_by_pos = disambiguate_distractors_by_pos
DEFAULT_TEMPLATES_BY_TASK class-attribute instance-attribute
DEFAULT_TEMPLATES_BY_TASK = {'pick': ['pick up the {}.'], 'open': ['open the {}.'], 'pick_and_place': ['pick up the {} and place it on the {}.'], 'packing': ['pack container.'], 'close': ['close the {}.']}
current_index instance-attribute
current_index = -1
prompt_object_word_num instance-attribute
prompt_object_word_num = prompt_object_word_num
prompt_templates instance-attribute
prompt_templates = prompt_templates
task_type instance-attribute
task_type = task_type
clean_object_name
clean_object_name(task: BaseMujocoTask) -> str
Source code in molmo_spaces/policy/learned_policy/utils.py
def clean_object_name(self, task: BaseMujocoTask) -> str:
    return self.get_short_description(self.get_target_object_uid(task))[0]
get_prompt
get_prompt(task: BaseMujocoTask) -> str
Source code in molmo_spaces/policy/learned_policy/utils.py
def get_prompt(self, task: BaseMujocoTask) -> str:
    if self._cached_prompt is not None:
        return self._cached_prompt

    object_uid = self.get_target_object_uid(task)
    target_name = task.env.config.task_config.pickup_obj_name

    # Check if this is a custom object with a provided name
    eval_params = task.env.config.eval_runtime_params
    if (
        eval_params
        and eval_params.custom_object_name
        and target_name.startswith("custom_object/")
    ):
        # Use the provided custom object name directly
        object_name = eval_params.custom_object_name.lower()
    else:
        # Standard object handling
        short_descriptions: list[str] = ObjectMeta.short_descriptions(object_uid)
        target_category = "_".join(target_name.split("_")[0:1])

        if not short_descriptions:
            object_name = target_category
        elif self.prompt_object_word_num == 0:
            description = short_descriptions[3].lower()
            object_name = short_descriptions[0].lower()
            object_name = description.replace(object_name, "object")
        else:
            object_name = short_descriptions[self.prompt_object_word_num - 1].lower()

    if self._disambiguate_distractors_by_pos and self.task_type in ["pick", "pick_and_place"]:
        # TODO: this should pull from metadata or something, since object_poses is not guaranteed to be set
        target_pose = task.env.config.task_config.object_poses[target_name]
        robot_pose = task.env.config.task_config.robot_base_pose
        T_world_robot = np.eye(4)
        T_world_robot[:3, 3] = robot_pose[:3]
        T_world_robot[:3, :3] = R.from_quat(robot_pose[3:7], scalar_first=True).as_matrix()
        T_world_target = np.eye(4)
        T_world_target[:3, 3] = target_pose[:3]
        T_world_target[:3, :3] = R.from_quat(target_pose[3:7], scalar_first=True).as_matrix()
        T_robot_target = np.linalg.inv(T_world_robot) @ T_world_target
        target_pos = T_robot_target[:3, 3]

        distractors_pos = []

        for (
            distractor_name,
            distractor_pose,
        ) in task.env.config.task_config.object_poses.items():
            if (
                distractor_name == target_name
                or "_".join(distractor_name.split("_")[0:1]) != target_category
            ):
                continue
            T_world_distractor = np.eye(4)
            T_world_distractor[:3, 3] = distractor_pose[:3]
            T_world_distractor[:3, :3] = R.from_quat(
                distractor_pose[3:7], scalar_first=True
            ).as_matrix()
            T_robot_distractor = np.linalg.inv(T_world_robot) @ T_world_distractor
            if (
                np.linalg.norm(T_robot_distractor[:3, 3] - T_robot_target[:3, 3]) > 1.0
                or np.linalg.norm(T_robot_distractor[:3, 3]) > 1.0
            ):
                continue
            distractors_pos.append(T_robot_distractor[:3, 3])

        if len(distractors_pos) > 0:
            distractors_array = np.array(distractors_pos)

            deltas = target_pos - distractors_array
            abs_deltas = np.abs(deltas)
            min_indices = np.argmin(abs_deltas, axis=0)
            min_components = np.array(
                [
                    deltas[min_indices[0], 0],
                    deltas[min_indices[1], 1],
                    deltas[min_indices[2], 2],
                ]
            )
            max_component_index = np.argmax(np.abs(min_components))
            min_component_value = min_components[max_component_index]
            if max_component_index == 1:
                object_name += " on the left" if min_component_value > 0 else " on the right"
            elif max_component_index == 0:
                object_name += " in the back" if min_component_value > 0 else " in front"
            else:
                object_name += " above" if min_component_value > 0 else " below"

    if self.task_type == "pick_and_place":
        # Get place receptacle name from config (format: "place_receptacle/<uid>")
        place_receptacle_full_name = task.env.config.task_config.place_receptacle_name
        if place_receptacle_full_name:
            receptacle_uid = place_receptacle_full_name.split("/")[-1]
            receptacle_short_descriptions: list[str] = ObjectMeta.short_descriptions(
                receptacle_uid
            )

            if not receptacle_short_descriptions:
                log.warning(
                    "No receptacle short descriptions found, defaulting to 'receptacle'"
                )
                receptacle_name = "receptacle"
            elif self.prompt_object_word_num == 0:
                description = receptacle_short_descriptions[3].lower()
                base_name = receptacle_short_descriptions[0].lower()
                receptacle_name = description.replace(base_name, "object")
            else:
                receptacle_name = receptacle_short_descriptions[
                    self.prompt_object_word_num - 1
                ].lower()
        else:
            log.warning("No place receptacle found in config, defaulting to 'receptacle'")
            receptacle_name = "receptacle"

        self._cached_prompt = self.prompt_templates[self.current_index].format(
            object_name, receptacle_name
        )
    else:
        self._cached_prompt = self.prompt_templates[self.current_index].format(object_name)

    log.info(f"The prompt is: {self._cached_prompt}")
    return self._cached_prompt
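The disambiguation step in `get_prompt` is the least obvious part: for each axis it takes the smallest per-axis gap between the target and any same-category distractor, then describes the axis where that gap is widest. A standalone sketch of just that selection (the robot-frame convention of x forward, y left, z up is an assumption inferred from the phrase choices in the source):

```python
import numpy as np

def direction_suffix(target_pos, distractor_positions):
    """Pick a directional phrase separating the target from nearby distractors.

    Sketch of the disambiguation step in PromptSampler.get_prompt.
    """
    deltas = target_pos - np.asarray(distractor_positions)  # (N, 3)
    # per axis: the signed gap to the distractor closest on that axis
    min_components = deltas[np.argmin(np.abs(deltas), axis=0), [0, 1, 2]]
    # describe the axis where even the closest distractor is furthest away
    axis = int(np.argmax(np.abs(min_components)))
    value = min_components[axis]
    if axis == 1:
        return " on the left" if value > 0 else " on the right"
    if axis == 0:
        return " in the back" if value > 0 else " in front"
    return " above" if value > 0 else " below"
```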
get_short_description
get_short_description(object_uid)
Source code in molmo_spaces/policy/learned_policy/utils.py
def get_short_description(self, object_uid):
    return ObjectMeta.get_short_description(object_uid)
get_state
get_state()
Source code in molmo_spaces/policy/learned_policy/utils.py
def get_state(self):
    return {
        "current_index": self.current_index,
        "cached_prompt": self._cached_prompt,
    }
get_target_object_uid
get_target_object_uid(task)
Source code in molmo_spaces/policy/learned_policy/utils.py
def get_target_object_uid(self, task):
    return ObjectMeta.get_target_object_uid(task)
next
next() -> None
Source code in molmo_spaces/policy/learned_policy/utils.py
def next(self) -> None:
    self.current_index = (self.current_index + 1) % len(self.prompt_templates)
    self._cached_prompt = None
set_state
set_state(state)
Source code in molmo_spaces/policy/learned_policy/utils.py
def set_state(self, state):
    self.current_index = state["current_index"]
    self._cached_prompt = state["cached_prompt"]
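Together, `next`, `get_state`, and `set_state` implement a resumable round-robin over prompt templates with a cache that is invalidated on each advance. A minimal standalone sketch (the `PromptCycler` name is illustrative, not from the source):

```python
class PromptCycler:
    """Round-robin over prompt templates; the cached prompt resets on advance."""

    def __init__(self, templates: list[str]) -> None:
        self.templates = templates
        self.current_index = 0
        self._cached_prompt: str | None = None

    def format(self, *names: str) -> str:
        # Cache so repeated calls within an episode return the same prompt.
        if self._cached_prompt is None:
            self._cached_prompt = self.templates[self.current_index].format(*names)
        return self._cached_prompt

    def next(self) -> None:
        self.current_index = (self.current_index + 1) % len(self.templates)
        self._cached_prompt = None

    def get_state(self) -> dict:
        return {"current_index": self.current_index, "cached_prompt": self._cached_prompt}

    def set_state(self, state: dict) -> None:
        self.current_index = state["current_index"]
        self._cached_prompt = state["cached_prompt"]
```

Restoring state also restores the cached prompt, so a resumed episode keeps its exact wording.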
generate_object_hash
generate_object_hash(asset_id: str) -> str
Source code in molmo_spaces/policy/learned_policy/utils.py
def generate_object_hash(asset_id: str) -> str:
    hasher = hashlib.md5()
    hasher.update(asset_id.encode())
    return hasher.hexdigest()
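MD5 is used here as a stable identifier, not for security; unlike Python's builtin `hash`, which is salted per process for strings, the digest is identical across runs. An equivalent one-liner:

```python
import hashlib

def object_hash(asset_id: str) -> str:
    # Same computation as generate_object_hash above, written as a one-liner.
    return hashlib.md5(asset_id.encode()).hexdigest()
```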
resize_with_pad
resize_with_pad(images, height, width)
Source code in molmo_spaces/policy/learned_policy/utils.py
def resize_with_pad(images, height, width):
    if images.shape[-3:-1] == (height, width):
        return images

    original_shape = images.shape
    images = images.reshape(-1, *original_shape[-3:])
    resized = np.stack(
        [
            _resize_with_pad_pil(Image.fromarray(im), height, width, method=Image.BILINEAR)
            for im in images
        ]
    )
    return resized.reshape(*original_shape[:-3], *resized.shape[-3:])
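For intuition, here is a numpy-only letterbox sketch of what the per-image helper does: scale to fit while preserving aspect ratio, then center on a zero canvas. This uses nearest-neighbor resampling for self-containedness; the real `_resize_with_pad_pil` helper (not shown above) uses PIL bilinear resampling:

```python
import numpy as np

def letterbox_nearest(image: np.ndarray, height: int, width: int) -> np.ndarray:
    """Resize an (H, W, C) image to fit inside (height, width), then zero-pad to size."""
    h, w = image.shape[:2]
    ratio = min(height / h, width / w)
    new_h, new_w = max(1, round(h * ratio)), max(1, round(w * ratio))

    # Nearest-neighbor resampling via integer index maps.
    rows = (np.arange(new_h) * h / new_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    resized = image[rows][:, cols]

    # Center the resized image on a zero (black) canvas of the target size.
    out = np.zeros((height, width, image.shape[2]), dtype=image.dtype)
    top, left = (height - new_h) // 2, (width - new_w) // 2
    out[top:top + new_h, left:left + new_w] = resized
    return out
```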

websocket_policy

Classes:

Name Description
WebsocketPolicy

Implements the Policy interface by communicating with a server over websocket.

Attributes:

Name Type Description
logger
logger module-attribute
logger = getLogger(__name__)
WebsocketPolicy
WebsocketPolicy(config: MlSpacesExpConfig, model_name: str, host: str = '127.0.0.1', port: int | None = None, connection_timeout: float | None = None)

Bases: InferencePolicy

Implements the Policy interface by communicating with a server over websocket.

See WebsocketPolicyServer for a corresponding server implementation.

Methods:

Name Description
add_auxiliary_objects

Add auxiliary objects to the scene that might be required for the policy.

close
get_action
get_all_phases

Returns:

get_info
get_phase

Returns:

get_server_metadata
infer
inference_model
model_output_to_action
obs_to_model_input
prepare_model
render

Not required to be implemented by subclasses.

reset

Attributes:

Name Type Description
config
current_phase
model_name
target_poses
task
task_type
Source code in molmo_spaces/policy/learned_policy/websocket_policy.py
def __init__(
    self,
    config: MlSpacesExpConfig,
    model_name: str,
    host: str = "127.0.0.1",
    port: int | None = None,
    connection_timeout: float | None = None,
):
    task_type = getattr(config, "task_type", "pick_and_place")
    super().__init__(config, task_type)
    self.model_name = model_name
    self._last_prompt: str | None = None

    if host.startswith("ws"):
        self._uri = host
    else:
        self._uri = f"ws://{host}"
    if port is not None:
        self._uri += f":{port}"
    self._ws = None
    self._server_metadata = None
    self._prepared = False
    self._connection_timeout = connection_timeout
config instance-attribute
config = config
current_phase instance-attribute
current_phase = NONE_PHASE
model_name instance-attribute
model_name = model_name
target_poses instance-attribute
target_poses = {'grasp': eye(4)}
task instance-attribute
task = task
task_type instance-attribute
task_type = task_type
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None

Add auxiliary objects to the scene that might be required for the policy. Args: config: The configuration for the policy. spec: The MuJoCo scene specification.

Source code in molmo_spaces/policy/base_policy.py
@staticmethod
def add_auxiliary_objects(config: "MlSpacesExpConfig", spec: MjSpec) -> None:
    """
    Add auxiliary objects to the scene that might be required for the policy.
    Args:
        config: The configuration for the policy.
        spec: The MuJoCo scene specification.
    """
    return None
close
close()
Source code in molmo_spaces/policy/learned_policy/websocket_policy.py
def close(self):
    if self._ws is not None:
        logger.info("Closing websocket connection")
        self._ws.close()
        self._ws = None
get_action
get_action(observation)
Source code in molmo_spaces/policy/base_policy.py
def get_action(self, observation):
    model_input = self.obs_to_model_input(observation)
    model_output = self.inference_model(model_input)
    action = self.model_output_to_action(model_output)
    return action
get_all_phases
get_all_phases() -> dict[str, int]

Returns:

Type Description
dict[str, int]

A dictionary of all possible policy phases

Source code in molmo_spaces/policy/base_policy.py
def get_all_phases(self) -> dict[str, int]:
    """
    Returns:
        A dictionary of all possible policy phases
    """
    return {"unknown": 0}
get_info
get_info() -> dict
Source code in molmo_spaces/policy/learned_policy/websocket_policy.py
def get_info(self) -> dict:
    info = super().get_info()
    info["policy_name"] = "websocket"
    info["policy_model_name"] = self.model_name
    info["prompt"] = self.task.get_task_description()
    return info
get_phase
get_phase() -> str

Returns:

Type Description
str

The policy phase

Source code in molmo_spaces/policy/base_policy.py
def get_phase(self) -> str:
    """
    Returns:
        The policy phase
    """
    return "unknown"
get_server_metadata
get_server_metadata() -> dict
Source code in molmo_spaces/policy/learned_policy/websocket_policy.py
def get_server_metadata(self) -> dict:
    return self._server_metadata
infer
infer(obs: dict) -> dict
Source code in molmo_spaces/policy/learned_policy/websocket_policy.py
def infer(self, obs: dict) -> dict:
    data = msgpack_numpy.packb(obs)
    self._ws.send(data)
    response = self._ws.recv(timeout=10)
    if isinstance(response, str):
        # we're expecting bytes; if the server sends a string, it's an error.
        raise RuntimeError(f"Error in inference server:\n{response}")
    return msgpack_numpy.unpackb(response)
inference_model
inference_model(model_input)
Source code in molmo_spaces/policy/learned_policy/websocket_policy.py
def inference_model(self, model_input):
    self.prepare_model()
    data = msgpack_numpy.packb(model_input)
    self._ws.send(data)
    response = self._ws.recv()
    if isinstance(response, str):
        # we're expecting bytes; if the server sends a string, it's an error.
        raise RuntimeError(f"Error in inference server:\n{response}")
    return msgpack_numpy.unpackb(response)
model_output_to_action
model_output_to_action(model_output)
Source code in molmo_spaces/policy/learned_policy/websocket_policy.py
def model_output_to_action(self, model_output):
    action = {
        "arm": model_output["arm"],
        "gripper": model_output["gripper"],
    }
    return action
obs_to_model_input
obs_to_model_input(obs)
Source code in molmo_spaces/policy/learned_policy/websocket_policy.py
def obs_to_model_input(self, obs):
    if isinstance(obs, list):
        if len(obs) > 1:
            logger.warning(
                "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!\n"
                "WARNING: obs list has %d elements but only using the first one!\n"
                "This may indicate a batching issue - expected single observation.\n"
                "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!",
                len(obs),
            )
        obs = obs[0]
    model_input = {**obs}
    prompt = self.task.get_task_description()

    if self._last_prompt is None:
        self._last_prompt = prompt
    model_input["task"] = prompt
    return model_input
prepare_model
prepare_model() -> None
Source code in molmo_spaces/policy/learned_policy/websocket_policy.py
def prepare_model(self) -> None:
    if not self._prepared:
        self._ws, self._server_metadata = self._wait_for_server()
        self._prepared = True
render
render(obs)

Not required to be implemented by subclasses. Defaults to noop.

Source code in molmo_spaces/policy/base_policy.py
def render(self, obs):
    """
    Not required to be implemented by subclasses.
    Defaults to noop.
    """
    pass
reset
reset() -> None
Source code in molmo_spaces/policy/learned_policy/websocket_policy.py
def reset(self) -> None:
    self.close()
    self._prepared = False
    self.prepare_model()

random_policy

Classes:

Name Description
RandomPolicy

A Random Policy that selects actions randomly.

RandomPolicy

RandomPolicy(config: MlSpacesExpConfig, task: BaseMujocoTask | None = None)

Bases: BasePolicy

A Random Policy that selects actions randomly.

Methods:

Name Description
add_auxiliary_objects

Add auxiliary objects to the scene that might be required for the policy.

get_action

Decide on a random action to take based on the action space.

get_all_phases

Returns:

get_info

Get additional information from the policy. Called after episode ended. This method can be

get_phase

Returns:

reset

Reset the policy state. No state to reset for RandomPolicy.

Attributes:

Name Type Description
action_space
config
task
type str
Source code in molmo_spaces/policy/random_policy.py
def __init__(self, config: "MlSpacesExpConfig", task: BaseMujocoTask | None = None) -> None:
    raise NotImplementedError  # TODO(snehal): please fix
    self.config = config
    self.task = task
    self.action_space = action_space
action_space instance-attribute
action_space = action_space
config instance-attribute
config = config
task instance-attribute
task = task
type property
type: str
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None

Add auxiliary objects to the scene that might be required for the policy. Args: config: The configuration for the policy. spec: The MuJoCo scene specification.

Source code in molmo_spaces/policy/base_policy.py
@staticmethod
def add_auxiliary_objects(config: "MlSpacesExpConfig", spec: MjSpec) -> None:
    """
    Add auxiliary objects to the scene that might be required for the policy.
    Args:
        config: The configuration for the policy.
        spec: The MuJoCo scene specification.
    """
    return None
get_action
get_action(observation)

Decide on a random action to take based on the action space.

Parameters:

Name Type Description Default
observation

The current observation about the task or environment (not used).

required

Returns:

Type Description

A random action from the action space.

Source code in molmo_spaces/policy/random_policy.py
def get_action(self, observation):
    """
    Decide on a random action to take based on the action space.

    Args:
        observation: The current observation about the task or environment (not used).

    Returns:
        A random action from the action space.
    """
    return self.action_space.sample()
get_all_phases
get_all_phases() -> dict[str, int]

Returns:

Type Description
dict[str, int]

A dictionary of all possible policy phases

Source code in molmo_spaces/policy/base_policy.py
def get_all_phases(self) -> dict[str, int]:
    """
    Returns:
        A dictionary of all possible policy phases
    """
    return {"unknown": 0}
get_info
get_info() -> dict

Get additional information from the policy. Called after episode ended. This method can be overridden by subclasses to provide extra information about the policy's state. Must be json serializable.

Returns:

Type Description
dict

A dictionary containing additional information about the policy.

Source code in molmo_spaces/policy/base_policy.py
def get_info(self) -> dict:
    """
    Get additional information from the policy. Called after episode ended. This method can be
    overridden by subclasses to provide extra information about the policy's state.
    Must be json serializable.

    Returns:
        A dictionary containing additional information about the policy.
    """
    return {}
get_phase
get_phase() -> str

Returns:

Type Description
str

The policy phase

Source code in molmo_spaces/policy/base_policy.py
def get_phase(self) -> str:
    """
    Returns:
        The policy phase
    """
    return "unknown"
reset
reset() -> None

Reset the policy state. No state to reset for RandomPolicy.

Source code in molmo_spaces/policy/random_policy.py
def reset(self) -> None:
    """
    Reset the policy state. No state to reset for RandomPolicy.
    """
    pass

solvers

Modules:

Name Description
curobo_planner_policy
move_solver
navigation
object_manipulation
opening_solver

curobo_planner_policy

Classes:

Name Description
CuroboPlannerPolicy

Base class for Curobo-based planner policies.

Attributes:

Name Type Description
log
log module-attribute
log = getLogger(__name__)
CuroboPlannerPolicy
CuroboPlannerPolicy(config: MlSpacesExpConfig, task: BaseMujocoTask | None = None)

Bases: PlannerPolicy

Base class for Curobo-based planner policies.

This class provides common functionality for motion planning using Curobo, including trajectory execution, coordinate frame transformations, and gripper control. Subclasses should implement task-specific planning logic.

Methods:

Name Description
add_auxiliary_objects

Add auxiliary objects to the scene that might be required for the policy.

batch_plan_trajectory

Plan trajectory using batch motion planning.

clip_to_velocity_constraint

Clip action to respect velocity constraints.

get_action

Decide on the action to take based on the current observation of the task or environment.

get_all_phases

Return dictionary mapping phase names to values.

get_info

Get additional information from the policy. Called after episode ended. This method can be

get_look_at_action

Get action to look at a target.

get_phase

Return the current phase.

reset

Reset the policy state.

select_arm

Select which arm to use based on distance to pickup object.

solve_ik

Solve inverse kinematics for a target pose and create interpolated trajectory.

visualize_world_config_mesh

Visualize the world configuration as a mesh file.

Attributes:

Name Type Description
arm_end_idx int
arm_side str | None
arm_start_idx int
config
current_gripper_command dict[str, float]
is_done bool

Property to expose completion state for task checking.

planned_trajectory list[list[float]] | None
planner CuroboPlanner | None
planner_joint_ranges dict[str, tuple[int, int]]
planners dict[str, CuroboPlanner]

Return dictionary of planner instances.

profiler
retry_count int
steps_spent_in_waypoint int
task
trajectory_index int
Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
def __init__(self, config: MlSpacesExpConfig, task: BaseMujocoTask | None = None) -> None:
    super().__init__(config, task)
    self.config = config

    # Planner instances (to be set by subclasses)
    self.planner: CuroboPlanner | None = None
    self.planner_joint_ranges: dict[str, tuple[int, int]] = {}

    # Trajectory state
    self.planned_trajectory: list[list[float]] | None = None
    self.trajectory_index: int = 0
    self.steps_spent_in_waypoint: int = 0
    self.retry_count: int = 0

    # Arm selection
    self.arm_side: str | None = None
    self.arm_start_idx: int = 0
    self.arm_end_idx: int = 0

    # Gripper state
    self.current_gripper_command: dict[str, float] = {}

    # Profiler
    self.profiler = Profiler()
arm_end_idx instance-attribute
arm_end_idx: int = 0
arm_side instance-attribute
arm_side: str | None = None
arm_start_idx instance-attribute
arm_start_idx: int = 0
config instance-attribute
config = config
current_gripper_command instance-attribute
current_gripper_command: dict[str, float] = {}
is_done abstractmethod property
is_done: bool

Property to expose completion state for task checking.

planned_trajectory instance-attribute
planned_trajectory: list[list[float]] | None = None
planner instance-attribute
planner: CuroboPlanner | None = None
planner_joint_ranges instance-attribute
planner_joint_ranges: dict[str, tuple[int, int]] = {}
planners abstractmethod property
planners: dict[str, CuroboPlanner]

Return dictionary of planner instances.

profiler instance-attribute
profiler = Profiler()
retry_count instance-attribute
retry_count: int = 0
steps_spent_in_waypoint instance-attribute
steps_spent_in_waypoint: int = 0
task instance-attribute
task = task
trajectory_index instance-attribute
trajectory_index: int = 0
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None

Add auxiliary objects to the scene that might be required for the policy. Args: config: The configuration for the policy. spec: The MuJoCo scene specification.

Source code in molmo_spaces/policy/base_policy.py
@staticmethod
def add_auxiliary_objects(config: "MlSpacesExpConfig", spec: MjSpec) -> None:
    """
    Add auxiliary objects to the scene that might be required for the policy.
    Args:
        config: The configuration for the policy.
        spec: The MuJoCo scene specification.
    """
    return None
batch_plan_trajectory
batch_plan_trajectory() -> None

Plan trajectory using batch motion planning.

Uses the current phase to determine goal poses and plans trajectories in batches for efficiency. Sets self.planned_trajectory to the best trajectory found.

Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
def batch_plan_trajectory(self) -> None:
    """Plan trajectory using batch motion planning.

    Uses the current phase to determine goal poses and plans trajectories
    in batches for efficiency. Sets self.planned_trajectory to the best
    trajectory found.
    """
    init_config = np.concatenate(
        [
            np.zeros(3),
            self.task.env.current_robot.robot_view.get_move_group(
                f"{self.arm_side}_arm"
            ).joint_pos,
        ]
    )

    # Setup collision avoidance if enabled
    if getattr(self.config.policy_config, "enable_collision_avoidance", False):
        if hasattr(self, "_setup_collision_avoidance_config"):
            self._setup_collision_avoidance_config()

    # Get goal poses based on current phase (to be provided by subclass)
    goal_poses = self._get_batch_goal_poses()
    if goal_poses is None or len(goal_poses) == 0:
        log.warning("[BATCH PLAN] No goal poses available")
        return

    total = goal_poses.shape[0]
    batch_size = getattr(self.config.policy_config, "batch_size", 8)
    max_batches = getattr(self.config.policy_config, "max_batch_plan_attempts", 4)
    num_poses = min(max_batches * batch_size, total)
    num_batches = (num_poses + batch_size - 1) // batch_size

    all_successful_trajectories = []
    current_phase = getattr(self, "current_phase", "unknown")

    for batch_start in range(0, num_poses, batch_size):
        batch_end = min(batch_start + batch_size, num_poses)
        batch = goal_poses[batch_start:batch_end]

        if hasattr(self, "_show_poses"):
            self._show_poses(batch, style="tcp")
        if self.task.viewer:
            self.task.viewer.sync()

        log.info(
            f"Processing batch {batch_start // batch_size + 1}/{num_batches}: "
            f"poses {batch_start}-{batch_end - 1}"
        )

        # Transform poses to base frame and convert to 7D format
        batch_base_frame_7d = []
        for pose in batch:
            pose_base = self._target_pose_to_base_frame(pose)
            batch_base_frame_7d.append(pose_base.tolist())

        # Prepare batch inputs for planner
        batch_len = len(batch_base_frame_7d)
        start_states = [init_config.tolist()] * batch_len

        # Plan batch
        log.info(
            f"Planning batch of {batch_len} {current_phase} poses with {self.arm_side} arm"
        )
        result = self.planner.plan_batch(
            start_states,
            batch_base_frame_7d,
            verbose=False,
        )

        # Check for successes
        successes = result.success.cpu().numpy()
        log.info(
            f"{self.arm_side.capitalize()} arm: {np.sum(successes)}/{batch_len} "
            f"{current_phase} planning successes"
        )

        # Process successful trajectories
        if np.any(successes):
            optimized_plan = result.optimized_plan
            position = optimized_plan.position
            if position.ndim == 2:
                position = position.unsqueeze(0)

            for i in range(batch_len):
                if not successes[i]:
                    continue

                trajectory = []
                for t in range(position.shape[1]):
                    waypoint = position[i, t].cpu().tolist()
                    trajectory.append(waypoint)

                self._transform_traj_to_world_frame(trajectory)
                all_successful_trajectories.append(trajectory)

        if all_successful_trajectories:
            log.info(
                f"Found {len(all_successful_trajectories)} successful trajectories, "
                "skipping remaining batches"
            )
            break

    if all_successful_trajectories:
        self.planned_trajectory = self._select_best_trajectory(all_successful_trajectories)
    else:
        log.warning("[BATCH PLAN] No successful trajectories found across all batches")
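The batch bookkeeping above, which caps the number of attempted poses at `max_batch_plan_attempts * batch_size` and then walks them in `batch_size` chunks, can be isolated as a sketch (`plan_batches` is an illustrative name):

```python
def plan_batches(total: int, batch_size: int, max_batches: int) -> list[tuple[int, int]]:
    """Return (start, end) slices covering at most max_batches * batch_size poses."""
    num_poses = min(max_batches * batch_size, total)
    return [(start, min(start + batch_size, num_poses))
            for start in range(0, num_poses, batch_size)]
```

The early `break` in the method means later slices are only planned if every earlier batch failed.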
clip_to_velocity_constraint
clip_to_velocity_constraint(action: dict[str, Any]) -> dict[str, Any]

Clip action to respect velocity constraints.

Parameters:

Name Type Description Default
action dict[str, Any]

Dictionary of commanded joint positions by move group.

required

Returns:

Type Description
dict[str, Any]

Clipped action dictionary.

Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
def clip_to_velocity_constraint(self, action: dict[str, Any]) -> dict[str, Any]:
    """Clip action to respect velocity constraints.

    Args:
        action: Dictionary of commanded joint positions by move group.

    Returns:
        Clipped action dictionary.
    """
    velocity_constraints = getattr(self.config.policy_config, "velocity_constraints", {})
    clipped_action = action.copy()

    for move_group, commanded_action in action.items():
        if move_group in velocity_constraints:
            max_velocity_rad_per_s = velocity_constraints[move_group]
            move_group_view = self.task.env.robots[0].robot_view.get_move_group(move_group)
            move_group_joint_pos = move_group_view.joint_pos.copy()

            # Calculate difference, handling angular wraparound for base theta
            diff = commanded_action - move_group_joint_pos
            if move_group == "base":
                # Normalize theta difference to [-π, π]
                diff[2] = np.arctan2(np.sin(diff[2]), np.cos(diff[2]))

            velocity_rad_per_s = diff * 10.0

            # Clip velocity and recalculate commanded action
            clipped_velocity = np.clip(
                velocity_rad_per_s, -max_velocity_rad_per_s, max_velocity_rad_per_s
            )
            commanded_action = move_group_joint_pos + clipped_velocity / 10.0

            clipped_action[move_group] = commanded_action

    return clipped_action
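The wraparound handling above matters when the commanded base heading is nearly ±π away from the current one. Isolated as a sketch (hypothetical helper; the `rate_hz=10.0` default mirrors the `* 10.0` factor in the source, which implies a 10 Hz control rate):

```python
import numpy as np

def clip_joint_step(commanded: np.ndarray, current: np.ndarray,
                    max_vel: float, rate_hz: float = 10.0,
                    wrap_indices: tuple[int, ...] = ()) -> np.ndarray:
    """Limit a position command so the implied joint velocity stays within max_vel."""
    diff = commanded - current
    for i in wrap_indices:
        # Shortest angular path for wraparound joints (e.g. base theta).
        diff[i] = np.arctan2(np.sin(diff[i]), np.cos(diff[i]))
    velocity = diff * rate_hz
    clipped = np.clip(velocity, -max_vel, max_vel)
    return current + clipped / rate_hz
```

Without the normalization, a command from theta = 3.0 rad to -3.0 rad would drive the base the long way around through -6.0 rad instead of the short ~0.28 rad turn.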
get_action abstractmethod
get_action(observation)

Decide on the action to take based on the current observation of the task or environment. Information could be observations, goals in the case of an rl_agent, or it could be the full environment state in the case of a planner.

Parameters:

Name Type Description Default
observation

The current information about the task or environment.

required

Returns:

Type Description

The action to take in response to the information.

Source code in molmo_spaces/policy/base_policy.py
@abstractmethod
def get_action(self, observation):
    """
    Decide on the action to take based on the current observation of the task or environment.
    Information could be observations, goals in the case of an rl_agent, or it could be the full
    environment state in the case of a planner.

    Args:
        observation: The current information about the task or environment.

    Returns:
        The action to take in response to the information.
    """
    pass
get_all_phases abstractmethod
get_all_phases() -> dict[str, int]

Return dictionary mapping phase names to values.

Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
@abstractmethod
def get_all_phases(self) -> dict[str, int]:
    """Return dictionary mapping phase names to values."""
    pass
get_info
get_info() -> dict

Get additional information from the policy. Called after episode ended. This method can be overridden by subclasses to provide extra information about the policy's state. Must be json serializable.

Returns:

Type Description
dict

A dictionary containing additional information about the policy.

Source code in molmo_spaces/policy/base_policy.py
def get_info(self) -> dict:
    """
    Get additional information from the policy. Called after episode ended. This method can be
    overridden by subclasses to provide extra information about the policy's state.
    Must be json serializable.

    Returns:
        A dictionary containing additional information about the policy.
    """
    return {}
get_look_at_action
get_look_at_action() -> dict[str, Any]

Get action to look at a target.

Subclasses can override this to implement head tracking.

Returns:

Type Description
dict[str, Any]

Dictionary with head control commands, or empty dict.

Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
def get_look_at_action(self) -> dict[str, Any]:
    """Get action to look at a target.

    Subclasses can override this to implement head tracking.

    Returns:
        Dictionary with head control commands, or empty dict.
    """
    return {}
get_phase abstractmethod
get_phase() -> Any

Return the current phase.

Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
@abstractmethod
def get_phase(self) -> Any:
    """Return the current phase."""
    pass
reset
reset() -> None

Reset the policy state.

Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
def reset(self) -> None:
    """Reset the policy state."""
    self.planned_trajectory = None
    self.trajectory_index = 0
    self.steps_spent_in_waypoint = 0
    self.retry_count = 0
    self.current_gripper_command = {}
select_arm
select_arm() -> None

Select which arm to use based on distance to pickup object.

Also instantiates the motion planner for the selected arm. This lazy initialization saves ~11GB of GPU memory by only loading one arm's planner.

Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
def select_arm(self) -> None:
    """Select which arm to use based on distance to pickup object.

    Also instantiates the motion planner for the selected arm.
    This lazy initialization saves ~11GB of GPU memory by only loading one arm's planner.
    """
    task_config = self.config.task_config
    om = self.task.env.object_managers[self.task.env.current_batch_index]
    pickup_obj = om.get_object_by_name(task_config.pickup_obj_name)
    pickup_obj_pos = pickup_obj.position

    left_tcp_pose = self.task.env.current_robot.robot_view.get_move_group(
        "left_gripper"
    ).leaf_frame_to_world
    right_tcp_pose = self.task.env.current_robot.robot_view.get_move_group(
        "right_gripper"
    ).leaf_frame_to_world

    left_tcp_pos = left_tcp_pose[:3, 3]
    right_tcp_pos = right_tcp_pose[:3, 3]

    # Compute distances
    left_dist = np.linalg.norm(left_tcp_pos - pickup_obj_pos)
    right_dist = np.linalg.norm(right_tcp_pos - pickup_obj_pos)

    selected_arm = "left" if left_dist < right_dist else "right"
    log.info(
        f"Selected {selected_arm} arm (left dist: {left_dist:.3f}m, right dist: {right_dist:.3f}m)"
    )

    self.arm_side = selected_arm

    # Instantiate the planner for the selected arm only
    log.info(f"Instantiating motion planner for {selected_arm} arm")
    if selected_arm == "left":
        self.planner = CuroboPlanner(
            config=self.config.policy_config.left_curobo_planner_config
        )
        self.planner_joint_ranges = self.config.policy_config.left_planner_joint_ranges
    else:
        self.planner = CuroboPlanner(
            config=self.config.policy_config.right_curobo_planner_config
        )
        self.planner_joint_ranges = self.config.policy_config.right_planner_joint_ranges

    self.arm_start_idx = self.planner_joint_ranges[f"{self.arm_side}_arm"][0]
    self.arm_end_idx = self.planner_joint_ranges[f"{self.arm_side}_arm"][1]
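The selection rule itself is a plain nearest-TCP comparison; as a standalone function (illustrative name, separated from the planner instantiation):

```python
import numpy as np

def select_closer_arm(left_tcp_pos: np.ndarray, right_tcp_pos: np.ndarray,
                      target_pos: np.ndarray) -> str:
    """Pick the arm whose tool center point (TCP) is closer to the target."""
    left_dist = np.linalg.norm(left_tcp_pos - target_pos)
    right_dist = np.linalg.norm(right_tcp_pos - target_pos)
    return "left" if left_dist < right_dist else "right"
```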
solve_ik
solve_ik(target_pose: ndarray) -> None

Solve inverse kinematics for a target pose and create interpolated trajectory.

Parameters:

Name Type Description Default
target_pose ndarray

4x4 transformation matrix for target end-effector pose in world frame.

required

Raises:

Type Description
ValueError

If IK solution cannot be found.

Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
def solve_ik(self, target_pose: np.ndarray) -> None:
    """Solve inverse kinematics for a target pose and create interpolated trajectory.

    Args:
        target_pose: 4x4 transformation matrix for target end-effector pose in world frame.

    Raises:
        ValueError: If IK solution cannot be found.
    """
    init_config = np.concatenate(
        [
            np.zeros(3),
            self.task.env.current_robot.robot_view.get_move_group(
                f"{self.arm_side}_arm"
            ).joint_pos,
        ]
    )

    target_pose_7d = self._target_pose_to_base_frame(target_pose)
    joint_config, _ = self.planner.ik_solve(
        goal_pose=target_pose_7d.tolist(),
        seed_config=init_config.tolist(),
        disable_collision=True,
    )

    current_phase = getattr(self, "current_phase", "unknown")
    if joint_config is None:
        raise ValueError(f"Could not solve {current_phase} phase IK.")

    trajectory = self._interpolate_joint_trajectory(
        init_config,
        np.array(joint_config),
        num_steps=10,
    )
    self._transform_traj_to_world_frame(trajectory)
    self.planned_trajectory = trajectory
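A possible shape for the joint-space interpolation the last step relies on (the private `_interpolate_joint_trajectory` helper is not shown in the source; this sketch assumes uniform linear spacing that excludes the start configuration and includes the goal):

```python
import numpy as np

def interpolate_joint_trajectory(start: np.ndarray, goal: np.ndarray,
                                 num_steps: int = 10) -> list[list[float]]:
    """Linearly interpolate between two joint configurations."""
    # Skip alpha = 0 so the first waypoint already moves off the start pose.
    alphas = np.linspace(0.0, 1.0, num_steps + 1)[1:]
    return [((1.0 - a) * start + a * goal).tolist() for a in alphas]
```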
visualize_world_config_mesh
visualize_world_config_mesh(world_cfg: WorldConfig) -> None

Visualize the world configuration as a mesh file.

Parameters:

Name Type Description Default
world_cfg WorldConfig

Curobo WorldConfig to visualize.

required
Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
def visualize_world_config_mesh(self, world_cfg: WorldConfig) -> None:
    """Visualize the world configuration as a mesh file.

    Args:
        world_cfg: Curobo WorldConfig to visualize.
    """
    from datetime import datetime

    import trimesh
    from curobo.geom.types import WorldConfig

    current_phase = getattr(self, "current_phase", "unknown")
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    file_name = f"world_config_mesh_{current_phase}_{timestamp}"
    obj_path = f"{file_name}.obj"
    initial_joint_configuration = np.array(self._get_current_joint_positions())

    try:
        # Create a combined scene with robot and world obstacles
        combined_scene = trimesh.scene.scene.Scene(base_frame="world_origin")

        # Add robot mesh if we have a joint configuration
        if initial_joint_configuration is not None and self.planner is not None:
            try:
                config_for_mesh = np.array(initial_joint_configuration).copy()
                # Zero out base joints to place robot at origin
                config_for_mesh[:3] = 0.0
                q_tensor = torch.tensor(config_for_mesh).unsqueeze(0).float().cuda()

                robot_spheres_batch = self.planner.motion_gen.kinematics.get_robot_as_spheres(
                    q_tensor
                )
                if robot_spheres_batch and len(robot_spheres_batch) > 0:
                    robot_spheres = robot_spheres_batch[0]
                    for i, sphere in enumerate(robot_spheres):
                        sphere_mesh = trimesh.creation.icosphere(radius=sphere.radius)
                        sphere_transform = np.eye(4)
                        sphere_transform[:3, 3] = sphere.pose[:3]
                        combined_scene.add_geometry(
                            sphere_mesh,
                            geom_name=f"robot_sphere_{i}",
                            parent_node_name="world_origin",
                            transform=sphere_transform,
                        )
                    log.info(
                        f"Added {len(robot_spheres)} robot collision spheres to visualization"
                    )

            except Exception as e:
                log.warning(f"Could not add robot mesh: {e}")

        # Add world obstacles to the scene
        try:
            world_scene = WorldConfig.get_scene_graph(world_cfg, process_color=True)
            for geom_name, geom in world_scene.geometry.items():
                transform = world_scene.graph.get(geom_name)[0]
                combined_scene.add_geometry(
                    geom,
                    geom_name=geom_name,
                    parent_node_name="world_origin",
                    transform=transform,
                )
        except Exception as e:
            log.warning(f"Could not add world obstacles: {e}")

        # Export combined scene
        if len(combined_scene.geometry) > 0:
            combined_scene.export(obj_path)
            log.info(f"Successfully saved combined robot + world mesh to {obj_path}")
        else:
            log.debug("Skipping mesh export - scene is empty")

    except ValueError as e:
        if "empty scene" in str(e).lower():
            log.debug("Skipping mesh export - world config is empty")
        else:
            raise

move_solver

navigation

Modules:

Name Description
astar_planner_policy
astar_planner_policy

Classes:

Name Description
AStarPlannerPolicy
AStarSmoothPlannerPolicy

Attributes:

Name Type Description
log

This planner policy relies on the computation of a sparse set of (x,y) waypoints

log module-attribute
log = getLogger(__name__)

This planner policy relies on the computation of a sparse set of (x,y) waypoints from the AStarPlanner. It then builds a navigation plan by interleaving rotation and translation phases with interpolated waypoints (either by slerp for rotation phases, or by linear interpolation for translation ones).

Note: Even though rotation phases could ideally be skipped for colinear waypoints (i.e., those interpolated between two spatially distant planner waypoints), since the agent orientation stays constant along them, we still allow a correction, because the controller can deviate from the ideal plan. We introduce corrections by means of two mechanisms:
  1. Enforcing some intermediate waypoints between spatially distant ones, regardless of Euclidean distance (via path_interpolation_density, which can be kept low, e.g. 1).
  2. Enforcing intermediate waypoints that keep consecutive ones under a limit distance (via path_max_inter_waypoint_dist, which should be kept low enough to prevent excessive overshooting by the controller).

Note 2: For rotation, we limit the maximal arc length via path_max_inter_waypoint_angle, set by default to 10 degrees. Combined with a fix in the holonomic base control that expresses the current yaw relative to the intended motion direction to prevent wrapping errors, this leads to smooth rotations.

Note 3: The current replanning heuristic is brittle, so it can simply be switched off, e.g. by setting plan_fail_after_waypoint_steps to a value larger than the task horizon.

Some TODOs
  • Use joint_pos_rel to detect failure to complete the previous action
  • Select the nearest plannable random location near the target instead of the first one with a valid plan
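The interleaving described above can be sketched end to end. The snippet below is a minimal, self-contained illustration rather than the library implementation: it densifies a sparse (x, y) path with linear interpolation for translation phases and wrapped angle interpolation (the planar equivalent of slerp) for rotation phases. The helper names (wrap, rotation_phase, translation_phase, build_plan) and the max_step / max_arc parameters are hypothetical stand-ins for path_max_inter_waypoint_dist and path_max_inter_waypoint_angle.

```python
import numpy as np

def wrap(a):
    """Wrap an angle to [-pi, pi)."""
    return (a + np.pi) % (2 * np.pi) - np.pi

def rotation_phase(pos, a0, a1, max_arc):
    """Yield (x, y, theta) waypoints rotating in place from a0 to a1."""
    delta = wrap(a1 - a0)
    n = max(int(np.ceil(abs(delta) / max_arc)), 1)
    for t in np.linspace(0.0, 1.0, n + 1)[1:]:
        yield np.array([pos[0], pos[1], wrap(a0 + t * delta)])

def translation_phase(p0, p1, theta, max_step):
    """Yield (x, y, theta) waypoints translating from p0 to p1 at fixed heading."""
    n = max(int(np.ceil(np.linalg.norm(p1 - p0) / max_step)), 1)
    for t in np.linspace(0.0, 1.0, n + 1)[1:]:
        p = p0 + t * (p1 - p0)
        yield np.array([p[0], p[1], theta])

def build_plan(waypoints, start_theta, max_step=0.2, max_arc=np.deg2rad(10)):
    """Interleave rotation and translation phases along sparse (x, y) waypoints."""
    plan, theta = [], start_theta
    for p0, p1 in zip(waypoints[:-1], waypoints[1:]):
        heading = np.arctan2(p1[1] - p0[1], p1[0] - p0[0])
        plan += list(rotation_phase(p0, theta, heading, max_arc))
        plan += list(translation_phase(p0, p1, heading, max_step))
        theta = heading
    return np.array(plan)

plan = build_plan(np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]), start_theta=0.0)
```

Translation steps never exceed max_step, and rotations are split into arcs of at most max_arc, matching the two correction mechanisms above.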
AStarPlannerPolicy
AStarPlannerPolicy(config: MlSpacesExpConfig, task: BaseMujocoTask | None = None)

Bases: PlannerPolicy

Methods:

Name Description
add_auxiliary_objects

Add auxiliary objects to the scene that might be required for the policy.

build_policy_plan
current_waypoint
get_action
get_all_phases

Returns:

get_info

Get additional information from the policy. Called after episode ended. This method can be

get_phase

Returns:

interpolate_waypoints

Interpolate waypoints between each pair of waypoints.

max_angle_waypoints
max_dist_waypoints
planners
reset
skip_candidate
stop_plan

Attributes:

Name Type Description
candidate_objs list[MlSpacesObject]
config
nav_goal_sampler NavGoalSampler
nav_plan
nav_planner
robot_view
target_object MlSpacesObject

Get the nearest navigation target object for the current batch.

target_pos_quat
task
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def __init__(self, config: MlSpacesExpConfig, task: BaseMujocoTask | None = None) -> None:
    super().__init__(config, task)

    self._target_pos_quat = None
    self._nav_goal_sampler = None

    self.config.policy_config.planner_config.agent_radius = (
        self.config.task_sampler_config.robot_safety_radius
    )

    self.nav_planner = AStarPlanner(
        self.config.policy_config.planner_config, self.task.env.current_model_path
    )

    self._current_waypoint = 0
    self._reached_waypoints = 0
    self._nav_plan = None
    self._target_pos_quat = None
    self._dists_to_waypoint = []
    self._retries_left = self.config.policy_config.plan_max_retries
    self._target_object = None
    self._candidate_objs = None
    self._skipped_candidates = set()
    self._replan_after = None

    self.robot_view = task.env.current_robot.robot_view
candidate_objs property
candidate_objs: list[MlSpacesObject]
config instance-attribute
config = config
nav_goal_sampler property
nav_goal_sampler: NavGoalSampler
nav_plan property
nav_plan
nav_planner instance-attribute
nav_planner = AStarPlanner(planner_config, current_model_path)
robot_view instance-attribute
robot_view = robot_view
target_object property
target_object: MlSpacesObject

Get the nearest navigation target object for the current batch.

target_pos_quat property
target_pos_quat
task instance-attribute
task = task
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None

Add auxiliary objects to the scene that might be required for the policy. Args: config: The configuration for the policy. spec: The MuJoCo scene specification to modify.

Source code in molmo_spaces/policy/base_policy.py
@staticmethod
def add_auxiliary_objects(config: "MlSpacesExpConfig", spec: MjSpec) -> None:
    """
    Add auxiliary objects to the scene that might be required for the policy.
    Args:
        config: The configuration for the policy.
        spec: The MuJoCo scene specification to modify.
    """
    return None
build_policy_plan
build_policy_plan(world_waypoints)
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def build_policy_plan(self, world_waypoints):
    world_waypoints = self.stop_plan(world_waypoints)

    # the first difference computes theta from first to second waypoint
    pos_deltas = world_waypoints[1:] - world_waypoints[:-1]
    # Here we have the thetas from waypoint i to i+1
    thetas = np.arctan2(pos_deltas[:, 1], pos_deltas[:, 0])[:, None]

    combined_waypoints = []

    # First, we orient toward the 1st waypoint from the 0-th waypoint
    start_theta = self.robot_view.get_noop_ctrl_dict(["base"])["base"][2]
    for theta in self.max_angle_waypoints(np.stack([[start_theta], thetas[0]])):
        combined_waypoints.append(np.concatenate((world_waypoints[0], theta)))

    for i in range(1, len(world_waypoints) - 1):
        # We move towards i-th waypoint from the (i-1)-th waypoint
        for waypoint in self.max_dist_waypoints(world_waypoints[i - 1 : i + 1]):
            combined_waypoints.append(np.concatenate((waypoint, thetas[i - 1])))

        # We orient towards (i+1)-th waypoint from the i-th waypoint
        for theta in self.max_angle_waypoints(thetas[i - 1 : i + 1]):
            combined_waypoints.append(np.concatenate((world_waypoints[i], theta)))

    # First arrive at final position with last movement direction
    for waypoint in self.max_dist_waypoints(world_waypoints[-2:]):
        combined_waypoints.append(np.concatenate((waypoint, thetas[-1])))

    # Then rotate to face the target
    final_pos = world_waypoints[-1]
    target_pos = self.target_object.position[:2]
    final_theta = np.arctan2(target_pos[1] - final_pos[1], target_pos[0] - final_pos[0])
    for theta in self.max_angle_waypoints(np.stack([thetas[-1], [final_theta]])):
        combined_waypoints.append(np.concatenate((final_pos, theta)))

    return np.array(combined_waypoints)
current_waypoint
current_waypoint()
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def current_waypoint(self):
    if self._current_waypoint < len(self.nav_plan):
        cur_distance = self.robot_view.distance_to(
            ["base"], self.nav_plan[self._current_waypoint]
        )
        pose = self.robot_view.base.pose
        angle = R.from_matrix(pose[:3, :3]).as_euler("xyz")[2:3]
        delta = pose[:2, 3]
        pose = np.concatenate((delta, angle))

        log.debug(
            f"Steps {self.task.num_steps_taken()}"
            f" Retries left {self._retries_left}"
            f" Waypoint {self._reached_waypoints}/{len(self.nav_plan) + self._reached_waypoints - self._current_waypoint}"
            f" {np.round(self.nav_plan[self._current_waypoint], 3)}"
            f" Pose {np.round(pose, 3)}"
        f" Dist {cur_distance:.3f}"
        )

        if self.robot_view.is_close_to(["base"], self.nav_plan[self._current_waypoint]):
            self._reached_waypoints += 1
            self._dists_to_waypoint = []

            if self._replan_after is None:
                self._current_waypoint += 1
            else:
                if self._replan_after <= 1:
                    self._replan_after = None
                    self._nav_plan = None
                    self._target_pos_quat = None
                    if self.nav_plan is None:
                        log.warning("Terminating due to failure to replan.")
                        return None
                else:
                    self._replan_after -= 1
                    self._current_waypoint -= 1

        elif (
            len(self._dists_to_waypoint)
            > self.config.policy_config.plan_fail_after_waypoint_steps
            and min(self._dists_to_waypoint) - cur_distance
            <= self.config.policy_config.plan_fail_max_dist_delta
        ):
            self.nav_planner.blacklist.append(self.robot_view.base.pose[:3, 3].copy())
            self.nav_planner.apply_black_list()
            if self._retries_left > 0:
                if self._replan_after is None:

                    def waypoints_until_different_location():
                        back = 0
                        while (
                            np.linalg.norm(
                                self.nav_plan[self._current_waypoint][:2]
                                - self.nav_plan[self._current_waypoint - back][:2]
                            )
                            < 0.25  # TODO make this a config param
                        ):
                            if self._current_waypoint - back > 0:
                                back += 1
                            else:
                                break

                        return back

                    self._retries_left -= 1
                    self._replan_after = waypoints_until_different_location()
                    self._dists_to_waypoint = []
                    self._current_waypoint = max(self._current_waypoint - 1, 0)
                    log.warning(
                        f"Replanning requested after returning to the previous {self._replan_after} waypoints"
                        f" due to failure to progress with distance {cur_distance:.3f}"
                        f" to waypoint with {self._retries_left} retries left."
                    )
                else:
                    log.warning(
                        f"Terminating due to failure to return to previous waypoint"
                        f" with {self._replan_after} missing return waypoints"
                    )
                    return None
            else:
                log.warning(
                    f"Terminating due to failure to progress with distance {cur_distance:.3f} to waypoint"
                    f" and no plan retries left."
                )
                return None

        else:
            self._dists_to_waypoint.append(cur_distance)

    if self._current_waypoint < len(self.nav_plan):
        return self.nav_plan[self._current_waypoint]

    return None
get_action
get_action(observation)
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def get_action(self, observation):
    if self.nav_plan is None:
        # No plan possible, finish task immediately
        log.warning(
            f"[A* DONE] Planning failed - terminating episode at step {self.task.num_steps_taken()}"
            f" with {self._reached_waypoints} reached waypoints."
            f" Reason: A* could not find a valid path (see earlier PLAN FAIL logs for details)"
        )
        return self._build_done_action()

    # get next waypoint in the planned trajectory
    waypoint = self.current_waypoint()

    if waypoint is None:
        # All waypoints reached - navigation complete
        log.info(
            f"[A* DONE] Navigation complete - reached {self._reached_waypoints} waypoints"
            f" in {self.task.num_steps_taken()} steps."
        )
        return self._build_done_action()

    # Still navigating - return action to reach next waypoint
    return self._build_navigation_action(waypoint)
get_all_phases
get_all_phases() -> dict[str, int]

Returns:

Type Description
dict[str, int]

A dictionary of all possible policy phases

Source code in molmo_spaces/policy/base_policy.py
def get_all_phases(self) -> dict[str, int]:
    """
    Returns:
        A dictionary of all possible policy phases
    """
    return {"unknown": 0}
get_info
get_info() -> dict

Get additional information from the policy. Called after the episode has ended. This method can be overridden by subclasses to provide extra information about the policy's state. Must be JSON serializable.

Returns:

Type Description
dict

A dictionary containing additional information about the policy.

Source code in molmo_spaces/policy/base_policy.py
def get_info(self) -> dict:
    """
    Get additional information from the policy. Called after the episode has ended. This method
    can be overridden by subclasses to provide extra information about the policy's state.
    Must be JSON serializable.

    Returns:
        A dictionary containing additional information about the policy.
    """
    return {}
get_phase
get_phase() -> str

Returns:

Type Description
str

The policy phase

Source code in molmo_spaces/policy/base_policy.py
def get_phase(self) -> str:
    """
    Returns:
        The policy phase
    """
    return "unknown"
interpolate_waypoints
interpolate_waypoints(waypoints: ndarray) -> ndarray

Interpolate waypoints between each pair of waypoints.

Parameters:

Name Type Description Default
waypoints ndarray

original waypoints array, shape (N, 2)

required

Returns:

Type Description
ndarray

interpolated waypoints array

Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def interpolate_waypoints(self, waypoints: np.ndarray) -> np.ndarray:
    """
    Interpolate waypoints between each pair of waypoints.

    Args:
        waypoints: original waypoints array, shape (N, 2)

    Returns:
        interpolated waypoints array
    """
    density = self.config.policy_config.path_interpolation_density
    if density <= 0 or waypoints is None or len(waypoints) <= 1:
        return waypoints

    # Use np.linspace for faster vectorized interpolation
    segments = []
    for i in range(len(waypoints) - 1):
        # Create density+2 points from waypoints[i] to waypoints[i+1], excluding endpoint
        t = np.linspace(0, 1, density + 2, endpoint=False)[1:]  # exclude start point
        segment = waypoints[i] + t[:, np.newaxis] * (waypoints[i + 1] - waypoints[i])
        segments.append(segment)
    segments.append(waypoints[-1:])  # add final waypoint

    return np.vstack([waypoints[0:1]] + segments)
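Standalone, the linspace-based densification above can be reproduced with plain NumPy. This is an illustrative re-implementation under the same scheme: each segment contributes density + 1 points (its start excluded, via endpoint=False), and the final waypoint is appended once.

```python
import numpy as np

def densify(waypoints, density):
    """Insert density + 1 evenly spaced points per segment (segment start
    excluded via endpoint=False), then append the final waypoint."""
    if density <= 0 or len(waypoints) <= 1:
        return waypoints
    segments = []
    for a, b in zip(waypoints[:-1], waypoints[1:]):
        t = np.linspace(0.0, 1.0, density + 2, endpoint=False)[1:]
        segments.append(a + t[:, None] * (b - a))
    segments.append(waypoints[-1:])
    return np.vstack([waypoints[:1]] + segments)

dense = densify(np.array([[0.0, 0.0], [1.0, 0.0]]), density=1)
```

For a single unit segment with density=1 this produces four points at x = 0, 1/3, 2/3, and 1.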
max_angle_waypoints
max_angle_waypoints(angles: ndarray) -> ndarray
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def max_angle_waypoints(self, angles: np.ndarray) -> np.ndarray:
    assert angles.shape == (2, 1)

    angle = float(abs(normalize_ang_error(angles[1] - angles[0])))
    num_points = int(np.ceil(angle / self.config.policy_config.path_max_inter_waypoint_angle))
    if num_points <= 1:
        # Enforce at least one orientation correction
        return angles[1:]

    steps = np.linspace(0, 1, num_points + 1)[1:]
    r0 = R.from_euler("z", angles[0], degrees=False)
    r1 = R.from_euler("z", angles[1], degrees=False)
    rots = Slerp([0, 1], R.concatenate([r0, r1]))(steps)
    new_angles = rots.as_euler("xyz", degrees=False)[:, 2:]

    return new_angles
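For pure z-axis rotations, the slerp used above reduces to linear interpolation of the wrapped angle difference. The hypothetical helper below sketches that reduction and shows that subdividing a rotation from 170° to -170° correctly crosses the ±180° seam instead of sweeping back through 0°.

```python
import numpy as np

def wrap(a):
    """Wrap an angle to [-pi, pi)."""
    return (a + np.pi) % (2 * np.pi) - np.pi

def subdivide_angle(a0, a1, max_arc):
    """Subdivide the shortest rotation from a0 to a1 into arcs of at most
    max_arc. For pure z-axis rotations this matches slerp between the yaws."""
    delta = wrap(a1 - a0)
    n = max(int(np.ceil(abs(delta) / max_arc)), 1)
    return wrap(a0 + np.linspace(0.0, 1.0, n + 1)[1:] * delta)

angles = subdivide_angle(np.deg2rad(170), np.deg2rad(-170), np.deg2rad(10))
```

All intermediate yaws stay near ±180°, confirming the rotation takes the 20° shortcut through the seam.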
max_dist_waypoints
max_dist_waypoints(waypoints: ndarray) -> ndarray
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def max_dist_waypoints(self, waypoints: np.ndarray) -> np.ndarray:
    assert waypoints.shape == (2, 2)

    direction = waypoints[-1] - waypoints[0]

    dist = np.linalg.norm(direction)
    num_points = int(np.ceil(dist / self.config.policy_config.path_max_inter_waypoint_dist))
    if num_points <= 1:
        return waypoints[1:]

    stops = np.linspace(0, 1, num_points + 1)[1:]
    return waypoints[:1] + direction[None, :] * stops[:, None]
planners
planners()
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def planners(self):
    return self.nav_planner
reset
reset()
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def reset(self):
    self._current_waypoint = 0
    self._reached_waypoints = 0
    self._nav_plan = None
    self._target_pos_quat = None
    self._dists_to_waypoint = []
    self._retries_left = self.config.policy_config.plan_max_retries
    self._target_object = None
    self._candidate_objs = None
    self._skipped_candidates = set()
    self._replan_after = None
    self.nav_planner.blacklist.clear()
skip_candidate
skip_candidate(obj_name)
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def skip_candidate(self, obj_name):
    self._skipped_candidates.add(obj_name)
stop_plan
stop_plan(waypoints: ndarray) -> ndarray
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def stop_plan(self, waypoints: np.ndarray) -> np.ndarray:
    r = self.config.policy_config.path_min_dist_to_target_center

    if r == 0.0:
        return waypoints

    cc = self.target_object.position[:2]

    # 1. find first waypoint entering circle and not leaving again
    last_out = None
    for i in reversed(range(len(waypoints))):
        if np.linalg.norm(waypoints[i] - cc) > r:
            last_out = i
            break
    else:
        # all waypoints were under r, so use the first two waypoints only
        return waypoints[:2]

    # if the final waypoint is already outside the circle, keep them all
    if last_out == len(waypoints) - 1:
        return waypoints

    # If not, intersect circumference around object center and last segment
    # (x-c)^T(x-c) = r^2
    # with x = s + alpha d
    # resulting in alpha^2 * (d^Td) + alpha * [2 d^T(s-c)] +[(s-c)^T(s-c) - r^2] = 0
    # for convenience, we make d a unitary direction

    segment = waypoints[last_out : last_out + 2]
    s = segment[0]

    d = segment[1] - s
    d /= np.linalg.norm(d)  # so a == 1 in the 2nd order equation

    sc = s - cc
    b = 2 * d @ sc
    c = np.linalg.norm(sc) ** 2 - r**2

    # Discriminant should always be positive, as the two points differ
    # in their inclusion in the circle with given radius
    # (we enforce it's at least non-negative)
    disc = max(b**2 - 4 * c, 0)

    # We keep the smallest (entering) solution (minus sign)
    # the relative displacement from the last waypoint outside along the
    # direction to the first one inside needs to be positive
    # (we enforce it's at least non-negative)
    alpha = max((-b - np.sqrt(disc)) / 2, 0)

    intersection = s + alpha * d
    return np.concatenate([waypoints[: last_out + 1], intersection[None, :]])
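The quadratic derived in the comments above can be checked in isolation. The sketch below (with a hypothetical helper name and made-up numbers) clips a segment at the circle boundary; the resulting point should lie exactly at distance r from the target center.

```python
import numpy as np

def clip_at_circle(s, e, center, r):
    """Return the point where segment s -> e first enters the circle
    (center, r), assuming s lies outside and e inside."""
    d = e - s
    d = d / np.linalg.norm(d)          # unit direction, so a == 1 in the quadratic
    sc = s - center
    b = 2.0 * d @ sc
    c = sc @ sc - r ** 2
    disc = max(b ** 2 - 4.0 * c, 0.0)  # non-negative since s and e straddle the boundary
    alpha = max((-b - np.sqrt(disc)) / 2.0, 0.0)  # smaller root = entry point
    return s + alpha * d

p = clip_at_circle(np.array([0.0, 0.0]), np.array([2.0, 0.0]),
                   center=np.array([2.0, 0.0]), r=0.5)
```

For a segment from (0, 0) to (2, 0) and a circle of radius 0.5 around (2, 0), the entry point is (1.5, 0), at exactly distance r from the center.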
AStarSmoothPlannerPolicy
AStarSmoothPlannerPolicy(config: MlSpacesExpConfig, task: BaseMujocoTask | None = None)

Bases: AStarPlannerPolicy

Methods:

Name Description
add_auxiliary_objects

Add auxiliary objects to the scene that might be required for the policy.

build_policy_plan
current_waypoint
get_action
get_all_phases

Returns:

get_info

Get additional information from the policy. Called after episode ended. This method can be

get_phase

Returns:

interpolate_waypoints

Interpolate waypoints between each pair of waypoints.

max_angle_waypoints
max_dist_waypoints
planners
reset
skip_candidate
stop_plan

Attributes:

Name Type Description
candidate_objs list[MlSpacesObject]
config
nav_goal_sampler NavGoalSampler
nav_plan
nav_planner
robot_view
target_object MlSpacesObject

Get the nearest navigation target object for the current batch.

target_pos_quat
task
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def __init__(self, config: MlSpacesExpConfig, task: BaseMujocoTask | None = None) -> None:
    super().__init__(config, task)

    self._target_pos_quat = None
    self._nav_goal_sampler = None

    self.config.policy_config.planner_config.agent_radius = (
        self.config.task_sampler_config.robot_safety_radius
    )

    self.nav_planner = AStarPlanner(
        self.config.policy_config.planner_config, self.task.env.current_model_path
    )

    self._current_waypoint = 0
    self._reached_waypoints = 0
    self._nav_plan = None
    self._target_pos_quat = None
    self._dists_to_waypoint = []
    self._retries_left = self.config.policy_config.plan_max_retries
    self._target_object = None
    self._candidate_objs = None
    self._skipped_candidates = set()
    self._replan_after = None

    self.robot_view = task.env.current_robot.robot_view
candidate_objs property
candidate_objs: list[MlSpacesObject]
config instance-attribute
config = config
nav_goal_sampler property
nav_goal_sampler: NavGoalSampler
nav_plan property
nav_plan
nav_planner instance-attribute
nav_planner = AStarPlanner(planner_config, current_model_path)
robot_view instance-attribute
robot_view = robot_view
target_object property
target_object: MlSpacesObject

Get the nearest navigation target object for the current batch.

target_pos_quat property
target_pos_quat
task instance-attribute
task = task
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None

Add auxiliary objects to the scene that might be required for the policy. Args: config: The configuration for the policy. spec: The MuJoCo scene specification to modify.

Source code in molmo_spaces/policy/base_policy.py
@staticmethod
def add_auxiliary_objects(config: "MlSpacesExpConfig", spec: MjSpec) -> None:
    """
    Add auxiliary objects to the scene that might be required for the policy.
    Args:
        config: The configuration for the policy.
        spec: The MuJoCo scene specification to modify.
    """
    return None
build_policy_plan
build_policy_plan(world_waypoints)
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def build_policy_plan(self, world_waypoints):
    world_waypoints = self.stop_plan(world_waypoints)

    plan_length = sum(
        np.linalg.norm(world_waypoints[it] - world_waypoints[it - 1])
        for it in range(1, len(world_waypoints))
    )
    num_points = 2 * int(
        np.ceil(plan_length / self.config.policy_config.path_max_inter_waypoint_dist)
    )

    tck, u = splprep(world_waypoints.transpose(), s=1e-5)
    u_new = np.linspace(0, 1, num_points)
    x_new, y_new = splev(u_new, tck)

    # First derivatives
    dx_du, dy_du = splev(u_new, tck, der=1)
    # Tangent angle (radians)
    thetas = np.arctan2(dy_du, dx_du)
    # TODO handle large theta deltas

    combined_waypoints = []

    # First, we orient toward the 1st waypoint from the 0-th waypoint
    start_theta = self.robot_view.get_noop_ctrl_dict(["base"])["base"][2]
    for theta in self.max_angle_waypoints(np.stack([start_theta, thetas[0]])[:, None]):
        combined_waypoints.append(np.concatenate((world_waypoints[0], theta)))

    for cur_x, cur_y, cur_theta in zip(x_new, y_new, thetas):
        combined_waypoints.append(np.stack((cur_x, cur_y, cur_theta)))

    # Then rotate to face the target
    final_pos = world_waypoints[-1]
    target_pos = self.target_object.position[:2]
    final_theta = np.arctan2(target_pos[1] - final_pos[1], target_pos[0] - final_pos[0])
    for theta in self.max_angle_waypoints(np.stack([thetas[-1], final_theta])[:, None]):
        combined_waypoints.append(np.concatenate((final_pos, theta)))

    return np.array(combined_waypoints)
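The spline fitting and tangent extraction above can be reproduced standalone with SciPy. This sketch uses made-up waypoints: splprep fits a smoothing parametric spline (a cubic needs at least four points), splev resamples it, and the heading is recovered from the first derivative, mirroring build_policy_plan.

```python
import numpy as np
from scipy.interpolate import splprep, splev

# A few illustrative planar waypoints (at least k + 1 = 4 for a cubic spline)
pts = np.array([[0.0, 0.0], [1.0, 0.2], [2.0, 0.0], [3.0, -0.2], [4.0, 0.0]])

# Fit a near-interpolating parametric spline through the points
tck, u = splprep(pts.T, s=1e-5)

# Resample densely and recover the heading from the first derivative
u_new = np.linspace(0.0, 1.0, 50)
x_new, y_new = splev(u_new, tck)
dx, dy = splev(u_new, tck, der=1)
thetas = np.arctan2(dy, dx)
```

With such a small smoothing factor the curve passes essentially through the input points, so the resampled path starts at the first waypoint while thetas varies smoothly along it.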
current_waypoint
current_waypoint()
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def current_waypoint(self):
    if self._current_waypoint < len(self.nav_plan):
        cur_distance = self.robot_view.distance_to(
            ["base"], self.nav_plan[self._current_waypoint]
        )
        pose = self.robot_view.base.pose
        angle = R.from_matrix(pose[:3, :3]).as_euler("xyz")[2:3]
        delta = pose[:2, 3]
        pose = np.concatenate((delta, angle))

        log.debug(
            f"Steps {self.task.num_steps_taken()}"
            f" Retries left {self._retries_left}"
            f" Waypoint {self._reached_waypoints}/{len(self.nav_plan) + self._reached_waypoints - self._current_waypoint}"
            f" {np.round(self.nav_plan[self._current_waypoint], 3)}"
            f" Pose {np.round(pose, 3)}"
        f" Dist {cur_distance:.3f}"
        )

        if self.robot_view.is_close_to(["base"], self.nav_plan[self._current_waypoint]):
            self._reached_waypoints += 1
            self._dists_to_waypoint = []

            if self._replan_after is None:
                self._current_waypoint += 1
            else:
                if self._replan_after <= 1:
                    self._replan_after = None
                    self._nav_plan = None
                    self._target_pos_quat = None
                    if self.nav_plan is None:
                        log.warning("Terminating due to failure to replan.")
                        return None
                else:
                    self._replan_after -= 1
                    self._current_waypoint -= 1

        elif (
            len(self._dists_to_waypoint)
            > self.config.policy_config.plan_fail_after_waypoint_steps
            and min(self._dists_to_waypoint) - cur_distance
            <= self.config.policy_config.plan_fail_max_dist_delta
        ):
            self.nav_planner.blacklist.append(self.robot_view.base.pose[:3, 3].copy())
            self.nav_planner.apply_black_list()
            if self._retries_left > 0:
                if self._replan_after is None:

                    def waypoints_until_different_location():
                        back = 0
                        while (
                            np.linalg.norm(
                                self.nav_plan[self._current_waypoint][:2]
                                - self.nav_plan[self._current_waypoint - back][:2]
                            )
                            < 0.25  # TODO make this a config param
                        ):
                            if self._current_waypoint - back > 0:
                                back += 1
                            else:
                                break

                        return back

                    self._retries_left -= 1
                    self._replan_after = waypoints_until_different_location()
                    self._dists_to_waypoint = []
                    self._current_waypoint = max(self._current_waypoint - 1, 0)
                    log.warning(
                        f"Replanning requested after returning to the previous {self._replan_after} waypoints"
                        f" due to failure to progress with distance {cur_distance:.3f}"
                        f" to waypoint with {self._retries_left} retries left."
                    )
                else:
                    log.warning(
                        f"Terminating due to failure to return to previous waypoint"
                        f" with {self._replan_after} missing return waypoints"
                    )
                    return None
            else:
                log.warning(
                    f"Terminating due to failure to progress with distance {cur_distance:.3f} to waypoint"
                    f" and no plan retries left."
                )
                return None

        else:
            self._dists_to_waypoint.append(cur_distance)

    if self._current_waypoint < len(self.nav_plan):
        return self.nav_plan[self._current_waypoint]

    return None
get_action
get_action(observation)
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def get_action(self, observation):
    if self.nav_plan is None:
        # No plan possible, finish task immediately
        log.warning(
            f"[A* DONE] Planning failed - terminating episode at step {self.task.num_steps_taken()}"
            f" with {self._reached_waypoints} reached waypoints."
            f" Reason: A* could not find a valid path (see earlier PLAN FAIL logs for details)"
        )
        return self._build_done_action()

    # get next waypoint in the planned trajectory
    waypoint = self.current_waypoint()

    if waypoint is None:
        # All waypoints reached - navigation complete
        log.info(
            f"[A* DONE] Navigation complete - reached {self._reached_waypoints} waypoints"
            f" in {self.task.num_steps_taken()} steps."
        )
        return self._build_done_action()

    # Still navigating - return action to reach next waypoint
    return self._build_navigation_action(waypoint)
get_all_phases
get_all_phases() -> dict[str, int]

Returns:

Type Description
dict[str, int]

A dictionary of all possible policy phases

Source code in molmo_spaces/policy/base_policy.py
def get_all_phases(self) -> dict[str, int]:
    """
    Returns:
        A dictionary of all possible policy phases
    """
    return {"unknown": 0}
get_info
get_info() -> dict

Get additional information from the policy. Called after the episode has ended. This method can be overridden by subclasses to provide extra information about the policy's state. Must be JSON serializable.

Returns:

Type Description
dict

A dictionary containing additional information about the policy.

Source code in molmo_spaces/policy/base_policy.py
def get_info(self) -> dict:
    """
    Get additional information from the policy. Called after the episode has ended. This method
    can be overridden by subclasses to provide extra information about the policy's state.
    Must be JSON serializable.

    Returns:
        A dictionary containing additional information about the policy.
    """
    return {}
get_phase
get_phase() -> str

Returns:

Type Description
str

The policy phase

Source code in molmo_spaces/policy/base_policy.py
def get_phase(self) -> str:
    """
    Returns:
        The policy phase
    """
    return "unknown"
interpolate_waypoints
interpolate_waypoints(waypoints: ndarray) -> ndarray

Interpolate waypoints between each pair of waypoints.

Parameters:

Name Type Description Default
waypoints ndarray

original waypoints array, shape (N, 2)

required

Returns:

Type Description
ndarray

interpolated waypoints array

Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def interpolate_waypoints(self, waypoints: np.ndarray) -> np.ndarray:
    """
    Interpolate waypoints between each pair of waypoints.

    Args:
        waypoints: original waypoints array, shape (N, 2)

    Returns:
        interpolated waypoints array
    """
    density = self.config.policy_config.path_interpolation_density
    if density <= 0 or waypoints is None or len(waypoints) <= 1:
        return waypoints

    # Use np.linspace for faster vectorized interpolation
    segments = []
    for i in range(len(waypoints) - 1):
        # Sample density+2 points on [0, 1) with endpoint=False, then drop t=0,
        # leaving density+1 evenly spaced points from waypoints[i] toward waypoints[i+1]
        t = np.linspace(0, 1, density + 2, endpoint=False)[1:]
        segment = waypoints[i] + t[:, np.newaxis] * (waypoints[i + 1] - waypoints[i])
        segments.append(segment)
    segments.append(waypoints[-1:])  # add final waypoint

    return np.vstack([waypoints[0:1]] + segments)
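The interpolation logic above can be exercised outside the policy class. The following is a minimal standalone sketch of the same linspace slicing (the `density` parameter stands in for `config.policy_config.path_interpolation_density`):

```python
import numpy as np

def interpolate_waypoints(waypoints: np.ndarray, density: int) -> np.ndarray:
    """Insert density+1 evenly spaced points between each consecutive
    waypoint pair, mirroring the source's endpoint=False slicing."""
    if density <= 0 or waypoints is None or len(waypoints) <= 1:
        return waypoints
    segments = []
    for i in range(len(waypoints) - 1):
        # density+2 samples on [0, 1); dropping t=0 leaves density+1 points
        t = np.linspace(0, 1, density + 2, endpoint=False)[1:]
        segments.append(waypoints[i] + t[:, None] * (waypoints[i + 1] - waypoints[i]))
    segments.append(waypoints[-1:])  # add final waypoint
    return np.vstack([waypoints[:1]] + segments)

wps = np.array([[0.0, 0.0], [1.0, 0.0]])
out = interpolate_waypoints(wps, 1)  # 0, 1/3, 2/3, 1 along x
```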
max_angle_waypoints
max_angle_waypoints(angles: ndarray) -> ndarray
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def max_angle_waypoints(self, angles: np.ndarray) -> np.ndarray:
    assert angles.shape == (2, 1)

    angle = float(abs(normalize_ang_error(angles[1] - angles[0])))
    num_points = int(np.ceil(angle / self.config.policy_config.path_max_inter_waypoint_angle))
    if num_points <= 1:
        # Always enforce at least one orientation correction
        return angles[1:]

    steps = np.linspace(0, 1, num_points + 1)[1:]
    r0 = R.from_euler("z", angles[0], degrees=False)
    r1 = R.from_euler("z", angles[1], degrees=False)
    rots = Slerp([0, 1], R.concatenate([r0, r1]))(steps)
    new_angles = rots.as_euler("xyz", degrees=False)[:, 2:]

    return new_angles
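Because the rotations here are pure z-axis rotations, the Slerp above reduces to linear interpolation of the wrapped angle difference. The following standalone sketch (with a hypothetical `normalize_ang_error` matching the one used in the source) shows that reduction without the quaternion machinery:

```python
import numpy as np

def normalize_ang_error(a: float) -> float:
    """Wrap an angle difference into [-pi, pi)."""
    return (a + np.pi) % (2 * np.pi) - np.pi

def split_rotation(a0: float, a1: float, max_step: float) -> np.ndarray:
    """Split a planar rotation from a0 to a1 into steps no larger than
    max_step, always emitting at least one correction."""
    err = normalize_ang_error(a1 - a0)
    num_points = max(1, int(np.ceil(abs(err) / max_step)))
    steps = np.linspace(0, 1, num_points + 1)[1:]
    return a0 + err * steps

angles = split_rotation(0.0, np.pi / 2, np.pi / 4)  # two steps of pi/4
```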
max_dist_waypoints
max_dist_waypoints(waypoints: ndarray) -> ndarray
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def max_dist_waypoints(self, waypoints: np.ndarray) -> np.ndarray:
    assert waypoints.shape == (2, 2)

    direction = waypoints[-1] - waypoints[0]

    dist = np.linalg.norm(direction)
    num_points = int(np.ceil(dist / self.config.policy_config.path_max_inter_waypoint_dist))
    if num_points <= 1:
        return waypoints[1:]

    stops = np.linspace(0, 1, num_points + 1)[1:]
    return waypoints[:1] + direction[None, :] * stops[:, None]
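The translation counterpart works the same way: subdivide a segment so no two consecutive points are farther apart than a threshold. A minimal sketch (with `max_dist` standing in for `path_max_inter_waypoint_dist`):

```python
import numpy as np

def split_segment(start: np.ndarray, end: np.ndarray, max_dist: float) -> np.ndarray:
    """Subdivide start->end so consecutive points are at most max_dist apart.

    Returns the intermediate points plus the endpoint; the start point is
    excluded, mirroring max_dist_waypoints.
    """
    direction = end - start
    dist = np.linalg.norm(direction)
    num_points = int(np.ceil(dist / max_dist))
    if num_points <= 1:
        return end[None, :]
    stops = np.linspace(0, 1, num_points + 1)[1:]
    return start[None, :] + direction[None, :] * stops[:, None]

pts = split_segment(np.array([0.0, 0.0]), np.array([3.0, 0.0]), 1.0)
```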
planners
planners()
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def planners(self):
    return self.nav_planner
reset
reset()
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def reset(self):
    self._current_waypoint = 0
    self._reached_waypoints = 0
    self._nav_plan = None
    self._target_pos_quat = None
    self._dists_to_waypoint = []
    self._retries_left = self.config.policy_config.plan_max_retries
    self._target_object = None
    self._candidate_objs = None
    self._skipped_candidates = set()
    self._replan_after = None
    self.nav_planner.blacklist.clear()
skip_candidate
skip_candidate(obj_name)
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def skip_candidate(self, obj_name):
    self._skipped_candidates.add(obj_name)
stop_plan
stop_plan(waypoints: ndarray) -> ndarray
Source code in molmo_spaces/policy/solvers/navigation/astar_planner_policy.py
def stop_plan(self, waypoints: np.ndarray) -> np.ndarray:
    r = self.config.policy_config.path_min_dist_to_target_center

    if r == 0.0:
        return waypoints

    cc = self.target_object.position[:2]

    # 1. find the last waypoint outside the circle; the path stays inside after it
    last_out = None
    for i in reversed(range(len(waypoints))):
        if np.linalg.norm(waypoints[i] - cc) > r:
            last_out = i
            break
    else:
        # every waypoint lies within r of the target center, so keep only the first two
        return waypoints[:2]

    # the final waypoint is already outside r, so keep the whole path
    if last_out == len(waypoints) - 1:
        return waypoints

    # If not, intersect circumference around object center and last segment
    # (x-c)^T(x-c) = r^2
    # with x = s + alpha d
    # resulting in alpha^2 * (d^Td) + alpha * [2 d^T(s-c)] +[(s-c)^T(s-c) - r^2] = 0
    # for convenience, we make d a unitary direction

    segment = waypoints[last_out : last_out + 2]
    s = segment[0]

    d = segment[1] - s
    d /= np.linalg.norm(d)  # so a == 1 in the 2nd order equation

    sc = s - cc
    b = 2 * d @ sc
    c = np.linalg.norm(sc) ** 2 - r**2

    # Discriminant should always be positive, as the two points differ
    # in their inclusion in the circle with given radius
    # (we enforce it's at least non-negative)
    disc = max(b**2 - 4 * c, 0)

    # We keep the smallest (entering) solution (minus sign)
    # the relative displacement from the last waypoint outside along the
    # direction to the first one inside needs to be positive
    # (we enforce it's at least non-negative)
    alpha = max((-b - np.sqrt(disc)) / 2, 0)

    intersection = s + alpha * d
    return np.concatenate([waypoints[: last_out + 1], intersection[None, :]])
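The quadratic in `stop_plan` can be checked in isolation. With the direction `d` normalized, the equation `(x-c)^T(x-c) = r^2` along `x = s + alpha*d` has leading coefficient 1, and the smaller root is the entry point into the circle. A standalone sketch:

```python
import numpy as np

def circle_entry_point(s: np.ndarray, e: np.ndarray, center: np.ndarray, r: float) -> np.ndarray:
    """Entry point of segment s->e into the circle of radius r around center.

    Solves ||s + alpha*d - center||^2 = r^2 with d normalized (so a == 1);
    the smaller root (minus sign) is the entering intersection.
    """
    d = e - s
    d = d / np.linalg.norm(d)
    sc = s - center
    b = 2 * d @ sc
    c = sc @ sc - r**2
    disc = max(b**2 - 4 * c, 0.0)       # clamp for numerical safety
    alpha = max((-b - np.sqrt(disc)) / 2, 0.0)
    return s + alpha * d

# Segment from (-2, 0) to the origin enters the unit circle at (-1, 0)
pt = circle_entry_point(np.array([-2.0, 0.0]), np.array([0.0, 0.0]),
                        np.array([0.0, 0.0]), 1.0)
```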

object_manipulation

Modules:

Name Description
base_object_manipulation_planner_policy
curobo_open_close_planner_policy
curobo_pick_and_place_planner_policy
open_close_planner_policy
pick_and_place_color_planner_policy
pick_and_place_next_to_planner_policy
pick_and_place_planner_policy
pick_planner_policy
base_object_manipulation_planner_policy

Classes:

Name Description
ActionPrimitive
BaseObjectManipulationPlannerPolicy
GripperAction
JointMoveSegment
JointMoveSequence
MoveSegment
MoveSequence
NoopAction
TCPMoveSegment
TCPMoveSequence

Attributes:

Name Type Description
log
log module-attribute
log = getLogger(__name__)
ActionPrimitive
ActionPrimitive(robot_view: RobotView, duration: float)

Bases: ABC

Methods:

Name Description
check_failure
elapsed_time
execute
get_current_action
get_current_phase
reset

Attributes:

Name Type Description
duration
robot_view
start_time
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def __init__(self, robot_view: RobotView, duration: float):
    self.robot_view = robot_view
    self.duration = duration
    self.start_time = None
duration instance-attribute
duration = duration
robot_view instance-attribute
robot_view = robot_view
start_time instance-attribute
start_time = None
check_failure
check_failure() -> bool
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def check_failure(self) -> bool:
    return False
elapsed_time
elapsed_time() -> float
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def elapsed_time(self) -> float:
    return self.robot_view.mj_data.time - self.start_time
execute abstractmethod
execute() -> bool
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
@abstractmethod
def execute(self) -> bool:
    pass
get_current_action abstractmethod
get_current_action() -> dict[str, Any]
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
@abstractmethod
def get_current_action(self) -> dict[str, Any]:
    raise NotImplementedError
get_current_phase
get_current_phase() -> str
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_current_phase(self) -> str:
    return "unknown"
reset
reset()
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def reset(self):
    self.start_time = None
BaseObjectManipulationPlannerPolicy
BaseObjectManipulationPlannerPolicy(config: MlSpacesExpConfig, task: BaseMujocoTask)

Bases: PlannerPolicy

Methods:

Name Description
add_auxiliary_objects
check_feasible_ik
get_action
get_all_phases
get_info

Get additional information from the policy. Called after episode ended. This method can be

get_phase
reset

Attributes:

Name Type Description
action_idx
action_primitives
config
ik_warmed_up
planners
policy_config
retry_count
robot_view
sequential_ik_failures
target_poses
task
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def __init__(self, config: MlSpacesExpConfig, task: BaseMujocoTask) -> None:
    assert isinstance(config.policy_config, ObjectManipulationPlannerPolicyConfig)
    super().__init__(config, task)
    self.policy_config = config.policy_config
    self.robot_view = task.env.current_robot.robot_view

    self.action_primitives = []
    self.action_idx = 0
    self.retry_count = 0
    self.target_poses = {}
    self.ik_warmed_up = False
    self.sequential_ik_failures = 0
action_idx instance-attribute
action_idx = 0
action_primitives instance-attribute
action_primitives = []
config instance-attribute
config = config
ik_warmed_up instance-attribute
ik_warmed_up = False
planners property
planners
policy_config instance-attribute
policy_config = policy_config
retry_count instance-attribute
retry_count = 0
robot_view instance-attribute
robot_view = robot_view
sequential_ik_failures instance-attribute
sequential_ik_failures = 0
target_poses instance-attribute
target_poses = {}
task instance-attribute
task = task
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
@staticmethod
def add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None:
    PlannerPolicy.add_auxiliary_objects(config, spec)
    policy_config = config.policy_config
    assert isinstance(policy_config, ObjectManipulationPlannerPolicyConfig)
    if policy_config.filter_colliding_grasps:
        add_grasp_collision_bodies(
            spec,
            policy_config.grasp_collision_batch_size,
            policy_config.grasp_width,
            policy_config.grasp_length,
            policy_config.grasp_height,
            np.array(policy_config.grasp_base_pos),
        )
check_feasible_ik
check_feasible_ik(pose: ndarray) -> bool | ndarray
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def check_feasible_ik(self, pose: np.ndarray) -> bool | np.ndarray:
    if pose.ndim > 2:
        assert pose.shape[1:] == (4, 4)
        batch_size = pose.shape[0]
        assert batch_size <= self.policy_config.grasp_feasibility_batch_size

        if batch_size == 0:
            return np.array([], dtype=bool)

        # pad to the batch size to avoid triggering recompilation of the IK solver
        if batch_size < self.policy_config.grasp_feasibility_batch_size:
            n_pad = self.policy_config.grasp_feasibility_batch_size - batch_size
            pose = np.concatenate([pose, np.broadcast_to(pose[-1:], (n_pad, 4, 4))])

        robot_view = self.task.env.current_robot.robot_view
        parallel_kinematics = self.task._env.robots[0].parallel_kinematics
        jp_dicts = parallel_kinematics.ik(
            pose,
            robot_view.get_qpos_dict(),
            robot_view.base.pose,
            rel_to_base=False,
            posture_weight=0.0,
        )
        return np.array([jp_dict is not None for jp_dict in jp_dicts[:batch_size]])
    else:
        assert pose.shape == (4, 4)
        robot_view = self.task.env.current_robot.robot_view
        kinematics = self.task.env.current_robot.kinematics
        gripper_mg_id = robot_view.get_gripper_movegroup_ids()[0]
        jp_dict = kinematics.ik(
            gripper_mg_id,
            pose,
            robot_view.move_group_ids(),
            robot_view.get_qpos_dict(),
            base_pose=robot_view.base.pose,
        )
        return jp_dict is not None
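The padding trick in the batched branch, repeating the last pose so a JIT-compiled solver always sees the same batch shape, can be sketched on its own (the fixed `batch_size` stands in for `grasp_feasibility_batch_size`):

```python
import numpy as np

def pad_pose_batch(poses: np.ndarray, batch_size: int) -> np.ndarray:
    """Pad an (N, 4, 4) pose batch to (batch_size, 4, 4) by repeating the
    last pose, avoiding recompilation of a shape-specialized IK solver."""
    n = poses.shape[0]
    assert 0 < n <= batch_size
    if n == batch_size:
        return poses
    n_pad = batch_size - n
    return np.concatenate([poses, np.broadcast_to(poses[-1:], (n_pad, 4, 4))])

poses = np.stack([np.eye(4)] * 3)
padded = pad_pose_batch(poses, 8)
```

After solving, only the first `n` results are kept, exactly as `jp_dicts[:batch_size]` does above.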
get_action
get_action(info: dict[str, Any]) -> dict[str, Any]
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_action(self, info: dict[str, Any]) -> dict[str, Any]:
    if self._check_for_failures():
        return self._handle_failure()

    # get the action from the current action primitive, absorbing 0-length segments as necessary
    while self.action_idx < len(self.action_primitives):  # guaranteed to terminate
        act_prim = self.action_primitives[self.action_idx]
        proceed = act_prim.execute()
        # cannot have a 0-length segment that doesn't proceed
        assert proceed or act_prim.duration > 0.0
        if proceed:
            self.action_idx += 1
        if not proceed or act_prim.duration > 0.0:
            break
    if self.action_idx < len(self.action_primitives):
        action = act_prim.get_current_action()
    else:
        action = self.action_primitives[-1].get_current_action()
        action["done"] = True
    return action
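The loop in `get_action` advances past finished primitives while absorbing zero-duration ones in a single call. A simplified standalone sketch, modeling each primitive as a hypothetical `(duration, finished)` pair instead of a stateful `execute()`:

```python
def advance_primitives(primitives: list[tuple[float, bool]], idx: int) -> int:
    """Advance idx past finished primitives, absorbing zero-duration ones.

    Mirrors get_action's loop: stop at an unfinished primitive, or just
    after the first finished primitive with nonzero duration.
    """
    while idx < len(primitives):  # guaranteed to terminate
        duration, finished = primitives[idx]
        # cannot have a 0-length segment that doesn't proceed
        assert finished or duration > 0.0
        if finished:
            idx += 1
        if not finished or duration > 0.0:
            break
    return idx
```

Two zero-length finished primitives followed by an active one are skipped in one call, landing on index 2.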
get_all_phases
get_all_phases()
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_all_phases(self):
    phases = super().get_all_phases()
    # Collect all possible phases here, even those not used for a particular trajectory computation
    new_phases = {
        "gripper-open": 1,
        "pregrasp": 2,
        "grasp": 3,
        "gripper-close": 4,
        "lift": 5,
        "preplace": 6,
        "place": 7,
        "retreat": 8,
        "go_home": 9,
    }
    phases.update(new_phases)
    return phases
get_info
get_info() -> dict

Get additional information from the policy. Called after the episode has ended. This method can be overridden by subclasses to provide extra information about the policy's state. Must be JSON-serializable.

Returns:

Type Description
dict

A dictionary containing additional information about the policy.

Source code in molmo_spaces/policy/base_policy.py
def get_info(self) -> dict:
    """
    Get additional information from the policy. Called after the episode has ended.
    This method can be overridden by subclasses to provide extra information about
    the policy's state. Must be JSON-serializable.

    Returns:
        A dictionary containing additional information about the policy.
    """
    return {}
get_phase
get_phase() -> str
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_phase(self) -> str:
    if self.action_idx < len(self.action_primitives):
        act_prim = self.action_primitives[self.action_idx]
    else:
        act_prim = self.action_primitives[-1]
    return act_prim.get_current_phase()
reset
reset(reset_retries: bool = True)
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def reset(self, reset_retries: bool = True):
    if not self.ik_warmed_up:
        with Timer() as warmup_time:
            self.task.env.current_robot.parallel_kinematics.warmup_ik(
                self.policy_config.grasp_feasibility_batch_size
            )
        self.ik_warmed_up = True
        log.info(f"Warmed up parallel IK solver in {warmup_time.value:.3f}s")

    self.action_primitives = self._compute_trajectory()

    self.action_idx = 0
    if reset_retries:
        self.retry_count = 0

    self.target_poses = {}
    for action_primitive in self.action_primitives:
        if isinstance(action_primitive, TCPMoveSequence):
            for move_segment in action_primitive.move_segments:
                self.target_poses[move_segment.name] = move_segment.end_pose
        elif isinstance(action_primitive, JointMoveSequence):
            gripper_mg_id = self.robot_view.get_gripper_movegroup_ids()[0]
            kinematics = self.task.env.current_robot.kinematics
            for move_segment in action_primitive.move_segments:
                end_qpos = move_segment.end_qpos
                base_pose = self.robot_view.base.pose
                self.target_poses[move_segment.name] = kinematics.fk(end_qpos, base_pose)[
                    gripper_mg_id
                ]
GripperAction
GripperAction(robot_view: RobotView, target_open: bool, duration: float)

Bases: ActionPrimitive

Methods:

Name Description
check_failure
elapsed_time
execute
get_current_action
get_current_phase
reset

Attributes:

Name Type Description
duration
robot_view
start_time
target_open
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def __init__(self, robot_view: RobotView, target_open: bool, duration: float) -> None:
    super().__init__(robot_view, duration)
    self.target_open = target_open
duration instance-attribute
duration = duration
robot_view instance-attribute
robot_view = robot_view
start_time instance-attribute
start_time = None
target_open instance-attribute
target_open = target_open
check_failure
check_failure() -> bool
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def check_failure(self) -> bool:
    return False
elapsed_time
elapsed_time() -> float
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def elapsed_time(self) -> float:
    return self.robot_view.mj_data.time - self.start_time
execute
execute() -> bool
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def execute(self) -> bool:
    if self.start_time is None:
        self.start_time = self.robot_view.mj_data.time
        if self.target_open:
            log.info("Opening gripper...")
        else:
            log.info("Closing gripper...")

        mg_id = self.robot_view.get_gripper_movegroup_ids()[0]
        gripper = self.robot_view.get_gripper(mg_id)
        gripper.set_gripper_ctrl_open(self.target_open)

    return self.elapsed_time() >= self.duration
get_current_action
get_current_action() -> dict[str, Any]
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_current_action(self) -> dict[str, Any]:
    return self.robot_view.get_ctrl_dict()
get_current_phase
get_current_phase()
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_current_phase(self):
    if self.target_open:
        return "gripper-open"
    else:
        return "gripper-close"
reset
reset()
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def reset(self):
    self.start_time = None
JointMoveSegment dataclass
JointMoveSegment(name: str, start_qpos: dict[str, ndarray | list[float]] | None, end_qpos: dict[str, ndarray | list[float]], duration_s: float)

Bases: MoveSegment

Attributes:

Name Type Description
duration float
duration_s float
end_qpos dict[str, ndarray | list[float]]
name str
start_qpos dict[str, ndarray | list[float]] | None
duration property
duration: float
duration_s instance-attribute
duration_s: float
end_qpos instance-attribute
end_qpos: dict[str, ndarray | list[float]]
name instance-attribute
name: str
start_qpos instance-attribute
start_qpos: dict[str, ndarray | list[float]] | None
JointMoveSequence
JointMoveSequence(robot_view: RobotView, settle_time: float, move_segments: list[JointMoveSegment], is_holding_object: bool = False, gripper_empty_threshold: float = 0.0)

Bases: MoveSequence

Methods:

Name Description
check_failure
elapsed_time
execute
get_current_action
get_current_phase
reset

Attributes:

Name Type Description
duration
gripper_empty_threshold
is_holding_object
move_seg_idx
move_seg_start_time
move_segments list[JointMoveSegment]
robot_view
settle_time
start_time
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def __init__(
    self,
    robot_view: RobotView,
    settle_time: float,
    move_segments: list[JointMoveSegment],
    is_holding_object: bool = False,
    gripper_empty_threshold: float = 0.0,
) -> None:
    super().__init__(
        robot_view,
        settle_time,
        move_segments,
        is_holding_object,
        gripper_empty_threshold,
    )
duration instance-attribute
duration = duration
gripper_empty_threshold instance-attribute
gripper_empty_threshold = gripper_empty_threshold
is_holding_object instance-attribute
is_holding_object = is_holding_object
move_seg_idx instance-attribute
move_seg_idx = None
move_seg_start_time instance-attribute
move_seg_start_time = None
move_segments property
move_segments: list[JointMoveSegment]
robot_view instance-attribute
robot_view = robot_view
settle_time instance-attribute
settle_time = settle_time
start_time instance-attribute
start_time = None
check_failure
check_failure() -> bool
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def check_failure(self) -> bool:
    if super().check_failure():
        return True

    if self.is_holding_object:
        gripper_mg_id = self.robot_view.get_gripper_movegroup_ids()[0]
        gripper = self.robot_view.get_gripper(gripper_mg_id)
        if (
            gripper.inter_finger_dist
            < gripper.inter_finger_dist_range[0] + self.gripper_empty_threshold
        ):
            log.info(
                f"Object is not in grasp! {gripper.inter_finger_dist:.05f} < {gripper.inter_finger_dist_range[0] + self.gripper_empty_threshold:.05f}"
            )
            return True

    return False
elapsed_time
elapsed_time() -> float
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def elapsed_time(self) -> float:
    return self.robot_view.mj_data.time - self.start_time
execute
execute() -> bool
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def execute(self) -> bool:
    ret = super().execute()

    assert self.move_seg_idx is not None
    if self.move_seg_idx < len(self.move_segments):
        move_seg = self.move_segments[self.move_seg_idx]
        if move_seg.start_qpos is None:
            move_seg.start_qpos = self.robot_view.get_qpos_dict()

    return ret
get_current_action
get_current_action() -> dict[str, Any]
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_current_action(self) -> dict[str, Any]:
    elapsed_time = self.robot_view.mj_data.time - self.move_seg_start_time
    assert self.move_seg_idx is not None
    if self.move_seg_idx < len(self.move_segments):
        move_seg = self.move_segments[self.move_seg_idx]
        t = min(1.0, elapsed_time / move_seg.duration)

        curr_target_qpos = {**move_seg.start_qpos}
        for k in move_seg.end_qpos:
            q0 = np.asarray(move_seg.start_qpos[k])
            q1 = np.asarray(move_seg.end_qpos[k])
            curr_target_qpos[k] = q0 + (q1 - q0) * t
    else:
        move_seg = self.move_segments[-1]
        curr_target_qpos = {**move_seg.end_qpos}

    for mg_id in self.robot_view.get_gripper_movegroup_ids():
        if mg_id in curr_target_qpos:
            del curr_target_qpos[mg_id]
    return curr_target_qpos
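The per-key linear interpolation of joint targets above can be isolated into a small helper. A minimal sketch, with `start_qpos`/`end_qpos` as plain dicts mapping move-group names to joint arrays:

```python
import numpy as np

def interp_qpos(start_qpos: dict, end_qpos: dict, t: float) -> dict:
    """Linearly interpolate per-move-group joint targets at fraction t.

    Keys present only in start_qpos are carried over unchanged; t is
    clamped to [0, 1], matching the min(1.0, ...) in the source.
    """
    t = min(1.0, max(0.0, t))
    target = {**start_qpos}
    for k in end_qpos:
        q0 = np.asarray(start_qpos[k], dtype=float)
        q1 = np.asarray(end_qpos[k], dtype=float)
        target[k] = q0 + (q1 - q0) * t
    return target

mid = interp_qpos({"arm": [0.0, 0.0]}, {"arm": [1.0, 2.0]}, 0.5)
```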
get_current_phase
get_current_phase() -> str
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_current_phase(self) -> str:
    if self.move_seg_idx is None:
        move_seg = self.move_segments[0]
    elif self.move_seg_idx < len(self.move_segments):
        move_seg = self.move_segments[self.move_seg_idx]
    else:
        move_seg = self.move_segments[-1]
    return move_seg.name
reset
reset()
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def reset(self):
    super().reset()
    self.move_seg_idx = None
    self.move_seg_start_time = None
MoveSegment dataclass
MoveSegment(name: str)

Attributes:

Name Type Description
duration float
name str
duration property
duration: float
name instance-attribute
name: str
MoveSequence
MoveSequence(robot_view: RobotView, settle_time: float, move_segments: list[MoveSegment], is_holding_object: bool = False, gripper_empty_threshold: float = 0.0)

Bases: ActionPrimitive

Methods:

Name Description
check_failure
elapsed_time
execute
get_current_action
get_current_phase
reset

Attributes:

Name Type Description
duration
gripper_empty_threshold
is_holding_object
move_seg_idx
move_seg_start_time
robot_view
settle_time
start_time
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def __init__(
    self,
    robot_view: RobotView,
    settle_time: float,
    move_segments: list[MoveSegment],
    is_holding_object: bool = False,
    gripper_empty_threshold: float = 0.0,
) -> None:
    super().__init__(robot_view, sum(seg.duration for seg in move_segments))
    self._move_segments = move_segments
    self.move_seg_idx = None
    self.move_seg_start_time = None
    self.settle_time = settle_time
    self.is_holding_object = is_holding_object
    self.gripper_empty_threshold = gripper_empty_threshold
duration instance-attribute
duration = duration
gripper_empty_threshold instance-attribute
gripper_empty_threshold = gripper_empty_threshold
is_holding_object instance-attribute
is_holding_object = is_holding_object
move_seg_idx instance-attribute
move_seg_idx = None
move_seg_start_time instance-attribute
move_seg_start_time = None
robot_view instance-attribute
robot_view = robot_view
settle_time instance-attribute
settle_time = settle_time
start_time instance-attribute
start_time = None
check_failure
check_failure() -> bool
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def check_failure(self) -> bool:
    if super().check_failure():
        return True

    if self.is_holding_object:
        gripper_mg_id = self.robot_view.get_gripper_movegroup_ids()[0]
        gripper = self.robot_view.get_gripper(gripper_mg_id)
        if (
            gripper.inter_finger_dist
            < gripper.inter_finger_dist_range[0] + self.gripper_empty_threshold
        ):
            log.info(
                f"Object is not in grasp! {gripper.inter_finger_dist:.05f} < {gripper.inter_finger_dist_range[0] + self.gripper_empty_threshold:.05f}"
            )
            return True

    return False
elapsed_time
elapsed_time() -> float
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def elapsed_time(self) -> float:
    return self.robot_view.mj_data.time - self.start_time
execute
execute() -> bool
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def execute(self) -> bool:
    if self.start_time is None:
        self.start_time = self.robot_view.mj_data.time
        self.move_seg_idx = None
        self.move_seg_start_time = self.start_time

    duration_cumsum = np.cumsum([0] + [seg.duration for seg in self._move_segments])
    idx = np.searchsorted(duration_cumsum, self.elapsed_time(), side="right").item() - 1
    if idx != self.move_seg_idx:
        if idx < len(self._move_segments):
            log.info(f"Moving to {self._move_segments[idx].name}")
        self.move_seg_idx = idx
        self.move_seg_start_time = self.robot_view.mj_data.time

    return self.elapsed_time() >= self.duration + self.settle_time
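The cumsum-plus-searchsorted lookup in `execute` maps an elapsed time to the index of the currently active segment. A standalone sketch of that computation:

```python
import numpy as np

def active_segment(durations: list[float], elapsed: float) -> int:
    """Index of the segment active at `elapsed` seconds.

    Returns len(durations) once every segment has completed (the sequence
    is then in its settle phase), matching the searchsorted logic above.
    """
    cumsum = np.cumsum([0.0] + list(durations))
    return np.searchsorted(cumsum, elapsed, side="right").item() - 1
```

With `side="right"`, an elapsed time that lands exactly on a segment boundary is attributed to the next segment, so each segment owns the half-open interval `[start, start + duration)`.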
get_current_action
get_current_action() -> dict[str, Any]
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_current_action(self) -> dict[str, Any]:
    elapsed_time = self.robot_view.mj_data.time - self.move_seg_start_time
    if self.move_seg_idx < len(self.move_segments):
        move_seg = self.move_segments[self.move_seg_idx]
        t = min(1.0, elapsed_time / move_seg.duration)

        lin_vel, ang_vel = transform_to_twist(
            np.linalg.inv(move_seg.start_pose) @ move_seg.end_pose
        )
        curr_target_pose = move_seg.start_pose @ twist_to_transform(lin_vel * t, ang_vel * t)
    else:
        move_seg = self.move_segments[-1]
        curr_target_pose = move_seg.end_pose

    # Solve IK for current target pose
    mg_id = self.robot_view.get_gripper_movegroup_ids()[0]
    return self.tcp_to_jp_fn(mg_id, curr_target_pose)
get_current_phase
get_current_phase() -> str
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_current_phase(self) -> str:
    if self.move_seg_idx is None:
        move_seg = self.move_segments[0]
    elif self.move_seg_idx < len(self.move_segments):
        move_seg = self.move_segments[self.move_seg_idx]
    else:
        move_seg = self.move_segments[-1]
    return move_seg.name
reset
reset()
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def reset(self):
    super().reset()
    self.move_seg_idx = None
    self.move_seg_start_time = None
NoopAction
NoopAction(robot_view: RobotView, duration: float)

Bases: ActionPrimitive

Methods:

Name Description
check_failure
elapsed_time
execute
get_current_action
get_current_phase
reset

Attributes:

Name Type Description
duration
robot_view
start_time
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def __init__(self, robot_view: RobotView, duration: float) -> None:
    super().__init__(robot_view, duration)
    self._action = None
duration instance-attribute
duration = duration
robot_view instance-attribute
robot_view = robot_view
start_time instance-attribute
start_time = None
check_failure
check_failure() -> bool
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def check_failure(self) -> bool:
    return False
elapsed_time
elapsed_time() -> float
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def elapsed_time(self) -> float:
    return self.robot_view.mj_data.time - self.start_time
execute
execute() -> bool
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def execute(self) -> bool:
    if self.start_time is None:
        self.start_time = self.robot_view.mj_data.time
    return self.elapsed_time() >= self.duration
get_current_action
get_current_action() -> dict[str, Any]
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_current_action(self) -> dict[str, Any]:
    if self._action is None:
        self._action = self.robot_view.get_noop_ctrl_dict()
    return self._action
get_current_phase
get_current_phase() -> str
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_current_phase(self) -> str:
    return "unknown"
reset
reset()
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def reset(self):
    self.start_time = None
TCPMoveSegment dataclass
TCPMoveSegment(name: str, start_pose: ndarray, end_pose: ndarray, speed: float)

Bases: MoveSegment

Attributes:

Name Type Description
duration float
end_pose ndarray
name str
speed float
start_pose ndarray
duration property
duration: float
end_pose instance-attribute
end_pose: ndarray
name instance-attribute
name: str
speed instance-attribute
speed: float
start_pose instance-attribute
start_pose: ndarray
TCPMoveSequence
TCPMoveSequence(robot_view: RobotView, tcp_to_jp_fn: Callable[[str, ndarray], dict[str, Any]], settle_time: float, move_segments: list[TCPMoveSegment], is_holding_object: bool = False, gripper_empty_threshold: float = 0.0, tcp_pos_err_threshold: float = inf, tcp_rot_err_threshold: float = inf)

Bases: MoveSequence

Methods:

Name Description
check_failure
elapsed_time
execute
get_current_action
get_current_phase
get_current_target_pose
reset

Attributes:

Name Type Description
duration
gripper_empty_threshold
is_holding_object
move_seg_idx
move_seg_start_time
move_segments list[TCPMoveSegment]
robot_view
settle_time
start_time
tcp_pos_err_threshold
tcp_rot_err_threshold
tcp_to_jp_fn
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def __init__(
    self,
    robot_view: RobotView,
    tcp_to_jp_fn: Callable[[str, np.ndarray], dict[str, Any]],
    settle_time: float,
    move_segments: list[TCPMoveSegment],
    is_holding_object: bool = False,
    gripper_empty_threshold: float = 0.0,
    tcp_pos_err_threshold: float = np.inf,
    tcp_rot_err_threshold: float = np.inf,
) -> None:
    super().__init__(
        robot_view,
        settle_time,
        move_segments,
        is_holding_object,
        gripper_empty_threshold,
    )
    self.tcp_to_jp_fn = tcp_to_jp_fn
    self.tcp_pos_err_threshold = tcp_pos_err_threshold
    self.tcp_rot_err_threshold = tcp_rot_err_threshold
duration instance-attribute
duration = duration
gripper_empty_threshold instance-attribute
gripper_empty_threshold = gripper_empty_threshold
is_holding_object instance-attribute
is_holding_object = is_holding_object
move_seg_idx instance-attribute
move_seg_idx = None
move_seg_start_time instance-attribute
move_seg_start_time = None
move_segments property
move_segments: list[TCPMoveSegment]
robot_view instance-attribute
robot_view = robot_view
settle_time instance-attribute
settle_time = settle_time
start_time instance-attribute
start_time = None
tcp_pos_err_threshold instance-attribute
tcp_pos_err_threshold = tcp_pos_err_threshold
tcp_rot_err_threshold instance-attribute
tcp_rot_err_threshold = tcp_rot_err_threshold
tcp_to_jp_fn instance-attribute
tcp_to_jp_fn = tcp_to_jp_fn
check_failure
check_failure() -> bool
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def check_failure(self) -> bool:
    if super().check_failure():
        return True
    elif self.move_seg_idx is None:
        # we haven't started moving yet
        return False

    curr_target_pose = self.get_current_target_pose()
    gripper_mg_id = self.robot_view.get_gripper_movegroup_ids()[0]
    gripper = self.robot_view.get_gripper(gripper_mg_id)

    trf = np.linalg.inv(gripper.leaf_frame_to_world) @ curr_target_pose
    pos_err = np.linalg.norm(trf[:3, 3])
    rot_err = R.from_matrix(trf[:3, :3]).magnitude()
    return pos_err > self.tcp_pos_err_threshold or rot_err > self.tcp_rot_err_threshold
elapsed_time
elapsed_time() -> float
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def elapsed_time(self) -> float:
    return self.robot_view.mj_data.time - self.start_time
execute
execute() -> bool
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def execute(self) -> bool:
    if self.start_time is None:
        self.start_time = self.robot_view.mj_data.time
        self.move_seg_idx = None
        self.move_seg_start_time = self.start_time

    duration_cumsum = np.cumsum([0] + [seg.duration for seg in self._move_segments])
    idx = np.searchsorted(duration_cumsum, self.elapsed_time(), side="right").item() - 1
    if idx != self.move_seg_idx:
        if idx < len(self._move_segments):
            log.info(f"Moving to {self._move_segments[idx].name}")
        self.move_seg_idx = idx
        self.move_seg_start_time = self.robot_view.mj_data.time

    return self.elapsed_time() >= self.duration + self.settle_time
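The segment lookup in `execute` above picks the active move segment from cumulative durations with `np.searchsorted`. A minimal standalone sketch of that pattern, using hypothetical durations not taken from any real config:

```python
import numpy as np

# Hypothetical per-segment durations in seconds (illustrative only).
durations = [2.0, 1.0, 3.0]

# Cumulative start times of each segment: [0.0, 2.0, 3.0, 6.0].
duration_cumsum = np.cumsum([0] + durations)

def segment_index(elapsed: float) -> int:
    """Index of the segment active at `elapsed`; equals len(durations)
    once all segments have finished (the settle phase)."""
    return np.searchsorted(duration_cumsum, elapsed, side="right").item() - 1
```

With `side="right"`, an elapsed time exactly on a boundary advances to the next segment, matching the behavior of the source above.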
get_current_action
get_current_action() -> dict[str, Any]
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_current_action(self) -> dict[str, Any]:
    curr_target_pose = self.get_current_target_pose()

    # Solve IK for current target pose
    mg_id = self.robot_view.get_gripper_movegroup_ids()[0]
    return self.tcp_to_jp_fn(mg_id, curr_target_pose)
get_current_phase
get_current_phase() -> str
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_current_phase(self) -> str:
    if self.move_seg_idx is None:
        move_seg = self.move_segments[0]
    elif self.move_seg_idx < len(self.move_segments):
        move_seg = self.move_segments[self.move_seg_idx]
    else:
        move_seg = self.move_segments[-1]
    return move_seg.name
get_current_target_pose
get_current_target_pose() -> ndarray
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_current_target_pose(self) -> np.ndarray:
    elapsed_time = self.robot_view.mj_data.time - self.move_seg_start_time
    assert self.move_seg_idx is not None
    if self.move_seg_idx < len(self.move_segments):
        move_seg = self.move_segments[self.move_seg_idx]
        t = min(1.0, elapsed_time / move_seg.duration)

        lin_vel, ang_vel = transform_to_twist(
            np.linalg.inv(move_seg.start_pose) @ move_seg.end_pose
        )
        curr_target_pose = move_seg.start_pose @ twist_to_transform(lin_vel * t, ang_vel * t)
    else:
        move_seg = self.move_segments[-1]
        curr_target_pose = move_seg.end_pose
    return curr_target_pose
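`get_current_target_pose` interpolates on SE(3) by converting the relative start-to-end transform into a twist and scaling it by the time fraction. The codebase's `transform_to_twist` / `twist_to_transform` helpers are not shown here, so the stand-ins below are assumptions that decouple translation and rotation for simplicity (a true se(3) exponential couples them); the interpolation structure is the same:

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def transform_to_twist(trf: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    # Simplified stand-in: translation and rotation vector, decoupled.
    return trf[:3, 3], R.from_matrix(trf[:3, :3]).as_rotvec()

def twist_to_transform(lin: np.ndarray, ang: np.ndarray) -> np.ndarray:
    trf = np.eye(4)
    trf[:3, :3] = R.from_rotvec(ang).as_matrix()
    trf[:3, 3] = lin
    return trf

def interpolate_pose(start_pose: np.ndarray, end_pose: np.ndarray, t: float) -> np.ndarray:
    """Pose at fraction t in [0, 1] along the start->end motion."""
    lin, ang = transform_to_twist(np.linalg.inv(start_pose) @ end_pose)
    return start_pose @ twist_to_transform(lin * t, ang * t)
```

At `t=0` this returns the start pose exactly and at `t=1` the end pose, which is why the source clamps `t = min(1.0, elapsed_time / move_seg.duration)`.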
reset
reset()
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def reset(self):
    super().reset()
    self.move_seg_idx = None
    self.move_seg_start_time = None
curobo_open_close_planner_policy

Classes:

Name Description
CuroboOpenClosePlannerPolicy

Curobo-based planner policy for open/close articulated object tasks.

OpenClosePhase

Attributes:

Name Type Description
log
log module-attribute
log = getLogger(__name__)
CuroboOpenClosePlannerPolicy
CuroboOpenClosePlannerPolicy(config: MlSpacesExpConfig, task: BaseMujocoTask)

Bases: CuroboPlannerPolicy, OpenClosePlannerPolicy

Curobo-based planner policy for open/close articulated object tasks.

Inherits common motion planning functionality from CuroboPlannerPolicy and task-specific functionality from OpenClosePlannerPolicy. Depending on configuration, planning runs on a local planner or is delegated to a remote CuroboPlanner server via CuroboClient.

Methods:

Name Description
add_auxiliary_objects
batch_plan_trajectory

Plan trajectory in batches.

check_feasible_ik
clip_to_velocity_constraint

Clip action to respect velocity constraints.

get_action
get_all_phases

Return a mapping from phase name to phase index.

get_info

Get additional information from the policy; called after the episode has ended.

get_look_at_action

Get action to look at a target.

get_phase
reset
select_arm

Select which arm to use and set up the planner (local or remote).

solve_ik

Solve IK and create an interpolated trajectory.

visualize_world_config_mesh

Visualize the world configuration as a mesh file.

Attributes:

Name Type Description
action_idx
action_primitives
arm_end_idx int
arm_side str | None
arm_start_idx int
articulation_pose_index
client CuroboClient | None
config
current_gripper_command dict[str, float]
current_phase
grasping_timesteps
height_adjustment_steps
ik_warmed_up
is_done bool

Property to expose completion state for task checking.

opening_timesteps
planned_trajectory list[list[float]] | None
planner CuroboPlanner | None
planner_joint_ranges dict[str, tuple[int, int]]
planners dict
policy_config
pre_grasp_poses
profiler
retry_count int
robot_view
sequential_ik_failures
settle_steps
steps_spent_in_waypoint int
target_poses
task
trajectory_index int
Source code in molmo_spaces/policy/solvers/object_manipulation/curobo_open_close_planner_policy.py
def __init__(self, config: MlSpacesExpConfig, task: BaseMujocoTask) -> None:
    # Initialize both parent classes
    CuroboPlannerPolicy.__init__(self, config, task)
    OpenClosePlannerPolicy.__init__(self, config, task)

    self.client: CuroboClient | None = None
    self._base_lock_joints: dict = {}  # fetched from server in select_arm()

    self.select_arm()
    self.pre_grasp_poses = None  # computed after height adjustment

    self.current_phase = OpenClosePhase.HEIGHT_SELECTION

    self.articulation_pose_index = 0
    self.grasping_timesteps = 0
    self.opening_timesteps = 0
    self.settle_steps = 0
    self.height_adjustment_steps = 0
    self._height_initial: float | None = None
    self._height_target: float | None = None
action_idx instance-attribute
action_idx = 0
action_primitives instance-attribute
action_primitives = []
arm_end_idx instance-attribute
arm_end_idx: int = 0
arm_side instance-attribute
arm_side: str | None = None
arm_start_idx instance-attribute
arm_start_idx: int = 0
articulation_pose_index instance-attribute
articulation_pose_index = 0
client instance-attribute
client: CuroboClient | None = None
config instance-attribute
config = config
current_gripper_command instance-attribute
current_gripper_command: dict[str, float] = {}
current_phase instance-attribute
current_phase = HEIGHT_SELECTION
grasping_timesteps instance-attribute
grasping_timesteps = 0
height_adjustment_steps instance-attribute
height_adjustment_steps = 0
ik_warmed_up instance-attribute
ik_warmed_up = False
is_done property
is_done: bool

Property to expose completion state for task checking.

opening_timesteps instance-attribute
opening_timesteps = 0
planned_trajectory instance-attribute
planned_trajectory: list[list[float]] | None = None
planner instance-attribute
planner: CuroboPlanner | None = None
planner_joint_ranges instance-attribute
planner_joint_ranges: dict[str, tuple[int, int]] = {}
planners property
planners: dict
policy_config instance-attribute
policy_config = policy_config
pre_grasp_poses instance-attribute
pre_grasp_poses = None
profiler instance-attribute
profiler = Profiler()
retry_count instance-attribute
retry_count: int = 0
robot_view instance-attribute
robot_view = robot_view
sequential_ik_failures instance-attribute
sequential_ik_failures = 0
settle_steps instance-attribute
settle_steps = 0
steps_spent_in_waypoint instance-attribute
steps_spent_in_waypoint: int = 0
target_poses instance-attribute
target_poses = {}
task instance-attribute
task = task
trajectory_index instance-attribute
trajectory_index: int = 0
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
@staticmethod
def add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None:
    PlannerPolicy.add_auxiliary_objects(config, spec)
    policy_config = config.policy_config
    assert isinstance(policy_config, ObjectManipulationPlannerPolicyConfig)
    if policy_config.filter_colliding_grasps:
        add_grasp_collision_bodies(
            spec,
            policy_config.grasp_collision_batch_size,
            policy_config.grasp_width,
            policy_config.grasp_length,
            policy_config.grasp_height,
            np.array(policy_config.grasp_base_pos),
        )
batch_plan_trajectory
batch_plan_trajectory() -> None

Plan trajectory in batches.

Uses local planner or remote server depending on configuration. When using the server, lock joints and obstacles are passed atomically with each planning request to avoid races when multiple workers share the same server.

Source code in molmo_spaces/policy/solvers/object_manipulation/curobo_open_close_planner_policy.py
def batch_plan_trajectory(self) -> None:
    """Plan trajectory in batches.

    Uses local planner or remote server depending on configuration.
    When using the server, lock joints and obstacles are passed atomically
    with each planning request to avoid races when multiple workers share
    the same server.
    """
    if self._use_local_planner:
        return CuroboPlannerPolicy.batch_plan_trajectory(self)

    init_config = np.concatenate(
        [
            np.zeros(3),
            self.task.env.current_robot.robot_view.get_move_group(
                f"{self.arm_side}_arm"
            ).joint_pos,
        ]
    )

    # Build obstacle list for atomic world update during planning
    obstacles: list[dict] | None = None
    if getattr(self.config.policy_config, "enable_collision_avoidance", False):
        cuboids = self._get_nearby_object_collision_geometry()
        obstacles = self._cuboids_to_obstacle_list(cuboids)
        log.debug(
            f"Adding {len(cuboids)} cuboids to collision avoidance for phase: {self.current_phase}"
        )

    # Sample lock_joints atomically with the planning request
    lock_joints = self._get_lock_joints()

    goal_poses = self._get_batch_goal_poses()
    if goal_poses is None or len(goal_poses) == 0:
        log.warning("[BATCH PLAN] No goal poses available")
        return

    total = goal_poses.shape[0]
    batch_size = getattr(self.config.policy_config, "batch_size", 8)
    max_batches = getattr(self.config.policy_config, "max_batch_plan_attempts", 4)
    num_poses = min(max_batches * batch_size, total)
    num_batches = (num_poses + batch_size - 1) // batch_size

    all_successful_trajectories = []
    current_phase = getattr(self, "current_phase", "unknown")

    for batch_start in range(0, num_poses, batch_size):
        batch_end = min(batch_start + batch_size, num_poses)
        batch = goal_poses[batch_start:batch_end]

        if hasattr(self, "_show_poses"):
            self._show_poses(batch, style="tcp")
        if self.task.viewer:
            self.task.viewer.sync()

        log.info(
            f"Processing batch {batch_start // batch_size + 1}/{num_batches}: "
            f"poses {batch_start}-{batch_end - 1}"
        )

        batch_base_frame_7d = []
        for pose in batch:
            pose_base = self._target_pose_to_base_frame(pose)
            batch_base_frame_7d.append(pose_base.tolist())

        batch_len = len(batch_base_frame_7d)
        start_states = [init_config.tolist()] * batch_len

        log.info(
            f"Planning batch of {batch_len} {current_phase} poses with {self.arm_side} arm"
        )
        trajectories, successes = self.client.motion_plan_batch(
            joint_positions=start_states,
            goal_poses=batch_base_frame_7d,
            obstacles=obstacles,
            lock_joints=lock_joints,
        )

        num_successes = sum(successes)
        log.info(
            f"{self.arm_side.capitalize()} arm: {num_successes}/{batch_len} "
            f"{current_phase} planning successes"
        )

        for success, trajectory in zip(successes, trajectories):
            if not success or not trajectory:
                continue
            self._transform_traj_to_world_frame(trajectory)
            all_successful_trajectories.append(trajectory)

        if all_successful_trajectories:
            log.info(
                f"Found {len(all_successful_trajectories)} successful trajectories, "
                "skipping remaining batches"
            )
            break

    if all_successful_trajectories:
        self.planned_trajectory = self._select_best_trajectory(all_successful_trajectories)
    else:
        log.warning("[BATCH PLAN] No successful trajectories found across all batches")
check_feasible_ik
check_feasible_ik(pose: ndarray) -> bool
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def check_feasible_ik(self, pose: np.ndarray) -> bool:
    if pose.ndim > 2:
        assert pose.shape[1:] == (4, 4)
        batch_size = pose.shape[0]
        assert batch_size <= self.policy_config.grasp_feasibility_batch_size

        if batch_size == 0:
            return np.array([], dtype=bool)

        # pad to the batch size to avoid triggering recompilation of the IK solver
        if batch_size < self.policy_config.grasp_feasibility_batch_size:
            n_pad = self.policy_config.grasp_feasibility_batch_size - batch_size
            pose = np.concatenate([pose, np.broadcast_to(pose[-1:], (n_pad, 4, 4))])

        robot_view = self.task.env.current_robot.robot_view
        parallel_kinematics = self.task._env.robots[0].parallel_kinematics
        jp_dicts = parallel_kinematics.ik(
            pose,
            robot_view.get_qpos_dict(),
            robot_view.base.pose,
            rel_to_base=False,
            posture_weight=0.0,
        )
        return np.array([jp_dict is not None for jp_dict in jp_dicts[:batch_size]])
    else:
        assert pose.shape == (4, 4)
        robot_view = self.task.env.current_robot.robot_view
        kinematics = self.task.env.current_robot.kinematics
        gripper_mg_id = robot_view.get_gripper_movegroup_ids()[0]
        jp_dict = kinematics.ik(
            gripper_mg_id,
            pose,
            robot_view.move_group_ids(),
            robot_view.get_qpos_dict(),
            base_pose=robot_view.base.pose,
        )
        return jp_dict is not None
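A notable detail in the batched branch of `check_feasible_ik` above is padding the pose batch to a fixed size, so a shape-specialized (JIT-compiled) IK solver never sees a new batch shape and therefore never recompiles. A self-contained sketch of just that padding pattern:

```python
import numpy as np

def pad_batch(poses: np.ndarray, batch_size: int) -> np.ndarray:
    """Pad a (N, 4, 4) pose batch to (batch_size, 4, 4) by repeating the
    last pose. Results computed for the padded entries are discarded by
    the caller (only the first N outputs are kept)."""
    n = poses.shape[0]
    assert 0 < n <= batch_size
    if n < batch_size:
        pad = np.broadcast_to(poses[-1:], (batch_size - n, 4, 4))
        poses = np.concatenate([poses, pad])
    return poses
```

`np.broadcast_to` avoids materializing the repeated poses until `np.concatenate` copies them, which keeps the padding cheap for large batch sizes.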
clip_to_velocity_constraint
clip_to_velocity_constraint(action: dict[str, Any]) -> dict[str, Any]

Clip action to respect velocity constraints.

Parameters:

Name Type Description Default
action dict[str, Any]

Dictionary of commanded joint positions by move group.

required

Returns:

Type Description
dict[str, Any]

Clipped action dictionary.

Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
def clip_to_velocity_constraint(self, action: dict[str, Any]) -> dict[str, Any]:
    """Clip action to respect velocity constraints.

    Args:
        action: Dictionary of commanded joint positions by move group.

    Returns:
        Clipped action dictionary.
    """
    velocity_constraints = getattr(self.config.policy_config, "velocity_constraints", {})
    clipped_action = action.copy()

    for move_group, commanded_action in action.items():
        if move_group in velocity_constraints:
            max_velocity_rad_per_s = velocity_constraints[move_group]
            move_group_view = self.task.env.robots[0].robot_view.get_move_group(move_group)
            move_group_joint_pos = move_group_view.joint_pos.copy()

            # Calculate difference, handling angular wraparound for base theta
            diff = commanded_action - move_group_joint_pos
            if move_group == "base":
                # Normalize theta difference to [-π, π]
                diff[2] = np.arctan2(np.sin(diff[2]), np.cos(diff[2]))

            velocity_rad_per_s = diff * 10.0

            # Clip velocity and recalculate commanded action
            clipped_velocity = np.clip(
                velocity_rad_per_s, -max_velocity_rad_per_s, max_velocity_rad_per_s
            )
            commanded_action = move_group_joint_pos + clipped_velocity / 10.0

            clipped_action[move_group] = commanded_action

    return clipped_action
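Two details of `clip_to_velocity_constraint` above are worth isolating: the `arctan2(sin, cos)` trick that wraps an angular difference into [-π, π] so the clip acts on the short way around, and the conversion between a position step and a velocity. The `* 10.0` factor appears to assume a fixed control rate; the sketch below names that assumption explicitly:

```python
import numpy as np

CTRL_HZ = 10.0  # assumed control rate implied by the `diff * 10.0` factor above

def clip_joint_step(current, commanded, max_vel, wrap_theta_idx=None):
    """Clip a commanded position step so the implied velocity stays within
    max_vel; optionally wrap one angular dimension into [-pi, pi]."""
    diff = np.asarray(commanded, dtype=float) - np.asarray(current, dtype=float)
    if wrap_theta_idx is not None:
        # Normalize the angular difference so e.g. +3.1 -> -3.1 rad is a
        # small forward step, not a near-full rotation backwards.
        d = diff[wrap_theta_idx]
        diff[wrap_theta_idx] = np.arctan2(np.sin(d), np.cos(d))
    vel = diff * CTRL_HZ
    return np.asarray(current, dtype=float) + np.clip(vel, -max_vel, max_vel) / CTRL_HZ
```

Without the wrap, a base heading crossing the ±π boundary would be clipped toward the long way around the circle.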
get_action
get_action(info: dict[str, Any]) -> dict[str, Any]
Source code in molmo_spaces/policy/solvers/object_manipulation/curobo_open_close_planner_policy.py
def get_action(self, info: dict[str, Any]) -> dict[str, Any]:
    action_cmd = self.task.env.current_robot.robot_view.get_noop_ctrl_dict()
    if self.current_phase == OpenClosePhase.HEIGHT_SELECTION:
        action_cmd.update(self._execute_height_selection_phase())
    elif self.current_phase == OpenClosePhase.PREGRASP:
        action_cmd.update(self._execute_pre_grasp_phase())
    elif self.current_phase == OpenClosePhase.GRASP:
        action_cmd.update(self._execute_grasp_phase())
    elif self.current_phase == OpenClosePhase.ARTICULATE:
        action_cmd.update(self._execute_articulate_phase())
    elif self.current_phase == OpenClosePhase.POSTARTICULATE:
        action_cmd.update(self._execute_postarticulate_phase())
    elif self.current_phase == OpenClosePhase.DONE:
        return {"done": True}
    else:
        raise ValueError(f"Unknown phase: {self.current_phase}")

    action_cmd["torso"] = np.array([self.current_height])
    action_cmd = self.clip_to_velocity_constraint(action_cmd)
    return action_cmd
get_all_phases
get_all_phases() -> dict[str, int]

Return a mapping from phase name to phase index.

Source code in molmo_spaces/policy/solvers/object_manipulation/curobo_open_close_planner_policy.py
def get_all_phases(self) -> dict[str, int]:
    """Return list of all phase names."""
    return {phase.value: i for i, phase in enumerate(OpenClosePhase)}
get_info
get_info() -> dict

Get additional information from the policy. Called after the episode has ended. This method can be overridden by subclasses to provide extra information about the policy's state. Must be JSON serializable.

Returns:

Type Description
dict

A dictionary containing additional information about the policy.

Source code in molmo_spaces/policy/base_policy.py
def get_info(self) -> dict:
    """
    Get additional information from the policy. Called after the episode has ended. This method can be
    overridden by subclasses to provide extra information about the policy's state.
    Must be json serializable.

    Returns:
        A dictionary containing additional information about the policy.
    """
    return {}
get_look_at_action
get_look_at_action() -> dict[str, Any]

Get action to look at a target.

Subclasses can override this to implement head tracking.

Returns:

Type Description
dict[str, Any]

Dictionary with head control commands, or empty dict.

Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
def get_look_at_action(self) -> dict[str, Any]:
    """Get action to look at a target.

    Subclasses can override this to implement head tracking.

    Returns:
        Dictionary with head control commands, or empty dict.
    """
    return {}
get_phase
get_phase() -> OpenClosePhase
Source code in molmo_spaces/policy/solvers/object_manipulation/curobo_open_close_planner_policy.py
def get_phase(self) -> OpenClosePhase:
    return self.current_phase
reset
reset(reset_retries: bool = True)
Source code in molmo_spaces/policy/solvers/object_manipulation/curobo_open_close_planner_policy.py
def reset(self, reset_retries: bool = True):
    # Call parent reset
    CuroboPlannerPolicy.reset(self)

    self.current_arm = None
    self.current_phase = OpenClosePhase.HEIGHT_SELECTION
    self.grasping_timesteps = 0
    self.opening_timesteps = 0
    self.articulation_pose_index = 0
    self.height_adjustment_steps = 0
    self.current_height = 0.0
    self._height_initial = None
    self._height_target = None
    self.pre_grasp_poses = None
    dummy_pose = np.eye(4)
    self.target_poses = {
        "grasp": dummy_pose,
        "pregrasp": dummy_pose,
        "place": dummy_pose,
    }
    self.settle_steps = 0
    from molmo_spaces.policy.solvers.object_manipulation.base_object_manipulation_planner_policy import (
        NoopAction,
    )

    self.action_primitives = [NoopAction(self.task.env.current_robot.robot_view, 0.0)]
select_arm
select_arm() -> None

Select which arm to use and set up the planner (local or remote).

Source code in molmo_spaces/policy/solvers/object_manipulation/curobo_open_close_planner_policy.py
def select_arm(self) -> None:
    """Select which arm to use and set up the planner (local or remote)."""
    task_config = self.config.task_config
    om = self.task.env.object_managers[self.task.env.current_batch_index]
    pickup_obj = om.get_object_by_name(task_config.pickup_obj_name)
    pickup_obj_pos = pickup_obj.position

    left_tcp_pose = self.task.env.current_robot.robot_view.get_move_group(
        "left_gripper"
    ).leaf_frame_to_world
    right_tcp_pose = self.task.env.current_robot.robot_view.get_move_group(
        "right_gripper"
    ).leaf_frame_to_world

    left_dist = np.linalg.norm(left_tcp_pose[:3, 3] - pickup_obj_pos)
    right_dist = np.linalg.norm(right_tcp_pose[:3, 3] - pickup_obj_pos)

    selected_arm = "left" if left_dist < right_dist else "right"
    log.info(
        f"Selected {selected_arm} arm (left dist: {left_dist:.3f}m, right dist: {right_dist:.3f}m)"
    )

    self.arm_side = selected_arm

    if self._use_local_planner:
        self._select_arm_local(selected_arm)
    else:
        server_url = random.choice(self.config.policy_config.server_urls)
        log.info(f"Connecting to CuroboPlanner server at {server_url} (arm={selected_arm})")
        server_timeout = getattr(self.config.policy_config, "server_timeout", 120.0)
        self.client = CuroboClient(
            base_url=server_url, arm=selected_arm, timeout=server_timeout
        )

        # Fetch base lock_joints from server once; merged with live torso values per-request.
        self._base_lock_joints = self.client.get_lock_joints()
        log.info(f"Base lock_joints from server: {self._base_lock_joints}")

    if selected_arm == "left":
        self.planner_joint_ranges = self.config.policy_config.left_planner_joint_ranges
    else:
        self.planner_joint_ranges = self.config.policy_config.right_planner_joint_ranges

    self.arm_start_idx = self.planner_joint_ranges[f"{self.arm_side}_arm"][0]
    self.arm_end_idx = self.planner_joint_ranges[f"{self.arm_side}_arm"][1]
solve_ik
solve_ik(target_pose: ndarray) -> None

Solve IK and create an interpolated trajectory.

Uses local planner or remote server depending on configuration. When using the server, lock joints and obstacles are passed atomically with the request to avoid races when multiple workers share the same server.

Parameters:

Name Type Description Default
target_pose ndarray

4x4 transformation matrix for target end-effector pose in world frame.

required
Source code in molmo_spaces/policy/solvers/object_manipulation/curobo_open_close_planner_policy.py
def solve_ik(self, target_pose: np.ndarray) -> None:
    """Solve IK and create an interpolated trajectory.

    Uses local planner or remote server depending on configuration.
    When using the server, lock joints and obstacles are passed atomically
    with the request to avoid races when multiple workers share the same server.

    Args:
        target_pose: 4x4 transformation matrix for target end-effector pose in world frame.
    """
    if self._use_local_planner:
        return CuroboPlannerPolicy.solve_ik(self, target_pose)

    init_config = np.concatenate(
        [
            np.zeros(3),
            self.task.env.current_robot.robot_view.get_move_group(
                f"{self.arm_side}_arm"
            ).joint_pos,
        ]
    )

    target_pose_7d = self._target_pose_to_base_frame(target_pose)
    lock_joints = self._get_lock_joints()
    joint_config, success = self.client.ik(
        goal_pose=target_pose_7d.tolist(),
        seed_config=init_config.tolist(),
        disable_collision=True,
        lock_joints=lock_joints,
    )

    current_phase = getattr(self, "current_phase", "unknown")
    if not success:
        raise ValueError(f"Could not solve {current_phase} phase IK.")

    trajectory = self._interpolate_joint_trajectory(
        init_config,
        np.array(joint_config),
        num_steps=10,
    )
    self._transform_traj_to_world_frame(trajectory)
    self.planned_trajectory = trajectory
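After the server returns a single IK solution, `solve_ik` above turns it into a trajectory via `_interpolate_joint_trajectory`, whose implementation is not shown here. A plausible stand-in is plain linear joint-space interpolation between the start and goal configurations; the helper below is a hypothetical illustration, not the codebase's actual method:

```python
import numpy as np

def interpolate_joint_trajectory(start, goal, num_steps: int = 10) -> list[list[float]]:
    """Linearly interpolate between two joint configurations, returning
    num_steps waypoints including both endpoints."""
    start = np.asarray(start, dtype=float)
    goal = np.asarray(goal, dtype=float)
    ts = np.linspace(0.0, 1.0, num_steps)
    return [((1.0 - t) * start + t * goal).tolist() for t in ts]
```

Linear joint-space interpolation is only safe here because the IK goal is close to the current configuration; for larger motions the batch planner, not this helper, produces the trajectory.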
visualize_world_config_mesh
visualize_world_config_mesh(world_cfg: WorldConfig) -> None

Visualize the world configuration as a mesh file.

Parameters:

Name Type Description Default
world_cfg WorldConfig

Curobo WorldConfig to visualize.

required
Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
def visualize_world_config_mesh(self, world_cfg: WorldConfig) -> None:
    """Visualize the world configuration as a mesh file.

    Args:
        world_cfg: Curobo WorldConfig to visualize.
    """
    from datetime import datetime

    import trimesh
    from curobo.geom.types import WorldConfig

    current_phase = getattr(self, "current_phase", "unknown")
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    file_name = f"world_config_mesh_{current_phase}_{timestamp}"
    obj_path = f"{file_name}.obj"
    initial_joint_configuration = np.array(self._get_current_joint_positions())

    try:
        # Create a combined scene with robot and world obstacles
        combined_scene = trimesh.scene.scene.Scene(base_frame="world_origin")

        # Add robot mesh if we have a joint configuration
        if initial_joint_configuration is not None and self.planner is not None:
            try:
                config_for_mesh = np.array(initial_joint_configuration).copy()
                # Zero out base joints to place robot at origin
                config_for_mesh[:3] = 0.0
                q_tensor = torch.tensor(config_for_mesh).unsqueeze(0).float().cuda()

                robot_spheres_batch = self.planner.motion_gen.kinematics.get_robot_as_spheres(
                    q_tensor
                )
                if robot_spheres_batch and len(robot_spheres_batch) > 0:
                    robot_spheres = robot_spheres_batch[0]
                    for i, sphere in enumerate(robot_spheres):
                        sphere_mesh = trimesh.creation.icosphere(radius=sphere.radius)
                        sphere_transform = np.eye(4)
                        sphere_transform[:3, 3] = sphere.pose[:3]
                        combined_scene.add_geometry(
                            sphere_mesh,
                            geom_name=f"robot_sphere_{i}",
                            parent_node_name="world_origin",
                            transform=sphere_transform,
                        )
                    log.info(
                        f"Added {len(robot_spheres)} robot collision spheres to visualization"
                    )

            except Exception as e:
                log.warning(f"Could not add robot mesh: {e}")

        # Add world obstacles to the scene
        try:
            world_scene = WorldConfig.get_scene_graph(world_cfg, process_color=True)
            for geom_name, geom in world_scene.geometry.items():
                transform = world_scene.graph.get(geom_name)[0]
                combined_scene.add_geometry(
                    geom,
                    geom_name=geom_name,
                    parent_node_name="world_origin",
                    transform=transform,
                )
        except Exception as e:
            log.warning(f"Could not add world obstacles: {e}")

        # Export combined scene
        if len(combined_scene.geometry) > 0:
            combined_scene.export(obj_path)
            log.info(f"Successfully saved combined robot + world mesh to {obj_path}")
        else:
            log.debug("Skipping mesh export - scene is empty")

    except ValueError as e:
        if "empty scene" in str(e).lower():
            log.debug("Skipping mesh export - world config is empty")
        else:
            raise
OpenClosePhase

Bases: str, Enum

Attributes:

Name Type Description
ARTICULATE
DONE
GRASP
HEIGHT_SELECTION
POSTARTICULATE
PREGRASP
ARTICULATE class-attribute instance-attribute
ARTICULATE = 'articulate'
DONE class-attribute instance-attribute
DONE = 'done'
GRASP class-attribute instance-attribute
GRASP = 'grasp'
HEIGHT_SELECTION class-attribute instance-attribute
HEIGHT_SELECTION = 'height_selection'
POSTARTICULATE class-attribute instance-attribute
POSTARTICULATE = 'place'
PREGRASP class-attribute instance-attribute
PREGRASP = 'pregrasp'
curobo_pick_and_place_planner_policy

Classes:

Name Description
CuroboPickAndPlacePlannerPolicy

Curobo-based planner policy for pick and place tasks.

PickAndPlacePhase

Attributes:

Name Type Description
log
log module-attribute
log = getLogger(__name__)
CuroboPickAndPlacePlannerPolicy
CuroboPickAndPlacePlannerPolicy(config: MlSpacesExpConfig, task: BaseMujocoTask)

Bases: CuroboPlannerPolicy, PickAndPlacePlannerPolicy

Curobo-based planner policy for pick and place tasks.

Inherits common motion planning functionality from CuroboPlannerPolicy and task-specific functionality from PickAndPlacePlannerPolicy. Depending on configuration, planning runs on a local planner or is delegated to a remote CuroboPlanner server via CuroboClient.

Methods:

Name Description
add_auxiliary_objects
batch_plan_trajectory

Plan trajectory in batches.

check_feasible_ik
clip_to_velocity_constraint

Clip action to respect velocity constraints.

get_action
get_all_phases

Return a mapping from phase name to phase index.

get_info

Get additional information from the policy; called after the episode has ended.

get_look_at_action

Get action to look at a target.

get_phase
reset
select_arm

Select which arm to use and set up the planner (local or remote).

solve_ik

Solve IK and create an interpolated trajectory.

visualize_world_config_mesh

Visualize the world configuration as a mesh file.

Attributes:

Name Type Description
action_idx
action_primitives
arm_end_idx int
arm_side str | None
arm_start_idx int
client CuroboClient | None
config
current_gripper_command dict[str, float]
current_phase
grasping_timesteps
ik_warmed_up
is_done bool

Property to expose completion state for task checking.

opening_timesteps
place_poses
planned_trajectory list[list[float]] | None
planner CuroboPlanner | None
planner_joint_ranges dict[str, tuple[int, int]]
planners dict
policy_config
pre_grasp_poses
profiler
retry_count int
robot_view
sequential_ik_failures
settle_steps
steps_spent_in_waypoint int
target_poses
task
trajectory_index int
Source code in molmo_spaces/policy/solvers/object_manipulation/curobo_pick_and_place_planner_policy.py
def __init__(self, config: MlSpacesExpConfig, task: BaseMujocoTask) -> None:
    # Initialize both parent classes
    CuroboPlannerPolicy.__init__(self, config, task)
    PickAndPlacePlannerPolicy.__init__(self, config, task)

    self.client: CuroboClient | None = None

    self.select_arm()
    self.pre_grasp_poses = self._get_pregrasp_poses()
    if isinstance(self.task, PickAndPlaceTask):
        self.place_poses = self._get_place_poses()

    self.current_phase = PickAndPlacePhase.PREGRASP

    self.grasping_timesteps = 0
    self.opening_timesteps = 0
    self.settle_steps = 0
action_idx instance-attribute
action_idx = 0
action_primitives instance-attribute
action_primitives = []
arm_end_idx instance-attribute
arm_end_idx: int = 0
arm_side instance-attribute
arm_side: str | None = None
arm_start_idx instance-attribute
arm_start_idx: int = 0
client instance-attribute
client: CuroboClient | None = None
config instance-attribute
config = config
current_gripper_command instance-attribute
current_gripper_command: dict[str, float] = {}
current_phase instance-attribute
current_phase = PREGRASP
grasping_timesteps instance-attribute
grasping_timesteps = 0
ik_warmed_up instance-attribute
ik_warmed_up = False
is_done property
is_done: bool

Property to expose completion state for task checking.

opening_timesteps instance-attribute
opening_timesteps = 0
place_poses instance-attribute
place_poses = _get_place_poses()
planned_trajectory instance-attribute
planned_trajectory: list[list[float]] | None = None
planner instance-attribute
planner: CuroboPlanner | None = None
planner_joint_ranges instance-attribute
planner_joint_ranges: dict[str, tuple[int, int]] = {}
planners property
planners: dict
policy_config instance-attribute
policy_config = policy_config
pre_grasp_poses instance-attribute
pre_grasp_poses = _get_pregrasp_poses()
profiler instance-attribute
profiler = Profiler()
retry_count instance-attribute
retry_count: int = 0
robot_view instance-attribute
robot_view = robot_view
sequential_ik_failures instance-attribute
sequential_ik_failures = 0
settle_steps instance-attribute
settle_steps = 0
steps_spent_in_waypoint instance-attribute
steps_spent_in_waypoint: int = 0
target_poses instance-attribute
target_poses = {}
task instance-attribute
task = task
trajectory_index instance-attribute
trajectory_index: int = 0
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
@staticmethod
def add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None:
    PlannerPolicy.add_auxiliary_objects(config, spec)
    policy_config = config.policy_config
    assert isinstance(policy_config, ObjectManipulationPlannerPolicyConfig)
    if policy_config.filter_colliding_grasps:
        add_grasp_collision_bodies(
            spec,
            policy_config.grasp_collision_batch_size,
            policy_config.grasp_width,
            policy_config.grasp_length,
            policy_config.grasp_height,
            np.array(policy_config.grasp_base_pos),
        )
batch_plan_trajectory
batch_plan_trajectory() -> None

Plan trajectory in batches.

Uses local planner or remote server depending on configuration. When using the server, obstacles are passed atomically with each planning request to avoid races when multiple workers share the same server.

Source code in molmo_spaces/policy/solvers/object_manipulation/curobo_pick_and_place_planner_policy.py
def batch_plan_trajectory(self) -> None:
    """Plan trajectory in batches.

    Uses local planner or remote server depending on configuration.
    When using the server, obstacles are passed atomically with each planning
    request to avoid races when multiple workers share the same server.
    """
    if self._use_local_planner:
        return CuroboPlannerPolicy.batch_plan_trajectory(self)

    init_config = np.concatenate(
        [
            np.zeros(3),
            self.task.env.current_robot.robot_view.get_move_group(
                f"{self.arm_side}_arm"
            ).joint_pos,
        ]
    )

    # Build obstacle list for atomic world update during planning
    obstacles: list[dict] | None = None
    if getattr(self.config.policy_config, "enable_collision_avoidance", False):
        cuboids = self._get_collision_cuboids()
        obstacles = self._cuboids_to_obstacle_list(cuboids)
        log.debug(
            f"Adding {len(cuboids)} cuboids to collision avoidance for phase: {self.current_phase}"
        )

    # Attach pickup object to EE link for place phase if configured
    if self.current_phase == PickAndPlacePhase.PLACE and self.config.policy_config.attach_obj:
        pickup_obj_name = self.config.task_config.pickup_obj_name
        joint_pos_for_attach = np.array(self._get_current_joint_positions()).copy()
        base_range = self.planner_joint_ranges["base"]
        joint_pos_for_attach[base_range[0] : base_range[1]] = 0.0
        self.client.attach_object(
            object_names=[pickup_obj_name],
            joint_position=joint_pos_for_attach.tolist(),
            attach_link_names=[f"attached_object_{self.arm_side}"],
        )
        log.info(
            f"Attached object '{pickup_obj_name}' to link 'attached_object_{self.arm_side}'"
        )

    # Get goal poses based on current phase
    goal_poses = self._get_batch_goal_poses()
    if goal_poses is None or len(goal_poses) == 0:
        log.warning("[BATCH PLAN] No goal poses available")
        return

    total = goal_poses.shape[0]
    batch_size = getattr(self.config.policy_config, "batch_size", 8)
    max_batches = getattr(self.config.policy_config, "max_batch_plan_attempts", 4)
    num_poses = min(max_batches * batch_size, total)
    num_batches = (num_poses + batch_size - 1) // batch_size

    all_successful_trajectories = []
    current_phase = getattr(self, "current_phase", "unknown")

    for batch_start in range(0, num_poses, batch_size):
        batch_end = min(batch_start + batch_size, num_poses)
        batch = goal_poses[batch_start:batch_end]

        if hasattr(self, "_show_poses"):
            self._show_poses(batch, style="tcp")
        if self.task.viewer:
            self.task.viewer.sync()

        log.info(
            f"Processing batch {batch_start // batch_size + 1}/{num_batches}: "
            f"poses {batch_start}-{batch_end - 1}"
        )

        # Transform poses to base frame and convert to 7D format
        batch_base_frame_7d = []
        for pose in batch:
            pose_base = self._target_pose_to_base_frame(pose)
            batch_base_frame_7d.append(pose_base.tolist())

        batch_len = len(batch_base_frame_7d)
        start_states = [init_config.tolist()] * batch_len

        log.info(
            f"Planning batch of {batch_len} {current_phase} poses with {self.arm_side} arm"
        )
        trajectories, successes = self.client.motion_plan_batch(
            joint_positions=start_states,
            goal_poses=batch_base_frame_7d,
            obstacles=obstacles,
        )

        num_successes = sum(successes)
        log.info(
            f"{self.arm_side.capitalize()} arm: {num_successes}/{batch_len} "
            f"{current_phase} planning successes"
        )

        for success, trajectory in zip(successes, trajectories):
            if not success or not trajectory:
                continue
            self._transform_traj_to_world_frame(trajectory)
            all_successful_trajectories.append(trajectory)

        if all_successful_trajectories:
            log.info(
                f"Found {len(all_successful_trajectories)} successful trajectories, "
                "skipping remaining batches"
            )
            break

    if all_successful_trajectories:
        self.planned_trajectory = self._select_best_trajectory(all_successful_trajectories)
    else:
        log.warning("[BATCH PLAN] No successful trajectories found across all batches")
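The batching arithmetic above (cap the number of poses at `max_batches * batch_size`, chunk with ceiling division, and stop at the first batch that produces any successful trajectory) can be isolated into a small, self-contained sketch. The planner call here is a stand-in, not the real `CuroboClient` API:

```python
def plan_in_batches(total, batch_size, max_batches, plan_batch):
    """Chunk up to max_batches * batch_size goal poses and stop at the
    first batch that yields any successful trajectory. plan_batch is a
    stand-in for the remote planning call; it returns the list of
    successful trajectories for the slice [start, end)."""
    num_poses = min(max_batches * batch_size, total)
    successes = []
    for start in range(0, num_poses, batch_size):
        end = min(start + batch_size, num_poses)
        successes.extend(plan_batch(start, end))
        if successes:  # early exit: later batches are never planned
            break
    return successes

# Only the second batch succeeds, so batches 3 and 4 are skipped entirely.
calls = []
def fake_planner(start, end):
    calls.append((start, end))
    return ["traj"] if start >= 8 else []

result = plan_in_batches(total=40, batch_size=8, max_batches=4, plan_batch=fake_planner)
```

The early exit trades trajectory diversity for latency: once one batch succeeds, `_select_best_trajectory` only ranks trajectories from that batch.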
check_feasible_ik
check_feasible_ik(pose: ndarray) -> bool
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def check_feasible_ik(self, pose: np.ndarray) -> bool:
    if pose.ndim > 2:
        assert pose.shape[1:] == (4, 4)
        batch_size = pose.shape[0]
        assert batch_size <= self.policy_config.grasp_feasibility_batch_size

        if batch_size == 0:
            return np.array([], dtype=bool)

        # pad to the batch size to avoid triggering recompilation of the IK solver
        if batch_size < self.policy_config.grasp_feasibility_batch_size:
            n_pad = self.policy_config.grasp_feasibility_batch_size - batch_size
            pose = np.concatenate([pose, np.broadcast_to(pose[-1:], (n_pad, 4, 4))])

        robot_view = self.task.env.current_robot.robot_view
        parallel_kinematics = self.task._env.robots[0].parallel_kinematics
        jp_dicts = parallel_kinematics.ik(
            pose,
            robot_view.get_qpos_dict(),
            robot_view.base.pose,
            rel_to_base=False,
            posture_weight=0.0,
        )
        return np.array([jp_dict is not None for jp_dict in jp_dicts[:batch_size]])
    else:
        assert pose.shape == (4, 4)
        robot_view = self.task.env.current_robot.robot_view
        kinematics = self.task.env.current_robot.kinematics
        gripper_mg_id = robot_view.get_gripper_movegroup_ids()[0]
        jp_dict = kinematics.ik(
            gripper_mg_id,
            pose,
            robot_view.move_group_ids(),
            robot_view.get_qpos_dict(),
            base_pose=robot_view.base.pose,
        )
        return jp_dict is not None
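The padding trick in the batched branch above keeps the pose batch at a fixed size so a compiled batched IK solver never sees a new shape and recompiles. A minimal sketch of just that mechanism (the solver itself is faked):

```python
import numpy as np

def pad_pose_batch(poses, fixed_batch):
    """Pad an (N, 4, 4) pose batch up to fixed_batch rows by repeating
    the last pose, so a shape-specialized solver is not recompiled.
    Returns the padded batch and the original count for slicing results."""
    n = poses.shape[0]
    if n < fixed_batch:
        pad = np.broadcast_to(poses[-1:], (fixed_batch - n, 4, 4))
        poses = np.concatenate([poses, pad])
    return poses, n

batch = np.stack([np.eye(4)] * 5)           # 5 real poses
padded, n = pad_pose_batch(batch, fixed_batch=8)
results = [True] * padded.shape[0]           # pretend the solver ran on all 8
real_results = results[:n]                   # discard results for the padded tail
```

Slicing the results back to the original count, as `check_feasible_ik` does with `jp_dicts[:batch_size]`, ensures the duplicated padding poses never influence the caller.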
clip_to_velocity_constraint
clip_to_velocity_constraint(action: dict[str, Any]) -> dict[str, Any]

Clip action to respect velocity constraints.

Parameters:

Name Type Description Default
action dict[str, Any]

Dictionary of commanded joint positions by move group.

required

Returns:

Type Description
dict[str, Any]

Clipped action dictionary.

Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
def clip_to_velocity_constraint(self, action: dict[str, Any]) -> dict[str, Any]:
    """Clip action to respect velocity constraints.

    Args:
        action: Dictionary of commanded joint positions by move group.

    Returns:
        Clipped action dictionary.
    """
    velocity_constraints = getattr(self.config.policy_config, "velocity_constraints", {})
    clipped_action = action.copy()

    for move_group, commanded_action in action.items():
        if move_group in velocity_constraints:
            max_velocity_rad_per_s = velocity_constraints[move_group]
            move_group_view = self.task.env.robots[0].robot_view.get_move_group(move_group)
            move_group_joint_pos = move_group_view.joint_pos.copy()

            # Calculate difference, handling angular wraparound for base theta
            diff = commanded_action - move_group_joint_pos
            if move_group == "base":
                # Normalize theta difference to [-π, π]
                diff[2] = np.arctan2(np.sin(diff[2]), np.cos(diff[2]))

            velocity_rad_per_s = diff * 10.0

            # Clip velocity and recalculate commanded action
            clipped_velocity = np.clip(
                velocity_rad_per_s, -max_velocity_rad_per_s, max_velocity_rad_per_s
            )
            commanded_action = move_group_joint_pos + clipped_velocity / 10.0

            clipped_action[move_group] = commanded_action

    return clipped_action
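The base-theta wraparound handling above is worth seeing in isolation: without the `arctan2(sin, cos)` normalization, a command just across the ±π boundary would be clipped toward the long way around. A self-contained sketch (the `dt=0.1` mirrors the hard-coded `10.0` factor in the source; the numbers are made up):

```python
import numpy as np

def clip_with_wraparound(commanded, current, max_vel, dt=0.1):
    """Clip a commanded base-theta step to a velocity limit, normalizing
    the angular difference to [-pi, pi] before computing the velocity."""
    diff = commanded - current
    diff = np.arctan2(np.sin(diff), np.cos(diff))  # wraparound-safe difference
    clipped_vel = np.clip(diff / dt, -max_vel, max_vel)
    return current + clipped_vel * dt

# Commanding +3.1 rad from -3.1 rad: the short way crosses +/-pi and is only
# ~0.083 rad in the negative direction, so the clipped step moves past -pi
# rather than sweeping 6.2 rad the long way.
step = clip_with_wraparound(3.1, -3.1, max_vel=0.5)  # about -3.15
```

Only the base's theta component gets this treatment in the source; prismatic and revolute arm joints have no wraparound and use the raw difference.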
get_action
get_action(info: dict[str, Any]) -> dict[str, Any]
Source code in molmo_spaces/policy/solvers/object_manipulation/curobo_pick_and_place_planner_policy.py
def get_action(self, info: dict[str, Any]) -> dict[str, Any]:
    action_cmd = self.task.env.current_robot.robot_view.get_noop_ctrl_dict()
    if self.current_phase == PickAndPlacePhase.PREGRASP:
        action_cmd.update(self._execute_pre_grasp_phase())
    elif self.current_phase == PickAndPlacePhase.GRASP:
        action_cmd.update(self._execute_grasp_phase())
    elif self.current_phase == PickAndPlacePhase.LIFT:
        action_cmd.update(self._execute_lift_phase())
    elif self.current_phase == PickAndPlacePhase.PLACE:
        action_cmd.update(self._execute_place_phase())
    elif self.current_phase == PickAndPlacePhase.POSTPLACE:
        action_cmd.update(self._execute_postplace_phase())
    elif self.current_phase == PickAndPlacePhase.DONE:
        return {"done": True}
    else:
        raise ValueError(f"Unknown phase: {self.current_phase}")
    action_cmd = self.clip_to_velocity_constraint(action_cmd)
    return action_cmd
get_all_phases
get_all_phases() -> dict[str, int]

Return a mapping from phase name to phase index.

Source code in molmo_spaces/policy/solvers/object_manipulation/curobo_pick_and_place_planner_policy.py
def get_all_phases(self) -> dict[str, int]:
    """Return a mapping from phase name to phase index."""
    return {phase.value: i for i, phase in enumerate(PickAndPlacePhase)}
get_info
get_info() -> dict

Get additional information from the policy. Called after the episode has ended. This method can be overridden by subclasses to provide extra information about the policy's state. The returned dictionary must be JSON serializable.

Returns:

Type Description
dict

A dictionary containing additional information about the policy.

Source code in molmo_spaces/policy/base_policy.py
def get_info(self) -> dict:
    """
    Get additional information from the policy. Called after episode ended. This method can be
    overridden by subclasses to provide extra information about the policy's state.
    Must be json serializable.

    Returns:
        A dictionary containing additional information about the policy.
    """
    return {}
get_look_at_action
get_look_at_action() -> dict[str, Any]

Get action to look at a target.

Subclasses can override this to implement head tracking.

Returns:

Type Description
dict[str, Any]

Dictionary with head control commands, or empty dict.

Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
def get_look_at_action(self) -> dict[str, Any]:
    """Get action to look at a target.

    Subclasses can override this to implement head tracking.

    Returns:
        Dictionary with head control commands, or empty dict.
    """
    return {}
get_phase
get_phase() -> PickAndPlacePhase
Source code in molmo_spaces/policy/solvers/object_manipulation/curobo_pick_and_place_planner_policy.py
def get_phase(self) -> PickAndPlacePhase:
    return self.current_phase
reset
reset(reset_retries: bool = True)
Source code in molmo_spaces/policy/solvers/object_manipulation/curobo_pick_and_place_planner_policy.py
def reset(self, reset_retries: bool = True):
    # Call parent reset
    CuroboPlannerPolicy.reset(self)

    self.current_arm = None
    self.current_phase = PickAndPlacePhase.PREGRASP
    self.grasping_timesteps = 0
    self.opening_timesteps = 0
    dummy_pose = np.eye(4)
    self.target_poses = {
        "grasp": dummy_pose,
        "pregrasp": dummy_pose,
        "place": dummy_pose,
    }
    self.settle_steps = 0
    from molmo_spaces.policy.solvers.object_manipulation.base_object_manipulation_planner_policy import (
        NoopAction,
    )

    self.action_primitives = [NoopAction(self.task.env.current_robot.robot_view, 0.0)]
select_arm
select_arm() -> None

Select which arm to use and set up the planner (local or remote).

Source code in molmo_spaces/policy/solvers/object_manipulation/curobo_pick_and_place_planner_policy.py
def select_arm(self) -> None:
    """Select which arm to use and set up the planner (local or remote)."""
    task_config = self.config.task_config
    om = self.task.env.object_managers[self.task.env.current_batch_index]
    pickup_obj = om.get_object_by_name(task_config.pickup_obj_name)
    pickup_obj_pos = pickup_obj.position

    left_tcp_pose = self.task.env.current_robot.robot_view.get_move_group(
        "left_gripper"
    ).leaf_frame_to_world
    right_tcp_pose = self.task.env.current_robot.robot_view.get_move_group(
        "right_gripper"
    ).leaf_frame_to_world

    left_dist = np.linalg.norm(left_tcp_pose[:3, 3] - pickup_obj_pos)
    right_dist = np.linalg.norm(right_tcp_pose[:3, 3] - pickup_obj_pos)

    selected_arm = "left" if left_dist < right_dist else "right"
    log.info(
        f"Selected {selected_arm} arm (left dist: {left_dist:.3f}m, right dist: {right_dist:.3f}m)"
    )

    self.arm_side = selected_arm

    if self._use_local_planner:
        self._select_arm_local(selected_arm)
    else:
        server_url = random.choice(self.config.policy_config.server_urls)
        log.info(f"Connecting to CuroboPlanner server at {server_url} (arm={selected_arm})")
        server_timeout = getattr(self.config.policy_config, "server_timeout", 120.0)
        self.client = CuroboClient(
            base_url=server_url, arm=selected_arm, timeout=server_timeout
        )

    if selected_arm == "left":
        self.planner_joint_ranges = self.config.policy_config.left_planner_joint_ranges
    else:
        self.planner_joint_ranges = self.config.policy_config.right_planner_joint_ranges

    self.arm_start_idx = self.planner_joint_ranges[f"{self.arm_side}_arm"][0]
    self.arm_end_idx = self.planner_joint_ranges[f"{self.arm_side}_arm"][1]
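The arm-selection heuristic above reduces to a nearest-TCP comparison. A minimal sketch under that assumption (poses and the target position are made-up values, and only the translation component matters here):

```python
import numpy as np

def select_nearer_arm(left_tcp_pos, right_tcp_pos, target_pos):
    """Pick the arm whose TCP (tool center point) is closer to the target,
    breaking ties in favor of the right arm as the source does."""
    left_dist = np.linalg.norm(np.asarray(left_tcp_pos) - target_pos)
    right_dist = np.linalg.norm(np.asarray(right_tcp_pos) - target_pos)
    return "left" if left_dist < right_dist else "right"

# A target slightly to the robot's left selects the left arm.
arm = select_nearer_arm([0.3, 0.4, 1.0], [0.3, -0.4, 1.0], np.array([0.5, 0.3, 0.9]))
```

In the source the positions come from the `[:3, 3]` translation column of each gripper's `leaf_frame_to_world` 4x4 transform, so the same Euclidean comparison applies.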
solve_ik
solve_ik(target_pose: ndarray) -> None

Solve IK and create an interpolated trajectory.

Uses local planner or remote server depending on configuration.

Parameters:

Name Type Description Default
target_pose ndarray

4x4 transformation matrix for target end-effector pose in world frame.

required
Source code in molmo_spaces/policy/solvers/object_manipulation/curobo_pick_and_place_planner_policy.py
def solve_ik(self, target_pose: np.ndarray) -> None:
    """Solve IK and create an interpolated trajectory.

    Uses local planner or remote server depending on configuration.

    Args:
        target_pose: 4x4 transformation matrix for target end-effector pose in world frame.
    """
    if self._use_local_planner:
        return CuroboPlannerPolicy.solve_ik(self, target_pose)

    init_config = np.concatenate(
        [
            np.zeros(3),
            self.task.env.current_robot.robot_view.get_move_group(
                f"{self.arm_side}_arm"
            ).joint_pos,
        ]
    )

    target_pose_7d = self._target_pose_to_base_frame(target_pose)
    joint_config, success = self.client.ik(
        goal_pose=target_pose_7d.tolist(),
        seed_config=init_config.tolist(),
        disable_collision=True,
    )

    current_phase = getattr(self, "current_phase", "unknown")
    if not success:
        raise ValueError(f"Could not solve {current_phase} phase IK.")

    trajectory = self._interpolate_joint_trajectory(
        init_config,
        np.array(joint_config),
        num_steps=10,
    )
    self._transform_traj_to_world_frame(trajectory)
    self.planned_trajectory = trajectory
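`_interpolate_joint_trajectory` is not shown in this reference. A plausible sketch of such a helper, assuming simple linear interpolation in joint space (the name and `num_steps` parameter mirror the call above, but the body is an assumption, not the actual implementation):

```python
import numpy as np

def interpolate_joint_trajectory(start, goal, num_steps=10):
    """Linearly interpolate between two joint configurations, returning
    num_steps waypoints that exclude the start and end exactly at the goal."""
    start, goal = np.asarray(start, float), np.asarray(goal, float)
    alphas = np.linspace(0.0, 1.0, num_steps + 1)[1:]  # skip the start pose
    return [((1 - a) * start + a * goal).tolist() for a in alphas]

traj = interpolate_joint_trajectory([0.0, 0.0], [1.0, -2.0], num_steps=4)
```

Note that because IK is solved with `disable_collision=True` and the interpolation is straight-line in joint space, the resulting trajectory is not collision-checked, unlike the output of `batch_plan_trajectory`.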
visualize_world_config_mesh
visualize_world_config_mesh(world_cfg: WorldConfig) -> None

Visualize the world configuration as a mesh file.

Parameters:

Name Type Description Default
world_cfg WorldConfig

Curobo WorldConfig to visualize.

required
Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
def visualize_world_config_mesh(self, world_cfg: WorldConfig) -> None:
    """Visualize the world configuration as a mesh file.

    Args:
        world_cfg: Curobo WorldConfig to visualize.
    """
    from datetime import datetime

    import trimesh
    from curobo.geom.types import WorldConfig

    current_phase = getattr(self, "current_phase", "unknown")
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    file_name = f"world_config_mesh_{current_phase}_{timestamp}"
    obj_path = f"{file_name}.obj"
    initial_joint_configuration = np.array(self._get_current_joint_positions())

    try:
        # Create a combined scene with robot and world obstacles
        combined_scene = trimesh.scene.scene.Scene(base_frame="world_origin")

        # Add robot mesh if we have a joint configuration
        if initial_joint_configuration is not None and self.planner is not None:
            try:
                config_for_mesh = np.array(initial_joint_configuration).copy()
                # Zero out base joints to place robot at origin
                config_for_mesh[:3] = 0.0
                q_tensor = torch.tensor(config_for_mesh).unsqueeze(0).float().cuda()

                robot_spheres_batch = self.planner.motion_gen.kinematics.get_robot_as_spheres(
                    q_tensor
                )
                if robot_spheres_batch and len(robot_spheres_batch) > 0:
                    robot_spheres = robot_spheres_batch[0]
                    for i, sphere in enumerate(robot_spheres):
                        sphere_mesh = trimesh.creation.icosphere(radius=sphere.radius)
                        sphere_transform = np.eye(4)
                        sphere_transform[:3, 3] = sphere.pose[:3]
                        combined_scene.add_geometry(
                            sphere_mesh,
                            geom_name=f"robot_sphere_{i}",
                            parent_node_name="world_origin",
                            transform=sphere_transform,
                        )
                    log.info(
                        f"Added {len(robot_spheres)} robot collision spheres to visualization"
                    )

            except Exception as e:
                log.warning(f"Could not add robot mesh: {e}")

        # Add world obstacles to the scene
        try:
            world_scene = WorldConfig.get_scene_graph(world_cfg, process_color=True)
            for geom_name, geom in world_scene.geometry.items():
                transform = world_scene.graph.get(geom_name)[0]
                combined_scene.add_geometry(
                    geom,
                    geom_name=geom_name,
                    parent_node_name="world_origin",
                    transform=transform,
                )
        except Exception as e:
            log.warning(f"Could not add world obstacles: {e}")

        # Export combined scene
        if len(combined_scene.geometry) > 0:
            combined_scene.export(obj_path)
            log.info(f"Successfully saved combined robot + world mesh to {obj_path}")
        else:
            log.debug("Skipping mesh export - scene is empty")

    except ValueError as e:
        if "empty scene" in str(e).lower():
            log.debug("Skipping mesh export - world config is empty")
        else:
            raise
PickAndPlacePhase

Bases: str, Enum

Attributes:

Name Type Description
DONE
GRASP
LIFT
PLACE
POSTPLACE
PREGRASP
DONE class-attribute instance-attribute
DONE = 'done'
GRASP class-attribute instance-attribute
GRASP = 'grasp'
LIFT class-attribute instance-attribute
LIFT = 'lift'
PLACE class-attribute instance-attribute
PLACE = 'place'
POSTPLACE class-attribute instance-attribute
POSTPLACE = 'postplace'
PREGRASP class-attribute instance-attribute
PREGRASP = 'pregrasp'
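Because the phase enums mix in `str`, members compare equal to their string values, which is what makes phase names safe to log, serialize, and use as dictionary keys. A minimal sketch with a hypothetical three-phase enum:

```python
from enum import Enum

class Phase(str, Enum):
    PREGRASP = "pregrasp"
    GRASP = "grasp"
    DONE = "done"

# str mixin: members ARE strings, so equality and lookup work naturally
assert Phase.GRASP == "grasp"
assert Phase("done") is Phase.DONE

# the same pattern get_all_phases uses to map phase names to indices
phase_index = {p.value: i for i, p in enumerate(Phase)}
```

Iteration order follows definition order, so the indices produced by this comprehension are stable across runs.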
open_close_planner_policy

Classes:

Name Description
OpenClosePlannerPolicy

Attributes:

Name Type Description
log
log module-attribute
log = getLogger(__name__)
OpenClosePlannerPolicy
OpenClosePlannerPolicy(config: MlSpacesExpConfig, task: BaseMujocoTask)

Bases: BaseObjectManipulationPlannerPolicy

Methods:

Name Description
add_auxiliary_objects
check_feasible_ik
get_action
get_all_phases
get_info

Get additional information from the policy; called after the episode has ended. Can be overridden by subclasses.

get_phase
reset

Attributes:

Name Type Description
action_idx
action_primitives
config
ik_warmed_up
planners
policy_config
retry_count
robot_view
sequential_ik_failures
target_poses
task
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def __init__(self, config: MlSpacesExpConfig, task: BaseMujocoTask) -> None:
    assert isinstance(config.policy_config, ObjectManipulationPlannerPolicyConfig)
    super().__init__(config, task)
    self.policy_config = config.policy_config
    self.robot_view = task.env.current_robot.robot_view

    self.action_primitives = []
    self.action_idx = 0
    self.retry_count = 0
    self.target_poses = {}
    self.ik_warmed_up = False
    self.sequential_ik_failures = 0
action_idx instance-attribute
action_idx = 0
action_primitives instance-attribute
action_primitives = []
config instance-attribute
config = config
ik_warmed_up instance-attribute
ik_warmed_up = False
planners property
planners
policy_config instance-attribute
policy_config = policy_config
retry_count instance-attribute
retry_count = 0
robot_view instance-attribute
robot_view = robot_view
sequential_ik_failures instance-attribute
sequential_ik_failures = 0
target_poses instance-attribute
target_poses = {}
task instance-attribute
task = task
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
@staticmethod
def add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None:
    PlannerPolicy.add_auxiliary_objects(config, spec)
    policy_config = config.policy_config
    assert isinstance(policy_config, ObjectManipulationPlannerPolicyConfig)
    if policy_config.filter_colliding_grasps:
        add_grasp_collision_bodies(
            spec,
            policy_config.grasp_collision_batch_size,
            policy_config.grasp_width,
            policy_config.grasp_length,
            policy_config.grasp_height,
            np.array(policy_config.grasp_base_pos),
        )
check_feasible_ik
check_feasible_ik(pose: ndarray) -> bool
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def check_feasible_ik(self, pose: np.ndarray) -> bool:
    if pose.ndim > 2:
        assert pose.shape[1:] == (4, 4)
        batch_size = pose.shape[0]
        assert batch_size <= self.policy_config.grasp_feasibility_batch_size

        if batch_size == 0:
            return np.array([], dtype=bool)

        # pad to the batch size to avoid triggering recompilation of the IK solver
        if batch_size < self.policy_config.grasp_feasibility_batch_size:
            n_pad = self.policy_config.grasp_feasibility_batch_size - batch_size
            pose = np.concatenate([pose, np.broadcast_to(pose[-1:], (n_pad, 4, 4))])

        robot_view = self.task.env.current_robot.robot_view
        parallel_kinematics = self.task._env.robots[0].parallel_kinematics
        jp_dicts = parallel_kinematics.ik(
            pose,
            robot_view.get_qpos_dict(),
            robot_view.base.pose,
            rel_to_base=False,
            posture_weight=0.0,
        )
        return np.array([jp_dict is not None for jp_dict in jp_dicts[:batch_size]])
    else:
        assert pose.shape == (4, 4)
        robot_view = self.task.env.current_robot.robot_view
        kinematics = self.task.env.current_robot.kinematics
        gripper_mg_id = robot_view.get_gripper_movegroup_ids()[0]
        jp_dict = kinematics.ik(
            gripper_mg_id,
            pose,
            robot_view.move_group_ids(),
            robot_view.get_qpos_dict(),
            base_pose=robot_view.base.pose,
        )
        return jp_dict is not None
get_action
get_action(info: dict[str, Any]) -> dict[str, Any]
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_action(self, info: dict[str, Any]) -> dict[str, Any]:
    if self._check_for_failures():
        return self._handle_failure()

    # get the action from the current action primitive, absorbing 0-length segments as necessary
    while self.action_idx < len(self.action_primitives):  # guaranteed to terminate
        act_prim = self.action_primitives[self.action_idx]
        proceed = act_prim.execute()
        # cannot have a 0-length segment that doesn't proceed
        assert proceed or act_prim.duration > 0.0
        if proceed:
            self.action_idx += 1
        if not proceed or act_prim.duration > 0.0:
            break
    if self.action_idx < len(self.action_primitives):
        action = act_prim.get_current_action()
    else:
        action = self.action_primitives[-1].get_current_action()
        action["done"] = True
    return action
get_all_phases
get_all_phases()
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_all_phases(self):
    phases = super().get_all_phases()
    # Collect all possible phases here, even those not used for a particular trajectory computation
    new_phases = {
        "gripper-open": 1,
        "pregrasp": 2,
        "grasp": 3,
        "gripper-close": 4,
        "lift": 5,
        "preplace": 6,
        "place": 7,
        "retreat": 8,
        "go_home": 9,
    }
    phases.update(new_phases)
    return phases
get_info
get_info() -> dict

Get additional information from the policy. Called after the episode has ended. This method can be overridden by subclasses to provide extra information about the policy's state. The returned dictionary must be JSON serializable.

Returns:

Type Description
dict

A dictionary containing additional information about the policy.

Source code in molmo_spaces/policy/base_policy.py
def get_info(self) -> dict:
    """
    Get additional information from the policy. Called after episode ended. This method can be
    overridden by subclasses to provide extra information about the policy's state.
    Must be json serializable.

    Returns:
        A dictionary containing additional information about the policy.
    """
    return {}
get_phase
get_phase() -> str
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_phase(self) -> str:
    if self.action_idx < len(self.action_primitives):
        act_prim = self.action_primitives[self.action_idx]
    else:
        act_prim = self.action_primitives[-1]
    return act_prim.get_current_phase()
reset
reset(reset_retries: bool = True)
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def reset(self, reset_retries: bool = True):
    if not self.ik_warmed_up:
        with Timer() as warmup_time:
            self.task.env.current_robot.parallel_kinematics.warmup_ik(
                self.policy_config.grasp_feasibility_batch_size
            )
        self.ik_warmed_up = True
        log.info(f"Warmed up parallel IK solver in {warmup_time.value:.3f}s")

    self.action_primitives = self._compute_trajectory()

    self.action_idx = 0
    if reset_retries:
        self.retry_count = 0

    self.target_poses = {}
    for action_primitive in self.action_primitives:
        if isinstance(action_primitive, TCPMoveSequence):
            for move_segment in action_primitive.move_segments:
                self.target_poses[move_segment.name] = move_segment.end_pose
        elif isinstance(action_primitive, JointMoveSequence):
            gripper_mg_id = self.robot_view.get_gripper_movegroup_ids()[0]
            kinematics = self.task.env.current_robot.kinematics
            for move_segment in action_primitive.move_segments:
                end_qpos = move_segment.end_qpos
                base_pose = self.robot_view.base.pose
                self.target_poses[move_segment.name] = kinematics.fk(end_qpos, base_pose)[
                    gripper_mg_id
                ]
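The last loop in `reset` collects a name-to-pose mapping for later monitoring. A minimal sketch of that bookkeeping, using hypothetical stubs for the move sequences (TCP segments carry an end pose directly; joint-space segments would additionally need forward kinematics, omitted here):

```python
# Hypothetical stand-ins for move segments and TCP move sequences.
class StubSegment:
    def __init__(self, name: str, end_pose) -> None:
        self.name = name
        self.end_pose = end_pose

class StubTCPMoveSequence:
    def __init__(self, segments) -> None:
        self.move_segments = segments

action_primitives = [
    StubTCPMoveSequence([StubSegment("pregrasp", "pose_A"), StubSegment("grasp", "pose_B")]),
]

# Mirror of the target_poses bookkeeping in reset for the TCP case.
target_poses = {}
for prim in action_primitives:
    for seg in prim.move_segments:
        target_poses[seg.name] = seg.end_pose
```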
pick_and_place_color_planner_policy

Classes:

Name Description
PickAndPlaceColorPlannerPolicy

Planner policy for pick and place color task.

PickAndPlaceColorPlannerPolicy
PickAndPlaceColorPlannerPolicy(config: MlSpacesExpConfig, task: BaseMujocoTask)

Bases: PickAndPlacePlannerPolicy

Planner policy for pick and place color task.

Methods:

Name Description
add_auxiliary_objects
check_feasible_ik
get_action
get_all_phases
get_info

Get additional information from the policy. Called after episode ended. This method can be

get_phase
reset

Attributes:

Name Type Description
action_idx
action_primitives
config
ik_warmed_up
planners
policy_config
retry_count
robot_view
sequential_ik_failures
target_poses
task
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def __init__(self, config: MlSpacesExpConfig, task: BaseMujocoTask) -> None:
    assert isinstance(config.policy_config, ObjectManipulationPlannerPolicyConfig)
    super().__init__(config, task)
    self.policy_config = config.policy_config
    self.robot_view = task.env.current_robot.robot_view

    self.action_primitives = []
    self.action_idx = 0
    self.retry_count = 0
    self.target_poses = {}
    self.ik_warmed_up = False
    self.sequential_ik_failures = 0
action_idx instance-attribute
action_idx = 0
action_primitives instance-attribute
action_primitives = []
config instance-attribute
config = config
ik_warmed_up instance-attribute
ik_warmed_up = False
planners property
planners
policy_config instance-attribute
policy_config = policy_config
retry_count instance-attribute
retry_count = 0
robot_view instance-attribute
robot_view = robot_view
sequential_ik_failures instance-attribute
sequential_ik_failures = 0
target_poses instance-attribute
target_poses = {}
task instance-attribute
task = task
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
@staticmethod
def add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None:
    PlannerPolicy.add_auxiliary_objects(config, spec)
    policy_config = config.policy_config
    assert isinstance(policy_config, ObjectManipulationPlannerPolicyConfig)
    if policy_config.filter_colliding_grasps:
        add_grasp_collision_bodies(
            spec,
            policy_config.grasp_collision_batch_size,
            policy_config.grasp_width,
            policy_config.grasp_length,
            policy_config.grasp_height,
            np.array(policy_config.grasp_base_pos),
        )
check_feasible_ik
check_feasible_ik(pose: ndarray) -> bool | ndarray
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def check_feasible_ik(self, pose: np.ndarray) -> bool | np.ndarray:
    if pose.ndim > 2:
        assert pose.shape[1:] == (4, 4)
        batch_size = pose.shape[0]
        assert batch_size <= self.policy_config.grasp_feasibility_batch_size

        if batch_size == 0:
            return np.array([], dtype=bool)

        # pad to the batch size to avoid triggering recompilation of the IK solver
        if batch_size < self.policy_config.grasp_feasibility_batch_size:
            n_pad = self.policy_config.grasp_feasibility_batch_size - batch_size
            pose = np.concatenate([pose, np.broadcast_to(pose[-1:], (n_pad, 4, 4))])

        robot_view = self.task.env.current_robot.robot_view
        parallel_kinematics = self.task._env.robots[0].parallel_kinematics
        jp_dicts = parallel_kinematics.ik(
            pose,
            robot_view.get_qpos_dict(),
            robot_view.base.pose,
            rel_to_base=False,
            posture_weight=0.0,
        )
        return np.array([jp_dict is not None for jp_dict in jp_dicts[:batch_size]])
    else:
        assert pose.shape == (4, 4)
        robot_view = self.task.env.current_robot.robot_view
        kinematics = self.task.env.current_robot.kinematics
        gripper_mg_id = robot_view.get_gripper_movegroup_ids()[0]
        jp_dict = kinematics.ik(
            gripper_mg_id,
            pose,
            robot_view.move_group_ids(),
            robot_view.get_qpos_dict(),
            base_pose=robot_view.base.pose,
        )
        return jp_dict is not None
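The padding step in the batched branch keeps the IK solver's input shape fixed so it is never recompiled. That step can be sketched on its own; the batch size of 8 below is a hypothetical stand-in for `policy_config.grasp_feasibility_batch_size`.

```python
import numpy as np

GRASP_FEASIBILITY_BATCH_SIZE = 8  # hypothetical value for illustration

def pad_to_batch(pose: np.ndarray) -> np.ndarray:
    # Expect a batch of 4x4 homogeneous transforms, no larger than the batch size.
    assert pose.ndim == 3 and pose.shape[1:] == (4, 4)
    assert pose.shape[0] <= GRASP_FEASIBILITY_BATCH_SIZE
    n_pad = GRASP_FEASIBILITY_BATCH_SIZE - pose.shape[0]
    if n_pad > 0:
        # Repeat the last pose to fill the batch, as in check_feasible_ik.
        pose = np.concatenate([pose, np.broadcast_to(pose[-1:], (n_pad, 4, 4))])
    return pose

poses = np.stack([np.eye(4)] * 3)  # batch of 3 poses
padded = pad_to_batch(poses)       # padded up to the fixed batch size
```

Results for the padded tail are discarded afterwards by slicing back to the original batch size, which is why `check_feasible_ik` returns `jp_dicts[:batch_size]`.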
get_action
get_action(info: dict[str, Any]) -> dict[str, Any]
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_action(self, info: dict[str, Any]) -> dict[str, Any]:
    if self._check_for_failures():
        return self._handle_failure()

    # get the action from the current action primitive, absorbing 0-length segments as necessary
    while self.action_idx < len(self.action_primitives):  # guaranteed to terminate
        act_prim = self.action_primitives[self.action_idx]
        proceed = act_prim.execute()
        # cannot have a 0-length segment that doesn't proceed
        assert proceed or act_prim.duration > 0.0
        if proceed:
            self.action_idx += 1
        if not proceed or act_prim.duration > 0.0:
            break
    if self.action_idx < len(self.action_primitives):
        action = act_prim.get_current_action()
    else:
        action = self.action_primitives[-1].get_current_action()
        action["done"] = True
    return action
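The absorption loop above can be exercised with hypothetical stub primitives: a primitive whose `execute()` returns `True` is finished, zero-duration primitives are skipped within a single call, and the first primitive with positive duration that has not finished supplies the action.

```python
# Hypothetical stand-in for an action primitive.
class StubPrimitive:
    def __init__(self, name: str, duration: float, finished: bool) -> None:
        self.name = name
        self.duration = duration
        self._finished = finished

    def execute(self) -> bool:
        return self._finished

    def get_current_action(self) -> dict:
        return {"primitive": self.name}

action_primitives = [
    StubPrimitive("gripper-open", 0.0, True),  # zero-length, absorbed
    StubPrimitive("pregrasp", 1.0, False),     # still running, supplies the action
]
action_idx = 0

# Same control flow as get_action's inner loop.
while action_idx < len(action_primitives):
    act_prim = action_primitives[action_idx]
    proceed = act_prim.execute()
    assert proceed or act_prim.duration > 0.0
    if proceed:
        action_idx += 1
    if not proceed or act_prim.duration > 0.0:
        break

if action_idx < len(action_primitives):
    action = act_prim.get_current_action()
else:
    action = action_primitives[-1].get_current_action()
    action["done"] = True
```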
get_all_phases
get_all_phases()
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_all_phases(self):
    phases = super().get_all_phases()
    # Collect all possible phases here, even those not used for a particular trajectory computation
    new_phases = {
        "gripper-open": 1,
        "pregrasp": 2,
        "grasp": 3,
        "gripper-close": 4,
        "lift": 5,
        "preplace": 6,
        "place": 7,
        "retreat": 8,
        "go_home": 9,
    }
    phases.update(new_phases)
    return phases
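One plausible use of this phase table is inverting it to turn logged integer phase ids back into names. The `"none"` label for the `NONE_PHASE` sentinel below is a hypothetical choice, not defined by the module.

```python
NONE_PHASE = -1  # module-level sentinel from base_policy

phases = {
    "gripper-open": 1, "pregrasp": 2, "grasp": 3, "gripper-close": 4,
    "lift": 5, "preplace": 6, "place": 7, "retreat": 8, "go_home": 9,
}

# Invert name -> id into id -> name for labelling logged phases.
id_to_name = {v: k for k, v in phases.items()}
id_to_name[NONE_PHASE] = "none"  # hypothetical label for "no phase"
```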
get_info
get_info() -> dict

Get additional information from the policy. Called after the episode has ended. This method can be overridden by subclasses to provide extra information about the policy's state. The returned dictionary must be JSON serializable.

Returns:

Type Description
dict

A dictionary containing additional information about the policy.

Source code in molmo_spaces/policy/base_policy.py
def get_info(self) -> dict:
    """
    Get additional information from the policy. Called after episode ended. This method can be
    overridden by subclasses to provide extra information about the policy's state.
    Must be json serializable.

    Returns:
        A dictionary containing additional information about the policy.
    """
    return {}
get_phase
get_phase() -> str
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_phase(self) -> str:
    if self.action_idx < len(self.action_primitives):
        act_prim = self.action_primitives[self.action_idx]
    else:
        act_prim = self.action_primitives[-1]
    return act_prim.get_current_phase()
reset
reset(reset_retries: bool = True)
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def reset(self, reset_retries: bool = True):
    if not self.ik_warmed_up:
        with Timer() as warmup_time:
            self.task.env.current_robot.parallel_kinematics.warmup_ik(
                self.policy_config.grasp_feasibility_batch_size
            )
        self.ik_warmed_up = True
        log.info(f"Warmed up parallel IK solver in {warmup_time.value:.3f}s")

    self.action_primitives = self._compute_trajectory()

    self.action_idx = 0
    if reset_retries:
        self.retry_count = 0

    self.target_poses = {}
    for action_primitive in self.action_primitives:
        if isinstance(action_primitive, TCPMoveSequence):
            for move_segment in action_primitive.move_segments:
                self.target_poses[move_segment.name] = move_segment.end_pose
        elif isinstance(action_primitive, JointMoveSequence):
            gripper_mg_id = self.robot_view.get_gripper_movegroup_ids()[0]
            kinematics = self.task.env.current_robot.kinematics
            for move_segment in action_primitive.move_segments:
                end_qpos = move_segment.end_qpos
                base_pose = self.robot_view.base.pose
                self.target_poses[move_segment.name] = kinematics.fk(end_qpos, base_pose)[
                    gripper_mg_id
                ]
pick_and_place_next_to_planner_policy

Classes:

Name Description
PickAndPlaceNextToPlannerPolicy

Planner policy for pick and place next to task.

PickAndPlaceNextToPlannerPolicy
PickAndPlaceNextToPlannerPolicy(config: MlSpacesExpConfig, task: BaseMujocoTask)

Bases: PickAndPlacePlannerPolicy

Planner policy for pick and place next to task.

Methods:

Name Description
add_auxiliary_objects
check_feasible_ik
get_action
get_all_phases
get_info

Get additional information from the policy. Called after episode ended. This method can be

get_phase
reset

Attributes:

Name Type Description
action_idx
action_primitives
config
ik_warmed_up
planners
policy_config
retry_count
robot_view
sequential_ik_failures
target_poses
task
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def __init__(self, config: MlSpacesExpConfig, task: BaseMujocoTask) -> None:
    assert isinstance(config.policy_config, ObjectManipulationPlannerPolicyConfig)
    super().__init__(config, task)
    self.policy_config = config.policy_config
    self.robot_view = task.env.current_robot.robot_view

    self.action_primitives = []
    self.action_idx = 0
    self.retry_count = 0
    self.target_poses = {}
    self.ik_warmed_up = False
    self.sequential_ik_failures = 0
action_idx instance-attribute
action_idx = 0
action_primitives instance-attribute
action_primitives = []
config instance-attribute
config = config
ik_warmed_up instance-attribute
ik_warmed_up = False
planners property
planners
policy_config instance-attribute
policy_config = policy_config
retry_count instance-attribute
retry_count = 0
robot_view instance-attribute
robot_view = robot_view
sequential_ik_failures instance-attribute
sequential_ik_failures = 0
target_poses instance-attribute
target_poses = {}
task instance-attribute
task = task
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
@staticmethod
def add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None:
    PlannerPolicy.add_auxiliary_objects(config, spec)
    policy_config = config.policy_config
    assert isinstance(policy_config, ObjectManipulationPlannerPolicyConfig)
    if policy_config.filter_colliding_grasps:
        add_grasp_collision_bodies(
            spec,
            policy_config.grasp_collision_batch_size,
            policy_config.grasp_width,
            policy_config.grasp_length,
            policy_config.grasp_height,
            np.array(policy_config.grasp_base_pos),
        )
check_feasible_ik
check_feasible_ik(pose: ndarray) -> bool | ndarray
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def check_feasible_ik(self, pose: np.ndarray) -> bool | np.ndarray:
    if pose.ndim > 2:
        assert pose.shape[1:] == (4, 4)
        batch_size = pose.shape[0]
        assert batch_size <= self.policy_config.grasp_feasibility_batch_size

        if batch_size == 0:
            return np.array([], dtype=bool)

        # pad to the batch size to avoid triggering recompilation of the IK solver
        if batch_size < self.policy_config.grasp_feasibility_batch_size:
            n_pad = self.policy_config.grasp_feasibility_batch_size - batch_size
            pose = np.concatenate([pose, np.broadcast_to(pose[-1:], (n_pad, 4, 4))])

        robot_view = self.task.env.current_robot.robot_view
        parallel_kinematics = self.task._env.robots[0].parallel_kinematics
        jp_dicts = parallel_kinematics.ik(
            pose,
            robot_view.get_qpos_dict(),
            robot_view.base.pose,
            rel_to_base=False,
            posture_weight=0.0,
        )
        return np.array([jp_dict is not None for jp_dict in jp_dicts[:batch_size]])
    else:
        assert pose.shape == (4, 4)
        robot_view = self.task.env.current_robot.robot_view
        kinematics = self.task.env.current_robot.kinematics
        gripper_mg_id = robot_view.get_gripper_movegroup_ids()[0]
        jp_dict = kinematics.ik(
            gripper_mg_id,
            pose,
            robot_view.move_group_ids(),
            robot_view.get_qpos_dict(),
            base_pose=robot_view.base.pose,
        )
        return jp_dict is not None
get_action
get_action(info: dict[str, Any]) -> dict[str, Any]
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_action(self, info: dict[str, Any]) -> dict[str, Any]:
    if self._check_for_failures():
        return self._handle_failure()

    # get the action from the current action primitive, absorbing 0-length segments as necessary
    while self.action_idx < len(self.action_primitives):  # guaranteed to terminate
        act_prim = self.action_primitives[self.action_idx]
        proceed = act_prim.execute()
        # cannot have a 0-length segment that doesn't proceed
        assert proceed or act_prim.duration > 0.0
        if proceed:
            self.action_idx += 1
        if not proceed or act_prim.duration > 0.0:
            break
    if self.action_idx < len(self.action_primitives):
        action = act_prim.get_current_action()
    else:
        action = self.action_primitives[-1].get_current_action()
        action["done"] = True
    return action
get_all_phases
get_all_phases()
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_all_phases(self):
    phases = super().get_all_phases()
    # Collect all possible phases here, even those not used for a particular trajectory computation
    new_phases = {
        "gripper-open": 1,
        "pregrasp": 2,
        "grasp": 3,
        "gripper-close": 4,
        "lift": 5,
        "preplace": 6,
        "place": 7,
        "retreat": 8,
        "go_home": 9,
    }
    phases.update(new_phases)
    return phases
get_info
get_info() -> dict

Get additional information from the policy. Called after the episode has ended. This method can be overridden by subclasses to provide extra information about the policy's state. The returned dictionary must be JSON serializable.

Returns:

Type Description
dict

A dictionary containing additional information about the policy.

Source code in molmo_spaces/policy/base_policy.py
def get_info(self) -> dict:
    """
    Get additional information from the policy. Called after episode ended. This method can be
    overridden by subclasses to provide extra information about the policy's state.
    Must be json serializable.

    Returns:
        A dictionary containing additional information about the policy.
    """
    return {}
get_phase
get_phase() -> str
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_phase(self) -> str:
    if self.action_idx < len(self.action_primitives):
        act_prim = self.action_primitives[self.action_idx]
    else:
        act_prim = self.action_primitives[-1]
    return act_prim.get_current_phase()
reset
reset(reset_retries: bool = True)
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def reset(self, reset_retries: bool = True):
    if not self.ik_warmed_up:
        with Timer() as warmup_time:
            self.task.env.current_robot.parallel_kinematics.warmup_ik(
                self.policy_config.grasp_feasibility_batch_size
            )
        self.ik_warmed_up = True
        log.info(f"Warmed up parallel IK solver in {warmup_time.value:.3f}s")

    self.action_primitives = self._compute_trajectory()

    self.action_idx = 0
    if reset_retries:
        self.retry_count = 0

    self.target_poses = {}
    for action_primitive in self.action_primitives:
        if isinstance(action_primitive, TCPMoveSequence):
            for move_segment in action_primitive.move_segments:
                self.target_poses[move_segment.name] = move_segment.end_pose
        elif isinstance(action_primitive, JointMoveSequence):
            gripper_mg_id = self.robot_view.get_gripper_movegroup_ids()[0]
            kinematics = self.task.env.current_robot.kinematics
            for move_segment in action_primitive.move_segments:
                end_qpos = move_segment.end_qpos
                base_pose = self.robot_view.base.pose
                self.target_poses[move_segment.name] = kinematics.fk(end_qpos, base_pose)[
                    gripper_mg_id
                ]
pick_and_place_planner_policy

Classes:

Name Description
PickAndPlacePlannerPolicy

Attributes:

Name Type Description
log
log module-attribute
log = getLogger(__name__)
PickAndPlacePlannerPolicy
PickAndPlacePlannerPolicy(config: MlSpacesExpConfig, task: BaseMujocoTask)

Bases: BaseObjectManipulationPlannerPolicy

Methods:

Name Description
add_auxiliary_objects
check_feasible_ik
get_action
get_all_phases
get_info

Get additional information from the policy. Called after episode ended. This method can be

get_phase
reset

Attributes:

Name Type Description
action_idx
action_primitives
config
ik_warmed_up
planners
policy_config
retry_count
robot_view
sequential_ik_failures
target_poses
task
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def __init__(self, config: MlSpacesExpConfig, task: BaseMujocoTask) -> None:
    assert isinstance(config.policy_config, ObjectManipulationPlannerPolicyConfig)
    super().__init__(config, task)
    self.policy_config = config.policy_config
    self.robot_view = task.env.current_robot.robot_view

    self.action_primitives = []
    self.action_idx = 0
    self.retry_count = 0
    self.target_poses = {}
    self.ik_warmed_up = False
    self.sequential_ik_failures = 0
action_idx instance-attribute
action_idx = 0
action_primitives instance-attribute
action_primitives = []
config instance-attribute
config = config
ik_warmed_up instance-attribute
ik_warmed_up = False
planners property
planners
policy_config instance-attribute
policy_config = policy_config
retry_count instance-attribute
retry_count = 0
robot_view instance-attribute
robot_view = robot_view
sequential_ik_failures instance-attribute
sequential_ik_failures = 0
target_poses instance-attribute
target_poses = {}
task instance-attribute
task = task
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
@staticmethod
def add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None:
    PlannerPolicy.add_auxiliary_objects(config, spec)
    policy_config = config.policy_config
    assert isinstance(policy_config, ObjectManipulationPlannerPolicyConfig)
    if policy_config.filter_colliding_grasps:
        add_grasp_collision_bodies(
            spec,
            policy_config.grasp_collision_batch_size,
            policy_config.grasp_width,
            policy_config.grasp_length,
            policy_config.grasp_height,
            np.array(policy_config.grasp_base_pos),
        )
check_feasible_ik
check_feasible_ik(pose: ndarray) -> bool | ndarray
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def check_feasible_ik(self, pose: np.ndarray) -> bool | np.ndarray:
    if pose.ndim > 2:
        assert pose.shape[1:] == (4, 4)
        batch_size = pose.shape[0]
        assert batch_size <= self.policy_config.grasp_feasibility_batch_size

        if batch_size == 0:
            return np.array([], dtype=bool)

        # pad to the batch size to avoid triggering recompilation of the IK solver
        if batch_size < self.policy_config.grasp_feasibility_batch_size:
            n_pad = self.policy_config.grasp_feasibility_batch_size - batch_size
            pose = np.concatenate([pose, np.broadcast_to(pose[-1:], (n_pad, 4, 4))])

        robot_view = self.task.env.current_robot.robot_view
        parallel_kinematics = self.task._env.robots[0].parallel_kinematics
        jp_dicts = parallel_kinematics.ik(
            pose,
            robot_view.get_qpos_dict(),
            robot_view.base.pose,
            rel_to_base=False,
            posture_weight=0.0,
        )
        return np.array([jp_dict is not None for jp_dict in jp_dicts[:batch_size]])
    else:
        assert pose.shape == (4, 4)
        robot_view = self.task.env.current_robot.robot_view
        kinematics = self.task.env.current_robot.kinematics
        gripper_mg_id = robot_view.get_gripper_movegroup_ids()[0]
        jp_dict = kinematics.ik(
            gripper_mg_id,
            pose,
            robot_view.move_group_ids(),
            robot_view.get_qpos_dict(),
            base_pose=robot_view.base.pose,
        )
        return jp_dict is not None
get_action
get_action(info: dict[str, Any]) -> dict[str, Any]
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_action(self, info: dict[str, Any]) -> dict[str, Any]:
    if self._check_for_failures():
        return self._handle_failure()

    # get the action from the current action primitive, absorbing 0-length segments as necessary
    while self.action_idx < len(self.action_primitives):  # guaranteed to terminate
        act_prim = self.action_primitives[self.action_idx]
        proceed = act_prim.execute()
        # cannot have a 0-length segment that doesn't proceed
        assert proceed or act_prim.duration > 0.0
        if proceed:
            self.action_idx += 1
        if not proceed or act_prim.duration > 0.0:
            break
    if self.action_idx < len(self.action_primitives):
        action = act_prim.get_current_action()
    else:
        action = self.action_primitives[-1].get_current_action()
        action["done"] = True
    return action
get_all_phases
get_all_phases()
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_all_phases(self):
    phases = super().get_all_phases()
    # Collect all possible phases here, even those not used for a particular trajectory computation
    new_phases = {
        "gripper-open": 1,
        "pregrasp": 2,
        "grasp": 3,
        "gripper-close": 4,
        "lift": 5,
        "preplace": 6,
        "place": 7,
        "retreat": 8,
        "go_home": 9,
    }
    phases.update(new_phases)
    return phases
get_info
get_info() -> dict

Get additional information from the policy. Called after the episode has ended. This method can be overridden by subclasses to provide extra information about the policy's state. The returned dictionary must be JSON serializable.

Returns:

Type Description
dict

A dictionary containing additional information about the policy.

Source code in molmo_spaces/policy/base_policy.py
def get_info(self) -> dict:
    """
    Get additional information from the policy. Called after episode ended. This method can be
    overridden by subclasses to provide extra information about the policy's state.
    Must be json serializable.

    Returns:
        A dictionary containing additional information about the policy.
    """
    return {}
get_phase
get_phase() -> str
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_phase(self) -> str:
    if self.action_idx < len(self.action_primitives):
        act_prim = self.action_primitives[self.action_idx]
    else:
        act_prim = self.action_primitives[-1]
    return act_prim.get_current_phase()
reset
reset(reset_retries: bool = True)
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def reset(self, reset_retries: bool = True):
    if not self.ik_warmed_up:
        with Timer() as warmup_time:
            self.task.env.current_robot.parallel_kinematics.warmup_ik(
                self.policy_config.grasp_feasibility_batch_size
            )
        self.ik_warmed_up = True
        log.info(f"Warmed up parallel IK solver in {warmup_time.value:.3f}s")

    self.action_primitives = self._compute_trajectory()

    self.action_idx = 0
    if reset_retries:
        self.retry_count = 0

    self.target_poses = {}
    for action_primitive in self.action_primitives:
        if isinstance(action_primitive, TCPMoveSequence):
            for move_segment in action_primitive.move_segments:
                self.target_poses[move_segment.name] = move_segment.end_pose
        elif isinstance(action_primitive, JointMoveSequence):
            gripper_mg_id = self.robot_view.get_gripper_movegroup_ids()[0]
            kinematics = self.task.env.current_robot.kinematics
            for move_segment in action_primitive.move_segments:
                end_qpos = move_segment.end_qpos
                base_pose = self.robot_view.base.pose
                self.target_poses[move_segment.name] = kinematics.fk(end_qpos, base_pose)[
                    gripper_mg_id
                ]
pick_planner_policy

Classes:

Name Description
PickPlannerPolicy

Attributes:

Name Type Description
log
log module-attribute
log = getLogger(__name__)
PickPlannerPolicy
PickPlannerPolicy(config: MlSpacesExpConfig, task: BaseMujocoTask)

Bases: BaseObjectManipulationPlannerPolicy

Methods:

Name Description
add_auxiliary_objects
check_feasible_ik
get_action
get_all_phases
get_info

Get additional information from the policy. Called after episode ended. This method can be

get_phase
reset

Attributes:

Name Type Description
action_idx
action_primitives
config
ik_warmed_up
planners
policy_config
retry_count
robot_view
sequential_ik_failures
target_poses
task
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def __init__(self, config: MlSpacesExpConfig, task: BaseMujocoTask) -> None:
    assert isinstance(config.policy_config, ObjectManipulationPlannerPolicyConfig)
    super().__init__(config, task)
    self.policy_config = config.policy_config
    self.robot_view = task.env.current_robot.robot_view

    self.action_primitives = []
    self.action_idx = 0
    self.retry_count = 0
    self.target_poses = {}
    self.ik_warmed_up = False
    self.sequential_ik_failures = 0
action_idx instance-attribute
action_idx = 0
action_primitives instance-attribute
action_primitives = []
config instance-attribute
config = config
ik_warmed_up instance-attribute
ik_warmed_up = False
planners property
planners
policy_config instance-attribute
policy_config = policy_config
retry_count instance-attribute
retry_count = 0
robot_view instance-attribute
robot_view = robot_view
sequential_ik_failures instance-attribute
sequential_ik_failures = 0
target_poses instance-attribute
target_poses = {}
task instance-attribute
task = task
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
@staticmethod
def add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None:
    PlannerPolicy.add_auxiliary_objects(config, spec)
    policy_config = config.policy_config
    assert isinstance(policy_config, ObjectManipulationPlannerPolicyConfig)
    if policy_config.filter_colliding_grasps:
        add_grasp_collision_bodies(
            spec,
            policy_config.grasp_collision_batch_size,
            policy_config.grasp_width,
            policy_config.grasp_length,
            policy_config.grasp_height,
            np.array(policy_config.grasp_base_pos),
        )
check_feasible_ik
check_feasible_ik(pose: ndarray) -> bool | ndarray
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def check_feasible_ik(self, pose: np.ndarray) -> bool | np.ndarray:
    if pose.ndim > 2:
        assert pose.shape[1:] == (4, 4)
        batch_size = pose.shape[0]
        assert batch_size <= self.policy_config.grasp_feasibility_batch_size

        if batch_size == 0:
            return np.array([], dtype=bool)

        # pad to the batch size to avoid triggering recompilation of the IK solver
        if batch_size < self.policy_config.grasp_feasibility_batch_size:
            n_pad = self.policy_config.grasp_feasibility_batch_size - batch_size
            pose = np.concatenate([pose, np.broadcast_to(pose[-1:], (n_pad, 4, 4))])

        robot_view = self.task.env.current_robot.robot_view
        parallel_kinematics = self.task._env.robots[0].parallel_kinematics
        jp_dicts = parallel_kinematics.ik(
            pose,
            robot_view.get_qpos_dict(),
            robot_view.base.pose,
            rel_to_base=False,
            posture_weight=0.0,
        )
        return np.array([jp_dict is not None for jp_dict in jp_dicts[:batch_size]])
    else:
        assert pose.shape == (4, 4)
        robot_view = self.task.env.current_robot.robot_view
        kinematics = self.task.env.current_robot.kinematics
        gripper_mg_id = robot_view.get_gripper_movegroup_ids()[0]
        jp_dict = kinematics.ik(
            gripper_mg_id,
            pose,
            robot_view.move_group_ids(),
            robot_view.get_qpos_dict(),
            base_pose=robot_view.base.pose,
        )
        return jp_dict is not None
get_action
get_action(info: dict[str, Any]) -> dict[str, Any]
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_action(self, info: dict[str, Any]) -> dict[str, Any]:
    if self._check_for_failures():
        return self._handle_failure()

    # get the action from the current action primitive, absorbing 0-length segments as necessary
    while self.action_idx < len(self.action_primitives):  # guaranteed to terminate
        act_prim = self.action_primitives[self.action_idx]
        proceed = act_prim.execute()
        # cannot have a 0-length segment that doesn't proceed
        assert proceed or act_prim.duration > 0.0
        if proceed:
            self.action_idx += 1
        if not proceed or act_prim.duration > 0.0:
            break
    if self.action_idx < len(self.action_primitives):
        action = act_prim.get_current_action()
    else:
        action = self.action_primitives[-1].get_current_action()
        action["done"] = True
    return action
get_all_phases
get_all_phases()
Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_all_phases(self):
    phases = super().get_all_phases()
    # Collect all possible phases here, even those not used for a particular trajectory computation
    new_phases = {
        "gripper-open": 1,
        "pregrasp": 2,
        "grasp": 3,
        "gripper-close": 4,
        "lift": 5,
        "preplace": 6,
        "place": 7,
        "retreat": 8,
        "go_home": 9,
    }
    phases.update(new_phases)
    return phases
get_info
get_info() -> dict

Get additional information from the policy. Called after the episode has ended. This method can be overridden by subclasses to provide extra information about the policy's state. The returned dictionary must be JSON serializable.

Returns:

Type Description
dict

A dictionary containing additional information about the policy.

Source code in molmo_spaces/policy/base_policy.py
def get_info(self) -> dict:
    """
    Get additional information from the policy. Called after the episode has ended. This
    method can be overridden by subclasses to provide extra information about the policy's
    state. Must be JSON-serializable.

    Returns:
        A dictionary containing additional information about the policy.
    """
    return {}
get_phase
get_phase() -> str

Return the phase name of the current (or final) action primitive.

Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def get_phase(self) -> str:
    if self.action_idx < len(self.action_primitives):
        act_prim = self.action_primitives[self.action_idx]
    else:
        act_prim = self.action_primitives[-1]
    return act_prim.get_current_phase()
reset
reset(reset_retries: bool = True)

Warm up the IK solver on first use, recompute the action-primitive trajectory, reset execution indices, and cache each move segment's target pose.

Source code in molmo_spaces/policy/solvers/object_manipulation/base_object_manipulation_planner_policy.py
def reset(self, reset_retries: bool = True):
    if not self.ik_warmed_up:
        with Timer() as warmup_time:
            self.task.env.current_robot.parallel_kinematics.warmup_ik(
                self.policy_config.grasp_feasibility_batch_size
            )
        self.ik_warmed_up = True
        log.info(f"Warmed up parallel IK solver in {warmup_time.value:.3f}s")

    self.action_primitives = self._compute_trajectory()

    self.action_idx = 0
    if reset_retries:
        self.retry_count = 0

    self.target_poses = {}
    for action_primitive in self.action_primitives:
        if isinstance(action_primitive, TCPMoveSequence):
            for move_segment in action_primitive.move_segments:
                self.target_poses[move_segment.name] = move_segment.end_pose
        elif isinstance(action_primitive, JointMoveSequence):
            gripper_mg_id = self.robot_view.get_gripper_movegroup_ids()[0]
            kinematics = self.task.env.current_robot.kinematics
            for move_segment in action_primitive.move_segments:
                end_qpos = move_segment.end_qpos
                base_pose = self.robot_view.base.pose
                self.target_poses[move_segment.name] = kinematics.fk(end_qpos, base_pose)[
                    gripper_mg_id
                ]

opening_solver

Classes:

Name Description
DoorOpeningPhase

Enumeration of door opening phases.

DoorOpeningPlannerPolicy

Planner policy for RBY1 door opening tasks using motion planning.

Attributes:

Name Type Description
log
log module-attribute
log = getLogger(__name__)
DoorOpeningPhase

Bases: IntEnum

Enumeration of door opening phases.

Attributes:

Name Type Description
COMPLETE
GRASP_HANDLE
NAVIGATE_TO_DOOR
OPEN_DOOR
REACH_GRASP
REACH_PRE_GRASP
RECOVERY
RELEASE_HANDLE
COMPLETE class-attribute instance-attribute
COMPLETE = 6
GRASP_HANDLE class-attribute instance-attribute
GRASP_HANDLE = 3
NAVIGATE_TO_DOOR class-attribute instance-attribute
NAVIGATE_TO_DOOR = 0
OPEN_DOOR class-attribute instance-attribute
OPEN_DOOR = 4
REACH_GRASP class-attribute instance-attribute
REACH_GRASP = 2
REACH_PRE_GRASP class-attribute instance-attribute
REACH_PRE_GRASP = 1
RECOVERY class-attribute instance-attribute
RECOVERY = 7
RELEASE_HANDLE class-attribute instance-attribute
RELEASE_HANDLE = 5
DoorOpeningPlannerPolicy
DoorOpeningPlannerPolicy(config: MlSpacesExpConfig, task: DoorOpeningTask | None = None)

Bases: CuroboPlannerPolicy

Planner policy for RBY1 door opening tasks using motion planning.

Inherits common motion planning functionality from CuroboPlannerPolicy.

Methods:

Name Description
add_auxiliary_objects

Add auxiliary objects to the scene that might be required for the policy.

batch_plan_trajectory

Plan trajectory using batch motion planning.

clip_to_velocity_constraint

Clip action to respect velocity constraints.

get_action

Get the next action based on current task state and phase.

get_all_phases

Return a dictionary mapping all phase names to their enum values.

get_info

Get additional information from the policy; called after the episode has ended.

get_look_at_action
get_phase

Return the current phase name as a string.

reset

Reset the policy state.

select_arm

Select which arm to use based on distance to pickup object.

select_arm_for_opening

Select the arm for opening the door.

solve_ik

Solve inverse kinematics for a target pose and create interpolated trajectory.

visualize_world_config_mesh

Visualize the world configuration as a mesh file.

Attributes:

Name Type Description
arm_end_idx int
arm_side
arm_start_idx int
articulate_deltas
config
curr_articulation_step
current_gripper_command dict[str, float]
current_phase
first_pushing_articulation_deltas
grasping_timesteps
is_done bool

Property to expose completion state for task checking.

joint_position_tolerance
left_motion_planner
max_steps_per_waypoint
num_steps_for_articulation
phase_name str

Property to expose current phase name for task tracking.

phase_value int

Property to expose the current phase value for task tracking.

planned_trajectory list[list[float]] | None
planner CuroboPlanner | None
planner_joint_ranges dict[str, tuple[int, int]]
planners dict[str, CuroboPlanner]
planning_failures
pre_grasp_distance
profiler
recovery_step_count
retry_count int
right_motion_planner
steps_spent_in_waypoint int
task
trajectory_index int
Source code in molmo_spaces/policy/solvers/opening_solver.py
def __init__(
    self,
    config: MlSpacesExpConfig,
    task: DoorOpeningTask | None = None,
) -> None:
    super().__init__(config, task)
    self.task = task  # Added to help IDE typing

    # This policy uses both planners (unlike pick-and-place which lazy-loads one)
    self.left_motion_planner = CuroboPlanner(
        config=config.policy_config.left_curobo_planner_config
    )
    self.right_motion_planner = CuroboPlanner(
        config=config.policy_config.right_curobo_planner_config
    )

    # Planner policy state
    self.arm_side = None
    self.current_phase = DoorOpeningPhase.REACH_PRE_GRASP
    self.planning_failures = 0

    # Trajectory execution parameters
    self.max_steps_per_waypoint = self.config.policy_config.max_steps_per_waypoint
    self.joint_position_tolerance = self.config.policy_config.joint_position_tolerance

    # Door opening parameters
    self.pre_grasp_distance = self.config.policy_config.pre_grasp_distance
    self.articulate_deltas = self.config.policy_config.articulation_deltas
    self.first_pushing_articulation_deltas = (
        self.config.policy_config.first_pushing_articulation_deltas
    )

    # Phase-specific state
    self.grasping_timesteps = 0
    self.recovery_step_count = 0

    # Articulation tracking
    self.curr_articulation_step = 0
    self.num_steps_for_articulation = len(self.articulate_deltas)
arm_end_idx instance-attribute
arm_end_idx: int = 0
arm_side instance-attribute
arm_side = None
arm_start_idx instance-attribute
arm_start_idx: int = 0
articulate_deltas instance-attribute
articulate_deltas = articulation_deltas
config instance-attribute
config = config
curr_articulation_step instance-attribute
curr_articulation_step = 0
current_gripper_command instance-attribute
current_gripper_command: dict[str, float] = {}
current_phase instance-attribute
current_phase = REACH_PRE_GRASP
first_pushing_articulation_deltas instance-attribute
first_pushing_articulation_deltas = first_pushing_articulation_deltas
grasping_timesteps instance-attribute
grasping_timesteps = 0
is_done property
is_done: bool

Property to expose completion state for task checking.

joint_position_tolerance instance-attribute
joint_position_tolerance = joint_position_tolerance
left_motion_planner instance-attribute
left_motion_planner = CuroboPlanner(config=left_curobo_planner_config)
max_steps_per_waypoint instance-attribute
max_steps_per_waypoint = max_steps_per_waypoint
num_steps_for_articulation instance-attribute
num_steps_for_articulation = len(articulate_deltas)
phase_name property
phase_name: str

Property to expose current phase name for task tracking.

phase_value property
phase_value: int

Property to expose the current phase value for task tracking.

planned_trajectory instance-attribute
planned_trajectory: list[list[float]] | None = None
planner instance-attribute
planner: CuroboPlanner | None = None
planner_joint_ranges instance-attribute
planner_joint_ranges: dict[str, tuple[int, int]] = {}
planners property
planners: dict[str, CuroboPlanner]
planning_failures instance-attribute
planning_failures = 0
pre_grasp_distance instance-attribute
pre_grasp_distance = pre_grasp_distance
profiler instance-attribute
profiler = Profiler()
recovery_step_count instance-attribute
recovery_step_count = 0
retry_count instance-attribute
retry_count: int = 0
right_motion_planner instance-attribute
right_motion_planner = CuroboPlanner(config=right_curobo_planner_config)
steps_spent_in_waypoint instance-attribute
steps_spent_in_waypoint: int = 0
task instance-attribute
task = task
trajectory_index instance-attribute
trajectory_index: int = 0
add_auxiliary_objects staticmethod
add_auxiliary_objects(config: MlSpacesExpConfig, spec: MjSpec) -> None

Add auxiliary objects to the scene that might be required for the policy. Args: config: The experiment configuration for the policy. spec: The MuJoCo scene specification to extend.

Source code in molmo_spaces/policy/base_policy.py
@staticmethod
def add_auxiliary_objects(config: "MlSpacesExpConfig", spec: MjSpec) -> None:
    """
    Add auxiliary objects to the scene that might be required for the policy.
    Args:
        config: The experiment configuration for the policy.
        spec: The MuJoCo scene specification to extend.
    """
    return None
batch_plan_trajectory
batch_plan_trajectory() -> None

Plan trajectory using batch motion planning.

Uses the current phase to determine goal poses and plans trajectories in batches for efficiency. Sets self.planned_trajectory to the best trajectory found.

Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
def batch_plan_trajectory(self) -> None:
    """Plan trajectory using batch motion planning.

    Uses the current phase to determine goal poses and plans trajectories
    in batches for efficiency. Sets self.planned_trajectory to the best
    trajectory found.
    """
    init_config = np.concatenate(
        [
            np.zeros(3),
            self.task.env.current_robot.robot_view.get_move_group(
                f"{self.arm_side}_arm"
            ).joint_pos,
        ]
    )

    # Setup collision avoidance if enabled
    if getattr(self.config.policy_config, "enable_collision_avoidance", False):
        if hasattr(self, "_setup_collision_avoidance_config"):
            self._setup_collision_avoidance_config()

    # Get goal poses based on current phase (to be provided by subclass)
    goal_poses = self._get_batch_goal_poses()
    if goal_poses is None or len(goal_poses) == 0:
        log.warning("[BATCH PLAN] No goal poses available")
        return

    total = goal_poses.shape[0]
    batch_size = getattr(self.config.policy_config, "batch_size", 8)
    max_batches = getattr(self.config.policy_config, "max_batch_plan_attempts", 4)
    num_poses = min(max_batches * batch_size, total)
    num_batches = (num_poses + batch_size - 1) // batch_size

    all_successful_trajectories = []
    current_phase = getattr(self, "current_phase", "unknown")

    for batch_start in range(0, num_poses, batch_size):
        batch_end = min(batch_start + batch_size, num_poses)
        batch = goal_poses[batch_start:batch_end]

        if hasattr(self, "_show_poses"):
            self._show_poses(batch, style="tcp")
        if self.task.viewer:
            self.task.viewer.sync()

        log.info(
            f"Processing batch {batch_start // batch_size + 1}/{num_batches}: "
            f"poses {batch_start}-{batch_end - 1}"
        )

        # Transform poses to base frame and convert to 7D format
        batch_base_frame_7d = []
        for pose in batch:
            pose_base = self._target_pose_to_base_frame(pose)
            batch_base_frame_7d.append(pose_base.tolist())

        # Prepare batch inputs for planner
        batch_len = len(batch_base_frame_7d)
        start_states = [init_config.tolist()] * batch_len

        # Plan batch
        log.info(
            f"Planning batch of {batch_len} {current_phase} poses with {self.arm_side} arm"
        )
        result = self.planner.plan_batch(
            start_states,
            batch_base_frame_7d,
            verbose=False,
        )

        # Check for successes
        successes = result.success.cpu().numpy()
        log.info(
            f"{self.arm_side.capitalize()} arm: {np.sum(successes)}/{batch_len} "
            f"{current_phase} planning successes"
        )

        # Process successful trajectories
        if np.any(successes):
            optimized_plan = result.optimized_plan
            position = optimized_plan.position
            if position.ndim == 2:
                position = position.unsqueeze(0)

            for i in range(batch_len):
                if not successes[i]:
                    continue

                trajectory = []
                for t in range(position.shape[1]):
                    waypoint = position[i, t].cpu().tolist()
                    trajectory.append(waypoint)

                self._transform_traj_to_world_frame(trajectory)
                all_successful_trajectories.append(trajectory)

        if all_successful_trajectories:
            log.info(
                f"Found {len(all_successful_trajectories)} successful trajectories, "
                "skipping remaining batches"
            )
            break

    if all_successful_trajectories:
        self.planned_trajectory = self._select_best_trajectory(all_successful_trajectories)
    else:
        log.warning("[BATCH PLAN] No successful trajectories found across all batches")
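The batch-sizing arithmetic above (cap the candidate poses at `max_batches * batch_size`, then ceil-divide into batches) can be checked in isolation; the totals here are illustrative, and the defaults mirror the `getattr` fallbacks in the listing:

```python
# Illustrative check of the batch-sizing arithmetic used in batch_plan_trajectory.
def batch_plan_sizes(total, batch_size=8, max_batches=4):
    num_poses = min(max_batches * batch_size, total)          # cap candidate poses
    num_batches = (num_poses + batch_size - 1) // batch_size  # ceiling division
    return num_poses, num_batches

print(batch_plan_sizes(100))  # -> (32, 4): capped at max_batches * batch_size
print(batch_plan_sizes(11))   # -> (11, 2): the last batch holds the remainder of 3
```

Note that the loop may also exit early: as soon as one batch yields any successful trajectory, the remaining batches are skipped.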
clip_to_velocity_constraint
clip_to_velocity_constraint(action: dict[str, Any]) -> dict[str, Any]

Clip action to respect velocity constraints.

Parameters:

Name Type Description Default
action dict[str, Any]

Dictionary of commanded joint positions by move group.

required

Returns:

Type Description
dict[str, Any]

Clipped action dictionary.

Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
def clip_to_velocity_constraint(self, action: dict[str, Any]) -> dict[str, Any]:
    """Clip action to respect velocity constraints.

    Args:
        action: Dictionary of commanded joint positions by move group.

    Returns:
        Clipped action dictionary.
    """
    velocity_constraints = getattr(self.config.policy_config, "velocity_constraints", {})
    clipped_action = action.copy()

    for move_group, commanded_action in action.items():
        if move_group in velocity_constraints:
            max_velocity_rad_per_s = velocity_constraints[move_group]
            move_group_view = self.task.env.robots[0].robot_view.get_move_group(move_group)
            move_group_joint_pos = move_group_view.joint_pos.copy()

            # Calculate difference, handling angular wraparound for base theta
            diff = commanded_action - move_group_joint_pos
            if move_group == "base":
                # Normalize theta difference to [-π, π]
                diff[2] = np.arctan2(np.sin(diff[2]), np.cos(diff[2]))

            velocity_rad_per_s = diff * 10.0  # per-step delta -> rad/s, assuming a 10 Hz control step

            # Clip velocity and recalculate commanded action
            clipped_velocity = np.clip(
                velocity_rad_per_s, -max_velocity_rad_per_s, max_velocity_rad_per_s
            )
            commanded_action = move_group_joint_pos + clipped_velocity / 10.0

            clipped_action[move_group] = commanded_action

    return clipped_action
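The clipping logic above can be sketched for a single move group. The `control_hz=10.0` factor is an assumption matching the hard-coded `10.0` in the listing (presumably a 10 Hz control step); `clip_step` and `wrap_idx` are hypothetical names:

```python
import numpy as np

# Sketch of clip_to_velocity_constraint for one move group, assuming a 10 Hz
# control step. wrap_idx marks an angular component (e.g. base theta) that
# must be normalized to [-pi, pi] before clipping.
def clip_step(current, commanded, max_vel, control_hz=10.0, wrap_idx=None):
    diff = np.asarray(commanded, float) - np.asarray(current, float)
    if wrap_idx is not None:
        # quadrant-aware normalization: take the short way around the circle
        diff[wrap_idx] = np.arctan2(np.sin(diff[wrap_idx]), np.cos(diff[wrap_idx]))
    vel = np.clip(diff * control_hz, -max_vel, max_vel)
    return np.asarray(current, float) + vel / control_hz

current = np.array([0.0, 0.0, 3.1])
commanded = np.array([0.0, 0.0, -3.1])  # naive theta diff would be -6.2 rad
clipped = clip_step(current, commanded, max_vel=0.5, wrap_idx=2)
# theta moves the short way round (about +0.083 rad) and is velocity-clipped
# to a 0.05 rad step: clipped[2] == 3.15
```

Without the wraparound normalization, the base would swing nearly a full turn in the wrong direction instead of taking the 0.083 rad shortcut.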
get_action
get_action(info: dict[str, Any]) -> dict[str, Any]

Get the next action based on current task state and phase.

Source code in molmo_spaces/policy/solvers/opening_solver.py
def get_action(self, info: dict[str, Any]) -> dict[str, Any]:
    """Get the next action based on current task state and phase."""
    self._print_distance_to_handle()

    if self.arm_side is None:
        self.arm_side = self.select_arm_for_opening()
        if self.config.policy_config.verbose:
            log.info(f"[ARM] Selected {self.arm_side} arm for door opening")

    action_cmd = {}
    if self.current_phase == DoorOpeningPhase.NAVIGATE_TO_DOOR:
        action_cmd = self._execute_navigate_phase()
    elif self.current_phase == DoorOpeningPhase.REACH_PRE_GRASP:
        action_cmd = self._execute_reach_pre_grasp_phase()
    elif self.current_phase == DoorOpeningPhase.REACH_GRASP:
        action_cmd = self._execute_reach_grasp_phase()
    elif self.current_phase == DoorOpeningPhase.GRASP_HANDLE:
        action_cmd = self._execute_grasp_handle_phase()
    elif self.current_phase == DoorOpeningPhase.OPEN_DOOR:
        action_cmd = self._execute_open_phase()
    elif self.current_phase == DoorOpeningPhase.RELEASE_HANDLE:
        action_cmd = self._execute_release_phase()
    elif self.current_phase == DoorOpeningPhase.COMPLETE:
        action_cmd = self._execute_complete_phase()
    elif self.current_phase == DoorOpeningPhase.RECOVERY:
        action_cmd = self._execute_recovery_phase()
    else:
        raise ValueError(f"Unknown phase: {self.current_phase}")

    action_cmd.update(self.get_look_at_action())
    action_cmd = self.clip_to_velocity_constraint(action_cmd)

    return action_cmd
get_all_phases
get_all_phases() -> dict[str, int]

Return a dictionary mapping all phase names to their enum values.

Source code in molmo_spaces/policy/solvers/opening_solver.py
def get_all_phases(self) -> dict[str, int]:
    """Return a dictionary mapping all phase names to their enum values."""
    return {phase.name: phase.value for phase in DoorOpeningPhase}
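The mapping above can be exercised with a standalone replica of `DoorOpeningPhase`; the values below are copied from the attribute listing earlier in this section:

```python
from enum import IntEnum

# Standalone replica of DoorOpeningPhase (values from the attribute listing above).
class DoorOpeningPhase(IntEnum):
    NAVIGATE_TO_DOOR = 0
    REACH_PRE_GRASP = 1
    REACH_GRASP = 2
    GRASP_HANDLE = 3
    OPEN_DOOR = 4
    RELEASE_HANDLE = 5
    COMPLETE = 6
    RECOVERY = 7

# get_all_phases builds the name -> value mapping with a dict comprehension:
phases = {phase.name: phase.value for phase in DoorOpeningPhase}
print(phases["REACH_PRE_GRASP"])  # -> 1
print(DoorOpeningPhase(6).name)   # -> COMPLETE
```

Because the class is an `IntEnum`, phases compare and serialize as plain integers, which is what makes the name-to-value mapping JSON-friendly.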
get_info
get_info() -> dict

Get additional information from the policy. Called after the episode has ended. This method can be overridden by subclasses to provide extra information about the policy's state. Must be JSON-serializable.

Returns:

Type Description
dict

A dictionary containing additional information about the policy.

Source code in molmo_spaces/policy/base_policy.py
def get_info(self) -> dict:
    """
    Get additional information from the policy. Called after the episode has ended. This
    method can be overridden by subclasses to provide extra information about the policy's
    state. Must be JSON-serializable.

    Returns:
        A dictionary containing additional information about the policy.
    """
    return {}
get_look_at_action
get_look_at_action() -> dict[str, Any]

Return a head action that keeps the door handle in view.

Source code in molmo_spaces/policy/solvers/opening_solver.py
def get_look_at_action(self) -> dict[str, Any]:
    target_pos = self.task.get_door_handle_position()
    return {"head": self.task.get_target_head_pose(target_pos)}
get_phase
get_phase() -> str

Return the current phase name as a string.

Source code in molmo_spaces/policy/solvers/opening_solver.py
def get_phase(self) -> str:
    """Return the current phase name as a string."""
    return self.current_phase.name
reset
reset() -> None

Reset the policy state.

Source code in molmo_spaces/policy/solvers/opening_solver.py
def reset(self) -> None:
    """Reset the policy state."""
    super().reset()
    log.debug("[RESET] Resetting DoorOpeningPlannerPolicy state")
    self.arm_side = None
    self.current_phase = DoorOpeningPhase.REACH_PRE_GRASP
    self.grasping_timesteps = 0
    self.planning_failures = 0
    self.curr_articulation_step = 0
    self.recovery_step_count = 0

    # Reset motion planners
    self.left_motion_planner.reset()
    self.right_motion_planner.reset()
select_arm
select_arm() -> None

Select which arm to use based on distance to pickup object.

Also instantiates the motion planner for the selected arm. This lazy initialization saves ~11GB of GPU memory by only loading one arm's planner.

Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
def select_arm(self) -> None:
    """Select which arm to use based on distance to pickup object.

    Also instantiates the motion planner for the selected arm.
    This lazy initialization saves ~11GB of GPU memory by only loading one arm's planner.
    """
    task_config = self.config.task_config
    om = self.task.env.object_managers[self.task.env.current_batch_index]
    pickup_obj = om.get_object_by_name(task_config.pickup_obj_name)
    pickup_obj_pos = pickup_obj.position

    left_tcp_pose = self.task.env.current_robot.robot_view.get_move_group(
        "left_gripper"
    ).leaf_frame_to_world
    right_tcp_pose = self.task.env.current_robot.robot_view.get_move_group(
        "right_gripper"
    ).leaf_frame_to_world

    left_tcp_pos = left_tcp_pose[:3, 3]
    right_tcp_pos = right_tcp_pose[:3, 3]

    # Compute distances
    left_dist = np.linalg.norm(left_tcp_pos - pickup_obj_pos)
    right_dist = np.linalg.norm(right_tcp_pos - pickup_obj_pos)

    selected_arm = "left" if left_dist < right_dist else "right"
    log.info(
        f"Selected {selected_arm} arm (left dist: {left_dist:.3f}m, right dist: {right_dist:.3f}m)"
    )

    self.arm_side = selected_arm

    # Instantiate the planner for the selected arm only
    log.info(f"Instantiating motion planner for {selected_arm} arm")
    if selected_arm == "left":
        self.planner = CuroboPlanner(
            config=self.config.policy_config.left_curobo_planner_config
        )
        self.planner_joint_ranges = self.config.policy_config.left_planner_joint_ranges
    else:
        self.planner = CuroboPlanner(
            config=self.config.policy_config.right_curobo_planner_config
        )
        self.planner_joint_ranges = self.config.policy_config.right_planner_joint_ranges

    self.arm_start_idx = self.planner_joint_ranges[f"{self.arm_side}_arm"][0]
    self.arm_end_idx = self.planner_joint_ranges[f"{self.arm_side}_arm"][1]
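The TCP-to-object distance comparison that drives `select_arm` can be reproduced in isolation; `nearest_arm` is a hypothetical helper name:

```python
import numpy as np

# The distance comparison behind select_arm: pick the arm whose TCP is
# closest to the pickup object (ties go to the right arm, as in the source).
def nearest_arm(left_tcp_pos, right_tcp_pos, obj_pos):
    left_dist = np.linalg.norm(np.subtract(left_tcp_pos, obj_pos))
    right_dist = np.linalg.norm(np.subtract(right_tcp_pos, obj_pos))
    return "left" if left_dist < right_dist else "right"

print(nearest_arm([0.3, 0.4, 1.0], [0.3, -0.4, 1.0], [0.6, 0.3, 0.9]))  # -> left
```

In the real method the TCP positions are the translation columns (`pose[:3, 3]`) of each gripper move group's `leaf_frame_to_world` transform.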
select_arm_for_opening
select_arm_for_opening() -> str

Select the arm for opening the door.

Also sets self.arm_side and self.planner_joint_ranges for compatibility with base class methods.

Source code in molmo_spaces/policy/solvers/opening_solver.py
def select_arm_for_opening(self) -> str:
    """Select the arm for opening the door.

    Also sets self.arm_side and self.planner_joint_ranges for compatibility
    with base class methods.
    """
    robot_base_pose_tf = self.task.env.robots[0].get_world_pose_tf_mat()
    door_joint_pos_world = self.task.get_door_joint_position()
    door_handle_pos_world = self.task.get_door_handle_position()

    log.debug("\n[ARM SELECTION] Analyzing door geometry:")
    log.debug(f"  - Robot base position: {robot_base_pose_tf[:3, 3]}")
    log.debug(f"  - Door joint position: {door_joint_pos_world}")
    log.debug(f"  - Door handle position: {door_handle_pos_world}")

    door_joint_pos_robot = np.linalg.inv(robot_base_pose_tf) @ np.array(
        [*door_joint_pos_world, 1.0]
    )
    door_handle_pos_robot = np.linalg.inv(robot_base_pose_tf) @ np.array(
        [*door_handle_pos_world, 1.0]
    )

    selected_arm = "left" if door_joint_pos_robot[1] > door_handle_pos_robot[1] else "right"

    log.debug(f"  - Door joint in robot frame: {door_joint_pos_robot[:3]}")
    log.debug(f"  - Door handle in robot frame: {door_handle_pos_robot[:3]}")
    log.debug(
        f"  - Selected arm: {selected_arm} (joint Y: {door_joint_pos_robot[1]:.3f}, "
        f"handle Y: {door_handle_pos_robot[1]:.3f})"
    )

    # Set base class properties for compatibility with shared methods
    self.arm_side = selected_arm
    self.planner_joint_ranges = self._get_planner_joint_ranges()

    return selected_arm
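The homogeneous-coordinate geometry above (transform world points into the robot base frame, then compare lateral Y offsets of hinge and handle) can be sketched as follows; `world_to_robot` is a hypothetical helper name:

```python
import numpy as np

# Sketch of the arm-selection geometry in select_arm_for_opening: express the
# door hinge and handle in the robot base frame, then compare Y components.
def world_to_robot(base_pose_tf, point_world):
    # inverse base pose times homogeneous world point -> point in robot frame
    return (np.linalg.inv(base_pose_tf) @ np.array([*point_world, 1.0]))[:3]

# Robot at the origin facing +X; hinge to the robot's left of the handle.
base_pose = np.eye(4)
hinge = world_to_robot(base_pose, [1.0, 0.5, 1.0])
handle = world_to_robot(base_pose, [1.0, -0.3, 1.0])
arm = "left" if hinge[1] > handle[1] else "right"
print(arm)  # -> left
```

Intuitively: if the hinge sits to the robot's left of the handle, the door swings open toward the left arm's workspace, so the left arm is selected.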
solve_ik
solve_ik(target_pose: ndarray) -> None

Solve inverse kinematics for a target pose and create interpolated trajectory.

Parameters:

Name Type Description Default
target_pose ndarray

4x4 transformation matrix for target end-effector pose in world frame.

required

Raises:

Type Description
ValueError

If IK solution cannot be found.

Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
def solve_ik(self, target_pose: np.ndarray) -> None:
    """Solve inverse kinematics for a target pose and create interpolated trajectory.

    Args:
        target_pose: 4x4 transformation matrix for target end-effector pose in world frame.

    Raises:
        ValueError: If IK solution cannot be found.
    """
    init_config = np.concatenate(
        [
            np.zeros(3),
            self.task.env.current_robot.robot_view.get_move_group(
                f"{self.arm_side}_arm"
            ).joint_pos,
        ]
    )

    target_pose_7d = self._target_pose_to_base_frame(target_pose)
    joint_config, _ = self.planner.ik_solve(
        goal_pose=target_pose_7d.tolist(),
        seed_config=init_config.tolist(),
        disable_collision=True,
    )

    current_phase = getattr(self, "current_phase", "unknown")
    if joint_config is None:
        raise ValueError(f"Could not solve {current_phase} phase IK.")

    trajectory = self._interpolate_joint_trajectory(
        init_config,
        np.array(joint_config),
        num_steps=10,
    )
    self._transform_traj_to_world_frame(trajectory)
    self.planned_trajectory = trajectory
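The interpolation step at the end of `solve_ik` calls the private `_interpolate_joint_trajectory` helper, whose body is not shown here; a plausible linear version, under that assumption, looks like this:

```python
import numpy as np

# A plausible linear sketch of _interpolate_joint_trajectory (the real helper
# is private, so this is an assumption): num_steps waypoints stepping from
# just past the start configuration up to and including the goal.
def interpolate_joint_trajectory(start, goal, num_steps=10):
    start, goal = np.asarray(start, float), np.asarray(goal, float)
    return [
        (start + (goal - start) * (t / num_steps)).tolist()
        for t in range(1, num_steps + 1)
    ]

traj = interpolate_joint_trajectory([0.0, 0.0], [1.0, -2.0], num_steps=4)
print(len(traj))  # -> 4
print(traj[-1])   # -> [1.0, -2.0]
```

Linear joint-space interpolation is a reasonable fallback when full motion planning is skipped (collision checking is also disabled in the IK call above), but it does not avoid obstacles along the way.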
visualize_world_config_mesh
visualize_world_config_mesh(world_cfg: WorldConfig) -> None

Visualize the world configuration as a mesh file.

Parameters:

Name Type Description Default
world_cfg WorldConfig

Curobo WorldConfig to visualize.

required
Source code in molmo_spaces/policy/solvers/curobo_planner_policy.py
def visualize_world_config_mesh(self, world_cfg: WorldConfig) -> None:
    """Visualize the world configuration as a mesh file.

    Args:
        world_cfg: Curobo WorldConfig to visualize.
    """
    from datetime import datetime

    import trimesh
    from curobo.geom.types import WorldConfig

    current_phase = getattr(self, "current_phase", "unknown")
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    file_name = f"world_config_mesh_{current_phase}_{timestamp}"
    obj_path = f"{file_name}.obj"
    initial_joint_configuration = np.array(self._get_current_joint_positions())

    try:
        # Create a combined scene with robot and world obstacles
        combined_scene = trimesh.scene.scene.Scene(base_frame="world_origin")

        # Add robot mesh if we have a joint configuration
        if initial_joint_configuration is not None and self.planner is not None:
            try:
                config_for_mesh = np.array(initial_joint_configuration).copy()
                # Zero out base joints to place robot at origin
                config_for_mesh[:3] = 0.0
                q_tensor = torch.tensor(config_for_mesh).unsqueeze(0).float().cuda()

                robot_spheres_batch = self.planner.motion_gen.kinematics.get_robot_as_spheres(
                    q_tensor
                )
                if robot_spheres_batch and len(robot_spheres_batch) > 0:
                    robot_spheres = robot_spheres_batch[0]
                    for i, sphere in enumerate(robot_spheres):
                        sphere_mesh = trimesh.creation.icosphere(radius=sphere.radius)
                        sphere_transform = np.eye(4)
                        sphere_transform[:3, 3] = sphere.pose[:3]
                        combined_scene.add_geometry(
                            sphere_mesh,
                            geom_name=f"robot_sphere_{i}",
                            parent_node_name="world_origin",
                            transform=sphere_transform,
                        )
                    log.info(
                        f"Added {len(robot_spheres)} robot collision spheres to visualization"
                    )

            except Exception as e:
                log.warning(f"Could not add robot mesh: {e}")

        # Add world obstacles to the scene
        try:
            world_scene = WorldConfig.get_scene_graph(world_cfg, process_color=True)
            for geom_name, geom in world_scene.geometry.items():
                transform = world_scene.graph.get(geom_name)[0]
                combined_scene.add_geometry(
                    geom,
                    geom_name=geom_name,
                    parent_node_name="world_origin",
                    transform=transform,
                )
        except Exception as e:
            log.warning(f"Could not add world obstacles: {e}")

        # Export combined scene
        if len(combined_scene.geometry) > 0:
            combined_scene.export(obj_path)
            log.info(f"Successfully saved combined robot + world mesh to {obj_path}")
        else:
            log.debug("Skipping mesh export - scene is empty")

    except ValueError as e:
        if "empty scene" in str(e).lower():
            log.debug("Skipping mesh export - world config is empty")
        else:
            raise