utils
¶
Modules:
| Name | Description |
|---|---|
| articulation_utils | |
| asset_names | |
| benchmark_utils | Benchmark utilities for kinematics outlier detection in trajectory H5 files. |
| camera_utils | |
| constants | |
| controller_utils | |
| depth_utils | Utilities for depth image encoding and decoding. |
| devices | |
| distance_transform_utils | |
| eval_camera_randomization_utils | Level → value scaling for camera and light randomization. |
| eval_utils | Evaluation utilities for logging stats and videos to wandb. |
| fisheye_warping | GPU-accelerated fisheye lens distortion warping for camera images. |
| function_utils | |
| grasp_sample | This module contains functionality for filtering and sampling grasps based on heuristics. |
| grasps | This module contains functionality for loading grasps from registered grasp libraries. |
| lazy_loading_utils | |
| lemma_utils | |
| license_utils | |
| linalg_utils | |
| mj_model_and_data_utils | |
| mp_logging | |
| mujoco_scene_utils | |
| object_metadata | |
| object_retriever | |
| patch_renderer_flags | Import this module to configure the renderer flags for the current platform. |
| pose | |
| profiler_utils | |
| rendering_utils | |
| sampler_utils | |
| save_utils | |
| scene_maps | |
| scene_metadata_utils | |
| spatial_utils | Quaternions are assumed to be scalar first! |
| synset_utils | |
| task_relevant_objects_and_workspace_utils | Derive task-relevant object names and workspace center from task config fields. |
| test_utils | Shared utilities for data generation tests (Franka, RUM, etc.). |
| video_utils | Copied from video2sim_pipeline/video2sim/utils/video_utils.py |
articulation_utils
¶
Functions:
| Name | Description |
|---|---|
| gather_joint_info | |
| step_circular_path | joint_info: |
| step_linear_path | |
| visualize_path | Comprehensive visualization of the gripper base path and finger center arc. |

Attributes:

| Name | Type | Description |
|---|---|---|
| GRIPPER_LENGTH | | |
gather_joint_info
¶
Source code in molmo_spaces/utils/articulation_utils.py
step_circular_path
¶
step_circular_path(current_pos, current_quat, joint_info, max_joint_angle, n_waypoints=10, gripper_length=0)
joint_info:
- joint_body_position
- joint_axis
- joint_body_orientation
- joint_position
- joint_range
- joint_pos
Source code in molmo_spaces/utils/articulation_utils.py
step_linear_path
¶
step_linear_path(to_handle_dist, current_pos, current_quat, step_size, is_reverse=False, gripper_length=0)
Source code in molmo_spaces/utils/articulation_utils.py
visualize_path
¶
visualize_path(path, title='Gripper Base Path Visualization', save_path=None, joint_position=None, show_finger_center=True)
Comprehensive visualization of the gripper base path and finger center arc.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | | Dictionary with 'mocap_pos' and 'mocap_quat' lists (representing gripper base positions) | required |
| title | | Title for the plot | 'Gripper Base Path Visualization' |
| save_path | | Optional path to save the plot | None |
| joint_position | | Optional joint position to visualize | None |
| show_finger_center | | If True, also show the finger center arc for reference | True |
Source code in molmo_spaces/utils/articulation_utils.py
asset_names
¶
Functions:
| Name | Description |
|---|---|
| get_child_body_ids | |
| get_child_body_names | |
| get_thor_name | |
get_child_body_ids
¶
Source code in molmo_spaces/utils/asset_names.py
get_child_body_names
¶
get_thor_name
¶
Source code in molmo_spaces/utils/asset_names.py
benchmark_utils
¶
Benchmark utilities for kinematics outlier detection in trajectory H5 files.
NOTE ON TEMPORAL INDEXING: All sensor arrays (qpos, cmd, jpr) share the same index space, but within a step() the controller target is set before physics runs, so qpos[t] only partially converges toward cmd[t] (~30% per step). We therefore use qpos[t+1] instead of qpos[t] to measure tracking quality:

    tracking_error[t] = qpos[t+1] - cmd[t]
    relative_tracking_error[t] = (qpos[t+1] - cmd[t]) / jpr[t]

NOTE ON COLLISION DETECTION: A better way to detect collisions would be via residual torques (measured minus expected from rigid-body dynamics), but no torque sensors are currently recorded. Tracking error is a reasonable proxy, since collisions cause position deviations, though it is less sensitive than torques.
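The shifted indexing can be illustrated with a minimal sketch. It is not the module's implementation: the function name and the sample values are hypothetical, plain Python lists stand in for the recorded arrays, and jpr is simply treated as the per-step normalizer from the formula above.

```python
def relative_tracking_error(qpos, cmd, jpr):
    """Per-step relative tracking error for one joint dimension.

    qpos, cmd and jpr are equal-length sequences sharing one index space.
    The result has len(qpos) - 1 entries because step t reads qpos[t + 1].
    """
    if not (len(qpos) == len(cmd) == len(jpr)):
        raise ValueError("qpos, cmd and jpr must have the same length")
    return [
        (qpos[t + 1] - cmd[t]) / jpr[t]
        for t in range(len(qpos) - 1)
    ]


# Hypothetical single-joint trajectory (values chosen for illustration only).
qpos = [0.0, 0.1, 0.2, 0.25]
cmd = [0.1, 0.2, 0.3, 0.3]
jpr = [0.1, 0.1, 0.1, 0.1]
errors = relative_tracking_error(qpos, cmd, jpr)
```

In this toy trajectory the joint reaches the first two targets one step later, so those errors are zero, while the third entry is negative because the joint falls short of its target.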
Functions:
| Name | Description |
|---|---|
| compute_bounds_std | |
| episodes_with_kinematics_outliers | Find episodes with kinematics outliers across H5 trajectory files. |
| resolve_asset_id | Resolve the asset ID (UID) for a task object by name. |
| save_outlier_gifs | Save GIFs of merged outlier segments for visual inspection. |
| save_signal_histograms | Collect raw relative tracking error values and save histograms as PNGs. |

Attributes:

| Name | Type | Description |
|---|---|---|
| THOR_CAT_SIMPLIFY | | |
| log | | |
THOR_CAT_SIMPLIFY
module-attribute
¶
THOR_CAT_SIMPLIFY = {'saltshaker': 's/p shaker', 'peppershaker': 's/p shaker', 'tomato': 'fruit', 'apple': 'fruit', 'butterknife': 'knife', 'boiler': 'kettle', 'winebottle': 'bottle', 'atomizer': 'spray bottle', 'remotecontrol': 'remote control', 'soapdispenser': 'soap dispenser', 'tissuepaper': 'tissue paper'}
compute_bounds_std
¶
compute_bounds_std(stats: dict, std_mult: float) -> dict[tuple[str, int], tuple[float, float, float, float]]
Source code in molmo_spaces/utils/benchmark_utils.py
episodes_with_kinematics_outliers
¶
episodes_with_kinematics_outliers(data_path: Path | str, max_files: int | None = None, num_workers: int = 32, std_mult: float = 8.0, action_groups: Collection[str] = ('arm',), skip_first_n_steps: int = 1, min_joint_pos_rel_magnitude: float = 0.015, std_mult_clip_sigma: float = 4.0, std_mult_negative_only: bool = True, print_stats: bool = False) -> tuple[list[dict], dict]
Find episodes with kinematics outliers across H5 trajectory files.
Uses relative tracking error (qpos[t+1] - cmd[t]) / jpr[t] as the
outlier signal (see NOTE ON TEMPORAL INDEXING in module docstring).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data_path | Path \| str | Root directory containing house_/trajectories.h5 files. | required |
| max_files | int \| None | Cap on the number of H5 files to process (for testing). | None |
| num_workers | int | Parallel workers for stats collection and outlier detection. | 32 |
| std_mult | float | Number of standard deviations beyond which a value is an outlier. | 8.0 |
| action_groups | Collection[str] | Which move-group keys to examine (e.g. | ('arm',) |
| skip_first_n_steps | int | Ignore the first N timesteps of each trajectory. | 1 |
| min_joint_pos_rel_magnitude | float | Minimum absolute | 0.015 |
| std_mult_clip_sigma | float | Number of sigmas for iterative sigma-clipping to robustly estimate the spread. | 4.0 |
| std_mult_negative_only | bool | When True, only flag values below the lower bound (undershooting). Positive overshooting is accepted. | True |
| print_stats | bool | Print per-dimension statistics. | False |

Returns:

| Type | Description |
|---|---|
| tuple[list[dict], dict] | Tuple of |
Source code in molmo_spaces/utils/benchmark_utils.py
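The interplay of std_mult, std_mult_clip_sigma and std_mult_negative_only can be sketched roughly as follows. This is an illustration under assumptions, not the library's code: the helper names, the iteration cap and the use of the population standard deviation are all hypothetical choices.

```python
import statistics


def sigma_clipped_bounds(values, std_mult=8.0, clip_sigma=4.0, max_iter=10):
    """Estimate (lower, upper) outlier bounds from a sigma-clipped mean and std."""
    kept = list(values)
    for _ in range(max_iter):  # max_iter is an assumed safety cap
        mean = statistics.fmean(kept)
        std = statistics.pstdev(kept)
        if std == 0.0:
            break
        clipped = [v for v in kept if abs(v - mean) <= clip_sigma * std]
        if len(clipped) == len(kept):
            break  # nothing was removed, so the estimate has converged
        kept = clipped
    mean = statistics.fmean(kept)
    std = statistics.pstdev(kept)
    return mean - std_mult * std, mean + std_mult * std


def flag_outliers(values, lower, upper, negative_only=True):
    """Return the indices of values outside the bounds."""
    if negative_only:
        # Only undershooting is flagged; positive overshooting is accepted.
        return [i for i, v in enumerate(values) if v < lower]
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical signal: small tracking noise plus one large undershoot.
signal = [0.0, 0.01, -0.01] * 20 + [-5.0]
lower, upper = sigma_clipped_bounds(signal)
outlier_indices = flag_outliers(signal, lower, upper)
```

Clipping first keeps a single extreme value from inflating the spread estimate, which would otherwise widen the bounds enough to hide the very outlier being looked for.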
resolve_asset_id
¶
resolve_asset_id(object_name: str, task_config, scene_dataset: str | None = None, data_split: str | None = None, house_index: int | None = None) -> str | None
Resolve the asset ID (UID) for a task object by name.
Tries two strategies in order:
1. added_objects (frozen config / JSON benchmark): If the object was dynamically added to the scene (e.g. a place receptacle), its XML path is stored in task_config.added_objects. The UID is the stem of the XML filename (<uid>.xml).
2. Scene metadata (via SceneMeta): For objects that are part of the base scene (e.g. pickup objects), the asset_id is looked up from the scene's *_metadata.json using scene_dataset, data_split, and house_index.
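The first strategy can be sketched as below. This is a hedged illustration only: it assumes added_objects behaves like a mapping from object names to XML paths, which the text above does not confirm, and both helper names and the sample path are hypothetical.

```python
from pathlib import Path


def uid_from_xml_path(xml_path):
    """Return the asset UID, taken as the stem of an XML filename (<uid>.xml)."""
    return Path(xml_path).stem


def resolve_from_added_objects(object_name, added_objects):
    """Look the object up among dynamically added objects.

    ASSUMPTION: added_objects maps object names to XML paths.
    Returns None when the object is absent, so that a caller can fall back
    to the scene-metadata strategy.
    """
    xml_path = added_objects.get(object_name)
    if xml_path is None:
        return None
    return uid_from_xml_path(xml_path)


# Hypothetical example data.
added = {"plate_1": "assets/objects/abc123.xml"}
uid = resolve_from_added_objects("plate_1", added)
missing = resolve_from_added_objects("mug_7", added)
```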
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| object_name | str | MuJoCo body name of the object (e.g. | required |
| task_config | | A task config object (e.g. | required |
| scene_dataset | str \| None | Scene dataset name (e.g. | None |
| data_split | str \| None | Data split (e.g. | None |
| house_index | int \| None | House index within the dataset/split. Required for the SceneMeta fallback. | None |

Returns:

| Type | Description |
|---|---|
| str \| None | The asset UID string, or |
Source code in molmo_spaces/utils/benchmark_utils.py
save_outlier_gifs
¶
save_outlier_gifs(outlier_episodes: list[dict], output_dir: Path | str, merge_gap: int = 20, context_frames: int = 5, camera_preference: tuple[str, ...] = ('exo_camera_1', 'wrist_camera'), fps_gif: float = 5.0, sample_rate: float = 0.1, max_samples: int = 50) -> int
Save GIFs of merged outlier segments for visual inspection.
Outlier timesteps within merge_gap frames of each other are merged into a single segment (iteratively, until no more merges are possible). Each segment is then padded by context_frames on both sides and saved as one GIF.
Filename convention:

    {house}_{batch}_traj{idx}_f{start}-{end}_{max_std:.1f}std_{signal}.gif
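The merging, padding and naming steps described above might look roughly like the following sketch. The helper names are hypothetical, "within merge_gap frames" is assumed to be inclusive, and clamping padded segments to the valid frame range is an assumption rather than documented behavior.

```python
def merge_outlier_segments(timesteps, merge_gap=20, context_frames=5, num_frames=None):
    """Merge nearby outlier timesteps into padded, inclusive (start, end) segments."""
    segments = []
    for t in sorted(set(timesteps)):
        if segments and t - segments[-1][1] <= merge_gap:
            segments[-1][1] = t  # close enough: extend the current segment
        else:
            segments.append([t, t])  # too far away: start a new segment
    padded = []
    for start, end in segments:
        start = max(0, start - context_frames)
        end = end + context_frames
        if num_frames is not None:
            end = min(num_frames - 1, end)  # assumed clamp to the last frame
        padded.append((start, end))
    return padded


def gif_name(house, batch, idx, start, end, max_std, signal):
    """Format a filename following the convention shown above."""
    return f"{house}_{batch}_traj{idx}_f{start}-{end}_{max_std:.1f}std_{signal}.gif"


# Hypothetical outlier timesteps in a 303-frame trajectory.
segments = merge_outlier_segments([3, 10, 100, 115, 300], num_frames=303)
name = gif_name("house_12", "batch_0", 3, 95, 120, 9.26, "arm_2")
```

Processing the timesteps in sorted order makes a single pass equivalent to merging repeatedly until no further merges are possible. Padding is applied after merging, so two padded segments may still overlap in this sketch.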
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| outlier_episodes | list[dict] | Output of :func: | required |
| output_dir | Path \| str | Directory to write GIF files into (created if needed). | required |
| merge_gap | int | Maximum frame distance between two outlier timesteps for them to be merged into the same segment. | 20 |
| context_frames | int | Number of extra frames to include before the first and after the last outlier in each segment. | 5 |
| camera_preference | tuple[str, ...] | Ordered list of camera names to try when looking for a video reference inside the H5 file. | ('exo_camera_1', 'wrist_camera') |
| fps_gif | float | Playback speed of the output GIF (frames per second). | 5.0 |
| sample_rate | float | Fraction of examples to render. | 0.1 |
| max_samples | int | Absolute maximum number of samples to save (only applied if sample_rate < 1.0). | 50 |

Returns:

| Type | Description |
|---|---|
| int | Number of GIF files written. |
Source code in molmo_spaces/utils/benchmark_utils.py
save_signal_histograms
¶
save_signal_histograms(data_path: Path | str, output_dir: Path | str, bounds: dict[tuple[str, int], tuple[float, float, float, float]] | None = None, action_groups: Collection[str] = ('arm',), skip_first_n_steps: int = 1, max_files: int | None = None, num_workers: int = 32, min_joint_pos_rel_magnitude: float = 0.015) -> None
Collect raw relative tracking error values and save histograms as PNGs.
Generates one figure per eps threshold level, each with one subplot per joint dimension. If bounds is provided the outlier thresholds are drawn as vertical lines.
Also generates two joint-histogram (contour) plots of joint_pos_rel
vs relative tracking error: one with a tight x-range and one with a
wide range showing the chosen min_joint_pos_rel_magnitude threshold.
Source code in molmo_spaces/utils/benchmark_utils.py
camera_utils
¶
Functions:
| Name | Description |
|---|---|
| erode_segmentation_mask | Apply binary erosion to a segmentation mask. |
| normalize_points | Normalize image points to 0-1 range, optionally applying distortion correction. |
erode_segmentation_mask
¶
Apply binary erosion to a segmentation mask.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| mask | ndarray | Binary segmentation mask | required |
| iterations | int | Number of erosion iterations | 2 |

Returns:

| Type | Description |
|---|---|
| ndarray | Eroded binary mask |
Source code in molmo_spaces/utils/camera_utils.py
normalize_points
¶
normalize_points(points: ndarray, img_width: int, img_height: int, distortion_map: ndarray | None = None) -> ndarray
Normalize image points to 0-1 range, optionally applying distortion correction.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| points | ndarray | Array of shape (N, 2) containing (x, y) pixel coordinates | required |
| img_width | int | Image width in pixels | required |
| img_height | int | Image height in pixels | required |
| distortion_map | ndarray \| None | Optional distortion map for warped cameras (e.g., GoPro). Currently not implemented; will be added in the future. | None |

Returns:

| Type | Description |
|---|---|
| ndarray | Normalized points in 0-1 range as array of shape (N, 2) |
Source code in molmo_spaces/utils/camera_utils.py
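For the undistorted case, the normalization can be pictured with the stand-in below. It is a sketch, not the function itself: it assumes x is divided by the image width and y by the image height, and it uses plain tuples instead of an (N, 2) ndarray so that it runs without dependencies. The name normalize_points_sketch is hypothetical.

```python
def normalize_points_sketch(points, img_width, img_height):
    """Map (x, y) pixel coordinates to the 0-1 range.

    ASSUMPTION: x is scaled by img_width and y by img_height. Distortion
    correction is left out, matching the "not implemented" note above.
    """
    if img_width <= 0 or img_height <= 0:
        raise ValueError("image dimensions must be positive")
    return [(x / img_width, y / img_height) for x, y in points]


# Corners and center of a hypothetical 1280x720 image.
normalized = normalize_points_sketch([(0, 0), (640, 360), (1280, 720)], 1280, 720)
```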
constants
¶
Modules:
| Name | Description |
|---|---|
| camera_constants | Camera hardware constants for fisheye warping and image processing. |
| object_constants | |
| simulation_constants | |
camera_constants
¶
Camera hardware constants for fisheye warping and image processing.
Attributes:
| Name | Type | Description |
|---|---|---|
| DEFAULT_CROP_PERCENT | | |
| DEFAULT_DISTORTION_PARAMETERS | | |
| GOPRO_CAMERA_HEIGHT | | |
| GOPRO_CAMERA_WIDTH | | |
| GOPRO_VERTICAL_FOV | | |
| MODEL_43_HEIGHT | | |
| MODEL_43_WIDTH | | |
| NULL_DISTORTION_PARAMETERS | | |
object_constants
¶
Functions:
| Name | Description |
|---|---|
| bad_asset_ids | |
Attributes:
AI2THOR_OBJECT_TYPE_TO_MOST_SPECIFIC_WORDNET_LEMMA
module-attribute
¶
AI2THOR_OBJECT_TYPE_TO_MOST_SPECIFIC_WORDNET_LEMMA = {'AlarmClock': 'alarm_clock', 'AluminumFoil': 'aluminum_foil', 'Apple': 'apple', 'AppleSliced': 'apple', 'ArmChair': 'armchair', 'BaseballBat': 'baseball_bat', 'BasketBall': 'basketball', 'Bathtub': 'bathtub', 'BathtubBasin': 'bathtub', 'Bed': 'bed', 'Blinds': 'blind', 'Book': 'book', 'Boots': 'boot', 'Bottle': 'bottle', 'Bowl': 'bowl', 'Box': 'box', 'Bread': 'bread', 'BreadSliced': 'bread', 'ButterKnife': 'butter_knife', 'CD': 'compact_disk', 'Cabinet': 'cabinet', 'Candle': 'candle', 'Cart': 'handcart', 'CellPhone': 'cellular_telephone', 'Chair': 'chair', 'Cloth': 'fabric', 'ClothesDryer': 'clothes_dryer', 'CoffeeMachine': 'coffee_maker', 'CoffeeTable': 'coffee_table', 'CounterTop': 'countertop', 'CreditCard': 'credit_card', 'Cup': 'cup', 'Curtains': 'curtain', 'Desk': 'desk', 'DeskLamp': 'table_lamp', 'Desktop': 'desktop_computer', 'DiningTable': 'dining_table', 'DishSponge': 'sponge', 'DogBed': 'pad', 'Doorframe': 'doorframe', 'Doorway': 'doorway', 'Drawer': 'drawer', 'Dresser': 'chest_of_drawers', 'Dumbbell': 'dumbbell', 'Egg': 'egg', 'EggCracked': 'egg', 'Faucet': 'faucet', 'Floor': 'flooring', 'FloorLamp': 'floor_lamp', 'Footstool': 'footstool', 'Fork': 'fork', 'Fridge': 'refrigerator', 'GarbageBag': 'bin_liner', 'GarbageCan': 'ashcan', 'HandTowel': 'hand_towel', 'HandTowelHolder': 'towel_rack', 'HousePlant': 'houseplant', 'Kettle': 'boiler', 'KeyChain': 'key_ring', 'Knife': 'knife', 'Ladle': 'ladle', 'Laptop': 'laptop', 'LaundryHamper': 'clothes_hamper', 'Lettuce': 'lettuce', 'LettuceSliced': 'lettuce', 'LightSwitch': 'electric_switch', 'Microwave': 'microwave_oven', 'Mirror': 'mirror', 'Mug': 'mug', 'Newspaper': 'newspaper', 'Ottoman': 'pouffe', 'Painting': 'painting', 'Pan': 'cooking_pan', 'PaperTowelRoll': 'paper_towel', 'Pen': 'pen', 'Pencil': 'pencil', 'PepperShaker': 'pepper_shaker', 'Pillow': 'pillow', 'Plate': 'plate', 'Plunger': "plumber's_helper", 'Poster': 'placard', 'Pot': 'pot', 'Potato': 
'Irish_potato', 'PotatoSliced': 'Irish_potato', 'RemoteControl': 'remote_control', 'RoomDecor': 'decoration', 'Safe': 'safe', 'SaltShaker': 'saltshaker', 'ScrubBrush': 'scrub_brush', 'Shelf': 'shelf', 'ShelvingUnit': 'shelf', 'ShowerCurtain': 'shower_curtain', 'ShowerDoor': 'door', 'ShowerGlass': 'door', 'ShowerHead': 'showerhead', 'SideTable': 'stand', 'Sink': 'sink', 'SinkBasin': 'sink', 'SoapBar': 'bar_soap', 'SoapBottle': 'soap_dispenser', 'Sofa': 'sofa', 'Spatula': 'spatula', 'Spoon': 'spoon', 'SprayBottle': 'atomizer', 'Statue': 'statue', 'Stool': 'stool', 'StoveBurner': 'burner', 'StoveKnob': 'knob', 'TVStand': 'stand', 'TableTopDecor': 'knickknack', 'TeddyBear': 'teddy_bear', 'Television': 'television', 'TennisRacket': 'tennis_racket', 'TissueBox': 'tissue_paper', 'Toaster': 'toaster', 'Toilet': 'crapper', 'ToiletPaper': 'toilet_tissue', 'ToiletPaperHanger': 'hanger', 'Tomato': 'tomato', 'TomatoSliced': 'tomato', 'Towel': 'towel', 'TowelHolder': 'towel_rack', 'VacuumCleaner': 'vacuum_cleaner', 'Vase': 'vase', 'Wall': 'wall', 'WashingMachine': 'automatic_washer', 'Watch': 'watch', 'WateringCan': 'watering_can', 'Window': 'window', 'WineBottle': 'wine_bottle'}
AI2THOR_OBJECT_TYPE_TO_WORDNET_SYNSET
module-attribute
¶
AI2THOR_OBJECT_TYPE_TO_WORDNET_SYNSET = {'AlarmClock': 'alarm_clock.n.01', 'AluminumFoil': 'aluminum_foil.n.01', 'Apple': 'apple.n.01', 'AppleSliced': 'apple.n.01', 'ArmChair': 'armchair.n.01', 'BaseballBat': 'baseball_bat.n.01', 'BasketBall': 'basketball.n.02', 'Bathtub': 'bathtub.n.01', 'BathtubBasin': 'bathtub.n.01', 'Bed': 'bed.n.01', 'Blinds': 'blind.n.03', 'Book': 'book.n.02', 'Boots': 'boot.n.01', 'Bottle': 'bottle.n.01', 'Bowl': 'bowl.n.03', 'Box': 'carton.n.02', 'Bread': 'bread.n.01', 'BreadSliced': 'bread.n.01', 'ButterKnife': 'butter_knife.n.01', 'CD': 'compact_disk.n.01', 'Cabinet': 'cabinet.n.01', 'Candle': 'candle.n.01', 'Cart': 'handcart.n.01', 'CellPhone': 'cellular_telephone.n.01', 'Chair': 'straight_chair.n.01', 'Cloth': 'fabric.n.01', 'ClothesDryer': 'clothes_dryer.n.01', 'CoffeeMachine': 'coffee_maker.n.01', 'CoffeeTable': 'coffee_table.n.01', 'CounterTop': 'countertop.n.01', 'CreditCard': 'credit_card.n.01', 'Cup': 'cup.n.01', 'Curtains': 'curtain.n.01', 'Desk': 'desk.n.01', 'DeskLamp': 'table_lamp.n.01', 'Desktop': 'desktop_computer.n.01', 'DiningTable': 'dining_table.n.01', 'DishSponge': 'sponge.n.01', 'DogBed': 'pad.n.04', 'Doorframe': 'doorframe.n.01', 'Doorway': 'doorway.n.01', 'Drawer': 'drawer.n.01', 'Dresser': 'chest_of_drawers.n.01', 'Dumbbell': 'dumbbell.n.01', 'Egg': 'egg.n.02', 'EggCracked': 'egg.n.02', 'Faucet': 'faucet.n.01', 'Floor': 'floor.n.01', 'FloorLamp': 'floor_lamp.n.01', 'Footstool': 'footstool.n.01', 'Fork': 'fork.n.01', 'Fridge': 'refrigerator.n.01', 'GarbageBag': 'bin_liner.n.01', 'GarbageCan': 'ashcan.n.01', 'HandTowel': 'hand_towel.n.01', 'HandTowelHolder': 'towel_rack.n.01', 'HousePlant': 'houseplant.n.01', 'Kettle': 'kettle.n.01', 'KeyChain': 'key_ring.n.01', 'Knife': 'knife.n.01', 'Ladle': 'ladle.n.01', 'Laptop': 'laptop.n.01', 'LaundryHamper': 'clothes_hamper.n.01', 'Lettuce': 'lettuce.n.03', 'LettuceSliced': 'lettuce.n.03', 'LightSwitch': 'switch.n.01', 'Microwave': 'microwave.n.02', 'Mirror': 'mirror.n.01', 
'Mug': 'mug.n.04', 'Newspaper': 'newspaper.n.03', 'Ottoman': 'footstool.n.01', 'Painting': 'painting.n.01', 'Pan': 'pan.n.01', 'PaperTowelRoll': 'paper_towel.n.01', 'Pen': 'pen.n.01', 'Pencil': 'pencil.n.01', 'PepperShaker': 'pepper_shaker.n.01', 'Pillow': 'pillow.n.01', 'Plate': 'plate.n.04', 'Plunger': 'plunger.n.03', 'Poster': 'poster.n.01', 'Pot': 'pot.n.01', 'Potato': 'potato.n.01', 'PotatoSliced': 'potato.n.01', 'RemoteControl': 'remote_control.n.01', 'RoomDecor': 'decoration.n.01', 'Safe': 'safe.n.01', 'SaltShaker': 'saltshaker.n.01', 'ScrubBrush': 'scrub_brush.n.01', 'Shelf': 'shelf.n.01', 'ShelvingUnit': 'shelf.n.01', 'ShowerCurtain': 'shower_curtain.n.01', 'ShowerDoor': 'door.n.01', 'ShowerGlass': 'door.n.01', 'ShowerHead': 'showerhead.n.01', 'SideTable': 'stand.n.04', 'Sink': 'sink.n.01', 'SinkBasin': 'sink.n.01', 'SoapBar': 'bar_soap.n.01', 'SoapBottle': 'soap_dispenser.n.01', 'Sofa': 'sofa.n.01', 'Spatula': 'spatula.n.01', 'Spoon': 'spoon.n.01', 'SprayBottle': 'atomizer.n.01', 'Statue': 'statue.n.01', 'Stool': 'stool.n.01', 'StoveBurner': 'burner.n.02', 'StoveKnob': 'knob.n.02', 'TVStand': 'stand.n.04', 'TableTopDecor': 'knickknack.n.01', 'TeddyBear': 'teddy.n.01', 'Television': 'television_receiver.n.01', 'TennisRacket': 'tennis_racket.n.01', 'TissueBox': 'tissue.n.02', 'Toaster': 'toaster.n.02', 'Toilet': 'toilet.n.02', 'ToiletPaper': 'toilet_tissue.n.01', 'ToiletPaperHanger': 'hanger.n.02', 'Tomato': 'tomato.n.01', 'TomatoSliced': 'tomato.n.01', 'Towel': 'towel.n.01', 'TowelHolder': 'towel_rack.n.01', 'VacuumCleaner': 'vacuum.n.04', 'Vase': 'vase.n.01', 'Wall': 'wall.n.01', 'WashingMachine': 'washer.n.03', 'Watch': 'watch.n.01', 'WateringCan': 'watering_can.n.01', 'Window': 'window.n.01', 'WineBottle': 'wine_bottle.n.01'}
ALL_ARTICULATION_TYPES_THOR
module-attribute
¶
ALL_ARTICULATION_TYPES_THOR = ['Toilet', 'Dresser', 'Safe', 'Shelving_Unit', 'ShelvingUnit', 'Side_Table', 'SideTable', 'Fridge', 'Microwave', 'Coffee_Table', 'CoffeeTable', 'Desk', 'Laptop', 'Doorways', 'Laundry_Hamper', 'LaundryHamper']
ALL_PICKUP_SYNSETS
module-attribute
¶
ALL_PICKUP_SYNSETS = [(AI2THOR_OBJECT_TYPE_TO_WORDNET_SYNSET[ot]) for ot in ALL_PICKUP_TYPES_THOR]
ALL_PICKUP_TYPES_THOR
module-attribute
¶
ALL_PICKUP_TYPES_THOR = ['AlarmClock', 'AluminumFoil', 'Apple', 'AppleSliced', 'Book', 'Boots', 'Bottle', 'Bowl', 'Box', 'Bread', 'BreadSliced', 'ButterKnife', 'Candle', 'CD', 'CellPhone', 'Cloth', 'CreditCard', 'Cup', 'DishSponge', 'Dumbbell', 'Egg', 'EggCracked', 'Fork', 'HandTowel', 'Kettle', 'KeyChain', 'Knife', 'Ladle', 'Laptop', 'Lettuce', 'LettuceSliced', 'Mug', 'Newspaper', 'Pan', 'PaperTowelRoll', 'Pen', 'Pencil', 'PepperShaker', 'Pillow', 'Plate', 'Plunger', 'Pot', 'Potato', 'PotatoSliced', 'RemoteControl', 'SaltShaker', 'ScrubBrush', 'SoapBar', 'SoapBottle', 'Spatula', 'Spoon', 'SprayBottle', 'Statue', 'TableTopDecor', 'TeddyBear', 'TennisRacket', 'TissueBox', 'ToiletPaper', 'Tomato', 'TomatoSliced', 'Towel', 'Vase', 'Watch', 'WateringCan', 'WineBottle']
BOOLSET_OBJECT_TYPES
module-attribute
¶
BOOLSET_OBJECT_TYPES = {'AlarmClock', 'Apple', 'ArmChair', 'BasketBall', 'Bed', 'Book', 'Boots', 'Bottle', 'Bowl', 'Box', 'Bread', 'ButterKnife', 'CD', 'Cabinet', 'Candle', 'Cart', 'CellPhone', 'Chair', 'Cloth', 'ClothesDryer', 'CoffeeMachine', 'CoffeeTable', 'CounterTop', 'CreditCard', 'Cup', 'Desk', 'DeskLamp', 'Desktop', 'DiningTable', 'DishSponge', 'DogBed', 'Drawer', 'Dresser', 'Dumbbell', 'Egg', 'Faucet', 'FloorLamp', 'Fork', 'Fridge', 'GarbageBag', 'GarbageCan', 'HousePlant', 'Kettle', 'KeyChain', 'Knife', 'Ladle', 'Laptop', 'LaundryHamper', 'Lettuce', 'Microwave', 'Mug', 'Newspaper', 'Ottoman', 'Painting', 'Pan', 'PaperTowelRoll', 'Pen', 'Pencil', 'PepperShaker', 'Pillow', 'Plate', 'Plunger', 'Pot', 'Potato', 'RemoteControl', 'Safe', 'SaltShaker', 'Shelf', 'ShelvingUnit', 'SideTable', 'Sink', 'SinkBasin', 'SoapBar', 'SoapBottle', 'Sofa', 'Spatula', 'Spoon', 'SprayBottle', 'Statue', 'Stool', 'TVStand', 'TeddyBear', 'Television', 'TennisRacket', 'TissueBox', 'Toaster', 'Toilet', 'ToiletPaper', 'Tomato', 'VacuumCleaner', 'Vase', 'WashingMachine', 'Watch', 'WineBottle'}
EXTENDED_ARTICULATION_TYPES_THOR
module-attribute
¶
EXTENDED_ARTICULATION_TYPES_THOR = ALL_ARTICULATION_TYPES_THOR + ITHOR_ARTICULATED_OBJECTS
ITHOR_ARTICULATED_OBJECTS
module-attribute
¶
OBJNAV_SYNSETS
module-attribute
¶
OBJNAV_SYNSETS = [(AI2THOR_OBJECT_TYPE_TO_WORDNET_SYNSET[ot]) for ot in OBJNAV_TYPES_THOR]
OBJNAV_TYPES_THOR
module-attribute
¶
OBJNAV_TYPES_THOR = ['AlarmClock', 'Apple', 'BasketBall', 'Bed', 'Bowl', 'Chair', 'GarbageCan', 'HousePlant', 'Laptop', 'Mug', 'Sofa', 'SprayBottle', 'Television', 'Toilet', 'Vase']
PICKUP_SYNSETS
module-attribute
¶
PICKUP_SYNSETS = [(AI2THOR_OBJECT_TYPE_TO_WORDNET_SYNSET[ot]) for ot in PICKUP_TYPES_THOR]
PICKUP_TYPES_THOR
module-attribute
¶
PICKUP_TYPES_THOR = ['AlarmClock', 'Apple', 'BasketBall', 'Bowl', 'Laptop', 'Mug', 'SprayBottle', 'Vase']
PICK_AND_PLACE_OBJECTS
module-attribute
¶
PICK_AND_PLACE_OBJECTS = ['alarm_clock', 'aluminum_foil', 'apple', 'bottle', 'bread', 'butterknife', 'candle', 'cd', 'cellphone', 'cloth', 'creditcard', 'cup', 'dish_sponge', 'egg', 'egg_cracked', 'fork', 'hand_towel', 'keychain', 'knife', 'ladle', 'mug', 'newspaper', 'paper_towel', 'pen', 'pencil', 'pepper_shaker', 'potato', 'remote', 'salt_shaker', 'scrub_brush', 'soap_bar', 'soap_bottle', 'spatula', 'spoon', 'spray_bottle', 'tissue_box', 'toilet_paper', 'toilet_paper_used_up', 'tomato', 'towel_statue', 'watch']
RECEPTACLE_SYNSETS
module-attribute
¶
RECEPTACLE_SYNSETS = [(AI2THOR_OBJECT_TYPE_TO_WORDNET_SYNSET[ot]) for ot in RECEPTACLE_TYPES_THOR]
RECEPTACLE_TYPES_THOR
module-attribute
¶
RECEPTACLE_TYPES_THOR = ['ArmChair', 'Bed', 'Chair', 'CoffeeTable', 'CounterTop', 'Desk', 'DiningTable', 'Dresser', 'Shelf', 'SideTable', 'Sofa', 'Stool', 'TVStand']
RELATIVE_ANCHOR_TYPES
module-attribute
¶
RELATIVE_ANCHOR_TYPES = {'Bed', 'CounterTop', 'DiningTable', 'Fridge', 'Sink', 'Sofa', 'Television', 'Toilet'}
TARGET_EXCLUDED_SYNSETS
module-attribute
¶
TARGET_EXCLUDED_SYNSETS = {'knickknack.n.01', 'countertop.n.01', 'doorway.n.01', 'shelf.n.01', 'decoration.n.01', 'window.n.01', 'doorframe.n.01', 'wall.n.01', 'drawer.n.01', 'floor.n.01', 'arch.n.03', 'needle.n.03', 'tank_car.n.01', 'swatch.n.01', 'visor.n.01', 'arrow.n.01', 'plug.n.01', 'lung.n.01', 'organ.n.01', 'monocle.n.01', 'power_tool.n.01', 'logo.n.01', 'spark_plug.n.01', 'optical_illusion.n.01', 'prize.n.01', 'window.n.04', 'window.n.07', 'projector.n.01', 'pack.n.09'}
THOR_OBJNAV_OBJECTS_LOWERCASE
module-attribute
¶
THOR_OBJNAV_OBJECTS_LOWERCASE = [(replace('_', '')) for x in THOR_OBJNAV_OBJECTS_LOWERCASE]
THOR_PICKUP_OBJECTS_LOWERCASE
module-attribute
¶
THOR_PICKUP_OBJECTS_LOWERCASE = [(replace('_', '')) for x in THOR_PICKUP_OBJECTS_LOWERCASE]
bad_asset_ids
¶
Source code in molmo_spaces/utils/constants/object_constants.py
simulation_constants
¶
Attributes:
| Name | Type | Description |
|---|---|---|
| OBJAVERSE_FREE_JOINT_DEFAULT_DAMPING | | |
OBJAVERSE_FREE_JOINT_DEFAULT_DAMPING
module-attribute
¶
controller_utils
¶
Functions:
| Name | Description |
|---|---|
| find_nearest_equivalent_angle | |
| optimize_all_steer_and_drive | |
| optimize_steer_and_drive | |
find_nearest_equivalent_angle
¶
Source code in molmo_spaces/utils/controller_utils.py
optimize_all_steer_and_drive
¶
optimize_all_steer_and_drive(current_angles, target_angles, target_speeds, steer_angle_range, max_wheel_speed)
Source code in molmo_spaces/utils/controller_utils.py
optimize_steer_and_drive
¶
Source code in molmo_spaces/utils/controller_utils.py
depth_utils
¶
Utilities for depth image encoding and decoding.
Optimized for Intel RealSense D405 camera specs:
- D405 actual spec: 7cm - 50cm range, ±1.4% at 20cm = ±2.8mm
- Encoding range: 5cm - 55cm (extended for margin)
- Resolution: 1280x720
- Baseline: 18mm, Global shutter

Depth images are encoded as 16-bit values across RG channels:
1. High precision: 7.6 microns over 50cm range (65,534 discrete values for valid data)
2. Video compatibility: Standard RGB video codecs (H.264 RGB)
3. Efficient lossy compression: Unused B channel reduces artifacts
4. Smaller file sizes vs 24-bit encoding
5. Invalid data handling: 0 reserved for missing/out-of-range pixels

The encoding range (5-55cm) extends slightly beyond D405's spec (7-50cm) to:
- Provide margin for edge cases and measurement noise
- Still maintain excellent precision (7.6μm vs 15μm with wider ranges)
- Keep compression efficient (tight dynamic range = better lossy codec performance)

Invalid/missing data convention:
- Pixels outside [DEPTH_MIN, DEPTH_MAX] are encoded as 0 (not clipped)
- This allows easy masking: valid_mask = depth > 0
- Common for far-away regions or sensor failures in real-world depth cameras
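A per-pixel sketch of this kind of scheme is shown below. It illustrates the idea rather than reproducing the module: the constant values are read off the 5-55cm range stated above, while the byte order (R as high byte, G as low byte), the linear mapping onto codes 1 to 65535, and the scalar helper names are assumptions.

```python
DEPTH_MIN = 0.05  # meters; ASSUMED from the 5cm lower bound described above
DEPTH_MAX = 0.55  # meters; ASSUMED from the 55cm upper bound described above
MAX_CODE = 65535  # largest 16-bit value; code 0 is reserved for invalid pixels


def encode_depth(depth_m):
    """Return an (R, G, B) triple for one depth value in meters."""
    if not (DEPTH_MIN <= depth_m <= DEPTH_MAX):
        return (0, 0, 0)  # out of range (or NaN): encoded as 0, not clipped
    scale = (depth_m - DEPTH_MIN) / (DEPTH_MAX - DEPTH_MIN)
    code = 1 + round(scale * (MAX_CODE - 1))  # valid codes are 1..65535
    return (code >> 8, code & 0xFF, 0)  # ASSUMED: R = high byte, G = low byte


def decode_depth(rgb):
    """Invert encode_depth; invalid pixels decode to 0.0 meters."""
    r, g, _unused_b = rgb
    code = (r << 8) | g
    if code == 0:
        return 0.0
    scale = (code - 1) / (MAX_CODE - 1)
    return DEPTH_MIN + scale * (DEPTH_MAX - DEPTH_MIN)


# Quantization step of this sketch, in meters (about 7.6 microns).
step = (DEPTH_MAX - DEPTH_MIN) / (MAX_CODE - 1)
roundtrip = decode_depth(encode_depth(0.2))
is_valid = decode_depth(encode_depth(1.0)) > 0
```

Reserving code 0 means a decoded frame can be masked with a simple depth > 0 comparison, at the cost of one of the 65,536 available codes.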
Functions:
| Name | Description |
|---|---|
| compute_depth_encoding_stats | Compute statistics about depth encoding precision. |
| decode_depth_from_rgb | Decode RG-encoded depth back to metric depth in meters. |
| detect_depth_edges | Detect depth discontinuities (edges) where compression artifacts are expected. |
| encode_depth_to_rgb | Encode metric depth values as 16-bit RG channels for video storage. |
| load_depth_video | Load depth video and decode frames back to metric depth. |
| print_depth_stats | Print detailed depth statistics to console. |
| save_depth_video | Save depth frames as compressed video. |
| validate_roundtrip_accuracy | Validate that depth encoding/decoding roundtrip is accurate. |
| visualize_depth_error | Visualize the compression error between original and decoded depth. |
| visualize_depth_image | Visualize depth image with statistics and save to debug file. |

Attributes:

| Name | Type | Description |
|---|---|---|
| DEPTH_MAX | | |
| DEPTH_MIN | | |
| DEPTH_VIDEO_CODEC | | |
| DEPTH_VIDEO_CRF | | |
| DEPTH_VIDEO_PIXELFORMAT | | |
| log | | |
compute_depth_encoding_stats
¶
Compute statistics about depth encoding precision.
Useful for validating that the depth range and encoding are appropriate for your specific use case.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
depth_meters
|
ndarray
|
(H, W) float32 array of depth values in meters |
required |
Returns:
| Type | Description |
|---|---|
dict
|
Dictionary with statistics: |
Source code in molmo_spaces/utils/depth_utils.py
decode_depth_from_rgb
¶
Decode RG-encoded depth back to metric depth in meters.
Reverses the encoding from encode_depth_to_rgb() to recover floating-point depth values from uint8 RG channels.
Encoded value of 0 (RGB(0,0,0)) represents invalid/missing data and is decoded to 0.0 meters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
rgb_frame
|
ndarray
|
(H, W, 3) uint8 array with depth encoded in RG channels |
required |
validate
|
bool
|
If True, warns if B channel is non-zero (indicates wrong pixel format) |
True
|
Returns:
| Name | Type | Description |
|---|---|---|
depth_meters |
ndarray
|
(H, W) float32 array of depth values in meters. Valid pixels in range [DEPTH_MIN, DEPTH_MAX]. Invalid pixels are 0.0 (use depth > 0 to mask valid data). |
Example
>>> depth_original = np.array([[0.5, 0.3], [0.1, 0.2]], dtype=np.float32)
>>> rgb = encode_depth_to_rgb(depth_original)
>>> depth_decoded = decode_depth_from_rgb(rgb)
>>> np.allclose(depth_original, depth_decoded, atol=0.001)
True
Source code in molmo_spaces/utils/depth_utils.py
detect_depth_edges
¶
Detect depth discontinuities (edges) where compression artifacts are expected.
Used for analysis/visualization only - not part of encoding/decoding pipeline.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
depth
|
ndarray
|
(H, W) depth array in meters |
required |
gradient_threshold_mm
|
float
|
Depth gradient threshold in mm to classify as edge |
50.0
|
Returns:
| Name | Type | Description |
|---|---|---|
edge_mask |
ndarray
|
(H, W) boolean array, True at edge pixels |
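As a rough illustration of thresholding a depth gradient in millimeters, the sketch below uses NumPy's central-difference gradient. The function name and the choice of `np.gradient` are assumptions made for this example; the real `detect_depth_edges` may use a different operator.

```python
import numpy as np


def detect_depth_edges_sketch(depth: np.ndarray, gradient_threshold_mm: float = 50.0) -> np.ndarray:
    """Return a boolean mask that is True where the depth gradient exceeds the threshold."""
    depth_mm = depth.astype(np.float64) * 1000.0  # meters -> millimeters
    grad_rows, grad_cols = np.gradient(depth_mm)  # per-pixel differences along each axis
    magnitude = np.hypot(grad_rows, grad_cols)
    return magnitude > gradient_threshold_mm


# A 4x4 patch with a 200 mm step between the second and third columns.
patch = np.array([[0.1, 0.1, 0.3, 0.3]] * 4, dtype=np.float32)
edges = detect_depth_edges_sketch(patch)
print(edges.astype(int))
```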
Source code in molmo_spaces/utils/depth_utils.py
encode_depth_to_rgb
¶
Encode metric depth values as 16-bit RG channels for video storage.
Converts floating-point depth values (in meters) to uint8 RG encoding. Provides ~7.6 micron precision over the 50cm range using 16-bit encoding. The B channel is set to 0, which helps with lossy video compression.
Invalid pixels (outside [DEPTH_MIN, DEPTH_MAX]) are encoded as 0, allowing
downstream processing to use depth_mask = depth > 0 to identify valid data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
depth_meters
|
ndarray
|
(H, W) float32 array of depth values in meters. Values outside [DEPTH_MIN, DEPTH_MAX] are set to 0 (invalid). |
required |
Returns:
| Name | Type | Description |
|---|---|---|
rgb_frame |
ndarray
|
(H, W, 3) uint8 array with depth encoded as: - R channel: bits 8-15 (high byte) - G channel: bits 0-7 (low byte) - B channel: 0 (unused, helps compression) - RGB(0,0,0): invalid/missing data |
Example
>>> depth = np.array([[0.5, 1.0], [0.1, 1.0]], dtype=np.float32)
>>> rgb = encode_depth_to_rgb(depth)
>>> rgb.shape
(2, 2, 3)
>>> rgb.dtype
dtype('uint8')
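To make the RG layout concrete, here is a minimal round-trip sketch. It assumes a linear mapping of [DEPTH_MIN, DEPTH_MAX] onto codes 1..65535 with code 0 reserved for invalid pixels; the constants and both function names are placeholders for this example, and the library's exact quantization may differ.

```python
import numpy as np

# Assumed values from the module description; the real constants live in depth_utils.
DEPTH_MIN = 0.05  # meters
DEPTH_MAX = 0.55  # meters
MAX_CODE = 65535  # largest 16-bit code; code 0 marks invalid pixels


def encode_depth_sketch(depth_m: np.ndarray) -> np.ndarray:
    """Quantize depth to 16-bit codes and split them across the R and G channels."""
    depth = depth_m.astype(np.float64)
    valid = (depth >= DEPTH_MIN) & (depth <= DEPTH_MAX)
    safe = np.where(valid, depth, DEPTH_MIN)  # keep NaN/out-of-range out of the math
    normalized = (safe - DEPTH_MIN) / (DEPTH_MAX - DEPTH_MIN)
    codes = np.rint(normalized * (MAX_CODE - 1)).astype(np.uint16) + 1
    codes[~valid] = 0  # invalid pixels become RGB(0, 0, 0)
    rgb = np.zeros(depth.shape + (3,), dtype=np.uint8)
    rgb[..., 0] = codes >> 8    # R: high byte
    rgb[..., 1] = codes & 0xFF  # G: low byte
    return rgb


def decode_depth_sketch(rgb: np.ndarray) -> np.ndarray:
    """Reassemble the 16-bit code and map it back to meters (0.0 for invalid)."""
    codes = (rgb[..., 0].astype(np.uint16) << 8) | rgb[..., 1]
    depth = DEPTH_MIN + (codes.astype(np.float64) - 1.0) / (MAX_CODE - 1) * (DEPTH_MAX - DEPTH_MIN)
    return np.where(codes == 0, 0.0, depth).astype(np.float32)


depth = np.array([[0.50, 0.30], [0.10, 1.00]], dtype=np.float32)  # 1.00 m is out of range
rgb = encode_depth_sketch(depth)
decoded = decode_depth_sketch(rgb)
valid_mask = decoded > 0
print(decoded, valid_mask)
```

Out-of-range input comes back as 0.0, so a round-trip comparison should be restricted to pixels where the mask is True.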
Source code in molmo_spaces/utils/depth_utils.py
load_depth_video
¶
Load depth video and decode frames back to metric depth.
Companion function to save_depth_video(). Ensures proper codec settings for reading depth videos (RGB pixel format, no YUV conversion).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
video_path
|
str | Path
|
Path to the depth video file (.mp4) |
required |
logger
|
Logger | None
|
Optional logger for debugging |
None
|
Returns:
| Name | Type | Description |
|---|---|---|
depth_frames |
ndarray
|
(T, H, W) float32 array of depth values in meters |
Example
>>> # Save and load round-trip
>>> depth_original = np.random.rand(10, 480, 640).astype(np.float32) * 0.4 + 0.1
>>> save_depth_video(depth_original, "test_depth.mp4")
>>> depth_loaded = load_depth_video("test_depth.mp4")
>>> depth_loaded.shape
(10, 480, 640)
Source code in molmo_spaces/utils/depth_utils.py
print_depth_stats
¶
Print detailed depth statistics to console.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
depth_meters
|
ndarray
|
(H, W) float32 array of depth values in meters |
required |
name
|
str
|
Name to display in the output (e.g., "Wrist Camera Depth") |
'Depth'
|
Source code in molmo_spaces/utils/depth_utils.py
save_depth_video
¶
save_depth_video(depth_frames: ndarray, video_path: str | Path, fps: float = 10, logger: Logger | None = None) -> None
Save depth frames as compressed video.
This is the single source of truth for depth video compression settings. Encodes depth frames using 16-bit RG encoding and saves with configured codec.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
depth_frames
|
ndarray
|
(T, H, W) float32 array of depth values in meters |
required |
video_path
|
str | Path
|
Path to save the video file |
required |
fps
|
float
|
Frames per second for the video |
10
|
logger
|
Logger | None
|
Optional logger for debugging |
None
|
Example
>>> depth_frames = np.random.rand(100, 480, 640).astype(np.float32) * 0.5 + 0.3
>>> save_depth_video(depth_frames, "depth.mp4")
Source code in molmo_spaces/utils/depth_utils.py
validate_roundtrip_accuracy
¶
Validate that depth encoding/decoding roundtrip is accurate.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
depth_meters
|
ndarray
|
(H, W) float32 array of depth values in meters |
required |
tolerance_mm
|
float
|
Maximum acceptable error in millimeters |
0.1
|
Returns:
| Type | Description |
|---|---|
dict
|
Dictionary with validation results: |
Source code in molmo_spaces/utils/depth_utils.py
visualize_depth_error
¶
visualize_depth_error(original_depth: ndarray, decoded_depth: ndarray, error: ndarray, title: str, save_path: Path | None = None)
Visualize the compression error between original and decoded depth.
Shows where errors occur (smooth regions vs edges) to understand compression behavior. Creates a 4-panel visualization:
1. Original depth
2. Decoded depth (after compression)
3. Edge detection (shows discontinuities)
4. Error heatmap
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
original_depth
|
ndarray
|
(H, W) float32 array of original depth in meters |
required |
decoded_depth
|
ndarray
|
(H, W) float32 array of decoded depth in meters |
required |
error
|
ndarray
|
(H, W) float32 array of absolute errors in meters |
required |
title
|
str
|
Title for the visualization |
required |
save_path
|
Path | None
|
Optional path to save the visualization (PNG) |
None
|
Source code in molmo_spaces/utils/depth_utils.py
visualize_depth_image
¶
Visualize depth image with statistics and save to debug file.
Creates a 4-panel visualization showing:
1. Raw depth with full range (0-2m)
2. Raw depth with encoding range
3. Valid/invalid pixel visualization (too close/valid/too far)
4. Encoded RGB representation
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
depth_meters
|
ndarray
|
(H, W) float32 array of depth values in meters |
required |
title
|
str
|
Title for the visualization |
required |
save_path
|
Path | None
|
Optional path to save the visualization (PNG) |
None
|
Returns:
| Type | Description |
|---|---|
|
Dictionary of depth statistics from compute_depth_encoding_stats() |
Source code in molmo_spaces/utils/depth_utils.py
devices
¶
Modules:
| Name | Description |
|---|---|
keyboard |
|
spacemouse |
Driver class for SpaceMouse controller. Modified based on the robosuite code. |
keyboard
¶
Classes:
| Name | Description |
|---|---|
Keyboard |
|
Keyboard
¶
Methods:
| Name | Description |
|---|---|
on_press |
|
on_release |
|
Attributes:
Source code in molmo_spaces/utils/devices/keyboard.py
on_press
¶
on_release
¶
Source code in molmo_spaces/utils/devices/keyboard.py
spacemouse
¶
Driver class for SpaceMouse controller. Modified based on the robosuite code.
This class provides driver support for the SpaceMouse on Mac OS X. In particular, we assume you are using a SpaceMouse Wireless by default.
To set up a new SpaceMouse controller
- Download and install driver from https://www.3dconnexion.com/service/drivers.html
- Install hidapi library through pip (make sure you run uninstall hid first if it is installed).
- Make sure SpaceMouse is connected before running the script
- (Optional) Based on the model of SpaceMouse, you might need to change the vendor id and product id that correspond to the device.
For Linux support, you can find open-source Linux drivers and SDKs online. See http://spacenav.sourceforge.net/
Classes:
| Name | Description |
|---|---|
SpaceMouse |
A minimalistic driver class for SpaceMouse with HID library. |
Functions:
| Name | Description |
|---|---|
convert |
Converts SpaceMouse message to commands. |
nms_max_axis |
Suppress all but the axis with the maximum |value|. |
scale_to_control |
Normalize raw HID readings to target range. |
to_int16 |
Convert two 8 bit bytes to a signed 16 bit integer. |
Attributes:
| Name | Type | Description |
|---|---|---|
AxisSpec |
|
|
SPACE_MOUSE_SPEC |
|
|
SPACE_MOUSE_WIRELESS_SPEC |
|
|
space_mouse |
|
AxisSpec
module-attribute
¶
SPACE_MOUSE_SPEC
module-attribute
¶
SPACE_MOUSE_SPEC = {'x': AxisSpec(channel=1, byte1=3, byte2=4, scale=-1), 'y': AxisSpec(channel=1, byte1=1, byte2=2, scale=-1), 'z': AxisSpec(channel=1, byte1=5, byte2=6, scale=-1), 'roll': AxisSpec(channel=1, byte1=5, byte2=6, scale=-1), 'pitch': AxisSpec(channel=1, byte1=3, byte2=4, scale=-1), 'yaw': AxisSpec(channel=1, byte1=1, byte2=2, scale=1)}
SPACE_MOUSE_WIRELESS_SPEC
module-attribute
¶
SPACE_MOUSE_WIRELESS_SPEC = {'x': AxisSpec(channel=1, byte1=1, byte2=2, scale=1), 'y': AxisSpec(channel=1, byte1=3, byte2=4, scale=-1), 'z': AxisSpec(channel=1, byte1=5, byte2=6, scale=-1), 'roll': AxisSpec(channel=1, byte1=7, byte2=8, scale=-1), 'pitch': AxisSpec(channel=1, byte1=9, byte2=10, scale=-1), 'yaw': AxisSpec(channel=1, byte1=11, byte2=12, scale=1)}
SpaceMouse
¶
A minimalistic driver class for SpaceMouse with HID library.
Note: Use hid.enumerate() to view all USB human interface devices (HID). Make sure SpaceMouse is detected before running the script. You can look up its vendor/product id from this method.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
env
|
RobotEnv
|
The environment which contains the robot(s) to control using this device. |
required |
pos_sensitivity
|
float
|
Magnitude of input position command scaling |
1.0
|
rot_sensitivity
|
float
|
Magnitude of input rotation command scaling |
1.0
|
Methods:
| Name | Description |
|---|---|
get_controller_state |
Grabs the current state of the 3D mouse. |
rotation_matrix |
|
run |
Listener method that keeps pulling new messages. |
start_control |
Method that should be called externally before controller can start receiving commands. |
Attributes:
| Name | Type | Description |
|---|---|---|
control |
Grabs current pose of Spacemouse |
|
device |
|
|
gripper |
|
|
gripper_state |
|
|
last_button_state |
|
|
last_reset_button_state |
|
|
pos_sensitivity |
|
|
product_id |
|
|
reset_button_state |
|
|
rot_sensitivity |
|
|
rotation |
|
|
thread |
|
|
vendor_id |
|
Source code in molmo_spaces/utils/devices/spacemouse.py
control
property
¶
Grabs current pose of Spacemouse
Returns:
| Type | Description |
|---|---|
|
np.array: 6-DoF control value |
rotation
instance-attribute
¶
get_controller_state
¶
Grabs the current state of the 3D mouse.
Returns:
| Name | Type | Description |
|---|---|---|
dict |
A dictionary containing dpos, orn, unmodified orn, grasp, and reset |
Source code in molmo_spaces/utils/devices/spacemouse.py
rotation_matrix
¶
run
¶
Listener method that keeps pulling new messages.
Source code in molmo_spaces/utils/devices/spacemouse.py
start_control
¶
Method that should be called externally before controller can start receiving commands.
convert
¶
Converts SpaceMouse message to commands.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
b1
|
int
|
8-bit byte |
required |
b2
|
int
|
8-bit byte |
required |
Returns:
| Name | Type | Description |
|---|---|---|
float |
Scaled value from Spacemouse message |
Source code in molmo_spaces/utils/devices/spacemouse.py
nms_max_axis
¶
Suppress all but the axis with the maximum |value|. The max axis is set to -1 or 1 based on sign, others are zeroed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
control
|
ndarray
|
6D input vector, assumed scaled in [-1, 1] |
required |
threshold
|
float
|
minimum |value| to count as valid input |
0.6
|
Returns:
| Type | Description |
|---|---|
|
np.ndarray: filtered control vector with only max direction |
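The suppression rule can be sketched in a few lines. This version assumes that an input whose largest magnitude falls below the threshold yields an all-zero vector and that the comparison is inclusive; neither detail is confirmed by the description above.

```python
import numpy as np


def nms_max_axis_sketch(control: np.ndarray, threshold: float = 0.6) -> np.ndarray:
    """Keep only the dominant axis, snapped to -1 or 1; zero everything else."""
    filtered = np.zeros_like(control, dtype=np.float64)
    idx = int(np.argmax(np.abs(control)))  # axis with the largest |value|
    if abs(control[idx]) >= threshold:     # assumed: inclusive threshold
        filtered[idx] = np.sign(control[idx])
    return filtered


strong = nms_max_axis_sketch(np.array([0.1, -0.9, 0.7, 0.0, 0.0, 0.0]))
weak = nms_max_axis_sketch(np.array([0.2, 0.1, 0.0, 0.0, 0.0, 0.0]))
print(strong, weak)
```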
Source code in molmo_spaces/utils/devices/spacemouse.py
scale_to_control
¶
Normalize raw HID readings to target range.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
x
|
int
|
Raw reading from HID |
required |
axis_scale
|
float
|
(Inverted) scaling factor for mapping raw input value |
350.0
|
min_v
|
float
|
Minimum limit after scaling |
-1.0
|
max_v
|
float
|
Maximum limit after scaling |
1.0
|
Returns:
| Name | Type | Description |
|---|---|---|
float |
Clipped, scaled input from HID |
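Read literally, the parameters describe a divide-then-clip operation. The sketch below follows that reading; treating `axis_scale` as a divisor is an assumption based on the "(Inverted)" wording, and the function name is a placeholder.

```python
def scale_to_control_sketch(x: int, axis_scale: float = 350.0,
                            min_v: float = -1.0, max_v: float = 1.0) -> float:
    """Divide a raw HID reading by axis_scale, then clip to [min_v, max_v]."""
    scaled = x / axis_scale
    return min(max(scaled, min_v), max_v)


half = scale_to_control_sketch(175)
saturated_high = scale_to_control_sketch(700)
saturated_low = scale_to_control_sketch(-1000)
print(half, saturated_high, saturated_low)
```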
Source code in molmo_spaces/utils/devices/spacemouse.py
to_int16
¶
Convert two 8 bit bytes to a signed 16 bit integer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
y1
|
int
|
8-bit byte |
required |
y2
|
int
|
8-bit byte |
required |
Returns:
| Name | Type | Description |
|---|---|---|
int |
16-bit integer |
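A minimal sketch of the byte-to-integer conversion, assuming `y1` is the low byte and `y2` the high byte (little-endian); that byte order is an assumption and is not stated in the description above.

```python
def to_int16_sketch(y1: int, y2: int) -> int:
    """Combine a low byte (y1) and a high byte (y2) into a signed 16-bit integer."""
    value = y1 | (y2 << 8)
    if value >= 0x8000:   # sign bit set: wrap into the negative range
        value -= 0x10000
    return value


positive = to_int16_sketch(0x5E, 0x01)
negative = to_int16_sketch(0xFF, 0xFF)
print(positive, negative)
```

The same result can be obtained with `int.from_bytes(bytes([y1, y2]), "little", signed=True)`, which is a convenient cross-check.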
Source code in molmo_spaces/utils/devices/spacemouse.py
distance_transform_utils
¶
Functions:
| Name | Description |
|---|---|
cost_function |
|
get_pixel_cost |
|
get_segment_cost |
|
make_discrete_path |
|
make_distance_transform |
|
make_grid_graph |
|
simplify_path_greedy |
|
cost_function
¶
get_pixel_cost
¶
get_segment_cost
¶
make_discrete_path
¶
make_discrete_path(graph, source_row, source_col, target_row, target_col, distance_transform, weight_exp, grid_spacing, max_distance_to_obstacle)
Source code in molmo_spaces/utils/distance_transform_utils.py
make_distance_transform
¶
Source code in molmo_spaces/utils/distance_transform_utils.py
make_grid_graph
¶
Source code in molmo_spaces/utils/distance_transform_utils.py
simplify_path_greedy
¶
Source code in molmo_spaces/utils/distance_transform_utils.py
eval_camera_randomization_utils
¶
Level → value scaling for camera and light randomization.
Level is in [0, 100]: 0 = no randomization, 100 = maximum. Each parameter has an output range (min, max) and a mapping function that maps level to a value in that range.
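The simplest mapping of this kind is a clamped linear interpolation. The helper below is hypothetical (its name and signature are not part of the module) and only illustrates how a level in [0, 100] could select a value inside a parameter's (min, max) range.

```python
def level_to_value(level: float, out_min: float, out_max: float) -> float:
    """Linearly map a randomization level in [0, 100] to a value in [out_min, out_max]."""
    fraction = min(max(level, 0.0), 100.0) / 100.0  # clamp, then normalize to [0, 1]
    return out_min + fraction * (out_max - out_min)


# Hypothetical parameter: camera yaw jitter between 0 and 15 degrees.
for level in (0, 50, 100):
    print(level, level_to_value(level, 0.0, 15.0))
```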
Functions:
| Name | Description |
|---|---|
add_eval_camera_args |
Add eval camera CLI flags to an argparse parser. |
apply_camera_perturbation |
Sample a camera pose by perturbing the reference pose in spherical coordinates. |
apply_camera_randomization_level |
Return a copy of the camera system config with randomization params set via interpolation. |
build_eval_camera_config_from_args |
Build a FrankaEvalCameraSystem from parsed CLI args, or return None if not requested. |
debug |
|
derive_episode_camera_seed |
Derive a deterministic seed for camera randomization from episode identity. |
piecewise_linear |
Piecewise linear interpolation. |
resolve_reference_pose |
Compute world-frame pos/forward/up from the reference body and return an updated copy. |
setup_eval_cameras |
Set up eval cameras: wrist via MJCF, exo via spherical perturbation. |
Attributes:
| Name | Type | Description |
|---|---|---|
log |
|
add_eval_camera_args
¶
Add eval camera CLI flags to an argparse parser.
These flags are shared across all JSON eval entry points (standalone and distributed).
Source code in molmo_spaces/utils/eval_camera_randomization_utils.py
apply_camera_perturbation
¶
apply_camera_perturbation(cam: EvalExocentricCameraConfig, ref_forward: ndarray, ref_up: ndarray, workspace_center: ndarray, rng: RandomState) -> tuple[ndarray, ndarray, ndarray, float]
Sample a camera pose by perturbing the reference pose in spherical coordinates.
Orientation is computed via slerp between the calibrated rotation (from the shoulder-mount quaternion) and a look-at-workspace-center rotation, controlled by workspace_center_weight. At weight 0 the camera keeps its original orientation; at weight 1 it looks straight at the workspace center (plus optional noise).
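For readers unfamiliar with slerp, the sketch below shows the standard spherical interpolation between two unit quaternions in scalar-first order. It is a generic formulation, not the module's code; the function name and the weight argument merely mirror the role of workspace_center_weight.

```python
import math

Quat = tuple[float, float, float, float]  # scalar-first (w, x, y, z)


def slerp(q0: Quat, q1: Quat, weight: float) -> Quat:
    """Spherically interpolate from q0 (weight 0) to q1 (weight 1)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:  # flip one quaternion so the shorter arc is used
        q1 = tuple(-c for c in q1)
        dot = -dot
    theta = math.acos(min(dot, 1.0))
    if theta < 1e-6:  # rotations are practically identical
        return q0
    s0 = math.sin((1.0 - weight) * theta) / math.sin(theta)
    s1 = math.sin(weight * theta) / math.sin(theta)
    return tuple(s0 * a + s1 * b for a, b in zip(q0, q1))


calibrated = (1.0, 0.0, 0.0, 0.0)  # identity rotation
look_at = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))  # 90 degrees about z
halfway = slerp(calibrated, look_at, 0.5)  # 45 degrees about z
print(halfway)
```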
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
cam
|
EvalExocentricCameraConfig
|
Resolved EvalExocentricCameraConfig (pos/forward/up already set). |
required |
ref_forward
|
ndarray
|
Forward vector from the resolved quaternion-based reference pose. |
required |
ref_up
|
ndarray
|
Up vector from the resolved quaternion-based reference pose. |
required |
workspace_center
|
ndarray
|
3D point the camera should look at. |
required |
rng
|
RandomState
|
Seeded random state for deterministic sampling. |
required |
Returns:
| Type | Description |
|---|---|
tuple[ndarray, ndarray, ndarray, float]
|
(pos, forward, up, fov) in world frame. |
Source code in molmo_spaces/utils/eval_camera_randomization_utils.py
apply_camera_randomization_level
¶
apply_camera_randomization_level(camera_config: FrankaEvalCameraSystem, level: float) -> FrankaEvalCameraSystem
Return a copy of the camera system config with randomization params set via interpolation.
Source code in molmo_spaces/utils/eval_camera_randomization_utils.py
build_eval_camera_config_from_args
¶
build_eval_camera_config_from_args(args: Namespace) -> FrankaEvalCameraSystem | None
Build a FrankaEvalCameraSystem from parsed CLI args, or return None if not requested.
Returns None if --use_eval_cameras was not passed. Otherwise, creates the eval camera system with the requested camera subset and randomization level applied.
Source code in molmo_spaces/utils/eval_camera_randomization_utils.py
debug
¶
Source code in molmo_spaces/utils/eval_camera_randomization_utils.py
derive_episode_camera_seed
¶
Derive a deterministic seed for camera randomization from episode identity.
The seed is a hash of fields that uniquely identify an episode so that the same (episode, level) pair always produces the same camera placement.
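One way to obtain such a seed is to hash a string built from the identifying fields. The field names below (house_id, episode_idx, level) are assumptions chosen for illustration; the actual function may hash a different set of fields.

```python
import hashlib


def derive_seed_sketch(house_id: str, episode_idx: int, level: float) -> int:
    """Hash episode identity into a 32-bit seed that is stable across processes."""
    key = f"{house_id}|{episode_idx}|{level}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big")  # fits the 0..2**32-1 range of RandomState


seed_a = derive_seed_sketch("house_5", 3, 50.0)
seed_b = derive_seed_sketch("house_5", 3, 50.0)
seed_c = derive_seed_sketch("house_5", 4, 50.0)
print(seed_a == seed_b, seed_a == seed_c)
```

A cryptographic digest is used here rather than Python's built-in `hash()`, because string hashing with `hash()` is salted per interpreter process and would not be repeatable between runs.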
Source code in molmo_spaces/utils/eval_camera_randomization_utils.py
piecewise_linear
¶
Piecewise linear interpolation.
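A small sketch of piecewise linear interpolation over breakpoints follows. The signature (a query value plus parallel lists of x and y breakpoints) is hypothetical, since the real function's arguments are not documented here; values outside the breakpoint range are clamped in this version.

```python
from bisect import bisect_right


def piecewise_linear_sketch(x: float, xs: list[float], ys: list[float]) -> float:
    """Interpolate linearly between breakpoints (xs ascending); clamp outside the range."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1  # index of the segment that contains x
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


# Hypothetical schedule: gentle up to level 50, steeper afterwards.
levels = [0.0, 50.0, 100.0]
values = [0.0, 5.0, 30.0]
print([piecewise_linear_sketch(lv, levels, values) for lv in (25.0, 50.0, 75.0)])
```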
Source code in molmo_spaces/utils/eval_camera_randomization_utils.py
resolve_reference_pose
¶
Compute world-frame pos/forward/up from the reference body and return an updated copy.
Source code in molmo_spaces/utils/eval_camera_randomization_utils.py
setup_eval_cameras
¶
setup_eval_cameras(env: CPUMujocoEnv, eval_system: FrankaEvalCameraSystem, task_relevant_bodies: list[str], workspace_center: ndarray, rng_seed: int) -> None
Set up eval cameras: wrist via MJCF, exo via spherical perturbation.
For each camera in eval_system:
- Wrist (MjcfCameraConfig): placed directly, no visibility check.
- Exo (EvalExocentricCameraConfig): reference pose is resolved from the shoulder mount, then apply_camera_perturbation samples a pose in spherical coords around the workspace center. Multiple attempts are made to satisfy visibility constraints; if all fail a CameraPlacementError is raised.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
env
|
CPUMujocoEnv
|
CPUMujocoEnv with the scene already set up. |
required |
eval_system
|
FrankaEvalCameraSystem
|
FrankaEvalCameraSystem with randomization ranges already interpolated for the desired level. |
required |
task_relevant_bodies
|
list[str]
|
Body names to check visibility against. |
required |
workspace_center
|
ndarray
|
3D centroid of task-relevant objects. |
required |
rng_seed
|
int
|
Deterministic seed for repeatable placement. |
required |
Raises:
| Type | Description |
|---|---|
CameraPlacementError
|
If an exo camera cannot be placed with
visibility constraints after |
Source code in molmo_spaces/utils/eval_camera_randomization_utils.py
eval_utils
¶
Evaluation utilities for logging stats and videos to wandb.
Classes:
| Name | Description |
|---|---|
EpisodeResult |
Result from a single evaluation episode. |
Functions:
| Name | Description |
|---|---|
collect_episode_results |
Scan output directory for HDF5 files and extract episode results. |
compose_episode_videos |
Compose videos from multiple cameras for each episode. |
compose_videos_side_by_side |
Compose multiple videos side-by-side into a single video. |
compute_eval_stats |
Compute aggregate statistics from evaluation results. |
create_video_results_table |
Create and log a WandB table with videos and episode metadata. |
load_video_frames |
Load frames from a video file. |
log_eval_results_to_wandb |
Log evaluation results and composed videos to wandb. |
log_eval_videos_to_wandb |
Find and log evaluation videos to wandb. |
parse_obs_scene |
Parse obs_scene from HDF5 dataset. |
Attributes:
| Name | Type | Description |
|---|---|---|
log |
|
EpisodeResult
dataclass
¶
EpisodeResult(episode_idx: int, house_id: int | str, success: bool, num_steps: int, task_description: str | None = None, object_name: str | None = None, seed: int | None = None, data_file_path: Path | None = None, oracle_done: bool | None = None, metadata: dict[str, Any] = dict())
Result from a single evaluation episode.
Attributes:
| Name | Type | Description |
|---|---|---|
episode_idx |
int
|
Index of the episode within its house. |
house_id |
int | str
|
House identifier (int or str like "house_5"). |
success |
bool
|
Whether the episode was successful (at end of episode). |
num_steps |
int
|
Number of steps taken in the episode. |
task_description |
str | None
|
Natural language task description. |
object_name |
str | None
|
Name of the target object (if applicable). |
seed |
int | None
|
Random seed used for the episode. |
data_file_path |
Path | None
|
Path to the HDF5 file containing this episode's data. Use this together with episode_idx to uniquely identify an episode, especially when there are multiple batches per house. |
oracle_done |
bool | None
|
Whether success was achieved at ANY point during the episode. |
metadata |
dict[str, Any]
|
Additional metadata about the episode. |
metadata
class-attribute
instance-attribute
¶
collect_episode_results
¶
collect_episode_results(output_dir: Path) -> list[EpisodeResult]
Scan output directory for HDF5 files and extract episode results.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
output_dir
|
Path
|
Directory containing evaluation output |
required |
Returns:
| Type | Description |
|---|---|
list[EpisodeResult]
|
List of EpisodeResult objects. Each result includes data_file_path to uniquely identify the episode even when there are multiple batches per house. |
Source code in molmo_spaces/utils/eval_utils.py
compose_episode_videos
¶
compose_episode_videos(eval_dir: Path, camera_names: list[str], output_dir: Path | None = None, success_status: dict[str, bool] | None = None) -> dict[str, Path]
Compose videos from multiple cameras for each episode.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
eval_dir
|
Path
|
Directory containing evaluation videos |
required |
camera_names
|
list[str]
|
List of camera names to compose |
required |
output_dir
|
Path | None
|
Directory to save composed videos (defaults to eval_dir/composed) |
None
|
success_status
|
dict[str, bool] | None
|
Optional dict mapping episode keys to success status |
None
|
Returns:
| Type | Description |
|---|---|
dict[str, Path]
|
Dict mapping episode keys to composed video paths |
Source code in molmo_spaces/utils/eval_utils.py
compose_videos_side_by_side
¶
compose_videos_side_by_side(video_paths: list[Path], output_path: Path, target_height: int = 368, target_width: int = 1280) -> Path | None
Compose multiple videos side-by-side into a single video.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
video_paths
|
list[Path]
|
List of paths to input videos |
required |
output_path
|
Path
|
Path for the output composed video |
required |
target_height
|
int
|
Target height for the output video |
368
|
target_width
|
int
|
Target width for the output video |
1280
|
Returns:
| Type | Description |
|---|---|
Path | None
|
Path to the composed video, or None if failed |
Source code in molmo_spaces/utils/eval_utils.py
compute_eval_stats
¶
compute_eval_stats(results: list[EpisodeResult]) -> dict[str, Any]
Compute aggregate statistics from evaluation results.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
results
|
list[EpisodeResult]
|
List of episode results |
required |
Returns:
| Type | Description |
|---|---|
dict[str, Any]
|
Dict of aggregate statistics |
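The exact keys of the returned dict are not listed here, so the following is only a guess at the kind of aggregation involved. `EpisodeSummary` is a hypothetical stand-in for `EpisodeResult`, and the statistic names are invented for this example.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class EpisodeSummary:  # hypothetical stand-in for EpisodeResult
    success: bool
    num_steps: int
    oracle_done: Optional[bool] = None


def compute_stats_sketch(results: list[EpisodeSummary]) -> dict[str, Any]:
    """Aggregate success rate, oracle success rate, and mean episode length."""
    n = len(results)
    if n == 0:
        return {"num_episodes": 0}
    return {
        "num_episodes": n,
        "success_rate": sum(r.success for r in results) / n,
        "oracle_success_rate": sum(bool(r.oracle_done) for r in results) / n,
        "mean_num_steps": sum(r.num_steps for r in results) / n,
    }


episodes = [
    EpisodeSummary(success=True, num_steps=10, oracle_done=True),
    EpisodeSummary(success=False, num_steps=20, oracle_done=True),
    EpisodeSummary(success=True, num_steps=30, oracle_done=True),
    EpisodeSummary(success=False, num_steps=40, oracle_done=False),
]
stats = compute_stats_sketch(episodes)
print(stats)
```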
Source code in molmo_spaces/utils/eval_utils.py
create_video_results_table
¶
create_video_results_table(episode_data: list[dict], table_name: str = 'eval/video_results') -> None
Create and log a WandB table with videos and episode metadata.
This is a shared utility for both distributed and non-distributed evaluation. Each dict in episode_data should contain:
- video_path: Path to the video file (required)
- task_description: Natural language task description
- object_name: Target object name
- house_id: House identifier
- episode_idx: Episode index
- num_steps: Number of steps taken
- success: Boolean success status (at end of episode)
- oracle_done: Boolean, success at ANY point during episode (optional)
- source_episode_path: Original episode path (optional, for provenance)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
episode_data
|
list[dict]
|
List of dicts with video paths and metadata |
required |
table_name
|
str
|
Name for the WandB table (default: "eval/video_results") |
'eval/video_results'
|
Source code in molmo_spaces/utils/eval_utils.py
load_video_frames
¶
Load frames from a video file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
video_path
|
Path
|
Path to the video file |
required |
Returns:
| Type | Description |
|---|---|
tuple[list[ndarray], float]
|
Tuple of (list of frames as numpy arrays in RGB format, fps) |
Source code in molmo_spaces/utils/eval_utils.py
log_eval_results_to_wandb
¶
log_eval_results_to_wandb(results: list[EpisodeResult], composed_videos: dict[str, Path] | None = None) -> None
Log evaluation results and composed videos to wandb.
Creates a video table with composed videos in the first column and metadata (task description, episode length, success/fail, etc.) in subsequent columns.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
results
|
list[EpisodeResult]
|
List of episode results |
required |
composed_videos
|
dict[str, Path] | None
|
Optional dict mapping episode keys to composed video paths |
None
|
Source code in molmo_spaces/utils/eval_utils.py
log_eval_videos_to_wandb
¶
Find and log evaluation videos to wandb.
DEPRECATED: Use log_eval_results_to_wandb with compose_episode_videos instead.
Source code in molmo_spaces/utils/eval_utils.py
parse_obs_scene
¶
Parse obs_scene from HDF5 dataset.
Source code in molmo_spaces/utils/eval_utils.py
fisheye_warping
¶
GPU-accelerated fisheye lens distortion warping for camera images.
This module provides functions to apply fisheye distortion to images and videos, simulating the effect of wide-angle GoPro cameras. The warping is GPU-accelerated using PyTorch and uses a radial distortion model with parameters k1, k2, k3, k4.
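The module does not spell out its distortion formula here. A common four-coefficient radial model is the polynomial used by OpenCV's fisheye camera model, and the sketch below assumes that form purely for illustration; the function name is hypothetical and the module's actual math may differ.

```python
import math


def distort_normalized_point(x: float, y: float,
                             k1: float, k2: float, k3: float, k4: float) -> tuple[float, float]:
    """Apply an OpenCV-fisheye-style radial polynomial to a normalized image point."""
    r = math.hypot(x, y)
    if r < 1e-12:  # the optical center is unaffected
        return (x, y)
    theta = math.atan(r)  # angle between the ray and the optical axis
    t2 = theta * theta
    theta_d = theta * (1.0 + k1 * t2 + k2 * t2**2 + k3 * t2**3 + k4 * t2**4)
    scale = theta_d / r
    return (x * scale, y * scale)


center = distort_normalized_point(0.0, 0.0, 0.1, 0.0, 0.0, 0.0)
undistorted_like = distort_normalized_point(1.0, 0.0, 0.0, 0.0, 0.0, 0.0)
pushed_out = distort_normalized_point(1.0, 0.0, 0.1, 0.0, 0.0, 0.0)
print(center, undistorted_like, pushed_out)
```

With all coefficients at zero the radius becomes the ray angle itself (equidistant projection); positive k1 moves the point further from the center.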
Functions:
| Name | Description |
|---|---|
apply_fisheye_warping_to_video_file |
Apply fisheye warping to a video file and save the result. |
calc_camera_intrinsics |
Calculate camera intrinsic matrix from field of view and frame dimensions. |
get_default_distortion_map |
Get the default distortion map for a camera, loading from disk if necessary. |
get_randomized_distortion_parameters |
Get distortion parameters with random perturbations. |
load_frames_from_mp4 |
Load frames from an MP4 video file. |
make_distorted_grid |
Create a distorted sampling grid for warping images. |
warp_image_gpu |
Apply fisheye distortion to an image using GPU acceleration. |
warp_point |
Warp a single point through the fisheye distortion. |
warp_video_frames_batch |
Apply fisheye warping to a list of video frames in batches. |
warp_video_gpu |
Apply fisheye distortion to a video using GPU acceleration. |
apply_fisheye_warping_to_video_file
¶
apply_fisheye_warping_to_video_file(video_path: Path | str, output_path: Path | str, K: ndarray, distortion_parameters: dict, crop_percent: float, output_shape: tuple[int, int] | None, device: device | None = None) -> bool
Apply fisheye warping to a video file and save the result.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
video_path
|
Path | str
|
Path to input video |
required |
output_path
|
Path | str
|
Path to save warped video |
required |
K
|
ndarray
|
Camera intrinsic matrix |
required |
distortion_parameters
|
dict
|
Distortion parameters |
required |
crop_percent
|
float
|
Crop percentage after warping |
required |
output_shape
|
tuple[int, int] | None
|
Output size (H, W) or None |
required |
device
|
device | None
|
PyTorch device (defaults to CUDA if available) |
None
|
Returns:
| Type | Description |
|---|---|
bool
|
True if successful, False otherwise |
Source code in molmo_spaces/utils/fisheye_warping.py
calc_camera_intrinsics
¶
Calculate camera intrinsic matrix from field of view and frame dimensions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
fov_y
|
float
|
Vertical field of view in degrees |
required |
frame_height
|
int
|
Image height in pixels |
required |
frame_width
|
int
|
Image width in pixels |
required |
Returns:
| Type | Description |
|---|---|
ndarray
|
3x3 camera intrinsic matrix K |
Source code in molmo_spaces/utils/fisheye_warping.py
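The relationship between a vertical field of view and an intrinsic matrix can be sketched with the standard pinhole model. This is a minimal illustration, not the module's implementation: the name `pinhole_intrinsics` is hypothetical, and square pixels with a principal point at the image center are assumptions.

```python
import numpy as np


def pinhole_intrinsics(fov_y_deg: float, frame_height: int, frame_width: int) -> np.ndarray:
    """Build a 3x3 pinhole intrinsic matrix from a vertical field of view in degrees."""
    # Focal length in pixels: half the image height over tan(half the vertical FOV).
    fy = (frame_height / 2.0) / np.tan(np.radians(fov_y_deg) / 2.0)
    fx = fy  # assumption: square pixels
    # Assumption: the principal point sits at the image center.
    cx, cy = frame_width / 2.0, frame_height / 2.0
    return np.array([[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]])


K = pinhole_intrinsics(90.0, 480, 640)
print(K)
```

A wider field of view gives a smaller focal length in pixels, which is why fisheye-style footage pairs a large `fov_y` with strong distortion coefficients.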
get_default_distortion_map
¶
Get the default distortion map for a camera, loading from disk if necessary.
Source code in molmo_spaces/utils/fisheye_warping.py
get_randomized_distortion_parameters
¶
get_randomized_distortion_parameters(distortion_parameters: dict | None = None, randomization_factor: float = 0.001) -> dict
Get distortion parameters with random perturbations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
distortion_parameters
|
dict | None
|
Base distortion parameters (uses DEFAULT if None) |
None
|
randomization_factor
|
float
|
Magnitude of random perturbation |
0.001
|
Returns:
| Type | Description |
|---|---|
dict
|
Dictionary of randomized distortion parameters |
Source code in molmo_spaces/utils/fisheye_warping.py
load_frames_from_mp4
¶
Load frames from an MP4 video file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
video_path
|
Path | str
|
Path to MP4 video file |
required |
Returns:
| Type | Description |
|---|---|
list[ndarray]
|
List of frames as numpy arrays (H, W, C) in RGB format |
float
|
FPS of the video |
Source code in molmo_spaces/utils/fisheye_warping.py
make_distorted_grid
¶
make_distorted_grid(H: int, W: int, K: ndarray, distortion_parameters: dict, device: device | None = None, x_normalized: Tensor | None = None, y_normalized: Tensor | None = None, r: Tensor | None = None) -> Tensor
Create a distorted sampling grid for warping images.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
H
|
int
|
Image height |
required |
W
|
int
|
Image width |
required |
K
|
ndarray
|
Camera intrinsic matrix (3x3) |
required |
distortion_parameters
|
dict
|
Dict with keys k1, k2, k3, k4 |
required |
device
|
device | None
|
PyTorch device (defaults to CUDA if available) |
None
|
x_normalized
|
Tensor | None
|
Pre-computed normalized x coordinates (optional) |
None
|
y_normalized
|
Tensor | None
|
Pre-computed normalized y coordinates (optional) |
None
|
r
|
Tensor | None
|
Pre-computed radial distances (optional) |
None
|
Returns:
| Type | Description |
|---|---|
Tensor
|
Grid tensor of shape [1, H, W, 2] for use with grid_sample |
Source code in molmo_spaces/utils/fisheye_warping.py
warp_image_gpu
¶
warp_image_gpu(image: Tensor, K: ndarray | None = None, distortion_parameters: dict | None = None, crop_percent: float = DEFAULT_CROP_PERCENT, grid: Tensor | None = None, x_normalized: Tensor | None = None, y_normalized: Tensor | None = None, r: Tensor | None = None, output_shape: tuple[int, int] | None = None) -> Tensor
Apply fisheye distortion to an image using GPU acceleration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
image
|
Tensor
|
Input image tensor of shape [B, C, H, W] |
required |
K
|
ndarray | None
|
Camera intrinsic matrix (required if grid is None) |
None
|
distortion_parameters
|
dict | None
|
Distortion parameters (required if grid is None) |
None
|
crop_percent
|
float
|
Percentage to crop from each edge after warping |
DEFAULT_CROP_PERCENT
|
grid
|
Tensor | None
|
Pre-computed distortion grid (optional) |
None
|
x_normalized
|
Tensor | None
|
Pre-computed normalized x coordinates (optional) |
None
|
y_normalized
|
Tensor | None
|
Pre-computed normalized y coordinates (optional) |
None
|
r
|
Tensor | None
|
Pre-computed radial distances (optional) |
None
|
output_shape
|
tuple[int, int] | None
|
Target output size (H, W) for resizing (optional) |
None
|
Returns:
| Type | Description |
|---|---|
Tensor
|
Warped image tensor |
Source code in molmo_spaces/utils/fisheye_warping.py
warp_point
¶
warp_point(pixel_x: float, pixel_y: float, K: ndarray, distortion_parameters: dict, crop_percent: float, output_shape: tuple[int, int]) -> tuple[int, int]
Warp a single point through the fisheye distortion.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pixel_x
|
float
|
X coordinate in original image |
required |
pixel_y
|
float
|
Y coordinate in original image |
required |
K
|
ndarray
|
Camera intrinsic matrix |
required |
distortion_parameters
|
dict
|
Distortion parameters |
required |
crop_percent
|
float
|
Crop percentage used in warping |
required |
output_shape
|
tuple[int, int]
|
Output image size (H, W) |
required |
Returns:
| Type | Description |
|---|---|
tuple[int, int]
|
Tuple of (warped_x, warped_y) coordinates |
Source code in molmo_spaces/utils/fisheye_warping.py
warp_video_frames_batch
¶
warp_video_frames_batch(frames: list[ndarray], K: ndarray, distortion_parameters: dict, crop_percent: float, output_shape: tuple[int, int] | None, device: device, batch_size: int = 16) -> list[ndarray]
Apply fisheye warping to a list of video frames in batches.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
frames
|
list[ndarray]
|
List of frames as numpy arrays (H, W, C) |
required |
K
|
ndarray
|
Camera intrinsic matrix |
required |
distortion_parameters
|
dict
|
Distortion parameters |
required |
crop_percent
|
float
|
Crop percentage after warping |
required |
output_shape
|
tuple[int, int] | None
|
Output size (H, W) or None |
required |
device
|
device
|
PyTorch device |
required |
batch_size
|
int
|
Number of frames to process at once |
16
|
Returns:
| Type | Description |
|---|---|
list[ndarray]
|
List of warped frames as numpy arrays (H, W, C) |
Source code in molmo_spaces/utils/fisheye_warping.py
warp_video_gpu
¶
warp_video_gpu(video: ndarray | Tensor, K: ndarray | None = None, randomize_distortion_parameters: bool = False, crop_percent: float = DEFAULT_CROP_PERCENT, output_shape: tuple[int, int] | None = None) -> ndarray
Apply fisheye distortion to a video using GPU acceleration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
video
|
ndarray | Tensor
|
Input video as numpy array [T, H, W, C] or tensor |
required |
K
|
ndarray | None
|
Camera intrinsic matrix (computed from defaults if None) |
None
|
randomize_distortion_parameters
|
bool
|
Whether to randomize distortion params |
False
|
crop_percent
|
float
|
Percentage to crop from each edge after warping |
DEFAULT_CROP_PERCENT
|
output_shape
|
tuple[int, int] | None
|
Target output size (H, W) for resizing (optional) |
None
|
Returns:
| Type | Description |
|---|---|
ndarray
|
Warped video as numpy array [T, H, W, C] with uint8 values |
Source code in molmo_spaces/utils/fisheye_warping.py
function_utils
¶
Functions:
| Name | Description |
|---|---|
make_lenient |
Wrap func so extra args/kwargs are silently dropped. |
make_lenient
¶
Wrap func so extra args/kwargs are silently dropped.
Args matching a declared parameter are forwarded; leftover positional and
keyword args flow into func's *args / **kwargs when present, and
are dropped otherwise. Raises TypeError on double-binding and (via
Python's normal call machinery) on missing required parameters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
func
|
Callable
|
The function (or class) to wrap. |
required |
Returns:
| Type | Description |
|---|---|
Callable
|
A picklable callable. |
Note
Positional-only parameters (after /) are not supported.
Source code in molmo_spaces/utils/function_utils.py
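The dropping behaviour described above can be sketched with `inspect.signature`. This is a simplified illustration under stated assumptions, not the module's code: the name `lenient` is hypothetical, positional-only parameters are ignored (as in the note above), and the closure returned here is not guaranteed to be picklable, unlike the documented return value.

```python
import functools
import inspect
from typing import Callable


def lenient(func: Callable) -> Callable:
    """Return a wrapper that discards arguments `func` does not declare."""
    params = list(inspect.signature(func).parameters.values())
    has_var_pos = any(p.kind is p.VAR_POSITIONAL for p in params)
    has_var_kw = any(p.kind is p.VAR_KEYWORD for p in params)
    n_positional = sum(p.kind is p.POSITIONAL_OR_KEYWORD for p in params)
    names = {p.name for p in params if p.kind in (p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY)}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if not has_var_pos:
            args = args[:n_positional]  # drop surplus positional arguments
        if not has_var_kw:
            kwargs = {k: v for k, v in kwargs.items() if k in names}  # drop unknown keywords
        # Double-binding and missing required parameters still raise TypeError here.
        return func(*args, **kwargs)

    return wrapper


def area(width, height=1):
    return width * height


print(lenient(area)(3, 4, 5, color="red"))  # 12
```

Leaving error detection to Python's normal call machinery keeps the wrapper small: only the surplus arguments are filtered, and everything else fails the same way a direct call would.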
grasp_sample
¶
This module contains functionality for filtering and sampling grasps based on heuristics.
Functions:
| Name | Description |
|---|---|
add_grasp_collision_bodies |
Add grasp collision bodies to the scene. |
get_feasible_grasp_idx |
|
get_grasp_collision_body_name |
|
get_noncolliding_grasp_mask |
|
select_grasp_pose |
|
Attributes:
| Name | Type | Description |
|---|---|---|
log |
|
add_grasp_collision_bodies
¶
add_grasp_collision_bodies(spec: MjSpec, num_grasps: int, grasp_width: float, grasp_length: float, grasp_height: float, grasp_base_pos: ndarray)
Add grasp collision bodies to the scene.
Source code in molmo_spaces/utils/grasp_sample.py
get_feasible_grasp_idx
¶
get_feasible_grasp_idx(mg_id: str, robot: Robot, grasp_poses_world: ndarray, n_ik_checks: int, ik_batch_size: int)
Source code in molmo_spaces/utils/grasp_sample.py
get_grasp_collision_body_name
¶
get_noncolliding_grasp_mask
¶
get_noncolliding_grasp_mask(mj_model: MjModel, mj_data: MjData, grasp_poses_world: ndarray, batch_size: int) -> ndarray
Source code in molmo_spaces/utils/grasp_sample.py
select_grasp_pose
¶
select_grasp_pose(env: CPUMujocoEnv, grasp_poses_world: ndarray, object_pose: ndarray, check_collision: bool, n_collision_checks: int, collision_batch_size: int, check_ik: bool, n_ik_checks: int, ik_batch_size: int, pos_cost_weight: float = 1.0, rot_cost_weight: float = 0.01, vertical_cost_weight: float = 2.0, horizontal_cost_weight: float = 0, com_dist_cost_weight: float = 8.0) -> ndarray
Source code in molmo_spaces/utils/grasp_sample.py
grasps
¶
This module contains functionality for loading grasps from registered grasp libraries.
Note: this module caches aggressively, so grasp/asset libraries must be registered before the first call into this module. New registrations will not be visible until the caches are cleared.
Functions:
| Name | Description |
|---|---|
flip_grasps |
|
get_grasp_libraries_for_object |
|
get_joint_grasp_path |
|
get_joint_grasps |
Load the first available joint grasps for a given object and joint in the world frame. |
get_pickup_grasp_path |
|
get_pickup_grasps |
Load the first available pickup grasps for a given object in the world frame. |
has_joint_grasp_path |
|
has_pickup_grasp_path |
|
has_valid_joint_grasps |
|
has_valid_pickup_grasps |
|
load_joint_grasps |
Load the first available joint grasps for a given object and joint in the joint's local frame. |
load_pickup_grasps |
Load the first available pickup grasps for a given object in the local frame. |
sanitize_grasp_library_list_and_cache |
|
Attributes:
| Name | Type | Description |
|---|---|---|
log |
|
flip_grasps
¶
get_grasp_libraries_for_object
¶
get_joint_grasp_path
¶
get_joint_grasp_path(uid: str, joint_name: str, grasp_libraries: Sequence[str] | None = None) -> Path | None
Source code in molmo_spaces/utils/grasps.py
get_joint_grasps
¶
get_joint_grasps(env: CPUMujocoEnv, obj: MlSpacesArticulationObject, joint_idx: int, include_flipped: bool = True, grasp_libraries: list[str] | None = None) -> tuple[ndarray, ndarray]
Load the first available joint grasps for a given object and joint in the world frame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
env
|
CPUMujocoEnv
|
The environment |
required |
obj
|
MlSpacesArticulationObject
|
The object |
required |
joint_idx
|
int
|
The index of the joint |
required |
include_flipped
|
bool
|
Whether to include flipped grasps |
True
|
grasp_libraries
|
list[str] | None
|
The grasp libraries to use (defaults to all available libraries for the object) |
None
|
Returns:
| Type | Description |
|---|---|
ndarray
|
Numpy array of shape (N, 4, 4) containing the grasp poses in the world frame. |
ndarray
|
Numpy array of shape (4, 4) containing the joint body pose in the world frame. |
Source code in molmo_spaces/utils/grasps.py
get_pickup_grasp_path
¶
Source code in molmo_spaces/utils/grasps.py
get_pickup_grasps
¶
get_pickup_grasps(env: CPUMujocoEnv, obj: MlSpacesObject, include_flipped: bool = True, grasp_libraries: list[str] | None = None) -> ndarray
Load the first available pickup grasps for a given object in the world frame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
env
|
CPUMujocoEnv
|
The environment |
required |
obj
|
MlSpacesObject
|
The object |
required |
include_flipped
|
bool
|
Whether to include flipped grasps |
True
|
grasp_libraries
|
list[str] | None
|
The grasp libraries to use (defaults to all available libraries for the object) |
None
|
Returns:
| Type | Description |
|---|---|
ndarray
|
A numpy array of shape (N, 4, 4) containing the grasp poses in the world frame. |
Source code in molmo_spaces/utils/grasps.py
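The difference between the local-frame loaders and the world-frame getters can be pictured as one batched matrix product. This is a sketch under the assumption that world-frame grasps are obtained by left-multiplying each local grasp pose by the object's world pose; the function name `grasps_to_world` is hypothetical and the module may apply additional steps (for example flipping).

```python
import numpy as np


def grasps_to_world(object_pose_world: np.ndarray, grasps_local: np.ndarray) -> np.ndarray:
    """Map (N, 4, 4) local-frame grasp poses into the world frame."""
    # Broadcasting applies the same (4, 4) object pose to every grasp in the batch.
    return object_pose_world[None, :, :] @ grasps_local


object_pose = np.eye(4)
object_pose[:3, 3] = [1.0, 0.0, 0.5]      # object translated in the world
local_grasps = np.tile(np.eye(4), (3, 1, 1))
world_grasps = grasps_to_world(object_pose, local_grasps)
print(world_grasps.shape)
```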
has_joint_grasp_path
¶
has_joint_grasp_path(uid: str, joint_name: str, grasp_libraries: Sequence[str] | None = None) -> bool
Source code in molmo_spaces/utils/grasps.py
has_pickup_grasp_path
¶
has_valid_joint_grasps
¶
has_valid_joint_grasps(uid: str, joint_name: str, num_grasps: int = 1, grasp_libraries: Sequence[str] | None = None) -> bool
Source code in molmo_spaces/utils/grasps.py
has_valid_pickup_grasps
¶
has_valid_pickup_grasps(uid: str, num_grasps: int = 1, grasp_libraries: Sequence[str] | None = None) -> bool
Source code in molmo_spaces/utils/grasps.py
load_joint_grasps
¶
load_joint_grasps(uid: str, joint_name: str, grasp_libraries: list[str] | None = None, num_grasps: int = 50) -> ndarray
Load the first available joint grasps for a given object and joint in the joint's local frame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
uid
|
str
|
The asset ID of the object |
required |
joint_name
|
str
|
The name of the joint |
required |
grasp_libraries
|
list[str] | None
|
The grasp libraries to use (defaults to all available libraries for the object) |
None
|
num_grasps
|
int
|
The maximum number of grasps to load |
50
|
Returns:
| Type | Description |
|---|---|
ndarray
|
A numpy array of shape (N, 4, 4) containing the grasp poses in the joint's local frame. |
Source code in molmo_spaces/utils/grasps.py
load_pickup_grasps
¶
load_pickup_grasps(uid: str, grasp_libraries: list[str] | None = None, num_grasps: int = 50) -> ndarray
Load the first available pickup grasps for a given object in the local frame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
uid
|
str
|
The asset ID of the object |
required |
grasp_libraries
|
list[str] | None
|
The grasp libraries to use (defaults to all available libraries for the object) |
None
|
num_grasps
|
int
|
The maximum number of grasps to load |
50
|
Returns:
| Type | Description |
|---|---|
ndarray
|
A numpy array of shape (N, 4, 4) containing the grasp poses in the local frame |
Source code in molmo_spaces/utils/grasps.py
sanitize_grasp_library_list_and_cache
¶
Source code in molmo_spaces/utils/grasps.py
lazy_loading_utils
¶
Classes:
| Name | Description |
|---|---|
UserAssetLibraryIndexEntry |
|
UserGraspLibraryIndex |
|
Functions:
| Name | Description |
|---|---|
add_install_prefixes |
|
debug_lazy_search |
|
find_object_paths |
|
get_thor_uid_to_xmls |
|
get_user_grasp_library_index |
|
get_user_library_index |
|
install_grasps_for_scene |
|
install_objects_for_scene |
|
install_scene_from_path |
|
install_scene_from_source_index |
|
install_scene_with_objects_and_grasps_from_path |
|
install_uid |
|
locate_uid_package |
Locate the package containing the given object UID. |
Attributes:
| Name | Type | Description |
|---|---|---|
UserAssetLibraryIndex |
|
UserAssetLibraryIndex
module-attribute
¶
UserAssetLibraryIndex = TypeAdapter(dict[str, UserAssetLibraryIndexEntry])
UserAssetLibraryIndexEntry
¶
Bases: BaseModel
Attributes:
| Name | Type | Description |
|---|---|---|
metadata_npz_path |
Path | None
|
|
metadata_path |
Path
|
|
object_path |
Path
|
|
uid |
str
|
|
UserGraspLibraryIndex
¶
Bases: BaseModel
Attributes:
| Name | Type | Description |
|---|---|---|
articulated_grasp_paths |
dict[str, dict[str, dict[str, Path]]]
|
|
grasp_paths |
dict[str, dict[str, Path]]
|
|
add_install_prefixes
¶
debug_lazy_search
¶
Source code in molmo_spaces/utils/lazy_loading_utils.py
find_object_paths
¶
Source code in molmo_spaces/utils/lazy_loading_utils.py
get_thor_uid_to_xmls
cached
¶
get_user_grasp_library_index
cached
¶
get_user_library_index
cached
¶
install_grasps_for_scene
¶
Source code in molmo_spaces/utils/lazy_loading_utils.py
install_objects_for_scene
¶
Source code in molmo_spaces/utils/lazy_loading_utils.py
install_scene_from_path
¶
Source code in molmo_spaces/utils/lazy_loading_utils.py
install_scene_from_source_index
¶
Source code in molmo_spaces/utils/lazy_loading_utils.py
install_scene_with_objects_and_grasps_from_path
¶
install_scene_with_objects_and_grasps_from_path(xml_path, grasp_sources=('droid_objaverse',), exclude_thor=True)
Source code in molmo_spaces/utils/lazy_loading_utils.py
install_uid
¶
Source code in molmo_spaces/utils/lazy_loading_utils.py
locate_uid_package
¶
locate_uid_package(uid: str, extension: str = 'xml') -> tuple[str, str | None, Path] | tuple[None, None, None]
Locate the package containing the given object UID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
uid
|
str
|
The UID of the object to locate. |
required |
extension
|
str
|
The extension of the file to locate. |
'xml'
|
Returns:
| Type | Description |
|---|---|
tuple[str, str | None, Path] | tuple[None, None, None]
|
A tuple containing the source, package, and XML path of the object. If the object is not found, returns (None, None, None). |
Source code in molmo_spaces/utils/lazy_loading_utils.py
lemma_utils
¶
Functions:
| Name | Description |
|---|---|
best_lemma_via_specificity |
|
is_physical_entity |
|
normalize_expression |
|
simple_lemma |
|
Attributes:
| Name | Type | Description |
|---|---|---|
PHYSICAL_ENTITY_SYNSET |
|
best_lemma_via_specificity
cached
¶
Source code in molmo_spaces/utils/lemma_utils.py
is_physical_entity
¶
normalize_expression
¶
license_utils
¶
Functions:
Attributes:
| Name | Type | Description |
|---|---|---|
ATTRIBUTION_TEMPLATE |
|
|
DEFAULT_LICENSE |
|
|
ROBOT_LICENSE |
|
ATTRIBUTION_TEMPLATE
module-attribute
¶
ATTRIBUTION_TEMPLATE = '{assets}' + f' by the {DEFAULT_LICENSE['creator_name']}, licensed under {replace('-', ' ')}.'
DEFAULT_LICENSE
module-attribute
¶
DEFAULT_LICENSE = {'license': 'CC-BY-4.0', 'license_url': 'https://creativecommons.org/licenses/by/4.0/', 'creator_name': 'Allen Institute for AI (Ai2)', 'source': 'In-house'}
grasp_targets
¶
Source code in molmo_spaces/utils/license_utils.py
ithor_resolver
¶
procthor_resolver
¶
procthor_resolver(source: str, idx: int, scene_info: SourceInfo, modalities: list[Path], variant: str = '_ceiling') -> Path
Source code in molmo_spaces/utils/license_utils.py
resolve_grasps_license
¶
Source code in molmo_spaces/utils/license_utils.py
resolve_license
¶
Source code in molmo_spaces/utils/license_utils.py
resolve_object_license
¶
Source code in molmo_spaces/utils/license_utils.py
resolve_robot_license
¶
Source code in molmo_spaces/utils/license_utils.py
resolve_scene_license
¶
Source code in molmo_spaces/utils/license_utils.py
scene_includes
¶
Source code in molmo_spaces/utils/license_utils.py
scene_path_resolve
¶
Source code in molmo_spaces/utils/license_utils.py
validate_identifier
¶
Source code in molmo_spaces/utils/license_utils.py
validate_objaverse_identifier
¶
Source code in molmo_spaces/utils/license_utils.py
validate_thor_identifier
¶
Source code in molmo_spaces/utils/license_utils.py
linalg_utils
¶
Functions:
| Name | Description |
|---|---|
euler_yaw_to_quat |
Convert euler (0, 0, yaw) to quat (w, x, y, z) |
global_to_relative_transform |
|
homogenize |
Project a vector to homogeneous coordinates. Accepts either a single vector or a batch. |
interp |
Linear interpolation of vector-valued functions of scalars. Similar to np.interp but for multi-dimensional arrays. |
inverse_homogeneous_matrix |
Compute the inverse of a 4x4 homogeneous transformation matrix. |
normalize_ang_error |
|
obb_2d |
Compute the oriented bounding box (OBB) of a set of 2D points. |
quat_to_euler_yaw |
Convert quaternion (w, x, y, z) to euler yaw (radians) |
relative_to_global_transform |
|
single_or_batch |
Decorator to allow a function to accept a single input or a batch of inputs. |
skew |
Compute the skew-symmetric matrix of a 3D vector. |
swing_twist |
Decomposes quat into a rotation around axis and a rotation around an axis perpendicular to axis. |
transform_to_twist |
Given a 4x4 transformation matrix, return the twist as (lin_vel, ang_vel). |
twist_to_transform |
Given a linear velocity and angular velocity, return the 4x4 transformation matrix. |
euler_yaw_to_quat
¶
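For a pure yaw rotation, the scalar-first quaternion has a closed form, and the inverse is a single `atan2`. The sketch below uses hypothetical names (`yaw_to_quat`, `quat_to_yaw`) and assumes the (w, x, y, z) ordering and radians stated in the summary table; it is not the module's code.

```python
import math


def yaw_to_quat(yaw: float) -> tuple[float, float, float, float]:
    """Quaternion (w, x, y, z) for a rotation of `yaw` radians about the z axis."""
    return (math.cos(yaw / 2.0), 0.0, 0.0, math.sin(yaw / 2.0))


def quat_to_yaw(quat) -> float:
    """Yaw angle in radians extracted from a unit quaternion (w, x, y, z)."""
    w, x, y, z = quat
    return math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))


yaw = 0.7
print(quat_to_yaw(yaw_to_quat(yaw)))
```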
global_to_relative_transform
¶
homogenize
¶
Project a vector to homogeneous coordinates. Accepts either a single vector or a batch.
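As a rough illustration, appending a trailing 1 along the last axis handles both the single-vector and the batched case. The name `homogenize_sketch` is hypothetical and this is not the module's implementation.

```python
import numpy as np


def homogenize_sketch(v) -> np.ndarray:
    """Append a 1 to the last axis of a vector or a batch of vectors."""
    v = np.asarray(v, dtype=float)
    ones = np.ones(v.shape[:-1] + (1,))  # shape (1,) for a vector, (N, 1) for a batch
    return np.concatenate([v, ones], axis=-1)


print(homogenize_sketch([1.0, 2.0, 3.0]))
print(homogenize_sketch(np.zeros((5, 3))).shape)
```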
interp
¶
interp(x: ArrayLike, xp: ArrayLike, fp: ArrayLike, left: ArrayLike | None = None, right: ArrayLike | None = None)
Linear interpolation of vector-valued functions of scalars. Similar to np.interp but for multi-dimensional arrays.
Source code in molmo_spaces/utils/linalg_utils.py
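One way to picture vector-valued interpolation is to run `np.interp` once per output dimension. The sketch below makes several assumptions: `fp` has shape (M, D), the result always has shape (K, D), and the `left`/`right` arguments are omitted so values outside `xp` are clamped to the end points. The name `interp_vec` is hypothetical.

```python
import numpy as np


def interp_vec(x, xp, fp) -> np.ndarray:
    """Linearly interpolate D-dimensional samples `fp` (M, D) at query points `x` (K,)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    xp = np.asarray(xp, dtype=float)
    fp = np.asarray(fp, dtype=float)
    # Interpolate each output dimension independently, then stack them back together.
    columns = [np.interp(x, xp, fp[:, d]) for d in range(fp.shape[1])]
    return np.stack(columns, axis=-1)


waypoints = [[0.0, 0.0], [2.0, 4.0]]
print(interp_vec([0.5], [0.0, 1.0], waypoints))
```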
inverse_homogeneous_matrix
¶
Compute the inverse of a 4x4 homogeneous transformation matrix.
Args: matrix (numpy.ndarray): A 4x4 homogeneous transformation matrix.
Returns: numpy.ndarray: The inverse of the input matrix.
Source code in molmo_spaces/utils/linalg_utils.py
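For a rigid transform, the inverse has a closed form that avoids a general matrix inversion. The sketch below assumes the upper-left 3x3 block is a proper rotation and the last row is (0, 0, 0, 1); the name `inverse_rigid` is hypothetical and the module may compute the inverse differently.

```python
import numpy as np


def inverse_rigid(T: np.ndarray) -> np.ndarray:
    """Invert a 4x4 rigid transform using R^T and -R^T t."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T          # the inverse of a rotation is its transpose
    T_inv[:3, 3] = -R.T @ t      # undo the translation in the rotated frame
    return T_inv


T = np.eye(4)
T[:3, :3] = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # 90 degrees about z
T[:3, 3] = [1.0, 2.0, 3.0]
print(inverse_rigid(T) @ T)
```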
normalize_ang_error
¶
obb_2d
¶
Compute the oriented bounding box (OBB) of a set of 2D points.
Parameters: points (np.ndarray): A 2D numpy array of shape (N, 2) representing the coordinates of the points.
Returns: tuple[np.ndarray, np.ndarray, np.ndarray]: A tuple containing:
- pos (np.ndarray): The center position of the OBB.
- minor_axis (np.ndarray): The minor axis of the OBB, i.e. half the shorter side.
- major_axis (np.ndarray): The major axis of the OBB, i.e. half the longer side.
Source code in molmo_spaces/utils/linalg_utils.py
quat_to_euler_yaw
¶
relative_to_global_transform
¶
single_or_batch
¶
Decorator to allow a function to accept a single input or a batch of inputs. The decorated function should always accept and return batches.
Source code in molmo_spaces/utils/linalg_utils.py
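The pattern can be sketched as a decorator that adds a batch axis on the way in and strips it on the way out. This illustration assumes a single positional array argument and treats any 1-D input as "single"; the real decorator may use different rules. The names `single_or_batch_sketch` and `norms` are hypothetical.

```python
import functools

import numpy as np


def single_or_batch_sketch(func):
    """Let a batch-only function also accept one un-batched input."""

    @functools.wraps(func)
    def wrapper(x):
        x = np.asarray(x)
        if x.ndim == 1:
            # Promote to a batch of one, call, then unwrap the single result.
            return func(x[None])[0]
        return func(x)

    return wrapper


@single_or_batch_sketch
def norms(batch: np.ndarray) -> np.ndarray:
    """Batch-only implementation: (N, D) in, (N,) out."""
    return np.linalg.norm(batch, axis=-1)


print(norms([3.0, 4.0]))
print(norms([[3.0, 4.0], [6.0, 8.0]]))
```

Writing the core function for batches only keeps the numerical code in one place, while callers stay free to pass whichever shape they have.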
skew
¶
swing_twist
¶
Decomposes quat into a rotation around axis and a rotation around an axis perpendicular to axis.
Note: Assumes quaternions are [w,x,y,z]
Returns quaternions (swing, twist) where quat = swing * twist, and twist is a rotation around axis
Source code in molmo_spaces/utils/linalg_utils.py
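A common way to compute this decomposition is to project the quaternion's vector part onto the axis. The sketch below follows that textbook construction with scalar-first quaternions; the helper names (`quat_mul`, `swing_twist_sketch`) are hypothetical, and the handling of the degenerate case is an assumption rather than the module's documented behaviour.

```python
import numpy as np


def quat_mul(a, b) -> np.ndarray:
    """Hamilton product of two quaternions in (w, x, y, z) order."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return np.array([
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    ])


def swing_twist_sketch(quat, axis):
    """Return (swing, twist) with quat = swing * twist and twist about `axis`."""
    quat = np.asarray(quat, dtype=float)
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    # Twist keeps the scalar part and the component of the vector part along the axis.
    twist = np.concatenate([[quat[0]], np.dot(quat[1:], axis) * axis])
    norm = np.linalg.norm(twist)
    # Assumption: fall back to the identity twist when the twist is undefined.
    twist = twist / norm if norm > 1e-12 else np.array([1.0, 0.0, 0.0, 0.0])
    twist_conj = twist * np.array([1.0, -1.0, -1.0, -1.0])
    swing = quat_mul(quat, twist_conj)
    return swing, twist


q = np.array([0.7, 0.1, 0.5, 0.5])
q = q / np.linalg.norm(q)
swing, twist = swing_twist_sketch(q, [0.0, 0.0, 1.0])
print(swing, twist)
```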
transform_to_twist
¶
Given a 4x4 transformation matrix, return the twist as (lin_vel, ang_vel). Mathematically, this is computing the logarithmic map of SE(3). Equivalent to pin.log6.
See: https://jinyongjeong.github.io/Download/SE3/jlblanco2010geometry3d_techrep.pdf (Sec 9.4.2)
Source code in molmo_spaces/utils/linalg_utils.py
twist_to_transform
¶
Given a linear velocity and angular velocity, return the 4x4 transformation matrix. Mathematically, this is computing the exponential map of SE(3). Equivalent to pin.exp6.
See: https://jinyongjeong.github.io/Download/SE3/jlblanco2010geometry3d_techrep.pdf (Sec 9.4.2)
Source code in molmo_spaces/utils/linalg_utils.py
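The exponential map of SE(3) has a well-known closed form built from the skew-symmetric matrix of the angular velocity. The sketch below is a textbook version under that formula, with hypothetical names (`skew_sketch`, `exp_se3`) and a first-order fallback for very small rotations; it has not been checked against the module's output.

```python
import numpy as np


def skew_sketch(w) -> np.ndarray:
    """Skew-symmetric matrix W such that W @ u equals cross(w, u)."""
    x, y, z = w
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])


def exp_se3(lin_vel, ang_vel) -> np.ndarray:
    """Map a twist (lin_vel, ang_vel) to a 4x4 homogeneous transform."""
    v = np.asarray(lin_vel, dtype=float)
    w = np.asarray(ang_vel, dtype=float)
    theta = np.linalg.norm(w)
    W = skew_sketch(w)
    if theta < 1e-8:
        # Small-angle approximation avoids dividing by theta.
        R = np.eye(3) + W
        V = np.eye(3) + 0.5 * W
    else:
        a = np.sin(theta) / theta
        b = (1.0 - np.cos(theta)) / theta**2
        c = (theta - np.sin(theta)) / theta**3
        R = np.eye(3) + a * W + b * (W @ W)   # Rodrigues' rotation formula
        V = np.eye(3) + b * W + c * (W @ W)   # couples rotation into the translation
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ v
    return T


T_trans = exp_se3([1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
T_rot = exp_se3([0.0, 0.0, 0.0], [0.0, 0.0, np.pi / 2])
print(T_trans)
print(T_rot)
```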
mj_model_and_data_utils
¶
Functions:
| Name | Description |
|---|---|
body_aabb |
Computes the axis-aligned bounding box (AABB) for a body in a MuJoCo model. |
body_base_pos |
Returns the base position of a body in the world frame. |
body_pose |
|
descendant_bodies |
Get all bodies descended from a body in a MuJoCo model. |
descendant_geoms |
Get all geoms attached to descendants of a body in a MuJoCo model. |
extract_mj_names |
See https://github.com/openai/mujoco-py/blob/ab86d331c9a77ae412079c6e58b8771fe63747fc/mujoco_py/generated/wrappers.pxi#L1127 |
geom_aabb |
Computes the axis-aligned bounding box (AABB) for a list of geometries in a MuJoCo model. |
mesh_aabb |
Compute the tight AABB in world space for a mesh geom using its vertices. |
site_pose |
|
body_aabb
¶
body_aabb(model: MjModel, data: MjData, body_id: int, visible_only: bool = True) -> tuple[ndarray, ndarray]
Computes the axis-aligned bounding box (AABB) for a body in a MuJoCo model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
model
|
MjModel
|
The MuJoCo model containing the body. |
required |
data
|
MjData
|
The MuJoCo data containing the state of the model. |
required |
body_id
|
int
|
The id of the body to compute the AABB for. |
required |
visible_only
|
bool
|
Whether to only include visible geoms (groups 0-2). This can help make the AABB fit tighter. |
True
|
Returns:
| Name | Type | Description |
|---|---|---|
tuple |
tuple[ndarray, ndarray]
|
A tuple containing: - numpy.ndarray: The center of the AABB in world space. - numpy.ndarray: The x,y,z dimensions of the AABB. |
Source code in molmo_spaces/utils/mj_model_and_data_utils.py
body_base_pos
¶
Returns the base position of a body in the world frame. In XY, this is the center of the AABB, and in Z, this is the bottom of the AABB.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
MjData
|
MjData object |
required |
body_id
|
int
|
ID of the body to get the base position of. |
required |
visible_only
|
bool
|
Whether to only include visible geoms (groups 0-2). This can help make the AABB fit tighter. |
True
|
Returns:
| Type | Description |
|---|---|
ndarray
|
np.ndarray: The base position of the body in the world frame, of shape (3,). |
Source code in molmo_spaces/utils/mj_model_and_data_utils.py
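Given an AABB expressed as a center and full x, y, z dimensions, the base position described above reduces to one line of arithmetic. This sketch assumes the dimensions are full extents (not half extents); the name `base_pos_from_aabb` is hypothetical.

```python
import numpy as np


def base_pos_from_aabb(center, size) -> np.ndarray:
    """XY center of the box, with Z at the bottom face."""
    center = np.asarray(center, dtype=float)
    size = np.asarray(size, dtype=float)
    return np.array([center[0], center[1], center[2] - size[2] / 2.0])


print(base_pos_from_aabb([1.0, 2.0, 3.0], [2.0, 2.0, 4.0]))
```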
body_pose
¶
descendant_bodies
¶
Get all bodies descended from a body in a MuJoCo model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
model
|
MjModel
|
The MuJoCo model to use. |
required |
body_id
|
int
|
The id of the body to get the descendants of. |
required |
Returns:
| Type | Description |
|---|---|
|
set[int]: A set of the ids of the bodies descended from the body, including the body itself. |
Source code in molmo_spaces/utils/mj_model_and_data_utils.py
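Collecting descendants amounts to a traversal of the body tree. The sketch below works on a plain parent-id sequence so that it runs without MuJoCo; with a real model, an array such as `model.body_parentid` could be passed in. The function name `descendants` is hypothetical and the module may traverse the tree differently.

```python
from collections.abc import Sequence


def descendants(parent_ids: Sequence[int], body_id: int) -> set[int]:
    """Ids of `body_id` and every body below it in the kinematic tree."""
    children: dict[int, list[int]] = {}
    for child, parent in enumerate(parent_ids):
        if child != parent:  # the world body is listed as its own parent
            children.setdefault(int(parent), []).append(child)
    result: set[int] = set()
    stack = [body_id]
    while stack:
        body = stack.pop()
        if body not in result:
            result.add(body)
            stack.extend(children.get(body, []))
    return result


# world(0) -> 1 -> {2, 3}, world(0) -> 4 -> 5
parents = [0, 0, 1, 1, 0, 4]
print(descendants(parents, 1))
```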
descendant_geoms
¶
Get all geoms attached to descendants of a body in a MuJoCo model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
model
|
MjModel
|
The MuJoCo model to use. |
required |
body_id
|
int
|
The id of the body to get the geoms of. |
required |
visible_only
|
bool
|
Whether to only include visible geoms (groups 0-2). |
True
|
Returns:
| Type | Description |
|---|---|
list[int]
|
list[int]: A sorted list of the ids of the geoms attached to the body itself or to any of its descendants. |
Source code in molmo_spaces/utils/mj_model_and_data_utils.py
extract_mj_names
¶
See https://github.com/openai/mujoco-py/blob/ab86d331c9a77ae412079c6e58b8771fe63747fc/mujoco_py/generated/wrappers.pxi#L1127
Source code in molmo_spaces/utils/mj_model_and_data_utils.py
geom_aabb
¶
geom_aabb(model: MjModel, data: MjData, geom_ids: list[int], tight_mesh: bool = True) -> tuple[ndarray, ndarray]
Computes the axis-aligned bounding box (AABB) for a list of geometries in a MuJoCo model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
model
|
MjModel
|
The MuJoCo model containing the geometries. |
required |
data
|
MjData
|
The MuJoCo data containing the state of the model. |
required |
geom_ids
|
list[int]
|
A list of geometry IDs for which to compute the AABB. |
required |
tight_mesh
|
bool
|
Whether to compute the tight AABB for mesh geoms. If False, the AABB will be computed using the geom_aabb field, and may not be tight in world space. |
True
|
Returns:
| Name | Type | Description |
|---|---|---|
tuple |
tuple[ndarray, ndarray]
|
A tuple containing: - numpy.ndarray: The center of the merged AABB in world space. - numpy.ndarray: The x,y,z dimensions of the merged AABB. |
Source code in molmo_spaces/utils/mj_model_and_data_utils.py
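Merging several per-geom boxes into one AABB comes down to taking the component-wise minimum and maximum of their corners. The sketch below assumes each box is given as a world-space center and full dimensions, matching the return format described above; the name `merge_aabbs` is hypothetical.

```python
import numpy as np


def merge_aabbs(centers, sizes):
    """Merge N boxes (centers (N, 3), sizes (N, 3)) into one (center, size) pair."""
    centers = np.asarray(centers, dtype=float)
    sizes = np.asarray(sizes, dtype=float)
    lower = (centers - sizes / 2.0).min(axis=0)  # smallest corner over all boxes
    upper = (centers + sizes / 2.0).max(axis=0)  # largest corner over all boxes
    return (lower + upper) / 2.0, upper - lower


center, size = merge_aabbs([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]], [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]])
print(center, size)
```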
mesh_aabb
¶
Compute the tight AABB in world space for a mesh geom using its vertices.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
model
|
MjModel
|
The MuJoCo model containing the geom. |
required |
data
|
MjData
|
The MuJoCo data containing the state of the model. |
required |
geom_id
|
int
|
The id of the mesh geom to compute the AABB for. Must be a mesh geom. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
tuple |
tuple[ndarray, ndarray]
|
A tuple containing: - numpy.ndarray: The center of the AABB in world space. - numpy.ndarray: The x,y,z dimensions of the AABB. |
Source code in molmo_spaces/utils/mj_model_and_data_utils.py
site_pose
¶
mp_logging
¶
Classes:
| Name | Description |
|---|---|
ColoredFormatter |
Format a log string with colors. |
ImportChecker |
|
Functions:
| Name | Description |
|---|---|
find_free_port |
Finds a free port for distributed training. |
get_logger |
Get a logging.Logger to stderr. |
get_worker_logger |
Create a logger specific to a worker that includes the worker ID in all messages |
init_logging |
Init the logging.Logger. |
restore_worker_stdout |
Restore the previous stdout for the current thread |
setup_worker_stdout |
Set up stdout redirection for a worker thread to use the worker's logger |
update_log_level |
|
worker_stdout_context |
Context manager for worker-specific stdout redirection |
Attributes:
| Name | Type | Description |
|---|---|---|
HUMAN_LOG_LEVELS |
tuple[str, ...]
|
Available log levels: "debug", "info", "warning", "error", "none" |
HUMAN_LOG_LEVELS
module-attribute
¶
Available log levels: "debug", "info", "warning", "error", "none"
ColoredFormatter
¶
Bases: Formatter
Format a log string with colors.
This implementation taken (with modifications) from https://stackoverflow.com/a/384125.
Methods:
| Name | Description |
|---|---|
format |
|
Attributes:
| Name | Type | Description |
|---|---|---|
BOLD_SEQ |
|
|
COLORS |
|
|
COLOR_SEQ |
|
|
RESET_SEQ |
|
|
use_color |
|
Source code in molmo_spaces/utils/mp_logging.py
COLORS
class-attribute
instance-attribute
¶
format
¶
Source code in molmo_spaces/utils/mp_logging.py
ImportChecker
¶
find_free_port
¶
Finds a free port for distributed training.
Returns:
port: port number that can be used to listen
Source code in molmo_spaces/utils/mp_logging.py
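A common idiom for this is to bind a socket to port 0 and let the operating system pick an unused port. The sketch below shows that idiom with a hypothetical name (`free_port_sketch`); whether the module uses exactly this approach is an assumption.

```python
import socket


def free_port_sketch() -> int:
    """Ask the OS for a currently unused TCP port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("", 0))  # port 0 means "choose any free port"
        return sock.getsockname()[1]


port = free_port_sketch()
print(port)
```

The port is released as soon as the socket closes, so another process could claim it before it is reused; callers that need a hard guarantee should keep the socket open.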
get_logger
¶
Get a logging.Logger that writes to stderr. It can be called whenever a
message needs to be logged. Messages from different processes can get mixed up
(https://docs.python.org/3.6/library/multiprocessing.html#logging), but this
works well in most cases.
Returns:
logger: the logging.Logger object
Source code in molmo_spaces/utils/mp_logging.py
get_worker_logger
¶
Create a logger specific to a worker that includes the worker ID in all messages
Source code in molmo_spaces/utils/mp_logging.py
init_logging
¶
Init the logging.Logger.
It should be called only once in the app (e.g. in main). It sets
the log level to one of HUMAN_LOG_LEVELS and sets up handlers
for stderr and, optionally, a log file. The logging level is propagated to all subprocesses.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
human_log_level
|
str
|
Log level as a human-readable string. One of "debug", "info", "warning", "error", "none". |
'info'
|
log_file
|
str | None
|
Optional path to a log file. If provided, logs will also be written to this file. All worker loggers will also write to the same file with worker ID prefixes. |
None
|
Source code in molmo_spaces/utils/mp_logging.py
restore_worker_stdout
¶
Restore the previous stdout for the current thread
Source code in molmo_spaces/utils/mp_logging.py
setup_worker_stdout
¶
Set up stdout redirection for a worker thread to use the worker's logger
Source code in molmo_spaces/utils/mp_logging.py
update_log_level
¶
worker_stdout_context
¶
mujoco_scene_utils
¶
Functions:
| Name | Description |
|---|---|
add_visual_capsule |
Adds one capsule to an mjvScene. |
get_supporting_geom |
Finds the supporting geometry for an object, using a heuristic. |
is_object_supported_by_body |
Checks if an object is supported by a given body, using heuristics. |
place_object_near |
Place an object near a point such that the bottom of the object (i.e. the base) is at the specified z-value, with a random yaw. |
randomize_door_joints |
Modify door and handle joint parameters in a house spec. |
Attributes:
| Name | Type | Description |
|---|---|---|
log |
|
add_visual_capsule
¶
Adds one capsule to an mjvScene. These geometries are automatically visual-only and don't participate in collision detection.
Source code in molmo_spaces/utils/mujoco_scene_utils.py
get_supporting_geom
¶
get_supporting_geom(data: MjData, object_id: int, angle_threshold: float = radians(80)) -> int | None
Finds the supporting geometry for an object, using a heuristic. Searches for a geom in contact with the object, such that the contact is in the bottom half of the object's AABB and the normal is pointing upwards.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
MjData
|
MjData object |
required |
object_id
|
int
|
Body ID of the root body to find the supporting geometry for |
required |
angle_threshold
|
float
|
Threshold for the angle between the normal and the vertical axis to be considered parallel, in radians |
radians(80)
|
Returns:
| Name | Type | Description |
|---|---|---|
int |
int | None
|
Geom ID of the supporting geometry, or None if no supporting geometry is found |
Source code in molmo_spaces/utils/mujoco_scene_utils.py
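The angle test described for `angle_threshold` can be sketched with plain NumPy. This is an illustration of the geometric check only, assuming world Z is up; `normal_points_up` is a hypothetical helper, and the real function additionally inspects contacts and the object's AABB via MjData.

```python
from math import radians

import numpy as np


def normal_points_up(normal, angle_threshold: float = radians(80)) -> bool:
    """True if the angle between `normal` and world +Z is within the threshold."""
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    # For a unit vector, the z-component is the cosine of the angle to +Z.
    angle = np.arccos(np.clip(n[2], -1.0, 1.0))
    return bool(angle <= angle_threshold)
```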
is_object_supported_by_body
¶
is_object_supported_by_body(data: MjData, object_id: int, support_id: int, angle_threshold: float = radians(30), frac_weight_threshold: float = 0.5, eps: float = 1e-06) -> bool
Checks if an object is supported by a given body, using heuristics. This is more precise than get_supporting_geom.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
MjData
|
MjData object |
required |
object_id
|
int
|
Body ID of the root body to check if it is supported by the supporting body |
required |
support_id
|
int
|
Body ID of the supporting body to check if it is supporting the object |
required |
angle_threshold
|
float
|
Threshold for the angle between the normal and the vertical axis to be considered parallel, in radians |
radians(30)
|
frac_weight_threshold
|
float
|
The upward component of the contact force must be at least this fraction of the object weight to be considered supported |
0.5
|
eps
|
float
|
Threshold for the net contact force to be considered non-zero |
1e-06
|
Returns:
| Name | Type | Description |
|---|---|---|
bool |
bool
|
True if the object is supported by the given support, False otherwise |
Source code in molmo_spaces/utils/mujoco_scene_utils.py
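The roles of `frac_weight_threshold` and `eps` can be illustrated with a small force check. This sketch assumes the net contact force and the object mass are already known; `supports_weight` and its `gravity` parameter are hypothetical, and the real function derives these quantities from MjData and applies further heuristics.

```python
import numpy as np


def supports_weight(
    net_contact_force,
    mass: float,
    gravity: float = 9.81,
    frac_weight_threshold: float = 0.5,
    eps: float = 1e-6,
) -> bool:
    """Check whether the upward contact force carries enough of the weight."""
    force = np.asarray(net_contact_force, dtype=float)
    # A (near) zero net force means there is effectively no contact.
    if np.linalg.norm(force) < eps:
        return False
    # The upward (z) component must reach the given fraction of the weight.
    return bool(force[2] >= frac_weight_threshold * mass * gravity)
```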
place_object_near
¶
place_object_near(data: MjData, object_id: int, placement_point: ndarray, min_dist: float, max_dist: float, max_tries: int = 100, reference_pos: ndarray | None = None, max_dist_to_reference: float = 1.0, supporting_geom_id: int | None = None, z_eps: float = 0.001)
Place an object near a point such that the bottom of the object (i.e. the base) is at the specified z-value, with a random yaw. Optionally, ensure the placed object is within a certain distance of a reference position.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
MjData
|
MjData object |
required |
object_id
|
int
|
ID of the object to place |
required |
placement_point
|
ndarray
|
Point to place the object near |
required |
min_dist
|
float
|
Minimum distance from the placement point |
required |
max_dist
|
float
|
Maximum distance from the placement point |
required |
max_tries
|
int
|
Maximum number of placement attempts |
100
|
reference_pos
|
ndarray | None
|
Reference position to place the object near |
None
|
max_dist_to_reference
|
float
|
Maximum distance to the reference position |
1.0
|
supporting_geom_id
|
int | None
|
ID of the supporting geometry to optionally ensure the object is placed on top of |
None
|
z_eps
|
float
|
Epsilon to add to the z-offset to avoid collision |
0.001
|
Raises:
| Type | Description |
|---|---|
ObjectPlacementError
|
If the object cannot be placed within the specified number of attempts |
Source code in molmo_spaces/utils/mujoco_scene_utils.py
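The placement loop can be pictured as rejection sampling in an annulus around the placement point. The sketch below shows only that sampling pattern; `sample_placement`, `PlacementErrorSketch`, and the `is_valid` callback are hypothetical stand-ins for the collision, support, and reference-distance checks the real function performs on MjData.

```python
import numpy as np


class PlacementErrorSketch(RuntimeError):
    """Raised when no valid placement is found (hypothetical stand-in)."""


def sample_placement(placement_point, min_dist, max_dist, is_valid,
                     max_tries=100, rng=None):
    """Sample an (xy, yaw) candidate between min_dist and max_dist of the point."""
    rng = np.random.default_rng() if rng is None else rng
    center = np.asarray(placement_point, dtype=float)[:2]
    for _ in range(max_tries):
        radius = rng.uniform(min_dist, max_dist)
        theta = rng.uniform(0.0, 2.0 * np.pi)
        xy = center + radius * np.array([np.cos(theta), np.sin(theta)])
        yaw = rng.uniform(-np.pi, np.pi)  # random heading about the z-axis
        if is_valid(xy):
            return xy, yaw
    raise PlacementErrorSketch(f"no valid placement in {max_tries} tries")
```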
randomize_door_joints
¶
randomize_door_joints(spec: MjSpec, scene_metadata: dict, door_stiffness_range: tuple = (3, 7), door_damping_range: tuple = (8, 12), door_frictionloss_range: tuple = (8, 12), handle_stiffness_range: tuple = (200, 300), handle_damping_range: tuple = (80, 120), handle_frictionloss_range: tuple = (40, 60), add_handle_limits: bool = True) -> None
Modify door and handle joint parameters in a house spec.
This function identifies door joints and handle joints by their naming patterns and modifies their physical parameters (stiffness, damping, frictionloss) with randomized values within specified ranges.
It also sets the ref and springref based on range heuristics.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
spec
|
MjSpec
|
The model spec |
required |
door_stiffness_range
|
tuple
|
(min, max) range for door joint stiffness (default: reduce from ~250 to 3-7) |
(3, 7)
|
door_damping_range
|
tuple
|
(min, max) range for door joint damping (default: reduce from ~100 to 8-12) |
(8, 12)
|
door_frictionloss_range
|
tuple
|
(min, max) range for door joint frictionloss (default: reduce from ~50 to 8-12) |
(8, 12)
|
handle_stiffness_range
|
tuple
|
(min, max) range for handle joint stiffness (default: increase from ~0 to 200-300) |
(200, 300)
|
handle_damping_range
|
tuple
|
(min, max) range for handle joint damping (default: increase from ~0.1 to 80-120) |
(80, 120)
|
handle_frictionloss_range
|
tuple
|
(min, max) range for handle joint frictionloss (default: increase from ~0 to 40-60) |
(40, 60)
|
add_handle_limits
|
bool
|
Whether to add limited="true" and ref/springref attributes to handle joints |
True
|
Source code in molmo_spaces/utils/mujoco_scene_utils.py
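Each `(min, max)` range yields one randomized value per joint parameter. A minimal sketch of that draw, assuming uniform sampling (the distribution is not stated above); `sample_joint_params` is a hypothetical helper that does not touch an MjSpec.

```python
import numpy as np


def sample_joint_params(stiffness_range=(3, 7), damping_range=(8, 12),
                        frictionloss_range=(8, 12), rng=None) -> dict:
    """Draw one value per parameter from its (min, max) range."""
    rng = np.random.default_rng() if rng is None else rng
    return {
        "stiffness": float(rng.uniform(*stiffness_range)),
        "damping": float(rng.uniform(*damping_range)),
        "frictionloss": float(rng.uniform(*frictionloss_range)),
    }
```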
object_metadata
¶
Classes:
| Name | Description |
|---|---|
DictUnion |
Union of multiple nonoverlapping dictionaries. |
ObjectMeta |
|
UserLibraryMetadata |
Class that provides dict-like access to user library metadata. |
Functions:
| Name | Description |
|---|---|
clip_sim |
|
compute_text_clip |
|
get_annotation |
|
get_clip_model |
|
get_db |
|
get_metadata_lmdb_dir |
|
Attributes:
| Name | Type | Description |
|---|---|---|
DEFAULT_CLIP_MODEL |
|
|
DEFAULT_CLIP_PRETRAIN |
|
|
DEFAULT_DEVICE |
|
|
all_descriptions |
|
|
all_descs |
|
|
all_sims |
|
|
asset_id |
|
|
asset_ids |
|
|
description |
|
|
descriptions |
|
|
img |
|
|
text_clips |
|
|
texts |
|
descriptions
module-attribute
¶
descriptions = list((annotation(asset_id)['description']) for asset_id in asset_ids)
texts
module-attribute
¶
DictUnion
¶
Bases: Mapping
Union of multiple nonoverlapping dictionaries. This will not check for key collisions between dictionaries!
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
*dicts
|
The dictionaries to union. |
()
|
|
raise_on_missing
|
bool
|
Whether to raise an error if a key is not found in any of the dictionaries. |
False
|
Methods:
| Name | Description |
|---|---|
__contains__ |
|
__getitem__ |
|
__iter__ |
|
__len__ |
|
get |
|
Source code in molmo_spaces/utils/object_metadata.py
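A read-only union over several dictionaries can be built on `collections.abc.Mapping`. The sketch below is an assumption-laden illustration, not the class's code: `DictUnionSketch` is a hypothetical name, and the choice to apply `raise_on_missing` only in `get` is a guess at the intended behavior.

```python
from collections.abc import Mapping


class DictUnionSketch(Mapping):
    """Look keys up across several dicts without copying them."""

    def __init__(self, *dicts, raise_on_missing: bool = False):
        self._dicts = dicts
        self._raise_on_missing = raise_on_missing

    def __getitem__(self, key):
        for d in self._dicts:
            if key in d:
                return d[key]
        raise KeyError(key)

    def __iter__(self):
        # Keys are assumed not to overlap, so no de-duplication is done.
        for d in self._dicts:
            yield from d

    def __len__(self):
        return sum(len(d) for d in self._dicts)

    def get(self, key, default=None):
        try:
            return self[key]
        except KeyError:
            if self._raise_on_missing:
                raise
            return default
```

Because collisions are not checked, a key present in two dictionaries would be counted twice by `len` and resolved to the first match on lookup.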
ObjectMeta
¶
Methods:
| Name | Description |
|---|---|
all_descriptions |
|
all_uids |
|
annotation |
|
clean_object_name |
|
description_text_features |
|
get_features |
|
get_short_description |
|
get_target_object_uid |
|
img_features |
|
short_descriptions |
|
all_descriptions
classmethod
¶
Source code in molmo_spaces/utils/object_metadata.py
all_uids
staticmethod
¶
annotation
staticmethod
¶
Source code in molmo_spaces/utils/object_metadata.py
clean_object_name
classmethod
¶
clean_object_name(task: BaseMujocoTask) -> str
Source code in molmo_spaces/utils/object_metadata.py
description_text_features
classmethod
¶
get_features
classmethod
¶
Source code in molmo_spaces/utils/object_metadata.py
get_short_description
classmethod
¶
Source code in molmo_spaces/utils/object_metadata.py
get_target_object_uid
staticmethod
¶
Source code in molmo_spaces/utils/object_metadata.py
img_features
classmethod
¶
short_descriptions
classmethod
¶
Source code in molmo_spaces/utils/object_metadata.py
UserLibraryMetadata
¶
UserLibraryMetadata(user_library_path: Path, user_library_index: dict[str, UserAssetLibraryIndexEntry], lru_cache_size: int = 1000)
Bases: Mapping
Class that provides dict-like access to user library metadata.
Methods:
| Name | Description |
|---|---|
__contains__ |
|
__getitem__ |
|
__iter__ |
|
__len__ |
|
Source code in molmo_spaces/utils/object_metadata.py
clip_sim
¶
Source code in molmo_spaces/utils/object_metadata.py
compute_text_clip
¶
Source code in molmo_spaces/utils/object_metadata.py
get_annotation
¶
get_clip_model
¶
Source code in molmo_spaces/utils/object_metadata.py
get_db
¶
Source code in molmo_spaces/utils/object_metadata.py
get_metadata_lmdb_dir
¶
Source code in molmo_spaces/utils/object_metadata.py
object_retriever
¶
Classes:
| Name | Description |
|---|---|
ObjectRetriever |
|
Attributes:
| Name | Type | Description |
|---|---|---|
anno |
|
|
r |
|
ObjectRetriever
¶
Methods:
| Name | Description |
|---|---|
get_keys_values |
|
query |
|
Attributes:
| Name | Type | Description |
|---|---|---|
max_results |
|
|
storage_path |
|
|
thres |
|
Source code in molmo_spaces/utils/object_retriever.py
storage_path
class-attribute
instance-attribute
¶
storage_path = ASSETS_DIR / '.lmdb' / 'object_retriever'
get_keys_values
¶
Source code in molmo_spaces/utils/object_retriever.py
query
¶
Source code in molmo_spaces/utils/object_retriever.py
patch_renderer_flags
¶
Import this module to configure the renderer flags for the current platform.
Functions:
| Name | Description |
|---|---|
patch_renderer_flags |
|
patch_renderer_flags
¶
Source code in molmo_spaces/utils/patch_renderer_flags.py
pose
¶
Functions:
| Name | Description |
|---|---|
compute_lookat_forward_up |
Compute forward and up unit vectors for a camera looking at a target. |
pos_quat_to_pose_mat |
|
pose_mat_to_7d |
Convert 4x4 pose matrix to 7D vector (x, y, z, qw, qx, qy, qz). |
pose_mat_to_pos_quat |
|
compute_lookat_forward_up
¶
compute_lookat_forward_up(camera_pos: ndarray, lookat_target: ndarray, camera_up: ndarray | None = None) -> tuple[ndarray, ndarray]
Compute forward and up unit vectors for a camera looking at a target.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
camera_pos
|
ndarray
|
Camera position in world frame. |
required |
lookat_target
|
ndarray
|
Point to look at in world frame. |
required |
camera_up
|
ndarray | None
|
Desired up direction. Defaults to world Z-up [0, 0, 1]. |
None
|
Returns:
| Type | Description |
|---|---|
tuple[ndarray, ndarray]
|
(forward, up) unit vectors in world frame. |
Source code in molmo_spaces/utils/pose.py
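A look-at frame is usually built with two cross products. The following is a sketch of that standard construction, assuming a right-handed world frame; `lookat_forward_up` is a hypothetical name, and how the real function handles a forward direction parallel to the up hint is not documented here, so the sketch simply raises.

```python
import numpy as np


def lookat_forward_up(camera_pos, lookat_target, camera_up=None):
    """Return unit (forward, up) vectors for a camera aimed at a target."""
    pos = np.asarray(camera_pos, dtype=float)
    target = np.asarray(lookat_target, dtype=float)
    up_hint = np.array([0.0, 0.0, 1.0]) if camera_up is None \
        else np.asarray(camera_up, dtype=float)

    forward = target - pos
    norm = np.linalg.norm(forward)
    if norm < 1e-12:
        raise ValueError("camera position and target coincide")
    forward = forward / norm

    # right is perpendicular to both forward and the up hint.
    right = np.cross(forward, up_hint)
    right_norm = np.linalg.norm(right)
    if right_norm < 1e-8:
        raise ValueError("forward is parallel to the up hint")
    right = right / right_norm

    # Re-orthogonalized up: perpendicular to forward, close to the hint.
    up = np.cross(right, forward)
    return forward, up
```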
pos_quat_to_pose_mat
¶
Source code in molmo_spaces/utils/pose.py
pose_mat_to_7d
¶
Convert 4x4 pose matrix to 7D vector (x, y, z, qw, qx, qy, qz).
Source code in molmo_spaces/utils/pose.py
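The conversion amounts to reading the translation column and turning the 3x3 rotation block into a scalar-first quaternion. Below is a NumPy-only sketch using the usual trace-based branches; `pose_mat_to_7d_sketch` is a hypothetical name, and the module may use a library routine or a different sign convention.

```python
import numpy as np


def pose_mat_to_7d_sketch(pose: np.ndarray) -> np.ndarray:
    """Convert a 4x4 pose to (x, y, z, qw, qx, qy, qz)."""
    pose = np.asarray(pose, dtype=float)
    R, t = pose[:3, :3], pose[:3, 3]
    trace = R[0, 0] + R[1, 1] + R[2, 2]
    if trace > 0:
        s = 2.0 * np.sqrt(trace + 1.0)
        w, x = 0.25 * s, (R[2, 1] - R[1, 2]) / s
        y, z = (R[0, 2] - R[2, 0]) / s, (R[1, 0] - R[0, 1]) / s
    elif R[0, 0] > R[1, 1] and R[0, 0] > R[2, 2]:
        s = 2.0 * np.sqrt(1.0 + R[0, 0] - R[1, 1] - R[2, 2])
        w, x = (R[2, 1] - R[1, 2]) / s, 0.25 * s
        y, z = (R[0, 1] + R[1, 0]) / s, (R[0, 2] + R[2, 0]) / s
    elif R[1, 1] > R[2, 2]:
        s = 2.0 * np.sqrt(1.0 + R[1, 1] - R[0, 0] - R[2, 2])
        w, x = (R[0, 2] - R[2, 0]) / s, (R[0, 1] + R[1, 0]) / s
        y, z = 0.25 * s, (R[1, 2] + R[2, 1]) / s
    else:
        s = 2.0 * np.sqrt(1.0 + R[2, 2] - R[0, 0] - R[1, 1])
        w, x = (R[1, 0] - R[0, 1]) / s, (R[0, 2] + R[2, 0]) / s
        y, z = (R[1, 2] + R[2, 1]) / s, 0.25 * s
    return np.array([t[0], t[1], t[2], w, x, y, z])
```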
pose_mat_to_pos_quat
¶
profiler_utils
¶
Classes:
| Name | Description |
|---|---|
DatagenProfiler |
Per-worker profiler for distributed data generation that accumulates timing stats across episodes and houses, logging summaries to the worker logger. |
MutableFloat |
|
Profiler |
|
Functions:
| Name | Description |
|---|---|
Timer |
|
DatagenProfiler
¶
Per-worker profiler for distributed data generation that accumulates timing stats across episodes and houses, logging summaries to the worker logger.
Tracks operations like:
- task_sampling: Time to sample a task from the task sampler
- policy_setup: Time to create/setup the policy
- rollout_total: Total time for a rollout (reset + all steps)
- rollout_reset: Time for task.reset()
- policy_get_action: Time for policy.get_action() calls (per step, accumulated)
- task_step: Time for task.step() calls (per step, accumulated)
- episode_total: Total time for one episode (sampling + policy setup + rollout)
- save_batch_prep: Time to prepare episode for saving
- save_trajectories: Time to save trajectory data
Usage:
    profiler = DatagenProfiler(logger)
    # For each episode:
    with profiler.profile("task_sampling"):
        task = task_sampler.sample_task(...)
    # After each episode:
    profiler.log_episode_summary(episode_idx, house_id)
    # After each house:
    profiler.log_house_summary(house_id)
Initialize the datagen profiler.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
logger
|
Logger instance to output summaries to. If None, uses get_logger(). |
None
|
|
enabled
|
bool
|
Whether profiling is enabled. If False, all operations are no-ops. |
True
|
Methods:
| Name | Description |
|---|---|
end |
End timing an operation and record the duration. |
get_episode_stats |
Get current episode timing stats as a dict. |
get_house_stats |
Get current house timing stats as a dict. |
log_episode_summary |
Log a summary of timing for the current episode. |
log_house_summary |
Log a summary of timing for the current house (accumulated across all episodes). |
log_worker_summary |
Log a summary of timing for the entire worker (accumulated across all houses). |
profile |
Context manager for profiling a block of code. |
record |
Directly record a duration for an operation (useful when timing is external). |
start |
Start timing an operation. |
Attributes:
| Name | Type | Description |
|---|---|---|
enabled |
|
|
logger |
|
Source code in molmo_spaces/utils/profiler_utils.py
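The accumulate-by-name pattern behind `profile` and `record` can be sketched with a context manager. This is an illustration only; `TimingAccumulator` is a hypothetical class that keeps a single level of totals, whereas the real profiler tracks episode, house, and worker scopes and logs summaries.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class TimingAccumulator:
    """Accumulates total time and call count per operation name."""

    def __init__(self, enabled: bool = True):
        self.enabled = enabled
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    def record(self, name: str, duration: float) -> None:
        # With profiling disabled, recording is a no-op.
        if not self.enabled:
            return
        self.totals[name] += duration
        self.counts[name] += 1

    @contextmanager
    def profile(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record even if the profiled block raises.
            self.record(name, time.perf_counter() - start)
```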
end
¶
End timing an operation and record the duration.
Source code in molmo_spaces/utils/profiler_utils.py
get_episode_stats
¶
Get current episode timing stats as a dict.
Source code in molmo_spaces/utils/profiler_utils.py
get_house_stats
¶
Get current house timing stats as a dict.
Source code in molmo_spaces/utils/profiler_utils.py
log_episode_summary
¶
Log a summary of timing for the current episode.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
episode_idx
|
int
|
Index of the episode within the house |
required |
house_id
|
int
|
ID of the house being processed |
required |
success
|
bool | None
|
Whether the episode was successful (optional) |
None
|
Source code in molmo_spaces/utils/profiler_utils.py
log_house_summary
¶
Log a summary of timing for the current house (accumulated across all episodes).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
house_id
|
int
|
ID of the house that was processed |
required |
success_count
|
int
|
Number of successful episodes in this house |
required |
total_count
|
int
|
Total number of episodes attempted in this house |
required |
Source code in molmo_spaces/utils/profiler_utils.py
log_worker_summary
¶
Log a summary of timing for the entire worker (accumulated across all houses). Call this when the worker is shutting down.
Source code in molmo_spaces/utils/profiler_utils.py
profile
¶
record
¶
Directly record a duration for an operation (useful when timing is external).
Source code in molmo_spaces/utils/profiler_utils.py
MutableFloat
dataclass
¶
Profiler
¶
Methods:
| Name | Description |
|---|---|
end |
|
get_avg_time |
|
get_n |
|
print_all |
|
profile |
|
save_summary |
|
start |
|
Attributes:
| Name | Type | Description |
|---|---|---|
log_realtime |
|
|
save_path |
|
|
start_timestamp |
|
Source code in molmo_spaces/utils/profiler_utils.py
rendering_utils
¶
Functions:
| Name | Description |
|---|---|
get_geom_seg_mask |
Get a mask of all geoms descended from a body in a segmentation mask. |
get_geom_seg_mask
¶
Get a mask of all geoms descended from a body in a segmentation mask.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
model
|
MjModel
|
The model to use. |
required |
seg
|
ndarray
|
The (H, W, 2) segmentation mask, as returned by the renderer. |
required |
body_id
|
int
|
The id of the body to get the mask for. |
required |
Returns:
| Type | Description |
|---|---|
ndarray
|
np.ndarray: A (H, W) mask of the geoms descended from the body. |
Source code in molmo_spaces/utils/rendering_utils.py
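Given the IDs of the geoms that descend from a body, the mask reduces to a membership test on the segmentation image. The sketch assumes channel 0 holds object IDs and channel 1 holds object types, and that geoms have type code 5 (the value of `mjOBJ_GEOM` in MuJoCo's `mjtObj` enum); `geom_mask` and its parameters are hypothetical, and collecting the descendant geom IDs from the model is left out.

```python
import numpy as np


def geom_mask(seg: np.ndarray, geom_ids, geom_type_code: int = 5) -> np.ndarray:
    """Return an (H, W) boolean mask of pixels showing any of `geom_ids`."""
    ids = seg[..., 0]    # assumed: object ID channel
    types = seg[..., 1]  # assumed: object type channel
    return (types == geom_type_code) & np.isin(ids, list(geom_ids))
```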
sampler_utils
¶
Classes:
| Name | Description |
|---|---|
UniformRandomMapSampler |
|
Functions:
| Name | Description |
|---|---|
furthest_point_sampling |
Furthest Point Sampling (FPS) |
Attributes:
| Name | Type | Description |
|---|---|---|
log |
|
UniformRandomMapSampler
¶
UniformRandomMapSampler(thormap: ProcTHORMap, seed: int = 0, debug: bool = False)
Methods:
| Name | Description |
|---|---|
sample |
Samples N points from free space on the map. |
Attributes:
| Name | Type | Description |
|---|---|---|
debug |
|
|
rng |
|
|
thormap |
|
Source code in molmo_spaces/utils/sampler_utils.py
sample
¶
sample(N=1, positions: ndarray | None = None, quaternions: ndarray | None = None, constraint_positions: ndarray | None = None, constraint_distances: ndarray | None = None, z_pos: float | None = None, look_at: bool | None = True, camera_pose_rel_base: ndarray | None = None, view_range_deg: float | None = 30.0)
Samples N points from free space on the map. If positions and quaternions are not provided, sample uniformly from all free points. If constraint_positions and constraint_distances are provided, use them to sample points within the specified distances from the given positions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
N
|
int
|
The number of points to sample. Defaults to 1. |
1
|
positions
|
Optional[ndarray]
|
An array of positions to force. |
None
|
quaternions
|
Optional[ndarray]
|
An array of quaternions to force. |
None
|
constraint_positions
|
Optional[ndarray]
|
An array of positions to treat as the center of circular constraints. |
None
|
constraint_distances
|
Optional[ndarray]
|
An array of distances to treat as the radius of circular constraints. |
None
|
z_pos
|
Optional[float]
|
If specified, overrides the z-coordinate of the sampled points. Defaults to None. |
None
|
Returns:
| Type | Description |
|---|---|
|
np.ndarray: An array of sampled points, of shape (N, 3) if N > 1, or (3,) if N == 1. |
Source code in molmo_spaces/utils/sampler_utils.py
furthest_point_sampling
¶
Furthest Point Sampling (FPS)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
points
|
ndarray
|
Array of shape (N, D), N points in D dimensions |
required |
k
|
int
|
Number of points to sample |
required |
Returns:
| Name | Type | Description |
|---|---|---|
sampled_indices |
ndarray
|
Indices of sampled points in the original array |
Source code in molmo_spaces/utils/sampler_utils.py
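Furthest point sampling greedily picks, at each step, the point furthest from everything selected so far. A minimal NumPy sketch of that textbook procedure follows; `fps_sketch` and its `start_index` parameter are hypothetical, and the module's function may choose its first point differently.

```python
import numpy as np


def fps_sketch(points: np.ndarray, k: int, start_index: int = 0) -> np.ndarray:
    """Return indices of k points chosen by greedy furthest point sampling."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of points")
    selected = np.empty(k, dtype=int)
    selected[0] = start_index
    # min_dist[i] is the distance from point i to its nearest selected point.
    min_dist = np.linalg.norm(points - points[start_index], axis=1)
    for i in range(1, k):
        nxt = int(np.argmax(min_dist))
        selected[i] = nxt
        dist_to_new = np.linalg.norm(points - points[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return selected
```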
save_utils
¶
Functions:
| Name | Description |
|---|---|
batch_observations |
Transpose a batch of observation dicts to a dict of batched observations. |
byte_array_to_string |
|
convert_to_arr |
|
dict_to_byte_array |
|
is_camera_sensor |
Determine if a sensor corresponds to a camera (RGB or depth) that produces image data. |
prepare_episode_for_saving |
Transform raw episode history into batched format ready for save_trajectories(). |
safe_to_tensor |
Safely convert data to tensor, handling different dimensionalities. |
save_frames_to_mp4 |
Save RGB frames to MP4 video file. |
save_trajectories |
Save trajectories in the expected hierarchical HDF5 format. |
save_videos_from_raw_observations |
Save videos immediately from raw observations before batch processing. |
Attributes:
| Name | Type | Description |
|---|---|---|
COMPR |
|
|
log |
|
batch_observations
¶
batch_observations(observations: list[dict], sensor_suite: SensorSuite, device: device | None = None) -> dict[str, dict | Tensor]
Transpose a batch of observation dicts to a dict of batched observations.
Arguments¶
observations: List of dicts of observations.
device: The torch.device to put the resulting tensors on. Will not move the tensors if None.
Returns¶
Transposed dict of lists of observations.
Source code in molmo_spaces/utils/save_utils.py
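The transpose step turns a list of per-step dicts into one dict of stacked arrays. The sketch below shows the idea with NumPy arrays in place of torch tensors and ignores nested dicts, the sensor suite, and device placement; `transpose_observations` is a hypothetical helper.

```python
import numpy as np


def transpose_observations(observations: list[dict]) -> dict:
    """Stack each key's values along a new leading (time/batch) axis."""
    if not observations:
        return {}
    keys = observations[0].keys()
    return {
        key: np.stack([np.asarray(obs[key]) for obs in observations])
        for key in keys
    }
```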
byte_array_to_string
¶
convert_to_arr
¶
convert_to_arr(observations: list[dict], sensor_suite: SensorSuite) -> list[dict]
Source code in molmo_spaces/utils/save_utils.py
dict_to_byte_array
¶
Source code in molmo_spaces/utils/save_utils.py
is_camera_sensor
¶
is_camera_sensor(sensor_name: str, sensor_suite: SensorSuite | None = None) -> bool
Determine if a sensor corresponds to a camera (RGB or depth) that produces image data.
Uses sensor type metadata when available (preferred) and falls back to naming heuristics for backward compatibility when sensor_suite is not provided.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
sensor_name
|
str
|
Name of the sensor to check |
required |
sensor_suite
|
SensorSuite | None
|
Optional SensorSuite to query for sensor type metadata |
None
|
Returns:
| Type | Description |
|---|---|
bool
|
True if the sensor is a camera that produces image data (RGB or depth), False otherwise. Returns False for camera parameter sensors (CameraParameterSensor), which contain metadata but not image data. |
Source code in molmo_spaces/utils/save_utils.py
prepare_episode_for_saving
¶
prepare_episode_for_saving(history: dict, sensor_suite: SensorSuite, fps: float, save_dir: str | None = None, episode_idx: int = 0, save_file_suffix: str = '', remove_sensors_if_save_dir: bool = True) -> dict[str, Tensor] | None
Transform raw episode history into batched format ready for save_trajectories().
Takes the output of task.get_history() and produces a single dict with all data batched along the time dimension.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
history
|
dict
|
Dict from task.get_history() containing: - "observations": List[List[Dict]] - [timestep][batch_idx][sensor_name] - "rewards": List[List[float]] - "terminals": List[List[bool]] - "truncateds": List[List[bool]] - "actions": List[...] (optional, currently unused) - "obs_scene": Dict (optional) |
required |
sensor_suite
|
SensorSuite
|
SensorSuite for observation processing |
required |
save_dir
|
str | None
|
Optional directory to save videos immediately (before batching) |
None
|
episode_idx
|
int
|
Episode index for video filenames |
0
|
save_file_suffix
|
str
|
Optional suffix for video filenames |
''
|
remove_sensors_if_save_dir
|
bool
|
Whether to remove camera-related sensors if videos were saved |
True
|
Returns:
| Type | Description |
|---|---|
dict[str, Tensor] | None
|
Dict[str, Tensor] with all data batched along the time dimension, or None if there is no data. |
Source code in molmo_spaces/utils/save_utils.py
safe_to_tensor
¶
Safely convert data to tensor, handling different dimensionalities.
Source code in molmo_spaces/utils/save_utils.py
save_frames_to_mp4
¶
save_frames_to_mp4(frames: Sequence[ndarray], file_path: str, fps: float, extra_kwargs: dict[str, Any] | None = None) -> None
Save RGB frames to MP4 video file.
Low-level function that assumes frames are already validated and in uint8 format. Use _save_sensor_video() for high-level saving with validation.
Source code in molmo_spaces/utils/save_utils.py
save_trajectories
¶
save_trajectories(episodes_data: list[dict[str, Tensor]], save_dir: str, fps: float, save_file_suffix: str = '', save_mp4s: bool = True, logger: Logger | None = None) -> Path
Save trajectories in the expected hierarchical HDF5 format.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
episodes_data
|
list[dict[str, Tensor]]
|
List of batched observations (output of batch_observations()). Each episode is a Dict[str, torch.Tensor] where tensors have shape (T, ...) |
required |
save_dir
|
str
|
Directory to save files |
required |
fps
|
float
|
Frames per second of episode data |
required |
save_file_suffix
|
str
|
Optional suffix for filenames |
''
|
save_mp4s
|
bool
|
Whether to save MP4 videos |
True
|
logger
|
Logger | None
|
Optional logger to use (defaults to module logger) |
None
|
Expected structure:
traj_N/
├── obs/
│   ├── agent/
│   │   ├── qpos (T,str_max_len)
│   │   └── qvel (T,str_max_len)
│   ├── extra/
│   │   ├── obj_start (T,7)
│   │   ├── obj_end (T,7)
│   │   ├── tcp_pose (T,7)
│   │   ├── grasp_pose (T,7)
│   │   ├── robot_base_pose (T,7)
│   │   └── door_state (T,str_max_len)
│   │       ├── joint_angle
│   │       ├── opening_percentage
│   │       ├── handle_position
│   │       ├── handle_extents
│   │       ├── door_position
│   │       └── is_open
│   ├── sensor_param/
│   │   └── render_camera/
│   │       ├── extrinsic_cv (T,3,4)
│   │       ├── cam2world_gl (T,4,4)
│   │       └── intrinsic_cv (T,3,3)
│   └── sensor_data/
│       └── render_camera/
│           ├── rgb (T,str_max_len) - video path
│           ├── depth (T,str_max_len) - video path
│           └── segmentation (T,str_max_len) - video path
├── actions (T,str_max_len) - flattened
├── extra/ - original formats for reference
└── episode metadata...
Source code in molmo_spaces/utils/save_utils.py
save_videos_from_raw_observations
¶
save_videos_from_raw_observations(observations_list, save_dir, fps, episode_idx=0, save_file_suffix='', sensor_suite: SensorSuite | None = None) -> None
Save videos immediately from raw observations before batch processing. This avoids the corruption that happens during batch_observations tensor conversion.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
observations_list
|
List of raw observation dicts from episode steps |
required | |
save_dir
|
Directory to save videos |
required | |
fps
|
Frames per second of episode data |
required | |
episode_idx
|
Episode index for naming |
0
|
|
save_file_suffix
|
Optional suffix for filenames |
''
|
|
sensor_suite
|
SensorSuite | None
|
Optional SensorSuite for proper sensor type detection |
None
|
Source code in molmo_spaces/utils/save_utils.py
scene_maps
¶
Classes:
| Name | Description |
|---|---|
ProcTHORMap |
|
THORMap |
Map of the Mujoco scene. |
iTHORMap |
|
Functions:
| Name | Description |
|---|---|
circular_kernel |
|
sample_around_point |
Sample a 2D point around a given point within a given radius. |
Attributes:
| Name | Type | Description |
|---|---|---|
dir_path |
|
|
free_points |
|
|
free_points_px |
|
|
ithormap |
|
|
log |
|
|
one_room_map |
|
|
procthormap |
|
|
procthormap_loaded |
|
|
room_map |
|
|
run_ithor_map_generation |
|
|
run_procthor_map_generation |
|
|
xmls |
|
ithormap
module-attribute
¶
ithormap = from_mj_model_path(model_path, agent_radius=0.25, px_per_m=200, device_id=None)
procthormap
module-attribute
¶
procthormap = from_mj_model_path(model_path, agent_radius=None, px_per_m=200, device_id=None)
ProcTHORMap
¶
ProcTHORMap(occupancy: ndarray, world_to_map: ndarray, map_to_world: ndarray, px_per_m: int, room_map: ndarray = None, room_ids_to_name: dict = None, use_filament: bool = False)
Bases: THORMap
Methods:
| Name | Description |
|---|---|
__call__ |
|
check_collision |
|
from_mj_model_path |
Generate a ProcTHORMap from a MuJoCo model with the open door path cleared. |
get_free_points |
|
get_free_points_by_room |
|
load |
|
pos_m_to_px |
|
pos_px_to_m |
|
safe_model_data |
|
save |
|
save_map |
|
Attributes:
Source code in molmo_spaces/utils/scene_maps.py
__call__
¶
Source code in molmo_spaces/utils/scene_maps.py
check_collision
¶
Source code in molmo_spaces/utils/scene_maps.py
from_mj_model_path
classmethod
¶
from_mj_model_path(model_path: str, camera: str | None = None, agent_radius: float | None = None, px_per_m: int = 100, data: MjData | None = None, device_id: int = None, use_filament: bool = False)
Generate a ProcTHORMap from a MuJoCo model with the open door path cleared.
This method renders occupancy maps at three camera heights:
- 5.0 m: Base map with full wall geometry.
- 2.5 m and 1.5 m: Lower views that capture the door opening, since walls might not be visible at these heights.
It computes a door mask as the area that is occupied at 2.5 m but free at 1.5 m and applies that mask to the 5.0 m map. The method also computes the transformation matrices for mapping between world and map coordinates.
Returns:
| Name | Type | Description |
|---|---|---|
ProcTHORMap |
An instance with the occupancy map having the door path cleared. |
Source code in molmo_spaces/utils/scene_maps.py
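The door-clearing step described above is a pair of boolean mask operations. A minimal sketch, assuming the three occupancy maps are already rendered as same-shaped boolean arrays where True means occupied; `clear_door_path` is a hypothetical helper, and rendering and the world/map transforms are omitted.

```python
import numpy as np


def clear_door_path(occ_5_0, occ_2_5, occ_1_5):
    """Free the cells of the 5.0 m map that belong to door openings."""
    occ_5_0 = np.asarray(occ_5_0, dtype=bool)
    occ_2_5 = np.asarray(occ_2_5, dtype=bool)
    occ_1_5 = np.asarray(occ_1_5, dtype=bool)
    # Door opening: occupied at 2.5 m but free at 1.5 m.
    door_mask = occ_2_5 & ~occ_1_5
    # Applying the mask marks those cells as free in the base map.
    cleared = occ_5_0 & ~door_mask
    return cleared, door_mask
```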
get_free_points
¶
get_free_points_by_room
¶
Source code in molmo_spaces/utils/scene_maps.py
load
classmethod
¶
Source code in molmo_spaces/utils/scene_maps.py
pos_m_to_px
¶
pos_px_to_m
¶
safe_model_data
staticmethod
¶
Source code in molmo_spaces/utils/scene_maps.py
save
¶
Source code in molmo_spaces/utils/scene_maps.py
save_map
¶
Source code in molmo_spaces/utils/scene_maps.py
THORMap
¶
THORMap(occupancy_map=None, occupancy_scale_factor=None, occupancy_world_dims=None, voxel_map=None, voxel_scale_factor=None, px_per_m: int = 100, use_filament: bool = False)
Map of the MuJoCo scene, including fixed, hinged/articulable, and free objects, and excluding the dynamic agent.
Methods:
| Name | Description |
|---|---|
| __call__ | |
| save_map | |
Attributes:
| Name | Type | Description |
|---|---|---|
| MAP_TYPES | | |
| occupancy_map | | |
| occupancy_scale_factor | | |
| occupancy_world_dims | | |
| voxel_map | | |
| voxel_scale_to_world | | |
Source code in molmo_spaces/utils/scene_maps.py
__call__
¶
Source code in molmo_spaces/utils/scene_maps.py
save_map
¶
Source code in molmo_spaces/utils/scene_maps.py
iTHORMap
¶
Bases: ProcTHORMap
Methods:
| Name | Description |
|---|---|
| __call__ | |
| check_collision | |
| from_mj_model_path | Generate a ProcTHORMap from a MuJoCo model with the open door path cleared. |
| get_free_points | |
| get_free_points_by_room | |
| load | |
| pos_m_to_px | |
| pos_px_to_m | |
| safe_model_data | |
| save | |
| save_map | |
Attributes:
Source code in molmo_spaces/utils/scene_maps.py
__call__
¶
Source code in molmo_spaces/utils/scene_maps.py
check_collision
¶
Source code in molmo_spaces/utils/scene_maps.py
from_mj_model_path
classmethod
¶
from_mj_model_path(model_path, camera: str | None = None, agent_radius: float | None = None, px_per_m: int = 100, data: MjData | None = None, device_id: int = None, use_filament: bool = False)
Generate a ProcTHORMap from a MuJoCo model with the open door path cleared.
This method renders occupancy maps at three camera heights:
- 5.0 m: Base map with full wall geometry.
- 2.5 m and 1.5 m: Lower views that capture the door opening, since walls might not be visible at these heights.
It computes a door mask as the area that is occupied at 2.5 m but free at 1.5 m and applies that mask to the 5.0 m map. The method also computes the transformation matrices for mapping between world and map coordinates.
Returns:
| Name | Type | Description |
|---|---|---|
ProcTHORMap |
An instance with the occupancy map having the door path cleared. |
Source code in molmo_spaces/utils/scene_maps.py
get_free_points
¶
get_free_points_by_room
¶
Source code in molmo_spaces/utils/scene_maps.py
load
classmethod
¶
Source code in molmo_spaces/utils/scene_maps.py
pos_m_to_px
¶
pos_px_to_m
¶
safe_model_data
staticmethod
¶
Source code in molmo_spaces/utils/scene_maps.py
save
¶
Source code in molmo_spaces/utils/scene_maps.py
save_map
¶
Source code in molmo_spaces/utils/scene_maps.py
circular_kernel
¶
sample_around_point
¶
sample_around_point(thormap: ProcTHORMap | iTHORMap, point: ndarray, radius_range: tuple[float, float], fallback_threshold: float = 0.05, max_iter: int = 100) -> ndarray
Sample a 2D point around a given point within a given radius.
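One plausible way to sample a 2D point whose distance from a given point falls in radius_range is rejection sampling over an annulus. The sketch below is an assumption about the general approach, not the implementation in scene_maps.py: the is_free callback is a hypothetical stand-in for a free-space lookup on thormap, and fallback_threshold is not modeled.

```python
import numpy as np


def sample_around_point_sketch(is_free, point, radius_range, max_iter=100, seed=0):
    """Rejection-sample a 2D point whose distance to `point` lies in radius_range.

    `is_free` is a hypothetical callback: it takes a 2D position and returns
    True when that position is unoccupied.
    """
    rng = np.random.default_rng(seed)
    center = np.asarray(point, dtype=float)[:2]
    r_min, r_max = radius_range
    for _ in range(max_iter):
        # Sampling r**2 uniformly makes points uniform over the annulus area.
        r = np.sqrt(rng.uniform(r_min**2, r_max**2))
        theta = rng.uniform(0.0, 2.0 * np.pi)
        candidate = center + r * np.array([np.cos(theta), np.sin(theta)])
        if is_free(candidate):
            return candidate
    raise RuntimeError(f"no free point found in {max_iter} iterations")


center = np.array([1.0, 2.0])
# Toy free-space rule: only the half-plane x >= 1.0 is free.
sample = sample_around_point_sketch(lambda p: p[0] >= 1.0, center, (0.5, 1.0))
distance = float(np.linalg.norm(sample - center))
```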
Source code in molmo_spaces/utils/scene_maps.py
scene_metadata_utils
¶
Classes:
| Name | Description |
|---|---|
| SceneMeta | |
Functions:
| Name | Description |
|---|---|
| ensure_all_scenes_installed | |
| get_scene_metadata | Get scene metadata from the scene path. |
| is_object_articulable_from_metadata | Return True if the object has at least one hinge or slide joint per scene metadata. |
| synsets_to_scenes_and_assets | |
Attributes:
| Name | Type | Description |
|---|---|---|
| ctime | | |
| log | | |
| meta | | |
SceneMeta
¶
Methods:
| Name | Description |
|---|---|
| extraction_dir | |
| for_dataset_split | |
| for_split | |
| get_scene_metadata | |
| scene_datasets | |
extraction_dir
staticmethod
¶
Source code in molmo_spaces/utils/scene_metadata_utils.py
for_dataset_split
cached
classmethod
¶
Source code in molmo_spaces/utils/scene_metadata_utils.py
for_split
classmethod
¶
get_scene_metadata
staticmethod
¶
ensure_all_scenes_installed
¶
Source code in molmo_spaces/utils/scene_metadata_utils.py
get_scene_metadata
¶
Get scene metadata from the scene path.
Source code in molmo_spaces/utils/scene_metadata_utils.py
is_object_articulable_from_metadata
¶
Return True if the object has at least one hinge or slide joint per scene metadata.
Uses the scene's name_map for joints and checks each corresponding MuJoCo joint type.
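The joint-type check can be illustrated without loading a MuJoCo model. The sketch below is not the function's actual code: it assumes the metadata yields a list of joint names for the object, and it replaces the model lookup with a plain dictionary. The integer constants match MuJoCo's mjtJoint enum (free=0, ball=1, slide=2, hinge=3); all other names are hypothetical.

```python
# Integer values of MuJoCo's mjtJoint enum.
MJ_JNT_FREE, MJ_JNT_BALL, MJ_JNT_SLIDE, MJ_JNT_HINGE = 0, 1, 2, 3
ARTICULATED_JOINT_TYPES = frozenset({MJ_JNT_SLIDE, MJ_JNT_HINGE})


def is_articulable_sketch(joint_names, joint_type_by_name):
    """Return True if any named joint is a hinge or slide joint.

    joint_names: joint names listed for the object (hypothetical layout of
        the metadata name_map entry).
    joint_type_by_name: mapping of joint name -> mjtJoint integer, standing
        in for a lookup on the MuJoCo model.
    """
    return any(
        joint_type_by_name.get(name) in ARTICULATED_JOINT_TYPES
        for name in joint_names
    )


joint_types = {
    "fridge_door_hinge": MJ_JNT_HINGE,
    "drawer_slide": MJ_JNT_SLIDE,
    "apple_free": MJ_JNT_FREE,
}
fridge_articulable = is_articulable_sketch(["fridge_door_hinge"], joint_types)
apple_articulable = is_articulable_sketch(["apple_free"], joint_types)
```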
Source code in molmo_spaces/utils/scene_metadata_utils.py
synsets_to_scenes_and_assets
cached
¶
Source code in molmo_spaces/utils/scene_metadata_utils.py
spatial_utils
¶
Quaternions are assumed to be scalar first!
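Libraries differ on quaternion component order; SciPy's Rotation.from_quat, for instance, expects scalar-last (x, y, z, w) by default. A minimal sketch of converting between the scalar-first order assumed in this module and scalar-last order follows. The helper names are hypothetical and are not part of spatial_utils.

```python
import numpy as np


def wxyz_to_xyzw(quat):
    """Convert scalar-first (w, x, y, z) to scalar-last (x, y, z, w)."""
    return np.roll(np.asarray(quat, dtype=float), -1, axis=-1)


def xyzw_to_wxyz(quat):
    """Convert scalar-last (x, y, z, w) to scalar-first (w, x, y, z)."""
    return np.roll(np.asarray(quat, dtype=float), 1, axis=-1)


identity_wxyz = np.array([1.0, 0.0, 0.0, 0.0])
identity_xyzw = wxyz_to_xyzw(identity_wxyz)
round_trip = xyzw_to_wxyz(identity_xyzw)
```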
Classes:
| Name | Description |
|---|---|
| Transform | |
Functions:
| Name | Description |
|---|---|
| look_at | |
Transform
¶
Classes:
| Name | Description |
|---|---|
| TClass | Convenient way to create a pure translation. |
Methods:
| Name | Description |
|---|---|
| __mul__ | |
| apply | |
| as_matrix | |
| from_list | |
| from_matrix | |
| from_rotation | |
| from_translation | |
| identity | |
| inv | |
| look_at | |
| to_list | |
Attributes:
| Name | Type | Description |
|---|---|---|
| rotation | | |
| t_ | | |
| translation | | |
Source code in molmo_spaces/utils/spatial_utils.py
TClass
¶
Convenient way to create a pure translation.
Transform.t_[x, y, z] is equivalent to Transform.from_translation(np.r_[x, y, z]).
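The indexing shorthand can be reproduced with a small helper object that implements `__getitem__`. The sketch below uses a stand-in class, MiniTransform, that stores only a translation; it illustrates the pattern and is not the Transform class from spatial_utils.

```python
import numpy as np


class MiniTransform:
    """Stand-in for Transform that stores only a translation (hypothetical)."""

    def __init__(self, translation):
        self.translation = np.asarray(translation, dtype=float)

    @classmethod
    def from_translation(cls, translation):
        return cls(translation)

    class _TClass:
        def __getitem__(self, key):
            # t_[x, y, z] passes the tuple (x, y, z) as `key`.
            return MiniTransform.from_translation(np.asarray(key, dtype=float))

    # A single shared indexer instance, so MiniTransform.t_[...] works.
    t_ = _TClass()


via_indexer = MiniTransform.t_[1.0, 2.0, 3.0]
via_classmethod = MiniTransform.from_translation(np.r_[1.0, 2.0, 3.0])
```

Both objects hold the same translation, which mirrors the equivalence stated above.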
Methods:
| Name | Description |
|---|---|
| __getitem__ | |
__mul__
¶
apply
¶
as_matrix
¶
from_list
classmethod
¶
from_matrix
classmethod
¶
from_rotation
classmethod
¶
from_translation
classmethod
¶
identity
classmethod
¶
inv
¶
look_at
classmethod
¶
Source code in molmo_spaces/utils/spatial_utils.py
look_at
¶
Source code in molmo_spaces/utils/spatial_utils.py
synset_utils
¶
Functions:
| Name | Description |
|---|---|
| canonical_lemma | Return the first (most canonical) lemma for a WordNet synset name. |
| filter_synsets_to_remove_hyponyms | |
| generate_all_hypernyms_with_exclusions | |
| generate_hypernym_to_descendants | |
| get_all_synsets_in_metadata | |
| get_highest_relevant_hypernym | |
| get_hypernym_to_descendants_for_all_metadata_synsets | |
| get_hyponyms_of_synset | |
| get_hyponyms_of_synsets | |
| get_singleton_highest_hypernyms | |
| get_valid_pickupable_obja_uids | Get all objaverse asset UIDs that are pickable (have valid grasp files). |
| get_valid_pickupable_obja_uids_excluding_benchmark | Get pickupable objaverse UIDs with benchmark assets excluded. |
| get_valid_receptacle_uids | Get all asset UIDs that are valid receptacles based on synset filtering. |
| is_hypernym_of | |
| is_subsynset_of | |
| is_valid_receptacle_synset | Check if a synset is a valid receptacle based on inclusion/exclusion rules. |
| symmetric_subsynset_of | |
Attributes:
| Name | Type | Description |
|---|---|---|
| BENCHMARK_BLACKLIST_UIDS_PATH | | |
| EXCLUDED_HYPERNYMS | | |
| PICKUPABLE_EXCLUDED_CATEGORY_HYPERNYMS | dict[str, str] | |
| PICKUPABLE_EXCLUDED_EXACT_SYNSETS | dict[str, str] | |
| RECEPTACLE_HYPERNYM_INCLUDE_WITH_EXCLUSIONS | dict[str, set[str]] | |
| RECEPTACLE_INCLUDE_SYNSETS | set[str] | |
| VALID_PICKUPABLE_OBJA_UIDS_PATH | | |
BENCHMARK_BLACKLIST_UIDS_PATH
module-attribute
¶
BENCHMARK_BLACKLIST_UIDS_PATH = '/weka/prior/datasets/robomolmo/asset_utility_refs/benchmark_blacklist_uids.txt'
EXCLUDED_HYPERNYMS
module-attribute
¶
EXCLUDED_HYPERNYMS = frozenset({'abstraction.n.04', 'abstraction.n.06', 'accident.n.01', 'accomplice.n.01', 'accumulation.n.04', 'act.n.02', 'acting.n.01', 'action.n.01', 'action.n.07', 'activity.n.01', 'administrative_unit.n.01', 'admirer.n.03', 'adult.n.01', 'affair.n.03', 'agaric.n.02', 'agglomeration.n.01', 'air_unit.n.01', 'alloy.n.01', 'animal_material.n.01', 'animal_order.n.01', 'animal_product.n.01', 'announcement.n.02', 'anomaly.n.02', 'aperture.n.03', 'appearance.n.01', 'appearance.n.02', 'appearance.n.04', 'application.n.03', 'approval.n.04', 'archosaur.n.01', 'arctiid.n.01', 'area.n.05', 'area.n.06', 'aristocrat.n.01', 'army_unit.n.01', 'arrangement.n.02', 'arrangement.n.03', 'art.n.03', 'art_form.n.01', 'arthropod_family.n.01', 'arthropod_genus.n.01', 'article.n.02', 'articulator.n.02', 'artifact.n.01', 'artificial_intelligence.n.01', 'artificial_language.n.01', 'artillery.n.02', 'artistic_style.n.01', 'asphodel.n.01', 'assembly.n.01', 'assembly.n.05', 'assembly.n.06', 'assets.n.01', 'assistant.n.01', 'associate.n.01', 'association.n.08', 'atom.n.02', 'atomic_theory.n.01', 'attempt.n.01', 'attendant.n.01', 'attitude.n.01', 'attribute.n.02', 'auditory_communication.n.01', 'autoloader.n.01', 'automatic_firearm.n.01', 'avoirdupois_unit.n.01', 'axis.n.06', 'back.n.08', 'base.n.01', 'basic_cognitive_process.n.01', 'basidiomycete.n.01', 'beginning.n.05', 'being.n.01', 'belief.n.01', 'benzene.n.01', 'bill.n.07', 'binary_compound.n.01', 'bioassay.n.01', 'biological_group.n.01', 'biometric_identification.n.01', 'blemish.n.01', 'body.n.02', 'body.n.04', 'body_part.n.01', 'bodybuilding.n.01', 'boundary.n.01', 'bowling.n.01', 'bramble_bush.n.01', 'bryophyte.n.01', 'business.n.01', 'businessperson.n.01', 'calcium_carbonate.n.01', 'calcium_sulphate.n.01', 'capitalist.n.02', 'capsule.n.03', 'capsule.n.05', 'carbon.n.01', 'care.n.01', 'caryophylloid_dicot_genus.n.01', 'category.n.02', 'catholic_church.n.01', 'causal_agent.n.01', 'cavity.n.02', 'center.n.01', 
'center.n.04', 'center.n.06', 'central.n.01', 'ceratopsian.n.01', 'cetacean.n.01', 'change.n.03', 'change_of_location.n.01', 'change_of_state.n.01', 'character.n.04', 'character.n.08', 'chemical_phenomenon.n.01', 'chemoreceptor.n.01', 'chicory.n.04', 'child.n.01', 'child.n.02', 'chordate.n.01', 'circle.n.01', 'class.n.03', 'clef.n.01', 'clown.n.02', 'clue.n.02', 'code.n.03', 'coding_system.n.01', 'cognition.n.01', 'cognitive_factor.n.01', 'collection.n.01', 'collision.n.02', 'color.n.01', 'combatant.n.01', 'comedian.n.01', 'commodity.n.01', 'communication.n.02', 'complexity.n.01', 'component.n.03', 'composition.n.03', 'compound_leaf.n.01', 'compression.n.04', 'computer_graphics.n.01', 'computer_network.n.01', 'computer_science.n.01', 'concealment.n.03', 'concept.n.01', 'conduit.n.01', 'confinement.n.03', 'conic_section.n.01', 'connection.n.01', 'consequence.n.01', 'constitution.n.04', 'constraint.n.01', 'consumer_credit.n.01', 'consumer_goods.n.01', 'content.n.05', 'contestant.n.01', 'control.n.05', 'convex_shape.n.01', 'cook.n.01', 'cooking.n.01', 'cookout.n.01', 'coordinate_system.n.01', 'copper-base_alloy.n.01', 'correctional_institution.n.01', 'corrective.n.01', 'course.n.08', 'covering.n.02', 'crack.n.07', 'craftsman.n.03', 'creating_by_removal.n.01', 'creating_from_raw_materials.n.01', 'creation.n.01', 'creation.n.02', 'creator.n.02', 'crest.n.05', 'criminal.n.01', 'cross_section.n.01', 'crossing.n.05', 'crossopterygian.n.01', 'crosspiece.n.02', 'cuisine.n.01', 'cultivation.n.02', 'cyprinodont.n.01', 'danaid.n.01', 'dance_music.n.02', 'dark.n.01', 'database.n.01', 'decapod.n.02', 'deceiver.n.01', 'decline.n.02', 'decorativeness.n.01', 'defender.n.01', 'definite_quantity.n.01', 'deity.n.01', 'delicious.n.01', 'delivery.n.01', 'demonstration.n.05', 'depiction.n.04', 'depository.n.01', 'depression.n.08', 'design.n.02', 'design.n.04', 'detail.n.02', 'determinant.n.01', 'development.n.06', 'device.n.01', 'diapsid.n.01', 'dicot_genus.n.01', 'diet.n.01', 
'difficulty.n.02', 'digit.n.01', 'direction.n.06', 'discharge.n.03', 'discipline.n.01', 'discrimination.n.02', 'disorderliness.n.01', 'display.n.05', 'district.n.01', 'ditch.n.01', 'diver.n.01', 'division.n.03', 'division.n.04', 'dresser.n.02', 'drive.n.02', 'drop.n.01', 'dry_masonry.n.01', 'dryad.n.01', 'durables.n.01', 'dwelling.n.01', 'dysphemism.n.01', 'ectoparasite.n.01', 'edge.n.03', 'edge.n.06', 'edging.n.01', 'effect.n.03', 'effort.n.02', 'egotist.n.01', 'elasmobranch.n.01', 'elasticity.n.01', 'electronic_text.n.01', 'elite.n.01', 'ellipse.n.01', 'embankment.n.01', 'emoticon.n.01', 'employee.n.01', 'enamel.n.04', 'enclosure.n.03', 'engineering.n.02', 'enlisted_person.n.01', 'enterprise.n.02', 'entertainment.n.01', 'entity.n.01', 'entree.n.01', 'escape.n.05', 'eubacteria.n.01', 'european.n.01', 'evaluator.n.01', 'even-toed_ungulate.n.01', 'event.n.01', 'evil_spirit.n.01', 'example.n.01', 'excretory_organ.n.01', 'exercise.n.01', 'exhibitionist.n.02', 'expanse.n.03', 'expedient.n.01', 'experience.n.02', 'explanation.n.02', 'explorer.n.01', 'external_body_part.n.01', 'extremity.n.01', 'extremity.n.05', 'extremum.n.02', 'exudate.n.01', 'facial_expression.n.01', 'facial_hair.n.01', 'facility.n.04', 'facing.n.03', 'failure.n.02', 'family.n.06', 'fancier.n.01', 'fare.n.04', 'farming.n.01', 'fashion.n.03', 'feature.n.02', 'feline.n.01', 'female.n.02', 'fern_ally.n.01', 'ferric_oxide.n.01', 'fibril.n.01', 'fiction.n.01', 'field.n.01', 'figuration.n.02', 'financial_gain.n.01', 'fine_arts.n.01', 'finish.n.04', 'fire.n.01', 'firing_range.n.01', 'first_class.n.02', 'flow.n.01', 'flue.n.03', 'fluid.n.02', 'font.n.01', 'foothold.n.02', 'force.n.02', 'forest.n.01', 'formation.n.01', 'formula.n.04', 'formulation.n.01', 'foundry.n.01', 'framework.n.03', 'front.n.04', 'fruitwood.n.01', 'fullerene.n.01', 'fundamental_quantity.n.01', 'gadoid.n.01', 'gain.n.04', 'game_of_chance.n.01', 'gang.n.03', 'ganoid.n.01', 'gas.n.02', 'gastropod.n.01', 'gate.n.04', 'genre.n.03', 
'genus.n.02', 'geographic_point.n.01', 'geographical_area.n.01', 'geometry.n.01', 'girdle.n.01', 'glyptic_art.n.01', 'golf.n.01', 'goosefoot.n.01', 'graphics.n.02', 'greco-roman_deity.n.01', 'greek_deity.n.01', 'grip.n.06', 'groove.n.01', 'group.n.01', 'group_action.n.01', 'hair.n.01', 'happening.n.01', 'hawkmoth.n.01', 'hazard.n.01', 'head.n.04', 'health_hazard.n.01', 'health_professional.n.01', 'heating.n.01', 'hexagram.n.01', 'higher_cognitive_process.n.01', 'hiker.n.01', 'hill.n.01', 'hindrance.n.01', 'hindrance.n.02', 'hindu_deity.n.01', 'history.n.02', 'hole.n.01', 'hole.n.02', 'hole.n.05', 'homespun.n.01', 'homo.n.02', 'horn.n.07', 'housing.n.01', 'humate.n.01', 'humorist.n.01', 'hunting_dog.n.01', 'hydrocarbon.n.01', 'hydrozoan.n.01', 'hypothesis.n.02', 'idea.n.01', 'ideal.n.01', 'idler.n.01', 'illumination.n.02', 'illusion.n.01', 'illustration.n.01', 'imaginary_place.n.01', 'imagination.n.02', 'imaging.n.02', 'immateriality.n.02', 'implement.n.01', 'implementation.n.02', 'impression.n.01', 'incident.n.01', 'income.n.01', 'indefinite_quantity.n.01', 'individual.n.02', 'industry.n.02', 'influence.n.01', 'information.n.02', 'inhabitant.n.01', 'insertion.n.02', 'institution.n.01', 'intake.n.02', 'integer.n.01', 'intellectual.n.01', 'interior_decoration.n.02', 'inventiveness.n.01', 'investigator.n.02', 'iron.n.01', 'isogon.n.01', 'isopod.n.01', 'item.n.01', 'item.n.02', 'item.n.03', 'item.n.04', 'item.n.05', 'item.n.06', 'jack.n.11', 'jail.n.01', 'junction.n.04', 'juvenile.n.01', 'juxtaposition.n.01', 'killer.n.01', 'kind.n.01', 'kingdom.n.01', 'knowledge_domain.n.01', 'labor.n.02', 'laborer.n.01', 'lake.n.01', 'lamination.n.01', 'land.n.01', 'landing.n.02', 'lane.n.02', 'language.n.01', 'language_unit.n.01', 'larid.n.01', 'latex.n.01', 'lawman.n.01', 'layer.n.02', 'leader.n.01', 'leg.n.02', 'legend.n.01', 'leporid.n.01', 'level.n.05', 'life_science.n.01', 'lignite.n.01', 'likeness.n.02', 'liliid_monocot_genus.n.01', 'limit.n.04', 'limit.n.06', 
'linear_unit.n.01', 'lipid.n.01', 'liquid.n.03', 'list.n.01', 'literary_composition.n.01', 'literate.n.01', 'living_quarters.n.01', 'living_thing.n.01', 'local_area_network.n.01', 'location.n.01', 'lookout.n.02', 'lottery.n.02', 'lover.n.01', 'machine.n.02', 'macromolecule.n.01', 'magnitude.n.01', 'main.n.02', 'male_aristocrat.n.01', 'male_child.n.01', 'malformation.n.02', 'man.n.01', 'manner.n.01', 'manual_labor.n.01', 'mark.n.04', 'marking.n.02', 'martial_art.n.01', 'mass_unit.n.01', 'material.n.01', 'material.n.04', 'mathematics.n.01', 'matter.n.01', 'matter.n.02', 'matter.n.03', 'matter.n.06', 'means.n.01', 'measure.n.02', 'mechanism.n.05', 'medical_procedure.n.01', 'meeting.n.01', 'membrane.n.02', 'merchant.n.01', 'metallic_element.n.01', 'message.n.01', 'message.n.02', 'military_quarters.n.01', 'military_unit.n.01', 'minimum.n.01', 'misconception.n.01', 'misfortune.n.01', 'mishap.n.02', 'mixture.n.01', 'molecular_formula.n.01', 'moneran.n.01', 'monetary_unit.n.01', 'monocot_genus.n.01', 'motion.n.06', 'motor_hotel.n.01', 'movement.n.03', 'movement.n.04', 'movement.n.11', 'multidimensional_language.n.01', 'murderer.n.01', 'music.n.01', 'musical_composition.n.01', 'musical_notation.n.01', 'musical_organization.n.01', 'musician.n.01', 'muslim.n.01', 'name.n.01', 'natural_elevation.n.01', 'natural_object.n.01', 'natural_phenomenon.n.01', 'natural_process.n.01', 'natural_science.n.01', 'negotiator.n.01', 'neritid.n.01', 'net_income.n.01', 'nidus.n.02', 'nobility.n.01', 'noise.n.01', 'nongovernmental_organization.n.01', 'nonmetal.n.01', 'nonworker.n.01', 'notation.n.01', 'notion.n.04', 'number.n.02', 'nutrient.n.02', 'object.n.01', 'object.n.04', 'obstacle.n.01', 'occultist.n.01', 'occupation.n.01', 'offspring.n.01', 'oil_paint.n.01', 'oldster.n.01', 'open_chain.n.01', 'operation.n.06', 'orchis.n.01', 'order.n.12', 'order.n.14', 'organelle.n.01', 'organic_compound.n.01', 'organism.n.01', 'organization.n.01', 'orifice.n.01', 'originality.n.01', 'ornithischian.n.01', 
'orthography.n.01', 'oscine.n.01', 'ovule.n.01', 'oxide.n.01', 'pad.n.02', 'padding.n.01', 'parallelepiped.n.01', 'parasite.n.01', 'paring.n.01', 'part.n.01', 'part.n.02', 'part.n.03', 'partial_veil.n.01', 'participant.n.01', 'particle.n.02', 'particulate.n.01', 'passage.n.03', 'patron_saint.n.01', 'pedaler.n.01', 'peer.n.01', 'percept.n.01', 'perception.n.03', 'percussionist.n.01', 'performance.n.02', 'performer.n.01', 'performing_arts.n.01', 'perpendicular.n.02', 'personal_property.n.01', 'phenomenon.n.01', 'physical_entity.n.01', 'physical_phenomenon.n.01', 'physical_property.n.01', 'pictorial_representation.n.01', 'piece.n.01', 'placement.n.01', 'plain.n.01', 'plan.n.01', 'plan.n.03', 'plane_figure.n.01', 'plant_genus.n.01', 'plant_order.n.01', 'plant_organ.n.01', 'plant_part.n.01', 'plant_process.n.01', 'play.n.08', 'player.n.01', 'point_of_view.n.01', 'porcelain.n.01', 'portrayal.n.02', 'poseur.n.01', 'position.n.07', 'position.n.12', 'post.n.01', 'power.n.01', 'practice.n.01', 'practice_range.n.01', 'prayer.n.02', 'presence.n.01', 'presentation.n.02', 'preserver.n.03', 'principal.n.05', 'problem.n.02', 'procedure.n.01', 'process.n.02', 'process.n.05', 'process.n.06', 'prod.n.02', 'product.n.02', 'production.n.02', 'production.n.07', 'profile.n.05', 'program.n.07', 'programming_language.n.01', 'projection.n.04', 'property.n.01', 'property.n.02', 'property.n.04', 'property.n.05', 'proportional_font.n.01', 'propulsion.n.01', 'propulsion.n.02', 'protection.n.01', 'protocol.n.01', 'protoctist.n.01', 'psychological_feature.n.01', 'public_square.n.01', 'punctuation.n.02', 'pure_mathematics.n.01', 'push.n.01', 'quality.n.01', 'railway.n.01', 'range.n.04', 'range.n.05', 'ration.n.01', 'reaction_propulsion.n.01', 'real_property.n.01', 'rectangle.n.01', 'region.n.01', 'region.n.03', 'regular_polygon.n.01', 'relation.n.01', 'relationship.n.03', 'relative.n.01', 'religion.n.02', 'repair_shop.n.01', 'representation.n.01', 'representation.n.02', 
'representational_process.n.01', 'representative.n.01', 'reproductive_cell.n.01', 'reproductive_structure.n.01', 'reptile_family.n.01', 'reptile_genus.n.01', 'reserve.n.02', 'residential_district.n.01', 'residue.n.01', 'resin.n.01', 'resource.n.03', 'respiratory_tract.n.01', 'restoration.n.06', 'retreat.n.02', 'rider.n.03', 'rig.n.03', 'right.n.01', 'robotics.n.01', 'roman_deity.n.01', 'room.n.02', 'rosid_dicot_genus.n.01', 'rotating_mechanism.n.01', 'row.n.01', 'rubber.n.01', 'ruminant.n.01', 'saint.n.01', 'salmonid.n.01', 'salt.n.01', 'sample.n.03', 'sanitary_condition.n.01', 'satirist.n.01', 'saurischian.n.01', 'saying.n.01', 'scholar.n.01', 'school.n.04', 'science.n.01', 'scientific_theory.n.01', 'scorpaenid.n.01', 'script.n.03', 'section.n.03', 'section.n.04', 'section.n.08', 'sediment.n.01', 'self-defense.n.01', 'semipermeable_membrane.n.01', 'sense_organ.n.01', 'serviceman.n.01', 'set.n.13', 'setting.n.02', 'settlement.n.06', 'sewing.n.02', 'shaft.n.08', 'shape.n.02', 'sheath.n.02', 'sheet.n.06', 'shell.n.02', 'show.n.01', 'side.n.04', 'side.n.05', 'side.n.09', 'sign.n.01', 'sign.n.11', 'signal.n.01', 'silhouette.n.02', 'situation.n.01', 'skating.n.01', 'skilled_worker.n.01', 'sleeper.n.01', 'slope.n.01', 'small_indefinite_quantity.n.01', 'small_person.n.01', 'smith.n.10', 'soapsuds.n.01', 'social_group.n.01', 'software.n.01', 'sole.n.01', 'solid.n.01', 'solid.n.03', 'solution.n.01', 'somatic_cell.n.01', 'sound.n.04', 'spatial_property.n.01', 'species.n.01', 'specimen.n.01', 'speech.n.02', 'speech_act.n.01', 'sphere.n.01', 'spirit.n.01', 'spirit.n.04', 'spiritual_being.n.01', 'splash.n.01', 'spot.n.05', 'spot.n.12', 'spring.n.03', 'square.n.01', 'squeeze.n.01', 'stable_gear.n.01', 'star.n.03', 'state.n.02', 'state_of_matter.n.01', 'statement.n.01', 'steel.n.01', 'steward.n.03', 'store.n.02', 'story.n.02', 'stratum.n.01', 'structure.n.01', 'structure.n.03', 'structural_formula.n.01', 'structure.n.04', 'styrene.n.01', 'subject.n.01', 'subjugation.n.01', 
'substance.n.01', 'substance.n.07', 'substance.n.08', 'suburb.n.01', 'sum.n.01', 'superior_skill.n.01', 'support.n.03', 'supporting_structure.n.01', 'surface.n.02', 'suspension.n.01', 'sweetening.n.01', 'swine.n.01', 'symbol.n.01', 'symbol.n.02', 'synapsid.n.01', 'synthetic.n.01', 'synthetic_resin.n.01', 'system.n.01', 'system.n.06', 'system_of_measurement.n.01', 'taste.n.03', 'taxonomic_group.n.01', 'temperature_change.n.01', 'terminal.n.01', 'test.n.05', 'text.n.01', 'texture.n.01', 'theory.n.01', 'thing.n.04', 'thing.n.08', 'thing.n.12', 'thinker.n.02', 'thinking.n.01', 'thoroughfare.n.01', 'toecap.n.01', 'top.n.01', 'top.n.02', 'topping.n.01', 'tract.n.01', 'trade.n.02', 'traffic.n.01', 'transaction.n.01', 'transducer.n.01', 'transgression.n.01', 'transparent_substance.n.01', 'transportation.n.02', 'traveler.n.01', 'triangle.n.01', 'trouble.n.03', 'tube.n.01', 'type.n.04', 'underbrush.n.01', 'ungulate.n.01', 'unicameral_script.n.01', 'union_representative.n.01', 'unit.n.02', 'unit.n.03', 'unit.n.05', 'unit_of_measurement.n.01', 'universe.n.01', 'unreality.n.01', 'unsoundness.n.01', 'unwelcome_person.n.01', 'upper_class.n.01', 'upper_surface.n.01', 'user.n.01', 'utility.n.06', 'valuable.n.01', 'vapor.n.01', 'vascular_system.n.01', 'vault.n.03', 'vegetation.n.01', 'vehicular_traffic.n.01', 'veranda.n.01', 'vertical_surface.n.01', 'vicinity.n.01', 'village.n.02', 'vinyl_polymer.n.01', 'visual_communication.n.01', 'visual_percept.n.01', 'visual_perception.n.01', 'visual_property.n.01', 'visual_signal.n.01', 'vital_principle.n.01', 'vogue.n.01', 'volatile_storage.n.01', 'ware.n.01', 'waste.n.01', 'watercourse.n.03', 'way.n.06', 'wealth.n.03', 'weave.n.01', 'weightlift.n.01', 'whole.n.01', 'whole.n.02', 'window.n.08', 'woman.n.01', 'work.n.01', 'work.n.02', 'worker.n.01', 'workman.n.01', 'workplace.n.01', 'writing.n.02', 'writing.n.04', 'written_communication.n.01', 'written_symbol.n.01', 'wrongdoer.n.01', 'wrongdoing.n.02', 'yard.n.09', 'zone.n.01'})
PICKUPABLE_EXCLUDED_CATEGORY_HYPERNYMS
module-attribute
¶
PICKUPABLE_EXCLUDED_CATEGORY_HYPERNYMS: dict[str, str] = {'sculpture.n.01': 'sculpture/art object', 'model.n.04': 'scale model/replica', 'miniature.n.02': 'miniature/replica', 'sign.n.02': 'sign/placard', 'eolith.n.01': 'primitive stone implement', 'paleolith.n.01': 'primitive stone implement'}
PICKUPABLE_EXCLUDED_EXACT_SYNSETS
module-attribute
¶
PICKUPABLE_EXCLUDED_EXACT_SYNSETS: dict[str, str] = {'plaything.n.01': 'generic plaything', 'toy.n.02': 'generic toy (non-plaything sense)', 'popgun.n.01': 'toy gun', 'arrowhead.n.01': 'primitive implement', 'stone.n.02': 'building material', 'emblem.n.01': 'visual symbol', 'logo.n.01': 'visual symbol'}
RECEPTACLE_HYPERNYM_INCLUDE_WITH_EXCLUSIONS
module-attribute
¶
RECEPTACLE_HYPERNYM_INCLUDE_WITH_EXCLUSIONS: dict[str, set[str]] = {'box.n.01': set(), 'receptacle.n.01': {'beehive.n.04'}, 'pan.n.03': set(), 'vessel.n.03': {'ladle.n.01', 'bathtub.n.01', 'boiler.n.01', 'tank.n.02', 'bedpan.n.01'}, 'dish.n.01': set(), 'basket.n.01': set(), 'glass.n.02': set(), 'workbasket.n.01': set()}
RECEPTACLE_INCLUDE_SYNSETS
module-attribute
¶
RECEPTACLE_INCLUDE_SYNSETS: set[str] = frozenset({'flatware.n.01', 'glassware.n.01', 'dinnerware.n.01', 'service.n.09', 'gold_plate.n.01', 'silver_plate.n.01', 'crockery.n.01', 'place_mat.n.01', 'coaster.n.03', 'tray.n.01', 'saucer.n.02', 'platter.n.01', 'jar.n.01', 'canister.n.02', 'tin.n.02', 'case.n.05', 'baking_dish.n.01', 'mixing_bowl.n.01', 'salad_bowl.n.01', 'serving_dish.n.01', 'caddy.n.02', 'bin.n.01'})
VALID_PICKUPABLE_OBJA_UIDS_PATH
module-attribute
¶
VALID_PICKUPABLE_OBJA_UIDS_PATH = '/weka/prior/datasets/robomolmo/asset_utility_refs/valid_pickupable_obja_uids.txt'
canonical_lemma
¶
Return the first (most canonical) lemma for a WordNet synset name.
filter_synsets_to_remove_hyponyms
¶
Source code in molmo_spaces/utils/synset_utils.py
generate_all_hypernyms_with_exclusions
¶
generate_all_hypernyms_with_exclusions(synset: str | Synset, excluded: set[str] | str = EXCLUDED_HYPERNYMS, include_self_synset: bool = True) -> set[Synset]
Source code in molmo_spaces/utils/synset_utils.py
generate_hypernym_to_descendants
¶
generate_hypernym_to_descendants(synsets: Sequence[str] | Sequence[Synset]) -> dict[str, list[Synset]]
Source code in molmo_spaces/utils/synset_utils.py
get_all_synsets_in_metadata
¶
Source code in molmo_spaces/utils/synset_utils.py
get_highest_relevant_hypernym
¶
get_highest_relevant_hypernym(synset: str | Synset, excluded: set[str] | str = EXCLUDED_HYPERNYMS) -> Synset
Source code in molmo_spaces/utils/synset_utils.py
get_hypernym_to_descendants_for_all_metadata_synsets
¶
get_hyponyms_of_synset
cached
¶
Source code in molmo_spaces/utils/synset_utils.py
get_hyponyms_of_synsets
¶
get_hyponyms_of_synsets(synsets: Iterable[str] | Iterable[Synset], return_strings: bool) -> set[Synset] | set[str]
Source code in molmo_spaces/utils/synset_utils.py
get_singleton_highest_hypernyms
cached
¶
Source code in molmo_spaces/utils/synset_utils.py
get_valid_pickupable_obja_uids
¶
Get all objaverse asset UIDs that are pickable (have valid grasp files).
Checks for a cached file at VALID_PICKUPABLE_OBJA_UIDS_PATH first to avoid expensive computation. If the file is not found, computes and returns the list.
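The cache-first behaviour can be sketched as follows. This is an illustration under stated assumptions, not the function's code: it assumes the cache is a text file with one UID per line, and load_or_compute_uids and compute_fn are hypothetical names.

```python
import tempfile
from pathlib import Path


def load_or_compute_uids(cache_path, compute_fn):
    """Return UIDs from the cache file if it exists, else compute them."""
    path = Path(cache_path)
    if path.is_file():
        # Assumed format: one UID per line, blank lines ignored.
        return [line.strip() for line in path.read_text().splitlines() if line.strip()]
    return compute_fn()


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "valid_uids.txt"
    # No cache file yet, so the (expensive) computation runs.
    without_cache = load_or_compute_uids(cache, lambda: ["computed_uid"])
    cache.write_text("uid_a\nuid_b\n")
    # Cache file present, so the computation is skipped.
    with_cache = load_or_compute_uids(cache, lambda: ["computed_uid"])
```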
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| debug | bool | If True, prints 20 random samples with their short descriptions. | False |
Returns:
| Type | Description |
|---|---|
| list[str] | List of UIDs for valid pickupable assets. |
Source code in molmo_spaces/utils/synset_utils.py
get_valid_pickupable_obja_uids_excluding_benchmark
¶
Get pickupable objaverse UIDs with benchmark assets excluded.
Loads the benchmark blacklist from BENCHMARK_BLACKLIST_UIDS_PATH (generated by scripts/roseh/extract_benchmark_assets.py) and removes any UIDs that appear in any bench_v3 benchmark as a pickup or placement asset.
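The blacklist filtering amounts to a set-difference that keeps the original order. A minimal sketch follows, with a hypothetical helper name and toy UIDs; the real function reads the blacklist from BENCHMARK_BLACKLIST_UIDS_PATH, which is not reproduced here.

```python
def exclude_blacklisted(uids, blacklist):
    """Drop blacklisted UIDs while preserving the order of the rest."""
    blocked = set(blacklist)  # set lookup keeps the filter linear in len(uids)
    kept = [uid for uid in uids if uid not in blocked]
    removed = [uid for uid in uids if uid in blocked]
    return kept, removed


kept, removed = exclude_blacklisted(
    ["uid_a", "uid_b", "uid_c"],
    ["uid_b", "uid_not_present"],
)
```

Returning the removed UIDs as well makes it easy to report how many assets were filtered, which is the kind of information the debug flag is described as printing.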
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| debug | bool | If True, prints how many assets were filtered and samples of what was removed. | False |
Returns:
| Type | Description |
|---|---|
| list[str] | List of UIDs for valid pickupable assets not used in any benchmark. |
Source code in molmo_spaces/utils/synset_utils.py
get_valid_receptacle_uids
¶
Get all asset UIDs that are valid receptacles based on synset filtering.
Returns:
| Type | Description |
|---|---|
| dict[str, dict] | Dict mapping UID to annotation dict for valid receptacle assets. |
Source code in molmo_spaces/utils/synset_utils.py
is_hypernym_of
cached
¶
Source code in molmo_spaces/utils/synset_utils.py
is_subsynset_of
¶
is_valid_receptacle_synset
¶
Check if a synset is a valid receptacle based on inclusion/exclusion rules.
The cached valid set already contains all hyponyms of included hypernyms, so a simple set membership check is sufficient.
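The idea of precomputing a valid set and then testing membership can be sketched on a toy hyponym graph. Everything below is illustrative: the graph is made up rather than taken from WordNet, the helper names are hypothetical, and the assumption that an excluded synset also removes its own hyponyms is not confirmed by the source.

```python
from collections import deque

# Toy hyponym graph (parent -> children); not real WordNet data.
TOY_HYPONYMS = {
    "vessel.n.03": ["bowl.n.01", "bathtub.n.01"],
    "bowl.n.01": ["salad_bowl.n.01"],
    "bathtub.n.01": ["toy_tub.n.01"],
}


def descendants(root, hyponyms):
    """Return root plus every synset reachable through hyponym links."""
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in hyponyms.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def build_valid_set(include_with_exclusions, include_synsets, hyponyms):
    """Expand included hypernyms once so later checks are set lookups."""
    valid = set(include_synsets)
    for hypernym, excluded in include_with_exclusions.items():
        allowed = descendants(hypernym, hyponyms)
        for name in excluded:
            # Assumption: an exclusion removes the synset and its hyponyms.
            allowed -= descendants(name, hyponyms)
        valid |= allowed
    return frozenset(valid)


VALID = build_valid_set({"vessel.n.03": {"bathtub.n.01"}}, {"tray.n.01"}, TOY_HYPONYMS)


def is_valid_receptacle_sketch(synset_name):
    return synset_name in VALID
```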
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| synset | str \| Synset | A WordNet synset or synset name string | required |
Returns:
| Type | Description |
|---|---|
| bool | True if the synset is a valid receptacle type |
Source code in molmo_spaces/utils/synset_utils.py
symmetric_subsynset_of
¶
task_relevant_objects_and_workspace_utils
¶
Derive task-relevant object names and workspace center from task config fields.
Single source of truth for which objects cameras must see and what defines the workspace center. Called from:
- Task samplers (resolve_visibility_object, get_workspace_center) during data generation
- create_json_benchmark.py to populate EpisodeSpec.task_relevant_objects
- Eval camera system for visibility checks and workspace center computation
Accepts either a pydantic config object or a plain dict.
Functions:
| Name | Description |
|---|---|
| compute_workspace_center | Compute the workspace center as the centroid of named 3-D positions. |
| compute_workspace_center_from_object_poses | Compute workspace center from serialized object poses (e.g. JSON episode data). |
| get_task_relevant_objects | Return the list of object body names that are relevant for this task. |
compute_workspace_center
¶
Compute the workspace center as the centroid of named 3-D positions.
This is the shared implementation used by both live task samplers (positions from the environment) and the eval camera system (positions from JSON episode data).
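The centroid computation can be sketched in a few lines of NumPy. This is a minimal illustration of the described behaviour, not the module's code; the function name and the error raised for an empty mapping are assumptions.

```python
import numpy as np


def centroid_of_positions(positions):
    """Return the mean of all 3-D positions in a label -> position mapping."""
    if not positions:
        # Assumption: an empty mapping is treated as an error.
        raise ValueError("at least one position is required")
    stacked = np.stack([np.asarray(p, dtype=float) for p in positions.values()])
    return stacked.mean(axis=0)


workspace_center = centroid_of_positions(
    {"mug": [0.0, 0.0, 0.0], "plate": [2.0, 4.0, 6.0]}
)
```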
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| positions | dict[str, ndarray] | Mapping of label -> 3-D position array. Typical keys are object body names from :func: | required |
Returns:
| Type | Description |
|---|---|
| ndarray | 3-D centroid (mean) of all positions. |
Source code in molmo_spaces/utils/task_relevant_objects_and_workspace_utils.py
compute_workspace_center_from_object_poses
¶
compute_workspace_center_from_object_poses(object_names: list[str], object_poses: dict[str, list[float]], gripper_pos: ndarray | None = None) -> ndarray
Compute workspace center from serialized object poses (e.g. JSON episode data).
Convenience wrapper around compute_workspace_center for the eval path, where positions come from EpisodeSpec.scene_modifications.object_poses (each value is [x, y, z, qw, qx, qy, qz]).
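Because each serialized pose is [x, y, z, qw, qx, qy, qz], only the first three entries contribute to the center. The sketch below illustrates that slicing under stated assumptions: the function name is hypothetical, every name in object_names is assumed to be present in object_poses, and the "gripper" label is made up.

```python
import numpy as np


def center_from_object_poses(object_names, object_poses, gripper_pos=None):
    """Average the xyz parts of the selected 7-D poses (plus the gripper)."""
    # Keep only the position part [x, y, z] of each [x, y, z, qw, qx, qy, qz].
    positions = {
        name: np.asarray(object_poses[name][:3], dtype=float)
        for name in object_names
    }
    if gripper_pos is not None:
        positions["gripper"] = np.asarray(gripper_pos, dtype=float)
    return np.stack(list(positions.values())).mean(axis=0)


poses = {
    "mug": [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    "plate": [2.0, 2.0, 2.0, 1.0, 0.0, 0.0, 0.0],
    "sofa": [9.0, 9.0, 9.0, 1.0, 0.0, 0.0, 0.0],  # not task relevant, ignored
}
center = center_from_object_poses(["mug", "plate"], poses, gripper_pos=[4.0, 4.0, 4.0])
```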
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| object_names | list[str] | Body names whose positions should contribute (typically from :func: | required |
| object_poses | dict[str, list[float]] | Mapping of body name to 7-D pose | required |
| gripper_pos | ndarray \| None | Optional gripper position to include. | None |
Returns:
| Type | Description |
|---|---|
| ndarray | 3-D centroid. |
Source code in molmo_spaces/utils/task_relevant_objects_and_workspace_utils.py
get_task_relevant_objects
¶
Return the list of object body names that are relevant for this task.
These are the objects that cameras should be able to see (visibility constraints) and whose positions define the workspace center.
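Accepting either a config object or a plain dict, and returning names without duplicates in first-seen order, can be sketched as below. The field names pickup_obj_name and place_receptacle_name are hypothetical placeholders; the real task configs may use different fields, and this is not the module's implementation.

```python
from types import SimpleNamespace

# Hypothetical field names; the real task configs may use different ones.
OBJECT_FIELDS = ("pickup_obj_name", "place_receptacle_name")


def _read_field(task_config, key):
    """Read a field from either a dict or an attribute-style config object."""
    if isinstance(task_config, dict):
        return task_config.get(key)
    return getattr(task_config, key, None)


def task_relevant_objects_sketch(task_config):
    names = [_read_field(task_config, key) for key in OBJECT_FIELDS]
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(name for name in names if name))


from_dict = task_relevant_objects_sketch(
    {"pickup_obj_name": "mug_1", "place_receptacle_name": "mug_1"}
)
from_object = task_relevant_objects_sketch(
    SimpleNamespace(pickup_obj_name="mug_1", place_receptacle_name="plate_2")
)
```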
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| task_config | Any | A task config object (e.g. PickTaskConfig) or a dict with the same keys (as stored in EpisodeSpec.task). | required |
Returns:
| Type | Description |
|---|---|
| list[str] | Deduplicated list of object body names, in stable insertion order. |
Source code in molmo_spaces/utils/task_relevant_objects_and_workspace_utils.py
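The "deduplicated, in stable insertion order" behavior and the support for both config objects and dicts could look roughly like the sketch below. The field names in `CANDIDATE_FIELDS` are hypothetical placeholders; the fields the real function reads are not listed on this page.

```python
from types import SimpleNamespace

# Hypothetical field names; the real task configs may use different keys.
CANDIDATE_FIELDS = ("pickup_obj_name", "place_receptacle_name")


def read_field(task_config, key):
    """Read a field from either a dict or an attribute-style config object."""
    if isinstance(task_config, dict):
        return task_config.get(key)
    return getattr(task_config, key, None)


def relevant_objects(task_config):
    names = [read_field(task_config, key) for key in CANDIDATE_FIELDS]
    # dict.fromkeys drops duplicates while preserving first-seen order.
    return list(dict.fromkeys(name for name in names if name))


as_dict = {"pickup_obj_name": "mug_1", "place_receptacle_name": "plate_2"}
as_object = SimpleNamespace(pickup_obj_name="mug_1", place_receptacle_name="plate_2")
print(relevant_objects(as_dict), relevant_objects(as_object))
```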
test_utils
¶
Shared utilities for data generation tests (Franka, RUM, etc.).
Functions:
| Name | Description |
|---|---|
| assert_obs_scene_match | Assert that the obs_scenes of two trajectory groups are equal. |
| assert_observations_match | Compare actual and expected observations using Structural Similarity Index (SSIM). |
| assert_python_types_equal | General (recursive) function to assert that two python objects are equal, with tolerance applied for floats. |
| compare_h5_groups | Recursively compare two HDF5 groups and check for differences. |
| print_profiling_summary | Print a formatted summary of profiling results. |
| run_policy_for_steps | Run a policy on a task for a fixed number of steps, following pipeline.py API. |
| run_task_for_steps_with_observations | Run a policy on a task for a fixed number of steps and return both qpos and observations. |
| save_observation_comparison | Save visual observation comparisons including actual, expected, and difference images. |
| save_visual_observations | Save visual observations as viewable PNG images for debugging. |
| verify_and_compare_camera_observations | Verify observation structure and compare camera observations against saved test data. |
| verify_and_compare_camera_observations_after_steps | Verify and compare camera observations after running policy steps against saved test data. |
| verify_video_fps | Assert all videos in a directory have the expected FPS. |
Attributes:
| Name | Type | Description |
|---|---|---|
| log | | |
assert_obs_scene_match
¶
Assert that the obs_scenes of two trajectory groups are equal.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| g1 | Group | h5py.Group of the first trajectory | required |
| g2 | Group | h5py.Group of the second trajectory | required |
Source code in molmo_spaces/utils/test_utils.py
assert_observations_match
¶
assert_observations_match(actual_obs, expected_obs, sensor_name, atol=0, rtol=1e-07, ssim_threshold=0.9)
Compare actual and expected observations using Structural Similarity Index (SSIM).
Uses SSIM to compare images perceptually rather than pixel-by-pixel, which is more robust to minor rendering variations while still catching meaningful visual differences.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| actual_obs | | Actual observation array | required |
| expected_obs | | Expected observation array | required |
| sensor_name | | Name of the sensor (for error messages) | required |
| atol | | Unused, kept for API compatibility | 0 |
| rtol | | Unused, kept for API compatibility | 1e-07 |
Raises:
| Type | Description |
|---|---|
| AssertionError | If observations have meaningful visual differences (low SSIM) |
Source code in molmo_spaces/utils/test_utils.py
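To make the SSIM threshold concrete, here is a simplified sketch that computes a single-window (global) SSIM over flat pixel sequences and asserts against a threshold. It is an illustration only: real SSIM implementations average the score over local windows of a 2-D image, and the helper names `global_ssim` and `assert_similar` are hypothetical. The constants follow the standard SSIM definition with K1 = 0.01 and K2 = 0.03.

```python
def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM over two equal-length sequences of pixel values."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) * (a - mx) for a in x) / n
    vy = sum((b - my) * (b - my) for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # Stabilizing constants from the standard SSIM definition.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    numerator = (2 * mx * my + c1) * (2 * cov + c2)
    denominator = (mx * mx + my * my + c1) * (vx + vy + c2)
    return numerator / denominator


def assert_similar(actual, expected, sensor_name, ssim_threshold=0.9):
    score = global_ssim(actual, expected)
    assert score >= ssim_threshold, f"{sensor_name}: SSIM {score:.3f} < {ssim_threshold}"


expected = [0, 50, 100, 150, 200, 250]
slightly_off = [1, 50, 101, 150, 199, 250]
assert_similar(slightly_off, expected, "wrist_camera")  # small noise passes
```

A structurally different image, such as an intensity-inverted copy, scores far below the threshold even though every pixel is still a valid value, which is the kind of difference a per-pixel tolerance handles poorly.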
assert_python_types_equal
¶
General (recursive) function to assert that two python objects are equal, with tolerance applied for floats. Works for native python primitives only.
Source code in molmo_spaces/utils/test_utils.py
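A recursive comparison with float tolerance, as described above, might be structured like this. The sketch is an assumption about the general technique, not the function's actual code: the name `assert_equal_with_tolerance`, the `atol` and `path` parameters, and the use of a purely absolute tolerance are illustrative choices.

```python
import math


def assert_equal_with_tolerance(a, b, atol=1e-6, path="root"):
    """Recursively compare native Python objects, tolerating small float error."""
    if isinstance(a, float) or isinstance(b, float):
        numeric = (int, float)
        assert isinstance(a, numeric) and isinstance(b, numeric), f"{path}: {a!r} vs {b!r}"
        assert math.isclose(a, b, rel_tol=0.0, abs_tol=atol), f"{path}: {a!r} != {b!r}"
    elif isinstance(a, dict):
        assert isinstance(b, dict) and a.keys() == b.keys(), f"{path}: keys differ"
        for key in a:
            assert_equal_with_tolerance(a[key], b[key], atol, f"{path}[{key!r}]")
    elif isinstance(a, (list, tuple)):
        assert type(a) is type(b) and len(a) == len(b), f"{path}: sequence mismatch"
        for i, (x, y) in enumerate(zip(a, b)):
            assert_equal_with_tolerance(x, y, atol, f"{path}[{i}]")
    else:
        # Strings, ints, bools, None: exact equality.
        assert a == b, f"{path}: {a!r} != {b!r}"


# 0.1 + 0.2 is not exactly 0.3 in floating point, but it is within tolerance.
assert_equal_with_tolerance(
    {"pos": [0.1 + 0.2, 1], "name": "mug"},
    {"pos": [0.3, 1], "name": "mug"},
)
```

Threading a `path` string through the recursion keeps failure messages pointing at the exact nested element that differs.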
compare_h5_groups
¶
Recursively compare two HDF5 groups and check for differences.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| g1 | | First HDF5 group | required |
| g2 | | Second HDF5 group | required |
| path | | Current path in the HDF5 structure (for error messages) | '/' |
| atol | | Absolute tolerance for numerical comparisons | 1e-06 |
| ignore_paths | | Optional set/list of path suffixes to skip (e.g., {"object_image_points"}) | None |
Source code in molmo_spaces/utils/test_utils.py
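The interaction of `path`, `atol`, and `ignore_paths` can be sketched with nested dicts standing in for HDF5 groups and lists of floats standing in for datasets. This is a simplified model under those assumptions: the name `diff_groups` is hypothetical, and returning a list of differences (rather than raising) is an illustrative choice, since this page does not say how the real function reports them.

```python
def diff_groups(g1, g2, path="/", atol=1e-6, ignore_paths=None):
    """Collect differences between two nested dicts that mimic HDF5 groups."""
    ignore = tuple(ignore_paths or ())
    diffs = []
    for key in sorted(set(g1) | set(g2)):
        child = f"{path}{key}"
        # Skip any entry whose full path ends with an ignored suffix.
        if ignore and child.endswith(ignore):
            continue
        if key not in g1 or key not in g2:
            diffs.append(f"{child}: present in only one group")
        elif isinstance(g1[key], dict) and isinstance(g2[key], dict):
            diffs.extend(diff_groups(g1[key], g2[key], child + "/", atol, ignore))
        elif isinstance(g1[key], dict) or isinstance(g2[key], dict):
            diffs.append(f"{child}: group vs dataset")
        else:
            a, b = g1[key], g2[key]
            if len(a) != len(b) or any(abs(x - y) > atol for x, y in zip(a, b)):
                diffs.append(f"{child}: values differ")
    return diffs


run_a = {"traj_0": {"qpos": [0.0, 1.0], "object_image_points": [1.0]}}
run_b = {"traj_0": {"qpos": [0.0, 1.0 + 1e-9], "object_image_points": [5.0]}}
print(diff_groups(run_a, run_b))
print(diff_groups(run_a, run_b, ignore_paths={"object_image_points"}))
```

Matching on path suffixes lets a single entry such as `"object_image_points"` exclude that dataset in every trajectory group.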
print_profiling_summary
¶
Print a formatted summary of profiling results.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| profiler | | Profiler instance with collected timing data | required |
Returns:
| Name | Type | Description |
|---|---|---|
| str | | Formatted summary string |
Source code in molmo_spaces/utils/test_utils.py
run_policy_for_steps
¶
Run a policy on a task for a fixed number of steps, following pipeline.py API.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| task | | The task instance | required |
| policy | | The policy instance | required |
| num_steps | | Number of steps to run | 10 |
| profiler | | Optional profiler instance to track timing | None |
Returns:
| Name | Type | Description |
|---|---|---|
| tuple | | (initial_qpos, final_qpos) as numpy arrays |
Source code in molmo_spaces/utils/test_utils.py
run_task_for_steps_with_observations
¶
Run a policy on a task for a fixed number of steps and return both qpos and observations.
This extends run_policy_for_steps by also capturing initial and final observations after running steps. Useful for testing that observations remain deterministic across runs and that they change appropriately.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| task | | The task instance | required |
| policy | | The policy instance | required |
| num_steps | | Number of steps to run | 10 |
| profiler | | Optional profiler instance to track timing | None |
Returns:
| Name | Type | Description |
|---|---|---|
| tuple | | (initial_qpos, final_qpos, initial_obs_dict, final_obs_dict), where initial_qpos is a numpy array of initial joint positions; final_qpos is a numpy array of final joint positions; initial_obs_dict is a dictionary of initial observations from the single environment; and final_obs_dict is a dictionary of final observations from the single environment after running steps. |
Source code in molmo_spaces/utils/test_utils.py
save_observation_comparison
¶
Save visual observation comparisons including actual, expected, and difference images.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| obs_dict | | Dictionary of actual observations from a single environment | required |
| expected_dict | | Dictionary of expected observations with same structure | required |
| output_dir | | Path object or string for the debug output directory | required |
| prefix | | Prefix for the saved image filenames | 'comparison' |
Source code in molmo_spaces/utils/test_utils.py
save_visual_observations
¶
Save visual observations as viewable PNG images for debugging.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| obs_dict | | Dictionary of observations from a single environment | required |
| output_dir | | Path object or string for the debug output directory | required |
| prefix | | Prefix for the saved image filenames (e.g., "obs", "expected", "diff") | 'obs' |
Source code in molmo_spaces/utils/test_utils.py
verify_and_compare_camera_observations
¶
verify_and_compare_camera_observations(obs, sensor_suite, test_data_dir, test_data_prefix, expected_cameras, debug_images_dir=None, debug_prefix='obs', expected_shape=(480, 480, 3), atol=1.0, rtol=0.0, ignore_cameras=None, skip_depth_exact_match=True, ssim_threshold=0.9)
Verify observation structure and compare camera observations against saved test data.
This is a comprehensive helper for testing task observations that:
1. Verifies the observation structure (vectorized format)
2. Extracts camera sensors and checks their shapes
3. Compares them against saved test data
4. Optionally saves debug images for visual inspection
5. Verifies all expected cameras are present
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| obs | | Observation tuple from task.reset() or task.step() | required |
| sensor_suite | | The task's sensor suite | required |
| test_data_dir | | Path to directory containing test data files | required |
| test_data_prefix | | Prefix for test data files (e.g., "rum_pick_obs_") | required |
| expected_cameras | | List of expected camera sensor names | required |
| debug_images_dir | | Optional path to save debug images. If None, no images are saved. | None |
| debug_prefix | | Prefix for debug image filenames (default: "obs") | 'obs' |
| expected_shape | | Expected shape of camera observations (default: (480, 480, 3)) | (480, 480, 3) |
| atol | | Absolute tolerance for pixel value comparison (default: 1.0). For uint8 images [0-255], 1.0 allows each pixel value to differ by one intensity level, which absorbs floating-point precision or slight numerical variations. | 1.0 |
| rtol | | Relative tolerance for comparison (default: 0.0) | 0.0 |
| ignore_cameras | | Optional list of camera sensor names to skip during comparison | None |
| skip_depth_exact_match | | Whether to skip pixel-exact depth comparison (default: True). When True, uses structural similarity (SSIM) on normalized depth for cross-platform robustness. When False, does pixel-exact comparison with edge masking (for local determinism tests). Depth rendering is NOT deterministic across platforms/GPUs. | True |
Returns:
| Name | Type | Description |
|---|---|---|
| tuple | | (obs_dict, camera_sensors_found) for further testing if needed |
Source code in molmo_spaces/utils/test_utils.py
verify_and_compare_camera_observations_after_steps
¶
verify_and_compare_camera_observations_after_steps(obs_dict, sensor_suite, test_data_dir, test_data_prefix, expected_cameras, initial_obs_dict=None, debug_images_dir=None, debug_prefix='obs_after_steps', expected_shape=(480, 480, 3), atol=1.0, rtol=0.0, ignore_cameras=None, skip_depth_exact_match=True, ssim_threshold=0.9)
Verify and compare camera observations after running policy steps against saved test data.
Similar to verify_and_compare_camera_observations, but expects obs_dict directly rather than the tuple format from task.reset()/task.step().
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| obs_dict | | Dictionary of observations from a single environment | required |
| sensor_suite | | The task's sensor suite | required |
| test_data_dir | | Path to directory containing test data files | required |
| test_data_prefix | | Prefix for test data files (e.g., "rum_pick_after_steps_") | required |
| expected_cameras | | List of expected camera sensor names | required |
| initial_obs_dict | | Optional dict of initial observations to verify that observations changed | None |
| debug_images_dir | | Optional path to save debug images. If None, no images are saved. | None |
| debug_prefix | | Prefix for debug image filenames (default: "obs_after_steps") | 'obs_after_steps' |
| expected_shape | | Expected shape of camera observations (w,h,c) (default: (480, 480, 3)) | (480, 480, 3) |
| atol | | Absolute tolerance for pixel value comparison (default: 1.0) | 1.0 |
| rtol | | Relative tolerance for comparison (default: 0.0) | 0.0 |
| ignore_cameras | | Optional list of camera sensor names to skip during comparison | None |
| skip_depth_exact_match | | Whether to skip pixel-exact depth comparison (default: True). When True, uses structural similarity (SSIM) on normalized depth for cross-platform robustness. When False, does pixel-exact comparison with edge masking (for local determinism tests). Depth rendering is NOT deterministic across platforms/GPUs. | True |
Returns:
| Name | Type | Description |
|---|---|---|
| list | | camera_sensors_found for further testing if needed |
Source code in molmo_spaces/utils/test_utils.py
verify_video_fps
¶
Assert all videos in a directory have the expected FPS.
Source code in molmo_spaces/utils/test_utils.py
video_utils
¶
Copied from video2sim_pipeline/video2sim/utils/video_utils.py
Functions:
| Name | Description |
|---|---|
| ffmpeg_save_video | Save a video using ffmpeg. |
| resize_with_padding | Resize an image to fit within the target size while maintaining its original aspect ratio. |
ffmpeg_save_video
¶
ffmpeg_save_video(frames, output_path: str, fps: float = 30.0, codec: str = 'libx264', quality: int = 23, pix_fmt='rgb24')
Save a video using ffmpeg.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| frames | | Video frames as numpy array (T, H, W, 3) or torch tensor (T, 3, H, W) | required |
| output_path | str | Path to save the video file | required |
| fps | float | Frames per second | 30.0 |
| codec | str | Video codec to use | 'libx264' |
| quality | int | CRF value (lower is better quality, 18-28 is reasonable) | 23 |
Source code in molmo_spaces/utils/video_utils.py
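As a rough guide to how these parameters could map onto an ffmpeg invocation, the sketch below builds a command line that reads raw frames from stdin. Whether ffmpeg_save_video pipes raw frames this way is an assumption; the helper name `build_ffmpeg_command`, the explicit `width` and `height` arguments, and the `yuv420p` output pixel format are illustrative choices. The flags themselves are standard ffmpeg options.

```python
def build_ffmpeg_command(width, height, output_path, fps=30.0,
                         codec="libx264", quality=23, pix_fmt="rgb24"):
    """Build an ffmpeg argument list for encoding raw frames read from stdin."""
    return [
        "ffmpeg", "-y",                 # overwrite the output file if it exists
        "-f", "rawvideo",               # input is headerless raw frames
        "-pix_fmt", pix_fmt,            # pixel layout of the incoming frames
        "-s", f"{width}x{height}",      # frame size, required for raw input
        "-r", str(fps),                 # input frame rate
        "-i", "-",                      # read frames from stdin
        "-c:v", codec,
        "-crf", str(quality),           # lower CRF means higher quality
        "-pix_fmt", "yuv420p",          # assumption: widely compatible output format
        output_path,
    ]


command = build_ffmpeg_command(640, 480, "episode.mp4")
print(" ".join(command))
```

A command like this would typically be passed to `subprocess.Popen` with `stdin=subprocess.PIPE`, writing each frame's bytes in (H, W, 3) order.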
resize_with_padding
¶
Resize an image to fit within the target size while maintaining its original aspect ratio. Padding (letterbox) is added to ensure the final image matches the target dimensions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| image | array | Input image. | required |
| target_width | int | Desired width. | required |
| target_height | int | Desired height. | required |
| pad_color | tuple | Color for the padding (default is black). | (0, 0, 0) |
Returns:
| Type | Description |
|---|---|
| | np.array: Resized image with padding. |
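The letterbox geometry behind an aspect-preserving resize can be worked out without any image library. The sketch below computes only the scaled size and padding offsets; the name `letterbox_geometry` is hypothetical, and centering the image within the padding is an assumption about how the padding is distributed.

```python
def letterbox_geometry(width, height, target_width, target_height):
    """Return (new_w, new_h, pad_left, pad_top) for an aspect-preserving fit."""
    # The limiting dimension determines the scale, so the image never overflows.
    scale = min(target_width / width, target_height / height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    # Assumption: leftover space is split evenly (centered letterbox).
    pad_left = (target_width - new_w) // 2
    pad_top = (target_height - new_h) // 2
    return new_w, new_h, pad_left, pad_top


# A 640x480 frame fitted into a 480x480 target is scaled by 0.75 to 480x360,
# leaving 60 rows of padding above and below.
landscape = letterbox_geometry(640, 480, 480, 480)
portrait = letterbox_geometry(480, 640, 480, 480)
```

The resized image would then be pasted at `(pad_left, pad_top)` on a canvas filled with `pad_color`.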