mbreuss/MoDE_LIBERO_90

Tags: Robotics, custom, diffusion, mixture-of-experts, multi-modal


mbreuss committed 2 days ago

Commit cd97a87 (parent: 13a71cc)

Upload folder using huggingface_hub


Files changed (2)

README.md +70 -55
model_cleaned.safetensors +1 -1

README.md CHANGED

@@ -1,56 +1,71 @@
-            ---
-            library_name: custom
-            tags:
-            - robotics
-            - diffusion
-            - mixture-of-experts
-            - multi-modal
-            license: mit
-            datasets:
-            - CALVIN
-            language:
-            - en
-            pipeline_tag: robotics
-            ---
-            # MoDE (Mixture 1of Diffusion Experts) Model
-            This model implements a Mixture of Diffusion Experts architecture for robotic manipulation, combining transformer-based processing with expert routing and diffusion-based action prediction.
-            ## Model Architecture
-            - Base Architecture: MoDE with custom Mixture of Experts Transformer
-            - Vision Encoder: {getattr(model_instance, 'resnet_type', 'ResNet')} with FiLM conditioning
-            - EMA: Enabled
-            - Action Window Size: {model_instance.act_window_size}
-            - Sampling Steps: {model_instance.num_sampling_steps}
-            - Sampler Type: {model_instance.sampler_type}
-            ## Input/Output Specifications
-            - RGB Static Camera: (B, T, 3, H, W) tensor
-            - RGB Gripper Camera: (B, T, 3, H, W) tensor
-            - Language Instructions: Text strings
-            - Output: (B, T, 7) tensor representing 7-DoF actions
-            ## Usage Example
-            ```python
-            from huggingface_hub import hf_hub_download
-            import torch
-            weights_path = hf_hub_download(repo_id="{repo_name}", filename="model_cleaned.safetensors")
-            model.load_pretrained_parameters(weights_path)
-            obs = {
-                "rgb_obs": {
-                    "rgb_static": static_image,
-                    "rgb_gripper": gripper_image
-                }
-            }
-            goal = {"lang_text": "pick up the blue cube"}
-            action = model.step(obs, goal)
-            ```
-            ## Training Configuration
-            - Optimizer: AdamW
-            - Learning Rate: {config.optimizer.learning_rate}
-            - Weight Decay: {config.optimizer.transformer_weight_decay}

+---
+library_name: custom
+tags:
+- robotics
+- diffusion
+- mixture-of-experts
+- multi-modal
+license: mit
+datasets:
+- CALVIN
+language:
+- en
+pipeline_tag: robotics
+---
+# MoDE (Mixture of Denoising Experts) Diffusion Policy
+## Model Description
+<div style="text-align: center">
+    <img src="MoDE_Figure_1.png" width="800px"/>
+</div>
+- [Github Link](https://github.com/intuitive-robots/MoDE_Diffusion_Policy)
+- [Project Page](https://mbreuss.github.io/MoDE_Diffusion_Policy/)
+This model implements a Mixture of Denoising Experts architecture for robotic manipulation, combining a transformer-based backbone with noise-only expert routing. Because the router is conditioned only on the noise level, the experts selected at each denoising timestep can be precomputed and cached, which reduces inference time.
+The model was pretrained on a subset of OXE (Open X-Embodiment) for 300k steps and finetuned for downstream tasks on CALVIN/LIBERO.
+## Model Details
+### Architecture
+- **Base Architecture**: MoDE with custom Mixture of Experts Transformer
+- **Vision Encoder**: ResNet-50 with FiLM conditioning, finetuned from ImageNet-pretrained weights
+- **EMA**: Enabled
+- **Action Window Size**: 10
+- **Sampling Steps**: 5 (optimal for performance)
+- **Sampler Type**: DDIM
+### Input/Output Specifications
+#### Inputs
+- RGB Static Camera: `(B, T, 3, H, W)` tensor
+- RGB Gripper Camera: `(B, T, 3, H, W)` tensor
+- Language Instructions: Text strings
+#### Outputs
+- Action Space: `(B, T, 7)` tensor representing delta end-effector (EEF) actions
+## Usage
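+```python
+# Assumes `model` is an already-loaded MoDE policy and that `static_image` and
+# `gripper_image` are (B, T, 3, H, W) tensors from the static and gripper cameras.
+obs = {
+    "rgb_obs": {
+        "rgb_static": static_image,
+        "rgb_gripper": gripper_image
+    }
+}
+goal = {"lang_text": "pick up the blue cube"}
+# Query the policy for an action given the observation and language goal.
+action = model.step(obs, goal)
+```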
+## Training Details
+### Configuration
+- **Optimizer**: AdamW
+- **Learning Rate**: 0.0001
+- **Weight Decay**: 0.05
+## License
+This model is released under the MIT license.
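
The updated card says MoDE uses noise-only expert routing and that the chosen experts can be precached per timestep. The sketch below illustrates why that works, under assumptions that are not taken from the MoDE code: a toy linear router over a sinusoidal embedding of log(sigma), 4 experts with top-2 selection, and a hypothetical 5-step noise schedule. `noise_embedding`, `NoiseOnlyRouter`, and `build_expert_cache` are illustrative names only.

```python
import numpy as np


def noise_embedding(sigma: float, dim: int = 16) -> np.ndarray:
    """Sinusoidal embedding of log(sigma) (hypothetical choice of features)."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    angles = np.log(sigma) * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])


class NoiseOnlyRouter:
    """Toy linear router whose only input is the noise level sigma."""

    def __init__(self, num_experts: int = 4, dim: int = 16, top_k: int = 2, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.weight = rng.normal(size=(dim, num_experts))
        self.dim = dim
        self.top_k = top_k

    def route(self, sigma: float):
        logits = noise_embedding(sigma, self.dim) @ self.weight
        top = np.argsort(logits)[::-1][: self.top_k]  # best-scoring experts first
        gates = np.exp(logits[top] - logits[top].max())  # softmax over the top-k
        return tuple(int(i) for i in top), gates / gates.sum()


def build_expert_cache(router: NoiseOnlyRouter, sigmas) -> dict:
    """Run the router once per denoising step and store its decisions."""
    return {float(s): router.route(float(s)) for s in sigmas}


# Usage: precompute routing for a hypothetical 5-step noise schedule.
sigmas = np.geomspace(80.0, 0.01, num=5)
router = NoiseOnlyRouter()
cache = build_expert_cache(router, sigmas)
for s in sigmas:
    experts, gates = cache[float(s)]  # a dictionary lookup replaces the router call
```

In a standard mixture-of-experts layer the router reads each token's hidden state, so its decisions change with every input and cannot be precomputed. When the router sees only sigma, the decision is a function of the noise schedule alone, so one pass over the schedule fixes the expert assignment for every rollout.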

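The card lists a DDIM sampler with 5 sampling steps, an action window of 10, and 7-dimensional actions. Below is a minimal sketch of deterministic DDIM, assuming an EDM-style noise-level (sigma) parameterization in which each step computes x_next = (sigma_next / sigma) * x + (1 - sigma_next / sigma) * D(x, sigma) for a denoiser D. The Karras et al. schedule, the `sigma_min`/`sigma_max` values, and `toy_denoiser` are assumptions for illustration; in the real policy, D would be the MoDE transformer conditioned on the camera images and language goal.

```python
import numpy as np


def karras_sigmas(n: int, sigma_min: float = 0.001, sigma_max: float = 80.0, rho: float = 7.0) -> np.ndarray:
    """Decreasing noise levels (Karras et al. schedule) with a final 0 appended."""
    ramp = np.linspace(0.0, 1.0, n)
    min_inv, max_inv = sigma_min ** (1 / rho), sigma_max ** (1 / rho)
    sigmas = (max_inv + ramp * (min_inv - max_inv)) ** rho
    return np.append(sigmas, 0.0)


def ddim_sample(denoiser, shape, num_steps: int = 5, seed: int = 0) -> np.ndarray:
    """Deterministic DDIM in sigma form: x <- r * x + (1 - r) * D(x, sigma), r = sigma_next / sigma."""
    rng = np.random.default_rng(seed)
    sigmas = karras_sigmas(num_steps)
    x = rng.normal(size=shape) * sigmas[0]  # start from pure noise at sigma_max
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        denoised = denoiser(x, sigma)
        ratio = sigma_next / sigma
        x = ratio * x + (1.0 - ratio) * denoised
    return x


# Hypothetical target: one action chunk of shape (B=1, T=10, 7).
target_chunk = np.linspace(-1.0, 1.0, 10 * 7).reshape(1, 10, 7)


def toy_denoiser(x: np.ndarray, sigma: float) -> np.ndarray:
    """Hypothetical stand-in for the policy network: always predicts the same chunk."""
    return np.broadcast_to(target_chunk, x.shape)


actions = ddim_sample(toy_denoiser, shape=(1, 10, 7), num_steps=5)
```

Because the schedule ends at sigma = 0, the last step returns the denoiser's prediction directly, which is why this toy run recovers `target_chunk` exactly. With only 5 steps, each network call covers a large jump in noise level, so the quality of D at high sigma matters more than it would with a long schedule.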
model_cleaned.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:ff0b9daebf7144d7c161dc2062ba5aca753799b085f1d1c802b83c18835efd9c
+oid sha256:b1ad15371a423cf413d345a0ba379872b1b01b5bf1ba54034756d874bc8c2cf2
 size 3317019856
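
The pointer diff shows that the re-uploaded `model_cleaned.safetensors` keeps the same size (3,317,019,856 bytes) but has a new SHA-256 oid, so its contents changed. A minimal sketch for checking a local copy against such a pointer, assuming the Git LFS pointer layout shown above (one `key value` pair per line, with the oid written as `sha256:<hex>`); `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers, not part of `huggingface_hub`.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its version, SHA-256 digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"version": fields["version"], "oid": digest, "size": int(fields["size"])}


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Check a local file against the pointer's size first, then its SHA-256."""
    if path.stat().st_size != pointer["size"]:
        return False
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):  # stream in 1 MiB chunks
            h.update(chunk)
    return h.hexdigest() == pointer["oid"]


new_pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:b1ad15371a423cf413d345a0ba379872b1b01b5bf1ba54034756d874bc8c2cf2\n"
    "size 3317019856\n"
)
# Hypothetical local path to the downloaded weights:
# matches_pointer(Path("model_cleaned.safetensors"), new_pointer)
```

Checking the size first is cheap and rejects truncated downloads immediately. Here, though, both versions of the file have the same size, so only the hash can tell the old weights from the new ones.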