mbreuss committed on
Commit
cd97a87
1 Parent(s): 13a71cc

Upload folder using huggingface_hub

Files changed (2)
  1. README.md +70 -55
  2. model_cleaned.safetensors +1 -1
README.md CHANGED
@@ -1,56 +1,71 @@
 
- ---
- library_name: custom
- tags:
- - robotics
- - diffusion
- - mixture-of-experts
- - multi-modal
- license: mit
- datasets:
- - CALVIN
- language:
- - en
- pipeline_tag: robotics
- ---
- # MoDE (Mixture 1of Diffusion Experts) Model
-
- This model implements a Mixture of Diffusion Experts architecture for robotic manipulation, combining transformer-based processing with expert routing and diffusion-based action prediction.
-
- ## Model Architecture
- - Base Architecture: MoDE with custom Mixture of Experts Transformer
- - Vision Encoder: {getattr(model_instance, 'resnet_type', 'ResNet')} with FiLM conditioning
- - EMA: Enabled
- - Action Window Size: {model_instance.act_window_size}
- - Sampling Steps: {model_instance.num_sampling_steps}
- - Sampler Type: {model_instance.sampler_type}
-
- ## Input/Output Specifications
- - RGB Static Camera: (B, T, 3, H, W) tensor
- - RGB Gripper Camera: (B, T, 3, H, W) tensor
- - Language Instructions: Text strings
- - Output: (B, T, 7) tensor representing 7-DoF actions
-
- ## Usage Example
- ```python
- from huggingface_hub import hf_hub_download
- import torch
-
- weights_path = hf_hub_download(repo_id="{repo_name}", filename="model_cleaned.safetensors")
- model.load_pretrained_parameters(weights_path)
-
- obs = {
-     "rgb_obs": {
-         "rgb_static": static_image,
-         "rgb_gripper": gripper_image
-     }
- }
- goal = {"lang_text": "pick up the blue cube"}
- action = model.step(obs, goal)
- ```
-
- ## Training Configuration
- - Optimizer: AdamW
- - Learning Rate: {config.optimizer.learning_rate}
- - Weight Decay: {config.optimizer.transformer_weight_decay}
-
+ ---
+ library_name: custom
+ tags:
+ - robotics
+ - diffusion
+ - mixture-of-experts
+ - multi-modal
+ license: mit
+ datasets:
+ - CALVIN
+ languages:
+ - en
+ pipeline_tag: robotics
+ ---
+ # MoDE (Mixture of Denoising Experts) Diffusion Policy
 
+ ## Model Description
+
+ <div style="text-align: center">
+ <img src="MoDE_Figure_1.png" width="800px"/>
+ </div>
+
+ - [Github Link](https://github.com/intuitive-robots/MoDE_Diffusion_Policy)
+ - [Project Page](https://mbreuss.github.io/MoDE_Diffusion_Policy/)
+
+ This model implements a Mixture of Denoising Experts (MoDE) architecture for robotic manipulation, combining a transformer-based backbone with noise-only expert routing. Because routing depends only on the noise level, the chosen experts for each timestep can be precached to reduce inference time.
+
+ The model was pretrained on a subset of OXE for 300k steps and fine-tuned for downstream tasks on the CALVIN and LIBERO datasets.
+
+ ## Model Details
+
+ ### Architecture
+ - **Base Architecture**: MoDE with custom Mixture of Experts Transformer
+ - **Vision Encoder**: ResNet-50 with FiLM conditioning finetuned from ImageNet
+ - **EMA**: Enabled
+ - **Action Window Size**: 10
+ - **Sampling Steps**: 5 (optimal for performance)
+ - **Sampler Type**: DDIM
+
+ ### Input/Output Specifications
+
+ #### Inputs
+ - RGB Static Camera: `(B, T, 3, H, W)` tensor
+ - RGB Gripper Camera: `(B, T, 3, H, W)` tensor
+ - Language Instructions: Text strings
+
+ #### Outputs
+ - Action Space: `(B, T, 7)` tensor representing delta EEF actions
+
+ ## Usage
+
+ ```python
+ obs = {
+     "rgb_obs": {
+         "rgb_static": static_image,
+         "rgb_gripper": gripper_image
+     }
+ }
+ goal = {"lang_text": "pick up the blue cube"}
+ action = model.step(obs, goal)
+ ```
+
+ ## Training Details
+
+ ### Configuration
+ - **Optimizer**: AdamW
+ - **Learning Rate**: 0.0001
+ - **Weight Decay**: 0.05
+
+ ## License
+ This model is released under the MIT license.
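
The new README mentions noise-only expert routing and precaching the chosen expert for each timestep. The following is a minimal sketch of that idea, not the MoDE implementation: the sinusoidal embedding, the linear router, the top-2 selection, the number of experts, and the noise levels are all hypothetical choices made for illustration. The only point it demonstrates is that when the router sees nothing but the noise level, the expert choice for a fixed sampling schedule can be computed once and reused.

```python
import math
import random


def noise_embedding(sigma, dim=8):
    """Sinusoidal embedding of a scalar noise level (illustrative choice)."""
    half = dim // 2
    freqs = [math.exp(-math.log(1000.0) * i / half) for i in range(half)]
    return [math.sin(sigma * f) for f in freqs] + [math.cos(sigma * f) for f in freqs]


def route(sigma, router_weights, top_k=2):
    """Return the top_k expert indices; the router input is the noise level only."""
    emb = noise_embedding(sigma, dim=len(router_weights[0]))
    logits = [sum(w * e for w, e in zip(row, emb)) for row in router_weights]
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return tuple(ranked[:top_k])


def precache_experts(sigmas, router_weights, top_k=2):
    """Run the router once per sampling step and store the selected experts."""
    return {step: route(s, router_weights, top_k) for step, s in enumerate(sigmas)}


# Hypothetical setup: 4 experts, a random linear router, 5 sampling steps.
rng = random.Random(0)
NUM_EXPERTS, EMB_DIM = 4, 8
router_weights = [[rng.uniform(-1, 1) for _ in range(EMB_DIM)] for _ in range(NUM_EXPERTS)]
sigmas = [80.0, 20.0, 5.0, 1.0, 0.1]  # placeholder noise levels, not the model's schedule

expert_cache = precache_experts(sigmas, router_weights)

# During sampling, the cached selection replaces the router call at each step.
for step, sigma in enumerate(sigmas):
    active_experts = expert_cache[step]
```

Since the cache depends only on the sampling schedule, it would need to be rebuilt if the number of sampling steps or the noise levels change.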
model_cleaned.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:ff0b9daebf7144d7c161dc2062ba5aca753799b085f1d1c802b83c18835efd9c
+ oid sha256:b1ad15371a423cf413d345a0ba379872b1b01b5bf1ba54034756d874bc8c2cf2
  size 3317019856
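
The pointer hunk above shows the weights changing content (new `oid`) while keeping the same byte size. As a rough sketch, a downloaded copy of `model_cleaned.safetensors` could be checked against the new pointer as follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of any library; only the standard-library `hashlib` is used.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Check that the file at `path` has the size and hash recorded in the pointer."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as f:
        # Read in chunks so multi-gigabyte weight files do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


# Pointer contents taken from the added side of the hunk above.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:b1ad15371a423cf413d345a0ba379872b1b01b5bf1ba54034756d874bc8c2cf2
size 3317019856
"""
pointer = parse_lfs_pointer(pointer_text)
```

Calling `matches_pointer("model_cleaned.safetensors", pointer)` on a local copy would then report whether it corresponds to this commit rather than its parent; the local path is an assumption.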