mbreuss committed on
Commit
997234b
1 Parent(s): fd2208e

Upload folder using huggingface_hub

Files changed (2)
  1. README.md +71 -0
  2. model_cleaned.safetensors +3 -0
README.md ADDED
@@ -0,0 +1,71 @@
+ ---
+ library_name: custom
+ tags:
+ - robotics
+ - diffusion
+ - mixture-of-experts
+ - multi-modal
+ license: mit
+ datasets:
+ - CALVIN
+ languages:
+ - en
+ pipeline_tag: robotics
+ ---
+ # MoDE (Mixture of Denoising Experts) Diffusion Policy
+
+ ## Model Description
+
+ <div style="text-align: center">
+ <img src="MoDE_Figure_1.png" width="800px"/>
+ </div>
+
+ - [GitHub Link](https://github.com/intuitive-robots/MoDE_Diffusion_Policy)
+ - [Project Page](https://mbreuss.github.io/MoDE_Diffusion_Policy/)
+
+ This model implements a Mixture of Denoising Experts (MoDE) architecture for robotic manipulation, combining a transformer-based backbone with noise-only expert routing. Because routing depends only on the noise level, the experts chosen at each denoising timestep can be precached to reduce inference time.
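The precaching idea can be illustrated with a minimal sketch. It assumes a router that sees only an embedding of the noise level, so its top-k choice can be computed once per sampling step and reused for every input. The embedding, the router weights, the noise levels, and all function names below are hypothetical and are not taken from the MoDE codebase.

```python
import math

def noise_embedding(sigma, dim=8):
    """Sinusoidal embedding of a scalar noise level (illustrative choice)."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    angle = math.log(sigma)  # requires sigma > 0
    return [math.sin(angle * f) for f in freqs] + [math.cos(angle * f) for f in freqs]

def route(sigma, router_weights, top_k=2):
    """Pick the top-k experts from the noise level alone; no token input is used."""
    emb = noise_embedding(sigma, dim=len(router_weights[0]))
    logits = [sum(w * e for w, e in zip(row, emb)) for row in router_weights]
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Softmax over the selected logits gives the mixing weights.
    peak = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def precache_routing(sigmas, router_weights, top_k=2):
    """Run the router once per sampling step and store (expert, weight) pairs."""
    return {step: route(s, router_weights, top_k) for step, s in enumerate(sigmas)}

# Hypothetical router: 4 experts, 8-dimensional noise embedding.
ROUTER = [[math.sin(0.7 * (r + 1) * (c + 1)) for c in range(8)] for r in range(4)]
# Five assumed noise levels, one per sampling step (the card lists 5 steps).
SIGMAS = [80.0, 10.0, 1.0, 0.1, 0.01]
CACHE = precache_routing(SIGMAS, ROUTER)
```

At inference time, a lookup such as `CACHE[step]` would replace the router forward pass, which is only valid because the routing decision does not depend on the observation or action tokens.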
+
+ The model was pretrained on a subset of OXE (Open X-Embodiment) for 300k steps and finetuned for downstream tasks on the CALVIN/LIBERO datasets.
+
+ ## Model Details
+
+ ### Architecture
+ - **Base Architecture**: MoDE with a custom Mixture-of-Experts Transformer
+ - **Vision Encoder**: ResNet-50 with FiLM conditioning, finetuned from ImageNet-pretrained weights
+ - **EMA**: Enabled
+ - **Action Window Size**: 10
+ - **Sampling Steps**: 5 (optimal for performance)
+ - **Sampler Type**: DDIM
+
+ ### Input/Output Specifications
+
+ #### Inputs
+ - RGB Static Camera: `(B, T, 3, H, W)` tensor
+ - RGB Gripper Camera: `(B, T, 3, H, W)` tensor
+ - Language Instructions: Text strings
+
+ #### Outputs
+ - Action Space: `(B, T, 7)` tensor representing delta end-effector (EEF) actions
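The shape conventions above can be checked before calling the model. The following is a minimal sketch that works on plain shape tuples, such as those returned by a tensor's `.shape`. The helper names are hypothetical. The requirement that both cameras share the same `B` and `T` is an assumption read from the shared symbols in the spec.

```python
def check_camera_shape(shape):
    """Validate a (B, T, 3, H, W) camera shape and return (B, T)."""
    if len(shape) != 5:
        raise ValueError(f"expected 5 dims (B, T, 3, H, W), got {len(shape)}")
    batch, steps, channels, height, width = shape
    if channels != 3:
        raise ValueError(f"expected 3 RGB channels, got {channels}")
    if min(batch, steps, height, width) < 1:
        raise ValueError("all dimensions must be positive")
    return batch, steps

def check_observation(static_shape, gripper_shape):
    """Assumption: both cameras agree on batch size and sequence length."""
    static_bt = check_camera_shape(static_shape)
    gripper_bt = check_camera_shape(gripper_shape)
    if static_bt != gripper_bt:
        raise ValueError(f"camera (B, T) mismatch: {static_bt} vs {gripper_bt}")
    return static_bt

def is_action_shape(shape):
    """Actions are (B, T, 7) according to the output spec."""
    return len(shape) == 3 and shape[-1] == 7
```

For example, `check_observation((1, 1, 3, 224, 224), (1, 1, 3, 84, 84))` passes because the two cameras may differ in resolution; the image sizes here are illustrative only.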
+
+ ## Usage
+
+ ```python
+ # static_image and gripper_image are (B, T, 3, H, W) tensors;
+ # model is an already loaded MoDE policy.
+ obs = {
+     "rgb_obs": {
+         "rgb_static": static_image,
+         "rgb_gripper": gripper_image
+     }
+ }
+ goal = {"lang_text": "pick up the blue cube"}
+ action = model.step(obs, goal)
+ ```
+
+ ## Training Details
+
+ ### Configuration
+ - **Optimizer**: AdamW
+ - **Learning Rate**: 0.0001
+ - **Weight Decay**: 0.05
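To make these hyperparameters concrete, here is a sketch of a single AdamW update on one scalar parameter, written in plain Python. The learning rate and weight decay come from the configuration above. The betas and epsilon are not stated in the card, so the commonly used defaults (0.9, 0.999, 1e-8) are assumed. This follows the decoupled weight decay formulation and is not the training code used for this model.

```python
import math

def adamw_step(param, grad, m, v, t, lr=1e-4, weight_decay=0.05,
               beta1=0.9, beta2=0.999, eps=1e-8):
    """One AdamW update for a scalar parameter at step t (t starts at 1)."""
    # Decoupled weight decay: shrink the parameter directly,
    # independent of the gradient statistics.
    param = param * (1.0 - lr * weight_decay)
    # Exponential moving averages of the gradient and its square.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias correction for the zero-initialised moments.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# First step from param=1.0 with gradient 0.5 and zero-initialised moments.
new_param, new_m, new_v = adamw_step(param=1.0, grad=0.5, m=0.0, v=0.0, t=1)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly `lr`, plus a decay term of `lr * weight_decay * param`.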
+
+ ## License
+ This model is released under the MIT license.
model_cleaned.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bcf93362ced811101ff755dac7a9e85267cf76f933f4ad847edecac7be71d9a3
+ size 3317019856
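The three lines above are a Git LFS pointer, not the weights themselves. A small sketch of reading such a pointer is shown below; the function name and returned keys are hypothetical, and the parser only handles the simple `key value` layout seen here.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, hash, and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>"; split on the first space only.
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }

POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:bcf93362ced811101ff755dac7a9e85267cf76f933f4ad847edecac7be71d9a3
size 3317019856
"""

INFO = parse_lfs_pointer(POINTER)
SIZE_GIB = INFO["size_bytes"] / 2**30
```

For this pointer the checkpoint is about 3.3 GB (roughly 3.09 GiB), and the SHA-256 digest can be compared against a hash of the downloaded file to verify integrity.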