mbreuss/MoDE_CALVIN_ABC_1


	---
	library_name: custom
	tags:
	- robotics
	- diffusion
	- mixture-of-experts
	- multi-modal
	license: mit
	datasets:
	- CALVIN
	language:
	- en
	pipeline_tag: robotics
	---
	# MoDE (Mixture of Denoising Experts) Diffusion Policy

	## Model Description

	<div style="text-align: center">
	<img src="MoDE_Figure_1.png" width="800px"/>
	</div>

	- [GitHub Link](https://github.com/intuitive-robots/MoDE_Diffusion_Policy)
	- [Project Page](https://mbreuss.github.io/MoDE_Diffusion_Policy/)

	This model implements a Mixture of Denoising Experts (MoDE) architecture for robotic manipulation. It combines a transformer-based backbone with noise-only expert routing, meaning the router selects experts from the current noise level alone. Because the routing does not depend on the observation, the chosen experts for each denoising timestep can be precached to reduce inference time.
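
The noise-only routing and expert precaching described above can be sketched as follows. This is a minimal illustration, not the repository's implementation: the number of experts, the top-k value, the sinusoidal noise embedding, the random router weights, and the example noise levels are all assumptions made for the sketch.

```python
import math
import random

NUM_EXPERTS = 4  # assumption: not stated in this card
TOP_K = 2        # assumption: not stated in this card


def noise_embedding(sigma, dim=8):
    """Hypothetical sinusoidal embedding of the log noise level."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    s = math.log(sigma)
    return [math.sin(s * f) for f in freqs] + [math.cos(s * f) for f in freqs]


def make_router_weights(num_experts=NUM_EXPERTS, dim=8, seed=0):
    """Random stand-in for a trained linear router (one row per expert)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_experts)]


def route(sigma, router_weights, top_k=TOP_K):
    """Pick top-k experts from the noise level only; no observation is used."""
    emb = noise_embedding(sigma, dim=len(router_weights[0]))
    logits = [sum(w * e for w, e in zip(row, emb)) for row in router_weights]
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Softmax restricted to the selected experts gives their mixing weights.
    top = max(logits[i] for i in chosen)
    exps = [math.exp(logits[i] - top) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def precache_routes(sigmas, router_weights, top_k=TOP_K):
    """Run the router once per sampling noise level and store the result."""
    return {sigma: route(sigma, router_weights, top_k) for sigma in sigmas}


# Five hypothetical noise levels, one per sampling step.
sampling_sigmas = [80.0, 10.0, 1.0, 0.1, 0.01]
route_cache = precache_routes(sampling_sigmas, make_router_weights())
```

With the cache in place, each denoising step only needs a dictionary lookup to know which experts to run, so the router itself never executes at inference time.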

	The model was pretrained on a subset of Open X-Embodiment (OXE) for 300k steps and then finetuned for downstream tasks on the CALVIN/LIBERO datasets.

	## Model Details

	### Architecture
	- Base Architecture: MoDE with custom Mixture of Experts Transformer
	- Vision Encoder: ResNet-50 with FiLM conditioning, finetuned from ImageNet-pretrained weights
	- EMA: Enabled
	- Action Window Size: 10
	- Sampling Steps: 5 (optimal for performance)
	- Sampler Type: DDIM
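
To make the sampler settings above concrete, here is a minimal sketch of deterministic DDIM sampling (eta = 0) that denoises a 10 x 7 action window in 5 steps. The linear beta schedule, the 1000 training timesteps, and the evenly strided timestep subset are assumptions, and the repository may parameterize noise differently. `make_oracle` is a hypothetical stand-in for the trained denoiser so the sketch runs without a checkpoint.

```python
import math
import random

ACTION_WINDOW = 10       # from the card: Action Window Size
ACTION_DIM = 7           # from the card: delta EEF action dimension
NUM_SAMPLING_STEPS = 5   # from the card: Sampling Steps
NUM_TRAIN_STEPS = 1000   # assumption: not stated in the card


def make_alpha_bars(n=NUM_TRAIN_STEPS, beta_start=1e-4, beta_end=2e-2):
    """Cumulative products of (1 - beta) for an assumed linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for i in range(n):
        beta = beta_start + (beta_end - beta_start) * i / (n - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def ddim_sample(eps_model, alpha_bars, num_steps=NUM_SAMPLING_STEPS, seed=0):
    """Deterministic DDIM over an evenly strided subset of timesteps."""
    rng = random.Random(seed)
    n = len(alpha_bars)
    timesteps = list(range(n - 1, -1, -(n // num_steps)))[:num_steps]
    # Start from pure Gaussian noise with the shape of one action window.
    x = [[rng.gauss(0.0, 1.0) for _ in range(ACTION_DIM)] for _ in range(ACTION_WINDOW)]
    for i, t in enumerate(timesteps):
        ab_t = alpha_bars[t]
        # After the last step the sample is fully denoised (alpha_bar = 1).
        ab_prev = alpha_bars[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        eps = eps_model(x, t)
        new_x = []
        for x_row, e_row in zip(x, eps):
            row = []
            for xv, ev in zip(x_row, e_row):
                # Predicted clean action, then re-noised to the next level.
                x0 = (xv - math.sqrt(1.0 - ab_t) * ev) / math.sqrt(ab_t)
                row.append(math.sqrt(ab_prev) * x0 + math.sqrt(1.0 - ab_prev) * ev)
            new_x.append(row)
        x = new_x
    return x


def make_oracle(target, alpha_bars):
    """Toy noise predictor that knows the clean target (for illustration only)."""
    def eps_model(x, t):
        ab = alpha_bars[t]
        return [[(xv - math.sqrt(ab) * tv) / math.sqrt(1.0 - ab)
                 for xv, tv in zip(x_row, t_row)]
                for x_row, t_row in zip(x, target)]
    return eps_model


alpha_bars = make_alpha_bars()
target_actions = [[0.1 * (r + c) for c in range(ACTION_DIM)] for r in range(ACTION_WINDOW)]
actions = ddim_sample(make_oracle(target_actions, alpha_bars), alpha_bars)
```

With a perfect noise predictor, the loop recovers the target actions up to floating-point error. A real policy replaces the oracle with the conditioned MoDE transformer.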

	### Input/Output Specifications

	#### Inputs
	- RGB Static Camera: `(B, T, 3, H, W)` tensor
	- RGB Gripper Camera: `(B, T, 3, H, W)` tensor
	- Language Instructions: Text strings

	#### Outputs
	- Action Space: `(B, T, 7)` tensor representing delta EEF actions
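
The shape conventions above can be checked before calling the model. The helper below is a hypothetical sketch that works on plain shape tuples (for example `tuple(tensor.shape)`); it is not part of the MoDE code base. It assumes both cameras share the same `B` and `T`, and it does not assume that the action `T` equals the observation `T`, since the card uses the same symbol for both without stating that they match. The image resolutions in the example are placeholders.

```python
def io_shape_errors(static_shape, gripper_shape, action_shape, action_dim=7):
    """Return a list of shape problems; an empty list means the shapes are consistent."""
    errors = []
    for name, shape in (("rgb_static", static_shape), ("rgb_gripper", gripper_shape)):
        if len(shape) != 5:
            errors.append(f"{name}: expected 5 dims (B, T, 3, H, W), got {len(shape)}")
        elif shape[2] != 3:
            errors.append(f"{name}: expected 3 colour channels, got {shape[2]}")
    if len(static_shape) == 5 and len(gripper_shape) == 5:
        # Assumption: both cameras are batched and windowed together.
        if static_shape[:2] != gripper_shape[:2]:
            errors.append("rgb_static and rgb_gripper disagree on (B, T)")
    if len(action_shape) != 3:
        errors.append(f"action: expected 3 dims (B, T, 7), got {len(action_shape)}")
    else:
        if action_shape[2] != action_dim:
            errors.append(f"action: expected last dim {action_dim}, got {action_shape[2]}")
        if len(static_shape) == 5 and action_shape[0] != static_shape[0]:
            errors.append("action batch size does not match observation batch size")
    return errors


# Placeholder resolutions; the card does not state H and W.
problems = io_shape_errors((1, 1, 3, 200, 200), (1, 1, 3, 84, 84), (1, 10, 7))
```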

	## Usage

	```python
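	# `model`, `static_image`, and `gripper_image` must already exist:
	# a loaded MoDE policy and two `(B, T, 3, H, W)` image tensors.
	obs = {
	    "rgb_obs": {
	        "rgb_static": static_image,
	        "rgb_gripper": gripper_image,
	    }
	}
	goal = {"lang_text": "pick up the blue cube"}
	# Query the policy with the observation and the language goal.
	action = model.step(obs, goal)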
	```

	## Training Details

	### Configuration
	- Optimizer: AdamW
	- Learning Rate: 0.0001
	- Weight Decay: 0.05
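
To show what these settings do, here is a minimal scalar sketch of one AdamW update with the learning rate and weight decay listed above. The `beta1`, `beta2`, and `eps` values are common defaults and are assumptions, since the card does not state them.

```python
import math

LEARNING_RATE = 1e-4  # from the card
WEIGHT_DECAY = 0.05   # from the card


def adamw_step(param, grad, m, v, step,
               lr=LEARNING_RATE, weight_decay=WEIGHT_DECAY,
               beta1=0.9, beta2=0.999, eps=1e-8):
    """One AdamW update for a single scalar parameter (step counts from 1)."""
    # Decoupled weight decay: shrink the parameter directly, not via the gradient.
    param = param * (1.0 - lr * weight_decay)
    # Exponential moving averages of the gradient and its square.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias correction for the zero-initialized moving averages.
    m_hat = m / (1.0 - beta1 ** step)
    v_hat = v / (1.0 - beta2 ** step)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# With a zero gradient, only the weight decay acts on the parameter.
decayed, _, _ = adamw_step(1.0, 0.0, 0.0, 0.0, step=1)
```

In PyTorch, the equivalent optimizer setup would be `torch.optim.AdamW(params, lr=1e-4, weight_decay=0.05)`.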

	## License
	This model is released under the MIT license.