Update app.py
app.py
CHANGED
@@ -19,54 +19,56 @@ if gr.__version__ != "3.28.3":  # doesn't work...
os.system("pip install gradio==3.28.3")

title_description = """
-#
-##

"""

description = """
-
-
-

-
-
* [Coyo-700M](https://github.com/kakaobrain/coyo-dataset)
* [Bridge](https://sites.google.com/view/bridgedata)

-
-After downloading the datasets, we processed them by resizing images to a resolution of 128.
-The smallest side of the image (width or height) is resized to 128 and the other side is resized keeping the initial ratio.
-After, we retrieve the Canny edge map of the images. We performed this preprocessing for every dataset we used during the sprint.
-
-
-We train four different ControlNets. For each of them, we processed the datasets differently. You can find a description of the processing in the README file attached to the model repo.
-[Our ControlNet repo](https://huggingface.co/Baptlem/baptlem-controlnet)

-
-We also have access to nodes composed of 8 A100 80 GB GPUs. A benchmark on one of these nodes will come soon.
-

"""

traj_description = """
-
-We
-
"""


perfo_description = """
-
-
-
-

We can see that the larger the BS, the higher the FPS: by increasing the BS, we take advantage of the parallelization of the GPUs.
-
"""


def create_key(seed=0):
    return jax.random.PRNGKey(seed)

@@ -317,6 +319,9 @@ def create_demo(process, max_images=12, default_num_images=4):
with gr.Column():
    gr.Image("./perfo_rtx.png",
             interactive=False)


os.system("pip install gradio==3.28.3")

title_description = """
+# UCDR-Net
+## Unlimited Controlled Domain Randomization Network for Bridging the Sim2Real Gap in Robotics

"""

description = """
+While existing ControlNet and public diffusion models are predominantly geared towards high-resolution images (512x512 or above) and intricate artistic detail generation, these models also hold untapped potential for Automatic Data Augmentation (ADA).
+By harnessing the inherent variance of prompt-conditioned generated images, we can significantly boost the visual diversity of training samples for computer vision pipelines.
+This is particularly relevant in robotics, where deep learning plays an increasingly pivotal role in training policies for robotic manipulation from images.

+In this HuggingFace sprint, we present UCDR-Net (Unlimited Controlled Domain Randomization Network), a novel Canny-edge mini-ControlNet trained on Stable Diffusion 1.5 with mixed datasets.
+Our model generates photorealistic and varied renderings from simplistic robotic simulation images, enabling real-time data augmentation for robotic vision training.

+We specifically designed UCDR-Net to be fast and composition-preserving, with an emphasis on lower-resolution images (128x128) for online data augmentation in typical preprocessing pipelines.
+Our choice of the Canny-edge version of ControlNet ensures that shape and structure are preserved in the image, which is crucial for visuomotor policy learning.
+
+We trained ControlNet from scratch using only 128x128 images, preprocessing the training datasets and extracting their Canny edge maps.
+We then trained four ControlNets with different mixtures of two datasets (Coyo-700M and Bridge Data) and showcase the results below.
* [Coyo-700M](https://github.com/kakaobrain/coyo-dataset)
* [Bridge](https://sites.google.com/view/bridgedata)

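The preprocessing step (resize so the shorter side becomes 128, keep the aspect ratio, then extract Canny edges) can be sketched as follows. `resized_shape` is an illustrative helper, not code from app.py, and the OpenCV calls shown in the trailing comments are an assumption about how the resize and edge extraction would be done:

```python
def resized_shape(width, height, short_side=128):
    """Scale dimensions so the smaller side becomes `short_side`,
    keeping the original aspect ratio (rounded to whole pixels)."""
    if width <= height:
        new_w = short_side
        new_h = round(height * short_side / width)
    else:
        new_h = short_side
        new_w = round(width * short_side / height)
    return new_w, new_h

# The full preprocessing would then look roughly like (using OpenCV):
#   img = cv2.resize(img, resized_shape(w, h))
#   edges = cv2.Canny(img, low_threshold, high_threshold)
```

For example, a 640x480 frame maps to 171x128, preserving the 4:3 ratio up to rounding.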
+Model description and training process: please refer to the README file attached to the model repository.

+Model repository: [ControlNet repo](https://huggingface.co/Baptlem/baptlem-controlnet)

"""

traj_description = """
+To demonstrate UCDR-Net's capabilities, we generated a trajectory in our simulated robotic environment and present the resulting videos for each model.
+We batched the frames of each video and performed independent inference for each frame, which explains the "wobbling" effect.
+Prompt used for every video: "A robotic arm with a gripper and a small cube on a table, super realistic, industrial background"
+
"""
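Each frame is diffused independently, with no temporal conditioning between frames, which is why consecutive outputs wobble. The batching itself is simple; this is a minimal stdlib sketch, assuming frames arrive as a list and `batch_size` matches the pipeline's batch dimension (names are illustrative, not from app.py):

```python
def batch_frames(frames, batch_size):
    """Split a trajectory's frames into fixed-size batches.
    Each frame is later diffused independently, so no temporal
    consistency is enforced across batches or within them."""
    return [frames[i:i + batch_size] for i in range(0, len(frames), batch_size)]
```

The last batch may be smaller than `batch_size` when the episode length is not a multiple of it.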


perfo_description = """
+Our model has been benchmarked on a node of 4 Titan RTX 24 GB GPUs, achieving an image generation rate of 14 FPS!
+The table on the right shows the performance of our models running on different nodes.
+To run the benchmark, we loaded one of our models on every GPU of the node and then retrieved an episode of our simulation.
+For every frame of the episode, we preprocess the image (resize, Canny, …) and process the Canny image on the GPUs.
+We repeated this procedure for different batch sizes (BS).

We can see that the larger the BS, the higher the FPS: by increasing the BS, we take advantage of the parallelization of the GPUs.
"""
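The benchmark procedure above amounts to timing one end-to-end pass over an episode at each batch size. A minimal sketch, assuming a `process_batch` callable standing in for the resize + Canny + GPU inference steps (a hypothetical name, not the app's actual function):

```python
import time

def benchmark_fps(process_batch, frames, batch_size):
    """Return frames-per-second for one pass over an episode
    at a given batch size, timing the whole loop end to end."""
    start = time.perf_counter()
    for i in range(0, len(frames), batch_size):
        # In the real pipeline: preprocess (resize, Canny) and run inference.
        process_batch(frames[i:i + batch_size])
    return len(frames) / (time.perf_counter() - start)
```

Running this for several batch sizes reproduces the BS-vs-FPS comparison: larger batches amortize per-call overhead across the GPUs.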

+conclusion_description = """
+UCDR-Net is a natural step toward bridging the Sim2Real gap in robotics, providing real-time data augmentation for training visual policies.
+We are excited to share our work with the HuggingFace community and to contribute to the advancement of robotic vision training techniques.
+
+"""

def create_key(seed=0):
    return jax.random.PRNGKey(seed)
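`create_key` wraps `jax.random.PRNGKey`; in JAX one would typically derive per-frame keys from this base key with `jax.random.split` rather than reusing it. As a library-free analogue of that idea (this mirrors the pattern for illustration, it is not the app's code), independent reproducible per-frame seeds can be derived by hashing:

```python
import hashlib

def frame_seed(base_seed, frame_index):
    """Deterministically derive a distinct 32-bit seed per frame
    from a single base seed, so runs are reproducible."""
    digest = hashlib.sha256(f"{base_seed}:{frame_index}".encode()).digest()
    return int.from_bytes(digest[:4], "big")
```

Because every frame gets a different seed, each frame samples different diffusion noise, which is consistent with the per-frame wobbling described above.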
with gr.Column():
    gr.Image("./perfo_rtx.png",
             interactive=False)
+
+with gr.Row():
+    gr.Markdown(conclusion_description)