Update app.py
app.py
CHANGED
@@ -19,54 +19,56 @@ if gr.__version__ != "3.28.3":  # doesn't work...
os.system("pip install gradio==3.28.3")

title_description = """
-#
-##

"""

description = """
-
-
-

-
-
* [Coyo-700M](https://github.com/kakaobrain/coyo-dataset)
* [Bridge](https://sites.google.com/view/bridgedata)

-
-After downloading the datasets, we processed them by resizing images to a resolution of 128.
-The smallest side of the image (width or height) is resized to 128 and the other side is resized keeping the initial ratio.
-After, we retrieve the Canny edge map of the images. We performed this preprocessing for every dataset we used during the sprint.
-
-
-We train four different ControlNets. For each of them, we processed the datasets differently. You can find a description of the processing in the README file attached to the model repo.
-[Our ControlNet repo](https://huggingface.co/Baptlem/baptlem-controlnet)

-
-We also have access to nodes composed of 8 A100 80 GB GPUs. A benchmark on one of these nodes will come soon.
-

"""

traj_description = """
-
-We
-
"""


perfo_description = """
-
-
-
-

We can see that the larger the BS, the higher the FPS: by increasing the BS, we take advantage of the parallelization of the GPUs.
-
"""


def create_key(seed=0):
    return jax.random.PRNGKey(seed)

@@ -317,6 +319,9 @@ def create_demo(process, max_images=12, default_num_images=4):
with gr.Column():
    gr.Image("./perfo_rtx.png",
             interactive=False)


os.system("pip install gradio==3.28.3")

title_description = """
+# UCDR-Net
+## Unlimited Controlled Domain Randomization Network for Bridging the Sim2Real Gap in Robotics

"""

description = """
+While existing ControlNet and public diffusion models are predominantly geared towards high-resolution images (512x512 or above) and intricate artistic detail generation, these models also hold untapped potential for Automatic Data Augmentation (ADA).
+By harnessing the inherent variance of prompt-conditioned generated images, we can significantly boost the visual diversity of training samples for computer vision pipelines.
+This is particularly relevant in robotics, where deep learning plays an increasingly pivotal role in training policies for robotic manipulation from images.

+In this HuggingFace sprint, we present UCDR-Net (Unlimited Controlled Domain Randomization Network), a novel Canny-edge mini-ControlNet trained on Stable Diffusion 1.5 with mixed datasets.
+Our model generates photorealistic and varied renderings from simplistic robotic simulation images, enabling real-time data augmentation for robotic vision training.

+We specifically designed UCDR-Net to be fast and composition-preserving, with an emphasis on lower-resolution images (128x128) for online data augmentation in typical preprocessing pipelines.
+Our choice of the Canny-edge version of ControlNet ensures that shape and structure are preserved in the image, which is crucial for visuomotor policy learning.
+
+We trained ControlNet from scratch using only 128x128 images, preprocessing the training datasets and extracting their Canny edge maps.
+We then trained four ControlNets with different mixtures of two datasets (Coyo-700M and Bridge Data) and showcase the results below.
* [Coyo-700M](https://github.com/kakaobrain/coyo-dataset)
* [Bridge](https://sites.google.com/view/bridgedata)

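The preprocessing step (resize so the shorter side becomes 128, keep the aspect ratio, then extract Canny edges) can be sketched as follows. `resized_shape` is an illustrative helper, not code from app.py, and the OpenCV calls shown in the trailing comments are an assumption about how the resize and edge extraction would be done:

```python
def resized_shape(width, height, short_side=128):
    """Scale dimensions so the smaller side becomes `short_side`,
    keeping the original aspect ratio (rounded to whole pixels)."""
    if width <= height:
        new_w = short_side
        new_h = round(height * short_side / width)
    else:
        new_h = short_side
        new_w = round(width * short_side / height)
    return new_w, new_h

# The full preprocessing would then look roughly like (using OpenCV):
#   img = cv2.resize(img, resized_shape(w, h))
#   edges = cv2.Canny(img, low_threshold, high_threshold)
```

For example, a 640x480 frame maps to 171x128, preserving the 4:3 ratio up to rounding.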
+Model description and training process: please refer to the README file attached to the model repository.

+Model repository: [ControlNet repo](https://huggingface.co/Baptlem/baptlem-controlnet)

"""

traj_description = """
+To demonstrate UCDR-Net's capabilities, we generated a trajectory in our simulated robotic environment and present the resulting videos for each model.
+We batched the frames of each video and performed independent inference for each frame, which explains the "wobbling" effect.
+Prompt used for every video: "A robotic arm with a gripper and a small cube on a table, super realistic, industrial background"
+
"""
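Each frame is diffused independently, with no temporal conditioning between frames, which is why consecutive outputs wobble. The batching itself is simple; this is a minimal stdlib sketch, assuming frames arrive as a list and `batch_size` matches the pipeline's batch dimension (names are illustrative, not from app.py):

```python
def batch_frames(frames, batch_size):
    """Split a trajectory's frames into fixed-size batches.
    Each frame is later diffused independently, so no temporal
    consistency is enforced across batches or within them."""
    return [frames[i:i + batch_size] for i in range(0, len(frames), batch_size)]
```

The last batch may be smaller than `batch_size` when the episode length is not a multiple of it.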


perfo_description = """
+Our model has been benchmarked on a node of 4 Titan RTX 24 GB GPUs, achieving an image generation rate of 14 FPS!
+The table on the right shows the performance of our models running on different nodes.
+To run the benchmark, we loaded one of our models on every GPU of the node and then retrieved an episode of our simulation.
+For every frame of the episode, we preprocess the image (resize, Canny, …) and process the Canny image on the GPUs.
+We repeated this procedure for different batch sizes (BS).

We can see that the larger the BS, the higher the FPS: by increasing the BS, we take advantage of the parallelization of the GPUs.
"""
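The benchmark procedure above amounts to timing one end-to-end pass over an episode at each batch size. A minimal sketch, assuming a `process_batch` callable standing in for the resize + Canny + GPU inference steps (a hypothetical name, not the app's actual function):

```python
import time

def benchmark_fps(process_batch, frames, batch_size):
    """Return frames-per-second for one pass over an episode
    at a given batch size, timing the whole loop end to end."""
    start = time.perf_counter()
    for i in range(0, len(frames), batch_size):
        # In the real pipeline: preprocess (resize, Canny) and run inference.
        process_batch(frames[i:i + batch_size])
    return len(frames) / (time.perf_counter() - start)
```

Running this for several batch sizes reproduces the BS-vs-FPS comparison: larger batches amortize per-call overhead across the GPUs.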

+conclusion_description = """
+UCDR-Net is a natural step toward bridging the Sim2Real gap in robotics, providing real-time data augmentation for training visual policies.
+We are excited to share our work with the HuggingFace community and to contribute to the advancement of robotic vision training techniques.
+
+"""

def create_key(seed=0):
    return jax.random.PRNGKey(seed)
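`create_key` wraps `jax.random.PRNGKey`; in JAX one would typically derive per-frame keys from this base key with `jax.random.split` rather than reusing it. As a library-free analogue of that idea (this mirrors the pattern for illustration, it is not the app's code), independent reproducible per-frame seeds can be derived by hashing:

```python
import hashlib

def frame_seed(base_seed, frame_index):
    """Deterministically derive a distinct 32-bit seed per frame
    from a single base seed, so runs are reproducible."""
    digest = hashlib.sha256(f"{base_seed}:{frame_index}".encode()).digest()
    return int.from_bytes(digest[:4], "big")
```

Because every frame gets a different seed, each frame samples different diffusion noise, which is consistent with the per-frame wobbling described above.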
with gr.Column():
    gr.Image("./perfo_rtx.png",
             interactive=False)
+
+with gr.Row():
+    gr.Markdown(conclusion_description)