---
license: apache-2.0
datasets:
- qizekun/ShapeLLM
language:
- en
---

## Install

[//]: # (If you are using Windows, do *NOT* proceed, see instructions [here](https://github.com/qizekun/LLaVA/blob/main/docs/Windows.md).)

1. Clone this repository and navigate to the ShapeLLM folder
```Shell
git clone https://github.com/qizekun/ShapeLLM.git
cd ShapeLLM
```
2. Install Package
```Shell
conda create -n shapellm python=3.10 -y
conda activate shapellm
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
```
3. Install additional packages for training cases
```Shell
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
```
4. Install PointNet++
```Shell
pip install "git+https://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops&subdirectory=pointnet2_ops_lib"
```
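If the build succeeds, the compiled ops should be importable; a quick optional check (the `pointnet2_ops` import name comes from the egg name in the command above):
```Shell
# Optional: verify the pointnet2_ops CUDA extension loads inside the shapellm env
python -c "import pointnet2_ops, torch; print('pointnet2_ops imported; CUDA available:', torch.cuda.is_available())"
```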

## ShapeLLM
### Model Weights
Please check out our [Model Zoo](https://github.com/qizekun/ShapeLLM/blob/main/docs/MODEL_ZOO.md) for all public ShapeLLM checkpoints.
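If you prefer to have a checkpoint on local disk, one option is the Hugging Face CLI; a minimal sketch, assuming the Model Zoo entries are hosted as Hugging Face model repos (as the `qizekun/ShapeLLM_13B_general_v1.0` identifier used in the demo below suggests):
```Shell
# Download one checkpoint snapshot locally; check the Model Zoo for the exact repo ids
huggingface-cli download qizekun/ShapeLLM_13B_general_v1.0 --local-dir checkpoints/ShapeLLM_13B_general_v1.0
```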

### Demo
#### CLI Inference
Chat about point clouds using the CLI interface. It also supports multi-GPU, 4-bit, and 8-bit quantized inference.
```Shell
python -m llava.serve.cli \
    --model-path qizekun/ShapeLLM_13B_general_v1.0 \
    --pts-file assets/instrument.npy
```
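Because ShapeLLM builds on the LLaVA serving stack, quantized inference should follow the same pattern; this is a sketch assuming `llava.serve.cli` exposes the upstream LLaVA `--load-4bit` / `--load-8bit` flags:
```Shell
# 4-bit quantized inference (flag names assumed from upstream LLaVA; use --load-8bit for 8-bit)
python -m llava.serve.cli \
    --model-path qizekun/ShapeLLM_13B_general_v1.0 \
    --pts-file assets/instrument.npy \
    --load-4bit
```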

### Training
Consistent with LLaVA, we adopt a two-stage training approach: in the first stage, we fine-tune only the projector for semantic alignment; in the second stage, we conduct full fine-tuning on instruction-following data.
Download the data following [DATA](https://github.com/qizekun/ShapeLLM/blob/main/docs/DATA.md) and organize it as follows in `./playground/data/shapellm/`:
```
│playground/data/shapellm/
├── cap3d_objaverse_785k.json
├── cap3d_objaverse_sft_45k.json
├── gapartnet_sft_27k_openai.json
├── gapartnet_pcs
│   ├── Box_100129_0_0.npy
│   └── ...
└── cap3d_pcs
    ├── 00000054c36d44a2a483bdbff31d8edf.pt
    └── ...
```
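As an optional sanity check that the files are in place, the samples shown above can be opened directly; this sketch only assumes the `.npy` files load with NumPy and the `.pt` files with PyTorch, as their extensions suggest:
```Shell
# Spot-check one file from each point-cloud folder
python -c "import numpy as np; print(np.load('playground/data/shapellm/gapartnet_pcs/Box_100129_0_0.npy').shape)"
python -c "import torch; print(type(torch.load('playground/data/shapellm/cap3d_pcs/00000054c36d44a2a483bdbff31d8edf.pt')))"
```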
Furthermore, ShapeLLM utilizes the Large version of [ReCon++](https://github.com/qizekun/ShapeLLM/blob/main/ReConV2/cfgs/pretrain/large/openshape.yaml) as the point encoder.
You need to download the [ReCon++ weight](https://huggingface.co/qizekun/ReConV2/blob/main/zeroshot/large/best_lvis.pth) and save it to `./checkpoints/recon/large.pth`.
```
│checkpoints/recon/
└── large.pth
```
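One way to place the weight is to fetch it straight from the Hugging Face repo linked above; a minimal sketch, assuming the standard `resolve/main` counterpart of that blob URL:
```Shell
mkdir -p checkpoints/recon
wget https://huggingface.co/qizekun/ReConV2/resolve/main/zeroshot/large/best_lvis.pth -O checkpoints/recon/large.pth
```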
**1. Feature Alignment Stage**
```
sh scripts/pretrain.sh
```
**2. Visual Instruction Tuning Stage**
```
sh scripts/finetune.sh
```
The training takes around 14 hours for ShapeLLM-13B on 8x A100 (80G). It takes around 7 hours for ShapeLLM-7B.

### Zero-shot Understanding on 3D MM-Vet
To evaluate 3D MLLMs' integrated and embodied interaction capabilities, run the script:
```
sh scripts/eval/mmvet.sh
```
Then use GPT-4 to calculate the 3D MM-Vet score:
```
sh scripts/eval/eval_mmvet.sh
```

### Visual Grounding on GApartNet
To evaluate the performance of ShapeLLM on the GApartNet dataset, run the script:
```
sh scripts/eval/gapartnet_ref.sh
```
Then calculate the generative 3D visual grounding accuracy:
```
sh scripts/eval/eval_gapartnet.sh
```