qizekun committed
Commit e0df0cd • 1 Parent(s): 055e352

Create README.md

Files changed (1):
  1. README.md ADDED (+98 -0)
---
license: apache-2.0
datasets:
- qizekun/ShapeLLM
language:
- en
---

## Install

[//]: # (If you are using Windows, do *NOT* proceed, see instructions [here](https://github.com/qizekun/LLaVA/blob/main/docs/Windows.md).)

1. Clone this repository and navigate to the ShapeLLM folder
```Shell
git clone https://github.com/qizekun/ShapeLLM.git
cd ShapeLLM
```
2. Install the package
```Shell
conda create -n shapellm python=3.10 -y
conda activate shapellm
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
```
3. Install additional packages for training cases
```Shell
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
```
4. Install PointNet++
```Shell
pip install "git+https://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops&subdirectory=pointnet2_ops_lib"
```
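
To verify the environment, both the `llava` package installed above and the compiled `pointnet2_ops` extension should import cleanly. This quick check is optional and not part of the official steps:
```Shell
# Optional sanity check: both imports should succeed without errors
python -c "import llava, pointnet2_ops; print('ShapeLLM environment OK')"
```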


## ShapeLLM
### Model Weights
Please check out our [Model Zoo](https://github.com/qizekun/ShapeLLM/blob/main/docs/MODEL_ZOO.md) for all public ShapeLLM checkpoints.
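
Checkpoints can be referenced directly by their Hub name (as in the demo below). If you prefer a local copy first, one way to pre-download a checkpoint is sketched here, assuming the `huggingface-cli` tool from `huggingface_hub` is installed; the target directory is arbitrary:
```Shell
# Optional: pre-download the 13B general checkpoint to a local folder
huggingface-cli download qizekun/ShapeLLM_13B_general_v1.0 \
    --local-dir ./checkpoints/ShapeLLM_13B_general_v1.0
```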

### Demo
#### CLI Inference
Chat about point clouds using the CLI interface. It also supports multiple GPUs, as well as 4-bit and 8-bit quantized inference.
```Shell
python -m llava.serve.cli \
    --model-path qizekun/ShapeLLM_13B_general_v1.0 \
    --pts-file assets/instrument.npy
```
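
The 4-bit and 8-bit options mentioned above follow the LLaVA-style CLI; the flag below is an assumption based on that interface rather than something documented here, so check the repository's CLI arguments before relying on it:
```Shell
# Assumed LLaVA-style flag for 4-bit quantized inference (verify against llava/serve/cli.py)
python -m llava.serve.cli \
    --model-path qizekun/ShapeLLM_13B_general_v1.0 \
    --pts-file assets/instrument.npy \
    --load-4bit
```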

### Training
Consistent with LLaVA, we adopt a two-stage training approach. In the first stage, we fine-tune only the projector for semantic alignment. In the second stage, we conduct full fine-tuning using instruction-following data.
Download the data following [DATA](https://github.com/qizekun/ShapeLLM/blob/main/docs/DATA.md), and organize it as follows in `./playground/data/shapellm/`:
```
│playground/data/shapellm/
├── cap3d_objaverse_785k.json
├── cap3d_objaverse_sft_45k.json
├── gapartnet_sft_27k_openai.json
├── gapartnet_pcs
│   ├── Box_100129_0_0.npy
│   └── ...
└── cap3d_pcs
    ├── 00000054c36d44a2a483bdbff31d8edf.pt
    └── ...
```
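
Assuming the point clouds are stored as plain NumPy arrays (`.npy`) and PyTorch objects (`.pt`), a quick load test on the example files from the tree above can catch incomplete downloads:
```Shell
# Quick load test on the example files shown above (paths relative to the repo root)
python -c "import numpy as np; print(np.load('playground/data/shapellm/gapartnet_pcs/Box_100129_0_0.npy').shape)"
python -c "import torch; o = torch.load('playground/data/shapellm/cap3d_pcs/00000054c36d44a2a483bdbff31d8edf.pt', map_location='cpu'); print(type(o), getattr(o, 'shape', 'n/a'))"
```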
Furthermore, ShapeLLM utilizes the Large version of [ReCon++](https://github.com/qizekun/ShapeLLM/blob/main/ReConV2/cfgs/pretrain/large/openshape.yaml) as the point encoder.
You need to download the [ReCon++ weight](https://huggingface.co/qizekun/ReConV2/blob/main/zeroshot/large/best_lvis.pth) and save it to `./checkpoints/recon/large.pth`.
```
│checkpoints/recon/
└── large.pth
```
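
One way to fetch the weight from the command line, assuming the standard Hugging Face `resolve/` download URL for the file linked above:
```Shell
# Download the ReCon++ Large weight to the expected path
mkdir -p checkpoints/recon
wget https://huggingface.co/qizekun/ReConV2/resolve/main/zeroshot/large/best_lvis.pth -O checkpoints/recon/large.pth
```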
**1. Feature Alignment Stage**
```
sh scripts/pretrain.sh
```
**2. Visual Instruction Tuning Stage**
```
sh scripts/finetune.sh
```
Training takes around 14 hours for ShapeLLM-13B on 8x A100 (80G), and around 7 hours for ShapeLLM-7B.

### Zero-shot Understanding on 3D MM-Vet
To evaluate 3D MLLMs for integrated capabilities and embodied interaction capabilities, run the script:
```
sh scripts/eval/mmvet.sh
```
Then use GPT-4 to calculate the 3D MM-Vet score:
```
sh scripts/eval/eval_mmvet.sh
```
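
The scoring script presumably calls the OpenAI API, so a key must be available in the environment; the variable name below is the conventional one and is an assumption, not something stated in this README:
```Shell
# Assumed: make an OpenAI API key available before running the GPT-4 scoring script
export OPENAI_API_KEY=<your-api-key>
sh scripts/eval/eval_mmvet.sh
```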

### Visual Grounding on GApartNet
To evaluate the performance of ShapeLLM on the GApartNet dataset, run the script:
```
sh scripts/eval/gapartnet_ref.sh
```
Then calculate the generative 3D visual grounding accuracy:
```
sh scripts/eval/eval_gapartnet.sh
```