qizekun committed
Commit e0df0cd • 1 Parent(s): 055e352

Create README.md

Files changed (1):
  1. README.md ADDED (+98 -0)
---
license: apache-2.0
datasets:
- qizekun/ShapeLLM
language:
- en
---

## Install

[//]: # (If you are using Windows, do *NOT* proceed, see instructions [here](https://github.com/qizekun/LLaVA/blob/main/docs/Windows.md).)

1. Clone this repository and navigate to the ShapeLLM folder
```Shell
git clone https://github.com/qizekun/ShapeLLM.git
cd ShapeLLM
```
2. Install the package
```Shell
conda create -n shapellm python=3.10 -y
conda activate shapellm
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
```
3. Install additional packages for training cases
```Shell
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
```
4. Install PointNet++
```Shell
pip install "git+https://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops&subdirectory=pointnet2_ops_lib"
```
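
To verify the environment, both the `llava` package installed above and the compiled `pointnet2_ops` extension should import cleanly. This quick check is optional and not part of the official steps:
```Shell
# Optional sanity check: both imports should succeed without errors
python -c "import llava, pointnet2_ops; print('ShapeLLM environment OK')"
```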


## ShapeLLM
### Model Weights
Please check out our [Model Zoo](https://github.com/qizekun/ShapeLLM/blob/main/docs/MODEL_ZOO.md) for all public ShapeLLM checkpoints.
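
Checkpoints can be referenced directly by their Hub name (as in the demo below). If you prefer a local copy first, one way to pre-download a checkpoint is sketched here, assuming the `huggingface-cli` tool from `huggingface_hub` is installed; the target directory is arbitrary:
```Shell
# Optional: pre-download the 13B general checkpoint to a local folder
huggingface-cli download qizekun/ShapeLLM_13B_general_v1.0 \
    --local-dir ./checkpoints/ShapeLLM_13B_general_v1.0
```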

### Demo
#### CLI Inference
Chat about point clouds using the CLI interface. It also supports multiple GPUs, as well as 4-bit and 8-bit quantized inference.
```Shell
python -m llava.serve.cli \
    --model-path qizekun/ShapeLLM_13B_general_v1.0 \
    --pts-file assets/instrument.npy
```
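
The 4-bit and 8-bit options mentioned above follow the LLaVA-style CLI; the flag below is an assumption based on that interface rather than something documented here, so check the repository's CLI arguments before relying on it:
```Shell
# Assumed LLaVA-style flag for 4-bit quantized inference (verify against llava/serve/cli.py)
python -m llava.serve.cli \
    --model-path qizekun/ShapeLLM_13B_general_v1.0 \
    --pts-file assets/instrument.npy \
    --load-4bit
```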

### Training
Consistent with LLaVA, we adopt a two-stage training approach. In the first stage, we fine-tune only the projector for semantic alignment. In the second stage, we conduct full fine-tuning using instruction-following data.
Download the data following [DATA](https://github.com/qizekun/ShapeLLM/blob/main/docs/DATA.md), and organize it as follows in `./playground/data/shapellm/`:
```
│playground/data/shapellm/
├── cap3d_objaverse_785k.json
├── cap3d_objaverse_sft_45k.json
├── gapartnet_sft_27k_openai.json
├── gapartnet_pcs
│   ├── Box_100129_0_0.npy
│   └── ...
└── cap3d_pcs
    ├── 00000054c36d44a2a483bdbff31d8edf.pt
    └── ...
```
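
Assuming the point clouds are stored as plain NumPy arrays (`.npy`) and PyTorch objects (`.pt`), a quick load test on the example files from the tree above can catch incomplete downloads:
```Shell
# Quick load test on the example files shown above (paths relative to the repo root)
python -c "import numpy as np; print(np.load('playground/data/shapellm/gapartnet_pcs/Box_100129_0_0.npy').shape)"
python -c "import torch; o = torch.load('playground/data/shapellm/cap3d_pcs/00000054c36d44a2a483bdbff31d8edf.pt', map_location='cpu'); print(type(o), getattr(o, 'shape', 'n/a'))"
```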
Furthermore, ShapeLLM utilizes the Large version of [ReCon++](https://github.com/qizekun/ShapeLLM/blob/main/ReConV2/cfgs/pretrain/large/openshape.yaml) as the point encoder.
You need to download the [ReCon++ weight](https://huggingface.co/qizekun/ReConV2/blob/main/zeroshot/large/best_lvis.pth) and save it to `./checkpoints/recon/large.pth`.
```
│checkpoints/recon/
└── large.pth
```
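
One way to fetch the weight from the command line, assuming the standard Hugging Face `resolve/` download URL for the file linked above:
```Shell
# Download the ReCon++ Large weight to the expected path
mkdir -p checkpoints/recon
wget https://huggingface.co/qizekun/ReConV2/resolve/main/zeroshot/large/best_lvis.pth -O checkpoints/recon/large.pth
```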
**1. Feature Alignment Stage**
```
sh scripts/pretrain.sh
```
**2. Visual Instruction Tuning Stage**
```
sh scripts/finetune.sh
```
Training takes around 14 hours for ShapeLLM-13B on 8x A100 (80G), and around 7 hours for ShapeLLM-7B.

### Zero-shot Understanding on 3D MM-Vet
To evaluate 3D MLLMs for integrated capabilities and embodied interaction capabilities, run the script:
```
sh scripts/eval/mmvet.sh
```
Then use GPT-4 to calculate the 3D MM-Vet score:
```
sh scripts/eval/eval_mmvet.sh
```
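
The scoring script presumably calls the OpenAI API, so a key must be available in the environment; the variable name below is the conventional one and is an assumption, not something stated in this README:
```Shell
# Assumed: make an OpenAI API key available before running the GPT-4 scoring script
export OPENAI_API_KEY=<your-api-key>
sh scripts/eval/eval_mmvet.sh
```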

### Visual Grounding on GApartNet
To evaluate the performance of ShapeLLM on the GApartNet dataset, run the script:
```
sh scripts/eval/gapartnet_ref.sh
```
Then calculate the generative 3D visual grounding accuracy:
```
sh scripts/eval/eval_gapartnet.sh
```