Commit bdfbcec (verified) by zhemaxiya
1 Parent(s): 1f7c447

Update README.md

Files changed (1)
  1. README.md +42 -21
README.md CHANGED
@@ -1,9 +1,10 @@
  ---
  language: en
  license: apache-2.0
- library_name: transformers
  tags:
- - clip
  - vision-language
  - zero-shot-classification
  - marine-species
@@ -12,25 +13,45 @@ tags:
  - oceangpt-x
  ---
  # OceanCLIP-0.15B: Marine Vision-Language Model
- A vision-language model fine-tuned on marine imagery and textual data. Optimized for species identification, zero-shot classification, and cross-validation in underwater/sonar environments.
- ## Model Details
- - **Architecture:** CLIP-style (Vision Transformer + Text Encoder)
- - **Parameters:** ~0.15B
- - **Domain:** Marine Biology, Underwater Imagery, Sonar Data
- - **Framework:** Compatible with `transformers` and `open_clip`
- ## Usage
  ```python
- from transformers import CLIPProcessor, CLIPModel
  from PIL import Image
- model = CLIPModel.from_pretrained("zjunlp/OceanCLIP-0.15B")
- processor = CLIPProcessor.from_pretrained("zjunlp/OceanCLIP-0.15B")
- image = Image.open("marine_image.jpg")
- inputs = processor(
-     text=["a photo of a clownfish", "a photo of a coral reef"],
-     images=image,
-     return_tensors="pt",
-     padding=True
  )
- outputs = model(**inputs)
- probs = outputs.logits_per_image.softmax(dim=-1)
- print(probs)
  ---
  language: en
  license: apache-2.0
+ library_name: open_clip
  tags:
+ - open-clip
+ - bioclip
  - vision-language
  - zero-shot-classification
  - marine-species

  - oceangpt-x
  ---
  # OceanCLIP-0.15B: Marine Vision-Language Model
+
+ A vision-language model fine-tuned on marine imagery and biological terminology using the OpenCLIP framework. Built upon [BioCLIP](https://github.com/Imageomics/bioclip), it is optimized for marine species identification, zero-shot classification, and cross-validation in underwater/sonar environments.
+
+ ## 📂 Repository Contents
+ | Directory | File | Description |
+ |:---|:---|:---|
+ | `oceanclip-bio/` | `epoch_50.pt` | **Fine-tuned checkpoint**. Marine-adapted weights after 50 training epochs. Contains the updated vision & text encoder projections. |
+ | `oceanclip-bio/` | `terms.txt` | **Marine terminology list**. Line-by-line species names (e.g., `A abramis`). Used for zero-shot classification to dynamically build class-specific text prompts. |
+ | `bioclip/` | `open_clip_config.json` | **Architecture & preprocessing config**. Defines ViT-B/16 vision encoder, Transformer text encoder (77 context, 512 width), and image normalization (`mean`/`std`). |
+ | `bioclip/` | `open_clip_pytorch_model.bin` | **Base BioCLIP weights**. Original OpenCLIP-format pre-trained weights. Serves as the initialization backbone before marine-specific fine-tuning. |
+
+ ## 🚀 Usage
+ Requires `open_clip_torch` and `torch`.
  ```python
+ import open_clip
+ import torch
  from PIL import Image
+
+ # 1. Load architecture & base weights
+ model, _, preprocess = open_clip.create_model_and_transforms(
+     model_name="ViT-B-16",
+     pretrained="bioclip/open_clip_pytorch_model.bin"
  )
+ tokenizer = open_clip.get_tokenizer("ViT-B-16")
+
+ # 2. Load fine-tuned marine weights
+ state_dict = torch.load("oceanclip-bio/epoch_50.pt", map_location="cpu")
+ model.load_state_dict(state_dict, strict=False)
+ model.eval()
+
+ # 3. Inference (Zero-Shot with terms.txt)
+ image = preprocess(Image.open("marine_input.jpg")).unsqueeze(0)
+ terms = [line.strip() for line in open("oceanclip-bio/terms.txt", "r") if line.strip()]
+ text_tokens = tokenizer(terms)
+
+ with torch.no_grad():
+     image_feat = model.encode_image(image)
+     text_feat = model.encode_text(text_tokens)
+     logits = (image_feat @ text_feat.T).softmax(dim=-1)
+
+ top_idx = logits.argmax().item()
+ print(f"Predicted species: {terms[top_idx]} (Confidence: {logits[0, top_idx]:.4f})")
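
Review note on step 3 of the added Usage code: the softmax there is taken over raw dot products of `image_feat` and `text_feat`. The usual CLIP zero-shot recipe first L2-normalizes both embeddings and multiplies the cosine similarities by a temperature before the softmax, so the reported "Confidence" may not be comparable across images as written. The sketch below illustrates that arithmetic in plain Python on toy vectors. The helper names (`l2_normalize`, `zero_shot_probs`, `top_k`), the temperature of 100.0, and the two species labels are assumptions for illustration, not values taken from the checkpoint or from `terms.txt`. In the real pipeline the same steps would apply to the tensors returned by `encode_image` and `encode_text`.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def zero_shot_probs(image_feat, text_feats, logit_scale=100.0):
    """Return one probability per text prompt for a single image embedding.

    logit_scale=100.0 is an assumed temperature, not read from the checkpoint.
    """
    img = l2_normalize(image_feat)
    logits = []
    for feat in text_feats:
        txt = l2_normalize(feat)
        logits.append(logit_scale * sum(a * b for a, b in zip(img, txt)))
    # Numerically stable softmax: subtract the largest logit before exponentiating.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(terms, probs, k=5):
    """Pair each term with its probability and return the k most likely pairs."""
    ranked = sorted(zip(terms, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy 3-dimensional embeddings; real embeddings come from the model.
    terms = ["Amphiprion ocellaris", "Acropora cervicornis"]  # illustrative labels
    image_feat = [0.9, 0.1, 0.0]
    text_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    for term, prob in top_k(terms, zero_shot_probs(image_feat, text_feats)):
        print(f"{term}: {prob:.4f}")
```

Reporting the top few candidates instead of a single `argmax` also makes near-ties between similar species visible, which can matter when `terms.txt` is long.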