Update README.md
README.md
@@ -1,3 +1,29 @@
---
license: apache-2.0
---
This is a DeBERTa-v2 model pre-trained on a large corpus of Korean patent data.

Pre-training was carried out mainly on text from the abstract, claims, and description sections of patent documents.

It is a Korean language model that can be used for tasks such as computing patent document embeddings or classifying patent documents.

## Patent Text Embedding Calculation Example

```
import torch
from transformers import AutoModel, AutoTokenizer

# Sample Korean patent abstract (a patent search system and method), followed by its keywords
patent_abstract = '''본 발명은 특허 검색 시스템 및 검색 방법에 관한 것으로, 보다 상세하게는 입력한 검색어의 동의어를 제공, 검색어를 자동으로 번역하여 국가에 상관없이 검색을 가능토록 하거나 대분류, 중분류, 소분류 등 분류한 검색어를 조합하여 검색을 행함으로써, 효율적인 선행기술을 검색할 수 있도록 하는 특허 검색 시스템 및 검색 방법에 관한 것이다.
특허 검색, 유사도, 키워드 추출, 검색식 '''

tokenizer = AutoTokenizer.from_pretrained("axiomlabs/KR-patent-deberta-large")

# Pad or truncate to 512 tokens and return PyTorch tensors
encoded_inputs = tokenizer(patent_abstract, max_length=512, truncation=True, padding="max_length", return_tensors="pt")

model = AutoModel.from_pretrained("axiomlabs/KR-patent-deberta-large")

model.eval()

with torch.no_grad():
    # Take the hidden state of the first token of each sequence
    outputs = model(**encoded_inputs)[0][:,0,:] # CLS-Pooling
print(outputs.shape) # [1, 2048]
```
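
Embeddings like the one computed above are often compared with cosine similarity, for example to rank patents against a query. The following is a minimal sketch of that comparison, not part of this repository: `cosine_similarity` is a hypothetical helper, and the short toy vectors are assumptions standing in for real embeddings (which could be converted to plain lists with something like `outputs[0].tolist()`).

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors of floats."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector is similar to nothing
        return 0.0
    return dot / (norm_a * norm_b)


# Toy 4-dimensional vectors standing in for real patent embeddings (assumed values)
query_vec = [0.2, 0.1, 0.0, 0.7]
doc_vec = [0.4, 0.2, 0.0, 1.4]      # points in the same direction as query_vec
other_vec = [0.7, 0.0, 0.1, -0.2]   # orthogonal to query_vec

print(cosine_similarity(query_vec, doc_vec))
print(cosine_similarity(query_vec, other_vec))
```

Scores near 1 indicate vectors pointing in nearly the same direction, while scores near 0 indicate unrelated directions. Whether CLS-pooled embeddings from this model separate patents well without fine-tuning is something to check on your own data.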