---
license: apache-2.0
---
λŒ€λŸ‰μ˜ ν•œκΈ€ νŠΉν—ˆ λ°μ΄ν„°λ‘œ μ‚¬μ „ν•™μŠ΅ (pre-training)을 μ§„ν–‰ν•œ DeBERTa-v2 λͺ¨λΈμž…λ‹ˆλ‹€.
νŠΉν—ˆ λ¬Έμ„œμ˜ abstract, claims, description μœ„μ£Όμ˜ ν…μŠ€νŠΈλ‘œ μ‚¬μ „ν•™μŠ΅μ΄ μ§„ν–‰λ˜μ—ˆμŠ΅λ‹ˆλ‹€.
νŠΉν—ˆ λ¬Έμ„œ μž„λ² λ”© 계산, ν˜Ήμ€ νŠΉν—ˆ λ¬Έμ„œ λΆ„λ₯˜λ“±μ˜ νƒœμŠ€ν¬μ— ν™œμš©ν•  수 μžˆλŠ” ν•œκΈ€ μ–Έμ–΄λͺ¨λΈ (Language Model)μž…λ‹ˆλ‹€.
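
For the document-classification use mentioned above, a common pattern is to feed the pooled [CLS] embedding (the first token of the encoder's last hidden state) into a small trainable head. The sketch below assumes that pattern. `PatentClassificationHead`, the dropout rate, and the label ids are hypothetical, not part of this model's release, and random tensors stand in for real embeddings. If the checkpoint's architecture is supported, `AutoModelForSequenceClassification.from_pretrained(..., num_labels=...)` in Transformers attaches its own classification head instead.

```python
import torch
from torch import nn


class PatentClassificationHead(nn.Module):
    """Hypothetical linear head: pooled [CLS] embedding -> patent class logits."""

    def __init__(self, hidden_size: int, num_labels: int, dropout: float = 0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, cls_embedding: torch.Tensor) -> torch.Tensor:
        # [batch, hidden_size] -> [batch, num_labels]
        return self.classifier(self.dropout(cls_embedding))


if __name__ == "__main__":
    torch.manual_seed(0)
    # Random stand-ins; with the real encoder, use model.config.hidden_size
    # and CLS embeddings taken from the encoder output.
    hidden_size, num_labels = 16, 4
    head = PatentClassificationHead(hidden_size, num_labels)
    cls_batch = torch.randn(3, hidden_size)
    labels = torch.tensor([0, 2, 1])  # hypothetical class ids
    logits = head(cls_batch)
    loss = nn.functional.cross_entropy(logits, labels)
    # Gradients reach only the head here; fine-tuning the encoder as well
    # requires computing the CLS embeddings outside torch.no_grad().
    loss.backward()
    print(logits.shape)  # torch.Size([3, 4])
```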
## Patent Text Embedding Computation Example
```python
import torch
from transformers import AutoModel, AutoTokenizer

# Example input: a Korean patent abstract (a patent search system that offers synonyms, auto-translates
# queries, and combines classified search terms for prior-art search) followed by its keywords.
patent_abstract = '''본 발명은 특허 검색 시스템 및 검색 방법에 관한 것으로, 보다 자세하게는 입력한 검색어의 동의어를 제공, 검색어를 자동으로 번역하여 국가에 상관없이 검색을 가능토록 하거나 대분류, 중분류, 소분류 등 분류한 검색어를 조합하여 검색을 행함으로써, 효율적인 선행기술을 검색할 수 있도록 하는 특허 검색 시스템 및 검색 방법에 관한 것이다.
특허 검색, 유사도, 키워드 추출, 검색식 '''

tokenizer = AutoTokenizer.from_pretrained("LDKSolutions/KR-patent-deberta-large")
encoded_inputs = tokenizer(patent_abstract, max_length=512, truncation=True, padding="max_length", return_tensors="pt")
model = AutoModel.from_pretrained("LDKSolutions/KR-patent-deberta-large")
model.eval()

with torch.no_grad():
    # [0] is the last hidden state [batch, seq_len, hidden]; keep the first ([CLS]) token.
    outputs = model(**encoded_inputs)[0][:, 0, :]  # CLS pooling
print(outputs.shape)  # [1, 2048]
```
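
The example abstract describes prior-art search. One way to use CLS embeddings like `outputs` above for that is to rank candidate patents by cosine similarity to a query embedding. This is a minimal sketch, assuming the candidate embeddings were computed the same way and stacked into one `[num_docs, hidden_size]` tensor; `rank_by_cosine_similarity` is a hypothetical helper, and random tensors stand in for real embeddings. How well raw CLS embeddings from this checkpoint rank patents without fine-tuning is not something this card measures.

```python
import torch
import torch.nn.functional as F


def rank_by_cosine_similarity(query: torch.Tensor, candidates: torch.Tensor, top_k: int = 5):
    """Hypothetical helper: return (scores, indices) of the top_k most similar candidates.

    query: [hidden_size] embedding of the query patent.
    candidates: [num_docs, hidden_size] embeddings of candidate patents.
    """
    # L2-normalize so that a dot product equals cosine similarity.
    query = F.normalize(query.unsqueeze(0), dim=-1)  # [1, hidden_size]
    candidates = F.normalize(candidates, dim=-1)  # [num_docs, hidden_size]
    scores = (candidates @ query.T).squeeze(-1)  # [num_docs]
    k = min(top_k, candidates.size(0))
    return torch.topk(scores, k)


if __name__ == "__main__":
    torch.manual_seed(0)
    # Random stand-ins for CLS embeddings; the query is a slightly perturbed doc 3.
    doc_embeddings = torch.randn(10, 32)
    query_embedding = doc_embeddings[3] + 0.01 * torch.randn(32)
    scores, indices = rank_by_cosine_similarity(query_embedding, doc_embeddings, top_k=3)
    print(indices[0].item())  # 3
```

With real embeddings, `outputs[0]` from the example above would be the query. Normalizing once and scoring with a single matrix product keeps the sketch simple; for large patent collections a dedicated vector index would typically replace the brute-force `topk`.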