	---
	tags:
	- flair
	- entity-mention-linker
	---

	## biosyn-sapbert-ncbi-disease

	Biomedical Entity Mention Linking for diseases:

	- Model: [dmis-lab/biosyn-sapbert-ncbi-disease](https://huggingface.co/dmis-lab/biosyn-sapbert-ncbi-disease)
	- Dictionary: [CTD Diseases](https://ctdbase.org/voc.go?type=disease) (See License)

	### Demo: How to use in Flair

	Requires:

	- [Flair](https://github.com/flairNLP/flair/)>=0.14.0 (`pip install flair` or `pip install git+https://github.com/flairNLP/flair.git`)

	```python
	from flair.data import Sentence
	from flair.models import Classifier, EntityMentionLinker
	from flair.tokenization import SciSpacyTokenizer

	sentence = Sentence(
	    "The mutation in the ABCD1 gene causes X-linked adrenoleukodystrophy, "
	    "a neurodegenerative disease, which is exacerbated by exposure to high "
	    "levels of mercury in dolphin populations.",
	    use_tokenizer=SciSpacyTokenizer(),
	)

	# load hunflair to detect the entity mentions we want to link.
	tagger = Classifier.load("hunflair-disease")
	tagger.predict(sentence)

	# load the linker and dictionary
	linker = EntityMentionLinker.load("hunflair/biosyn-sapbert-ncbi-disease")
	dictionary = linker.dictionary

	# find the candidates for the mentions
	linker.predict(sentence)

	# print the results for each entity mention:
	for span in sentence.get_spans(tagger.label_type):
	    print(f"Span: {span.text}")
	    for candidate_label in span.get_labels(linker.label_type):
	        candidate = dictionary[candidate_label.value]
	        print(f"Candidate: {candidate.concept_name}")
	```
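
If the results are needed as data rather than printed output, the loop above can be wrapped in a small helper. The following is a minimal sketch, not part of Flair: `collect_links` is a hypothetical name, and it relies only on the attributes already used in the demo (`get_spans`, `get_labels`, `span.text`, `label.value` and `concept_name`).

```python
def collect_links(sentence, mention_label_type, link_label_type, dictionary):
    """Map each detected mention to the names of its linked dictionary candidates.

    Hypothetical helper: it expects objects that behave like the ones in the
    demo (a tagged and linked sentence plus the linker's dictionary).
    """
    results = []
    for span in sentence.get_spans(mention_label_type):
        # Each link label stores a dictionary key in `value`.
        names = [
            dictionary[label.value].concept_name
            for label in span.get_labels(link_label_type)
        ]
        results.append({"mention": span.text, "candidates": names})
    return results
```

With the objects from the demo, it would be called as `collect_links(sentence, tagger.label_type, linker.label_type, dictionary)`.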

	As an alternative to downloading the precomputed model, which takes up a lot of storage, you can build the model
	yourself and compute the embeddings for the dictionary locally:

	```python
	from flair.models import EntityMentionLinker
	linker = EntityMentionLinker.build("dmis-lab/biosyn-sapbert-ncbi-disease", dictionary_name_or_path="ctd-diseases", hybrid_search=True)
	```

	This reduces the download size, at the cost of computing the dictionary embeddings on your own machine.
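
To make that trade-off concrete, the sketch below shows the general idea behind building a linker locally: every dictionary name is embedded once up front, and each mention is then matched against the stored vectors. It is a toy illustration under stated assumptions, not Flair's or BioSyn's implementation. Character trigram counts stand in for the neural encoder, and `toy_dictionary`, `char_ngrams`, `cosine` and `link`, along with the concept IDs, are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Toy 'embedding': character n-gram counts (a stand-in for a neural encoder)."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical mini-dictionary: concept ID -> concept name (illustrative only).
toy_dictionary = {
    "C1": "adrenoleukodystrophy",
    "C2": "neurodegenerative diseases",
    "C3": "mercury poisoning",
}

# Done once: embed every dictionary name. For a real dictionary and a real
# encoder, this is the expensive step that a precomputed download avoids.
name_vectors = {cid: char_ngrams(name) for cid, name in toy_dictionary.items()}


def link(mention, top_k=1):
    """Return the top_k (concept_id, score) pairs most similar to the mention."""
    query = char_ngrams(mention)
    scored = [(cid, cosine(query, vec)) for cid, vec in name_vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


print(link("X-linked adrenoleukodystrophy"))
```

The lookup itself is cheap once the vectors exist, which is why the cost moves from download size to a one-time computation.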