Dataset Collection:
- The news dataset was collected from a Kaggle dataset.
- Each example contains a news title, the news content, and a label giving the cosine similarity between the title and the content.
- Several strategies were followed during the data-gathering phase.
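The card does not say which encoder produced the `label` scores, so the sketch below only illustrates the quantity itself: cosine similarity, the dot product of two embedding vectors divided by the product of their norms. The vectors are made-up placeholders for an encoded title and an encoded content, not output of this model.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return u.v / (||u|| * ||v||), the kind of score stored in `label`."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


# Toy vectors standing in for an encoded title and content (not real model output).
title_vec = [0.2, 0.8, 0.1]
content_vec = [0.25, 0.7, 0.05]
label = cosine_similarity(title_vec, content_vec)
print(f"{label:.4f}")
```

Scores range from -1 (opposite directions) through 0 (unrelated) to 1 (same direction).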
Sentence Transformer Fine-Tuned for Semantic Search and Sentence Similarity:
- The model is a sentence transformer fine-tuned on the news dataset described above.
- It can be used for semantic search, sentence similarity, and recommendation systems.
- It can also be used directly for inference.
Data Fields:
- label: the cosine similarity between the news title and the news content
- news title: the title of the news article
- news content: the content of the news article
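A minimal sketch of one way to load records with these three fields, assuming a CSV file whose header uses the hypothetical column names `title`, `content` and `label` (the card does not give the actual file layout). The sample rows and scores are invented for illustration.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class NewsRecord:
    title: str    # news title
    content: str  # news content
    label: float  # cosine similarity between title and content


def load_records(lines: Iterable[str]) -> List[NewsRecord]:
    """Parse CSV rows with the hypothetical columns title, content, label."""
    records = []
    for row in csv.DictReader(lines):
        label = float(row["label"])
        # Cosine similarity always lies in [-1, 1]; reject anything else.
        if not -1.0 <= label <= 1.0:
            raise ValueError(f"label out of range: {label}")
        records.append(NewsRecord(row["title"], row["content"], label))
    return records


# Invented sample data; the second row shows a quoted field containing a comma.
sample = io.StringIO(
    "title,content,label\n"
    "Markets rally,Stocks rose sharply on Monday,0.71\n"
    '"Storm hits coast","Heavy rain, strong winds reported",0.64\n'
)
records = load_records(sample)
print(records[0])
```

Each `NewsRecord` maps naturally onto sentence-transformers' `InputExample(texts=[record.title, record.content], label=record.label)`, the input format used with `losses.CosineSimilarityLoss`; whether this model was actually fine-tuned that way is an assumption, not something the card states.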
Applications:
- This model is useful for semantic search, sentence similarity, and recommendation systems.
- You can fine-tune it further for your particular use case.
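Semantic search and recommendation both reduce to ranking stored embeddings by cosine similarity to a query embedding. The sketch below shows that ranking in plain Python; the three-dimensional vectors and the `semantic_search` helper are hypothetical stand-ins for `model.encode(...)` output and a real index.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def _normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def semantic_search(
    query: Sequence[float],
    corpus: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank corpus items by cosine similarity to the query embedding."""
    q = _normalize(query)
    # Dot product of unit vectors equals their cosine similarity.
    scored = (
        (doc_id, sum(a * b for a, b in zip(q, _normalize(vec))))
        for doc_id, vec in corpus.items()
    )
    # nlargest keeps only the top_k items instead of sorting the whole corpus.
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[1])


# Toy 3-d vectors standing in for model.encode(...) output.
corpus = {
    "sports": [0.9, 0.1, 0.0],
    "finance": [0.1, 0.9, 0.1],
    "weather": [0.0, 0.2, 0.9],
}
print(semantic_search([0.8, 0.3, 0.0], corpus, top_k=2))
```

With real embeddings, sentence-transformers' own `util.semantic_search` performs the same kind of ranking on tensors. For recommendations, one would use an item's own embedding as the query and drop the item itself from the results.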
Model Implementation:
pip install -U sentence-transformers
from sentence_transformers import SentenceTransformer, util

# Load the fine-tuned model from the Hugging Face Hub
model_name = "Sakil/sentence_similarity_semantic_search"
model = SentenceTransformer(model_name)
sentences = ['A man is eating food.',
'A man is eating a piece of bread.',
'The girl is carrying a baby.',
'A man is riding a horse.',
'A woman is playing violin.',
'Two men pushed carts through the woods.',
'A man is riding a white horse on an enclosed ground.',
'A monkey is playing drums.',
'Someone in a gorilla costume is playing a set of drums.'
]
# Encode all sentences
embeddings = model.encode(sentences)

# Compute cosine similarity between all pairs
cos_sim = util.cos_sim(embeddings, embeddings)

# Add all pairs (i < j) to a list with their cosine similarity score
all_sentence_combinations = []
for i in range(len(cos_sim) - 1):
    for j in range(i + 1, len(cos_sim)):
        all_sentence_combinations.append([cos_sim[i][j], i, j])

# Sort the list by the highest cosine similarity score
all_sentence_combinations = sorted(all_sentence_combinations, key=lambda x: x[0], reverse=True)

print("Top-5 most similar pairs:")
for score, i, j in all_sentence_combinations[0:5]:
    print("{} \t {} \t {:.4f}".format(sentences[i], sentences[j], float(score)))
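The nested loops above enumerate every pair and then sort them all. An equivalent approach, sketched below as a hypothetical `top_pairs` helper, uses `itertools.combinations` to enumerate each unordered pair once and `heapq.nlargest` to keep only the best `k` without a full sort. The small matrix is a toy example; with the card's code you could pass `cos_sim.tolist()`.

```python
import heapq
from itertools import combinations
from typing import List, Sequence, Tuple


def top_pairs(
    sim: Sequence[Sequence[float]], k: int = 5
) -> List[Tuple[float, int, int]]:
    """Return the k highest-scoring (score, i, j) pairs with i < j."""
    n = len(sim)
    # combinations(range(n), 2) yields each unordered pair exactly once,
    # replacing the nested i/j loops.
    pairs = ((sim[i][j], i, j) for i, j in combinations(range(n), 2))
    return heapq.nlargest(k, pairs, key=lambda p: p[0])


# Toy symmetric similarity matrix (in the card's code this would be cos_sim).
sim = [
    [1.00, 0.75, 0.10],
    [0.75, 1.00, 0.30],
    [0.10, 0.30, 1.00],
]
for score, i, j in top_pairs(sim, k=2):
    print(f"{i} {j} {score:.2f}")
```

For a handful of sentences the difference is negligible; the heap only pays off when the number of pairs, n(n-1)/2, grows large.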
GitHub: Sakil Ansari