pankajrajdeo committed on
Commit
8d07a60
1 Parent(s): a28c859

Add new SentenceTransformer model.

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 512,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
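The pooling configuration above enables only `pooling_mode_mean_tokens`, so the sentence embedding is the average of the token embeddings. A minimal sketch of that idea in plain Python, assuming padding positions are excluded through an attention mask; the function name `mean_pool` and the toy vectors are illustrative and not taken from the library:

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1; positions with mask 0 are ignored."""
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy input: three token vectors of dimension 4; the last one stands in for padding.
tokens = [
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 2.0, 1.0, 0.0],
    [9.0, 9.0, 9.0, 9.0],
]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
print(pooled)
```

In the real model the vectors would have 512 components, matching `word_embedding_dimension`.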
README.md ADDED
@@ -0,0 +1,886 @@
+ ---
+ library_name: sentence-transformers
+ pipeline_tag: sentence-similarity
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:26147930
+ - loss:MultipleNegativesRankingLoss
+ widget:
+ - source_sentence: '[YEAR_RANGE] 2020-2024 [TEXT] Vitamin B-6 Prevents Heart Failure
+ with Preserved Ejection Fraction Through Downstream of Kinase 3 in a Mouse Model.'
+ sentences:
+ - '[YEAR_RANGE] 2020-2024 [TEXT] Colorectal cancer (CRC) is a complex and genetically
+ heterogeneous disease presenting a specific metastatic pattern, with the liver
+ being the most common site of metastasis. Around 20%-25% of patients with CRC
+ will develop exclusively hepatic metastatic disease throughout their disease history.
+ With its specific characteristics and therapeutic options, liver-limited disease
+ (LLD) should be considered as a specific entity. The identification of these patients
+ is particularly relevant in view of the growing interest in liver transplantation
+ in selected patients with advanced CRC. Identifying why some patients will develop
+ only LLD remains a challenge, mainly because of a lack of a systemic understanding
+ of this complex and interlinked phenomenon given that cancer has traditionally
+ been investigated according to distinct physiological compartments. Recently,
+ multidisciplinary efforts and new diagnostic tools have made it possible to study
+ some of these complex issues in greater depth and may help identify targets and
+ specific treatment strategies to benefit these patients. In this review we analyze
+ the underlying biology and available tools to help clinicians better understand
+ this increasingly common and specific disease.'
+ - '[YEAR_RANGE] 2020-2024 [TEXT] PURPOSE: Secondary breast cancer is a frequent
+ late adverse event of mediastinal Hodgkin lymphoma radiotherapy. Secondary breast
+ cancers overwhelmingly correspond to ductal carcinoma and develop from the glandular
+ mammary tissue. In addition, during childhood, radiation overexposure of the glandular
+ tissue may lead to a late breast hypotrophy at adult age. The aim of this study
+ was to evaluate the radiation exposure to the glandular tissue in patients treated
+ for mediastinal Hodgkin lymphoma with intensity-modulated proton therapy, in order
+ to evaluate the potential dosimetric usefulness of its delineation for breast
+ sparing. MATERIALS AND METHODS: Sixteen consecutive intermediate-risk mediastinal
+ female patients with Hodgkin lymphoma treated with consolidation radiation with
+ deep inspiration breath hold intensity-modulated proton therapy to the total dose
+ of 30Gy were included. Breasts were delineated according to the European Society
+ for Radiotherapy and Oncology guidelines for treatment optimization ("clinical
+ organ at risk"). The glandular tissue ("glandular organ at risk") was retrospectively
+ contoured on the initial simulation CT scans based on Hounsfield unit (HU) values,
+ using a range between -80HU and 500HU. RESULTS: The mean and maximum doses delivered
+ to the glandular organ at risk were significantly lower than the mean and maximum
+ doses delivered to the clinical organ at risk, but were statistically correlated.
+ Glandular organ at risk volumes were significantly smaller. CONCLUSION: Optimizing
+ the treatment plans on the clinical breast contours will systematically lead to
+ overestimation of the dose received to the glandular tissue and, consequently,
+ to an indistinct and involuntary improved glandular tissue sparing. As such, our
+ findings do not support the consideration of the glandular tissue as an additional
+ organ at risk when planning intensity-modulated proton therapy for mediastinal
+ Hodgkin lymphoma in female patients.'
+ - '[YEAR_RANGE] 2020-2024 [TEXT] BACKGROUND: There is an urgent need to develop
+ an efficient therapeutic strategy for heart failure with preserved ejection fraction
+ (HFpEF), which is mediated by phenotypic changes in cardiac macrophages. We previously
+ reported that vitamin B-6 inhibits macrophage-mediated inflammasome activation.
+ OBJECTIVES: We sought to examine whether the prophylactic use of vitamin B-6 prevents
+ HFpEF. METHODS: HFpEF model was elicited by a combination of high-fat diet and
+ Nω-nitro-l-arginine methyl ester supplement in mice. Cardiac function was assessed
+ using conventional echocardiography and Doppler imaging. Immunohistochemistry
+ and immunoblotting were used to detect changes in the macrophage phenotype and
+ myocardial remodeling-related molecules. RESULTS: Co-administration of vitamin
+ B-6 with HFpEF mice mitigated HFpEF phenotypes, including diastolic dysfunction,
+ cardiac macrophage phenotypic shifts, fibrosis, and hypertrophy. Echocardiographic
+ improvements were observed, with the E/E'' ratio decreasing from 42.0 to 21.6
+ and the E/A ratio improving from 2.13 to 1.17. The exercise capacity also increased
+ from 295.3 to 657.7 min. However, these beneficial effects were negated in downstream
+ of kinase (DOK) 3-deficient mice. Mechanistically, vitamin B-6 increased DOK3
+ protein concentrations and inhibited macrophage phenotypic changes, which were
+ abrogated by an AMP-activated protein kinase inhibitor. CONCLUSIONS: Vitamin B-6
+ increases DOK3 signaling to lower risk of HFpEF by inhibiting phenotypic changes
+ in cardiac macrophages.'
+ - source_sentence: '[YEAR_RANGE] 2020-2024 [TEXT] Resolving phylogenetic relationships
+ and taxonomic revision in the Pseudogastromyzon (Cypriniformes, Gastromyzonidae)
+ genus: molecular and morphological evidence for a new genus, Labigastromyzon.'
+ sentences:
+ - '[YEAR_RANGE] 2020-2024 [TEXT] Bats contain a diverse spectrum of viral species
+ in their bodies. The RNA virus family Paramyxoviridae tends to infect several
+ vertebrate species, which are accountable for a variety of devastating infections
+ in both humans and animals. Viruses of this kind include measles, mumps, and Hendra.
+ Some synonymous codons are favoured over others in mRNAs during gene-to-protein
+ synthesis process. Such phenomenon is termed as codon usage bias (CUB). Our research
+ emphasized many aspects that shape the CUB of genes in the Paramyxoviridae family
+ found in bats. Here, the nitrogenous base A occurred the most. AT was found to
+ be abundant in the coding sequences of the Paramyxoviridae family. RSCU data revealed
+ that A or T ending codons occurred more frequently than predicted. Furthermore,
+ 3 overrepresented codons (CAT, AGA, and GCA) and 7 underrepresented codons (CCG,
+ TCG, CGC, CGG, CGT, GCG and ACG) were detected in the viral genomes. Correspondence
+ analysis, neutrality plot, and parity plots highlight the combined impact of mutational
+ pressure and natural selection on CUB. The neutrality plot of GC12 against GC3
+ yielded a regression coefficient value of 0.366, indicating that natural selection
+ had a significant (63.4 %) impact. Moreover, RNA editing analysis was done, which
+ revealed the highest frequency of C to T mutations. The results of our research
+ revealed the pattern of codon usage and RNA editing sites in Paramyxoviridae genomes.'
+ - '[YEAR_RANGE] 2020-2024 [TEXT] OBJECTIVE: The preoperative inclination angle of
+ mandibular incisors was crucial for surgical and postoperative stability while
+ the effect of proclined mandibular incisors on skeletal stability has not been
+ investigated. This study aimed to evaluate the effects of differences in presurgical
+ mandibular incisor inclination on skeletal stability after orthognathic surgery
+ in patients with skeletal Class III malocclusion. METHODS: A retrospective cohort
+ study of 80 consecutive patients with skeletal Class III malocclusion who underwent
+ bimaxillary orthognathic surgery was conducted. According to incisor mandibular
+ plane angle (IMPA), patients were divided into 3 groups: retroclined inclination
+ (IMPA < 87°), normal inclination (87° ≤ IMPA < 93°) and proclined inclination
+ (IMPA ≥ 93°). Preoperative characteristics, surgical changes and postoperative
+ stability were compared based on lateral cephalograms obtained 1 week before surgery
+ (T0), 1 week after surgery (T1), and at 6 to 12 months postoperatively (T2). RESULTS:
+ The mandible demonstrated a forward and upward relapse in all three groups. No
+ significant differences in skeletal relapse were observed in the 3 groups of patients.
+ However, the proclined inclination group showed a negative overbite tendency postoperatively
+ compared with the other two groups and a clinically significant mandibular relapse
+ pattern. Proclined IMPA both pre- and postoperatively was correlated with mandibular
+ relapse. CONCLUSION: Sufficient presurgical mandibular incisor decompensation
+ was of crucial importance for the maintenance of skeletal stability in patients
+ with skeletal Class III malocclusion who subsequently underwent orthognathic surgery.'
+ - '[YEAR_RANGE] 2020-2024 [TEXT] The Pseudogastromyzon genus, consisting of species
+ predominantly distributed throughout southeastern China, has garnered increasing
+ market attention in recent years due to its ornamental appeal. However, the overlapping
+ diagnostic attributes render the commonly accepted criteria for interspecific
+ identification unreliable, leaving the phylogenetic relationships among Pseudogastromyzon
+ species unexplored. In the present study, we undertake molecular phylogenetic
+ and morphological examinations of the Pseudogastromyzon genus. Our phylogenetic
+ analysis of mitochondrial genes distinctly segregated Pseudogastromyzon species
+ into two clades: the Pseudogastromyzon clade and the Labigastromyzon clade. A
+ subsequent morphological assessment revealed that the primary dermal ridge (specifically,
+ the second ridge) within the labial adhesive apparatus serves as an effective
+ and precise interspecific diagnostic characteristic. Moreover, the distributional
+ ranges of Pseudogastromyzon and Labigastromyzon are markedly distinct, exhibiting
+ only a narrow area of overlap. Considering the morphological heterogeneity of
+ the labial adhesive apparatus and the substantial division within the molecular
+ phylogeny, we advocate for the elevation of the Labigastromyzon subgenus to the
+ status of a separate genus. Consequently, we have ascertained the validity of
+ the Pseudogastromyzon and Labigastromyzon species, yielding a total of six valid
+ species. To facilitate future research, we present comprehensive descriptions
+ of the redefined species and introduce novel identification keys.'
+ - source_sentence: '[YEAR_RANGE] 2020-2024 [TEXT] PCa-RadHop: A transparent and lightweight
+ feed-forward method for clinically significant prostate cancer segmentation.'
+ sentences:
+ - '[YEAR_RANGE] 2020-2024 [TEXT] According to the importance of time in treatment
+ of thrombosis disorders, faster than current treatments are required. For the
+ first time, this research discloses a novel strategy for rapid dissolution of
+ blood clots by encapsulation of a fibrinolytic (Reteplase) into a Thrombin sensitive
+ shell formed by polymerization of acrylamide monomers and bisacryloylated peptide
+ as crosslinker. Degradability of the peptide units in exposure to Thrombin, creates
+ the Thrombin-sensitive Reteplase nanocapsules (TSRNPs) as a triggered release
+ system. Accelerated thrombolysis was achieved by combining three approaches including:
+ deep penetration of TSRNPs into the blood clots, changing the clot dissolution
+ mechanism by altering the distribution pattern of TSRNPs to 3D intra-clot distribution
+ (based on the distributed intra-clot thrombolysis (DIT) model) instead of peripheral
+ and unidirectional distribution of unencapsulated fibrinolytics and, enzyme-stimulated
+ release of the fibrinolytic. Ex-vivo study was carried out by an occluded tube
+ model that mimics in-vivo brain stroke as an emergency situation where faster
+ treatment in short time is a golden key. In in vivo, efficacy of the developed
+ formulation was confirmed by PET scan and laser Doppler flowmetry (LDF). As the
+ most important achievements, 40.0 ± 0.7 (n = 3) % and 37.0 ± 0.4 (n = 3) % reduction
+ in the thrombolysis time (faster reperfusion) were observed by ex-vivo and in-vivo
+ experiments, respectively. Higher blood flow and larger digestion mass of clot
+ at similar times in comparison to non-encapsulated Reteplase were observed that
+ means more effective thrombolysis by the developed strategy.'
+ - '[YEAR_RANGE] 2020-2024 [TEXT] Prostate Cancer is one of the most frequently occurring
+ cancers in men, with a low survival rate if not early diagnosed. PI-RADS reading
+ has a high false positive rate, thus increasing the diagnostic incurred costs
+ and patient discomfort. Deep learning (DL) models achieve a high segmentation
+ performance, although require a large model size and complexity. Also, DL models
+ lack of feature interpretability and are perceived as "black-boxes" in the medical
+ field. PCa-RadHop pipeline is proposed in this work, aiming to provide a more
+ transparent feature extraction process using a linear model. It adopts the recently
+ introduced Green Learning (GL) paradigm, which offers a small model size and low
+ complexity. PCa-RadHop consists of two stages: Stage-1 extracts data-driven radiomics
+ features from the bi-parametric Magnetic Resonance Imaging (bp-MRI) input and
+ predicts an initial heatmap. To reduce the false positive rate, a subsequent stage-2
+ is introduced to refine the predictions by including more contextual information
+ and radiomics features from each already detected Region of Interest (ROI). Experiments
+ on the largest publicly available dataset, PI-CAI, show a competitive performance
+ standing of the proposed method among other deep DL models, achieving an area
+ under the curve (AUC) of 0.807 among a cohort of 1,000 patients. Moreover, PCa-RadHop
+ maintains orders of magnitude smaller model size and complexity.'
+ - '[YEAR_RANGE] 2020-2024 [TEXT] OBJECTIVE: To evaluate rates of remission, recovery,
+ relapse, and recurrence in suicidal youth who participated in a clinical trial
+ comparing Dialectical Behavior Therapy (DBT) and Individual and Group Supportive
+ Therapy (IGST). METHOD: Participants were 173 youth, aged 12 to 18 years, with
+ repetitive self-harm (including at least 1 prior suicide attempt [SA]) and elevated
+ suicidal ideation (SI). Participants received 6 months of DBT or IGST and were
+ followed for 6 months post-treatment. The sample was 95% female, 56.4% White,
+ and 27.49% Latina. Remission was defined as absence of SA or nonsuicidal self-injury
+ (NSSI) across one 3-month interval; recovery was defined across 2 or more consecutive
+ intervals. Relapse and recurrence were defined as SA or NSSI following remission
+ or recovery. Cross-tabulation with χ2 was used for between-group contrasts. RESULTS:
+ Over 70% of the sample reported remission of SA at each treatment and follow-up
+ interval. There were significantly higher rates of remission and recovery and
+ lower rates of relapse and recurrence for SA in DBT than for IGST. Across treatments
+ and time points, SA had higher remission and recovery rates and lower relapse
+ and recurrence rates than NSSI. There were no significant differences in NSSI
+ remission between conditions; however, participants receiving DBT had significantly
+ higher NSSI recovery rates than those receiving IGST for the 3- to 9-month, 3-
+ to 12-month, and 6- to 12-month intervals. CONCLUSION: Results showed higher percentages
+ of SA remission and recovery for DBT as compared to IGST. NSSI was less likely
+ to remit than SA. PLAIN LANGUAGE SUMMARY: This study examined rates of remission,
+ recovery, relapse, and recurrence of suicide attempts (SA) and nonsuicidal self-injury
+ (NSSI) among the participants in the CARES Study, a randomized clinical trial
+ of 6 months of Dialectical Behavior Therapy or Individual and Group Supportive
+ Therapy. 173 youth aged 12 to 18 years participated in the study and were followed
+ for 6 months post treatment. Over 70% of the sample reported remission of SA at
+ each treatment and follow-up interval. There were significantly higher rates of
+ remission and recovery and lower rates of relapse and recurrence for SA among
+ participants who received Dialectical Behavioral Therapy. Across both treatments,
+ remission and recovery rates were lower and relapse and recurrence rates were
+ higher for NSSI than for SA. These results underscore the value of Dialectical
+ Behavioral Therapy as a first line treatment for youth at high risk for suicide.
+ DIVERSITY & INCLUSION STATEMENT: We worked to ensure race, ethnic, and/or other
+ types of diversity in the recruitment of human participants. CLINICAL TRIAL REGISTRATION
+ INFORMATION: Collaborative Adolescent Research on Emotions and Suicide (CARES);
+ https://www. CLINICALTRIALS: gov/; NCT01528020.'
+ - source_sentence: '[YEAR_RANGE] 2020-2024 [TEXT] Predicting Recovery After Concussion
+ in Pediatric Patients: A Meta-Analysis.'
+ sentences:
+ - '[YEAR_RANGE] 2020-2024 [TEXT] OBJECTIVE: The authors examined licensing requirements
+ for select children''s behavioral health care providers. METHODS: Statutes and
+ regulations as of October 2021 were reviewed for licensed clinical social workers,
+ licensed professional counselors, and licensed marriage and family therapists
+ for all 50 U.S. states and the District of Columbia. RESULTS: All jurisdictions
+ had laws regarding postgraduate training and license portability. No jurisdiction
+ included language about specialized postgraduate training related to serving children
+ and families or cultural competence. Other policies that related to the structure,
+ composition, and authority of licensing boards varied across states and licensure
+ types. CONCLUSIONS: In their efforts to address barriers to licensure, expand
+ the workforce, and ensure that children have access to high-quality and culturally
+ responsive care, states could consider their statutes and regulations.'
+ - '[YEAR_RANGE] 2020-2024 [TEXT] Magnetic Resonance Imaging (MRI) plays a pivotal
+ role in the accurate measurement of brain subcortical structures in macaques,
+ which is crucial for unraveling the complexities of brain structure and function,
+ thereby enhancing our understanding of neurodegenerative diseases and brain development.
+ However, due to significant differences in brain size, structure, and imaging
+ characteristics between humans and macaques, computational tools developed for
+ human neuroimaging studies often encounter obstacles when applied to macaques.
+ In this context, we propose an Anatomy Attentional Fusion Network (AAF-Net), which
+ integrates multimodal MRI data with anatomical constraints in a multi-scale framework
+ to address the challenges posed by the dynamic development, regional heterogeneity,
+ and age-related size variations of the juvenile macaque brain, thus achieving
+ precise subcortical segmentation. Specifically, we generate a Signed Distance
+ Map (SDM) based on the initial rough segmentation of the subcortical region by
+ a network as an anatomical constraint, providing comprehensive information on
+ positions, structures, and morphology. Then we construct AAF-Net to fully fuse
+ the SDM anatomical constraints and multimodal images for refined segmentation.
+ To thoroughly evaluate the performance of our proposed tool, over 700 macaque
+ MRIs from 19 datasets were used in this study. Specifically, we employed two manually
+ labeled longitudinal macaque datasets to develop the tool and complete four-fold
+ cross-validations. Furthermore, we incorporated various external datasets to demonstrate
+ the proposed tool''s generalization capabilities and promise in brain development
+ research. We have made this tool available as an open-source resource at https://github.com/TaoZhong11/Macaque_subcortical_segmentation
+ for direct application.'
+ - '[YEAR_RANGE] 2020-2024 [TEXT] CONTEXT: Prognostic prediction models (PPMs) can
+ help clinicians predict outcomes. OBJECTIVE: To critically examine peer-reviewed
+ PPMs predicting delayed recovery among pediatric patients with concussion. DATA
+ SOURCES: Ovid Medline, Embase, Ovid PsycInfo, Web of Science Core Collection,
+ Cumulative Index to Nursing and Allied Health Literature, Cochrane Library, Google
+ Scholar. STUDY SELECTION: The study had to report a PPM for pediatric patients
+ to be used within 28 days of injury to estimate risk of delayed recovery at 28
+ days to 1 year postinjury. Studies had to have at least 30 participants. DATA
+ EXTRACTION: The Critical Appraisal and Data Extraction for Systematic Reviews
+ of Prediction Modeling Studies checklist was completed. RESULTS: Six studies of
+ 13 PPMs were included. These studies primarily reflected male patients in late
+ childhood or early adolescence presenting to an emergency department meeting the
+ Concussion in Sport Group concussion criteria. No study authors used the same
+ outcome definition nor evaluated the clinical utility of a model. All studies
+ demonstrated high risk of bias. Quality of evidence was best for the Predicting
+ and Preventing Postconcussive Problems in Pediatrics (5P) clinical risk score.
+ LIMITATIONS: No formal PPM Grading of Recommendations, Assessment, Development,
+ and Evaluations (GRADE) process exists. CONCLUSIONS: The 5P clinical risk score
+ may be considered for clinical use. Rigorous external validations, particularly
+ in other settings, are needed. The remaining PPMs require external validation.
+ Lack of consensus regarding delayed recovery criteria limits these PPMs.'
+ - source_sentence: '[YEAR_RANGE] 2020-2024 [TEXT] Intraoperative Monitoring of the
+ External Urethral Sphincter Reflex: A Novel Adjunct to Bulbocavernosus Reflex
+ Neuromonitoring for Protecting the Sacral Neural Pathways Responsible for Urination,
+ Defecation and Sexual Function.'
+ sentences:
+ - '[YEAR_RANGE] 2020-2024 [TEXT] Early menarche has been associated with adverse
+ health outcomes, such as depressive symptoms. Discovering effect modifiers across
+ these conditions in the pediatric population is a constant challenge. We tested
+ whether movement behaviours modified the effect of the association between early
+ menarche and depression symptoms among adolescents. This cross-sectional study
+ included 2031 females aged 15-19 years across all Brazilian geographic regions.
+ Data were collected using a self-administered questionnaire; 30.5% (n = 620) reported
+ having experienced menarche before age 12 years (that is, early menarche). We
+ used the Patient Health Questionnaire (PHQ-9) to evaluate depressive symptoms.
+ Accruing any moderate-vigorous physical activity during leisure time, limited
+ recreational screen time, and having good sleep quality were the exposures investigated.
+ Adolescents who experienced early menarche and met one (B: -4.45, 95% CI: (-5.38,
+ -3.51)), two (B: -6.07 (-7.02, -5.12)), or three (B: -6.49 (-7.76, -5.21)), and
+ adolescents who experienced not early menarche and met one (B: -5.33 (-6.20; -4.46)),
+ two (B: -6.12 (-6.99; -5.24)), or three (B: -6.27 (-7.30; -5.24)) of the movement
+ behaviour targets had lower PHQ-9 scores for depression symptoms than adolescents
+ who experienced early menarche and did not meet any of the movement behaviours.
+ The disparities in depressive symptoms among the adolescents (early menarche versus
+ not early menarche) who adhered to all three target behaviours were not statistically
+ significant (B: 0.41 (-0.19; 1.01)). Adherence to movement behaviours modified
+ the effect of the association between early menarche and depression symptoms.'
+ - '[YEAR_RANGE] 2020-2024 [TEXT] PURPOSE: Intraoperative bulbocavernosus reflex
+ neuromonitoring has been utilized to protect bowel, bladder, and sexual function,
+ providing a continuous functional assessment of the somatic sacral nervous system
+ during surgeries where it is at risk. Bulbocavernosus reflex data may also provide
+ additional functional insight, including an evaluation for spinal shock, distinguishing
+ upper versus lower motor neuron injury (conus versus cauda syndromes) and prognosis
+ for postoperative bowel and bladder function. Continuous intraoperative bulbocavernosus
+ reflex monitoring has been utilized to provide the surgeon with an ongoing functional
+ assessment of the anatomical elements involved in the S2-S4 mediated reflex arc
+ including the conus, cauda equina and pudendal nerves. Intraoperative bulbocavernosus
+ reflex monitoring typically includes the electrical activation of the dorsal nerves
+ of the genitals to initiate the afferent component of the reflex, followed by
+ recording the resulting muscle response using needle electromyography recordings
+ from the external anal sphincter. METHODS: Herein we describe a complementary
+ and novel technique that includes recording electromyography responses from the
+ external urethral sphincter to monitor the external urethral sphincter reflex.
+ Specialized foley catheters embedded with recording electrodes have recently become
+ commercially available that provide the ability to perform intraoperative external
+ urethral sphincter muscle recordings. RESULTS: We describe technical details and
+ the potential utility of incorporating external urethral sphincter reflex recordings
+ into existing sacral neuromonitoring paradigms to provide redundant yet complementary
+ data streams. CONCLUSIONS: We present two illustrative neurosurgical oncology
+ cases to demonstrate the utility of the external urethral sphincter reflex technique
+ in the setting of the necessary surgical sacrifice of sacral nerve roots.'
+ - '[YEAR_RANGE] 2020-2024 [TEXT] BACKGROUND: Limited data are available on the appropriate
+ choice of blood pressure management strategy for patients with acute basilar artery
+ occlusion assessed by the standard deviation (SD). Multivariate logistic models
+ were used to investigate the association between BPV, the primary outcome (futile
+ recanalization, 90-day modified Rankin Scale score 3-6), and the secondary outcome
+ (30-day mortality). Subgroup analysis was performed as a sensitivity test. RESULTS:
+ Futile recanalization occurred in 60 (56 %) patients, while 26 (24 %) patients
+ died within 30 days. In the fully adjusted model, MAP SD was associated with a
+ higher risk of futile recanalization (OR adj=1.36, per 1 mmHg increase, 95 % CI:
+ 1.09-1.69, P=0.006) and 30-day mortality (OR adj=1.56, per 1 mmHg increase, 95
+ % CI: 1.20-2.04, P=0.001). A significant interaction between MAP SD and the lack
+ of hypertension history on futile recanalization (P<0.05) was observed. CONCLUSIONS:
+ Among recanalized acute BAO ischemic patients, higher blood pressure variability
+ during the first 24 h after MT was associated with worse outcomes. This association
+ was stronger in patients without a history of hypertension.'
+ ---
+
+ # SentenceTransformer
+
+ This is a [sentence-transformers](https://www.SBERT.net) model trained on the parquet dataset. It maps sentences & paragraphs to a 512-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
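Every widget example in the front matter starts with a `[YEAR_RANGE] 2020-2024 [TEXT]` prefix, which suggests the model expects inputs in that shape. A minimal sketch of a helper that builds such strings; the function name `format_input` and its default year range are assumptions drawn only from those examples, and the README does not say whether other year ranges are valid:

```python
def format_input(text: str, year_range: str = "2020-2024") -> str:
    """Prefix text with the year-range and text markers seen in the widget examples."""
    return f"[YEAR_RANGE] {year_range} [TEXT] {text.strip()}"


query = format_input(
    "Predicting Recovery After Concussion in Pediatric Patients: A Meta-Analysis."
)
print(query)
```

Formatting queries and documents the same way keeps inference inputs consistent with the examples shown in the card.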
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ <!-- - **Base model:** [Unknown](https://huggingface.co/unknown) -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 512 dimensions
+ - **Similarity Function:** Cosine Similarity
+ - **Training Dataset:**
+     - parquet
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
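The model description lists cosine similarity as the similarity function. A minimal sketch of that measure in plain Python, using two-dimensional toy vectors in place of the 512-dimensional embeddings; the function name `cosine_similarity` is illustrative and is not the library's own API:

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for two sentence embeddings.
u = [3.0, 4.0]
v = [4.0, 3.0]
score = cosine_similarity(u, v)
print(round(score, 2))
```

Scores near 1 indicate vectors pointing in nearly the same direction, and a score of 0 indicates orthogonal vectors.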
360
+ ### Model Sources
361
+
362
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
363
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
364
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
365
+
366
+ ### Full Model Architecture
367
+
368
+ ```
369
+ SentenceTransformer(
370
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
371
+ (1): Pooling({'word_embedding_dimension': 512, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
372
+ )
373
+ ```
374
+
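The `Pooling` module above has only `pooling_mode_mean_tokens` enabled, so the sentence embedding is the average of the token embeddings at non-padding positions. The following is a minimal pure-Python sketch of that idea, not the library's implementation; the `mean_pool` helper and the toy 4-dimensional vectors are illustrative assumptions (the real model produces 512-dimensional vectors).

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1; padding (0) is ignored."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [value / count for value in total]


# Toy input (hypothetical values): 3 token positions, 4 dimensions.
tokens = [
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 2.0, 1.0, 0.0],
    [9.0, 9.0, 9.0, 9.0],  # padding position, masked out below
]
mask = [1, 1, 0]

sentence_embedding = mean_pool(tokens, mask)
print(sentence_embedding)
```

Only the first two rows contribute to the result, which is why the padding row can hold arbitrary values without changing the embedding.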
375
+ ## Usage
376
+
377
+ ### Direct Usage (Sentence Transformers)
378
+
379
+ First install the Sentence Transformers library:
380
+
381
+ ```bash
382
+ pip install -U sentence-transformers
383
+ ```
384
+
385
+ Then you can load this model and run inference.
386
+ ```python
387
+ from sentence_transformers import SentenceTransformer
388
+
389
+ # Download from the 🤗 Hub
390
+ model = SentenceTransformer("pankajrajdeo/Bioformer-8L-UMLS-Pubmed-ST-TCE-Epoch-1")
391
+ # Run inference
392
+ sentences = [
393
+ '[YEAR_RANGE] 2020-2024 [TEXT] Intraoperative Monitoring of the External Urethral Sphincter Reflex: A Novel Adjunct to Bulbocavernosus Reflex Neuromonitoring for Protecting the Sacral Neural Pathways Responsible for Urination, Defecation and Sexual Function.',
394
+ '[YEAR_RANGE] 2020-2024 [TEXT] PURPOSE: Intraoperative bulbocavernosus reflex neuromonitoring has been utilized to protect bowel, bladder, and sexual function, providing a continuous functional assessment of the somatic sacral nervous system during surgeries where it is at risk. Bulbocavernosus reflex data may also provide additional functional insight, including an evaluation for spinal shock, distinguishing upper versus lower motor neuron injury (conus versus cauda syndromes) and prognosis for postoperative bowel and bladder function. Continuous intraoperative bulbocavernosus reflex monitoring has been utilized to provide the surgeon with an ongoing functional assessment of the anatomical elements involved in the S2-S4 mediated reflex arc including the conus, cauda equina and pudendal nerves. Intraoperative bulbocavernosus reflex monitoring typically includes the electrical activation of the dorsal nerves of the genitals to initiate the afferent component of the reflex, followed by recording the resulting muscle response using needle electromyography recordings from the external anal sphincter. METHODS: Herein we describe a complementary and novel technique that includes recording electromyography responses from the external urethral sphincter to monitor the external urethral sphincter reflex. Specialized foley catheters embedded with recording electrodes have recently become commercially available that provide the ability to perform intraoperative external urethral sphincter muscle recordings. RESULTS: We describe technical details and the potential utility of incorporating external urethral sphincter reflex recordings into existing sacral neuromonitoring paradigms to provide redundant yet complementary data streams. CONCLUSIONS: We present two illustrative neurosurgical oncology cases to demonstrate the utility of the external urethral sphincter reflex technique in the setting of the necessary surgical sacrifice of sacral nerve roots.',
395
+ '[YEAR_RANGE] 2020-2024 [TEXT] Early menarche has been associated with adverse health outcomes, such as depressive symptoms. Discovering effect modifiers across these conditions in the pediatric population is a constant challenge. We tested whether movement behaviours modified the effect of the association between early menarche and depression symptoms among adolescents. This cross-sectional study included 2031 females aged 15-19 years across all Brazilian geographic regions. Data were collected using a self-administered questionnaire; 30.5% (n = 620) reported having experienced menarche before age 12 years (that is, early menarche). We used the Patient Health Questionnaire (PHQ-9) to evaluate depressive symptoms. Accruing any moderate-vigorous physical activity during leisure time, limited recreational screen time, and having good sleep quality were the exposures investigated. Adolescents who experienced early menarche and met one (B: -4.45, 95% CI: (-5.38, -3.51)), two (B: -6.07 (-7.02, -5.12)), or three (B: -6.49 (-7.76, -5.21)), and adolescents who experienced not early menarche and met one (B: -5.33 (-6.20; -4.46)), two (B: -6.12 (-6.99; -5.24)), or three (B: -6.27 (-7.30; -5.24)) of the movement behaviour targets had lower PHQ-9 scores for depression symptoms than adolescents who experienced early menarche and did not meet any of the movement behaviours. The disparities in depressive symptoms among the adolescents (early menarche versus not early menarche) who adhered to all three target behaviours were not statistically significant (B: 0.41 (-0.19; 1.01)). Adherence to movement behaviours modified the effect of the association between early menarche and depression symptoms.',
396
+ ]
397
+ embeddings = model.encode(sentences)
398
+ print(embeddings.shape)
399
+ # (3, 512)
400
+
401
+ # Get the similarity scores for the embeddings
402
+ similarities = model.similarity(embeddings, embeddings)
403
+ print(similarities.shape)
404
+ # torch.Size([3, 3])
405
+ ```
406
+
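Every example input in this card is prefixed with `[YEAR_RANGE] <start>-<end> [TEXT]`, so new texts presumably need the same prefix before encoding. The helper below is a hedged sketch: the function names are hypothetical, and the bucketing rule (five-year ranges starting on multiples of five) is an assumption inferred only from the two ranges visible in this card, `1880-1884` and `2020-2024`.

```python
def year_range(year, width=5):
    """Map a year to its bucket, e.g. 2023 -> '2020-2024'.

    Assumption: buckets are `width` years wide and start on a multiple of `width`.
    """
    start = year - year % width
    return f"{start}-{start + width - 1}"


def format_input(text, year):
    """Build the '[YEAR_RANGE] ... [TEXT] ...' string used by the examples in this card."""
    return f"[YEAR_RANGE] {year_range(year)} [TEXT] {text}"


# Hypothetical usage: the formatted string is what would be passed to model.encode().
print(format_input("Safety and efficacy of remimazolam versus propofol during EUS.", 2023))
```

If the training data used a different bucketing scheme for other years, the `year_range` helper would need to be adjusted accordingly.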
407
+ <!--
408
+ ### Direct Usage (Transformers)
409
+
410
+ <details><summary>Click to see the direct usage in Transformers</summary>
411
+
412
+ </details>
413
+ -->
414
+
415
+ <!--
416
+ ### Downstream Usage (Sentence Transformers)
417
+
418
+ You can finetune this model on your own dataset.
419
+
420
+ <details><summary>Click to expand</summary>
421
+
422
+ </details>
423
+ -->
424
+
425
+ <!--
426
+ ### Out-of-Scope Use
427
+
428
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
429
+ -->
430
+
431
+ <!--
432
+ ## Bias, Risks and Limitations
433
+
434
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
435
+ -->
436
+
437
+ <!--
438
+ ### Recommendations
439
+
440
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
441
+ -->
442
+
443
+ ## Training Details
444
+
445
+ ### Training Dataset
446
+
447
+ #### parquet
448
+
449
+ * Dataset: parquet
450
+ * Size: 26,147,930 training samples
451
+ * Columns: <code>anchor</code> and <code>positive</code>
452
+ * Approximate statistics based on the first 1000 samples:
453
+ | | anchor | positive |
454
+ |:--------|:------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
455
+ | type | string | string |
456
+ | details | <ul><li>min: 16 tokens</li><li>mean: 45.85 tokens</li><li>max: 137 tokens</li></ul> | <ul><li>min: 31 tokens</li><li>mean: 268.19 tokens</li><li>max: 512 tokens</li></ul> |
457
+ * Samples:
458
+ | anchor | positive |
459
+ |:---------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
460
+ | <code>[YEAR_RANGE] 1880-1884 [TEXT] ADDRESS OF COL. GARRICK MALLERY, U. S. ARMY.</code> | <code>[YEAR_RANGE] 1880-1884 [TEXT] It may be conceded that after man had all his present faculties, he did not choose between the adoption of voice and gesture, and never with those faculties, was in a state where the one was used, to the absolute exclusion of the other. The epoch, however, to which our speculations relate is that in which he had not reached the present symmetric development of his intellect and of his bodily organs, and the inquiry is: Which mode of communication was earliest adopted to his single wants and informed intelligence? With the voice he could imitate distinictively but few sounds of nature, while with gesture he could exhibit actions, motions, positions, forms, dimensions, directions and distances, with their derivations and analogues. It would seem from this unequal division of capacity that oral speech remained rudimentary long after gesture had become an efficient mode of communication. With due allowance for all purely imitative sounds, and for the spontaneous action of vocal organs under excitement, it appears that the connection between ideas and words is only to be explained by a compact between speaker and hearer which supposes the existence of a prior mode of communication. This was probably by gesture. At least we may accept it as a clew leading out of the labyrinth of philological confusion, and regulating the immemorial quest of man's primitive speech.</code> |
461
+ | <code>[YEAR_RANGE] 1880-1884 [TEXT] How TO OBTAIN THE BRAIN OF THE CAT.</code> | <code>[YEAR_RANGE] 1880-1884 [TEXT] How to obtain the Brain of the Cat, (Wilder).-Correction: Page 158, second column, line 7, "grains," should be "grams;" page 159, near middle of 2nd column, "successily," should be "successively;" page 161, the number of Flower's paper is 3.</code> |
462
+ | <code>[YEAR_RANGE] 1880-1884 [TEXT] DOLBEAR ON THE NATURE AND CONSTITUTION OF MATTER.</code> | <code>[YEAR_RANGE] 1880-1884 [TEXT] Mr. Dopp desires to make the following correction in his paper in the last issue: "In my article on page 200 of "Science", the expression and should have been and being the velocity of light.</code> |
463
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
464
+ ```json
465
+ {
466
+ "scale": 20.0,
467
+ "similarity_fct": "cos_sim"
468
+ }
469
+ ```
470
+
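To make the two parameters above concrete: with `(anchor, positive)` pairs, MultipleNegativesRankingLoss treats every other positive in the batch as a negative for a given anchor and applies cross-entropy over the scaled cosine similarities. The code below is a simplified pure-Python sketch of that computation under those assumptions, not the library's implementation; `cos_sim` and `mnr_loss` are illustrative names and the 2-dimensional vectors are toy data.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]
    and all other positives in the batch act as negatives."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy batch (hypothetical embeddings): two orthogonal anchors.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]  # each positive matches its own anchor
swapped = [[0.0, 1.0], [1.0, 0.0]]  # each positive matches the other anchor

print(mnr_loss(anchors, aligned))  # close to 0
print(mnr_loss(anchors, swapped))  # close to the scale, 20
```

The `scale` of 20.0 multiplies the similarities before the softmax, so a well-separated batch drives the loss toward zero while a mismatched batch is penalized by roughly the scale itself.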
471
+ ### Evaluation Dataset
472
+
473
+ #### parquet
474
+
475
+ * Dataset: parquet
476
+ * Size: 26,147,930 evaluation samples
477
+ * Columns: <code>anchor</code> and <code>positive</code>
478
+ * Approximate statistics based on the first 1000 samples:
479
+ | | anchor | positive |
480
+ |:--------|:-----------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
481
+ | type | string | string |
482
+ | details | <ul><li>min: 15 tokens</li><li>mean: 31.78 tokens</li><li>max: 78 tokens</li></ul> | <ul><li>min: 16 tokens</li><li>mean: 299.97 tokens</li><li>max: 512 tokens</li></ul> |
483
+ * Samples:
484
+ | anchor | positive |
485
+ |:-------|:---------|
486
+ | <code>[YEAR_RANGE] 2020-2024 [TEXT] Solubility and thermodynamics of mesalazine in aqueous mixtures of poly ethylene glycol 200/600 at 293.2-313.2K.</code> | <code>[YEAR_RANGE] 2020-2024 [TEXT] In this study, the solubility of mesalazine was investigated in binary solvent mixtures of poly ethylene glycols 200/600 and water at temperatures ranging from 293.2K to 313.2K. The solubility of mesalazine was determined using a shake-flask method, and its concentrations were measured using a UV-Vis spectrophotometer. The obtained solubility data were analyzed using mathematical models including the van't Hoff, Jouyban-Acree, Jouyban-Acree-van't Hoff, mixture response surface, and modified Wilson models. The experimental data obtained for mesalazine dissolution encompassed various thermodynamic properties, including ΔG°, ΔH°, ΔS°, and TΔS°. These properties offer valuable insights into the energetic aspects of the dissolution process and were calculated based on the van't Hoff equation.</code> |
487
+ | <code>[YEAR_RANGE] 2020-2024 [TEXT] Safety and efficacy of remimazolam versus propofol during EUS: a multicenter randomized controlled study.</code> | <code>[YEAR_RANGE] 2020-2024 [TEXT] BACKGROUND AND AIMS: Propofol, a widely used sedative in GI endoscopic procedures, is associated with cardiorespiratory suppression. Remimazolam is a novel ultrashort-acting benzodiazepine sedative with rapid onset and minimal cardiorespiratory depression. This study compared the safety and efficacy of remimazolam and propofol during EUS procedures. METHODS: A multicenter randomized controlled study was conducted between October 2022 and March 2023 in patients who underwent EUS procedures. Patients were randomly assigned to receive either remimazolam or propofol as a sedative agent. The primary endpoint was cardiorespiratory adverse events.</code> |
488
+ | <code>[YEAR_RANGE] 2020-2024 [TEXT] Ultrasound-Guided Vs Non-Guided Prolotherapy for Internal Derangement of Temporomandibular Joint. A Randomized Clinical Trial.</code> | <code>[YEAR_RANGE] 2020-2024 [TEXT] OBJECTIVES: This randomized clinical trial study aims to compare ultrasound-guided versus non-guided Dextrose 10% injections in patients suffering from internal derangement in the temporomandibular joint (TMJ). MATERIAL AND METHODS: The study population included 22 patients and 43 TMJs suffering from unilateral or bilateral TMJ painful clicking, magnetic resonance imaging (MRI) proved disc displacement with reduction (DDWR), refractory to or failed conservative treatment. The patients were divided randomly into two groups (non-guided and ultrasound (US)-guided groups). The procedure involved injection of 2 mL solution of a mixture of 0.75 mL 0.9% normal saline solution, 0.3 mL 2% lidocaine and 0.75 mL dextrose 10% using a 25G needle in the joint and 1 mL intramuscular injection to the masseter muscle at the most tender point. The Visual Analogue Score (VAS) was used to compare joint pain intensity over four different periods, beginning with pre-injection, 1-, 2-, and 6-months postinjection. RESULTS: Twenty-two patients 5 males (n = 5/22, 22.7%) and 17 females (n = 17/22, 77.2%) were included in this study. The mean age was 27.3 ± 7.4 years (30.2 ± 7.0) for the non-guided group and 24.3 ± 6.9 for the US-guided group. The dextrose injection reduced intensity over time in both groups with statistically significant improvement (P value <.05) at 2 and 6 months in both groups. There was no statistically significant difference in VAS assessment between both groups. CONCLUSION: Intra-articular injection of dextrose 10% for patients with painful clicking and DDWR resulted in reduced pain intensity in both US-guided and non-guided groups with significant symptomatic improvement over time in both groups. US guidance allowed accurate anatomical localization and safe procedure with a single joint puncture.</code> |
489
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
490
+ ```json
491
+ {
492
+ "scale": 20.0,
493
+ "similarity_fct": "cos_sim"
494
+ }
495
+ ```
496
+
497
+ ### Training Hyperparameters
498
+ #### Non-Default Hyperparameters
499
+
500
+ - `eval_strategy`: steps
501
+ - `per_device_train_batch_size`: 128
502
+ - `learning_rate`: 2e-05
503
+ - `num_train_epochs`: 1
504
+ - `max_steps`: 194066
505
+ - `log_level`: info
506
+ - `fp16`: True
507
+ - `dataloader_num_workers`: 16
508
+ - `load_best_model_at_end`: True
509
+ - `resume_from_checkpoint`: True
510
+
511
+ #### All Hyperparameters
512
+ <details><summary>Click to expand</summary>
513
+
514
+ - `overwrite_output_dir`: False
515
+ - `do_predict`: False
516
+ - `eval_strategy`: steps
517
+ - `prediction_loss_only`: True
518
+ - `per_device_train_batch_size`: 128
519
+ - `per_device_eval_batch_size`: 8
520
+ - `per_gpu_train_batch_size`: None
521
+ - `per_gpu_eval_batch_size`: None
522
+ - `gradient_accumulation_steps`: 1
523
+ - `eval_accumulation_steps`: None
524
+ - `torch_empty_cache_steps`: None
525
+ - `learning_rate`: 2e-05
526
+ - `weight_decay`: 0.0
527
+ - `adam_beta1`: 0.9
528
+ - `adam_beta2`: 0.999
529
+ - `adam_epsilon`: 1e-08
530
+ - `max_grad_norm`: 1.0
531
+ - `num_train_epochs`: 1
532
+ - `max_steps`: 194066
533
+ - `lr_scheduler_type`: linear
534
+ - `lr_scheduler_kwargs`: {}
535
+ - `warmup_ratio`: 0.0
536
+ - `warmup_steps`: 0
537
+ - `log_level`: info
538
+ - `log_level_replica`: warning
539
+ - `log_on_each_node`: True
540
+ - `logging_nan_inf_filter`: True
541
+ - `save_safetensors`: True
542
+ - `save_on_each_node`: False
543
+ - `save_only_model`: False
544
+ - `restore_callback_states_from_checkpoint`: False
545
+ - `no_cuda`: False
546
+ - `use_cpu`: False
547
+ - `use_mps_device`: False
548
+ - `seed`: 42
549
+ - `data_seed`: None
550
+ - `jit_mode_eval`: False
551
+ - `use_ipex`: False
552
+ - `bf16`: False
553
+ - `fp16`: True
554
+ - `fp16_opt_level`: O1
555
+ - `half_precision_backend`: auto
556
+ - `bf16_full_eval`: False
557
+ - `fp16_full_eval`: False
558
+ - `tf32`: None
559
+ - `local_rank`: 0
560
+ - `ddp_backend`: None
561
+ - `tpu_num_cores`: None
562
+ - `tpu_metrics_debug`: False
563
+ - `debug`: []
564
+ - `dataloader_drop_last`: False
565
+ - `dataloader_num_workers`: 16
566
+ - `dataloader_prefetch_factor`: None
567
+ - `past_index`: -1
568
+ - `disable_tqdm`: False
569
+ - `remove_unused_columns`: True
570
+ - `label_names`: None
571
+ - `load_best_model_at_end`: True
572
+ - `ignore_data_skip`: False
573
+ - `fsdp`: []
574
+ - `fsdp_min_num_params`: 0
575
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
576
+ - `fsdp_transformer_layer_cls_to_wrap`: None
577
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
578
+ - `deepspeed`: None
579
+ - `label_smoothing_factor`: 0.0
580
+ - `optim`: adamw_torch
581
+ - `optim_args`: None
582
+ - `adafactor`: False
583
+ - `group_by_length`: False
584
+ - `length_column_name`: length
585
+ - `ddp_find_unused_parameters`: None
586
+ - `ddp_bucket_cap_mb`: None
587
+ - `ddp_broadcast_buffers`: False
588
+ - `dataloader_pin_memory`: True
589
+ - `dataloader_persistent_workers`: False
590
+ - `skip_memory_metrics`: True
591
+ - `use_legacy_prediction_loop`: False
592
+ - `push_to_hub`: False
593
+ - `resume_from_checkpoint`: True
594
+ - `hub_model_id`: None
595
+ - `hub_strategy`: every_save
596
+ - `hub_private_repo`: False
597
+ - `hub_always_push`: False
598
+ - `gradient_checkpointing`: False
599
+ - `gradient_checkpointing_kwargs`: None
600
+ - `include_inputs_for_metrics`: False
601
+ - `eval_do_concat_batches`: True
602
+ - `fp16_backend`: auto
603
+ - `push_to_hub_model_id`: None
604
+ - `push_to_hub_organization`: None
605
+ - `mp_parameters`:
606
+ - `auto_find_batch_size`: False
607
+ - `full_determinism`: False
608
+ - `torchdynamo`: None
609
+ - `ray_scope`: last
610
+ - `ddp_timeout`: 1800
611
+ - `torch_compile`: False
612
+ - `torch_compile_backend`: None
613
+ - `torch_compile_mode`: None
614
+ - `dispatch_batches`: None
615
+ - `split_batches`: None
616
+ - `include_tokens_per_second`: False
617
+ - `include_num_input_tokens_seen`: False
618
+ - `neftune_noise_alpha`: None
619
+ - `optim_target_modules`: None
620
+ - `batch_eval_metrics`: False
621
+ - `eval_on_start`: False
622
+ - `eval_use_gather_object`: False
623
+ - `batch_sampler`: batch_sampler
624
+ - `multi_dataset_batch_sampler`: proportional
625
+
626
+ </details>
627
+
628
+ ### Training Logs
629
+ <details><summary>Click to expand</summary>
630
+
631
+ | Epoch | Step | Training Loss | Validation Loss |
632
+ |:------:|:------:|:-------------:|:---------------:|
633
+ | 0.0000 | 1 | 5.6379 | - |
634
+ | 0.0052 | 1000 | 0.6085 | - |
635
+ | 0.0103 | 2000 | 0.1263 | - |
636
+ | 0.0155 | 3000 | 0.1246 | - |
637
+ | 0.0206 | 4000 | 0.1225 | - |
638
+ | 0.0258 | 5000 | 0.0909 | - |
639
+ | 0.0309 | 6000 | 0.1011 | - |
640
+ | 0.0361 | 7000 | 0.0879 | - |
641
+ | 0.0412 | 8000 | 0.0858 | - |
642
+ | 0.0464 | 9000 | 0.0854 | - |
643
+ | 0.0515 | 10000 | 0.1016 | - |
644
+ | 0.0567 | 11000 | 0.0919 | - |
645
+ | 0.0618 | 12000 | 0.066 | - |
646
+ | 0.0670 | 13000 | 0.0947 | - |
647
+ | 0.0721 | 14000 | 0.1017 | - |
648
+ | 0.0773 | 15000 | 0.0622 | - |
649
+ | 0.0824 | 16000 | 0.1529 | - |
650
+ | 0.0876 | 17000 | 0.065 | - |
651
+ | 0.0928 | 18000 | 0.104 | - |
652
+ | 0.0979 | 19000 | 0.0626 | - |
653
+ | 0.1031 | 20000 | 0.1228 | - |
654
+ | 0.1082 | 21000 | 0.0681 | - |
655
+ | 0.1134 | 22000 | 0.1211 | - |
656
+ | 0.1185 | 23000 | 0.0608 | - |
657
+ | 0.1237 | 24000 | 0.0733 | - |
658
+ | 0.1288 | 25000 | 0.0587 | - |
659
+ | 0.1340 | 26000 | 0.1249 | - |
660
+ | 0.1391 | 27000 | 0.0662 | - |
661
+ | 0.1443 | 28000 | 0.1328 | - |
662
+ | 0.1494 | 29000 | 0.0647 | - |
663
+ | 0.1546 | 30000 | 0.1371 | - |
664
+ | 0.1597 | 31000 | 0.0603 | - |
665
+ | 0.1649 | 32000 | 0.1005 | - |
666
+ | 0.1700 | 33000 | 0.1037 | - |
667
+ | 0.1752 | 34000 | 0.0688 | - |
668
+ | 0.1804 | 35000 | 0.138 | - |
669
+ | 0.1855 | 36000 | 0.0648 | - |
670
+ | 0.1907 | 37000 | 0.0709 | - |
671
+ | 0.1958 | 38000 | 0.1214 | - |
672
+ | 0.2010 | 39000 | 0.0728 | - |
673
+ | 0.2061 | 40000 | 0.1218 | - |
674
+ | 0.2113 | 41000 | 0.0804 | - |
675
+ | 0.2164 | 42000 | 0.0682 | - |
676
+ | 0.2216 | 43000 | 0.1122 | - |
677
+ | 0.2267 | 44000 | 0.0692 | - |
678
+ | 0.2319 | 45000 | 0.0777 | - |
679
+ | 0.2370 | 46000 | 0.0947 | - |
680
+ | 0.2422 | 47000 | 0.0623 | - |
681
+ | 0.2473 | 48000 | 0.0785 | - |
682
+ | 0.2525 | 49000 | 0.0862 | - |
683
+ | 0.2576 | 50000 | 0.0703 | - |
684
+ | 0.2628 | 51000 | 0.0614 | - |
685
+ | 0.2679 | 52000 | 0.063 | - |
686
+ | 0.2731 | 53000 | 0.086 | - |
687
+ | 0.2783 | 54000 | 0.0763 | - |
688
+ | 0.2834 | 55000 | 0.0536 | - |
689
+ | 0.2886 | 56000 | 0.0634 | - |
690
+ | 0.2937 | 57000 | 0.0598 | - |
691
+ | 0.2989 | 58000 | 0.063 | - |
692
+ | 0.3040 | 59000 | 0.057 | - |
693
+ | 0.3092 | 60000 | 0.0733 | - |
694
+ | 0.3143 | 61000 | 0.0447 | - |
695
+ | 0.3195 | 62000 | 0.0504 | - |
696
+ | 0.3246 | 63000 | 0.0558 | - |
697
+ | 0.3298 | 64000 | 0.0669 | - |
698
+ | 0.3349 | 65000 | 0.0664 | - |
699
+ | 0.3401 | 66000 | 0.0519 | - |
700
+ | 0.3452 | 67000 | 0.0396 | - |
701
+ | 0.3504 | 68000 | 0.0531 | - |
702
+ | 0.3555 | 69000 | 0.069 | - |
703
+ | 0.3607 | 70000 | 0.0793 | - |
704
+ | 0.3659 | 71000 | 0.0873 | - |
705
+ | 0.3710 | 72000 | 0.034 | - |
706
+ | 0.3762 | 73000 | 0.0428 | - |
707
+ | 0.3813 | 74000 | 0.0536 | - |
708
+ | 0.3865 | 75000 | 0.0433 | - |
709
+ | 0.3916 | 76000 | 0.1024 | - |
710
+ | 0.3968 | 77000 | 0.051 | - |
711
+ | 0.4019 | 78000 | 0.0672 | - |
712
+ | 0.4071 | 79000 | 0.0343 | - |
713
+ | 0.4122 | 80000 | 0.0597 | - |
714
+ | 0.4174 | 81000 | 0.074 | - |
715
+ | 0.4225 | 82000 | 0.0421 | - |
716
+ | 0.4277 | 83000 | 0.0463 | - |
717
+ | 0.4328 | 84000 | 0.0651 | - |
718
+ | 0.4380 | 85000 | 0.0292 | - |
719
+ | 0.4431 | 86000 | 0.0403 | - |
720
+ | 0.4483 | 87000 | 0.0459 | - |
721
+ | 0.4535 | 88000 | 0.0705 | - |
722
+ | 0.4586 | 89000 | 0.0577 | - |
723
+ | 0.4638 | 90000 | 0.0462 | - |
724
+ | 0.4689 | 91000 | 0.0342 | - |
725
+ | 0.4741 | 92000 | 0.0402 | - |
726
+ | 0.4792 | 93000 | 0.0472 | - |
727
+ | 0.4844 | 94000 | 0.0472 | - |
728
+ | 0.4895 | 95000 | 0.0843 | - |
729
+ | 0.4947 | 96000 | 0.031 | - |
730
+ | 0.4998 | 97000 | 0.0347 | - |
731
+ | 0.5050 | 98000 | 0.0326 | - |
732
+ | 0.5101 | 99000 | 0.0349 | - |
733
+ | 0.5153 | 100000 | 0.0276 | - |
734
+ | 0.5204 | 101000 | 0.0412 | - |
735
+ | 0.5256 | 102000 | 0.0445 | - |
736
+ | 0.5307 | 103000 | 0.0232 | - |
737
+ | 0.5359 | 104000 | 0.0301 | - |
738
+ | 0.5411 | 105000 | 0.0268 | - |
739
+ | 0.5462 | 106000 | 0.0215 | - |
740
+ | 0.5514 | 107000 | 0.0538 | - |
741
+ | 0.5565 | 108000 | 0.0475 | - |
742
+ | 0.5617 | 109000 | 0.0302 | - |
743
+ | 0.5668 | 110000 | 0.0397 | - |
744
+ | 0.5720 | 111000 | 0.0438 | - |
745
+ | 0.5771 | 112000 | 0.0416 | - |
746
+ | 0.5823 | 113000 | 0.0289 | - |
747
+ | 0.5874 | 114000 | 0.0374 | - |
748
+ | 0.5926 | 115000 | 0.0514 | - |
749
+ | 0.5977 | 116000 | 0.0321 | - |
750
+ | 0.6029 | 117000 | 0.0759 | - |
751
+ | 0.6080 | 118000 | 0.0433 | - |
752
+ | 0.6132 | 119000 | 0.0362 | - |
753
+ | 0.6183 | 120000 | 0.041 | - |
754
+ | 0.6235 | 121000 | 0.0306 | - |
755
+ | 0.6286 | 122000 | 0.0407 | - |
756
+ | 0.6338 | 123000 | 0.0393 | - |
757
+ | 0.6390 | 124000 | 0.0478 | - |
758
+ | 0.6441 | 125000 | 0.0429 | - |
759
+ | 0.6493 | 126000 | 0.0521 | - |
760
+ | 0.6544 | 127000 | 0.0442 | - |
761
+ | 0.6596 | 128000 | 0.0428 | - |
762
+ | 0.6647 | 129000 | 0.0346 | - |
763
+ | 0.6699 | 130000 | 0.027 | - |
764
+ | 0.6750 | 131000 | 0.0219 | - |
765
+ | 0.6802 | 132000 | 0.0417 | - |
766
+ | 0.6853 | 133000 | 0.0376 | - |
767
+ | 0.6905 | 134000 | 0.0341 | - |
768
+ | 0.6956 | 135000 | 0.0304 | - |
769
+ | 0.7008 | 136000 | 0.0375 | - |
770
+ | 0.7059 | 137000 | 0.0352 | - |
771
+ | 0.7111 | 138000 | 0.0413 | - |
772
+ | 0.7162 | 139000 | 0.0346 | - |
773
+ | 0.7214 | 140000 | 0.0437 | - |
774
+ | 0.7266 | 141000 | 0.0447 | - |
775
+ | 0.7317 | 142000 | 0.0335 | - |
776
+ | 0.7369 | 143000 | 0.0287 | - |
777
+ | 0.7420 | 144000 | 0.0271 | - |
778
+ | 0.7472 | 145000 | 0.0295 | - |
779
+ | 0.7523 | 146000 | 0.0238 | - |
780
+ | 0.7575 | 147000 | 0.03 | - |
781
+ | 0.7626 | 148000 | 0.0369 | - |
782
+ | 0.7678 | 149000 | 0.0351 | - |
783
+ | 0.7729 | 150000 | 0.0302 | - |
784
+ | 0.7781 | 151000 | 0.0369 | - |
785
+ | 0.7832 | 152000 | 0.0374 | - |
786
+ | 0.7884 | 153000 | 0.0298 | - |
787
+ | 0.7935 | 154000 | 0.0387 | - |
788
+ | 0.7987 | 155000 | 0.0344 | - |
789
+ | 0.8038 | 156000 | 0.0374 | - |
790
+ | 0.8090 | 157000 | 0.0338 | - |
791
+ | 0.8142 | 158000 | 0.0317 | - |
792
+ | 0.8193 | 159000 | 0.0378 | - |
793
+ | 0.8245 | 160000 | 0.0375 | - |
794
+ | 0.8296 | 161000 | 0.0276 | - |
795
+ | 0.8348 | 162000 | 0.0238 | - |
796
+ | 0.8399 | 163000 | 0.0178 | - |
797
+ | 0.8451 | 164000 | 0.0226 | - |
798
+ | 0.8502 | 165000 | 0.0233 | - |
799
+ | 0.8554 | 166000 | 0.0183 | - |
800
+ | 0.8605 | 167000 | 0.0253 | - |
801
+ | 0.8657 | 168000 | 0.0232 | - |
802
+ | 0.8708 | 169000 | 0.0252 | - |
803
+ | 0.8760 | 170000 | 0.0258 | - |
804
+ | 0.8811 | 171000 | 0.0256 | - |
805
+ | 0.8863 | 172000 | 0.0285 | - |
806
+ | 0.8914 | 173000 | 0.0279 | - |
807
+ | 0.8966 | 174000 | 0.0237 | - |
808
+ | 0.9018 | 175000 | 0.0277 | - |
809
+ | 0.9069 | 176000 | 0.0245 | - |
810
+ | 0.9121 | 177000 | 0.0282 | - |
811
+ | 0.9172 | 178000 | 0.0248 | - |
812
+ | 0.9224 | 179000 | 0.0212 | - |
813
+ | 0.9275 | 180000 | 0.0199 | - |
814
+ | 0.9327 | 181000 | 0.0236 | - |
815
+ | 0.9378 | 182000 | 0.0246 | - |
816
+ | 0.9430 | 183000 | 0.0225 | - |
817
+ | 0.9481 | 184000 | 0.0228 | - |
818
+ | 0.9533 | 185000 | 0.0235 | - |
819
+ | 0.9584 | 186000 | 0.0226 | - |
820
+ | 0.9636 | 187000 | 0.0201 | - |
821
+ | 0.9687 | 188000 | 0.0201 | - |
822
+ | 0.9739 | 189000 | 0.0221 | - |
823
+ | 0.9790 | 190000 | 0.022 | - |
824
+ | 0.9842 | 191000 | 0.0359 | - |
825
+ | 0.9893 | 192000 | 0.0242 | - |
826
+ | 0.9945 | 193000 | 0.0192 | - |
827
+ | 0.9997 | 194000 | 0.0208 | - |
828
+ | 1.0000 | 194066 | - | 0.0007 |
829
+
830
+ </details>
831
+
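The `Epoch` column in the log above can be reproduced from the `Step` column. The sketch below assumes that one epoch corresponds to `max_steps` (194066) optimizer steps, which is what the final row (epoch 1.0000 at step 194066) suggests; the `epoch_fraction` helper is an illustrative name, not part of any library.

```python
MAX_STEPS = 194066  # `max_steps` from the hyperparameters above


def epoch_fraction(step, steps_per_epoch=MAX_STEPS):
    """Fraction of the epoch completed at `step`, rounded as in the log table."""
    return round(step / steps_per_epoch, 4)


# Spot-check a few rows of the training log.
for step in (1000, 97000, 194000, 194066):
    print(step, epoch_fraction(step))
```

This is only a consistency check on the logged values; it makes no claim about how many samples each step covered across devices.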
832
+ ### Framework Versions
833
+ - Python: 3.12.2
834
+ - Sentence Transformers: 3.2.1
835
+ - Transformers: 4.44.2
836
+ - PyTorch: 2.5.0
837
+ - Accelerate: 1.0.1
838
+ - Datasets: 3.0.2
839
+ - Tokenizers: 0.19.1
840
+
841
+ ## Citation
842
+
843
+ ### BibTeX
844
+
845
+ #### Sentence Transformers
846
+ ```bibtex
847
+ @inproceedings{reimers-2019-sentence-bert,
848
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
849
+ author = "Reimers, Nils and Gurevych, Iryna",
850
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
851
+ month = "11",
852
+ year = "2019",
853
+ publisher = "Association for Computational Linguistics",
854
+ url = "https://arxiv.org/abs/1908.10084",
855
+ }
856
+ ```
857
+
858
+ #### MultipleNegativesRankingLoss
859
+ ```bibtex
860
+ @misc{henderson2017efficient,
861
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
862
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
863
+ year={2017},
864
+ eprint={1705.00652},
865
+ archivePrefix={arXiv},
866
+ primaryClass={cs.CL}
867
+ }
868
+ ```
869
+
870
+ <!--
871
+ ## Glossary
872
+
873
+ *Clearly define terms in order to be accessible across audiences.*
874
+ -->
875
+
876
+ <!--
877
+ ## Model Card Authors
878
+
879
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
880
+ -->
881
+
882
+ <!--
883
+ ## Model Card Contact
884
+
885
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
886
+ -->
added_tokens.json ADDED
@@ -0,0 +1,4 @@
 
 
 
 
 
1
+ {
2
+ "[TEXT]": 32768,
3
+ "[YEAR_RANGE]": 32769
4
+ }
config.json ADDED
@@ -0,0 +1,25 @@
+ {
+   "_name_or_path": "/data/aronow/pankaj/Embeddings/Bioformer-MNRL-finetuned/checkpoint-194066",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 512,
+   "initializer_range": 0.02,
+   "intermediate_size": 2048,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 8,
+   "num_hidden_layers": 8,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.45.2",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 32770
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.1.1",
+     "transformers": "4.45.2",
+     "pytorch": "2.4.1+cu121"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
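`similarity_fn_name` is `null` in this file. Sentence Transformers 3.x appears to fall back to cosine similarity when no function is configured; that is library behavior, not something this commit states, so treat it as an assumption. Below is a minimal pure-Python sketch of the cosine score on toy vectors. The helper name `cosine_similarity` is hypothetical and not the library's own function.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors.

    Returns 0.0 when either vector has zero norm, to avoid dividing by zero.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy 2-dimensional vectors; real embeddings from this model have 512 dimensions.
print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))
```

Because cosine similarity ignores vector length, scores from this model would be comparable across inputs of different lengths even though the pooled embeddings are not normalized by any module listed in `modules.json`.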
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:861e007351b4c3467558bf84e6a720cd7e8481c92ab3027b81affd540761e032
+ size 170111688
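The LFS pointer size can be cross-checked against the shapes in `config.json`. The sketch below estimates the parameter count of a standard BERT encoder from those values and compares the float32 byte count with the 170111688 bytes recorded above. It assumes the checkpoint stores the usual BERT weights, including the pooler layer, and no extra tensors; the function name `bert_param_count` is hypothetical.

```python
def bert_param_count(vocab_size, hidden, layers, intermediate,
                     max_positions, type_vocab, with_pooler=True):
    """Estimate the number of weights in a standard BERT encoder."""
    layer_norm = 2 * hidden  # scale and bias
    # Word, position and token-type embeddings, followed by one LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    # Query, key, value and output projections (weight + bias), then LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Two dense layers (weight + bias each), then LayerNorm.
    feed_forward = (hidden * intermediate + intermediate
                    + intermediate * hidden + hidden + layer_norm)
    # Dense layer over the [CLS] state; assumed to be present in the checkpoint.
    pooler = hidden * hidden + hidden if with_pooler else 0
    return embeddings + layers * (attention + feed_forward) + pooler


# Values taken from config.json in this commit.
params = bert_param_count(vocab_size=32770, hidden=512, layers=8,
                          intermediate=2048, max_positions=512, type_vocab=2)
weight_bytes = params * 4   # torch_dtype is float32, 4 bytes per weight
lfs_size = 170111688        # size recorded in the LFS pointer above

print(params, weight_bytes, lfs_size - weight_bytes)
```

Under these assumptions the estimate is 42,524,160 parameters, or 170,096,640 bytes, which leaves 15,048 bytes unaccounted for. A gap of that size would be consistent with the JSON header that the safetensors format places before the tensor data, though the diff itself does not confirm this.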
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   }
+ ]
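These two modules run in sequence: the Transformer produces one 512-dimensional vector per token, and the Pooling module reduces them to a single sentence vector. `1_Pooling/config.json` in this commit enables only `pooling_mode_mean_tokens`, so the reduction is a mean over non-padding tokens. The following is a minimal pure-Python sketch of that step on toy data; the function name `mean_pool` is hypothetical and this is not the library's implementation.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1.

    token_embeddings: list of equal-length float lists, one per token.
    attention_mask:   list of 0/1 flags, 0 marking padding positions.
    """
    dim = len(token_embeddings[0])
    totals = [0.0] * dim
    kept = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            kept += 1
            for i, value in enumerate(vector):
                totals[i] += value
    kept = max(kept, 1)  # guard against an all-padding input
    return [total / kept for total in totals]


# Three toy 2-dimensional token vectors; the last one is padding and is ignored.
print(mean_pool([[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]], [1, 1, 0]))
```

Masking matters here because the tokenizer pads on the right with `[PAD]`; averaging over padding positions would pull short inputs toward the padding embedding.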
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,41 @@
+ {
+   "additional_special_tokens": [
+     "[TEXT]",
+     "[YEAR_RANGE]"
+   ],
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,84 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32768": {
+       "content": "[TEXT]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32769": {
+       "content": "[YEAR_RANGE]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "additional_special_tokens": [
+     "[TEXT]",
+     "[YEAR_RANGE]"
+   ],
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": false,
+   "mask_token": "[MASK]",
+   "max_length": 512,
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
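The settings above determine how a single input is framed before it reaches the encoder: `[CLS]` (id 101) in front, `[SEP]` (id 102) at the end, truncation on the right to at most 512 ids, and right padding with `[PAD]` (id 0). The sketch below reproduces that framing on plain lists of ids. It is an illustration of the configured behavior, not the `BertTokenizer` code path; the function name `build_inputs` and the example ids are hypothetical.

```python
# Special-token ids and limits taken from tokenizer_config.json in this commit.
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0
MAX_LENGTH = 512


def build_inputs(token_ids, pad_to=None, max_length=MAX_LENGTH):
    """Frame already-tokenized ids as [CLS] ... [SEP], truncating and padding on the right.

    Returns (input_ids, attention_mask); the mask is 0 at padding positions.
    """
    body = token_ids[: max_length - 2]  # leave room for [CLS] and [SEP]
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)
    if pad_to is not None and pad_to > len(input_ids):
        padding = pad_to - len(input_ids)
        input_ids += [PAD_ID] * padding      # padding_side is "right"
        attention_mask += [0] * padding
    return input_ids, attention_mask


# Arbitrary example ids, padded to a batch length of 6.
print(build_inputs([7, 8, 9], pad_to=6))
```

The attention mask produced here is what lets a mean-pooling step skip the padded positions.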
vocab.txt ADDED
The diff for this file is too large to render. See raw diff