Update README.md
README.md CHANGED
@@ -24,24 +24,24 @@ datasets:
 - wikipedia
 - bookcorpus
 tags:
--
+- arXiv
 - astrophysics
+- conceptual analysis
+- epistemic change
 - high-energy physics (HEP)
 - history of science
--
+- semantic shift detection
 - sociology of science
+- philosophy of science
+- physics
 - word embeddings
-- semantic shift detection
-- conceptual change
-- epistemic change
-- arXiv
 ---
 
 # Model Card for Astro-HEP-BERT
 
 **Astro-HEP-BERT** is a bidirectional transformer designed primarily to generate contextualized word embeddings for computational conceptual analysis in astrophysics and high-energy physics (HEP). Built upon Google's `bert-base-uncased`, the model underwent additional training for three epochs on 21.84 million paragraphs drawn from more than 600,000 scholarly articles sourced from arXiv, all pertaining to astrophysics and/or HEP. The sole training objective was masked language modeling.
 
-The Astro-HEP-BERT project
+The Astro-HEP-BERT project demonstrates the general feasibility of training a customized bidirectional transformer for computational conceptual analysis in the history, philosophy, and sociology of science as an open-source endeavor that does not require a substantial budget. Leveraging only freely available code, weights, and text inputs, the entire training process was conducted on a single MacBook Pro laptop (M2, 96 GB).
 
 For further insights into the model, the corpus, and the underlying research project (<a target="_blank" rel="noopener noreferrer" href="https://doi.org/10.3030/101044932">Network Epistemology in Practice</a>), please refer to the Astro-HEP-BERT paper [link coming soon].
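For readers who want to try the model, here is a minimal sketch of how one might extract contextualized word embeddings with the Hugging Face `transformers` library, in the spirit of the semantic-shift and conceptual-analysis use case the card describes. The repo id `arnosimons/astro-hep-bert`, the `word_embedding` helper, and the example sentences are illustrative assumptions, not part of the model card:

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "arnosimons/astro-hep-bert"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)  # fast tokenizer (needed for offsets)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()

def word_embedding(sentence: str, term: str) -> torch.Tensor:
    """Mean-pool the final-layer hidden states over the subword tokens
    spanning the first occurrence of `term` in `sentence`."""
    start = sentence.lower().index(term.lower())  # bert-base-uncased lowercases input
    end = start + len(term)
    enc = tokenizer(sentence, return_tensors="pt", return_offsets_mapping=True)
    offsets = enc.pop("offset_mapping")[0]        # (seq_len, 2) character spans
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (seq_len, 768)
    inside = (offsets[:, 0] < end) & (offsets[:, 1] > start)  # tokens overlapping term
    return hidden[inside].mean(dim=0)             # one 768-dimensional vector

# Compare the same term across two contexts:
a = word_embedding("The dark matter halo dominates the rotation curve.", "dark matter")
b = word_embedding("Dark matter candidates include WIMPs and axions.", "dark matter")
print(torch.cosine_similarity(a, b, dim=0).item())
```

Mean-pooling the subword pieces of the target term and reading off the final hidden layer are common defaults for this kind of analysis; the forthcoming Astro-HEP-BERT paper may use a different layer choice or pooling scheme.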