rrivera1849 committed on
Commit
2441c62
·
1 Parent(s): b188778

Update README.md

Files changed (1)
  1. README.md +64 -0
README.md CHANGED
@@ -1,3 +1,67 @@
  ---
  license: apache-2.0
+ language:
+ - en
  ---
+
+ # rrivera1849/LUAR-CRUD
+
+ Author Style Representations using [LUAR](https://aclanthology.org/2021.emnlp-main.70.pdf).
+
+ The LUAR training and evaluation repository can be found [here](https://github.com/llnl/luar).
+
+ This model was trained on the Reddit Million User Dataset (MUD) found [here](https://aclanthology.org/2021.naacl-main.415.pdf).
+
+ ## Usage
+
+ ```python
+ from transformers import AutoModel, AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("rrivera1849/LUAR-CRUD")
+ model = AutoModel.from_pretrained("rrivera1849/LUAR-CRUD")
+
+ # we embed `episodes`, a collection of documents presumed to come from an author
+ # NOTE: make sure that `episode_length` is consistent across episodes
+ batch_size = 3
+ episode_length = 16
+ text = [
+     ["Foo"] * episode_length,
+     ["Bar"] * episode_length,
+     ["Zoo"] * episode_length,
+ ]
+ text = [j for i in text for j in i]
+ tokenized_text = tokenizer(
+     text,
+     max_length=32,
+     padding="max_length",
+     truncation=True,
+     return_tensors="pt"
+ )
+ # inputs size: (batch_size, episode_length, max_token_length)
+ tokenized_text["input_ids"] = tokenized_text["input_ids"].reshape(batch_size, episode_length, -1)
+ tokenized_text["attention_mask"] = tokenized_text["attention_mask"].reshape(batch_size, episode_length, -1)
+ print(tokenized_text["input_ids"].size())       # torch.Size([3, 16, 32])
+ print(tokenized_text["attention_mask"].size())  # torch.Size([3, 16, 32])
+
+ out = model(**tokenized_text)
+ print(out.size())  # torch.Size([3, 512])
+ ```
+
+ ## Citing & Authors
+
+ If you find this model helpful, feel free to cite our [publication](https://aclanthology.org/2021.emnlp-main.70.pdf).
+
+ ```
+ @inproceedings{uar-emnlp2021,
+   author    = {Rafael A. Rivera Soto and Olivia Miano and Juanita Ordonez and Barry Chen and Aleem Khan and Marcus Bishop and Nicholas Andrews},
+   title     = {Learning Universal Authorship Representations},
+   booktitle = {EMNLP},
+   year      = {2021},
+ }
+ ```
+
+ ## License
+
+ LUAR is distributed under the terms of the Apache License (Version 2.0).
+
+ All new contributions must be made under the Apache-2.0 license.
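
Note on the usage snippet in this change: its NOTE comment asks for every episode to hold the same number of documents, because the tokenizer output is later reshaped to `(batch_size, episode_length, max_token_length)`. A minimal sketch of one way to check that before tokenizing, in plain Python. `flatten_episodes` is a hypothetical helper written for illustration; it is not part of LUAR or `transformers`.

```python
from typing import List, Tuple


def flatten_episodes(episodes: List[List[str]]) -> Tuple[List[str], int, int]:
    """Check that all episodes have the same length, then flatten them.

    Hypothetical helper (not part of LUAR or transformers). Returns the flat
    list of documents plus (batch_size, episode_length), the first two
    dimensions used when reshaping the tokenizer output.
    """
    if not episodes:
        raise ValueError("expected at least one episode")
    episode_length = len(episodes[0])
    if episode_length == 0:
        raise ValueError("episodes must not be empty")
    for index, episode in enumerate(episodes):
        if len(episode) != episode_length:
            raise ValueError(
                f"episode {index} has {len(episode)} documents, "
                f"expected {episode_length}"
            )
    # Same flattening order as the list comprehension in the README snippet.
    flat = [document for episode in episodes for document in episode]
    return flat, len(episodes), episode_length


text, batch_size, episode_length = flatten_episodes(
    [["Foo"] * 16, ["Bar"] * 16, ["Zoo"] * 16]
)
print(len(text), batch_size, episode_length)  # 48 3 16
```

The returned `batch_size` and `episode_length` could then replace the hard-coded values in the two `reshape` calls, so a ragged input fails early with a clear message instead of at reshape time.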