Add evaluation results on the default config of billsum
Beep boop, I am a bot from Hugging Face's automatic model evaluator 👋!\
Your model has been evaluated on the default config of the [billsum](https://huggingface.co/datasets/billsum) dataset by @pszemraj, using the predictions stored [here](https://huggingface.co/datasets/autoevaluate/autoeval-staging-eval-project-billsum-a6bd4aa5-11965601).\
Accept this pull request to see the results displayed on the [Hub leaderboard](https://huggingface.co/spaces/autoevaluate/leaderboards?dataset=billsum).\
Evaluate your model on more datasets [here](https://huggingface.co/spaces/autoevaluate/model-evaluator?dataset=billsum).
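
For readers unfamiliar with the ROUGE-1 score reported in this pull request, the following is a minimal sketch of the idea behind it: the F1 score of unigram overlap between a reference summary and a predicted summary. It is a simplified illustration, not the evaluator's implementation. The function name `rouge1_f1` and the lowercase whitespace tokenization are assumptions made here for brevity, and the result is on a 0 to 1 scale, whereas the values in the metadata below appear to be on a 0 to 100 scale.

```python
from collections import Counter


def rouge1_f1(reference: str, prediction: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap between two texts.

    Hypothetical helper for illustration only. Tokenization is plain
    lowercase whitespace splitting, with no stemming or punctuation handling.
    """
    ref_counts = Counter(reference.lower().split())
    pred_counts = Counter(prediction.lower().split())

    # Clipped overlap: each unigram counts at most as often as it
    # appears in both texts.
    overlap = sum((ref_counts & pred_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    score = rouge1_f1("the cat sat on the mat", "the cat")
    print(f"ROUGE-1 F1: {score:.4f}")
```

Real evaluations typically rely on an established ROUGE package rather than a hand-rolled function like this one, so scores from this sketch should not be expected to match the verified values exactly.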
README.md
CHANGED
@@ -58,7 +58,55 @@ widget:
|
|
58 |
\ and parameters 0, and generalization is influenced by the inductive bias of\
|
59 |
\ this function space (Section 5)."
|
60 |
example_title: scientific paper
|
61 |
-
- text:
|
62 |
example_title: transcribed audio - lecture
|
63 |
- text: "Transformer-based models have shown to be very useful for many NLP tasks.\
|
64 |
\ However, a major limitation of transformers-based models is its O(n^2)O(n 2)\
|
@@ -267,6 +315,39 @@ model-index:
|
|
267 |
type: gen_len
|
268 |
value: 82.2177
|
269 |
verified: true
|
270 |
---
|
271 |
|
272 |
# long-t5-tglobal-base-16384 + BookSum
|
|
|
58 |
\ and parameters 0, and generalization is influenced by the inductive bias of\
|
59 |
\ this function space (Section 5)."
|
60 |
example_title: scientific paper
|
61 |
+
- text: 'Is a else or outside the cob and tree written being of early client rope
|
62 |
+
and you have is for good reasons. On to the ocean in Orange for time. By''s the
|
63 |
+
aggregate we can bed it yet. Why this please pick up on a sort is do and also
|
64 |
+
M Getoi''s nerocos and do rain become you to let so is his brother is made in
|
65 |
+
use and Mjulia''s''s the lay major is aging Masastup coin present sea only of
|
66 |
+
Oosii rooms set to you We do er do we easy this private oliiishs lonthen might
|
67 |
+
be okay. Good afternoon everybody. Welcome to this lecture of Computational Statistics.
|
68 |
+
As you can see, I''m not socially my name is Michael Zelinger. I''m one of the
|
69 |
+
task for this class and you might have already seen me in the first lecture where
|
70 |
+
I made a quick appearance. I''m also going to give the tortillas in the last third
|
71 |
+
of this course. So to give you a little bit about me, I''m a old student here
|
72 |
+
with better Bulman and my research centres on casual inference applied to biomedical
|
73 |
+
disasters, so that could be genomics or that could be hospital data. If any of
|
74 |
+
you is interested in writing a bachelor thesis, a semester paper may be mastathesis
|
75 |
+
about this topic feel for reach out to me. you have my name on models and my email
|
76 |
+
address you can find in the directory I''d Be very happy to talk about it. you
|
77 |
+
do not need to be sure about it, we can just have a chat. So with that said, let''s
|
78 |
+
get on with the lecture. There''s an exciting topic today I''m going to start
|
79 |
+
by sharing some slides with you and later on during the lecture we''ll move to
|
80 |
+
the paper. So bear with me for a few seconds. Well, the projector is starting
|
81 |
+
up. Okay, so let''s get started. Today''s topic is a very important one. It''s
|
82 |
+
about a technique which really forms one of the fundamentals of data science,
|
83 |
+
machine learning, and any sort of modern statistics. It''s called cross validation.
|
84 |
+
I know you really want to understand this topic I Want you to understand this
|
85 |
+
and frankly, nobody''s gonna leave Professor Mineshousen''s class without understanding
|
86 |
+
cross validation. So to set the stage for this, I Want to introduce you to the
|
87 |
+
validation problem in computational statistics. So the problem is the following:
|
88 |
+
You trained a model on available data. You fitted your model, but you know the
|
89 |
+
training data you got could always have been different and some data from the
|
90 |
+
environment. Maybe it''s a random process. You do not really know what it is,
|
91 |
+
but you know that somebody else who gets a different batch of data from the same
|
92 |
+
environment they would get slightly different training data and you do not care
|
93 |
+
that your method performs as well. On this training data. you want to to perform
|
94 |
+
well on other data that you have not seen other data from the same environment.
|
95 |
+
So in other words, the validation problem is you want to quantify the performance
|
96 |
+
of your model on data that you have not seen. So how is this even possible? How
|
97 |
+
could you possibly measure the performance on data that you do not know The solution
|
98 |
+
to? This is the following realization is that given that you have a bunch of data,
|
99 |
+
you were in charge. You get to control how much that your model sees. It works
|
100 |
+
in the following way: You can hide data firms model. Let''s say you have a training
|
101 |
+
data set which is a bunch of doubtless so X eyes are the features those are typically
|
102 |
+
hide and national vector. It''s got more than one dimension for sure. And the
|
103 |
+
why why eyes. Those are the labels for supervised learning. As you''ve seen before,
|
104 |
+
it''s the same set up as we have in regression. And so you have this training
|
105 |
+
data and now you choose that you only use some of those data to fit your model.
|
106 |
+
You''re not going to use everything, you only use some of it the other part you
|
107 |
+
hide from your model. And then you can use this hidden data to do validation from
|
108 |
+
the point of you of your model. This hidden data is complete by unseen. In other
|
109 |
+
words, we solve our problem of validation.'
|
110 |
example_title: transcribed audio - lecture
|
111 |
- text: "Transformer-based models have shown to be very useful for many NLP tasks.\
|
112 |
\ However, a major limitation of transformers-based models is its O(n^2)O(n 2)\
|
|
|
315 |
type: gen_len
|
316 |
value: 82.2177
|
317 |
verified: true
|
318 |
+
- task:
|
319 |
+
type: summarization
|
320 |
+
name: Summarization
|
321 |
+
dataset:
|
322 |
+
name: billsum
|
323 |
+
type: billsum
|
324 |
+
config: default
|
325 |
+
split: test
|
326 |
+
metrics:
|
327 |
+
- name: ROUGE-1
|
328 |
+
type: rouge
|
329 |
+
value: 39.6378
|
330 |
+
verified: true
|
331 |
+
- name: ROUGE-2
|
332 |
+
type: rouge
|
333 |
+
value: 13.0017
|
334 |
+
verified: true
|
335 |
+
- name: ROUGE-L
|
336 |
+
type: rouge
|
337 |
+
value: 23.0255
|
338 |
+
verified: true
|
339 |
+
- name: ROUGE-LSUM
|
340 |
+
type: rouge
|
341 |
+
value: 32.9943
|
342 |
+
verified: true
|
343 |
+
- name: loss
|
344 |
+
type: loss
|
345 |
+
value: 1.9428048133850098
|
346 |
+
verified: true
|
347 |
+
- name: gen_len
|
348 |
+
type: gen_len
|
349 |
+
value: 162.3588
|
350 |
+
verified: true
|
351 |
---
|
352 |
|
353 |
# long-t5-tglobal-base-16384 + BookSum
|