import streamlit as st

# Page configuration
st.set_page_config(
    layout="wide", 
    initial_sidebar_state="auto"
)

# Custom CSS for better styling
st.markdown("""

    <style>

        .main-title {

            font-size: 36px;

            color: #4A90E2;

            font-weight: bold;

            text-align: center;

        }

        .sub-title {

            font-size: 24px;

            color: #4A90E2;

            margin-top: 20px;

        }

        .section {

            background-color: #f9f9f9;

            padding: 15px;

            border-radius: 10px;

            margin-top: 20px;

        }

        .section h2 {

            font-size: 22px;

            color: #4A90E2;

        }

        .section p, .section ul {

            color: #666666;

        }

        .link {

            color: #4A90E2;

            text-decoration: none;

        }

        .benchmark-table {

            width: 100%;

            border-collapse: collapse;

            margin-top: 20px;

        }

        .benchmark-table th, .benchmark-table td {

            border: 1px solid #ddd;

            padding: 8px;

            text-align: left;

        }

        .benchmark-table th {

            background-color: #4A90E2;

            color: white;

        }

        .benchmark-table td {

            background-color: #f2f2f2;

        }

    </style>

""", unsafe_allow_html=True)

# Title
st.markdown('<div class="main-title">Introduction to RoBERTa Annotators in Spark NLP</div>', unsafe_allow_html=True)

# Subtitle
st.markdown("""

<div class="section">

    <p>RoBERTa (A Robustly Optimized BERT Pretraining Approach) builds on BERT's language model by modifying key hyperparameters and pretraining techniques to enhance its performance, and achieves state-of-the-art results on a wide range of NLP tasks. Below, we provide an overview of the RoBERTa annotators for token classification, zero-shot classification, sequence classification, and question answering:</p>

</div>

""", unsafe_allow_html=True)

tab1, tab2, tab3, tab4 = st.tabs(["RoBERTa for Token Classification", "RoBERTa for Zero Shot Classification", "RoBERTa for Sequence Classification", "RoBERTa for Question Answering"])

with tab1:
    st.markdown("""

    <div class="section">

        <h2>RoBERTa for Token Classification</h2>

        <p>The <strong>RoBertaForTokenClassification</strong> annotator is designed for Named Entity Recognition (NER) tasks using the RoBERTa model. This pretrained model is adapted from a Hugging Face model and imported into Spark NLP, offering robust performance in identifying and classifying entities in text. The RoBERTa model, with its large-scale pretraining, delivers state-of-the-art results on NER tasks.</p>

        <p>Token classification with RoBERTa enables:</p>

        <ul>

            <li><strong>Named Entity Recognition (NER):</strong> Identifying and classifying entities such as miscellaneous (MISC), organizations (ORG), locations (LOC), and persons (PER).</li>

            <li><strong>Information Extraction:</strong> Extracting key information from unstructured text for further analysis.</li>

            <li><strong>Text Categorization:</strong> Enhancing document retrieval and categorization based on entity recognition.</li>

        </ul>

        <p>Here is an example of how RoBERTa token classification works:</p>

        <table class="benchmark-table">

            <tr>

                <th>Entity</th>

                <th>Label</th>

            </tr>

            <tr>

                <td>Apple</td>

                <td>ORG</td>

            </tr>

            <tr>

                <td>Elon Musk</td>

                <td>PER</td>

            </tr>

            <tr>

                <td>California</td>

                <td>LOC</td>

            </tr>

        </table>

    </div>

    """, unsafe_allow_html=True)

    # RoBERTa Token Classification - NER Large
    st.markdown('<div class="sub-title">RoBERTa Token Classification - NER Large</div>', unsafe_allow_html=True)
    st.markdown("""

    <div class="section">

        <p>The <strong>roberta_ner_roberta_large_ner_english</strong> model is a RoBERTa large model fine-tuned for token classification, specifically Named Entity Recognition (NER) on English text. It recognizes four entity types: locations (LOC), organizations (ORG), persons (PER), and miscellaneous (MISC).</p>

    </div>

    """, unsafe_allow_html=True)

    # How to Use the Model - Token Classification
    st.markdown('<div class="sub-title">How to Use the Model</div>', unsafe_allow_html=True)
    st.code('''

    from sparknlp.base import *

    from sparknlp.annotator import *

    from pyspark.ml import Pipeline

    from pyspark.sql.functions import col, expr



    document_assembler = DocumentAssembler() \\

        .setInputCol("text") \\

        .setOutputCol("document")



    sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\\

        .setInputCols(["document"])\\

        .setOutputCol("sentence")



    tokenizer = Tokenizer() \\

        .setInputCols(["sentence"]) \\

        .setOutputCol("token")



    tokenClassifier = RoBertaForTokenClassification \\

        .pretrained("roberta_ner_roberta_large_ner_english", "en") \\

        .setInputCols(["sentence", "token"]) \\

        .setOutputCol("ner")



    ner_converter = NerConverter() \\

        .setInputCols(['sentence', 'token', 'ner']) \\

        .setOutputCol('entities')



    pipeline = Pipeline(stages=[

        document_assembler,

        sentenceDetector,

        tokenizer,

        tokenClassifier,

        ner_converter

    ])



    data = spark.createDataFrame([["William Henry Gates III (born October 28, 1955) is an American business magnate, software developer, investor, and philanthropist. He is best known as the co-founder of Microsoft Corporation. During his career at Microsoft, Gates held the positions of chairman, chief executive officer (CEO), president and chief software architect, while also being the largest individual shareholder until May 2014. He is one of the best-known entrepreneurs and pioneers of the microcomputer revolution of the 1970s and 1980s. Born and raised in Seattle, Washington, Gates co-founded Microsoft with childhood friend Paul Allen in 1975, in Albuquerque, New Mexico; it went on to become the world's largest personal computer software company. Gates led the company as chairman and CEO until stepping down as CEO in January 2000, but he remained chairman and became chief software architect. During the late 1990s, Gates had been criticized for his business tactics, which have been considered anti-competitive. This opinion has been upheld by numerous court rulings. In June 2006, Gates announced that he would be transitioning to a part-time role at Microsoft and full-time work at the Bill & Melinda Gates Foundation, the private charitable foundation that he and his wife, Melinda Gates, established in 2000.[9] He gradually transferred his duties to Ray Ozzie and Craig Mundie. He stepped down as chairman of Microsoft in February 2014 and assumed a new post as technology adviser to support the newly appointed CEO Satya Nadella."]]).toDF("text")

    result = pipeline.fit(data).transform(data)

            

    result.select(

        expr("explode(entities) as ner_chunk")

    ).select(

        col("ner_chunk.result").alias("chunk"),

        col("ner_chunk.metadata.entity").alias("ner_label")

    ).show(truncate=False)

    ''', language='python')

    # Results
    st.text("""

    +-------------------------------+---------+

    |chunk                          |ner_label|

    +-------------------------------+---------+

    |William Henry Gates III        |R        |

    |American                       |SC       |

    |Microsoft Corporation          |G        |

    |Microsoft                      |G        |

    |Gates                          |R        |

    |Seattle                        |C        |

    |Washington                     |C        |

    |Gates co-founded Microsoft     |R        |

    |Paul Allen                     |R        |

    |Albuquerque                    |C        |

    |New Mexico                     |C        |

    |Gates                          |R        |

    |Gates                          |R        |

    |Gates                          |R        |

    |Microsoft                      |G        |

    |Bill & Melinda Gates Foundation|G        |

    |Melinda Gates                  |R        |

    |Ray Ozzie                      |R        |

    |Craig Mundie                   |R        |

    |Microsoft                      |G        |

    +-------------------------------+---------+

    """)

    # Model Info Section
    st.markdown('<div class="sub-title">Model Info</div>', unsafe_allow_html=True)
    st.markdown("""

    <div class="section">

        <ul>

            <li><strong>Model Name:</strong> roberta_ner_roberta_large_ner_english</li>

            <li><strong>Compatibility:</strong> Spark NLP 3.4.2+</li>

            <li><strong>License:</strong> Open Source</li>

            <li><strong>Edition:</strong> Official</li>

            <li><strong>Input Labels:</strong> [document, token]</li>

            <li><strong>Output Labels:</strong> [ner]</li>

            <li><strong>Language:</strong> English (en)</li>

            <li><strong>Size:</strong> 1.3 GB</li>

            <li><strong>Case Sensitive:</strong> True</li>

            <li><strong>Max Sentence Length:</strong> 128</li>

        </ul>

    </div>

    """, unsafe_allow_html=True)

    # References Section
    st.markdown('<div class="sub-title">References</div>', unsafe_allow_html=True)
    st.markdown("""

    <div class="section">

        <ul>

            <li><a class="link" href="https://huggingface.co/Jean-Baptiste/roberta-large-ner-english" target="_blank">Jean-Baptiste's RoBERTa NER Model on Hugging Face</a></li>

            <li><a class="link" href="https://medium.com/@jean-baptiste.polle/lstm-model-for-email-signature-detection-8e990384fefa" target="_blank">LSTM Model for Email Signature Detection</a></li>

        </ul>

    </div>

    """, unsafe_allow_html=True)

with tab2:
    # RoBERTa Zero-Shot Classification
    st.markdown("""

    <div class="section">

        <h2>RoBERTa for Zero-Shot Classification</h2>

        <p>The <strong>RoBertaForZeroShotClassification</strong> annotator performs zero-shot text classification on English text. It uses the RoBERTa Base architecture fine-tuned on Natural Language Inference (NLI) tasks, allowing it to classify text into labels it has not seen during training.</p>

        <p>Key features of this model include:</p>

        <ul>

            <li><strong>Zero-Shot Classification:</strong> Classify text into dynamic categories defined at runtime without requiring predefined classes.</li>

            <li><strong>Flexibility:</strong> Adjusts to different classification scenarios by specifying candidate labels as needed.</li>

            <li><strong>Model Foundation:</strong> Based on RoBERTa and fine-tuned with NLI data for robust performance across various tasks.</li>

        </ul>

        <p>This model is ideal for applications where predefined categories are not available or frequently change, offering flexibility and adaptability in text classification tasks.</p>

        <table class="benchmark-table">

            <tr>

                <th>Text</th>

                <th>Predicted Category</th>

            </tr>

            <tr>

                <td>"I have a problem with my iPhone that needs to be resolved ASAP!!"</td>

                <td>Urgent</td>

            </tr>

            <tr>

                <td>"The latest advancements in technology are fascinating."</td>

                <td>Technology</td>

            </tr>

        </table>

    </div>

    """, unsafe_allow_html=True)

    # RoBERTa Zero-Shot Classification Base - NLI
    st.markdown('<div class="sub-title">RoBERTa Zero-Shot Classification Base - NLI</div>', unsafe_allow_html=True)
    st.markdown("""

    <div class="section">

        <p>The <strong>roberta_base_zero_shot_classifier_nli</strong> model is tailored for zero-shot text classification tasks, enabling dynamic classification based on labels specified at runtime. Fine-tuned on Natural Language Inference (NLI) tasks, this model leverages the RoBERTa architecture to provide flexible and robust classification capabilities.</p>

    </div>

    """, unsafe_allow_html=True)

    # How to Use the Model - Zero-Shot Classification
    st.markdown('<div class="sub-title">How to Use the Model</div>', unsafe_allow_html=True)
    st.code('''

    from sparknlp.base import *

    from sparknlp.annotator import *

    from pyspark.ml import Pipeline



    document_assembler = DocumentAssembler() \\

        .setInputCol('text') \\

        .setOutputCol('document')



    tokenizer = Tokenizer() \\

        .setInputCols(['document']) \\

        .setOutputCol('token')



    zeroShotClassifier = RoBertaForZeroShotClassification \\

        .pretrained('roberta_base_zero_shot_classifier_nli', 'en') \\

        .setInputCols(['token', 'document']) \\

        .setOutputCol('class') \\

        .setCaseSensitive(False) \\

        .setMaxSentenceLength(512) \\

        .setCandidateLabels(["urgent", "mobile", "travel", "movie", "music", "sport", "weather", "technology"])



    pipeline = Pipeline(stages=[

        document_assembler,

        tokenizer,

        zeroShotClassifier

    ])



    example = spark.createDataFrame([['I have a problem with my iPhone that needs to be resolved ASAP!!']]).toDF("text")

    result = pipeline.fit(example).transform(example)



    result.select('document.result', 'class.result').show(truncate=False)

    ''', language='python')

    st.text("""

    +------------------------------------------------------------------+------------+

    |result                                                            |result      |

    +------------------------------------------------------------------+------------+

    |[I have a problem with my iPhone that needs to be resolved ASAP!!]|[technology]|

    +------------------------------------------------------------------+------------+

    """)

    # Model Information - Zero-Shot Classification
    st.markdown('<div class="sub-title">Model Information</div>', unsafe_allow_html=True)
    st.markdown("""

        <table class="benchmark-table">

            <tr>

                <th>Attribute</th>

                <th>Description</th>

            </tr>

            <tr>

                <td><strong>Model Name</strong></td>

                <td>roberta_base_zero_shot_classifier_nli</td>

            </tr>

            <tr>

                <td><strong>Compatibility</strong></td>

                <td>Spark NLP 4.4.2+</td>

            </tr>

            <tr>

                <td><strong>License</strong></td>

                <td>Open Source</td>

            </tr>

            <tr>

                <td><strong>Edition</strong></td>

                <td>Official</td>

            </tr>

            <tr>

                <td><strong>Input Labels</strong></td>

                <td>[token, document]</td>

            </tr>

            <tr>

                <td><strong>Output Labels</strong></td>

                <td>[multi_class]</td>

            </tr>

            <tr>

                <td><strong>Language</strong></td>

                <td>en</td>

            </tr>

            <tr>

                <td><strong>Size</strong></td>

                <td>466.4 MB</td>

            </tr>

            <tr>

                <td><strong>Case Sensitive</strong></td>

                <td>true</td>

            </tr>

        </table>

    """, unsafe_allow_html=True)

    # References - Zero-Shot Classification
    st.markdown('<div class="sub-title">References</div>', unsafe_allow_html=True)
    st.markdown("""

    <div class="section">

        <ul>

            <li><a class="link" href="https://github.com/huggingface/transformers" target="_blank" rel="noopener">Hugging Face Transformers</a></li>

            <li><a class="link" href="https://arxiv.org/abs/1905.05583" target="_blank" rel="noopener">RoBERTa: A Robustly Optimized BERT Pretraining Approach</a></li>

            <li><a class="link" href="https://huggingface.co/roberta-base" target="_blank" rel="noopener">Hugging Face RoBERTa Models</a></li>

        </ul>

    </div>

    """, unsafe_allow_html=True)

with tab3:
    # RoBERTa Sequence Classification
    st.markdown("""

    <div class="section">

        <h2>RoBERTa for Sequence Classification</h2>

        <p>The <strong>RoBertaForSequenceClassification</strong> annotator is designed for tasks such as sentiment analysis and sequence classification using the RoBERTa model. This model handles classification tasks efficiently and is adapted for production-readiness with Spark NLP.</p>

        <p>Sequence classification with RoBERTa enables:</p>

        <ul>

            <li><strong>Sentiment Analysis:</strong> Determining sentiment expressed in text as negative, neutral, or positive.</li>

            <li><strong>Text Classification:</strong> Categorizing text into predefined classes such as sentiment or topic categories.</li>

            <li><strong>Document Analysis:</strong> Enhancing the analysis and categorization of documents based on content.</li>

        </ul>

        <p>Here is an example of how RoBERTa sequence classification works:</p>

        <table class="benchmark-table">

            <tr>

                <th>Text</th>

                <th>Label</th>

            </tr>

            <tr>

                <td>The new RoBERTa model shows significant improvements in performance.</td>

                <td>Positive</td>

            </tr>

            <tr>

                <td>The training was not very effective and did not yield desired results.</td>

                <td>Negative</td>

            </tr>

            <tr>

                <td>The overall feedback on the new features has been mixed.</td>

                <td>Neutral</td>

            </tr>

        </table>

    </div>

    """, unsafe_allow_html=True)

    # RoBERTa Sequence Classification - ACTS Feedback1
    st.markdown('<div class="sub-title">RoBERTa Sequence Classification - ACTS Feedback1</div>', unsafe_allow_html=True)
    st.markdown("""

    <div class="section">

        <p>The <strong>roberta_classifier_acts_feedback1</strong> model is a fine-tuned RoBERTa model for sequence classification tasks, specifically adapted for English text. This model was originally trained by mp6kv and is curated to provide scalability and production-readiness using Spark NLP. It can classify text into three categories: negative, neutral, and positive.</p>

    </div>

    """, unsafe_allow_html=True)

    # How to Use the Model - Sequence Classification
    st.markdown('<div class="sub-title">How to Use the Model</div>', unsafe_allow_html=True)
    st.code('''

    from sparknlp.base import *

    from sparknlp.annotator import *

    from pyspark.ml import Pipeline



    document_assembler = DocumentAssembler() \\

        .setInputCol("text") \\

        .setOutputCol("document")



    tokenizer = Tokenizer() \\

        .setInputCols("document") \\

        .setOutputCol("token")



    seq_classifier = RoBertaForSequenceClassification \\

        .pretrained("roberta_classifier_acts_feedback1", "en") \\

        .setInputCols(["document", "token"]) \\

        .setOutputCol("class")



    pipeline = Pipeline(stages=[document_assembler, tokenizer, seq_classifier])



    data = spark.createDataFrame([["I had a fantastic day at the park with my friends and family, enjoying the beautiful weather and fun activities."]]).toDF("text")



    result = pipeline.fit(data).transform(data)



    result.select('class.result').show(truncate=False)

    ''', language='python')

    # Results
    st.text("""

    +----------+

    |result    |

    +----------+

    |[positive]|

    +----------+

    """)

    # Model Info Section
    st.markdown('<div class="sub-title">Model Info</div>', unsafe_allow_html=True)
    st.markdown("""

    <div class="section">

        <ul>

            <li><strong>Model Name:</strong> roberta_classifier_acts_feedback1</li>

            <li><strong>Compatibility:</strong> Spark NLP 5.2.0+</li>

            <li><strong>License:</strong> Open Source</li>

            <li><strong>Edition:</strong> Official</li>

            <li><strong>Input Labels:</strong> [document, token]</li>

            <li><strong>Output Labels:</strong> [class]</li>

            <li><strong>Language:</strong> en</li>

            <li><strong>Size:</strong> 424.8 MB</li>

            <li><strong>Case Sensitive:</strong> True</li>

            <li><strong>Max Sentence Length:</strong> 256</li>

        </ul>

    </div>

    """, unsafe_allow_html=True)

    # References Section
    st.markdown('<div class="sub-title">References</div>', unsafe_allow_html=True)
    st.markdown("""

    <div class="section">

        <ul>

            <li><a class="link" href="https://huggingface.co/mp6kv/ACTS_feedback1" target="_blank" rel="noopener">ACTS Feedback1 Model on Hugging Face</a></li>

            <li><a class="link" href="https://arxiv.org/abs/1907.11692" target="_blank" rel="noopener">RoBERTa: A Robustly Optimized BERT Pretraining Approach</a></li>

            <li><a class="link" href="https://github.com/huggingface/transformers" target="_blank" rel="noopener">Hugging Face Transformers</a></li>

        </ul>

    </div>

    """, unsafe_allow_html=True)

with tab4:
    st.markdown("""

    <div class="section">

        <h2>RoBERTa for Question Answering</h2>

        <p>The <strong>RoBertaForQuestionAnswering</strong> annotator is designed for extracting answers from a given context based on a specific question. This model leverages RoBERTa's capabilities to accurately find and provide answers, making it suitable for applications that require detailed information retrieval. Question answering with RoBERTa is especially useful for:</p>

        <ul>

            <li><strong>Building Advanced QA Systems:</strong> Developing systems capable of answering user queries with high accuracy.</li>

            <li><strong>Enhancing Customer Service:</strong> Providing precise answers to customer questions in support environments.</li>

            <li><strong>Improving Information Retrieval:</strong> Extracting specific answers from large text corpora.</li>

        </ul>

        <p>Utilizing this annotator can significantly enhance your ability to retrieve and deliver accurate answers from text data.</p>

        <table class="benchmark-table">

            <tr>

                <th>Context</th>

                <th>Question</th>

                <th>Predicted Answer</th>

            </tr>

            <tr>

                <td>"The Eiffel Tower is one of the most recognizable structures in the world. It was constructed in 1889 as the entrance arch to the 1889 World's Fair held in Paris, France."</td>

                <td>"When was the Eiffel Tower constructed?"</td>

                <td>1889</td>

            </tr>

            <tr>

                <td>"The Amazon rainforest, also known as Amazonia, is a vast tropical rainforest in South America. It is home to an incredible diversity of flora and fauna."</td>

                <td>"What is the Amazon rainforest also known as?"</td>

                <td>Amazonia</td>

            </tr>

            <tr>

                <td>"The Great Wall of China is a series of fortifications made of various materials, stretching over 13,000 miles across northern China."</td>

                <td>"How long is the Great Wall of China?"</td>

                <td>13,000 miles</td>

            </tr>

        </table>

    </div>

    """, unsafe_allow_html=True)

    # RoBERTa for Question Answering - icebert_finetuned_squad_10
    st.markdown('<div class="sub-title">RoBERTa Question Answering - icebert_finetuned_squad_10</div>', unsafe_allow_html=True)
    st.markdown("""

    <div class="section">

        <p>The <strong>icebert_finetuned_squad_10</strong> model is a pretrained RoBERTa-based model, adapted from Hugging Face and fine-tuned for question-answering tasks. Originally trained by gudjonk93, it has been curated for scalability and production-readiness using Spark NLP.</p>

    </div>

    """, unsafe_allow_html=True)

    # How to Use the Model - Question Answering
    st.markdown('<div class="sub-title">How to Use the Model</div>', unsafe_allow_html=True)
    st.code('''

    from sparknlp.base import *

    from sparknlp.annotator import *

    from pyspark.ml import Pipeline



    # Document Assembler

    document_assembler = MultiDocumentAssembler() \\

        .setInputCols(["question", "context"]) \\

        .setOutputCols(["document_question", "document_context"])



    # RoBertaForQuestionAnswering

    spanClassifier = RoBertaForQuestionAnswering.pretrained("icebert_finetuned_squad_10", "en") \\

        .setInputCols(["document_question", "document_context"]) \\

        .setOutputCol("answer")



    # Pipeline

    pipeline = Pipeline().setStages([

        document_assembler,

        spanClassifier

    ])



    # Create example DataFrame

    example = spark.createDataFrame([

        ["What's my name?", "My name is Clara and I live in Berkeley."]

    ]).toDF("question", "context")



    # Fit and transform the data

    pipelineModel = pipeline.fit(example)

    result = pipelineModel.transform(example)



    # Show results

    result.select('document_question.result', 'answer.result').show(truncate=False)

    ''', language='python')

    st.text("""

    +-----------------+-------+

    |result           |result |

    +-----------------+-------+

    |[What's my name?]|[Clara]|

    +-----------------+-------+

    """)

    # Model Information - Question Answering
    st.markdown('<div class="sub-title">Model Information</div>', unsafe_allow_html=True)
    st.markdown("""

    <table class="benchmark-table">

        <tr>

            <th>Attribute</th>

            <th>Description</th>

        </tr>

        <tr>

            <td><strong>Model Name</strong></td>

            <td>icebert_finetuned_squad_10</td>

        </tr>

        <tr>

            <td><strong>Compatibility</strong></td>

            <td>Spark NLP 5.2.1+</td>

        </tr>

        <tr>

            <td><strong>License</strong></td>

            <td>Open Source</td>

        </tr>

        <tr>

            <td><strong>Edition</strong></td>

            <td>Official</td>

        </tr>

        <tr>

            <td><strong>Input Labels</strong></td>

            <td>[document_question, document_context]</td>

        </tr>

        <tr>

            <td><strong>Output Labels</strong></td>

            <td>[answer]</td>

        </tr>

        <tr>

            <td><strong>Language</strong></td>

            <td>en</td>

        </tr>

        <tr>

            <td><strong>Size</strong></td>

            <td>450.4 MB</td>

        </tr>

    </table>

    """, unsafe_allow_html=True)

    # References - Question Answering
    st.markdown('<div class="sub-title">References</div>', unsafe_allow_html=True)
    st.markdown("""

    <div class="section">

        <ul>

            <li><a class="link" href="https://huggingface.co/gudjonk93/IceBERT-finetuned-squad-10" target="_blank" rel="noopener">IceBERT Model on Hugging Face</a></li>

            <li><a class="link" href="https://arxiv.org/abs/1810.04805" target="_blank" rel="noopener">BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</a></li>

            <li><a class="link" href="https://github.com/google-research/bert" target="_blank" rel="noopener">Google Research BERT</a></li>

        </ul>

    </div>

    """, unsafe_allow_html=True)

# Community & Support
st.markdown('<div class="sub-title">Community & Support</div>', unsafe_allow_html=True)
st.markdown("""

<div class="section">

    <ul>

        <li><a class="link" href="https://sparknlp.org/" target="_blank">Official Website</a>: Documentation and examples</li>

        <li><a class="link" href="https://join.slack.com/t/spark-nlp/shared_invite/zt-198dipu77-L3UWNe_AJ8xqDk0ivmih5Q" target="_blank">Slack</a>: Live discussion with the community and team</li>

        <li><a class="link" href="https://github.com/JohnSnowLabs/spark-nlp" target="_blank">GitHub</a>: Bug reports, feature requests, and contributions</li>

        <li><a class="link" href="https://medium.com/spark-nlp" target="_blank">Medium</a>: Spark NLP articles</li>

        <li><a class="link" href="https://www.youtube.com/channel/UCmFOjlpYEhxf_wJUDuz6xxQ/videos" target="_blank">YouTube</a>: Video tutorials</li>

    </ul>

</div>

""", unsafe_allow_html=True)