victormiller committed on
Commit 1ac16da
1 Parent(s): 9919a10

Update web.py

Files changed (1)
  1. web.py +1 -1
web.py CHANGED
@@ -285,7 +285,7 @@ def web_data():
   H2("Stage 1: Document Preparation"),
 
 
- P(B("Text Extraction: ")), """
+ P(B("Text Extraction: "), """
   Common Crawl provides webpage texts via two formats: WARC (Web ARChive format) and WET (WARC Encapsulated Text).
   WARC files contain the raw data from the crawl, which store the full HTTP response and request metadata.
   WET files contain plaintexts extracted by Common Crawl. In line with previous works ([1], [2], [3], [4]),
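The only change in this hunk is where the first closing parenthesis on line 288 falls. In the old line, `P(B("Text Extraction: "))` closes the paragraph right after the bold label, so the triple-quoted description is no longer an argument of `P`. The new line passes the description as a second child of `P`. The sketch below illustrates the difference using hypothetical stand-ins for `P`, `B`, and an enclosing `Div` (assumed here to behave like FastHTML-style HTML helpers; the real components and the actual enclosing call in web.py are not shown in this diff). Each stand-in simply records its tag name and children as a tuple.

```python
# Hypothetical stand-ins for the HTML helpers used in web.py.
# Each one returns (tag_name, children) so the tree shape is easy to inspect.
def B(*children):
    return ("b", children)


def P(*children):
    return ("p", children)


def Div(*children):
    # Assumed enclosing container; the real one in web_data() is not shown.
    return ("div", children)


TEXT = """
Common Crawl provides webpage texts via two formats: WARC and WET.
"""

# Old line: the ")" after B(...) also closes P, so the paragraph holds only
# the bold label and the description becomes a separate sibling.
before = Div(P(B("Text Extraction: ")), TEXT)

# New line: the description is passed as the second child of P.
after = Div(P(B("Text Extraction: "), TEXT))

if __name__ == "__main__":
    print(len(before[1]))  # 2: the paragraph and the bare string
    print(len(after[1]))   # 1: a single paragraph containing both
```

With the fix, the label and the description render inside the same paragraph element instead of the description being emitted outside it.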