
Twin Peaks crawler

This crawler downloads texts and metadata from the Twin Peaks Fandom Wiki and saves them as JSON. It combines Scrapy and fandom-py.
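Below is a minimal sketch of what a single JSON output record might look like. It assumes each page is stored as one object with `title`, `url` and `text` fields. These field names, the `WikiPage` class and the `to_json` helper are hypothetical, and in the real crawler the values would come from fandom-py rather than being hard-coded.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class WikiPage:
    """One downloaded wiki page (hypothetical schema, not the crawler's actual one)."""
    title: str
    url: str
    text: str


def to_json(page: WikiPage) -> str:
    """Serialize a page to a JSON string, keeping non-ASCII characters readable."""
    return json.dumps(asdict(page), ensure_ascii=False)


# Illustrative values only; the real crawler fills these from the wiki.
page = WikiPage(
    title="Laura Palmer",
    url="https://twinpeaks.fandom.com/wiki/Laura_Palmer",
    text="(page text)",
)
print(to_json(page))
```

`ensure_ascii=False` keeps accented names legible in the saved files instead of escaping them as `\uXXXX` sequences.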

Several wiki pages are discarded because they are unrelated to the Twin Peaks plot and would add noise to the Question Answering index.
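A title-based filter is one simple way to implement this kind of discarding. The sketch below is only an illustration: the excluded prefixes and keywords are hypothetical and do not reproduce the crawler's real list of discarded pages.

```python
# Hypothetical exclusion rules; the real crawler's list of discarded pages may differ.
EXCLUDED_PREFIXES = ("Category:", "Template:", "User:", "File:")
EXCLUDED_KEYWORDS = ("merchandise", "soundtrack")


def is_relevant(title: str) -> bool:
    """Return True if a page title looks plot-related under the rules above."""
    # Skip wiki namespace pages (categories, templates, user pages, files).
    if title.startswith(EXCLUDED_PREFIXES):
        return False
    # Skip pages whose title mentions an off-topic keyword (case-insensitive).
    lowered = title.lower()
    return not any(word in lowered for word in EXCLUDED_KEYWORDS)


titles = ["Laura Palmer", "Category:Characters", "Twin Peaks soundtrack", "Black Lodge"]
print([t for t in titles if is_relevant(t)])
# → ['Laura Palmer', 'Black Lodge']
```

Filtering on titles is cheap because it runs before any page text is fetched; rules that need the page body would have to run after the download.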

Installation

Copy this folder (if needed, see Stack Overflow).
pip install -r requirements.txt

Usage

(if needed, activate the virtual environment)
cd tpcrawler
scrapy crawl tpcrawler
The downloaded pages are saved in the data subfolder.
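After a crawl, the output can be loaded for inspection or indexing. This sketch assumes that each page is written as a separate `.json` file holding a single JSON object. That file layout and the `load_pages` helper are assumptions, so adjust the glob pattern if the crawler groups pages differently.

```python
import json
from pathlib import Path


def load_pages(data_dir: str = "data") -> list[dict]:
    """Load every *.json file in data_dir, assuming one JSON object per file."""
    pages = []
    # sorted() gives a stable, filename-based order across runs.
    for path in sorted(Path(data_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            pages.append(json.load(f))
    return pages


if __name__ == "__main__":
    # Run from the directory that contains the data subfolder.
    pages = load_pages()
    print(f"Loaded {len(pages)} pages")
```

A missing `data` folder simply yields an empty list, because `Path.glob` returns no matches for a nonexistent directory.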