kybeka/ir-babes

Information Retrieval USI INF 2023

Topic:

  • Artificial Intelligence News

Authors:

  • Kyla Kaplan
  • Elvira Baltasar

Libraries:

  • Scrapy
  • PyTerrier

Websites scraped:

File Breakdown (temporary)

  • ir // main folder
    • scrapy.cfg // deployment configuration file

    • middlewares.py // project middlewares file

    • pipelines.py // project pipelines file

    • settings.py // project settings file

    • spiders/ // where the spiders are stored

      • __init__.py // spider package initializer
      • example.py // example spider

Execution

To run a spider, replace NAMEOFSPIDER with the spider's name:
- $ scrapy crawl NAMEOFSPIDER -o results.jsonl
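
With the `.jsonl` extension, Scrapy writes the scraped items as JSON Lines, one JSON object per line. A minimal sketch for reading that file back in Python, using only the standard library, might look like the following. The helper name `load_items` is hypothetical and not part of this repository, and the fields in each item depend on what the spider yields.

```python
import json
from pathlib import Path


def load_items(path):
    """Read a JSON Lines file and return a list with one dict per non-empty line.

    Hypothetical helper for illustration; not part of this repository.
    """
    items = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                items.append(json.loads(line))
    return items
```

For example, `load_items("results.jsonl")` would return the scraped items as a list of dictionaries, ready to be inspected or passed on to an indexing step.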