Note: This repository has been archived by the owner on Apr 24, 2024. It is now read-only.

Scrapers

This directory contains scrapers for the following sources: PracticalPlants, Permapeople, Reinsaat, and Wikidata (German common names).

Requirements

  • Node.js v14.21.2
  • npm

Installation and Usage

Configure your environment:

  1. Install the dependencies:
npm install && mkdir -p data
  2. Create a .env.local file from .env.example and fill in the required values:
cp .env.example .env.local

Installation Option 1: With a single command

The following command will fetch the data from the sources, merge the datasets, apply the overrides and insert the data into the database:

npm run start:full

If you would like to skip the fetching steps and instead import your CSV files from Nextcloud, use the following command:

npm run start

Note: you will need the following files in the data directory:

  1. detail.csv - scraped from PracticalPlants
  2. permapeopleRawData.csv - scraped from Permapeople
  3. reinsaatRawData.csv - scraped from Reinsaat and merged from reinsaatRawDataEN.csv and reinsaatRawDataDE.csv
  4. germanCommonNames.csv - scraped from wikidata

Installation Option 2: Step by Step

The following steps describe how to use the scrapers to fetch the data from the sources and insert it into the database. The steps are simplified: only the most important commands are listed. For more information, refer to the documentation of the individual scrapers.

  1. Fetch the data

The scraper scrapes the data from the sources and stores it in CSV format in the data directory:

npm run fetch:practicalplants
npm run fetch:permapeople
npm run fetch:reinsaat && npm run merge:reinsaat

The scraped data is stored in the data directory:

  • detail.csv: This file contains the raw data scraped from the PracticalPlants webpage.
  • permapeopleRawData.csv: This file contains the raw data scraped from the Permapeople webpage.
  • reinsaatRawDataEN.csv: This file contains the raw data scraped from the English version of the Reinsaat webpage.
  • reinsaatRawDataDE.csv: This file contains the raw data scraped from the German version of the Reinsaat webpage.
  • reinsaatRawData.csv: This file contains the merged data from the English and German versions of the Reinsaat webpage.
  • germanCommonNames.csv: This file contains the German common names fetched from https://www.wikidata.org
  2. Merge the scraped datasets

The scraper also merges the scraped data from all sources and stores it in CSV format in the data directory:

  • mergedDatasets.csv: This file contains the merged datasets.

This can be done with the following command:

npm run merge:datasets
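
The merge can be sketched as follows. Assume each dataset has already been parsed into rows of plain objects and that `unique_name` is the shared key; the actual column names and precedence rules live in the scraper code:

```javascript
// Minimal sketch of the dataset merge: rows that share a unique_name
// are combined, later sources only filling in fields the earlier
// sources left empty. Column names here are illustrative.
function mergeDatasets(...datasets) {
  const merged = new Map();
  for (const rows of datasets) {
    for (const row of rows) {
      const existing = merged.get(row.unique_name) ?? {};
      for (const [key, value] of Object.entries(row)) {
        // Keep existing values; only fill fields that are still empty.
        if (existing[key] === undefined || existing[key] === '') {
          existing[key] = value;
        }
      }
      merged.set(row.unique_name, existing);
    }
  }
  return [...merged.values()];
}

const practicalplants = [{ unique_name: 'Daucus carota', width: '0.3' }];
const permapeople = [{ unique_name: 'Daucus carota', height: '0.6' }];
console.log(mergeDatasets(practicalplants, permapeople));
// → [{ unique_name: 'Daucus carota', width: '0.3', height: '0.6' }]
```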
  3. Fetch German common names

This step goes through all unique names from mergedDatasets.csv, fetches the German common names from https://www.wikidata.org concurrently, and then merges them into mergedDatasets.csv.

If it starts throwing 429 errors, reduce MAX_CONCURRENT_REQUESTS to a lower number, such as 10.

npm run fetch:germannames && npm run merge:germannames
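
The concurrency cap behaves roughly like this minimal sketch; `mapWithLimit` and the dummy task are illustrative stand-ins, not the scraper's actual code:

```javascript
// Sketch of the cap behind MAX_CONCURRENT_REQUESTS: run at most
// `limit` tasks at once so the remote server does not answer 429.
async function mapWithLimit(items, limit, fn) {
  const results = new Array(items.length);
  let next = 0;
  async function worker() {
    // Each worker repeatedly claims the next unprocessed index.
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, worker)
  );
  return results;
}

// Example with a dummy async task instead of a real HTTP request:
mapWithLimit(['Daucus carota', 'Allium cepa'], 10, async (name) => `name of ${name}`)
  .then((names) => console.log(names));
```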
  4. Apply overrides

The scraped data can contain inconsistencies and errors. To correct these mistakes, we can create override files: data/overrides may contain any number of CSV files, which are applied consecutively to mergedDatasets.csv to create finalDataset.csv.

For details see data/overrides/README.md

npm run apply:overrides
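
Conceptually, consecutive override application looks like this sketch; keying rows by `unique_name` is an assumption for illustration, and the real matching rules are documented in data/overrides/README.md:

```javascript
// Sketch of consecutive override application: each override file's rows
// are applied in order, so a later file can overwrite an earlier one.
function applyOverrides(rows, overrideFiles) {
  const byName = new Map(rows.map((row) => [row.unique_name, { ...row }]));
  for (const overrides of overrideFiles) {
    for (const override of overrides) {
      const row = byName.get(override.unique_name);
      if (!row) continue; // ignore overrides for unknown plants
      Object.assign(row, override);
    }
  }
  return [...byName.values()];
}

const merged = [{ unique_name: 'Daucus carota', height: '0.9' }];
const fixHeights = [{ unique_name: 'Daucus carota', height: '0.6' }];
console.log(applyOverrides(merged, [fixHeights]));
// → [{ unique_name: 'Daucus carota', height: '0.6' }]
```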
  5. Insert the data into the database

The scraper also inserts the scraped data into the database:

npm run insert:plants
  6. Insert relations into the database

The scraper inserts the relation data into the database.

First, download the Companions.csv and Antagonist.csv files from the Nextcloud server, or export them yourself from the current Plant_Relations.ods.
Copy them into the data directory and run:

npm run insert:relations