- Django application
- Web crawling (Scrapy)
An easy way to search the FreeBSD pkg-fallout reports.
Be nice!
Install all requirements:
django
requests
scrapy
djangorestframework
python-dateutil
dnspython
Copy the sample settings file to portsfallout/settings.py
and configure your database access:
$ cp portsfallout/settings_dev.py portsfallout/settings.py
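The database access is configured through Django's standard DATABASES setting. The excerpt below is only a sketch of what that part of portsfallout/settings.py might look like: the SQLite file name and the commented PostgreSQL credentials are placeholders, not values shipped with the project.

```python
# Hypothetical excerpt of portsfallout/settings.py.
# All names and credentials below are placeholders (assumptions).

DATABASES = {
    "default": {
        # SQLite needs no server and is enough for a local test instance.
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": "portsfallout.sqlite3",
    }
}

# Alternative for a PostgreSQL server (requires a driver such as psycopg2):
# DATABASES = {
#     "default": {
#         "ENGINE": "django.db.backends.postgresql",
#         "NAME": "portsfallout",
#         "USER": "portsfallout",
#         "PASSWORD": "change-me",
#         "HOST": "127.0.0.1",
#         "PORT": "5432",
#     }
# }
```

Whichever backend is chosen, the migrate step applies the same migrations.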
Create initial database:
$ python manage.py migrate
Operations to perform:
Apply all migrations: admin, auth, contenttypes, ports, sessions
Running migrations:
Applying contenttypes.0001_initial... OK
Applying auth.0001_initial... OK
Applying admin.0001_initial... OK
Applying admin.0002_logentry_remove_auto_add... OK
Applying admin.0003_logentry_add_action_flag_choices... OK
Applying contenttypes.0002_remove_content_type_name... OK
Applying auth.0002_alter_permission_name_max_length... OK
Applying auth.0003_alter_user_email_max_length... OK
Applying auth.0004_alter_user_username_opts... OK
Applying auth.0005_alter_user_last_login_null... OK
Applying auth.0006_require_contenttypes_0002... OK
Applying auth.0007_alter_validators_add_error_messages... OK
Applying auth.0008_alter_user_username_max_length... OK
Applying auth.0009_alter_user_last_name_max_length... OK
Applying auth.0010_alter_group_name_max_length... OK
Applying auth.0011_update_proxy_permissions... OK
Applying ports.0001_initial... OK
Applying sessions.0001_initial... OK
Populate database (ports and fallout info):
$ ./scripts/cron-import-index.sh
$ ./scripts/cron-scrapy.sh
Start web-server:
$ python manage.py runserver
You can also fetch older fallouts:
$ cd scripts
Crawl the messages from a specific month (verbose output):
$ scrapy runspider -O scrapy_output/2021-May.json \
-a scrapydate="2021-May" pkgfallout_scrapy_spider.py
Then import all .json files to database:
$ python import-scrapy.py
More info in scripts/pkgfallout_scrapy_spider.py.
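When several older months have to be fetched, typing the scrapy command by hand gets repetitive. The helper below is a small sketch that only builds the same command line shown above for a given month tag. The function name build_crawl_command is hypothetical and not part of the project; it is meant to be run from the scripts directory.

```python
import shlex

# Spider file name as used in the command above.
SPIDER = "pkgfallout_scrapy_spider.py"


def build_crawl_command(month_tag):
    """Return the scrapy argument list for one month, e.g. "2021-May"."""
    return [
        "scrapy", "runspider",
        "-O", f"scrapy_output/{month_tag}.json",  # -O overwrites the output file
        "-a", f"scrapydate={month_tag}",
        SPIDER,
    ]


if __name__ == "__main__":
    # Add more month tags to this list to back-fill a longer period.
    for tag in ["2021-May"]:
        print(shlex.join(build_crawl_command(tag)))
```

The printed lines can be reviewed and pasted into a shell, or the argument list can be passed to subprocess.run. Spacing the requests out keeps the load on the archive server low.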
Cron entries for keeping the database up to date:
# Update ports tree reference in the database
30 0 * * * /portsfallout/scripts/cron-import-index.sh
# Fetch/import all pkg-fallout reports from the Mlmmj archive of the
# current month. Requests are cached, so only new fallouts are fetched.
45 0 * * * /portsfallout/scripts/cron-scrapy.sh
# Fetch/import pkg-fallout reports from the last month
30 10 * * * /portsfallout/scripts/cron-scrapy.sh lastmonth
# Update DNS values of the pkg-fallout servers
45 3 * * * python manage.py server_update
45 3 * * * python manage.py server_update -v 0 # no output
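How cron-scrapy.sh resolves the lastmonth argument is not shown here. As a rough illustration only, the previous month and its tag could be computed as below. The sketch assumes the tag is the year followed by the full English month name, which matches the "2021-May" example but is not confirmed by it, and both function names are hypothetical.

```python
from datetime import date, timedelta

# Fixed English names, so the result does not depend on the system locale.
MONTH_NAMES = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def previous_month(today):
    """Return the first day of the month before the one containing `today`."""
    first_of_this_month = today.replace(day=1)
    # One day before the 1st is always inside the previous month.
    return (first_of_this_month - timedelta(days=1)).replace(day=1)


def month_tag(day):
    """Format a date as a month tag (assumed format: YYYY-MonthName)."""
    return f"{day.year}-{MONTH_NAMES[day.month - 1]}"


if __name__ == "__main__":
    print(month_tag(previous_month(date.today())))
```

Stepping back from the first day of the current month avoids the pitfalls of subtracting a fixed number of days, and it handles the January to December rollover.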