GV scraper gathers bus information from Grande Vitória and Espírito Santo state, by different companies and sources, in different formats, for use with osm2gtfs

Skippern/GV-scraper

GV Scraper

GV Scraper gathers bus information for Grande Vitória from different companies and sources, in different formats, for use with osm2gtfs.

For Expresso Lorenzutti and Sanremo, the scraper script downloads the timetable PDF files published on the companies' web sites, extracts the timetables from them, and stores the result as a JSON file.

For Transcol and Seletivo, it uses the same JSON interface that the Ceturb site uses.

For Planeta, timetables are published as HTML tables. Each variation is treated as a separate route, with the page index used as the ref tag of the route.
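As a rough illustration of reading departure times out of an HTML table, here is a minimal sketch. The class name `TimetableParser`, the sample markup and the assumption that each `td` cell holds one `HH:MM` time are all hypothetical; the real Planeta pages may be structured differently. The sketch is written against the Python 3 standard library (under Python 2.7 the same class lives in the `HTMLParser` module).

```python
import re
from html.parser import HTMLParser

# Matches a 24-hour clock time such as 05:30 or 23:50.
TIME_RE = re.compile(r"^([01]?\d|2[0-3]):([0-5]\d)$")


class TimetableParser(HTMLParser):
    """Collect HH:MM strings found inside <td> cells (hypothetical layout)."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.in_cell = False
        self.times = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        text = data.strip()
        # Keep only cell text that looks like a time; skip headers and fillers.
        if self.in_cell and TIME_RE.match(text):
            self.times.append(text)


# Made-up sample markup, not taken from the Planeta site.
sample_html = (
    "<table><tr><th>Departures</th></tr>"
    "<tr><td>05:30</td><td>06:15</td></tr>"
    "<tr><td>-</td><td>23:50</td></tr></table>"
)

parser = TimetableParser()
parser.feed(sample_html)
times = parser.times
print(times)
```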

The JSON format is developed in collaboration with the developers of osm2gtfs to ensure full functionality.

Requirements

The script requires the pdfminer, requests, overpass, logging, json, workalendar and datetime Python modules, and runs under Python 2.7.

Install dependencies by running

pip install -r requirements.txt
  • osrm needs to be installed manually. If it is not installed, or if the installed module cannot be imported, the scripts fall back to YOURS over requests.
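An optional dependency like this is commonly handled with a guarded import. The following is a minimal sketch of that pattern, not the repository's actual code; the `ROUTER` name is hypothetical.

```python
# Prefer the osrm module when it imports cleanly; otherwise use YOURS.
try:
    import osrm  # optional, installed manually
    ROUTER = "osrm"
except Exception:  # a missing module as well as a broken installation
    osrm = None
    ROUTER = "yours"

print("Routing engine: %s" % ROUTER)
```

Catching `Exception` rather than only `ImportError` is a deliberate choice here, so that an installation that is present but fails while importing also triggers the fallback.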

Usage

In each folder, run get_durations.py to obtain the duration of the routes. Once durations.json is up to date, run get_times.py to generate a times.json file for osm2gtfs.

Durations

There is a separate script, get_durations.py, that tests the route relations against OSRM to generate a list of durations. This script only needs to be run when significant changes have been made to an itinerary, or when new routes have been added. Note that it will not erase the durations of routes that have been discontinued.

Routing is done by selecting the route relation in question with an Overpass query and building a list of waypoints, which is passed to the selected routing engine.
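The waypoint step can be sketched as follows. This is an illustration under stated assumptions, not the code used by get_durations.py: the function names, the query text and the rule "use node members whose role starts with stop" are all assumptions. The input follows the Overpass JSON output shape, where a relation element carries an ordered `members` list and node elements carry `lat` and `lon`.

```python
def build_route_query(ref):
    """Return an illustrative Overpass QL query for a route relation by ref."""
    return (
        '[out:json];'
        'relation["type"="route"]["ref"="%s"];'
        'out body;'   # the relation itself, with its ordered member list
        'node(r);'    # nodes that are members of that relation
        'out skel;' % ref
    )


def ordered_stop_waypoints(elements):
    """Return (lat, lon) tuples for stop members, in relation member order."""
    nodes = {}
    relation = None
    for el in elements:
        if el["type"] == "node":
            nodes[el["id"]] = (el["lat"], el["lon"])
        elif el["type"] == "relation" and relation is None:
            relation = el
    if relation is None:
        return []
    waypoints = []
    for member in relation.get("members", []):
        # Assumption: stop positions are node members with a "stop*" role.
        if member["type"] == "node" and member.get("role", "").startswith("stop"):
            coords = nodes.get(member["ref"])
            if coords is not None:
                waypoints.append(coords)
    return waypoints


# Made-up response fragment in the Overpass JSON shape.
sample_elements = [
    {"type": "relation", "id": 1, "members": [
        {"type": "node", "ref": 11, "role": "stop"},
        {"type": "way", "ref": 99, "role": ""},
        {"type": "node", "ref": 12, "role": "platform"},
        {"type": "node", "ref": 13, "role": "stop"},
    ]},
    {"type": "node", "id": 13, "lat": -20.31, "lon": -40.29},
    {"type": "node", "id": 11, "lat": -20.32, "lon": -40.34},
    {"type": "node", "id": 12, "lat": -20.33, "lon": -40.30},
]

waypoints = ordered_stop_waypoints(sample_elements)
print(waypoints)
```

Ordering by the relation's member list matters because the node elements in a response are not guaranteed to arrive in travel order.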

If there is no route relation for a specific route, the script returns a duration of -1, which signals the scraper to test against the default value (60). Note that routes without a relation will not be handled by osm2gtfs either. Other negative values have different meanings; in short, they indicate that no relation was found or that the route could not be calculated because of missing waypoints.

List of negative durations:

  • -1: Route has no valid stop positions.
  • -2: Route has only one valid stop position, and it is neither the start nor the end.
  • -3: Route doesn't start with a valid stop position.
  • -4: Route doesn't end with a valid stop position.
  • -5: No valid routes found.
  • -6: Circular route (same start and end position) with no additional stops.
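A consumer of durations.json could interpret these codes roughly as in the sketch below. The names `DURATION_ERRORS`, `DEFAULT_DURATION` and `effective_duration` are hypothetical, and substituting the default for every negative code (not only -1) is an assumption made for illustration.

```python
DEFAULT_DURATION = 60  # the default value mentioned in this README

# Negative duration codes and their meanings, as listed above.
DURATION_ERRORS = {
    -1: "Route has no valid stop positions",
    -2: "Route has only one valid stop position, neither start nor end",
    -3: "Route does not start with a valid stop position",
    -4: "Route does not end with a valid stop position",
    -5: "No valid routes found",
    -6: "Circular route with no additional stops",
}


def effective_duration(duration, default=DEFAULT_DURATION):
    """Return (duration_to_use, error_message_or_None)."""
    if duration < 0:
        # Assumption: any negative code falls back to the default value.
        return default, DURATION_ERRORS.get(duration, "Unknown error code")
    return duration, None


print(effective_duration(-3))
print(effective_duration(45))
```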

In addition to the modules mentioned above, get_durations.py depends on the overpass and osrm Python modules.

As a fallback, if osrm is not installed or the installation doesn't work, routing can be handled by a YOURS web interface using requests calls. This is meant only as a fallback: YOURS routes between two nodes, so a long route requires a series of calls, whereas osrm can take the entire waypoint list in one call.
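The difference between the two engines can be sketched as follows: with a two-point router, the waypoint list is split into consecutive pairs and the leg durations are summed. The `route_leg` callable is hypothetical; in practice it would wrap one YOURS request, whose URL and response format are not shown here.

```python
def consecutive_pairs(waypoints):
    """Turn [a, b, c] into [(a, b), (b, c)]."""
    return list(zip(waypoints[:-1], waypoints[1:]))


def total_duration(waypoints, route_leg):
    """Sum leg durations; route_leg(start, end) routes between two points."""
    return sum(route_leg(start, end) for start, end in consecutive_pairs(waypoints))


# Stand-in for a real two-point routing call, for demonstration only.
calls = []

def fake_route_leg(start, end):
    calls.append((start, end))
    return 5  # pretend every leg takes 5 units of time


stops = [(-20.32, -40.34), (-20.31, -40.29), (-20.30, -40.28)]
duration = total_duration(stops, fake_route_leg)
print(duration, len(calls))
```

A route with n waypoints therefore needs n - 1 calls through this path, compared with a single call when the whole list is handed to osrm.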

Calendar exceptions

For routes such as Transcol, I have added feriados.py, which requires the workalendar Python module. workalendar provides a system for handling holidays, and feriados.py uses it to create different lists of holidays within a given year. This way, exceptions can be handled in an intelligent manner. workalendar handles fixed holidays as well as moving holidays.
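The idea can be sketched as follows. This is not the content of feriados.py: the function names and the rule that holidays run a Sunday-type service are assumptions for illustration. To keep the block runnable without workalendar, it uses a small stand-in class with the same `holidays(year)` shape as a workalendar calendar, which returns a list of `(date, label)` tuples.

```python
from datetime import date


class StubCalendar(object):
    """Stand-in exposing holidays(year) like a workalendar calendar."""

    def holidays(self, year):
        return [
            (date(year, 1, 1), "New year"),
            (date(year, 9, 7), "Independence Day"),
            (date(year, 12, 25), "Christmas Day"),
        ]


def holiday_dates(calendar, year):
    """Return the set of holiday dates for the given year."""
    return set(day for day, _label in calendar.holidays(year))


def service_day_type(day, holidays):
    """Classify a date; assumes holidays use the Sunday timetable."""
    if day in holidays or day.weekday() == 6:
        return "sunday_holiday"
    if day.weekday() == 5:
        return "saturday"
    return "weekday"


holidays_2020 = holiday_dates(StubCalendar(), 2020)
print(service_day_type(date(2020, 9, 7), holidays_2020))
print(service_day_type(date(2020, 9, 8), holidays_2020))
```

With workalendar installed, an instance of `workalendar.america.Brazil` can be passed in place of `StubCalendar()`, which also brings in moving holidays.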

List of services

Urban services

Intercity services

Intercity services from DER-ES

Interstate services from ANTT

This will not be pursued. If a proper API can be found, it could be done per company, but mapping such routes can be challenging, as some of them span the entire territory of Brazil. It would be preferable for these companies to supply their own GTFS sources.

Other services

  • EFVM (Estrada de Ferro Vitória a Minas): static times.json file.

Other Sources
