My code, which gives me all the delays in the stations, worked perfectly a couple of months ago. The last few days I have tried to rerun the same script, but I always get the same error. I've changed my script and updated the packages since, but I'm unable to make it work again. I asked more advanced programmers, who said the error looks like there is something wrong with the data coming from the website I'm scraping, not with my script. Has the format changed or something?
library(httr)
library(jsonlite)
library(tidyverse)
load.stations <- function(){
  a <- GET("https://api.irail.be/stations/?format=json") # GET all stations from the iRail API
  parsed <- jsonlite::fromJSON(content(a, "text"), flatten = TRUE) # parse the JSON into R
  stations <- parsed$station %>%
    filter(grepl("^BE.NMBS.0088", id)) # keep only stations in Belgium; ^ anchors the match at the start of the id
  return(stations)
}
get.time <- function(){
  time <- paste(format(Sys.time(), "%d/%m/%y %H:%M:%S")) # format system time as a dd/mm/yy hh:mm:ss string
  strpt <- strptime(time, "%d/%m/%y %H:%M:%S") # convert the string back into an interpretable date-time
  return(strpt)
}
get.temp_df <- function(stations, i){
  goget <- paste0("https://api.irail.be/liveboard/?format=json&id=", stations$id[i]) # URL for the liveboard of station i (similar to the screens in the station)
  resp <- GET(goget) # get the data
  parsed_c <- jsonlite::fromJSON(content(resp, "text"), flatten = TRUE) # parse from JSON
  temp_df <- parsed_c$departures$departure # extract the dataframe with departures from the parsed JSON
  return(temp_df)
}
add.to.all <- function(all_df, temp_df){
  all_df <- rbind(all_df, temp_df) %>% # append the temporary dataframe to the master dataframe
    group_by(stationneke, time, vehicle) %>% # group departures by station, time and vehicle
    top_n(1, importtime) %>% # keep only the most recent observation, removing duplicates
    ungroup() # lift the grouping
  return(all_df)
}
save.day <- function(all_df){
  strpt <- get.time()
  saveRDS(all_df, file = paste(strpt$mday, strpt$mon + 1, strpt$year + 1900, "Punct.rda", sep = "-")) # strptime months are 0-based and years count from 1900
  Sys.sleep(time = 3600 - (strpt$min * 60 + strpt$sec)) # sleep until the top of the next hour
  return(data.frame())
}
library(httr)
library(jsonlite)
library(tidyverse)
## all departures - scraper
loop.scraper <- function(hour_of_pause = 3){
  source("NMBS-punctuality-functions.R")
  all_df <- data.frame() # empty dataframe
  stations <- load.stations()
  while (TRUE) { # infinite loop
    strpt <- get.time()
    while (strpt$hour != hour_of_pause) { # run as long as the current hour is not the pause hour
      for (i in 1:nrow(stations)) { # inner loop through the stations
        temp_df <- get.temp_df(stations, i)
        if (is.null(temp_df)) next # skip if the dataframe is empty (some stations have closed in recent years)
        temp_df$stationneke <- stations$name[i] # add the name of departure station i to the dataframe
        temp_df$importtime <- Sys.time() # add the time the observation was imported
        all_df <- add.to.all(all_df, temp_df)
        strpt <- get.time()
      } # end of loop through the stations
    } # end of hour-check loop; the code below runs only when no trains are active (at night)
    all_df <- save.day(all_df) # saves the file and returns an empty dataframe
  }
}
In order to find the root cause we need a bit more information:
What is the URL of the API page that can't be parsed?
What is the response on that page? "Fatal error: Unc" is the start of an error message, but the important part is cut off.
In general I'd also say you're better off using another data format instead of scraping data from all stations for analytics. GTFS-RT is a much better fit, but it is hard to "quickly use" as it needs a lot of preprocessing.
We're working on a new "graph" API, which is based on this GTFS-RT data with the preprocessing already done for you: https://graph.irail.be/sncb/connections . This is a list of all departing trains, paginated by their departure time. If you fetch the pages for the upcoming hour, you have all departures and arrivals in all Belgian stations for the upcoming hour. This might be interesting for your use case, as this API is built to handle lots of requests, allows you to run analytics, and lets you reuse data client-side for different questions. See https://linkedconnections.org/ for more information; Pieter will also gladly tell you more. Ping @pietercolpaert.
This isn't a "stop using this API" thing, it's just something to consider in the future as it might make things easier for you ;) .
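To illustrate how such a paginated feed could be consumed from R, here is a hedged sketch. It assumes the pages follow the usual Linked Connections convention of a `@graph` array of connections plus a `hydra:next` link to the next departure-time window; those field names, and the helper `fetch.connections` itself, should be verified against the live response rather than taken as the actual API contract.

```r
library(httr)
library(jsonlite)

fetch.connections <- function(start_url, max_pages = 5){
  url <- start_url
  pages <- list()
  for (p in seq_len(max_pages)) {
    parsed <- jsonlite::fromJSON(content(GET(url), "text"), flatten = TRUE)
    pages[[p]] <- parsed$`@graph`  # the connections listed on this page (assumed field name)
    url <- parsed$`hydra:next`     # follow the link to the next page (assumed field name)
    if (is.null(url)) break
  }
  dplyr::bind_rows(pages)
}

# e.g. fetch.connections("https://graph.irail.be/sncb/connections")
```

Following `hydra:next` until the departure times pass the hour you care about would replace the per-station liveboard loop entirely.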
As a small footnote, I'd recommend setting a user-agent header when making requests to our API (this might be hard in R, but if it's possible, do it). This way we can contact you if we notice strange things on our side, such as invalid requests, or see who is being rate limited in order to resolve it together.
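For what it's worth, this is straightforward in R: httr provides a `user_agent()` helper that can be passed to any request. The contact string below is only a placeholder to be replaced with your own details.

```r
library(httr)

# Attach an identifying user-agent so the API maintainers can reach you.
# The project name and contact address here are placeholders.
resp <- GET("https://api.irail.be/stations/?format=json",
            user_agent("NMBS-punctuality-scraper (contact: you@example.com)"))
```

Adding the same `user_agent(...)` argument to every `GET()` call in the scraper covers all requests.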