This data is readily available through FIFA's data API and is simply made available here as a CSV file for easy access:
1. Four new columns have been added to the dataset, all with the column name prefix `lagged_`. They are as follows.
`lagged_ranking_date`
- As each observation has values for both `previous_rank` and `previous_points`, but no information on the date of said previous rank and previous points, this is an attempt to amend that by applying a lag of 1 to the `ranking_date` column.
`lagged_rank`
- This is the equivalent of `previous_rank` but is achieved by applying a lag of 1 to the `rank` column.
`lagged_points`
- This is the equivalent of `previous_points` but is achieved by applying a lag of 1 to the `points` column.
`lagged_true`
- This column complements the three other `lagged_` columns. Lagging leaves explicit `NA` values the first time each national team appears in the data. For convenience, I've filled in those `NA` values with data. This makes it unnecessary to change the actual data in the `previous_rank` and `previous_points` columns where it's seemingly erroneous. With `lagged_ranking_date` it's easier, for instance, to do fuzzy joins with other datasets based on the time intervals between rankings.
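As an illustration of the kind of interval-based join mentioned above, here is a minimal Python sketch using only the standard library. The `team` key, the sample rows, and the helper `find_ranking` are hypothetical and not part of the dataset. The sketch also assumes that a ranking covers the interval from `lagged_ranking_date` (exclusive) to `ranking_date` (inclusive), which is one possible convention rather than anything the data prescribes.

```python
from datetime import date

# Made-up rows for illustration only; "team" is a hypothetical column name.
rankings = [
    {"team": "A", "lagged_ranking_date": "2002-12-31",
     "ranking_date": "2003-01-15", "rank": 5},
    {"team": "A", "lagged_ranking_date": "2003-01-15",
     "ranking_date": "2003-02-19", "rank": 4},
]


def find_ranking(rows, team, on_date):
    """Return the row for `team` whose interval
    (lagged_ranking_date, ranking_date] contains `on_date`, or None."""
    for row in rows:
        start = date.fromisoformat(row["lagged_ranking_date"])
        end = date.fromisoformat(row["ranking_date"])
        # Assumed convention: start is exclusive, end is inclusive.
        if row["team"] == team and start < on_date <= end:
            return row
    return None


# Look up which ranking interval a (hypothetical) event date falls into.
match = find_ranking(rankings, "A", date(2003, 2, 1))
print(match["rank"] if match else None)  # prints 4
```

With a dataframe library, the same lookup would more likely be written as a fuzzy or non-equi join over the two date columns; the loop above only shows the matching condition.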
This is how the data is inserted:
column | data inserted |
---|---|
`lagged_ranking_date` | "2002-12-31" |
`lagged_rank` | corresponding `previous_rank` value |
`lagged_points` | corresponding `previous_points` value |
The `lagged_true` column only serves to denote which rows are the result of the actual lag function (the rows where the value of `lagged_true` is "TRUE") and which rows hold the aforementioned inserted data (the rows where the value of `lagged_true` is "FALSE").
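The lag, the inserted values from the table, and the `lagged_true` flag can be sketched together in plain Python. This is not the code used to build the dataset, only a rough reconstruction of the logic described here. It assumes that each row is a dict, that `ranking_date` is an ISO-formatted string, and that teams are identified by a hypothetical `team` key. The sample values are made up, and Python booleans stand in for the "TRUE"/"FALSE" values in the CSV.

```python
FILL_DATE = "2002-12-31"  # value inserted into lagged_ranking_date


def add_lagged_columns(rows):
    """Return copies of `rows`, ordered by ranking_date, with the four
    lagged_ columns added.

    A lag of 1 is applied per team. The first row of each team has
    nothing to lag from, so it receives the inserted values instead
    and is flagged with lagged_true = False.
    """
    last_seen = {}  # team -> that team's most recent earlier row
    result = []
    # ISO date strings sort chronologically, so a plain sort is enough.
    for row in sorted(rows, key=lambda r: r["ranking_date"]):
        new_row = dict(row)
        previous = last_seen.get(row["team"])
        if previous is not None:
            # Genuine lag: take the values from the team's previous row.
            new_row["lagged_ranking_date"] = previous["ranking_date"]
            new_row["lagged_rank"] = previous["rank"]
            new_row["lagged_points"] = previous["points"]
            new_row["lagged_true"] = True
        else:
            # First appearance: insert the values listed in the table.
            new_row["lagged_ranking_date"] = FILL_DATE
            new_row["lagged_rank"] = row["previous_rank"]
            new_row["lagged_points"] = row["previous_points"]
            new_row["lagged_true"] = False
        last_seen[row["team"]] = row
        result.append(new_row)
    return result


# Made-up sample rows; "team" is a hypothetical column name.
sample_rows = [
    {"team": "A", "ranking_date": "2003-01-15", "rank": 5, "points": 700,
     "previous_rank": 6, "previous_points": 690},
    {"team": "A", "ranking_date": "2003-02-19", "rank": 4, "points": 710,
     "previous_rank": 5, "previous_points": 700},
    {"team": "B", "ranking_date": "2003-02-19", "rank": 9, "points": 600,
     "previous_rank": 0, "previous_points": 0},
]

for row in add_lagged_columns(sample_rows):
    print(row["team"], row["ranking_date"], row["lagged_ranking_date"],
          row["lagged_rank"], row["lagged_points"], row["lagged_true"])
```

Filtering on `lagged_true` then separates the rows produced by the lag from the rows that received inserted values.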