The following scripts were used to generate the analyses and figures of the Data Paper “Biodiversidata: A novel dataset for the vascular plant species diversity in Uruguay”, published in the Biodiversity Data Journal.
The dataset Biodiversidata_Plants_1.0.0 provides primary biodiversity data on extant vascular plant (native and introduced) species recorded within the country area of Uruguay. This is Biodiversidata's second open-access set of data on the biodiversity of the country.
We will see the data's:
To run this code you will need the following R packages:
library(extrafont)
library(lubridate)
library(gridExtra)
library(ggpubr)
library(sf)
library(tidyverse)
Let's start by reading the data from the database. Once you have downloaded the dataset, the data will look like this (a tibble with 12,470 rows and 61 columns):
Biodiversidata_Plants
## # A tibble: 12,470 x 61
## license occurrenceID basisOfRecord recordedBy establishmentMe~
## <chr> <chr> <chr> <chr> <chr>
## 1 https:~ Biodiversid~ HumanObserva~ Andrés Go~ introducida
## 2 https:~ Biodiversid~ HumanObserva~ Andrés Go~ nativa
## 3 https:~ Biodiversid~ HumanObserva~ Andrés Go~ nativa
## 4 https:~ Biodiversid~ HumanObserva~ Andrés Go~ nativa
## 5 https:~ Biodiversid~ HumanObserva~ Andrés Go~ nativa
## 6 https:~ Biodiversid~ HumanObserva~ Andrés Go~ nativa
## 7 https:~ Biodiversid~ HumanObserva~ Andrés Go~ nativa
## 8 https:~ Biodiversid~ HumanObserva~ Andrés Go~ nativa
## 9 https:~ Biodiversid~ HumanObserva~ Andrés Go~ nativa
## 10 https:~ Biodiversid~ HumanObserva~ Andrés Go~ nativa
## # ... with 12,460 more rows, and 56 more variables:
## # previousIdentifications <chr>, eventDate <chr>, year <dbl>, month <dbl>,
## # day <dbl>, higherGeography <chr>, continent <chr>, country <chr>,
## # countryCode <chr>, stateProvince <chr>, locality <chr>,
## # decimalLatitude <dbl>, decimalLongitude <dbl>, geodeticDatum <chr>,
## # coordinateUncertaintyInMeters <dbl>, coordinatePrecision <dbl>,
## # georeferencedBy <chr>, identifiedBy <chr>, taxonID <chr>,
## # scientificName <chr>, nameAccordingTo <chr>, higherClassification <chr>,
## # kingdom <chr>, phylum <chr>, class <chr>, order <chr>, family <chr>,
## # species <chr>, genus <chr>, specificEpithet <chr>,
## # infraspecificEpithet <chr>, taxonRank <chr>,
## # scientificNameAuthorship <chr>, institutionCode <chr>,
## # collectionCode <chr>, catalogNumber <chr>, recordNumber <chr>,
## # associatedReferences <chr>, verbatimLocality <chr>,
## # georeferenceRemarks <chr>, vernacularName <chr>, locationAccordingTo <chr>,
## # georeferencedDate <date>, georeferenceSources <chr>,
## # georeferenceVerificationStatus <chr>, organismRemarks <chr>,
## # verbatimLatitude <chr>, verbatimLongitude <chr>,
## # verbatimCoordinateSystem <chr>, locationRemarks <chr>,
## # measurementType <chr>, measurementValue <chr>,
## # measurementDeterminedBy <chr>, measurementRemarks <chr>, language <chr>,
## # otherCatalogNumbers <chr>
To get sumarised data out of our database, such as the total number of
species or the total number of records with date of collection, we use
the function summarise
together with n_distinct
. We can group_by
any column we want, for example, by the taxonomic class of each
record or by any specific group. To get the Total values, we can
also summarise without grouping the data.
Biodiversidata_Plants %>%
mutate(group=ifelse(class=='Magnoliopsida'|class=='Liliopsida', 'Angiosperms',
ifelse(class=='Gnetopsida'|class=='Pinopsida', 'Gymnosperms',
ifelse(class=='Polypodiopsida', 'Ferns',
'Lycophytes')))) %>%
group_by(group) %>% # use this or comment depending on the output you want
summarise(N= n(), # total number of records
S= n_distinct(species), # number of species
wDate= sum(!(is.na(year))), # records that have a year of collection
last30y = sum(year>1990, na.rm = T), # records that were collected in the last 30 years
wCoordinates= sum(!(is.na(decimalLatitude)))) # records that have a decimal latitude
Here's a table that summarises all the data, binding the information
extracted for the groupped data and for the total data. To produce the
table we are using knitr::kable
.
Group | Number of Occurrence Records | Number of Species | Records with Date (%) | Records from the last 30 years (%) | Records with Coordinates (%) |
---|---|---|---|---|---|
Lycophytes | 13 | 6 | 13 (100) | 11 (84.6) | 13 (100) |
Ferns | 540 | 78 | 540 (100) | 508 (94.1) | 540 (100) |
Gymnosperms | 48 | 5 | 41 (85.4) | 39 (81.2) | 48 (100) |
Angiosperms | 11869 | 1559 | 10527 (88.7) | 9585 (80.8) | 11869 (100) |
Total | 12470 | 1648 | 11121 (89.2) | 10143 (81.3) | 12470 (100) |
To get other useful information to fill the metadata section, we will also summarise fields such as:
knitr::kable(Biodiversidata_Plants %>%
unite('species', genus, specificEpithet, na.rm = TRUE, sep = ' ', remove = FALSE) %>%
mutate(places=str_c(decimalLatitude, decimalLongitude)) %>% # unique point locations
summarise(recordsT= n(),
speciesN= n_distinct(species),
familyN= n_distinct(family),
genusN= n_distinct(genus),
placesN= n_distinct(places),
StartDate=min(eventDate, na.rm = T),
EndDate=max(eventDate, na.rm = T),
North=max(decimalLatitude),
South=min(decimalLatitude),
East=max(decimalLongitude),
West=min(decimalLongitude),
departamentos=n_distinct(stateProvince)) %>%
rename('Occurrence Records'=recordsT, 'Number of Species'=speciesN,
'Families'=familyN, 'Genera'=genusN,
'Localities (lat/lon)'=placesN, 'Start Date'=StartDate,
'End Date'=EndDate, 'States (Departamentos)'=departamentos), align = 'r')
Occurrence Records | Number of Species | Families | Genera | Localities (lat/lon) | Start Date | End Date | North | South | East | West | States (Departamentos) |
---|---|---|---|---|---|---|---|---|---|---|---|
12470 | 1648 | 160 | 760 | 4567 | 1877-02 | 2020-05-21 | -30.17587 | -34.9751 | -53.2636 | -58.43 | 19 |
The database includes 12,470 records from 160 families and 760 genera, representing 1,648 species: with 6 species of the Lycopodiopsida class (Lycophytes), 48 of Gnetopsida and Pinopsida classes (Gymnosperms), 78 of the Polypodiopsida class (Ferns), and 1,145 of the Magnoliopsida and Liliopsida classes (Angiosperms). Also, the database includes 1,379 native species and 270 introduced, you can see that in the column establishmentMeans.
Group | Records | Species | Families |
---|---|---|---|
Lycophytes | 13 | 6 | 2 |
Ferns | 540 | 78 | 15 |
Gymnosperms | 48 | 5 | 4 |
Angiosperms | 11869 | 1559 | 139 |
To simplify the plots we are going to keep only the families of
Angiosperms and Ferns with more than 50 a 20 records respectively. We
will use this variable familiesToPlot
to filter families in the
following plots:
familiesToPlot <- Biodiversidata_Plants %>%
group_by(class, family) %>% count() %>%
mutate(classLevels=ifelse(class=='Magnoliopsida'|class=='Liliopsida', 'Angiosperms',
ifelse(class=='Gnetopsida'|class=='Pinopsida', 'Gymnosperms',
ifelse(class=='Polypodiopsida', 'Ferns',
'Lycophytes')))) %>%
filter(classLevels!='Angiosperms' | classLevels=='Angiosperms'& n>=50) %>%
filter(classLevels!='Ferns' | classLevels=='Ferns'& n>=20) %>% pull(family)
Now we plot the number of records by family, faceting the plots by group.
grid.arrange(Biodiversidata_Plants %>%
group_by(class, family) %>% count() %>%
mutate(classLevels=ifelse(class=='Magnoliopsida'|class=='Liliopsida', 'Angiosperms',
ifelse(class=='Gnetopsida'|class=='Pinopsida', 'Gymnosperms',
ifelse(class=='Polypodiopsida', 'Ferns',
'Lycophytes')))) %>%
filter(classLevels!='Angiosperms' & family %in% familiesToPlot) %>%
mutate(classLevels=factor(classLevels, levels=c('Lycophytes',
'Ferns',
'Gymnosperms',
'Angiosperms'))) %>%
ggbarplot(x = "family", y = "n",
fill = "classLevels",
color = "white",
drop= T,
palette = c("#D7191C", "#FDAE61", "#ABDDA4"),
sort.val = "desc",
x.text.angle = 90,
legend = "none",
ylab = 'Number of Records', xlab = '') +
scale_y_continuous(breaks=scales::breaks_extended(8))+
theme(text=element_text(family='Calibri', size = 10)) +
facet_wrap(~classLevels, nrow=2, ncol=3, scales = "free", drop=TRUE),
Biodiversidata_Plants %>%
group_by(class, family) %>% count() %>%
mutate(classLevels=ifelse(class=='Magnoliopsida'|class=='Liliopsida', 'Angiosperms',
ifelse(class=='Gnetopsida'|class=='Pinopsida', 'Gymnosperms',
ifelse(class=='Polypodiopsida', 'Ferns',
'Lycophytes')))) %>%
filter(classLevels=='Angiosperms' & family %in% familiesToPlot) %>%
mutate(classLevels=factor(classLevels, levels=c('Lycophytes',
'Ferns',
'Gymnosperms',
'Angiosperms'))) %>%
ggbarplot(x = "family", y = "n",
fill = "classLevels",
color = "white",
drop= T,
palette = "#2B83BA",
sort.val = "desc",
x.text.angle = 90,
legend = "none",
ylab = 'Number of Records', xlab = 'Family') +
scale_y_continuous(breaks=scales::breaks_extended(8))+
theme(text=element_text(family='Calibri', size = 10)) +
facet_wrap(~classLevels, nrow=2, ncol=3, scales = "free", drop=TRUE))
The occurrence records included in Biodiversidata's plants database cover samples reported in Uruguay during the period of 1877–2020. How are these records distributed over time within the groups?
Biodiversidata_Plants %>%
mutate(classLevels=ifelse(class=='Magnoliopsida'|class=='Liliopsida', 'Angiosperms',
ifelse(class=='Gnetopsida'|class=='Pinopsida', 'Gymnosperms',
ifelse(class=='Polypodiopsida', 'Ferns',
'Lycophytes')))) %>%
mutate(classLevels=factor(classLevels, levels=c('Lycophytes',
'Ferns',
'Gymnosperms',
'Angiosperms'))) %>%
filter(family %in% familiesToPlot) %>%
ggplot(aes(x=year, y=family, color=classLevels)) +
geom_point(show.legend = FALSE) +
scale_colour_brewer(palette ='Spectral')+
facet_grid(classLevels~., scales = "free", space= 'free_y', switch='x' ,drop=TRUE) +
theme_bw() +
theme(text=element_text(family='Calibri', size=10)) +
theme(strip.text.y = element_text(size=10, angle=0)) +
labs(x='Year', y='Family', color = '')
To finish let's map the records.
First, we need to download Uruguay's shapefile data with the level 1 for
administrative boundaries. We can use the package geouy
to download
this data or GADM.
Departamentos <- load_geouy('Departamentos') # We will use this one
Uruguay <- getData("GADM", country="UY", level=1)
Also, we will need a grid of 25x25km unit-size to plot sampling effort
(you can create one using the st_make_grid
function).
Grid_UY25 <- st_read('Grid_UY25.shp')
Finally, we are going to convert the plants data into an sf
object.
Biodiversidata_Plants_GIS <- Biodiversidata_Plants %>%
as.data.frame %>%
sf::st_as_sf(coords = c("decimalLongitude", "decimalLatitude")) %>%
st_set_crs(4326) %>% st_transform(32721) # this CRS and transformation is specifically for Uruguay
To measure sampling effort we perform a spatial filter using st_join
,
with the gridded map as a base to count the occurence records that are
contained within each grid-cell.
Grid_UY25_Plants <- st_join(x=Grid_UY25,
y=(Biodiversidata_Plants_GIS %>%
select(species)), # to simplify we just keep the species field
left=TRUE, join = st_contains) %>%
group_by(Id) %>%
summarise(samplingEffort=ifelse(n_distinct(species, na.rm = TRUE)==0, 0, n())) # records per grid
Now we map the occurrence records and the distribution of the sampling effort.
occurrenceRecords <- ggplot() +
geom_sf(data = Departamentos, color='black', fill='white') +
geom_sf(data = Biodiversidata_Plants_GIS, size=1, alpha=0.5) +
theme_bw() +
theme(text=element_text(size=10))
samplingEffort <- ggplot() +
geom_sf(data = Grid_UY25_Plants, aes(fill=samplingEffort)) +
scale_fill_binned(n.breaks=5, type = "viridis", show.limits = TRUE) +
geom_sf(data = Grid_UY25_Plants %>% filter(samplingEffort==0), aes(fill=N), fill='white', alpha=0.8)+
geom_sf(data = Departamentos, color='black', fill=NA) +
labs(fill = 'Sampling Effort') +
theme_bw() +
theme(text=element_text(size=10))
gridExtra::grid.arrange(occurrenceRecords, samplingEffort, nrow=1)
- For the Data Paper we used Inkscape and ArcGis 10.5 to improve this maps.
If you use the data please cite as: Florencia Grattarola, Andrés González, Patricia Mai, Laura Cappuccio, César Fagúndez-Pachón, Florencia Rossi, Franco Teixeira de Mello, Lucía Urtado & Daniel Pincheira-Donoso. (2020). Biodiversidata: A novel dataset for the vascular plant species diversity in Uruguay. Biodiversity Data Journal 8: e56850. https://doi.org/10.3897/BDJ.8.e56850