# sf package functions
methods(class = "sf")Introduction to spatial analysis in R
Introduction
Spatial epidemiology involves both describing and understanding geographical variation in disease with respect to demographic, environmental, behavioral, socioeconomic, genetic, and infectious risk factors using various methods [Elliott et al. (2001)](Elliott and Wartenberg 2004). Common techniques include proximity calculations, estimating summary measures across geographic areas, assessing clustering, applying spatial smoothing and interpolation techniques, and utilizing spatial regression for in-depth analysis.
Spatial Data
Spatial data, also known as geospatial data, can be broadly categorized into two types: vector and raster data.
Vector Data
Vector data describes the “geometry” or “shape” of objects and often includes additional variables. The three basic types of vector data are points, lines, and polygons (areas). Each point, line, and polygon has a spatial reference frame such as latitude and longitude.
A polygon refers to a set of closed polylines that represent a geographic location, composed of vertices (a series of x and y coordinates or spatial points).
Raster Data
Raster data is pixel-based, where each pixel has a value representing information. This format is particularly useful for representing data that varies continuously over space and is naturally suited to a grid format.
Check the packages required for this tutorial:
sp and sf packages
sp and sf are the two common spatail package in R and work in fundamentally different ways, specifically revolving around how they store spatial data within R.
sp uses something called S4 classes, where the different elements of spatial data (coordinates/geometry, attribute table, coordinate system, etc.) are split up into different ‘slots’ and accessed using the @ symbol. sfuses S3 classes and stores spatial data all within a standard data frame, with some extra header information and a geometry column.
sp and sf spatial package comparisonHow to convert one file format to the other format?
To sp object <- as(object, Class = "Spatial")
To sf object_sf = st_as_sf(object_sp, "sf")
A word of advice: be flexible in the usage of sf and sp. Sometimes it may be hard to explain why functions work for one data type and do not for the other. But since transformation is quite easy, time is better spend on analyzing your data than on wondering why operations do not work. For the purpose of this material we will focus of teaching you the package sf as it is intended to succeed and replace R packages sp, rgeos and the vector parts of rgdal packages. It also connects nicely to tidyverse learnt in previous modules. For this session we focus on the sf package.
Vector - Spatial data
Area spatial data structure
After the package is loaded, you can check out all of the methods that are available for sf data objects.
Import Data
Shapefiles for area data:
We will use the st_read() from sf package, which simply takes the path of the directory with the shapefile as argument. Where as readOGR() from the rgdal packages will take the path and file name readOGR("path","fileName") to a shapefile. In this instance we would load the part of the shapefile that ends with .shp. Setting quiet = T will keep R from being chatty when it imports the data.
Shepfile reads with the sp package has both sf and data.frame classes.
# class - data
class(eth_map)[1] "sf" "data.frame"
Shapefile Metadata & Attributes
You’ll notice that when you load the shapefile in, there will be a set of information explaining the features of the shapefile. The first sentence shows that you have loaded a ESRI shapefile, it contains 46 features (which are polygons in this case) and 5 columns of information stored as a data tabel. It mentions also there is a spatial extent (called bounding box) and the coordinate reference system (CRS).
What does this show us? Let’s break it down. In this case it is important to read in and check the metadata for the type of spatial information you have loaded. Metadata, describing the format, CRS, extent, and other components of the vector data, and the attributes which describe properties associated with each individual vector object.
Key metadata for all shapefiles include:
Object Type: the class of the imported object.
Coordinate Reference System (CRS): the projection of the data.
Extent: the spatial extent (geographic area that the shapefile covers) of the shapefile. Note that the spatial extent for a shapefile represents the extent for ALL spatial objects in the shapefile.
You can view shapefile metadata using the class, crs and extent methods:
# metadata
eth_mapThe str() function will also give us the meta data of the dataset including the class and dimensions:
# structure - column list with details
str(eth_map)Classes 'sf' and 'data.frame': 967 obs. of 19 variables:
$ Shape_Leng: num 1.951 2.034 0.631 1.576 0.227 ...
$ Shape_Area: num 0.17132 0.08034 0.01579 0.07866 0.00158 ...
$ district : chr "Aba-Korow" "Abaala" "Abaala town" "Ababo" ...
$ ADM3_PCODE: chr "ET050699" "ET020203" "ET020209" "ET041904" ...
$ ADM3_REF : chr NA NA NA NA ...
$ ADM3ALT1EN: chr NA "Aba Ala" "Aba Ala" NA ...
$ ADM3ALT2EN: chr NA NA NA NA ...
$ ADM2_EN : chr "Shabelle" "Kilbati /Zone2" "Kilbati /Zone2" "Horo Gudru Wellega" ...
$ ADM2_PCODE: chr "ET0506" "ET0202" "ET0202" "ET0419" ...
$ region : chr "Somali" "Afar" "Afar" "Oromia" ...
$ ADM1_PCODE: chr "ET05" "ET02" "ET02" "ET04" ...
$ ADM0_EN : chr "Ethiopia" "Ethiopia" "Ethiopia" "Ethiopia" ...
$ ADM0_PCODE: chr "ET" "ET" "ET" "ET" ...
$ date : Date, format: "2019-08-19" "2019-08-19" ...
$ validOn : Date, format: "2020-10-08" "2020-10-08" ...
$ validTo : Date, format: "-001-11-30" "-001-11-30" ...
$ X_coordina: num 42.7 39.9 39.8 37.5 42.1 ...
$ Y_coordina: num 6.48 13.37 13.36 9.81 9.31 ...
$ geometry :sfc_MULTIPOLYGON of length 967; first list element: List of 1
..$ :List of 1
.. ..$ : num [1:143, 1:2] 42.8 42.8 42.8 42.8 42.8 ...
..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
- attr(*, "sf_column")= chr "geometry"
- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
..- attr(*, "names")= chr [1:18] "Shape_Leng" "Shape_Area" "district" "ADM3_PCODE" ...
The class function, applied on an sf layer, returns both "sf" and "data.frame" class names:
# spatial attributes
class(eth_map)[1] "sf" "data.frame"
A layer (geometry+attributes) is represented by an sf object. The second printed value, data.frame, implies that sf is, in fact, an extension of the data.frame class, inheriting many of its methods. Indeed, we will see that many functions in R work exactly the same way on sf and data.frame objects.
The geometric part can be extracted with st_geometry, resulting in an object of class sfc, i.e., a geometry column
# get geometry data
st_geometry(eth_map)Geometry set for 967 features
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 32.9918 ymin: 3.40667 xmax: 47.98824 ymax: 14.84548
Geodetic CRS: WGS 84
First 5 geometries:
MULTIPOLYGON (((42.79826 6.19605, 42.79641 6.19...
MULTIPOLYGON (((39.92615 13.57223, 39.92848 13....
MULTIPOLYGON (((39.76732 13.28527, 39.76242 13....
MULTIPOLYGON (((37.39245 10.04636, 37.39828 10....
MULTIPOLYGON (((42.13371 9.322936, 42.13386 9.3...
Conversely, If we want just the non-geometric part (the “attributes”), it can be extracted with st_drop_geometry which returns a data.frame:
# Extract nom-geometric part
map_nogeom <- st_drop_geometry(eth_map)
# top five rows
head(map_nogeom) Shape_Leng Shape_Area district ADM3_PCODE ADM3_REF ADM3ALT1EN ADM3ALT2EN
1 1.9505322 0.171322798 Aba-Korow ET050699 <NA> <NA> <NA>
2 2.0341566 0.080342819 Abaala ET020203 <NA> Aba Ala <NA>
3 0.6310232 0.015789541 Abaala town ET020209 <NA> Aba Ala <NA>
4 1.5755748 0.078658877 Ababo ET041904 <NA> <NA> <NA>
5 0.2274391 0.001580407 Abadir ET130109 <NA> <NA> <NA>
6 1.5390176 0.069847562 Abay Chomen ET041905 <NA> <NA> <NA>
ADM2_EN ADM2_PCODE region ADM1_PCODE ADM0_EN ADM0_PCODE
1 Shabelle ET0506 Somali ET05 Ethiopia ET
2 Kilbati /Zone2 ET0202 Afar ET02 Ethiopia ET
3 Kilbati /Zone2 ET0202 Afar ET02 Ethiopia ET
4 Horo Gudru Wellega ET0419 Oromia ET04 Ethiopia ET
5 Harari ET1301 Harari ET13 Ethiopia ET
6 Horo Gudru Wellega ET0419 Oromia ET04 Ethiopia ET
date validOn validTo X_coordina Y_coordina
1 2019-08-19 2020-10-08 -001-11-30 42.6983 6.47533
2 2019-08-19 2020-10-08 -001-11-30 39.8761 13.36890
3 2019-08-19 2020-10-08 -001-11-30 39.7914 13.35980
4 2019-08-19 2020-10-08 -001-11-30 37.5210 9.80858
5 2019-08-19 2020-10-08 -001-11-30 42.1227 9.31044
6 2019-08-19 2020-10-08 -001-11-30 37.3524 9.72482
Spatial points
Another key type of spatial data is data linked to coordinates, frequently found in a table format. We can read these data in using the read_csv() function, then we may want to convert these into spatial information using the st_as_sf() function. Here we have a csv file containing the coordinates for the locations of health facilities in our routine data. Note that we set the projection for the spatial object using the crs command; crs 4326 is the standard WGS 84 CRS.
# Read the `csv` file with geometry information
hf_gps <- read_csv("C:/Users/User/OneDrive/from_unsw_drive/file_YG/TKI/MAP/training/HF_list.csv") %>%
clean_names() %>%
drop_na() # drop naRows: 1045036 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): ZONE, Woreda, HF name, Facility Type, Ownershio, Kebele_Locality, R...
dbl (2): X, Y
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Check the class of the point spatial data. The dataset read as data.frame th output captured the geographic information: longitude and latitude.
# check the data class
class(hf_gps)[1] "tbl_df" "tbl" "data.frame"
The st_as_sf()` function from the sf package will convert into point spatial data format.
# Make the gps data into a point sample feature and define it as shapefile `eth_hf_points`
eth_hf_points <- st_as_sf(hf_gps,
coords = c("x", "y"),
crs = st_crs(4326)
)The map below plotted the each XY geometry health facility points.
# map the geometry
plot(eth_hf_points$geometry)Coordinate Reference Systems (CRS)
A Coordinate Reference System (CRS) defines how the coordinates in our geometries relate to the surface of the Earth. There are two main types of CRS:
Geographic—longitude and latitude, in degrees
Projected—implying flat surface, usually in units of true distance (e.g., meters)
We can then change the projection using the command st_transform(). We can also combine the two commands to change the projection of one data type to be that of the other. The CRS of a given sf layer can be obtained using function st_crs. The CRS information is returned in the WKT format:
# coordinate system
st_crs(eth_map)Coordinate Reference System:
User input: WGS 84
wkt:
GEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["latitude",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["longitude",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4326]]
A vector layer reprojection
Reprojection is the transformation of geometry coordinates, from one CRS to another. It is an important part of spatial analysis workflow, since we often need to:
Transform several layers into the same projection
Switch between geographic and projected CRS
A vector layer can be reprojected with st_transform. The st_transform function has two important parameters:
x—The layer to be reprojectedcrs—The target CRS
The crs can be specified in one of four ways:
An EPSG code (e.g.,
4326)A PROJ4 string (e.g.,
"+proj=longlat +datum=WGS84 +no_defs")A WKT string
A
crsobject of another layer, as returned byst_crs
For example, the following expression reprojects the nafot layer to the geographic UTM CRS, using its EPSG code (20138). In the metadata, the PROJCRS["Adindan / UTM zone 38N" showed us the projection is UTM zone 38N
# Change the projection to UTM zone 38N
eth_utm <- st_transform(eth_map, 20138)
st_crs(eth_utm)Coordinate Reference System:
User input: EPSG:20138
wkt:
PROJCRS["Adindan / UTM zone 38N",
BASEGEOGCRS["Adindan",
DATUM["Adindan",
ELLIPSOID["Clarke 1880 (RGS)",6378249.145,293.465,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4201]],
CONVERSION["UTM zone 38N",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",0,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",45,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",0.9996,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",500000,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",0,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Engineering survey, topographic mapping."],
AREA["Ethiopia - east of 42°E."],
BBOX[4.11,42,12.85,47.99]],
ID["EPSG",20138]]
To change the projection of the eth_utm shapefile to WGS CRS:
# to WGS projection
eth_map2 <- st_transform(eth_utm, st_crs(eth_map), crs = 4326)Geometric calculations
We will see geometric operations on vector layers:
Numeric values —Functions that summarize geometrical properties of:
A single layer—e.g., area, length
A pair of layers—e.g., distance
Spatial layers —Functions that create a new layer based on:
A single layer—e.g., centroid, buffer
A pair of layers—e.g., intersection area
Numeric
There are several functions to calculate numeric geometric properties of vector layers in package sf:
st_length—Lengthst_area—Areast_distance—Distancest_bbox—Bounding boxst_dimension—Dimensions (0 = points, 1 = lines, 2 = polygons)
For example, we can calculate the area of each feature in the Ethiopia layer using st_area. The result is assigned to a new attribute named area in eth_map:
# calculate the area of each district
eth_map$area <- st_area(eth_map)Summary of the area in \(m^2\)
# summary in m^2
summary(eth_map$area) Min. 1st Qu. Median Mean 3rd Qu. Max.
5.809e+05 2.672e+08 6.652e+08 1.170e+09 1.337e+09 1.254e+10
Plot the area in \(m^2\).
# plot `m^2`
plot(eth_map["area"], main = "Area of the Districts (m^2)")You may notice the area values are printed with their units [m^2]. The resulting area values, just like any other numeric calculations in sf, are returned as object of class units:
class(eth_map$area)[1] "units"
A units object is basically a numeric vector associated with units of measurement. We can transform units measurements to different units, with function set_units from package units. First, we load the units library:
library(units)Warning: package 'units' was built under R version 4.3.3
udunits database from C:/Users/User/AppData/Local/R/win-library/4.3/units/share/udunits/udunits2.xml
## udunits system database from /usr/share/xml/udunitsThen, we make the transformation to \(km^2\) units :
# km^2
eth_map$area <- set_units(eth_map$area, "km^2")Lets check the summary of the transformed area in \(Km^2\). In the below summary, we’ve added across(geometry, st_union) to ensure proper summarization of the spatial data. The st_union function will be applied to the geometry column.
# summary with chain
eth_map %>%
summarise(
mean_area = mean(area),
median_area = median(area),
sd_area = sd(area),
range_area = range(area),
across(geometry, st_union)
)Plot the calculated area for Ethiopia using the plot() function
# plot `km^2`
plot(eth_map["area"], , main = "Area of the Districts (km^2)")Spatial Geometry Function
When working with spatial data in R, the sf package is a powerful tool that allows you to manipulate and analyze geometric objects. In the below functions, we’ll explore some common geometry-generating functions provided by sf. These functions are essential for anyone getting started with spatial data analysis.
- Centroids (
st_centroid)
The st_centroid function calculates the centroid (geometric center) of a polygon. Let’s use an example where we want to find the centroids of polygons representing the “Amhara” region in Ethiopia.
# Filter Amhara
amhara_shp <- eth_map |>
filter(region == "Amhara")Now, let’s calculate the centroids: to calculate the centroids of the “Amhara” polygons we can use the st_centroid function as follows:
# Calculate centroids
amhara_ctr <- st_centroid(amhara_shp)Warning: st_centroid assumes attributes are constant over geometries
# Filled polygons with points
plot(st_geometry(amhara_ctr),
col = "red",
pch = 3
)- Buffering
(st_buffer)
The st_buffer function creates a buffer around a spatial object. Buffers are useful for visualizing areas of influence or proximity. For instance, you can create a buffer around health facilities to represent their service areas.
# Filter hospitals
hospital <- eth_hf_points %>%
filter(facility_type == "Hospital")
# Create a buffer around health facilities (e.g., 2.5 km radius)
buffer_radius_km <- 2.5
health_facility_buffer <- st_buffer(hospital, dist = buffer_radius_km * 1000) # Convert km to meters
# Plot the health facility buffers first (as polygons need a base plot)
plot(st_geometry(health_facility_buffer),
col = "red", lwd = 2, main = "A Buffer arround Health Facilities"
)
# Convert hospital geometries to points if they aren't already
hospital_points <- st_cast(hospital, "POINT")
# Plot the original health facilities on top of the buffers
plot(st_geometry(hospital_points), pch = 16, col = "blue", add = TRUE)- Random sample points (
st_sample)
To generate random sample points within a polygon, use the st_sample function. This is handy for spatial sampling in epidemiology or environmental studies. The sampled points are overlaid on the original geometries.
# Randomly sample 10 points within hospital polygons
sample_points <- st_sample(hospital, size = 10)
# Plot the original hospital locations
plot(hospital$geometry, pch = 16, col = "blue", main = "Sample points of Health Facility Locations")
# Overlay the sampled points
plot(sample_points, add = TRUE, pch = 2, col = "red")Joining non-spatial data to shapefiles
In spatial analysis, it’s common to join non-spatial data (e.g., tabular data) with shapefiles. This process allows you to create maps and perform spatial analyses by combining attribute data with geographic information. In this tutorial, we’ll walk through how to join non-spatial data to shapefile. We’ll also cover some important considerations to ensure your data is properly aligned.
# Load and filter the data for the Amhara region
u5_data <- read_csv("C:/Users/User/OneDrive/from_unsw_drive/training/R_training/Tutorial_R/data_excercise.csv") %>%
filter(region == "Amhara")Rows: 967 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): region, district, age_group
dbl (3): conf, pop, Incid_per_1000
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Note: Always make sure the data is loaded and filtered correctly before proceeding. Here, we’re focusing on a specific region, but you can adjust the filter condition as needed.
Before joining data, it’s crucial to ensure that the names in your non-spatial data match those in your shapefile. If they don’t match exactly, the join won’t work correctly.
Check names in the Shapefile:
# Check if the district name is correctly spelled in the shapefile
eth_map %>%
filter(district == "Hawi Gudina\n")Simple feature collection with 1 feature and 19 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 40.35742 ymin: 7.863617 xmax: 41.15041 ymax: 8.40077
Geodetic CRS: WGS 84
Shape_Leng Shape_Area district ADM3_PCODE ADM3_REF ADM3ALT1EN ADM3ALT2EN
1 2.771471 0.2528438 Hawi Gudina\n ET040915 <NA> <NA> <NA>
ADM2_EN ADM2_PCODE region ADM1_PCODE ADM0_EN ADM0_PCODE date
1 West Hararge ET0409 Oromia ET04 Ethiopia ET 2019-08-19
validOn validTo X_coordina Y_coordina geometry
1 2020-10-08 -001-11-30 40.7048 8.12856 MULTIPOLYGON (((40.74895 8....
area
1 3094.835 [km^2]
# Recode the district name to remove any newline characters
eth_map$district <- recode(eth_map$district,
"Hawi Gudina\n" = "Hawi Gudina"
)
# Verify the correction
eth_map %>%
filter(district == "Hawi Gudina")Simple feature collection with 1 feature and 19 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 40.35742 ymin: 7.863617 xmax: 41.15041 ymax: 8.40077
Geodetic CRS: WGS 84
Shape_Leng Shape_Area district ADM3_PCODE ADM3_REF ADM3ALT1EN ADM3ALT2EN
1 2.771471 0.2528438 Hawi Gudina ET040915 <NA> <NA> <NA>
ADM2_EN ADM2_PCODE region ADM1_PCODE ADM0_EN ADM0_PCODE date
1 West Hararge ET0409 Oromia ET04 Ethiopia ET 2019-08-19
validOn validTo X_coordina Y_coordina geometry
1 2020-10-08 -001-11-30 40.7048 8.12856 MULTIPOLYGON (((40.74895 8....
area
1 3094.835 [km^2]
# Note: Always check that the names are correctly updated in your shapefile.Check names in your data. Please note that ensuring consistent naming between datasets is essential for a successful join
# Verify the district name in the non-spatial data
u5_data %>%
filter(district == "Hawi Gudina")# A tibble: 0 × 6
# ℹ 6 variables: region <chr>, district <chr>, age_group <chr>, conf <dbl>,
# pop <dbl>, Incid_per_1000 <dbl>
Join Non-Spatial Data to the Shapefile
Once you’ve confirmed that the names match, you can proceed to join your non-spatial data with the shapefile.
# Perform the join operation
amhara_data_sf <- amhara_shp %>%
left_join(u5_data, by = "district")
# Note: The `left_join` function keeps all rows from the shapefile and joins matching rows from the data.
# If a match isn't found, the resulting shapefile will still include the original geographic information.After the join, ensure your data is still in the sf (simple features) format, which is essential for spatial operations.
# Convert the joined data to an sf object
amhara_data_sf <- st_as_sf(amhara_data_sf)
# Note: Converting to `sf` ensures that the data retains its spatial properties.Writing shapefiles out
After you’ve completed your analysis or modifications, you may want to save your shapefile for future use or sharing. Here’s how you can write it back to a shapefile format:
# Write the shapefile to disk
st_write(
amhara_data_sf,
"C:/Users/User/Documents/R_training/Tutorial_R/R_Tutorial/scripts/tutorials/2024-08-25-Spatial-analysis-in-R/data/amhara_data_with_shapefile.shp"
)
# Note: Replace "path_to_save" with the directory where you want to save the file.
# The .shp extension indicates it's a shapefile.Introduction to Raster data
In this section, we’ll explore how to work with raster data using the terra package, which has replaced the older raster package. We’ll cover basic operations like loading and visualizing raster data, manipulating rasters, and performing spatial analysis.
Loading raster data
The terra package in R introduces a new class of raster data called SpatRaster. Let’s start by loading the terra package and importing Ethiopian 2016 population raster data.
The rast() function is used to load the raster data, and simply calling the object name (in this case, population) will display the raster properties like extent, resolution, and CRS (Coordinate Reference System).
# Load the population raster data
population <- rast("C:/Users/User/Documents/R_training/Tutorial_R/R_Tutorial/scripts/tutorials/2024-08-25-Spatial-analysis-in-R/data/eth_ppp_2016.tif")
# View the raster properties
populationclass : SpatRaster
dimensions : 13893, 17983, 1 (nrow, ncol, nlyr)
resolution : 0.0008333333, 0.0008333333 (x, y)
extent : 32.99958, 47.98542, 3.322084, 14.89958 (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326)
source : eth_ppp_2016.tif
name : eth_ppp_2016
min value : 2.938226e-03
max value : 4.783238e+02
Visualizing the Raster
Visualizing raster data helps you quickly understand its spatial distribution. We can start by plotting the raster using the plot() function.
This simple plot provides a visual overview of the raster data. You can use this for quick inspections before more detailed analysis.
# Plot the raster data
plot(population, main = "Raster level populations map")Plotting, reprojecting and manipulating Rasters in R
Plotting with ggplot2
Before plotting, lets explore the distributions of the raster data with the log scaling.
# plotting the log of the population due to the scale
hist(log(population))
|---------|---------|---------|---------|
=========================================
Warning: [hist] a sample of 0% of the cells was used (of which 47% was NA)
Before plotting, first we need to convert the raster to a data frame. The geom_raster() function is used to plot raster data, and scale_fill_viridis_c() is used to apply a color scale that is perceptually uniform. To make it manageable, first, we need to reduce the resolutions of our data
# resolutions of our data
population_reduced <- aggregate(population, fact = 10, fun = mean)
|---------|---------|---------|---------|
=========================================
# Convert the raster to a data frame
pop_df <- as.data.frame(population_reduced, xy = TRUE)
# Plot the raster using ggplot2
ggplot() +
geom_tile(data = pop_df, aes(x = x, y = y, fill = eth_ppp_2016)) +
scale_fill_viridis_c() +
labs(fill = "Population") +
theme_bw()Re-projecting rasters
Sometimes we need to change the projection of our raster data to align it with other spatial data. The crs() function checks the current projection of the raster. By assigning a new CRS, you can reproject the raster to a different coordinate system, which is crucial for spatial analysis when working with multiple datasets.
# Check the current CRS of the raster
crs(population)
# Reproject the raster to UTM zone 48
# crs(population) <- "+proj=utm +zone=48 +datum=WGS84"Manipulating rasters
We can modify rasters by cropping them to a specific extent or masking out areas outside a certain region. ext() retrieves the spatial extent of the raster. crop() allows you to focus on a specific area by cutting out the rest of the data. mask() sets the raster cells outside a given region (like ‘Amhara’) to NA, effectively hiding them.
# Get the extent of the raster
ext(population)SpatExtent : 32.999583216, 47.985416489, 3.322083521, 14.899583475 (xmin, xmax, ymin, ymax)
# Assume we have a spatial object 'Amhara' representing a region in Ethiopia
Amhara <- filter(eth_map, region == "Amhara")
# Crop the raster to the Amhara region
am_pop <- crop(population, Amhara)
ext(am_pop)SpatExtent : 35.2612498735611, 40.2120831869623, 8.71458349957439, 13.7629168128496 (xmin, xmax, ymin, ymax)
# Mask the raster to keep only the Amhara region
am_pop2 <- terra::mask(am_pop, Amhara)
# plot
plot(am_pop2)Extracting raster data
To extract the population values for each admin 3 district of Ethiopia. For this we provide exact_extract() function in exactextractr package with the raster and the shapefile. The output is then the population value for each cell of the raster, linked to each polygon ID of the shapefile. We can the perform summary statistic on this as we would a standard data frame.
terra::extract(am_pop2, amhara_ctr)
am_population <- exactextractr::exact_extract(am_pop, amhara_shp, "sum", append_cols = "district")
# write.csv(file = "C:/Users/user/Documents/Data_management_in_R/pop.csv", row.names = FALSE) # to save the population
head(am_population) # to six districts with the 2016 population sizeIf the coordinate reference systems (CRS) of the population raster (am_pop2 or am_pop) and the shapefile (amhara_ctr or amhara_shp) do not match, the extraction might fail to correctly overlay the raster and the shapefile, resulting in incorrect or zero values.
If you need to check the projection:
terra::crs(am_pop2) # population raster data
terra::crs(amhara_ctr) # shapefile
The above script result showed us the CRS of the raster data (am_pop2) and shapefile (amhara_ctr) are similar. To ensure proper alignment between the raster and the shapefile for accurate extraction, we need to reproject one of them so that both layers share the same CRS.
Spatial Epidemiology
In epidemiology, it is concerned with the description, examination and spatial interaction of disease and focuses specifically on the spatial components to understand and explain why health and disease vary by place.
This blog provides how to approach spatial problems in epidemiology, with a specific focus on public health applications.
The framework illustrated in below outlines a logical, sequential process for conducting spatial analyses. However, it should be recognized that this is not a strictly linear process, as presenting the results from exploration and modeling often necessitates a return to visualization.
Making chloropleth maps in R
Plotting spatial data using base R
Visualizing the geographical distribution and patterns of a disease is a critical step in epidemiological research. In R, you can create maps using only a few essential packages like rgdal and sf. While simply calling the plot function on a shapefile object will produce a map, it tends to display all attributes in a somewhat unappealing and cluttered manner.
To create a more focused and meaningful map, you can plot a single attribute of interest by subsetting the sf object. This can be done using the standard R subsetting syntax: sf_object[["name_of_attribute"]]. For example, you can quickly visualize the distribution of a particular disease or condition across administrative regions in Ethiopia by selecting the relevant attribute and plotting it by administrative units.
Customizing the Legend
By default, R will automatically place the legend where it “thinks” it fits best on the plot. However, you can customize the legend’s position and size to enhance the map’s readability.
Positioning the Legend: Use the
key.posargument to specify the location of the legend:key.pos = 1places it below the map.key.pos = 2places it to the left.key.pos = 3places it above the map.key.pos = 4places it to the right.
# to lower legend
plot(amhara_data_sf["conf"],
main = "Confirmed Malaria cases in Amhara Region",
key.pos = 1
)Mapping using ggplot2:
Adding features to maps
Recall from earlier the typical ggplot call: ggplot(data, aesthetic). When making chloropleth maps, our data will typically be an sf object, with a column corresponding to the variable you wish to map, and a geometry column describing the geography of the data.
Given an sf object, you can use geom_sf to produce a simple plot of your shapefile without even explicitly defining any aesthetics.
# base
ggplot(amhara_shp) +
geom_sf() + # as we've called ggplot using an sf object as the data argument, geom_sf can figure it out itself
coord_sf() # deals with map projection- viridis package - pretty color map with color scale using the
scale_fill_viridis()function
# pretty color map with color scale
library(viridis)Warning: package 'viridis' was built under R version 4.3.3
Loading required package: viridisLite
Warning: package 'viridisLite' was built under R version 4.3.3
ggplot(
data = amhara_data_sf,
aes(fill = conf)
) +
geom_sf() +
scale_fill_viridis() +
theme_void() +
labs(title = "Confirmed malaria in Amhara Region, Ethiopia")- leaflet package - an open-source JavaScript library for interactive maps to make easy to create Leaflet maps from R. leaflet will also allow you to zoom in and out from the world map view.
library(leaflet)Warning: package 'leaflet' was built under R version 4.3.3
# create an interactive map
pal <- colorNumeric("YlOrRd", domain = amhara_data_sf$conf)
leaflet(amhara_data_sf) %>%
addTiles() %>%
addPolygons(
color = "white", fillColor = ~ pal(conf),
fillOpacity = 1
) %>%
addLegend("bottomright", pal = pal, values = ~conf, opacity = 1, title = "Confirmed malaria")- Mapview pacakge - allows to very quickly create interactive visualizations to investigate both the spatial geometries and the variables in the data
library(mapview)
mapview(amhara_data_sf,
zcol = "conf",
legend = "TRUE",
layer.name = "Confirmed malaria"
) # edit legend nameWe can use the sync() function to produce a lattice-like view of multiple synchronized maps created with mapview or leaflet.
Create maps of confirmed cases and incidence with synchronized zoom and pan by first creating maps of the variables conf and Incid_per_1000with mapview(), and then passing those maps as arguments of the sync() function
library(leafsync)Warning: package 'leafsync' was built under R version 4.3.3
# plot the confirmed case map
mcf <- mapview(amhara_data_sf, zcol = "conf", legend = "TRUE", layer.name = "Confirmed malaria")
# plot the incidence map
minc <- mapview(amhara_data_sf, zcol = "Incid_per_1000", legend = "TRUE", layer.name = "Incidence of malaria/1000 population")
# sync both map using the `leafsync` package
m <- sync(mcf, minc)
m