# sf package functions
methods(class = "sf")
Introduction to spatial analysis in R
Introduction
Spatial epidemiology involves both describing and understanding geographical variation in disease with respect to demographic, environmental, behavioral, socioeconomic, genetic, and infectious risk factors using various methods [Elliott et al. (2001)](Elliott and Wartenberg 2004). Common techniques include proximity calculations, estimating summary measures across geographic areas, assessing clustering, applying spatial smoothing and interpolation techniques, and utilizing spatial regression for in-depth analysis.
Spatial Data
Spatial data, also known as geospatial data, can be broadly categorized into two types: vector and raster data.
Vector Data
Vector data describes the “geometry” or “shape” of objects and often includes additional variables. The three basic types of vector data are points, lines, and polygons (areas). Each point, line, and polygon has a spatial reference frame such as latitude and longitude.
A polygon refers to a set of closed polylines that represent a geographic location, composed of vertices (a series of x
and y
coordinates or spatial points).
Raster Data
Raster data is pixel-based, where each pixel has a value representing information. This format is particularly useful for representing data that varies continuously over space and is naturally suited to a grid format.
Check the packages required for this tutorial:
sp
and sf
packages
sp and sf are the two common spatail package in R and work in fundamentally different ways, specifically revolving around how they store spatial data within R.
sp
uses something called S4 classes, where the different elements of spatial data (coordinates/geometry, attribute table, coordinate system, etc.) are split up into different ‘slots’ and accessed using the @ symbol. sf
uses S3 classes and stores spatial data all within a standard data frame, with some extra header information and a geometry column.
sp
and sf
spatial package comparisonHow to convert one file format to the other format?
To sp object <- as(object, Class = "Spatial")
To sf object_sf = st_as_sf(object_sp, "sf")
A word of advice: be flexible in the usage of sf
and sp.
Sometimes it may be hard to explain why functions work for one data type and do not for the other. But since transformation is quite easy, time is better spend on analyzing your data than on wondering why operations do not work. For the purpose of this material we will focus of teaching you the package sf
as it is intended to succeed and replace R packages sp
, rgeos
and the vector parts of rgdal
packages. It also connects nicely to tidyverse learnt in previous modules. For this session we focus on the sf
package.
Vector - Spatial data
Area spatial data structure
After the package is loaded, you can check out all of the methods that are available for sf
data objects.
Import Data
Shapefiles for area data:
We will use the st_read()
from sf
package, which simply takes the path of the directory with the shapefile as argument. Where as readOGR()
from the rgdal
packages will take the path and file name readOGR("path","fileName")
to a shapefile. In this instance we would load the part of the shapefile that ends with .shp. Setting quiet = T
will keep R from being chatty when it imports the data.
Shepfile reads with the sp package has both sf
and data.frame
classes.
# class - data
class(eth_map)
[1] "sf" "data.frame"
Shapefile Metadata & Attributes
You’ll notice that when you load the shapefile in, there will be a set of information explaining the features of the shapefile. The first sentence shows that you have loaded a ESRI shapefile, it contains 46 features (which are polygons in this case) and 5 columns of information stored as a data tabel. It mentions also there is a spatial extent (called bounding box) and the coordinate reference system (CRS).
What does this show us? Let’s break it down. In this case it is important to read in and check the metadata for the type of spatial information you have loaded. Metadata, describing the format, CRS
, extent
, and other components of the vector data, and the attributes which describe properties associated with each individual vector object.
Key metadata for all shapefiles include:
Object Type: the class of the imported object.
Coordinate Reference System (CRS): the projection of the data.
Extent: the spatial extent (geographic area that the shapefile covers) of the shapefile. Note that the spatial extent for a shapefile represents the extent for ALL spatial objects in the shapefile.
You can view shapefile metadata using the class
, crs
and extent
methods:
# metadata
eth_map
The str()
function will also give us the meta data of the dataset including the class and dimensions:
# structure - column list with details
str(eth_map)
Classes 'sf' and 'data.frame': 967 obs. of 19 variables:
$ Shape_Leng: num 1.951 2.034 0.631 1.576 0.227 ...
$ Shape_Area: num 0.17132 0.08034 0.01579 0.07866 0.00158 ...
$ district : chr "Aba-Korow" "Abaala" "Abaala town" "Ababo" ...
$ ADM3_PCODE: chr "ET050699" "ET020203" "ET020209" "ET041904" ...
$ ADM3_REF : chr NA NA NA NA ...
$ ADM3ALT1EN: chr NA "Aba Ala" "Aba Ala" NA ...
$ ADM3ALT2EN: chr NA NA NA NA ...
$ ADM2_EN : chr "Shabelle" "Kilbati /Zone2" "Kilbati /Zone2" "Horo Gudru Wellega" ...
$ ADM2_PCODE: chr "ET0506" "ET0202" "ET0202" "ET0419" ...
$ region : chr "Somali" "Afar" "Afar" "Oromia" ...
$ ADM1_PCODE: chr "ET05" "ET02" "ET02" "ET04" ...
$ ADM0_EN : chr "Ethiopia" "Ethiopia" "Ethiopia" "Ethiopia" ...
$ ADM0_PCODE: chr "ET" "ET" "ET" "ET" ...
$ date : Date, format: "2019-08-19" "2019-08-19" ...
$ validOn : Date, format: "2020-10-08" "2020-10-08" ...
$ validTo : Date, format: "-001-11-30" "-001-11-30" ...
$ X_coordina: num 42.7 39.9 39.8 37.5 42.1 ...
$ Y_coordina: num 6.48 13.37 13.36 9.81 9.31 ...
$ geometry :sfc_MULTIPOLYGON of length 967; first list element: List of 1
..$ :List of 1
.. ..$ : num [1:143, 1:2] 42.8 42.8 42.8 42.8 42.8 ...
..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
- attr(*, "sf_column")= chr "geometry"
- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
..- attr(*, "names")= chr [1:18] "Shape_Leng" "Shape_Area" "district" "ADM3_PCODE" ...
The class
function, applied on an sf
layer, returns both "sf"
and "data.frame"
class names:
# spatial attributes
class(eth_map)
[1] "sf" "data.frame"
A layer (geometry+attributes) is represented by an sf
object. The second printed value, data.frame
, implies that sf
is, in fact, an extension of the data.frame
class, inheriting many of its methods. Indeed, we will see that many functions in R work exactly the same way on sf
and data.frame
objects.
The geometric part can be extracted with st_geometry
, resulting in an object of class sfc
, i.e., a geometry column
# get geometry data
st_geometry(eth_map)
Geometry set for 967 features
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 32.9918 ymin: 3.40667 xmax: 47.98824 ymax: 14.84548
Geodetic CRS: WGS 84
First 5 geometries:
MULTIPOLYGON (((42.79826 6.19605, 42.79641 6.19...
MULTIPOLYGON (((39.92615 13.57223, 39.92848 13....
MULTIPOLYGON (((39.76732 13.28527, 39.76242 13....
MULTIPOLYGON (((37.39245 10.04636, 37.39828 10....
MULTIPOLYGON (((42.13371 9.322936, 42.13386 9.3...
Conversely, If we want just the non-geometric part (the “attributes”), it can be extracted with st_drop_geometry
which returns a data.frame
:
# Extract nom-geometric part
<- st_drop_geometry(eth_map)
map_nogeom
# top five rows
head(map_nogeom)
Shape_Leng Shape_Area district ADM3_PCODE ADM3_REF ADM3ALT1EN ADM3ALT2EN
1 1.9505322 0.171322798 Aba-Korow ET050699 <NA> <NA> <NA>
2 2.0341566 0.080342819 Abaala ET020203 <NA> Aba Ala <NA>
3 0.6310232 0.015789541 Abaala town ET020209 <NA> Aba Ala <NA>
4 1.5755748 0.078658877 Ababo ET041904 <NA> <NA> <NA>
5 0.2274391 0.001580407 Abadir ET130109 <NA> <NA> <NA>
6 1.5390176 0.069847562 Abay Chomen ET041905 <NA> <NA> <NA>
ADM2_EN ADM2_PCODE region ADM1_PCODE ADM0_EN ADM0_PCODE
1 Shabelle ET0506 Somali ET05 Ethiopia ET
2 Kilbati /Zone2 ET0202 Afar ET02 Ethiopia ET
3 Kilbati /Zone2 ET0202 Afar ET02 Ethiopia ET
4 Horo Gudru Wellega ET0419 Oromia ET04 Ethiopia ET
5 Harari ET1301 Harari ET13 Ethiopia ET
6 Horo Gudru Wellega ET0419 Oromia ET04 Ethiopia ET
date validOn validTo X_coordina Y_coordina
1 2019-08-19 2020-10-08 -001-11-30 42.6983 6.47533
2 2019-08-19 2020-10-08 -001-11-30 39.8761 13.36890
3 2019-08-19 2020-10-08 -001-11-30 39.7914 13.35980
4 2019-08-19 2020-10-08 -001-11-30 37.5210 9.80858
5 2019-08-19 2020-10-08 -001-11-30 42.1227 9.31044
6 2019-08-19 2020-10-08 -001-11-30 37.3524 9.72482
Spatial points
Another key type of spatial data is data linked to coordinates, frequently found in a table format. We can read these data in using the read_csv()
function, then we may want to convert these into spatial information using the st_as_sf()
function. Here we have a csv file containing the coordinates for the locations of health facilities in our routine data. Note that we set the projection for the spatial object using the crs
command; crs 4326 is the standard WGS 84 CRS.
# Read the `csv` file with geometry information
<- read_csv("C:/Users/User/OneDrive/from_unsw_drive/file_YG/TKI/MAP/training/HF_list.csv") %>%
hf_gps clean_names() %>%
drop_na() # drop na
Rows: 1045036 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): ZONE, Woreda, HF name, Facility Type, Ownershio, Kebele_Locality, R...
dbl (2): X, Y
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Check the class of the point spatial data. The dataset read as data.frame th output captured the geographic information: longitude and latitude.
# check the data class
class(hf_gps)
[1] "tbl_df" "tbl" "data.frame"
The st_as_sf()
` function from the sf package will convert into point spatial data format.
# Make the gps data into a point sample feature and define it as shapefile `eth_hf_points`
<- st_as_sf(hf_gps,
eth_hf_points coords = c("x", "y"),
crs = st_crs(4326)
)
The map below plotted the each XY geometry health facility points.
# map the geometry
plot(eth_hf_points$geometry)
Coordinate Reference Systems (CRS)
A Coordinate Reference System (CRS) defines how the coordinates in our geometries relate to the surface of the Earth. There are two main types of CRS:
Geographic—longitude and latitude, in degrees
Projected—implying flat surface, usually in units of true distance (e.g., meters)
We can then change the projection using the command st_transform()
. We can also combine the two commands to change the projection of one data type to be that of the other. The CRS of a given sf
layer can be obtained using function st_crs
. The CRS information is returned in the WKT format:
# coordinate system
st_crs(eth_map)
Coordinate Reference System:
User input: WGS 84
wkt:
GEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["latitude",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["longitude",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4326]]
A vector layer reprojection
Reprojection is the transformation of geometry coordinates, from one CRS to another. It is an important part of spatial analysis workflow, since we often need to:
Transform several layers into the same projection
Switch between geographic and projected CRS
A vector layer can be reprojected with st_transform
. The st_transform
function has two important parameters:
x
—The layer to be reprojectedcrs
—The target CRS
The crs
can be specified in one of four ways:
An EPSG code (e.g.,
4326
)A PROJ4 string (e.g.,
"+proj=longlat +datum=WGS84 +no_defs"
)A WKT string
A
crs
object of another layer, as returned byst_crs
For example, the following expression reprojects the nafot
layer to the geographic UTM CRS, using its EPSG code (20138
). In the metadata, the PROJCRS["Adindan / UTM zone 38N"
showed us the projection is UTM zone 38N
# Change the projection to UTM zone 38N
<- st_transform(eth_map, 20138)
eth_utm st_crs(eth_utm)
Coordinate Reference System:
User input: EPSG:20138
wkt:
PROJCRS["Adindan / UTM zone 38N",
BASEGEOGCRS["Adindan",
DATUM["Adindan",
ELLIPSOID["Clarke 1880 (RGS)",6378249.145,293.465,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4201]],
CONVERSION["UTM zone 38N",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",0,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",45,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",0.9996,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",500000,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",0,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Engineering survey, topographic mapping."],
AREA["Ethiopia - east of 42°E."],
BBOX[4.11,42,12.85,47.99]],
ID["EPSG",20138]]
To change the projection of the eth_utm shapefile
to WGS CRS:
# to WGS projection
<- st_transform(eth_utm, st_crs(eth_map), crs = 4326) eth_map2
Geometric calculations
We will see geometric operations on vector layers:
Numeric values —Functions that summarize geometrical properties of:
A single layer—e.g., area, length
A pair of layers—e.g., distance
Spatial layers —Functions that create a new layer based on:
A single layer—e.g., centroid, buffer
A pair of layers—e.g., intersection area
Numeric
There are several functions to calculate numeric geometric properties of vector layers in package sf
:
st_length
—Lengthst_area
—Areast_distance
—Distancest_bbox
—Bounding boxst_dimension
—Dimensions (0 = points, 1 = lines, 2 = polygons)
For example, we can calculate the area of each feature in the Ethiopia
layer using st_area
. The result is assigned to a new attribute named area
in eth_map
:
# calculate the area of each district
$area <- st_area(eth_map) eth_map
Summary of the area in \(m^2\)
# summary in m^2
summary(eth_map$area)
Min. 1st Qu. Median Mean 3rd Qu. Max.
5.809e+05 2.672e+08 6.652e+08 1.170e+09 1.337e+09 1.254e+10
Plot the area in \(m^2\).
# plot `m^2`
plot(eth_map["area"], main = "Area of the Districts (m^2)")
You may notice the area
values are printed with their units [m^2]
. The resulting area values, just like any other numeric calculations in sf
, are returned as object of class units
:
class(eth_map$area)
[1] "units"
A units
object is basically a numeric vector associated with units of measurement. We can transform units
measurements to different units, with function set_units
from package units
. First, we load the units
library:
library(units)
Warning: package 'units' was built under R version 4.3.3
udunits database from C:/Users/User/AppData/Local/R/win-library/4.3/units/share/udunits/udunits2.xml
## udunits system database from /usr/share/xml/udunits
Then, we make the transformation to \(km^2\) units :
# km^2
$area <- set_units(eth_map$area, "km^2") eth_map
Lets check the summary of the transformed area in \(Km^2\). In the below summary, we’ve added across(geometry, st_union)
to ensure proper summarization of the spatial data. The st_union
function will be applied to the geometry column.
# summary with chain
%>%
eth_map summarise(
mean_area = mean(area),
median_area = median(area),
sd_area = sd(area),
range_area = range(area),
across(geometry, st_union)
)
Plot the calculated area
for Ethiopia using the plot()
function
# plot `km^2`
plot(eth_map["area"], , main = "Area of the Districts (km^2)")
Spatial Geometry Function
When working with spatial data in R, the sf
package is a powerful tool that allows you to manipulate and analyze geometric objects. In the below functions, we’ll explore some common geometry-generating functions provided by sf
. These functions are essential for anyone getting started with spatial data analysis.
- Centroids (
st_centroid)
The st_centroid
function calculates the centroid (geometric center) of a polygon. Let’s use an example where we want to find the centroids of polygons representing the “Amhara” region in Ethiopia.
# Filter Amhara
<- eth_map |>
amhara_shp filter(region == "Amhara")
Now, let’s calculate the centroids: to calculate the centroids of the “Amhara” polygons we can use the st_centroid
function as follows:
# Calculate centroids
<- st_centroid(amhara_shp) amhara_ctr
Warning: st_centroid assumes attributes are constant over geometries
# Filled polygons with points
plot(st_geometry(amhara_ctr),
col = "red",
pch = 3
)
- Buffering
(st_buffer)
The st_buffer
function creates a buffer around a spatial object. Buffers are useful for visualizing areas of influence or proximity. For instance, you can create a buffer around health facilities to represent their service areas.
# Filter hospitals
<- eth_hf_points %>%
hospital filter(facility_type == "Hospital")
# Create a buffer around health facilities (e.g., 2.5 km radius)
<- 2.5
buffer_radius_km <- st_buffer(hospital, dist = buffer_radius_km * 1000) # Convert km to meters
health_facility_buffer
# Plot the health facility buffers first (as polygons need a base plot)
plot(st_geometry(health_facility_buffer),
col = "red", lwd = 2, main = "A Buffer arround Health Facilities"
)
# Convert hospital geometries to points if they aren't already
<- st_cast(hospital, "POINT")
hospital_points
# Plot the original health facilities on top of the buffers
plot(st_geometry(hospital_points), pch = 16, col = "blue", add = TRUE)
- Random sample points (
st_sample)
To generate random sample points within a polygon, use the st_sample function. This is handy for spatial sampling in epidemiology or environmental studies. The sampled points are overlaid on the original geometries.
# Randomly sample 10 points within hospital polygons
<- st_sample(hospital, size = 10)
sample_points
# Plot the original hospital locations
plot(hospital$geometry, pch = 16, col = "blue", main = "Sample points of Health Facility Locations")
# Overlay the sampled points
plot(sample_points, add = TRUE, pch = 2, col = "red")
Joining non-spatial data to shapefiles
In spatial analysis, it’s common to join non-spatial data (e.g., tabular data) with shapefiles. This process allows you to create maps and perform spatial analyses by combining attribute data with geographic information. In this tutorial, we’ll walk through how to join non-spatial data to shapefile. We’ll also cover some important considerations to ensure your data is properly aligned.
# Load and filter the data for the Amhara region
<- read_csv("C:/Users/User/OneDrive/from_unsw_drive/training/R_training/Tutorial_R/data_excercise.csv") %>%
u5_data filter(region == "Amhara")
Rows: 967 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): region, district, age_group
dbl (3): conf, pop, Incid_per_1000
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Note: Always make sure the data is loaded and filtered correctly before proceeding. Here, we’re focusing on a specific region, but you can adjust the filter condition as needed.
Before joining data, it’s crucial to ensure that the names in your non-spatial data match those in your shapefile. If they don’t match exactly, the join won’t work correctly.
Check names in the Shapefile:
# Check if the district name is correctly spelled in the shapefile
%>%
eth_map filter(district == "Hawi Gudina\n")
Simple feature collection with 1 feature and 19 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 40.35742 ymin: 7.863617 xmax: 41.15041 ymax: 8.40077
Geodetic CRS: WGS 84
Shape_Leng Shape_Area district ADM3_PCODE ADM3_REF ADM3ALT1EN ADM3ALT2EN
1 2.771471 0.2528438 Hawi Gudina\n ET040915 <NA> <NA> <NA>
ADM2_EN ADM2_PCODE region ADM1_PCODE ADM0_EN ADM0_PCODE date
1 West Hararge ET0409 Oromia ET04 Ethiopia ET 2019-08-19
validOn validTo X_coordina Y_coordina geometry
1 2020-10-08 -001-11-30 40.7048 8.12856 MULTIPOLYGON (((40.74895 8....
area
1 3094.835 [km^2]
# Recode the district name to remove any newline characters
$district <- recode(eth_map$district,
eth_map"Hawi Gudina\n" = "Hawi Gudina"
)
# Verify the correction
%>%
eth_map filter(district == "Hawi Gudina")
Simple feature collection with 1 feature and 19 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 40.35742 ymin: 7.863617 xmax: 41.15041 ymax: 8.40077
Geodetic CRS: WGS 84
Shape_Leng Shape_Area district ADM3_PCODE ADM3_REF ADM3ALT1EN ADM3ALT2EN
1 2.771471 0.2528438 Hawi Gudina ET040915 <NA> <NA> <NA>
ADM2_EN ADM2_PCODE region ADM1_PCODE ADM0_EN ADM0_PCODE date
1 West Hararge ET0409 Oromia ET04 Ethiopia ET 2019-08-19
validOn validTo X_coordina Y_coordina geometry
1 2020-10-08 -001-11-30 40.7048 8.12856 MULTIPOLYGON (((40.74895 8....
area
1 3094.835 [km^2]
# Note: Always check that the names are correctly updated in your shapefile.
Check names in your data. Please note that ensuring consistent naming between datasets is essential for a successful join
# Verify the district name in the non-spatial data
%>%
u5_data filter(district == "Hawi Gudina")
# A tibble: 0 × 6
# ℹ 6 variables: region <chr>, district <chr>, age_group <chr>, conf <dbl>,
# pop <dbl>, Incid_per_1000 <dbl>
Join Non-Spatial Data to the Shapefile
Once you’ve confirmed that the names match, you can proceed to join your non-spatial data with the shapefile.
# Perform the join operation
<- amhara_shp %>%
amhara_data_sf left_join(u5_data, by = "district")
# Note: The `left_join` function keeps all rows from the shapefile and joins matching rows from the data.
# If a match isn't found, the resulting shapefile will still include the original geographic information.
After the join, ensure your data is still in the sf
(simple features) format, which is essential for spatial operations.
# Convert the joined data to an sf object
<- st_as_sf(amhara_data_sf)
amhara_data_sf
# Note: Converting to `sf` ensures that the data retains its spatial properties.
Writing shapefiles out
After you’ve completed your analysis or modifications, you may want to save your shapefile for future use or sharing. Here’s how you can write it back to a shapefile format:
# Write the shapefile to disk
st_write(
amhara_data_sf,"C:/Users/User/Documents/R_training/Tutorial_R/R_Tutorial/scripts/tutorials/2024-08-25-Spatial-analysis-in-R/data/amhara_data_with_shapefile.shp"
)
# Note: Replace "path_to_save" with the directory where you want to save the file.
# The .shp extension indicates it's a shapefile.
Introduction to Raster data
In this section, we’ll explore how to work with raster data using the terra
package, which has replaced the older raster
package. We’ll cover basic operations like loading and visualizing raster data, manipulating rasters, and performing spatial analysis.
Loading raster data
The terra
package in R introduces a new class of raster data called SpatRaster
. Let’s start by loading the terra
package and importing Ethiopian 2016 population raster data.
The rast()
function is used to load the raster data, and simply calling the object name (in this case, population
) will display the raster properties like extent, resolution, and CRS (Coordinate Reference System).
# Load the population raster data
<- rast("C:/Users/User/Documents/R_training/Tutorial_R/R_Tutorial/scripts/tutorials/2024-08-25-Spatial-analysis-in-R/data/eth_ppp_2016.tif")
population
# View the raster properties
population
class : SpatRaster
dimensions : 13893, 17983, 1 (nrow, ncol, nlyr)
resolution : 0.0008333333, 0.0008333333 (x, y)
extent : 32.99958, 47.98542, 3.322084, 14.89958 (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326)
source : eth_ppp_2016.tif
name : eth_ppp_2016
min value : 2.938226e-03
max value : 4.783238e+02
Visualizing the Raster
Visualizing raster data helps you quickly understand its spatial distribution. We can start by plotting the raster using the plot()
function.
This simple plot provides a visual overview of the raster data. You can use this for quick inspections before more detailed analysis.
# Plot the raster data
plot(population, main = "Raster level populations map")
Plotting, reprojecting and manipulating Rasters in R
Plotting with ggplot2
Before plotting, lets explore the distributions of the raster data with the log scaling.
# plotting the log of the population due to the scale
hist(log(population))
|---------|---------|---------|---------|
=========================================
Warning: [hist] a sample of 0% of the cells was used (of which 47% was NA)
Before plotting, first we need to convert the raster to a data frame. The geom_raster()
function is used to plot raster data, and scale_fill_viridis_c()
is used to apply a color scale that is perceptually uniform. To make it manageable, first, we need to reduce the resolutions of our data
# resolutions of our data
<- aggregate(population, fact = 10, fun = mean) population_reduced
|---------|---------|---------|---------|
=========================================
# Convert the raster to a data frame
<- as.data.frame(population_reduced, xy = TRUE)
pop_df
# Plot the raster using ggplot2
ggplot() +
geom_tile(data = pop_df, aes(x = x, y = y, fill = eth_ppp_2016)) +
scale_fill_viridis_c() +
labs(fill = "Population") +
theme_bw()
Re-projecting rasters
Sometimes we need to change the projection of our raster data to align it with other spatial data. The crs()
function checks the current projection of the raster. By assigning a new CRS, you can reproject the raster to a different coordinate system, which is crucial for spatial analysis when working with multiple datasets.
# Check the current CRS of the raster
crs(population)
# Reproject the raster to UTM zone 48
# crs(population) <- "+proj=utm +zone=48 +datum=WGS84"
Manipulating rasters
We can modify rasters by cropping them to a specific extent or masking out areas outside a certain region. ext()
retrieves the spatial extent of the raster. crop()
allows you to focus on a specific area by cutting out the rest of the data. mask()
sets the raster cells outside a given region (like ‘Amhara’) to NA, effectively hiding them.
# Get the extent of the raster
ext(population)
SpatExtent : 32.999583216, 47.985416489, 3.322083521, 14.899583475 (xmin, xmax, ymin, ymax)
# Assume we have a spatial object 'Amhara' representing a region in Ethiopia
<- filter(eth_map, region == "Amhara")
Amhara
# Crop the raster to the Amhara region
<- crop(population, Amhara)
am_pop ext(am_pop)
SpatExtent : 35.2612498735611, 40.2120831869623, 8.71458349957439, 13.7629168128496 (xmin, xmax, ymin, ymax)
# Mask the raster to keep only the Amhara region
<- terra::mask(am_pop, Amhara)
am_pop2 # plot
plot(am_pop2)
Extracting raster data
To extract the population values for each admin 3 district of Ethiopia. For this we provide exact_extract()
function in exactextractr
package with the raster and the shapefile. The output is then the population value for each cell of the raster, linked to each polygon ID of the shapefile. We can the perform summary statistic on this as we would a standard data frame.
::extract(am_pop2, amhara_ctr)
terra
<- exactextractr::exact_extract(am_pop, amhara_shp, "sum", append_cols = "district")
am_population
# write.csv(file = "C:/Users/user/Documents/Data_management_in_R/pop.csv", row.names = FALSE) # to save the population
head(am_population) # to six districts with the 2016 population size
If the coordinate reference systems (CRS) of the population raster (am_pop2
or am_pop
) and the shapefile (amhara_ctr
or amhara_shp
) do not match, the extraction might fail to correctly overlay the raster and the shapefile, resulting in incorrect or zero values.
If you need to check the projection:
terra::crs(am_pop2) # population raster data
terra::crs(amhara_ctr) # shapefile
The above script result showed us the CRS of the raster data (am_pop2
) and shapefile (amhara_ctr
) are similar. To ensure proper alignment between the raster and the shapefile for accurate extraction, we need to reproject one of them so that both layers share the same CRS.
Spatial Epidemiology
In epidemiology, it is concerned with the description, examination and spatial interaction of disease and focuses specifically on the spatial components to understand and explain why health and disease vary by place.
This blog provides how to approach spatial problems in epidemiology, with a specific focus on public health applications.
The framework illustrated in below outlines a logical, sequential process for conducting spatial analyses. However, it should be recognized that this is not a strictly linear process, as presenting the results from exploration and modeling often necessitates a return to visualization.
Making chloropleth maps in R
Plotting spatial data using base R
Visualizing the geographical distribution and patterns of a disease is a critical step in epidemiological research. In R, you can create maps using only a few essential packages like rgdal
and sf
. While simply calling the plot
function on a shapefile object will produce a map, it tends to display all attributes in a somewhat unappealing and cluttered manner.
To create a more focused and meaningful map, you can plot a single attribute of interest by subsetting the sf
object. This can be done using the standard R subsetting syntax: sf_object[["name_of_attribute"]]
. For example, you can quickly visualize the distribution of a particular disease or condition across administrative regions in Ethiopia by selecting the relevant attribute and plotting it by administrative units.
Customizing the Legend
By default, R will automatically place the legend where it “thinks” it fits best on the plot. However, you can customize the legend’s position and size to enhance the map’s readability.
Positioning the Legend: Use the
key.pos
argument to specify the location of the legend:key.pos = 1
places it below the map.key.pos = 2
places it to the left.key.pos = 3
places it above the map.key.pos = 4
places it to the right.
# to lower legend
plot(amhara_data_sf["conf"],
main = "Confirmed Malaria cases in Amhara Region",
key.pos = 1
)
Mapping using ggplot2:
Adding features to maps
Recall from earlier the typical ggplot
call: ggplot(data, aesthetic)
. When making chloropleth maps, our data will typically be an sf
object, with a column corresponding to the variable you wish to map, and a geometry column describing the geography of the data.
Given an sf
object, you can use geom_sf
to produce a simple plot of your shapefile without even explicitly defining any aesthetics.
# base
ggplot(amhara_shp) +
geom_sf() + # as we've called ggplot using an sf object as the data argument, geom_sf can figure it out itself
coord_sf() # deals with map projection
- viridis package - pretty color map with color scale using the
scale_fill_viridis()
function
# pretty color map with color scale
library(viridis)
Warning: package 'viridis' was built under R version 4.3.3
Loading required package: viridisLite
Warning: package 'viridisLite' was built under R version 4.3.3
ggplot(
data = amhara_data_sf,
aes(fill = conf)
+
) geom_sf() +
scale_fill_viridis() +
theme_void() +
labs(title = "Confirmed malaria in Amhara Region, Ethiopia")
- leaflet package - an open-source JavaScript library for interactive maps to make easy to create Leaflet maps from R. leaflet will also allow you to zoom in and out from the world map view.
library(leaflet)
Warning: package 'leaflet' was built under R version 4.3.3
# create an interactive map
<- colorNumeric("YlOrRd", domain = amhara_data_sf$conf)
pal
leaflet(amhara_data_sf) %>%
addTiles() %>%
addPolygons(
color = "white", fillColor = ~ pal(conf),
fillOpacity = 1
%>%
) addLegend("bottomright", pal = pal, values = ~conf, opacity = 1, title = "Confirmed malaria")
- Mapview pacakge - allows to very quickly create interactive visualizations to investigate both the spatial geometries and the variables in the data
library(mapview)
mapview(amhara_data_sf,
zcol = "conf",
legend = "TRUE",
layer.name = "Confirmed malaria"
# edit legend name )
We can use the sync()
function to produce a lattice-like view of multiple synchronized maps created with mapview or leaflet.
Create maps of confirmed cases and incidence with synchronized zoom and pan by first creating maps of the variables conf and Incid_per_1000
with mapview(), and then passing those maps as arguments of the sync() function
library(leafsync)
Warning: package 'leafsync' was built under R version 4.3.3
# plot the confirmed case map
<- mapview(amhara_data_sf, zcol = "conf", legend = "TRUE", layer.name = "Confirmed malaria")
mcf
# plot the incidence map
<- mapview(amhara_data_sf, zcol = "Incid_per_1000", legend = "TRUE", layer.name = "Incidence of malaria/1000 population")
minc
# sync both map using the `leafsync` package
<- sync(mcf, minc)
m
m