It always happens… I get interested in what I think will be a small data project to scratch some itch and end up down a deep rabbit hole. In this case, a passing interest in the geographic distribution of some samples (more on that in a future post) led to a deep dive into spherical distributions and densities.
DISCLAIMER: While I know a thing or two, there’s a reasonable chance I got some things wrong or at very least there are certainly more efficient ways to go about things. Feedback always appreciated!
NOTE: This has been updated to use the native vmf.kerncontour
since the release of Directional 3.3 supports returning results
Motivation
While I got interested in figuring out densities for the purpose of figuring out the density of points on a map, there are plenty of other cases where you might be interested in the distribution of points on a sphere. The trouble is that most functions commonly available, e.g. geom_density_2d
from ggplot2, only handles regular grid coordinates.
The forms:
- Global densities simply fail at the ‘edge’ of coordinates - e.g. near the poles or near +/- 180 degrees longitude.
- Projection issues. On small scales and near the equator, it is generally safe to make the simplification that longitude/latitude forms a square grid. As you look to larger scales and close to the poles, that assumption breaks down.
I think it is important to point out that there are many tutorials on plotting event densities on maps (e.g. crime occurrences), but that these are all at the city level, where the problems of using existing methods is a reasonable approximation.
Set-Up
First, we’ll make use of a number of libraries and setup our plotting environment:
library(ggplot2) # For most of our plotting
library(cowplot) # grid arrangement of plots
library(Directional) # For spherical density functions
library(maps) # vector maps of the world
library(hrbrthemes) # hrbrmstr themes
library(magick) # For animation
library(mapproj) # Needed for projection
# And set some theme defaults
theme_set(theme_ipsum())
# Axis settings we'll reuse a lot
no.axis <- theme(axis.ticks.y = element_blank(), axis.text.y = element_blank(),
axis.ticks.x = element_blank(), axis.text.x = element_blank(),
axis.title.x = element_blank(), axis.title.y = element_blank())
Next, for this example, we’ll be using a random blob placed on a sphere. I’ll use the rvmf
function from the Directional package. Directional is a general purpose library using Latitude defined from 0 to 180 degrees and Longitude from 0 to 360 instead of -90 to 90 and -180 to 180 respectively. The random_points
function here gives us points in a coordinate system we’re used to.
random_points <- function(n_points, lat, lon, concentration) {
# Directional defines lat + long as 0-180 and 0-360 respectively so we
# have to shift back and forth
mu <- euclid(c(lat + 90, lon + 180))[1,]
pts <- euclid.inv(rvmf(n_points, mu, concentration))
pts[,1] <- pts[,1] - 90
pts[,2] <- pts[,2] - 180
data.frame(pts)
}
Problem
To visualize the problem, we’ll create 2 sets of points, one centered on the map, the other near the pole and near 180 degrees. We’ll then plot the contours of the densities to show the issue.
offset.pos <- list(Lat = 75, Long = 175)
positions.center <- random_points(1000, 0, 0, 10)
positions.offset <- random_points(1000, offset.pos$Lat, offset.pos$Long, 10)
plot.colors <- hcl(h = c(0:3)*90, c = 50 , l = 70)
g.base <- ggplot(positions.center, aes(x = Long, y = Lat)) +
scale_y_continuous(breaks = (-2:2) * 30, limits = c(-90, 90)) +
scale_x_continuous(breaks = (-4:4) * 45, limits = c(-180, 180)) +
coord_map()
g.broken <- g.base +
# The centered random points
geom_density_2d(color = plot.colors[1]) +
geom_point(size = 0.5, stroke = 0, color = plot.colors[1]) +
# The offset random points
geom_density_2d(data = positions.offset, color = plot.colors[2]) +
geom_point(data = positions.offset, size = 0.5, stroke = 0,
color = plot.colors[2])
ortho.projections <- plot_grid(
g.broken + coord_map("ortho", orientation = c(0, 0, 0)) + no.axis,
g.broken + coord_map("ortho", orientation = c(offset.pos$Lat, offset.pos$Long, 0))
+ no.axis,
labels = NULL,
align = 'h')
g.broken
ortho.projections
We can quickly see the problem looking at the blue offset density plot - there are multiple “centers” and the contours don’t connect cleanly.
Spherical Densities
The solution is to use spherical densities an fortunately, the Directional package provides functions for spherical (and in fact, circular and spheres of arbitrary dimensions) distributions using the von Mises-Fisher distribution.
Our basic approach will be the following steps:
- Calculate a “grid” of densities manually, covering the entire globe
- Use geom_contour to turn those density maps into contour curves
- Plot away!
Before we fix the problem using spherical densities, we first need to do some setup. We’ll be using vmf.kerncontour
from the Directional library, but in current CRAN version (3.2), that function plots contours itself. We want to get the data to perform the plots ourselves, so we need a version that returns the data. The next version of the package will have that option, but in the meantime we put the code for the revised function in the Appendix as vmf.kerncontour.new
.
Similar to what we did for random_points
, we also need to perform some translation of vmf.kerncontour
’s input and output to out more familiar formats.
vmf_density_grid <- function(u, ngrid = 100) {
# Translate to (0,180) and (0,360)
u[,1] <- u[,1] + 90
u[,2] <- u[,2] + 180
res <- vmf.kerncontour(u, thumb = "none", den.ret = T, full = T,
ngrid = ngrid)
# Translate back to (-90, 90) and (-180, 180) and create a grid of
# coordinates
ret <- expand.grid(Lat = res$lat - 90, Long = res$long - 180)
ret$Density <- c(res$den)
ret
}
Now we can go ahead and calculate the densities and plot the contours. We’ll keep the “bad” contours for comparison.
densities.center <- vmf_density_grid(positions.center)
densities.offset <- vmf_density_grid(positions.offset)
g.broken <- g.base +
geom_density_2d(color = plot.colors[1], alpha = .5) +
geom_point(size = 0.5, stroke = 0, color = plot.colors[1], alpha = .5) +
geom_density_2d(data = positions.offset, color = plot.colors[2], alpha = .5) +
geom_point(data = positions.offset, size = 0.5, stroke = 0, color =
plot.colors[2], alpha = .5)
g.densities <- g.broken +
geom_contour(data = densities.center,
aes(x=Long, y=Lat, z=Density),
color = plot.colors[3]) +
geom_contour(data = densities.offset,
aes(x=Long, y=Lat, z=Density),
color = plot.colors[4])
ortho.projections <- plot_grid(
g.densities + coord_map("ortho", orientation = c(0, 0, 0)) + no.axis,
g.densities + coord_map("ortho",
orientation = c(offset.pos$Lat, offset.pos$Long, 0))
+ no.axis,
labels = NULL,
align = 'h')
g.densities
ortho.projections
Particularly looking at the orthographic plots, it is easy to see that the spherical density process gives the same rings in both locations, with continuous curves.
Practical Example: Global Earthquakes
Earthquake density is used in one of the few existing attempts to perform density calculations with spherical coordiates on R-Bloggers. The Northern California Earthquake Data Center provides an archive of earthquakes for download, so we start with a set of quakes since Jan 1, 1950 of magnitude 5.9 or higher. Given that data, we then follow the same process as we did with our random data to plot both the 2d density contours and the density contours using spherical functions.
earthquakes <- read.csv(file.path("..", "data", "earthquakes.csv"))
earthquake.densities <- vmf_density_grid(earthquakes[,c("Latitude",
"Longitude")],
ngrid = 300)
world <- map_data("world")
g.earthquakes <- ggplot() +
geom_map(data = world, map = world,
mapping = aes(map_id = region),
color = "grey90", fill = "grey80") +
geom_point(data = earthquakes,
mapping = aes(x = Longitude, y = Latitude),
color = "red", alpha = .2, size = .5, stroke = 0) +
geom_density_2d(data = earthquakes,
aes(x=Longitude, y=Latitude),
color = plot.colors[2], alpha = 1) +
geom_contour(data = earthquake.densities, aes(x=Long, y=Lat, z=Density),
color = plot.colors[4]) +
scale_y_continuous(breaks = (-2:2) * 30, limits = c(-90, 90)) +
scale_x_continuous(breaks = (-4:4) * 45, limits = c(-180, 180)) +
coord_map("mercator")
g.earthquakes
n.frames <- 40
img <- image_graph(400, 400, res = 96)
for (i in 1:n.frames) {
long <- 170 + (i - 1) * 360 / n.frames
# We Explicitly use the 'plot' command to show the ggplot
print(g.earthquakes + coord_map("ortho", orientation = c(0, long, 0)) + no.axis)
}
msg <- dev.off()
image_animate(img, fps = 10)
The yellow shows default 2d density, and you can again see the continuity problems. The blue shows the expected Ring of Fire thanks to the spherical density. It isn’t perfect - if we were really interested in the most accurate results, we’d probably want to turn up the grid size to better follow the chains of quakes or tweak the contour breakpoints to see the fine features.
This should be a good first step to looking at densities in geo events.
Next
While this should have given a good introduction to densities on a sphere and the issues with using the default density functions, there is still more we can do. We’ve got a few more posts coming:
- Heatmaps - Working with heatmaps means generating raster data and projections with raster data adds more complexity
- More Real Examples - I mentioned I had an actual project I was curious about, right?
If you want to explore the code yourself, everything is on github here.
Appendix
Spherical Density Function
This calculates a grid of densities which can then be used with geom_contour
. The code basically comes directly from Directional’s vmf.kerncontour, only returning a data.frame instead of actually plotting the output.
vmf.kerncontour.new <- function(u, thumb = "none", ret.all = FALSE, full = FALSE,
ngrid = 100) {
## u contains the data in latitude and longitude
## the first column is the latitude and the
## second column is the longitude
## thumb is either 'none' (default), or 'rot' (Garcia-Portugues, 2013)
## ret.all if set to TRUE returns a matrix with latitude, longitude and density
## full if set to TRUE calculates densities for the full sphere, otherwise
## using extents of the data
## ngrid specifies the number of points taken at each axis
n <- dim(u)[1] ## sample size
x <- euclid(u)
if (thumb == "none") {
h <- as.numeric( vmfkde.tune(x, low = 0.1, up = 1)[1] )
} else if (thumb == "rot") {
k <- vmf(x)$kappa
h <- ( (8 * sinh(k)^2) / (k * n * ( (1 + 4 * k^2) * sinh(2 * k) -
2 * k * cosh(2 * k)) ) ) ^ ( 1/6 )
}
if (full) {
x1 <- seq( 0, 180, length = ngrid ) ## latitude
x2 <- seq( 0, 360, length = ngrid ) ## longitude
} else {
x1 <- seq( min(u[, 1]) - 5, max(u[, 1]) + 5, length = ngrid ) ## latitude
x2 <- seq( min(u[, 2]) - 5, max(u[, 2]) + 5, length = ngrid ) ## longitude
}
cpk <- 1 / ( ( h^2)^0.5 *(2 * pi)^1.5 * besselI(1/h^2, 0.5) )
mat <- matrix(nrow = ngrid, ncol = ngrid)
for (i in 1:ngrid) {
for (j in 1:ngrid) {
y <- euclid( c(x1[i], x2[j]) )
a <- as.vector( tcrossprod(x, y / h^2) )
can <- sum( exp(a + log(cpk)) ) / ngrid
if (abs(can) < Inf) mat[i, j] <- can
}
}
if (ret.all) {
return(list(Lat = x1, Long = x2, h = h, d = mat))
} else {
contour(mat$Lat, mat$Long, mat, nlevels = 10, col = 2, xlab = "Latitude",
ylab = "Longitude")
points(u[, 1], u[, 2])
}
}
References
- Earthquake data was accessed through the Northern California Earthquake Data Center (NCEDC), doi:10.7932/NCEDC.