Geographic Information Systems (GIS) often store
representations of continuous surfaces representing physical
properties such as elevation, temperature, water depth, etc.,
which can be approximated to a reasonable level of accuracy
by smooth, single valued functions of the form
z=F(x,y). Often such a surface is stored as a regular grid—or
raster—of data values. When physically measuring the value of
the surface, it is often impossible to collect data for every point on the
surface. Instead, scientists collect a sample S of N spot
measurements of the form (x,y,z) and reconstruct the continuous surface
z=F(x,y) by interpolating the points in S.
We are particularly interested in data sets collected using
lidar—a remote sensing method that collects highly accurate and
dense point samples that measure the height of a terrain. Given a set
S of N lidar points, we want to construct a gridded digital
elevation model (DEM) of the terrain represented by S. Because
lidar point sets are dense, and thousands of points can be collected in
seconds, lidar data sets often consist of millions or billions of points.
Processing such large point sets poses a number of computational
Input lidar points shown in red (left) and corresponding DEM
(right) with color representing height.
For clarity, only one in every one hundred lidar points
are shown at left. Region shown is a small area outside
of Hillsborough, NC along the Eno river.
Many sophisticated point interpolation routines have
been developed that solve a system of N
input points. Because such methods do not scale to
data sets containing more than a few hundred or a few thousand points,
many methods rely on a segmentation scheme that decomposes the plane into a set of non-overlapping areas
(or segments), containing a small number of points each.
Each segment can then be interpolated independently
but, to achieve smoothness across segment boundaries, it is necessary
to use at least a subset of the points from neighboring
segments in the interpolation. Often the segments are defined by the areas
associated with the leaves of a quad-tree on the set of input
For extremely large point sets common in lidar
applications, not all input points can simultaneously
reside in the main memory of a computer and must therefore
reside on larger but much slower disks. In this case the
transfer of data between disk and main memory (also called
I/O), rather than computation, often becomes the
performance bottleneck. Therefore we must consider
I/O-efficient methods for constructing the segmentation,
finding points in neighboring segments, and building the
We developed a new simple I/O-efficient algorithm for
constructing a quad-tree based segmentation and storing the data
structure on disk. Our algorithm
also reports the points in segments neighboring each segment
efficiently, and arranges the points on disk so that each segment
can be interpolated in a simple scan over our data structure.
Any interpolation method can be used with our
segmentation scheme and a grid DEM can be efficiently
computed using our approach.
We implemented our algorithm and compared our
implementation to existing interpolation methods. Previous
implementations in both the commercial GIS ArcGIS and
open-source GIS GRASS did not scale beyond 25 million points.
Our approach scaled to points sets containing more than 390
million points and grid DEMs containing more than 1.3 billion
DEM of Neuse River basin in North Carolina derived from 390 million lidar points
P. K. Agarwal, L. Arge, and A. Danner.
From point cloud to grid DEM: A scalable approach.
In Andreas Riedl, Wolfgang Kainz, and Gregory Elmes, editors,
Progress in Spatial Data Handling. 12th International Symposium on Spatial
, pages 771-788. Springer-Verlag, 2006.
Lidar fact sheet [
] from NC Floodmaps
Coastal Services Center Lidar Data Retrieval Tool
Lidar Information Coordination and Knowledge