3DSoftware.com > Cartography > Programming > Polyline Reduction  
Cartography Programming 
Polyline Reduction 
This page was originally hosted at http://www.3dsoftware.com/Cartography/Programming/PolyLineReduction/. The original is no longer there, so this page has been mirrored here from the Internet Archive.
In 1973, David H. Douglas and Thomas K. Peucker wrote an article in Canadian Cartographer magazine explaining how they developed an algorithm for reducing the number of points in a polyline, when the polyline needs to be displayed at a lower resolution.
Kraak and Ormeling go on to introduce the concept of line simplification with the nth point algorithm and with the DouglasPeucker algorithm. Douglas and Peucker developed iterative and recursive versions of their algorithm, in the fortran and algol programming languages respectively, on an IBM 370 mainframe computer. In this article we will describe an iterative approach using C/C++. For a C++ implementation using recursion, see the Geometry Algorithms web site. 
Footnotes:

Shortest Distance to a Line Segment
In this section, we will cover Linear Algebra concepts. We begin with a definition of vector. A vector, for this discussion, will be a list of numbers. The following are examples of vectors:
The first vector has 3 components. The second and third vectors have 4 and 2 components respectively. A vector's number of components represents the number of dimensions its vector space has. In this article, we use vectors that have two components, representing points and lines in 2 dimensions. All of the examples in this article may be extended to 3 dimensions, by simply using vectors with 3 components. The length of a vector is called its magnitude or norm. This length is a scalar. (A scalar is a single number, not a vector). It can be calculated using the Pythagorean Theorem. For example, consider a point P represented by the vector The line from the Origin to P can be considered the hypotenuse of a right triangle, with the sides of the triangle being 2 and 3. According to the Pythagorean Theorem, the length of the hypotenuse will equal the square root of the sum of the squares of the sides:
Now we'll define what a unit vector is. A unit vector is a vector with a length of 1. To convert any vector to a unit vector, simply divide each component of the vector with the vector length. This is called normalizing a vector. The unit vector points in the same direction as the vector it represents, but has a length of 1 no matter what the original vector's length is. For example, the unit vector of Next we define the Inner Product, which is also known as the Dot Product (both terms are used interchangably). The Inner Product is an operation that is performed on two vectors that have the same number of components (dimensions), and the result is a scalar (a single number, not a vector). This resulting number (the dot product) is the sum of the products of the corresponding components.
Notice that the symbol for a dot product is a dot (·), not the multiplication sign (×). It is convenient to use labels and symbols to refer to vectors. The vectors
The length of a vector is denoted by its name (u or v in this example) surrounded by double vertical bars:


The unit vector is a vector divided by its length:
The symbol for the unit vector is the hat symbol:
However, unit vectors are often simply labeled as the vector divided by its length:
The inner product of vectors has several interesting properties. Of particular interest for our task is that the inner product can be used to calculate the projection of one vector onto another, which in turn can be used to calculate the shortest distance from the end of one vector (a point) to a line passing through the other vector. Consider projecting a vector onto the line of another vector. We'll call the first vector u and the second vector v. Of special interest for us is the scalar (single number) that represents how far along the line of the vector v is the perpendicular projection of vector u onto the line of v (the distance from Conveniently, that number is the inner product of vector u and the unit vector of v:
After calculating that number, you can use the Pythagorean Theorem to determine the distance from point A to the end of vector u, which is usually the shortest distance from the end of u to the line segment represented by vector v. However, if the angle between the vectors u and v is greater than 90, the perpendicular projection will not land on the line segment represented by the vector v, but will instead land on a line that is collinear with v but extends past the vector's endpoints. In that case, the projection scalar is negative instead of 

It is very convenient to have the projection scalar be negative in that case. Your program simply tests for the projection scalar being negative, and if it is negative then use the length of the vector u as the shortest distance to the line segment. Otherwise, calculate the perpendicular projection distance using the Pythagorean Theorem, and use that as the shortest distance to the line segment. Finally, there is one last consideration. If the length of u is longer than the length of v and the angle between them is small enough, the perpendicular projection of u onto v may not intersect v, but instead intersect the line that is collinear with v extending past v: In that case, the shortest distance to the line segment is not the perpendicular projection. To detect that possibility, we must repeat this process, but with the v vector reversed: This is only necessary if the first set of vectors produced a positive (not negative) projection scalar. After reconstructing both vectors, proceed as before with the new vectors. Then if the projection scalar is negative, use the length of u as the shortest distance from the end point of u to the line segment of v. Otherwise use the perpendicular projection distance as the shortest distance. Using Perpendicular Distance Only Douglas and Peucker seem to have used only the perpendicular distance to select Using only the perpendicular distance makes use of other math formulas than described above. Those other formulas had special properties of their own, which Douglas and Peucker took advantage of. Calculating the perpendicularonly distance from a point to a line of infinte length is accomplished with a single math formula. Detecting a sign change as a point moves across a line allowed Douglas and Peucker to implement an estimation accumulator. For comparison, we implemented a perpendicularonly variation to simulate the original DouglasPeucker algorithm. One characteristic of the perpendicularonly method is that it can ignore long inlets that are parallel to the first and last point of a polyline. Such an inlet could extend far beyond the end points of the polyline, but remain near its infinite line from which perpendicular tolerances are measured. 

The following is an example showing the Texas
coast line on the Gulf of Mexico.
This is a satellite image (courtesy of NASA
Visible Earth project), with boundary points
from the VMap
Here are blow ups of the two polyline reduction examples: 

Polyline reduction with perpendicularonly
distance testing.
Note that the satellite image shows
shallow water ocean bottom (as well as land cover)
so that some areas which appear like land may actually
be under water. Also note that we did not implement
Douglas and Peucker's
exception for closed loops, as specified in 

Polyline reduction with shortest distance testing (not perpendicularonly distances). This is the default for new implementations of the DouglasPeucker algorithm. 
Iterative Implementation
Our iterative implementation consists of two loops, one nested inside the other. The outer loop traverses the entire polyline, one vertex at a time, looking for an anchor point, then (when an anchor point is found) looking for a floater. Then when the corresponding floater is found, the inner loop is invoked. The inner loop first checks to see if the anchor and floater points are adjacent vertices. If they are, they are marked as points to use in the final polyline, and the inner loop is not invoked (in which case the outer loop starts looking for the next anchor/floater pair). If the anchor and floater vertices are not adjacent, the inner loop initializes the anchortofloater length and unit vector, and then invokes the inner loop, which traverses the line segment one vertex at a time. For each intermediate point, that point is compared to the anchor, then if necessary compared to the floater. Comparing to the anchor consists of constructing a vector from the anchor to the point in question, again using simple component subtraction. This is equivalent to the u vector described earlier in this article (in which case the anchortofloater vector would be v). Then the projection scalar is calculated. That is the distance from the origin of the vectors to point A in the diagrams above. If that scalar is negative, the length of the u vector is the shortest distance to the line segment and the inner loop goes to the next vertex. But if the projection scalar is positive, the vertex in question is then compared to the floater point. A vector is constructed from the floater point to the point in question, the projection scalar of the new vector on the floatertoanchor vector is calculated, and if that projection scalar is negative then the length of the new vector is used as the shortest distance. Otherwise, the perpendicular distance to the line is calculated using the Pythagorean Theorem, and that value is used as the shortest distance from the point to the line segment. Note that for every vertex, when the minimum distance of that vertex to the anchor/floater line segment is calculated, that minimum distance is compared to a maximum value, and becomes the maximum value if it is greater than the maximum value. That is, the inner loop keeps track of which point has the greatest distance from the anchor/floater line segment. (That maximum value is reset at the beginning of the inner loop.) After the inner loop reaches and finishes processing the last vertex (the vertex immediately before the floater), it checks to see if the maximum distance is greater than the tolerance value. If it is not greater, then the anchor and floater are marked as points to use in the final polyline. Otherwise, the anchor and floater are marked to be used as anchor and floater points in the next iteration of the outer loop, and the point furthest from the anchor/floater segment is also marked to be used as an anchor and as a floater in the next iteration of the outer loop. The outer loop terminates when no more anchor points are found. Note that before the outer loop is initially executed, the first and last points (only) of the polyline are marked as anchor and floater respectively. Also note that this implementation does not actually delete points. Points are simply marked as suitable for being used. Ignoring such markings lets you access the original polyline. Or if you want to use the reduced polyline, then use only the marked points. This implementation requires three flags per vertex. One flag specifies whether that point will be used in the final polyline. The other two flags are only needed during processing, and specify if the point will be used as an anchor and/or floater in the next iteration of the outer loop. An optimization we implement in the source code below is to use a stack instead of the two temporary flags. In this implementation, only one flag per vertex is needed, to specify which vertex will be used in the final generalized polyline. The stack keeps track of anchor and floater pairs, instead of doing that with two temporary flags per vertex. While the order of marking the flags is not important, in the stack scenario it is important to specify the anchor and floater points in the proper order, as is done in the source code example. Several other optimizations are possible, such as using the square of distances in comparisons, to eliminate usage of the This source code example assumes the pnUseFlag integer array is zeroed out before invoking the point reduction function. After the function returns, that array can be examined to determine the array indices of the vertices that are to be in the final polyline. This can then be saved to other data structures, such as a level of detail number for each vertex, as Paul B. Anderson did with MWDBPOLY, which was included on the MicroCAM CD. To review what MWDBPOLY was, here is an excerpt from its documentation:
As the MWDBPOLY documentation states, its polylines were provided with 5 levels of detail:
To develop such a database, you could use multiple invocations of the ReducePoints() function with a different suitable tolerance for each invocation, zeroing out the pnUseFlag array before invoking the function, extracting the indices from that array after invoking the function, and saving a level number for those indices to a tag per vertex as MWDBPOLY did. Source Code The following source is provided free in the public domain for use without any attribution or payment. You are free to incorporate this into your software without giving us credit. We are not responsible for use or misuse of this source code. 
struct STACK_RECORD { int nAnchorIndex, nFloaterIndex; STACK_RECORD *precPrev; } *m_pStack = null; void StackPush( int nAnchorIndex, int nFloaterIndex ); int StackPop( int *pnAnchorIndex, int *pnFloaterIndex ); void ReducePoints( double *pPointsX, double *pPointsY, int nPointsCount, int *pnUseFlag, double dTolerance ) { int nVertexIndex, nAnchorIndex, nFloaterIndex; double dSegmentVecLength; double dAnchorVecX, dAnchorVecY; double dAnchorUnitVecX, dAnchorUnitVecY; double dVertexVecLength; double dVertexVecX, dVertexVecY; double dProjScalar; double dVertexDistanceToSegment; double dMaxDistThisSegment; int nVertexIndexMaxDistance; nAnchorIndex = 0; nFloaterIndex = nPointsCount  1; StackPush( nAnchorIndex, nFloaterIndex ); while ( StackPop( &nAnchorIndex, &nFloaterIndex ) ) { // initialize line segment dAnchorVecX = pPointsX[ nFloaterIndex ]  pPointsX[ nAnchorIndex ]; dAnchorVecY = pPointsY[ nFloaterIndex ]  pPointsY[ nAnchorIndex ]; dSegmentVecLength = sqrt( dAnchorVecX * dAnchorVecX + dAnchorVecY * dAnchorVecY ); dAnchorUnitVecX = dAnchorVecX / dSegmentVecLength; dAnchorUnitVecY = dAnchorVecY / dSegmentVecLength; // inner loop: dMaxDistThisSegment = 0.0; nVertexIndexMaxDistance = nAnchorIndex + 1; for ( nVertexIndex = nAnchorIndex + 1; nVertexIndex < nFloaterIndex; nVertexIndex++ ) { //compare to anchor dVertexVecX = pPointsX[ nVertexIndex ]  pPointsX[ nAnchorIndex ]; dVertexVecY = pPointsY[ nVertexIndex ]  pPointsY[ nAnchorIndex ]; dVertexVecLength = sqrt( dVertexVecX * dVertexVecX + dVertexVecY * dVertexVecY ); //dot product: dProjScalar = dVertexVecX * dAnchorUnitVecX + dVertexVecY * dAnchorUnitVecY; if ( dProjScalar < 0.0 ) dVertexDistanceToSegment = dVertexVecLength; else { //compare to floater dVertexVecX = pPointsX[ nVertexIndex ]  pPointsX[ nFloaterIndex ]; dVertexVecY = pPointsY[ nVertexIndex ]  pPointsY[ nFloaterIndex ]; dVertexVecLength = sqrt( dVertexVecX * dVertexVecX + dVertexVecY * dVertexVecY ); //dot product: dProjScalar = dVertexVecX * (dAnchorUnitVecX) + dVertexVecY * (dAnchorUnitVecY); if ( dProjScalar < 0.0 ) dVertexDistanceToSegment = dVertexVecLength; else //calculate perpendicular distance to line (pythagorean theorem): dVertexDistanceToSegment = sqrt( fabs( dVertexVecLength * dVertexVecLength  dProjScalar * dProjScalar ) ); } //else if ( dMaxDistThisSegment < dVertexDistanceToSegment ) { dMaxDistThisSegment = dVertexDistanceToSegment; nVertexIndexMaxDistance = nVertexIndex; } //if } //for (inner loop) if ( dMaxDistThisSegment <= dTolerance ) { //use line segment pnUseFlag[ nAnchorIndex ] = 1; pnUseFlag[ nFloaterIndex ] = 1; } //if use points else { StackPush( nAnchorIndex, nVertexIndexMaxDistance ); StackPush( nVertexIndexMaxDistance, nFloaterIndex ); } //else } //while (outer loop) } //end of ReducePoints() function void StackPush( int nAnchorIndex, int nFloaterIndex ) { STACK_RECORD *precPrev = m_pStack; m_pStack = (STACK_RECORD *)malloc( sizeof(STACK_RECORD) ); m_pStack>nAnchorIndex = nAnchorIndex; m_pStack>nFloaterIndex = nFloaterIndex; m_pStack>precPrev = precPrev; } //end of StackPush() function int StackPop( int *pnAnchorIndex, int *pnFloaterIndex ) { STACK_RECORD *precStack = m_pStack; if ( precStack == null ) return 0; //false *pnAnchorIndex = precStack>nAnchorIndex; *pnFloaterIndex = precStack>nFloaterIndex; m_pStack = precStack>precPrev; free( precStack ); return 1; //true } //end of StackPop() function 
Copyright © 2005 by 3D Software. All rights reserved. www.3DSoftware.com Info@3DSoftware.com 
Friday, 06Jul2007 10:47:51 GMT 