Friday, February 21, 2014

Module 6 - Cartography - Data Classification


Comparison of Several Methods of Data Classification
Week 6 of our Cartographic Skills course demonstrated how one set of data can produce different-looking maps, depending on how the data is divided into classes.  The objective of making the maps to the left was to use four different data classification methods to display a single set of data: the percentage of African Americans living in each census tract of Escambia County, Florida, in 2000. We used ArcMap's data classification tools, found in the Layer Properties, to create the data classes and make the maps.

The Equal Interval (blue map), Quantile (green map), and Standard Deviation (red and green map) methods do not take into account how the data values are actually distributed along their range.  Respectively, they divide the total data range into classes of equal width, divide the observations into classes that each hold about the same number of observations, and build classes around the mean of the data on the assumption that most observations fall near it.
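To make the difference concrete, here is a minimal Python sketch of how the class breaks for these three methods could be computed. It is not ArcMap's implementation: the function names and the `sample` values are hypothetical, invented to mimic a dataset with many low percentages and a few very high ones, and the standard deviation version is a simplified variant that places one break at the mean.

```python
import statistics

def equal_interval_breaks(values, k):
    """Upper class limits that split the total data range into k equal-width classes."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k + 1)]

def quantile_breaks(values, k):
    """Upper class limits that put roughly the same number of observations in each class."""
    ordered = sorted(values)
    n = len(ordered)
    # (n * i + k - 1) // k is the ceiling of n * i / k, computed with integers only.
    return [ordered[(n * i + k - 1) // k - 1] for i in range(1, k + 1)]

def std_dev_breaks(values, step=1.0, spread=2):
    """Class boundaries placed at the mean and at multiples of the standard deviation.

    Simplified variant (an assumption of this sketch): boundaries run from
    mean - spread*step*sd to mean + spread*step*sd.
    """
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [mean + step * sd * i for i in range(-spread, spread + 1)]

# Hypothetical percentages, NOT the Escambia County data.
sample = [2, 3, 4, 5, 6, 8, 9, 11, 14, 18, 60, 92]

print(equal_interval_breaks(sample, 4))
print(quantile_breaks(sample, 4))
print(std_dev_breaks(sample))
```

With this made-up sample, the first equal-interval class ends at 24.5 and swallows 10 of the 12 values, while the quantile method forces three values into every class no matter how far apart they are.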

Our dataset has a large number of its observations in the low-percentage range, but also has some “outliers,” a few tracts with unusually high percentages.  For this reason, the three methods above do not accurately represent the percentages of African Americans in the census tracts of the county.

The remaining map, which is shown in red in the composite map above, and also alone in the larger single map below, demonstrates the Jenks Natural Breaks method of data classification.  Simply put, the observations are grouped into classes according to the natural clusters they fall into along the continuum of percentages.  In practice, a mathematical algorithm finds the groups of values that are most similar to each other, and these groups then define the classes.  This method works well for data that is not evenly dispersed along its range.  If there are discrete groups of values that are very high or very low, they are placed in their own classes, apart from the other data.  If there are gaps in the range of data values, these are used to divide classes.
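The idea behind natural breaks can be sketched in a few lines of Python. This is a small dynamic-programming version that picks the classes with the smallest total squared deviation from their own class means; it is meant as an illustration of the principle, not as the exact algorithm ArcMap runs. The function name `natural_breaks` and the `sample` values are hypothetical.

```python
def natural_breaks(values, k):
    """Upper class limits of the k-class split with the least within-class squared deviation.

    Assumes len(values) >= k.
    """
    x = sorted(values)
    n = len(x)

    # Prefix sums give the squared deviation of any slice in constant time.
    s1, s2 = [0.0], [0.0]
    for v in x:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def ssd(i, j):
        """Sum of squared deviations from the mean for x[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: best total deviation when the first j values form c classes.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    # split[c][j]: index where the last of those c classes starts.
    split = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    split[c][j] = i

    # Walk the split table backwards to recover each class's upper limit.
    breaks = []
    j = n
    for c in range(k, 0, -1):
        breaks.append(x[j - 1])
        j = split[c][j]
    return breaks[::-1]

# Hypothetical percentages, NOT the Escambia County data.
sample = [2, 3, 4, 5, 6, 8, 9, 11, 14, 18, 60, 92]
print(natural_breaks(sample, 4))
```

Because the two very high values in this made-up sample sit far from everything else, the sketch gives each of them its own class and spends the remaining two classes on the crowded low end.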


Jenks Natural Breaks is therefore the best of the four methods for this dataset of the percentage of African American population in the census tracts of Escambia County.  It takes into account the few tracts that have very high percentages, and also those that lie in the middle.  It divides the lower percentages into more classes, because many data values lie within that range and fall into several groups.

So, why should we care about how data observations are divided and classified?  

From examining these maps and their data classification methods, we can see how easily a single set of data can be manipulated to suit anyone's particular agenda.  For example, at first glance, the Equal Interval map gives us the idea that most of the tracts of Escambia County have a low percentage of African Americans.  One has to study the legend to find out the real values, and even then there is not enough detail to learn much about those tracts' populations.  In fact, a different method of classification reveals that some of these tracts have moderate percentages of African Americans.

The Natural Breaks map is a more honest representation of our data.

Jenks Natural Breaks Method of Data Classification

