Comparison of Several Methods of Data Classification
The Equal Interval (blue map), Quantile (green map), and Standard Deviation (red and green map) data classification methods all ignore how the data values are actually distributed along their range. Respectively, they divide the data into classes that span equal parts of the total data range, classes that contain equal numbers of observations, and classes defined by distance from the mean, which assume that most of the observations fall near the mean.
Our dataset has a large number of its observations in the low-percentage range, but it also has some “outliers”: a few observations with unusually high percentages. For this reason, the three methods above do not accurately represent the percentages of African Americans in the census tracts of the county.
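To make the three rules concrete, here is a minimal Python sketch of how each one might compute its class boundaries, assuming the data is a plain list of tract percentages. The function names are hypothetical, the standard deviation version assumes intervals of whole standard deviations around the mean, and GIS packages such as ArcGIS may treat interval sizes and edge cases differently. The demo values are made up for illustration; they are not the Escambia County figures.

```python
import bisect
import math
import statistics


def equal_interval_breaks(values, k):
    """Upper bounds of k classes of equal width spanning min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k)] + [hi]


def quantile_breaks(values, k):
    """Upper bounds of k classes holding (roughly) equal numbers of observations."""
    data = sorted(values)
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of values")
    # Class i ends at the last of the first round(n * i / k) sorted values.
    return [data[round(n * i / k) - 1] for i in range(1, k + 1)]


def std_dev_breaks(values, step=1.0):
    """Boundaries at the mean plus or minus whole multiples of `step` standard
    deviations, keeping only those strictly inside the data range."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    lo, hi = min(values), max(values)
    if sd == 0:
        return [hi]  # every value is identical: a single class
    unit = step * sd
    j_min = math.floor((lo - mean) / unit) + 1
    j_max = math.ceil((hi - mean) / unit) - 1
    return [mean + j * unit for j in range(j_min, j_max + 1)] + [hi]


def class_counts(values, breaks):
    """Number of values in each class; a class includes its upper bound."""
    counts = [0] * len(breaks)
    for v in values:
        counts[bisect.bisect_left(breaks, v)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical, right-skewed percentages (NOT the Escambia County data)
    pct = [2, 3, 4, 5, 6, 8, 10, 12, 15, 40, 85, 92]
    for name, breaks in [
        ("equal interval", equal_interval_breaks(pct, 5)),
        ("quantile", quantile_breaks(pct, 5)),
        ("standard deviation", std_dev_breaks(pct)),
    ]:
        print(name, [round(b, 1) for b in breaks], class_counts(pct, breaks))
```

With these hypothetical values, the equal interval breaks put nine of the twelve observations into the lowest class and leave two classes empty, while the quantile breaks lump 12, 15 and 40 together. Neither reflects where the values actually cluster.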
The remaining map, shown in red in the composite map above and on its own in the larger map below, demonstrates the Jenks Natural Breaks method of data classification.
Simply put, the observations are grouped into classes according to the natural clusters they appear to fall into along the continuum of percentages. In practice, an algorithm searches for the class boundaries that make the values within each class as similar as possible (minimizing the variance within classes) while making the classes as different from one another as possible. This method works well for data that are not evenly spread along their range. If there are discrete groups of values that are very high or very low, they are placed in their own classes, apart from the rest of the data, and gaps in the range of data values become the dividing lines between classes.
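The idea of "values within a class as similar as possible" can be sketched in Python. This is a minimal version using dynamic programming (Fisher's exact optimal-partition approach), which minimizes the same quantity Jenks's method targets: the total sum of squared deviations from each class mean. The function name is hypothetical, and GIS software may implement Jenks with its own variant or approximations.

```python
def jenks_breaks(values, k):
    """Upper bounds of k classes minimizing the total within-class sum of
    squared deviations from the class means."""
    data = sorted(values)
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of values")

    # Prefix sums let us get the squared-deviation cost of any run in O(1).
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(data):
        s[i + 1] = s[i] + x
        s2[i + 1] = s2[i] + x * x

    def sse(i, j):
        """Sum of squared deviations from the mean for data[i:j], j > i."""
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: lowest cost of splitting the first j sorted values into c classes
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    split = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            # The last class is data[i:j]; the first i values form c - 1 classes.
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + sse(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    split[c][j] = i

    # Walk back through the split table to recover each class's upper bound.
    breaks = []
    j = n
    for c in range(k, 0, -1):
        breaks.append(data[j - 1])
        j = split[c][j]
    return breaks[::-1]


if __name__ == "__main__":
    # Hypothetical, right-skewed percentages (NOT the Escambia County data)
    pct = [2, 3, 4, 5, 6, 8, 10, 12, 15, 40, 85, 92]
    print(jenks_breaks(pct, 3))
```

On that hypothetical list, three classes come out with upper bounds 15, 40 and 92: the cluster of low values stays together, and the outliers get classes of their own. The triple loop costs on the order of k·n² steps, which is negligible for the number of census tracts in a county.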
Jenks Natural Breaks is therefore the best of these methods for this dataset of the percentage of Black population in the census tracts of Escambia County. It accounts for the few tracts with very high percentages as well as those in the middle, and it divides the lower percentages into more classes because many data values lie within that range and fall into several groups.
So why should we care about how data observations are divided and classified?
Examining these maps and their classification methods shows how easily a single set of data can be manipulated to suit a particular agenda. For example, at first glance the Equal Interval map suggests that most of the tracts in Escambia County have a low percentage of African Americans. One has to study the legend to find the real values, and even then there is not enough detail to learn much about those tracts' populations. In fact, a different classification method reveals that some of these tracts have moderate percentages of African Americans.
The Natural Breaks map is a more honest representation of our data.
Jenks Natural Breaks Method of Data Classification