Wednesday, December 3, 2014

Module 14 - Special Topics in GIS - Spatial Data Aggregation

This week in Special Topics we learned about a concept called the Modifiable Areal Unit Problem.  This means that the results of our analysis will probably differ depending on the scale at which we look at the data and on how we aggregate the zones.

A related topic is Gerrymandering.  This is when Congressional Districts are drawn in a complex and contorted fashion in order to include certain neighborhoods and the voters who live there, and to exclude others.

Two measures of how badly a District is Gerrymandered are "compactness" and "community".

A compact district is one with a simple shape and a minimum of irregularities along its boundary.  This is a good indicator of a district that has not been Gerrymandered.  One way to measure compactness is to compare the area of a polygon with its perimeter.  A simple polygon with no "ins and outs" will have a smaller perimeter relative to its area than a polygon that is very complex and contorted, such as the example in Figure 1.  This is New York District #7, which I determined from my analysis to be the least compact of all the House districts in the continental United States.  It is not easy to see at this scale, but there are many tiny jagged features along the boundary of this district.  The thick red part of its west boundary is actually composed of a long series of very small ins and outs, on the scale of a block or less.  This level of small-scale irregularity adds up to a very large perimeter and a very complex district shape, and suggests that it has been Gerrymandered.

Figure 1.  New York Congressional District #7
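One common way to turn the area-versus-perimeter idea into a single number is the Polsby-Popper score, 4*pi*Area / Perimeter^2, which equals 1 for a circle and approaches 0 for very contorted shapes.  Here is a minimal Python sketch of that calculation with arcpy; the feature class path and the DISTRICT_ID field are placeholders, not the actual lab data, and this is only one of several possible compactness measures.

import math
import arcpy

# Hypothetical feature class of congressional districts (projected, so area and length are meaningful)
districts = r"C:\data\congress.gdb\districts"

scores = {}
# SHAPE@AREA and SHAPE@LENGTH are standard arcpy geometry tokens
with arcpy.da.SearchCursor(districts, ["DISTRICT_ID", "SHAPE@AREA", "SHAPE@LENGTH"]) as cursor:
    for district_id, area, perimeter in cursor:
        # Polsby-Popper: 4*pi*A / P^2 (1 = perfect circle, near 0 = highly contorted)
        scores[district_id] = 4.0 * math.pi * area / perimeter ** 2

# The least compact district has the smallest score
least_compact = min(scores, key=scores.get)
print(least_compact, scores[least_compact])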


A high level of "community" in a District indicates that a District does not suffer too badly from Gerrymandering.  "Community" is measured here by how much counties are cut by District lines and separated among various Districts.  I determined this by identifying the counties that are cut by at least one District line, then finding the Districts with the most "cut" counties.  Figure 2 shows North Carolina District #1.  It cuts parts out of 20 counties, and contains no whole counties at all.   This district has a very low level of "community", by our definition, and this hints at a lot of Gerrymandering.
Figure 2.  North Carolina District #1
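To show how the county-cutting count could be computed, here is a hedged arcpy sketch: intersect the districts with the counties, flag the counties shared by more than one district, and tally those cut counties per district.  The dataset paths and the CNTY_FIPS and DISTRICT_ID field names are assumptions for illustration only.

import arcpy
from collections import defaultdict

districts = r"C:\data\congress.gdb\districts"
counties = r"C:\data\congress.gdb\counties"
pieces = r"C:\data\congress.gdb\district_county_pieces"

# Intersect produces one polygon for every district/county overlap piece
arcpy.Intersect_analysis([districts, counties], pieces)

districts_per_county = defaultdict(set)
with arcpy.da.SearchCursor(pieces, ["CNTY_FIPS", "DISTRICT_ID"]) as cursor:
    for county_id, district_id in cursor:
        districts_per_county[county_id].add(district_id)

# A county is "cut" if it is shared by more than one district
cut_counties = {c for c, d in districts_per_county.items() if len(d) > 1}

# Count how many cut counties each district touches
cut_count = defaultdict(int)
for county_id in cut_counties:
    for district_id in districts_per_county[county_id]:
        cut_count[district_id] += 1

# District with the most cut counties: lowest "community" by this definition
print(max(cut_count, key=cut_count.get))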


Tuesday, November 25, 2014

Module 13 - Special Topics in GIS - Effects of Scale

We began the last 3-week topic in Special Topics for this semester, on issues of scale and resolution in GIS.  As an introduction, we examined how the resolution of data decreases as the scale at which it was collected decreases: in other words, 1:100,000-scale data will have much less detail than the same features recorded at 1:1,200 scale.

We examined two different types of DEMs in this exercise: one obtained at 1-m resolution by LIDAR and the other obtained at about 82-m resolution by SRTM, the Shuttle Radar Topography Mission, which used space-shuttle-borne microwave (radar) sensing.  I resampled the LIDAR 1-meter data to pixel sizes of 2, 5, 10, 30, and 90 meters, and compared their slopes.  The highest-resolution data (1 m) have the highest slope values, and slope decreases slightly with decreasing resolution and increasing pixel size.  This is because, with more and smaller pixels, slope is computed over shorter horizontal distances and steep local changes are preserved, resulting in a higher average slope than the slope computed across a single larger pixel.
The LIDAR 90-meter DEM was then compared to the SRTM DEM, which was reprojected to the same resolution (90 m) and coordinate system. The elevation and its derivatives (1st-order: slope and aspect, and 2nd-order: curvature) for both 90-m DEM's were compared to those values for the original 1-meter LIDAR DEM, with the assumption that the higher resolution original data is more accurate.
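A minimal arcpy sketch of the resample-and-compare-slope workflow might look like the following.  The paths, the choice of bilinear resampling, and the use of the mean slope as the comparison statistic are assumptions for illustration; the Spatial Analyst extension is required.

import arcpy
from arcpy.sa import Slope
arcpy.CheckOutExtension("Spatial")

lidar_1m = r"C:\data\dem.gdb\lidar_1m"

for cell_size in [2, 5, 10, 30, 90]:
    resampled = r"C:\data\dem.gdb\lidar_{0}m".format(cell_size)
    # Bilinear resampling is a reasonable choice for continuous elevation data
    arcpy.Resample_management(lidar_1m, resampled, cell_size, "BILINEAR")
    slope = Slope(resampled, "DEGREE")
    slope.save(r"C:\data\dem.gdb\slope_{0}m".format(cell_size))
    # Mean slope should drop as the pixel size grows
    print(cell_size, arcpy.GetRasterProperties_management(slope, "MEAN").getOutput(0))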

The relative errors for all values (elevation, slope, aspect and curvature) between the 90-m LIDAR and the 1-m LIDAR are smaller than those between the SRTM 90-m DEM and the 1-m LIDAR.

Selected results from the comparison of the LIDAR 90-m DEM and the SRTM 90-m DEM to the LIDAR 1-m DEM

It can be noted from the values in the table above that, while the average differences in elevation between the LIDAR 90-meter and SRTM 90-meter DEMs are quite small, the differences are compounded in slope, the 1st derivative of elevation.  The differences are even greater for curvature, the 2nd derivative.

These results make sense, because LIDAR collects many more elevation data points thanks to its much more rapid pulse rate, and it uses a much shorter wavelength than SRTM's radar.

Wednesday, November 19, 2014

Module 12 - Special Topics in GIS - Geographically Weighted Regression

Last week in Special Topics, we learned how to run a linear regression in ArcMap, using the OLS tool: this stands for Ordinary Least Squares.  This analysis method gathers explanatory variable values from the entire study area and thus does not take into account spatial variation in correlation between variables.  In other words, it assumes that the relationship between, for example, median income and rate of auto theft, is the same over the whole study area.  This is probably not the case.

The GWR (Geographically Weighted Regression) tool re-runs the regression analysis repeatedly over small, local areas within the general study area, then produces a regression coefficient for each explanatory variable at every data location from the input.  Thus, we can see if and how relationships between variables change spatially.

In this lab, I examined rate of auto theft as a function of four explanatory variables: percent Black population, percent Hispanic population, percent renter-occupied housing units, and median income.  By running the Moran's I tool for spatial autocorrelation, I could see that there would be clustering of similar values across the area.

I then ran the same data through the GWR analysis, with the default bandwidth method of AIC.   There was no significant improvement in the model performance from the OLS method, based on AIC score and Adjusted R-squared values (AIC is a relative measure, comparing distances between various models to an unknown "truth"; Adjusted R-squared is the percentage of variation of the dependent variable that is explained by the explanatory variables).

The fact that the GWR and OLS tests produced nearly identical results suggested that the GWR analysis settings were ranging too far from each location to collect data points for each small local regression.  (In the extreme, a GWR analysis that accepts ALL of the points in the study area as neighbors will be just the same as the global OLS analysis of the whole area.)  A possible solution is to assign a specific number of neighbors for the GWR to use, rather than letting the AIC calculation figure it out.

I tried a few versions of both Fixed and Adaptive GWR and found that Adaptive GWR with 15 neighbors produced the best-performing model to explain the correlation between auto theft and the four explanatory variables, based on AIC and Adjusted R-squared.
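For reference, a hedged sketch of how that final run might look with arcpy, assuming the ArcGIS 10.x signature of the Spatial Statistics GWR tool.  The feature class path and field names are invented for illustration; an adaptive kernel with 15 neighbors is specified instead of the AICc-selected bandwidth.

import arcpy

tracts = r"C:\data\crime.gdb\census_tracts"

# Adaptive kernel with a user-specified number of neighbors (15)
arcpy.GeographicallyWeightedRegression_stats(
    tracts,
    "AUTO_THEFT_RATE",
    "PCT_BLACK;PCT_HISPANIC;PCT_RENTER;MED_INCOME",
    r"C:\data\crime.gdb\gwr_autotheft",
    "ADAPTIVE",
    "BANDWIDTH_PARAMETER",
    "#",        # fixed distance not used with an adaptive kernel
    15)         # number of neighbors for each local regression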

Wednesday, November 12, 2014

Module 11 - Special Topics in GIS - Multivariate Regression, Diagnostics and Regression in ArcGIS

This week's lab was very interesting and very challenging, as we learned more about statistical analysis of data, this time in ArcGIS, and about multivariate analysis.

We can use several tools in the Spatial Statistics toolbox in ArcMap to analyze our spatial data and determine what sorts of relationships exist among variables, and whether those relationships are statistically significant.  One of these tools is Ordinary Least Squares, or OLS.  One dependent variable is loaded into the tool, along with one or more explanatory (independent) variables.  The resulting report shows many statistics describing the strength and type of effect the independent variables have on the dependent variable, and how likely those statistics are to be significant.  We can also examine the distribution of residuals on a map; residuals are the degree to which the explanatory variables fail to explain the dependent variable.  We can also check for spatial autocorrelation, the degree to which data points that are close to each other on a map have similar values.

There are 6 checks of the ArcGIS OLS results, made by examining the various statistics.  They include whether the variables help the model, whether any of the variables are redundant, whether the model is biased (either spatially across the study area or by magnitude of values), whether the relationships between explanatory and dependent variables are as we expected them, and whether we have all the necessary explanatory variables.

We can also run an additional analysis called Exploratory Regression, which quickly tests all possible combinations of whichever explanatory variables we choose and gives us the statistics needed to decide which explanatory variables create the best model.  The key statistics are Adjusted R-squared, which tells us what percentage of the variation in the dependent variable is explained by the independent variables, and AIC (Akaike's Information Criterion), a relative measure used to compare several models to find the best-fitting one.
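Here is a hedged arcpy sketch of an OLS run followed by a Moran's I check of the residuals, assuming the ArcGIS 10.x Spatial Statistics tool signatures.  The dataset path, the unique ID field, the variable names, and the StdResid residual field name are all assumptions for illustration.

import arcpy

tracts = r"C:\data\crime.gdb\census_tracts"
ols_out = r"C:\data\crime.gdb\ols_autotheft"

arcpy.OrdinaryLeastSquares_stats(
    tracts,
    "TRACT_ID",                       # unique integer ID field
    ols_out,
    "AUTO_THEFT_RATE",                # dependent variable
    "PCT_RENTER;MED_INCOME",          # explanatory variables
    r"C:\data\crime.gdb\ols_coefficients",
    r"C:\data\crime.gdb\ols_diagnostics")

# Check the residuals for spatial autocorrelation (clustered residuals indicate a biased model);
# "StdResid" is assumed to be the standardized residual field in the OLS output
arcpy.SpatialAutocorrelation_stats(
    ols_out, "StdResid", "NO_REPORT",
    "INVERSE_DISTANCE", "EUCLIDEAN_DISTANCE", "NONE")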

Tuesday, November 11, 2014

Module 10 - Remote Sensing - Supervised Classification

Supervised Land Cover Classification and Distance File Map
We worked with unsupervised classification last week in Remote Sensing, in which the program determines classes of pixels that share similar spectral signatures, without prior input from the human analyst; any adjustments are done afterwards.  This week we worked with supervised classification.  With this process, we "train" the software by identifying samples of particular classes of land cover, then let the program assign the rest of the pixels in the image based on their similarity to the signatures of the samples.  The map at right was classified from an image using this type of process, after setting up 16 samples representing 8 unique land cover classes (as shown on the map legend at left).

The map was created using a band combination of Red = Band 4, Green = Band 5 and Blue = Band 6.  These are all bands in the Near- to Mid-Infrared part of the spectrum.  They were chosen because they display the classes most distinctly and there is little overlap between the signatures.

The small grey-scale map at the bottom is called a Distance File map.  It shows how well our classification signatures match the signatures of the pixels from the original image.  The dark areas are those features whose signatures match the class spectral signatures best, while the white areas are those that do not fall into one of the classes created from the samples.  It is called a Distance file because it refers not to spatial distance on the ground, but rather to distance in spectral feature space between the defined class signatures and the image pixels.

Wednesday, November 5, 2014

Module 10 - Special Topics in GIS - Introductory Statistics, Correlation and Bivariate Regression

This week we had a crash course in statistics, as it might apply to GIS and other areas.  We took advantage of the considerable capabilities of Excel for our exercises, which covered basic statistics such as the mean, median and standard deviation.  We also examined correlation: how two or more variables might vary together, and how much the variation in one may influence the other.  The main task involved linear regression.  This is a predictive tool, in which you plot known values of two variables on X and Y axes, then determine a best-fit line through the points.  The slope and y-intercept of the line help us predict unknown values beyond the sample.
In this problem, we had yearly rainfall data for two weather stations, A and B.  For Station B, the data ran from 1931 to 2004.  For Station A, the data covered only 1950-2004.  We plotted the Station B and A values against each other (B as X, the independent variable, and A as Y, the dependent variable on which we hope to make predictions).  Then, the slope and y-intercept were calculated using functions in Excel.  Y' (a predicted Station A rainfall value) can then be calculated from the known X (Station B) value for each year in which rainfall data for Station A is missing.

The equation is very simple, just the usual equation for a line that we learned in algebra:

Y' = bX + a

where b = slope, a = y-intercept of the best fit (regression) line, Y' is the predicted rainfall for a particular year for Station A, and X = known rainfall value for Station B for the same year.

Y' for each year gives us an estimated, hypothetical rainfall for that station, based on its relationship to the other station.

This method does have limitations.  If the values for X and Y deviate too far from the line of best fit, then this error will decrease the reliability of predicted values.
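The same calculation is easy to reproduce outside Excel.  Here is a small Python sketch of the least-squares slope and intercept (equivalent to Excel's SLOPE and INTERCEPT functions) used to predict a missing Station A year; the rainfall numbers are placeholders, not the actual lab data.

# Years where both stations have data (placeholder values, in inches)
station_b = [42.1, 35.6, 50.3, 47.8, 39.2]   # X: independent variable
station_a = [38.5, 33.0, 46.1, 44.9, 36.7]   # Y: dependent variable

n = len(station_b)
mean_x = sum(station_b) / n
mean_y = sum(station_a) / n

# Least-squares slope (b) and intercept (a)
num = sum((x - mean_x) * (y - mean_y) for x, y in zip(station_b, station_a))
den = sum((x - mean_x) ** 2 for x in station_b)
b = num / den
a = mean_y - b * mean_x

# Predict Station A rainfall (Y') for a year when only Station B was recorded
x_missing_year = 44.0                        # placeholder Station B value
y_predicted = b * x_missing_year + a
print(round(y_predicted, 1))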

Tuesday, November 4, 2014

Module 9 - Remote Sensing - Unsupervised Classification

Result of Unsupervised Classification
of an Air Photo of the UWF Campus
If we want to convert raw pixel data (e.g., reflectance values) into thematic material, we'll need to decide what spectral characteristics equate with what sorts of features.  Both ArcMap and ERDAS Imagine have automated tools for this conversion.  There are two ways of having the software do this.  In unsupervised classification, the software clusters groups of pixels with similar spectral signatures and assigns them to classes, without regard to the types of features the classes represent.  The user specifies how many classes will be used, then refines and corrects the final image and assigns the groups of similar pixels to thematic classes.  We did this type of exercise this week in lab.  We started with a color air photo of the UWF campus with only the visible red, green and blue bands.  Then, we instructed ERDAS Imagine to classify all the pixels in the image, based on color, into 50 classes.  We then examined the classes and assigned them to one of five thematic classes: buildings or streets, trees, grass, shadow, and mixed.  The software gave us a considerable head start in this task, by grouping the pixels (and features) with similar colors.
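The clustering step can be sketched outside ERDAS as well.  Below is a rough k-means stand-in for the same idea, grouping RGB pixel values into 50 spectral classes with scikit-learn; the file name, the use of the rasterio library for reading the image, and k-means itself (rather than ERDAS's own clustering algorithm) are all assumptions for illustration.

import numpy as np
import rasterio                        # assumed available for reading the air photo
from sklearn.cluster import KMeans

with rasterio.open("uwf_campus.tif") as src:
    img = src.read()                   # shape: (bands, rows, cols)

bands, rows, cols = img.shape
pixels = img.reshape(bands, -1).T      # one row per pixel, one column per band

# Cluster the pixels into 50 spectral classes, analogous to unsupervised classification
kmeans = KMeans(n_clusters=50, n_init=10, random_state=0).fit(pixels)
classified = kmeans.labels_.reshape(rows, cols)

# The analyst would then assign each of the 50 classes to one of the 5 thematic classes by inspection
print(np.unique(classified))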

Next week, we'll do supervised classification, in which we'll "train" the software with some samples that we've identified ahead of time.

Wednesday, October 29, 2014

Module 9 - Special Topics in GIS - Accuracy of DEMs


This week in Special Topics in GIS, we learned about the sources of error involved in creating DEMs, and conducted two analyses on elevation data.  In the first analysis, we used elevation and land cover class data collected by hand in the field; this elevation data is considered to be the more accurate, or reference, data.  The sample points were divided among five classes of land cover, shown in the table below.  We overlaid a LIDAR image on the field sample point map, and compared the LIDAR-obtained elevations at points corresponding with the field-measured points.  The table below shows the accuracy results for the five types of ground cover, separately and together.

Bare earth and low grass, predictably, have the lowest error, because there are few obstructions in the LIDAR view.  Fully-forested land has the highest error, which is logical because the tree canopy interferes with the LIDAR view.  The range of error values was also greater for forest, which might be due to the variability of the forest cover.




Another interesting outcome of this analysis was that while the composite error appeared to be fairly unbiased, most of the component land cover data types did show strong biases.  Land cover types a, b, and c (bare earth and low grass; high grass, weeds and crops; and brushland and low trees) all produce LIDAR elevations that are more often higher than the field-measured sample elevations.

Although fully-forested land cover has the highest error, it also shows the least bias in elevation error.  Urban land cover has a very strong negative bias in error: LIDAR elevations are nearly always lower than the field-sampled reference elevations.  It just so happens that the combination of these 5 types of land cover produces a more-or-less unbiased composite.  This would be very misleading if the results were not closely analyzed by ground-cover type.

Tuesday, October 28, 2014

Module 8 - Remote Sensing - Thermal and Multispectral Analysis

Thermal and Multispectral Analysis to Identify a Geographic Feature
This week, we added Thermal Infrared, or TM band 6, to our multispectral analysis toolbox.
Unlike the near-infrared and mid-infrared wavelengths, which are reflected from the surface of the Earth, back to the sensors, thermal infrared energy is actually emitted from the body of the Earth itself.  When shorter-wavelength energy hits the earth, some of it is converted to heat energy, and that is what is emitted back to the thermal infrared sensors on platforms such as LANDSAT.

Sources of heat, such as fires, can be identified by the thermal infrared band, but cool features can be as well, from their anomalously low emission of thermal infrared energy.  Water bodies are examples of such features.  In the map above, a band combination of Near Infrared (band 4: red), Visible Green (band 2: green) and Thermal Infrared (band 6: blue) was used, along with several other band combinations, to identify the dark green grid-like feature above as a collection of ponds or flooded fields.  In the image pictured here, it can be seen that the ponds reflect varying degrees of visible green light, depending on their depth.  The fact that they are a fairly dark green compared to the river to the west shows that they are filled with relatively clear water.  Almost all Near Infrared wavelengths are absorbed by the water, and the ponds emit very little thermal infrared energy; this is shown by the very low amounts of blue and red displayed by the ponds in the image.  The area to the east of the ponds emits considerably more thermal infrared energy, as shown by its bright blue color, while the vegetated areas shown in pink and red around the ponds reflect much more Near Infrared energy.

Wednesday, October 22, 2014

Module 8 - Special Topics in GIS - Surface Interpolation



Spline Interpolation with input data points

This week we worked with several different types of interpolation, the process by which new values can be estimated between known values.  I could have devoted several weeks to this topic, to get a good grasp of which types of interpolation are best for various applications and how best to use them.  I felt that this week I barely scratched the surface, as it were, in understanding this topic.

The map at left is an example of Spline interpolation of elevation (white and purple are highest, turquoise is lowest).  Locations of input data are shown as black dots.  This method provides a smooth result, and it can extrapolate beyond the range of the available data values and beyond the study area.  Newly interpolated values are influenced by input data across their general neighborhoods, not just by the nearest points.



Legend (relative Elevation)

IDW Interpolation
This map is an example of IDW, or Inverse Distance Weighted, interpolation of the same data points.  It is not as smooth as the Spline method, because interpolated values get less and less influence from the input data points with increased distance; that is, less weight is given.  The IDW result has a much rougher and spottier aspect because the influence of a single data point rapidly diminishes with distance away from it.  All of the small dots in this map are centered on input data points (not shown in this view) that are isolated enough from other points that their weight is felt only in their immediate vicinities.  The more extensive areas of color are the result of the combined influence of input data that are more densely arranged and of similar values.

Both IDW and Spline interpolation honor the input data, that is, the data points retain their original values.
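To make the weighting idea concrete, here is a tiny pure-Python sketch of IDW at a single location.  The sample points are placeholders and the power parameter of 2 is the common default; it is not the exact configuration used in the lab.

# Sample input points: (x, y, elevation) - placeholder values
points = [(0.0, 0.0, 100.0), (10.0, 0.0, 120.0), (0.0, 10.0, 90.0), (10.0, 10.0, 110.0)]

def idw(x, y, points, power=2):
    """Inverse Distance Weighted estimate at (x, y)."""
    num, den = 0.0, 0.0
    for px, py, value in points:
        d2 = (x - px) ** 2 + (y - py) ** 2
        if d2 == 0:
            return value                   # exact interpolator: an input point keeps its value
        w = 1.0 / d2 ** (power / 2.0)      # weight falls off rapidly with distance
        num += w * value
        den += w
    return num / den

print(idw(2.0, 3.0, points))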

Tuesday, October 21, 2014

Module 7 - Remote Sensing - Multispectral Analysis

Three examples of ground features emphasized by Spectral Band Combinations.
This week in Remote Sensing, we learned about multispectral analysis, specifically with Landsat TM 5 imagery in Washington State.  The image we looked at has 6 spectral bands that can be examined one at a time with gray scale imagery, or in combinations, displayed in Red, Green and Blue (RGB).
The bands in this Landsat imagery are 1, 2, and 3, which are the visible wavelengths (blue, green, and red); Band 4, or Near Infrared (NIR); and Bands 5 and 7, which are Middle Infrared.  Any combination of 3 bands can be used, displayed in the visible red, green and blue of the screen.  Each of the example features on the left is shown in a different combination, chosen to best emphasize deep or shallow water or snow, for example.

Wednesday, October 15, 2014

Module 7 - Special Topics in GIS - TIN's and DEM's

Symbology in a TIN elevation image of a landscape, with 5x exaggeration of relief.
This week in Special Topics, we began a 3-module unit on surfaces. In the first lab, we had an introduction to DEM's or Digital Elevation Models, and TIN's or Triangulated Irregular Network Surfaces.

DEM's are raster-based, and made of regular grids of pixels, each of which holds a value.

TIN's, however, are based on vector data.  Data point values can be turned into a TIN by means of triangulation: the points are connected to their neighbors by edges, and the edges form triangles, which make up facets of the surface in question.  Vector lines and polygon boundary lines can be incorporated into a TIN, to show the positions of roads, rivers, lake shores, and other significant features that may affect the shape of a landscape.  These linear features are included in the network of triangles, with series of small triangle edges defining the linear features.

In an elevation TIN, each triangle or facet has a uniform slope and aspect, while elevation changes across the triangle. In the figure above, different degrees of slope are shown by different tones of gray.  The surface can also have contours overlaid, as well as the original mass points (red).  Aspect (azimuthal direction) can also be symbolized.

The figure above shows a close-up example of a TIN.  Many of its features can be shown in the symbology, including the nodes (points), the edges of the triangles, slope, aspect, and the surface itself.

ArcGIS uses Delaunay triangulation.  This process avoids long, skinny triangles by requiring that the circle passing through the three vertices of any triangle (its circumcircle) enclose no vertices of any other triangle.
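Outside ArcGIS, the same triangulation idea can be sketched with SciPy.  This is only an illustration of Delaunay triangulation on a handful of placeholder mass points, not the ArcGIS TIN data structure itself.

import numpy as np
from scipy.spatial import Delaunay

# Mass points: x, y, z (placeholder survey points)
xyz = np.array([
    [0.0, 0.0, 12.0],
    [10.0, 0.0, 15.0],
    [0.0, 10.0, 11.0],
    [10.0, 10.0, 18.0],
    [5.0, 5.0, 20.0],
])

tri = Delaunay(xyz[:, :2])     # triangulate in the horizontal (x, y) plane
print(tri.simplices)           # each row lists the three point indices of one triangular facet

# The slope and aspect of each facet could then be derived from the plane
# through that triangle's three z values.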

Tuesday, October 14, 2014

Module 6 - Remote Sensing - Spatial Enhancement

Enhancement of Landsat 7 Imagery with Fourier Transform and
3x3 sharpening, ERDAS Imagine v.2014
In this week's Remote Sensing Lab, we tried out various digital filters in order to enhance imagery.  High-pass filters can be used to bring out more detail and contrast in an image, while low-pass filters smooth the imagery and emphasize larger-scale features.

The larger image to the left is an example of how imagery that has been corrupted by missing data can be improved.  We used a Fourier Transform in ERDAS Imagine to partially blend the values of the pixels in the black stripes that show up in the original image (left) into the pixels surrounding them.  This processing was followed by convolution filtering in ERDAS.  Here, a 3x3 "kernel" is centered on each pixel of the image in turn; the values of the 8 pixels surrounding the center pixel are averaged, and that average is then applied to the center pixel.  This serves to de-emphasize the black stripes.  Finally, the radiometry was adjusted in ERDAS Imagine by way of the LUT histogram breakpoints.

In the final enhanced image, the stripes are fainter, but still visible.  The main disadvantage to this type of processing is that the image can become very blurry and lose a lot of detail if the analyst is not careful.
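The neighborhood-averaging step can be sketched in a few lines of Python with SciPy.  This is a generic 3x3 neighborhood mean plus a simple high-pass complement, not ERDAS Imagine's exact implementation, and the image band is a random placeholder array.

import numpy as np
from scipy import ndimage

band = np.random.randint(0, 256, size=(100, 100)).astype(float)   # placeholder image band

# 3x3 kernel that averages the 8 neighbors and ignores the center pixel itself
kernel = np.array([[1, 1, 1],
                   [1, 0, 1],
                   [1, 1, 1]], dtype=float) / 8.0

smoothed = ndimage.convolve(band, kernel, mode="nearest")    # low-pass: de-emphasizes stripes

# A simple high-pass (sharpening) complement: original plus the local detail
sharpened = band + (band - smoothed)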

Wednesday, October 8, 2014

Module 6 - Special Topics in GIS - Location-Allocation Analysis

In this final week of our Network Analysis unit, we practiced Location-Allocation analysis, in which supply points are allocated to demand points.

A hypothetical retail chain has 22 distribution centers throughout the United States, supplying more than 4,300 customers.  It has divided the customers into more than 200 market areas, shown by the smaller polygons in the map at left.  However, the company wants to determine if there is a more efficient scheme for supplying the market areas from the distribution centers.  With the help of a Network Dataset of roads, each customer is allocated to the ideal distribution center.
In this study, we did not want to go to the trouble of reassigning individual customers to distribution centers, so we looked at a way to reassign whole market areas, based on whether the majority of customers within each market area had been reassigned to a new distribution center.  If the majority are better served by a new distribution center, all of the customers within that market area are reassigned.  Only a few market areas (fewer than a dozen) of the 227 total were reassigned.  This scheme does not ensure that every single customer is served by the nearest distribution center, but it does mean less reorganization for the company in terms of reassigning customers to new distribution centers.
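The majority rule could be scripted along these lines.  The assignments list is a placeholder standing in for the Location-Allocation output (customer, market area, newly allocated distribution center); names and values are invented for illustration.

from collections import Counter, defaultdict

# (customer_id, market_area, allocated_distribution_center) - placeholder rows
assignments = [
    (1, "MA_017", "DC_Atlanta"),
    (2, "MA_017", "DC_Atlanta"),
    (3, "MA_017", "DC_Memphis"),
    (4, "MA_042", "DC_Dallas"),
]

votes = defaultdict(Counter)
for customer, market_area, depot in assignments:
    votes[market_area][depot] += 1

# Reassign each market area to the distribution center chosen by most of its customers
market_area_depot = {ma: counts.most_common(1)[0][0] for ma, counts in votes.items()}
print(market_area_depot)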

Tuesday, September 30, 2014

Module 5 - Special Topics in GIS - Vehicle Routing Problems

16 Truck Routes with Orders for Southern Florida
This week's lab in Special Topics was a continuation of Network Analysis.  In these exercises, we worked on establishing routes for vans delivering patients to their doctor appointments, and for trucks delivering goods to supermarkets.

In the map on the left, we can see the routes for 16 trucks from the depot or warehouse, to various stores, as they deliver their orders over the course of a day.   Initially, we restricted the trucks to delivering only within their own zones, and excluded any other trucks from delivering to zones not assigned to them. There are 14 zones, and one truck was assigned exclusively to each zone in that scenario (14 trucks total).
This resulted in 6 orders being missed in the day's deliveries, and 10 additional orders for which the established time schedule was violated.


We then developed a new scenario (pictured here), in which we added two additional trucks from the fleet of 22 to be assigned to help with deliveries to the zones.  In this case, no orders were missed, and only one order was late.  In the figure above, blue dots are order deliveries that were delivered successfully, within their time constraints, and the red square is the solitary order that was delivered outside the time window.  The numbers in parentheses after each truck number are the quantities of orders assigned for each route.

Difference in revenue from the first solution (14 trucks) to the second solution (16 trucks):
$33,625 (16 trucks) - $32,000 (14 trucks) = $1,625 more revenue earned with the new, 16-truck solution.

Difference in cost from the first solution to the second solution:
$16,919.63 (16 trucks) - $15,067.20 (14 trucks) = $1,852.43 more cost incurred with the new, 16-truck solution.

So, the company had a net cost increase of about $227, but that resulted in better customer service (6 orders filled that otherwise would not have been), which will be better for the company in the long run.

Sunday, September 28, 2014

Module 5a - Remote Sensing - Introduction to ERDAS Imagine and Digital Data 1

Landcover Imagery from Washington State
Manipulated in ERDAS Imagine and exported to ArcMap
This week in Remote Sensing lab we had an introduction to ERDAS Imagine, which is software designed to manipulate and adjust the qualities of remote sensing imagery.  The various spectral bands from the imagery can be enhanced to make certain features easier to see: for example, the near infrared band is reflected preferentially by vegetation, so an image enhancing this wavelength can be very useful in studies of forest and agricultural land.

We are only getting started with learning how to use ERDAS, but it should be a very useful tool to extract the utmost value from remotely sensed data.

Wednesday, September 24, 2014

Module 4- Special Topics in GIS -Network Building

Routes with no restrictions (left) and accounting for traffic (right)
This week's lab was super-challenging.  The Network Analyst software of ArcMap is quite sophisticated, and so it's sometimes hard to understand how it works and get the results you're looking for.

In our case, we made a route among 19 stops, with no restricted turns.  A new network dataset is created within a feature dataset in a geodatabase, with street, traffic and restricted-turn feature classes associated with it.  The traffic data is based on historical traffic patterns, which are models of how traffic ebbs and flows at different speeds over the course of a day, on different days of the week.  Rather than attaching all the traffic data to specific streets, each street is referenced to the particular pattern that best suits it.  A general traffic situation can be created for the streets, but live traffic data can also be accessed and applied to the streets on a minute-by-minute basis.

As can be seen from the two figures above, the route changes depending on whether restrictions, in the form of restricted turns or traffic, are applied.  The route at left has no restrictions, and the route on the right takes traffic patterns into account.  By changing the path in a couple of places, indicated by the arrows, the travel time can be optimized.  Total time went from 105 minutes in the unrestricted case to 107 minutes once traffic effects were added and the route was adjusted by Network Analyst.

Monday, September 22, 2014

Module 4 - Remote Sensing - Accuracy Assessment of Land Use and Land Cover Classification

Map of Accuracy Assessment of Land Use and Land Cover,
Pascagoula, Mississippi, based on Google Street View
This week we did a virtual "Ground Truth" accuracy assessment of our Land Use and Land Cover interpretation and classification from last week.

Here is the map from last week, with 30 sample points to be checked against Google Street View.  The points were more-or-less randomly scattered across the area; however, none are located in the southwest quarter of the map, because this area is covered with large areas of water and wetlands that are not visible from Street View.  A few sample points are located along the coast, in areas that could be seen from the shore-side roads.

The procedure was to find the location of each sample point on Google Maps, satellite view, then go into Street View and ascertain what is actually at that spot, and if it follows the classification that was given earlier for that spot.
In my map, 11 of the 30 sample locations returned incorrect classifications, giving an accuracy of 63.3%.  Most of the errors resulted from the misinterpretation of an actual Commercial and Services area (class 12) as a Commercial and Industrial Complex area (class 15).  Also, some areas that I earlier classified as Forested Wetlands (61) are probably Mixed Forest Land (43); in other words, forested areas on dry land.

Residential areas, lakes, rivers and roads are very easy to classify from aerial imagery, because they are very distinctive in geometry, details, and textures.  However, a couple of spots that looked like residential houses turned out to be commercial locations, and vice-versa.

Wednesday, September 17, 2014

Module 3- Special Topics in GIS - Determining Quality of Road Networks: Completeness

Map depicting relative completeness of two road networks
In this week's lab, we explored a method of comparing the completeness of two road networks covering the same area, in this case, Jackson County, Oregon.  The two networks to be compared are the U.S. Census Bureau's TIGER lines for 2000, and the Street Center Lines network from the Jackson County GIS Department.

First, we can simply compare the total length of all streets from the two networks, within the county.

A more rigorous method can show the pattern of relative completeness between the two networks.  In this exercise, we overlaid the two road networks with a simple grid.  Then, by using the ArcMap 10.2 Intersect and Summary Statistics tools, we were able to calculate the total length of road segments within each separate grid cell.  This procedure was carried out for both networks.  Then, for each individual cell, the percent difference in road segment length between the two networks was calculated.  The following calculation was carried out for each grid cell:

% difference = [(Total segment length of Street Center Lines) - (Total segment length of TIGER lines)] / (Total segment length of Street Center Lines) x 100

The result is negative percentage values in grid cells with larger TIGER line segment lengths, and positive values for cells in which Street Center Lines have a larger sum of segments.
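A hedged sketch of the per-cell calculation, assuming the Intersect and Summary Statistics outputs have already been joined back to the grid as two total-length fields (the path and the SUM_LEN_CENTER / SUM_LEN_TIGER field names are invented):

import arcpy

joined_grid = r"C:\data\jackson.gdb\grid_with_lengths"   # hypothetical joined result

arcpy.AddField_management(joined_grid, "PCT_DIFF", "DOUBLE")

with arcpy.da.UpdateCursor(joined_grid, ["SUM_LEN_CENTER", "SUM_LEN_TIGER", "PCT_DIFF"]) as cursor:
    for center_len, tiger_len, _ in cursor:
        if center_len:
            # Positive where Street Center Lines are longer, negative where TIGER is longer
            pct = (center_len - tiger_len) / center_len * 100.0
        else:
            pct = None                                   # cell with no Street Center Lines at all
        cursor.updateRow([center_len, tiger_len, pct])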

Based on this method of analysis, the TIGER Lines network is more complete than the Street Center Lines network, with more cells having a greater total segment length.

The completeness results are depicted in the map above.  Blue cells are those in which TIGER Lines are more complete, and green cells are those in which Street Center Lines are more complete. (Darker tones represent higher percentage of difference.) The pale cells are those in which the lengths are very similar, within 3%, plus or minus.

Tuesday, September 16, 2014

Module 3 - Remote Sensing - Land Use and Land Cover Classification

Map of USGS Level II Land Use and Land Cover,
 Pascagoula, Mississippi
In this week's Remote Sensing lab, we began to gain experience in interpreting land cover and land use from aerial photographs, as well as digitizing land use and land cover features onto a map.  The learning curve was steep for me at the beginning, because I had to wade through some inefficient methods and repair a lot of errors before I could really start to make progress.  All in all, this exercise was very educational!

This map has been classified and digitized according to USGS Level II of land use and land cover classes (Level I is very general, while level IV, the highest, is most detailed). The main challenge at Level II is learning to generalize, while still identifying the major subclasses. Features that can be discerned from the entire extent of the map must be identified. At Level I, there would only be 4 classes on this map: Urban, Forest Land, Water and Wetland.  At Level II, we divide up those very broad classes somewhat, as shown by the map legend directly to the left; for example, the Urban Level I class includes Residential, Commercial and Services, Industrial, and Transportation at Level II.

Wednesday, September 10, 2014

Module 2 - Special Topics in GIS - Positional Accuracy of Road Networks

Map produced by the City of Albuquerque,  with sample intersections
placed for this accuracy assessment.
This week in Special Topics in GIS, we continued to practice making calculations of error between a sample dataset and a reference dataset considered to be more accurate.

Last week, we calculated error on two existing sets of data that we were given as part of our lab materials.  This week, we generated our own datasets from maps of the city of Albuquerque, New Mexico, then calculated error.





The figure above is a simple street map for the first dataset to be tested.  The dark dots represent samples of street intersections taken from a map created by the City of Albuquerque.  This City map is considered to be quite accurate, according to our lab materials this week.  We also examined the accuracy of another street network from StreetMap USA, over which we placed sample location points on the same street intersections as for the City map.  In order to assess the accuracy of these two datasets, we needed an independent reference dataset with corresponding street intersection points, deemed to be much more accurate than the datasets we want to test.
For this "truer" reference, we digitized a new set of points on the centers of the sample street intersections, based on digital orthophoto quarterquads (DOQQ's) covering the city.

For this accuracy assessment, we followed the methods developed for the National Standard for Spatial Data Accuracy (NSSDA).  This document and others based on it lay out a 7-step process for assessing and documenting the positional accuracy of a dataset.  A very well-written summary of this process, with several case studies, was produced by the Minnesota Planning and Land Management Information Center and included in our lab readings, and this was my main guide in this project.

In summary, here are the seven steps for assessing the positional accuracy of the dataset in question:

1.  Decide whether you need to test horizontal accuracy, vertical accuracy, or both for your dataset.
        (For this assignment, we assessed only horizontal accuracy.)
2.  Select a set of test or sample points from the dataset that you're testing.
        (We tested two datasets, one from the City of Albuquerque map, and the other from the
        StreetMaps USA map of the same area.)
        For this step, select at least 20 points, make sure that at least 20% of them fall in each
        quadrant of your study area, and that they are separated from each other by at least
        10% of the diameter of your study area.
3.  Find another, independent dataset that represents the locations of those same points,
        but with higher spatial accuracy.
        For this, you might have to digitize the locations from an orthophoto, measure them in the
        field with GPS, or use points that already exist in another, more accurate dataset.
        (Our reference data was digitized from the DOQQ's.)
4.  Tabulate the x and y coordinates for both datasets.
5.  Calculate positional accuracy statistics, either horizontal or vertical.
        In this step, calculate the differences in both X and Y between each pair of points (test and
        reference), square those differences, then sum the squares.
        This is the first part of the Pythagorean equation, but in this case do NOT take the square
        root of each sum.  Instead, calculate the average of the summed x and y squares over all of
        the pairs of points, then take the square root of that average.
        This gives you the average error distance between your test and reference points, called
        the Root Mean Square Error (RMSE).
        From this statistic, we then calculate the error distance within which 95% of our tested
        data points fall from their corresponding reference points.
        To do that, simply multiply the RMSE by 1.7308 for horizontal error, or by 1.9600 for
        vertical error (these factors come from the mathematical characteristics of a Gaussian,
        or bell-shaped, distribution curve).
        These products are known as the NSSDA statistics and should be reported in the final
        accuracy assessment.  (A small worked sketch of this calculation appears after this list.)
6.  Write a standardized accuracy statement in the NSSDA format.
        (My statements for these assessments are shown below.)
7.  Include that accuracy report in the metadata of the dataset you are testing.
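Here is a minimal Python sketch of step 5, with made-up coordinate pairs standing in for the tabulated test and reference points; only the arithmetic is shown, not the sampling rules from step 2.

import math

# (test_x, test_y, ref_x, ref_y) for each sample intersection - placeholder coordinates in feet
pairs = [
    (1000.0, 2000.0, 1004.1, 1997.3),
    (1500.0, 2500.0, 1493.8, 2506.2),
    (1800.0, 2100.0, 1805.0, 2095.5),
]

# Sum of squared x and y differences over every test/reference pair
sum_sq = sum((tx - rx) ** 2 + (ty - ry) ** 2 for tx, ty, rx, ry in pairs)

rmse = math.sqrt(sum_sq / len(pairs))     # Root Mean Square Error
nssda_horizontal = rmse * 1.7308          # NSSDA statistic: horizontal accuracy at 95% confidence

print(round(rmse, 1), round(nssda_horizontal, 1))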


Here are my formal NSSDA accuracy statements for the two tested datasets in Albuquerque, NM.

a. City of Albuquerque road network, Horizontal Positional Accuracy:

Using the National Standard for Spatial Data Accuracy, the dataset tested 16.8 feet horizontal accuracy at 95% confidence level.

b. StreetMaps road network, Horizontal Positional Accuracy:

Using the National Standard for Spatial Data Accuracy, the dataset tested 341.5 feet horizontal accuracy at 95% confidence level.


As we can see from these statements, the City of Albuquerque dataset is much more accurate than the StreetMaps USA dataset.  We can say (based on this sample test at least), that 95% of the City of Albuquerque-mapped street intersections lie within 16.8 feet (horizontally) of their true positions based on the DOQQ's.  However, we can only say for the StreetMaps USA map that  95% of the samples tested lie within 341.5 feet (horizontally) of their true locations.  This disparity was easily seen from cursory visual comparison of the street network datasets and the orthophotos.  The City mapped streets were always very close to their counterparts in the DOQQ's, but those of StreetMaps varied considerably from being fairly close (generally in the center of the map) to hundreds of feet away, and skewed at large angles (generally at the corners and edges of the map).

Tuesday, September 9, 2014

Module 2 - Remote Sensing - Visual Interpretation


In the second week of Photo Interpretation and Remote Sensing, we began looking more closely and analytically at aerial photos.  By making use of some systematic and discriminating methods, it's possible to learn much more from air photos than we can with a more casual and cursory approach.

The top map to the left shows examples of different degrees of tone, or brightness along a gray-scale, and texture, in features in an aerial photo.  Our tone scale has five classes: very light, light, medium, dark and very dark.  It's important to use a scale like this in order to maintain consistency as we describe and identify objects on the ground.

The bottom map illustrates four different elements of image interpretation.  Different objects on the ground have a variety of characteristics: some, such as cars and houses, are most easily recognized by their shape or size; others, such as a parking lot, by a distinctive pattern; and others, such as light poles, more by the shadow that they cast on the ground.  Finally, some features (such as the pier in this map) can be distinguished mainly by their association with certain other features (the ocean, in this case).  An interpreter must keep all of these elements of interpretation in mind while working to identify objects from aerial imagery.

Wednesday, September 3, 2014

Module 1 - Special Topics in GIS - Calculating Metrics for Spatial Data Quality


Data quality is a critical aspect of GIS analysis.  Both accuracy and precision must be addressed, and both may be positional, temporal, or thematic in nature.  Accuracy is defined as the degree to which measured values approach the "true" or reference value.  Precision is not the same as accuracy; rather, it is a measure of how close repeat measurements are to each other.

The map at left depicts 50 repeat measurements (black dots), made using a GPS device, of a single location in Hillsborough County, Florida, near Tampa. The spatial average of the measurements (horizontal only) is shown by the green circle at the center, while degrees of spread, or estimates of precision, are shown by the blue circular zones.  The central zone contains 50% of the measurements, while the outer-most zone contains about 95% of the measurements.  This is a measure of the precision of the data gathering.

Horizontal accuracy of the measurements is indicated by the distance between the average of the measured points (green circle) and the reference location, which is known to be closer to the real position on the ground (red triangle).  The fact that the average of the measured points lies about 3 meters southeast of the reference location indicates a bias in the measurements toward the southeast: a systematic error in accuracy.

Here is a table of the horizontal and vertical accuracy and precision for these GPS measurements.

- Horizontal precision (68%): 4.4 meters
- Horizontal accuracy (distance between average location and reference (“true”) point): 3.2 meters
- Vertical precision (68%): 5.7 meters
- Vertical accuracy (difference between average elevation and reference (“true”) point elevation): 6.0 meters



As seen in the table above, if you’re comparing horizontal to vertical for either precision or accuracy, vertical error is larger in both cases.  If you compare horizontal precision to horizontal accuracy, and then vertical precision to vertical accuracy, the pairs of values are fairly similar.  The 68% estimate for vertical precision is slightly better than the vertical accuracy.  
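The precision and accuracy numbers above could be reproduced with a short script.  Here is a hedged sketch for the horizontal case, with a short placeholder list standing in for the 50 GPS fixes and an assumed reference coordinate; the 68th-percentile distance is used as the precision estimate.

import math

# Repeat GPS fixes (x, y) in meters - placeholders standing in for the 50 actual measurements
fixes = [(368120.4, 3104055.2), (368121.1, 3104053.8), (368119.0, 3104056.5), (368122.3, 3104054.1)]
reference = (368118.0, 3104058.0)      # surveyed "true" point (placeholder)

mean_x = sum(x for x, _ in fixes) / len(fixes)
mean_y = sum(y for _, y in fixes) / len(fixes)

# Horizontal precision: distance from the average position within which 68% of the fixes fall
dists = sorted(math.hypot(x - mean_x, y - mean_y) for x, y in fixes)
precision_68 = dists[int(0.68 * len(dists)) - 1]

# Horizontal accuracy: distance between the average position and the reference point
accuracy = math.hypot(mean_x - reference[0], mean_y - reference[1])

print(round(precision_68, 1), round(accuracy, 1))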

Friday, July 11, 2014

Module 8 - Application in GIS - Least-Cost Path and Corridor Analysis

In the last regular lab for Application in GIS this term, we worked on least cost paths and corridors.  This is somewhat similar to suitability analysis, in that we can assign weights and values to various criteria, then combine them to obtain a final best result.

Instead of suitability, we refer to "cost" in this type of study.  Cost can be time, or money, or anything else we must expend to get from a source to a destination.

In this type of analysis, a cost surface is computed, and from that a cost-distance surface.  The latter gives, for every cell of the raster, the accumulated cost of traveling to it from the source, based on the criteria.  We can then trace an ideal path across the surface that incurs the minimum cost of travel from the source to the destination.

A corridor can also be found from one place to another.  Instead of a single, one-cell-wide path, the corridor is a zone of lesser cost.  We can specify a  threshold to define the corridor, saying that only costs below a certain value are desirable.  The map here shows a movement corridor for black bears between two areas of National Forest in Arizona.  The criteria for finding the corridor were land cover type, elevation, and distance from roads.  The green area on the map shows the most optimal area through which the bears can travel from either direction, between the two tracts of National Forest.
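A condensed arcpy sketch of the least-cost path and corridor steps is below.  The weighted cost raster is assumed to have already been built from the reclassified criteria (land cover, elevation, distance from roads), and all paths and the BEST_SINGLE path type are placeholders; Spatial Analyst is required.

import arcpy
from arcpy.sa import CostDistance, CostPath, Corridor
arcpy.CheckOutExtension("Spatial")

cost = r"C:\data\bears.gdb\cost_surface"          # weighted combination of the criteria
source = r"C:\data\bears.gdb\forest_north"
destination = r"C:\data\bears.gdb\forest_south"
backlink = r"C:\data\bears.gdb\backlink_north"

# Accumulated travel cost away from each National Forest tract
cd_source = CostDistance(source, cost, out_backlink_raster=backlink)
cd_dest = CostDistance(destination, cost)

# Single least-cost path from the destination tract back to the source tract
path = CostPath(destination, cd_source, backlink, "BEST_SINGLE")
path.save(r"C:\data\bears.gdb\least_cost_path")

# Corridor: sum of the two cost-distance surfaces; a threshold on this raster defines the corridor
corridor = Corridor(cd_source, cd_dest)
corridor.save(r"C:\data\bears.gdb\bear_corridor")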

Monday, July 7, 2014

Module 7 - Applications in GIS - Spatial Accessibility Modeling

This week in Application in GIS, we worked with spatial accessibility modeling.  This is the analysis of how well people are able to get to the facilities that they need in their communities.  We use the ArcMap Network Analyst extension for this type of study.  The network dataset consists of streets, junctions, and many attributes associated with them, such as speed limits, stops, one-way streets and other restrictions and impediments to the flow of traffic.  By inserting the locations of facilities that people need to access, along with the locations or zones from which people travel, we can plot the fastest or most direct routes.
In this particular assignment, we examined the impact of the closure of a community college campus on the population of Travis County, Texas.  The above map shows polygons created in Network Analyst for the travel times from each census block group to the closest college campus of the Austin Community College system.  In the right-hand map, the travel times are examined for the six remaining campuses when the northern-most campus (Cypress Creek, represented by the red dot on the left-hand map) is closed.  Further analysis of the census block group data in Network Analyst revealed that more than 4,000 people between the ages of 18 and 29 (most potentially college-aged) live closest to the Cypress Creek campus, and would have to travel further to an alternate campus if their nearest campus were closed.  In the event that the Cypress Creek Campus were closed, most of these displaced potential students would be next-closest to the Northridge Campus, and a few would be closest to the Pinnacle Campus.  The average increase in travel time for those displaced potential students to their new college campuses is about 9 minutes.

Monday, June 30, 2014

Module 6 - Applications in GIS - Suitability Analysis

This week in Applications in GIS we practiced suitability analysis, in which several criteria are overlaid and the overall most suitable areas for a particular need or phenomenon are generated.  We studied two types of analysis.  The first is Boolean, sometimes referred to as binary suitability analysis: areas are deemed either suitable or not suitable, based on the criteria of interest.  For example, any area with a slope greater than 3 degrees might be classed as unsuitable, while any area with a slope of less than 3 degrees might be suitable.  Several criteria can be combined, such as distance to roads or streams, type of land cover, price of land, and so on.  Areas in which all criteria are suitable are then considered generally suitable for the purpose at hand.

The other type of suitability analysis is more flexible, because the criteria are judged on a scale of suitability values, and may be assigned different weights or degrees of relative importance.

The figure above shows two scenarios for land suitability depending on slope, land cover, soil, distance from streams and distance from roads.  In the map on the left, all 5 criteria are rated on a scale of 1-5 and weighted with equal importance.  In the map on the right, slope is twice as important, and distances to streams and roads are half as important.  This area contains a flat-floored valley in the center, with mountains on either side.  Therefore, making slope more important (with low slope being most suitable) produces larger areas of both "Very High" suitability (dark green) and "Low" suitability (dark red), as in the right-hand scenario.  In the analysis with equally-weighted criteria (left), there are larger areas of neutral or somewhat high suitability, but smaller areas of very low or very high suitability.  Although 5 suitability classes (Very Low to Very High) were applied to the individual criteria in this exercise, the combinations of all of the criteria did not produce any areas that can be considered of generally "Very Low" suitability.
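The weighted overlay itself boils down to a cell-by-cell weighted average of the reclassified criteria.  Here is a small NumPy sketch with tiny placeholder rasters already rescaled to suitability scores of 1-5 and with made-up weights; in ArcGIS this is normally done with the Weighted Overlay or raster algebra tools in Spatial Analyst.

import numpy as np

# Placeholder 1-5 suitability rasters for three of the criteria (same grid shape)
slope = np.array([[5, 4], [2, 1]], dtype=float)
land_cover = np.array([[3, 3], [4, 2]], dtype=float)
dist_streams = np.array([[2, 5], [4, 4]], dtype=float)

# Equal-weight scenario: simple average of the criteria
equal = (slope + land_cover + dist_streams) / 3.0

# Alternate scenario: slope weighted more heavily; weights sum to 1
weighted = 0.5 * slope + 0.3 * land_cover + 0.2 * dist_streams

print(equal)
print(weighted)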

Monday, June 23, 2014

Module 5 - Applications in GIS - Visibility Analysis

Security Camera Coverage of the Boston Marathon Finish Line
This week's Application in GIS was visibility analysis.  We studied how viewsheds and lines of sight are developed in ArcGIS.  A viewshed is a representation of everything that is visible or not visible from particular observation points or along lines of observation such as roads.  A line of sight is a single line between an observer and a target, showing which portions of the line are visible to the observer and which are not.  Applications that we worked with included an analysis of the visibility of landscapes in Yellowstone National Park and the best placement of security cameras at the finish line of the Boston Marathon (above).

For the Boston Marathon exercise, we started with the view that a single camera has (#1 on the left), then experimented with the placement, angle and height from the ground of two more.   For this, we used Viewshed and Visibility tools in ArcMap, adjusting the positions of cameras #2 and #3, as well as their height from the ground and the 90 degree angle they cover.  This figure shows my best result for surveillance camera coverage of the finish line. The map indicates areas of visibility from all 3 cameras (yellow), any 2 cameras (green) or only 1 camera (purple).  Areas that are not seen by the cameras at all have no color overlay.

Here is a table with the details of the camera parameters.  (They are numbered 1, 2, and 3, from west/left to east/right.  In this figure the camera positions are represented by red dots, but are a little hard to see.)

Surveillance Camera Parameters
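In ArcMap, camera height and field of view are controlled through observer parameter fields that the Viewshed tool reads from the camera feature class (OFFSETA for height above the ground, AZIMUTH1 and AZIMUTH2 for the horizontal angle swept).  Here is a hedged sketch of setting those fields and running the tool; the paths are placeholders, the fields are assumed to already exist on the camera layer, and in practice each camera would get its own values rather than one shared setting.

import arcpy
arcpy.CheckOutExtension("Spatial")

cameras = r"C:\data\boston.gdb\cameras"
surface = r"C:\data\boston.gdb\elevation_with_buildings"

# Observer parameter fields recognized by the Viewshed tool
with arcpy.da.UpdateCursor(cameras, ["OFFSETA", "AZIMUTH1", "AZIMUTH2"]) as cursor:
    for _ in cursor:
        # Example: camera mounted 15 m above ground, sweeping a 90-degree window from 45 to 135 degrees
        cursor.updateRow([15.0, 45.0, 135.0])

viewshed = arcpy.sa.Viewshed(surface, cameras)
viewshed.save(r"C:\data\boston.gdb\camera_viewshed")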
All in all, this week's topic was really interesting, and I continue to be amazed at the versatility of ArcGIS!

Friday, June 20, 2014

GIS Programming Module 5 - Geoprocessing with Python

In this week's GIS Programming lab, we learned a lot about using Python to access the ArcMap tools.
This week was something of a watershed for me, because the help pages for the tools make much more sense than they did before, and I am also more comfortable with the syntax of modules and functions as they correspond to toolboxes and tools.  Python, as it works with ArcGIS, is falling into place.

In the figure above, the results are shown for the script this week: the messages for running the tools.  
Our task was to first assign XY coordinates to a point shapefile, then buffer it, then dissolve the buffer using the Dissolve tool in Data Management.
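A hedged sketch of what such a script might look like is below.  The shapefile name, workspace, and 1-kilometer buffer distance are placeholders rather than the actual lab data; the tools themselves (Add XY Coordinates, Buffer, and Dissolve) are the ones named above.

import arcpy

arcpy.env.workspace = r"C:\data\module5"      # assumed workspace
arcpy.env.overwriteOutput = True

points = "facilities.shp"                      # hypothetical input point shapefile

# 1. Add XY coordinate fields to the point features
arcpy.AddXY_management(points)
print(arcpy.GetMessages())

# 2. Buffer the points (1 km is an example distance)
arcpy.Buffer_analysis(points, "facilities_buffer.shp", "1000 Meters")
print(arcpy.GetMessages())

# 3. Dissolve the buffers with the Data Management Dissolve tool
arcpy.Dissolve_management("facilities_buffer.shp", "facilities_buffer_dissolved.shp")
print(arcpy.GetMessages())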

The summary of my process for writing this script can be accessed at this link:

Process Notes for Module 5

Monday, June 16, 2014

Module 4 - Applications in GIS - Crime Analysis




This week's lab covered spatial and temporal crime analysis by comparing three different methods of determining crime hotspots.  The map layout here shows the kernel density method, grid analysis, and local Moran I method.  They are based on distribution of points representing locations of burglaries.
To use the kernel density method, we used the burglary point data to make a density surface and showed the hotspot as the area with values at least 3 times the mean density.  With grid-based thematic mapping, the burglary point data is overlaid on a grid and the number of burglaries within each grid cell is counted.  The top 20%, or top quintile, of the cells are used to create a single polygon which represents the hotspot.  The local Moran's I method is a little different, because it can be thought of as having a predictive quality.  Rather than only representing the current year's burglary point density, it also shows clusters of higher-than-normal density.  Areas with slightly lower crime densities that would not show up in the kernel or grid hotspots do show up on this map because of their proximity to high-crime areas.  This goes along with "near repeat" theories in criminology, which state that crimes are much more likely to occur where they have occurred in the past, as well as near those areas.  Examples in the above map are the pink areas that do not intersect the blue (grid) and yellow (kernel density) hotspots.  These are areas that police should monitor a little more carefully, because of their proximity to high-crime areas and the tendency of crime to spread spatially.
The kernel density and local Moran's I analyses were carried out with the Kernel Density tool (Spatial Analyst) and the Cluster and Outlier Analysis tool (Spatial Statistics).  The grid map was accomplished using the Spatial Join tool in the Analysis toolbox's Overlay toolset.  The input was point data for burglary locations in 2007 and 2008.
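The grid-based hotspot step can be sketched in a few lines of arcpy.  Here the burglary count per cell from the spatial join is assumed to be in the default Join_Count field, and the top quintile of cells is selected and dissolved into one hotspot polygon; the paths and layer name are invented.

import arcpy

grid = r"C:\data\crime.gdb\grid_with_burglary_counts"

# Collect the burglary count for every grid cell (Join_Count is Spatial Join's default count field)
counts = sorted(row[0] for row in arcpy.da.SearchCursor(grid, ["Join_Count"]))

# Threshold at the 80th percentile: cells in the top 20% (top quintile) form the hotspot
threshold = counts[int(0.8 * len(counts))]

arcpy.MakeFeatureLayer_management(grid, "grid_lyr")
arcpy.SelectLayerByAttribute_management("grid_lyr", "NEW_SELECTION",
                                        "Join_Count >= {0}".format(threshold))

# Dissolve the selected cells into a single hotspot polygon
arcpy.Dissolve_management("grid_lyr", r"C:\data\crime.gdb\grid_hotspot")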

Friday, June 13, 2014

GIS Programming Module 4 - Python Fundamentals Part 2

This week we learned more about Python scripting, especially conditional statements using IF and WHILE.  We also did more with lists and started working with modules that must be imported into PythonWin.

The results of the scripting assignment are shown here.  The first block is a virtual dice-rolling game that generates random numbers using the imported RANDOM module.  We only had to correct a couple of errors in the existing script.  In the second block, we generated a list of 20 random numbers, again using the RANDOM module, and a WHILE loop and a counting variable.  Finally, in the last block, we took that list of random numbers and removed all instances of an "unlucky" number from it.  For this, I used the remove method, inside a WHILE conditional loop, with IN and NOT IN operators.
In addition, I imported the SYS module and used it to allow the user to enter their own unlucky number into the Run Script dialog box before running the script.  This was discussed in chapter 4 of our text and is a very straightforward and convenient way to get input from the user.

Here is a portion of my process in this lab:

Part 1, Step 4: Removing numbers from the list. Explain how you created this block of code.

1.      I used in and not in within an if, else conditional structure to determine if the unlucky number was there or not.
2.      After the else, I used a new while loop containing the remove method: 
while unlucky in list:
    list.remove(unlucky)

This ran the code until the in condition no longer applied in the while loop.
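Putting the pieces together, here is a minimal sketch of the whole block, with the unlucky number read from the script arguments via sys.argv; the list size (20) matches the assignment, but the 0-10 value range is a placeholder, and output will vary because the numbers are random.

import random
import sys

unlucky = int(sys.argv[1])             # unlucky number typed into the Run Script dialog box

# Build a list of 20 random numbers with a while loop and a counting variable
numbers = []
count = 0
while count < 20:
    numbers.append(random.randint(0, 10))
    count += 1

print(numbers)

if unlucky not in numbers:
    print("The unlucky number is not in the list.")
else:
    # Remove every instance of the unlucky number
    while unlucky in numbers:
        numbers.remove(unlucky)
    print(numbers)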

Wednesday, June 11, 2014

GIS Participation Assignment # 1: How is GIS used in the real world?

For this assignment, I read about a very interesting application of GIS network analysis, which could be expanded and applied to many needs.  As we learned in Intro to GIS class, ArcMap's Network Analyst is well developed for use by all kinds of motorists, and is especially useful to emergency personnel such as EMTs, police, and firefighters for planning the most efficient route among any number of destinations or stops.  While planning a route, it incorporates various forms of impedance to traffic flow, such as stop signs, speed limits and one-way streets.

The authors of this article realized that people in wheelchairs face impedance every day, in the form of stairs, rough pavement and curbs, as they try to navigate around their cities.  Their goal was to develop a mapping application to help people in wheelchairs easily plan the best routes.  The researchers manually digitized the center-lines of sidewalks in the city of Northampton, UK, then incorporated DEMs to obtain slope for separate segments of the routes.  At this point, event tables were developed from field work, which measured surface quality (starting and rolling resistance) and the presence of any type of obstacle that might frustrate a wheelchair user: steps, gutters, raised manhole covers, fixed furniture, narrow sidewalks, and several more.  This latter part of the field survey was actually carried out in the company of disabled volunteers in wheelchairs, who helped the researchers identify impediments.

The application was developed in ESRI's ArcView (the precursor to ArcGIS for Desktop Basic) and featured a user-friendly interface.  The routes were calculated by considering impedance or "cost" within the network analysis environment, as well as the user's physical ability and type of wheelchair.  Two modes of route selection were offered via dialogue boxes: a single best route from one location to another, or all wheelchair-suitable routes from a starting point.
The authors tested their map application with 18 wheelchair users in that city and got very favorable reviews about its usefulness and clarity. 

This type of network analysis would be very useful in any area where pedestrians need to navigate, especially tourists, for mapping restaurants, bus stops, presence or absence of sidewalks, etc.  These data can also be crowd-sourced, for example with OpenStreetMap, which we read about in our Cartography class.

This article can be accessed at this link: Mapping for Wheelchair Users: Route Navigation in Urban Spaces
Source:  Beale, L., Field, K., Briggs, D., Picton, P., and Matthews, H. (2006). Mapping for Wheelchair Users: Route Navigation in Urban Spaces. The Cartographic Journal, Vol. 43 (No. 1), pp. 68–81.

(I have placed the PDF in my UWF I:\ drive because it was downloaded from my local library’s EBSCO database, for which access is limited to local card holders.)

Monday, June 9, 2014

Module 3 - Applications in GIS - Damage Assessment

In this week's Applications in GIS lab assignment, we studied damage assessment, specifically for Hurricane Sandy, which hit the east coast of the United States in October 2012.

This figure shows a portion of the New Jersey coast, with post-storm imagery for structures in three city blocks.  We used ArcMap to compare post-storm and pre-storm images and assigned a degree of damage to each building in this study area.  The damage assessment was based on structural damage, wind damage, and inundation of each building.  Because of the presence of sand and moisture on the streets of these blocks post-storm, I concluded that probably all of the structures had been inundated to some degree.  For a house to be classified as destroyed (red dots), part or all of it had to have been reduced to a pile of rubble.  Major structural damage was assigned to a house that still retained its pre-storm shape, but was surrounded by rubble or had other structures pushed against it.  Wind damage was judged partly by the exposure of a structure to the coast, the condition of trees around it, and the presence or absence of debris near it.

Here is a table classifying the houses based on their degree of damage and distance from the shore.  I used Select by Attribute and Select by Location to tabulate this data.

Results Table for Distance from Coastline vs. Damaged Property
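The Select by Location and Select by Attribute steps above can also be scripted.  The sketch below is a rough outline only: the layer names, the "Damage" field and its values, and the distance bands are hypothetical stand-ins for the actual lab data.

import arcpy

# Hypothetical structure points layer; "coastline" is a hypothetical line layer
arcpy.management.MakeFeatureLayer("structure_points", "structures_lyr")

for band in ("100 Meters", "200 Meters", "300 Meters"):
    # Select by Location: structures within the current distance of the coastline
    arcpy.management.SelectLayerByLocation("structures_lyr", "WITHIN_A_DISTANCE",
                                           "coastline", band, "NEW_SELECTION")
    # Select by Attribute: keep only those classified as destroyed
    arcpy.management.SelectLayerByAttribute("structures_lyr", "SUBSET_SELECTION",
                                            "Damage = 'Destroyed'")
    count = int(arcpy.management.GetCount("structures_lyr")[0])
    print("Destroyed structures within {0}: {1}".format(band, count))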

Friday, June 6, 2014

GIS Programming Module 3 - Python Fundamentals Part 1

In this week's GIS Programming lab, we learned more about how to write scripts, and about some of the data types in Python, such as numeric values, strings and lists.  We learned the definitions of expressions, statements, functions, methods and objects.  We practiced using methods and wrote a small script that takes our full name as input.  Running the script outputs the last name only, as well as the number of letters in the last name times 3.
My output is shown above. It consists of my last name, and the number of letters in it (5) times 3 (15).

In order to write this code, I used the split method to separate the whole name string into separate name strings in a list, then indexing to select the last name from the list, and then the len function combined with a simple multiplication operator to find the number of letters in the last name times 3.  Print statements were also included in the script to show the last name and the number of letters times 3.
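Here is a minimal sketch of that logic; the sample name is just a placeholder, not my own.

# Sketch of the name script: split the full name, pull out the last name,
# and print the number of letters in it multiplied by 3.
full_name = "Jane Q Placeholder"          # placeholder input string

name_parts = full_name.split()            # split the string into a list of names
last_name = name_parts[-1]                # index the last element of the list
print(last_name)
print(len(last_name) * 3)                 # letters in the last name, times 3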

Monday, June 2, 2014

Module 2 - Applications in GIS - Coastal Flooding

Impact of sea level rises up to 6 feet on the population of Honolulu
This week in Applications in GIS we learned a lot about working with raster data.  There is a lot to learn.  I really started to have an appreciation for the elegance and simplicity of raster logic this week.  We used a DEM of the coast of Oahu at Honolulu to construct hypothetical scenarios of 3-foot and 6-foot sea level rises.  This was done by selecting those portions of the raster elevation data below 3 or 6 feet, converting them into vector polygons, and draping them across the coastal areas of Honolulu.  We then applied US Census data to the map, by tract, and showed the population density compared to sea level rise.
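A hedged sketch of that raster logic is below.  The layer names, the 6-foot scenario, and the assumption that the DEM is already in feet are mine, not necessarily how the lab data were set up.

import arcpy
from arcpy.sa import Raster, Con

arcpy.CheckOutExtension("Spatial")

dem = Raster("honolulu_dem")              # elevation assumed to be in feet
rise_ft = 6

# Cells at or below the rise get a value of 1; everything else becomes NoData
flooded = Con(dem <= rise_ft, 1)
flooded.save("flood_6ft")

# Convert the flooded raster to a polygon for overlay with the census tracts
arcpy.conversion.RasterToPolygon("flood_6ft", "flood_6ft_poly", "NO_SIMPLIFY")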

In the second part of the exercise, we took a closer look at the demographics of Honolulu, and compared the percentages of white residents, owner-occupied homes, and residents older than 65 years in the census blocks to the zones of possible inundation.
Here is a table showing some demographics of Honolulu compared to the expected zones of inundation.


The white population is seen to be somewhat more impacted by sea-level rise.  Owner-occupied houses are less impacted.  People over 65 years old are not disproportionately impacted.

In the second part of the lab, we analyzed the effects of a storm surge in Collier County, Florida on buildings in the city of Naples.  We compared the results using a USGS DEM created from photogrammetry and a LIDAR DEM, to get an idea of their relative reliability.  We used the DEMs to construct a zone of 1-meter storm surge inundation and overlaid the resulting polygons on shapefiles of the city buildings' footprints.  Then, by calculating errors of omission and commission, we evaluated the reliability of the USGS DEM against the presumably more accurate LIDAR DEM.

This table shows the numbers of different types of buildings impacted by a 1-meter storm surge, as evaluated by the USGS and LIDAR DEMs.
We are assuming the LIDAR data are superior, so errors of omission are those buildings counted by LIDAR as being impacted but NOT counted by the USGS DEM.  Errors of commission, on the other hand, are those buildings incorrectly counted as impacted by the USGS DEM analysis, but that do not show up when LIDAR is used.  The percentage error for each is calculated against the total LIDAR building count.
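As a quick worked example of that error calculation (the counts here are invented for illustration, not the lab results):

# Illustrative arithmetic only; these counts are made up, not the lab values.
lidar_count = 100    # buildings flagged as impacted by the LIDAR DEM (assumed truth)
omission = 12        # flagged by LIDAR but missed by the USGS DEM
commission = 8       # flagged by the USGS DEM but not by LIDAR

omission_pct = 100.0 * omission / lidar_count
commission_pct = 100.0 * commission / lidar_count
print("Omission error: {0:.1f}%".format(omission_pct))      # 12.0%
print("Commission error: {0:.1f}%".format(commission_pct))  # 8.0%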