Wednesday, November 19, 2014

Module 12 - Special Topics in GIS - Geographically Weighted Regression

Last week in Special Topics, we learned how to run a linear regression in ArcMap using the OLS (Ordinary Least Squares) tool.  OLS fits a single, global model to explanatory variable values from the entire study area, so it does not account for spatial variation in the relationships between variables.  In other words, it assumes that the relationship between, for example, median income and rate of auto theft is the same over the whole study area.  This is probably not the case.

The GWR (Geographically Weighted Regression) tool re-runs the regression repeatedly over small, local areas within the general study area, then produces, for every data location in the input, a regression coefficient relating each explanatory variable to the dependent variable.  Thus, we can see if and how relationships between variables change spatially.
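To make the idea of "one regression per location" concrete, here is a minimal NumPy sketch of it. This is not what the ArcMap tool does internally; the function name `gwr_coefficients`, the Gaussian kernel, and the fixed `bandwidth` parameter are all assumptions chosen for illustration.

```python
import numpy as np

def gwr_coefficients(coords, X, y, bandwidth):
    """Fit one distance-weighted least-squares regression per location.

    coords: (n, 2) point coordinates
    X: (n, k) explanatory variables
    y: (n,) dependent variable
    bandwidth: Gaussian kernel bandwidth, in the same units as coords
               (an illustrative choice, not ArcMap's exact kernel)
    Returns an (n, k + 1) array; row i holds the intercept and slopes
    estimated at location i.
    """
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    X = np.asarray(X, dtype=float).reshape(n, -1)
    A = np.column_stack([np.ones(n), X])  # prepend an intercept column
    betas = np.empty((n, A.shape[1]))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)  # nearby points weigh more
        sw = np.sqrt(w)
        # Weighted least squares, solved as OLS on rows rescaled by sqrt(w)
        betas[i] = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
    return betas
```

Mapping any one column of the returned array shows how that variable's coefficient changes from place to place.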

In this lab, I examined rate of auto theft as a function of four explanatory variables: percent Black population, percent Hispanic population, percent renter-occupied housing units, and median income.  Running the Moran's I tool for spatial autocorrelation showed that similar values were clustered across the area.

I then ran the same data through the GWR analysis, with the default bandwidth method of AIC.  Based on AIC score and Adjusted R-squared values, model performance did not improve significantly over the OLS method.  (AIC is a relative measure that compares how far various models are from an unknown "truth", with a lower score indicating a better model; Adjusted R-squared is the proportion of variation in the dependent variable that is explained by the explanatory variables, adjusted for the number of explanatory variables in the model.)
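For reference, both scores can be computed by hand from a model's predictions. The sketch below uses the textbook Gaussian least-squares form of AIC (dropping additive constants) and the standard Adjusted R-squared formula. The function names are my own, and ArcMap's reported values may differ because of its small-sample correction and the way it counts effective parameters for GWR.

```python
import numpy as np

def adjusted_r2(y, y_hat, k):
    """Adjusted R-squared for a model with k explanatory variables."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    n = len(y)
    ss_res = np.sum((y - y_hat) ** 2)     # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)  # total variation
    r2 = 1.0 - ss_res / ss_tot
    # Penalize R-squared for each additional explanatory variable
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def aic_gaussian(y, y_hat, n_params):
    """AIC for a least-squares fit, up to an additive constant.

    n_params counts every estimated coefficient, including the intercept.
    Only differences in AIC between models fit to the same data matter.
    """
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    return n * np.log(rss / n) + 2 * n_params
```

When comparing two models of the same data, the one with the lower AIC and the higher Adjusted R-squared is preferred.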

The fact that the GWR and OLS tests produced nearly identical results suggested that the GWR settings were reaching too far from each location when collecting data points for each small local regression.  (In the extreme, a GWR analysis that accepts ALL of the points in the study area as equally weighted neighbors will be essentially the same as the global OLS analysis of the whole area.)  One possible solution is to set the number of neighbors used by GWR explicitly, rather than letting the AIC calculation choose it.
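The "too wide a neighborhood turns GWR into OLS" effect is easy to reproduce on made-up data. The sketch below is only an illustration: the synthetic points, the Gaussian kernel, and the helper `local_fit` are all assumptions, not the lab data or the ArcMap implementation. The true slope drifts from west to east; a narrow bandwidth recovers that drift, while a very wide one returns the global OLS coefficients at every location.

```python
import numpy as np

def local_fit(coords, A, y, i, bandwidth):
    """Gaussian-weighted least squares centered on point i (hypothetical helper)."""
    d = np.linalg.norm(coords - coords[i], axis=1)
    sw = np.exp(-0.25 * (d / bandwidth) ** 2)  # square root of the Gaussian weight
    return np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]

# Synthetic study area: the slope of y on x increases from west to east
rng = np.random.default_rng(0)
n = 200
coords = rng.uniform(0, 10, size=(n, 2))
x = rng.normal(size=n)
true_slope = 1.0 + 0.3 * coords[:, 0]
y = 2.0 + true_slope * x + rng.normal(scale=0.1, size=n)
A = np.column_stack([np.ones(n), x])  # intercept + one explanatory variable

ols = np.linalg.lstsq(A, y, rcond=None)[0]  # single global model
narrow = np.array([local_fit(coords, A, y, i, 1.0) for i in range(n)])
wide = np.array([local_fit(coords, A, y, i, 1e6) for i in range(n)])

print("global OLS slope:", ols[1])
print("narrow bandwidth, local slope range:", narrow[:, 1].min(), narrow[:, 1].max())
print("wide bandwidth, local slope range:", wide[:, 1].min(), wide[:, 1].max())
```

With the wide bandwidth every point gets a weight of almost exactly 1, so each "local" regression is the global one in disguise.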

I tried a few versions of both Fixed and Adaptive GWR and found that, based on AIC and Adjusted R-squared, Adaptive GWR with 15 neighbors produced the best-performing model of the relationship between auto theft and the four explanatory variables.
