Tractor
Overview
This research was a part of a greater project headed by Dr. William White and Dr. Richard Steckel of The Ohio State University’s Department of Economics. The core focus of the project was to look at the ways education was affected by early tractorization and rural manufacturing.
The Data
Much of the data came from Michale Haine’s ICPSR Study Number 2896. However quite a bit of cleaning and formatting was required to create sets which allowed for our analysis.
The Focus
This was broken down into several parts, atttempting to answer several different questions.
Each of these questions focused around rural counties based on RUCC codes. Highly urban areas were not of interest as they naturally would not contain much farming and thus tractorization would be minimal to non existent, manufacturing was already present in these areas and in general education was high relative to the rural communities.
-
Did early adoption of tractors lead to more manufacturing in these communities? 1A. Did the injection of manufacturing in these rural communities have an effect on education?
-
Was there an increase in average education level in the counties which had early adoption of tractors.
Value Added By Manufacturing
VAM is a metric that describes the how economy of a region, in this case a county, benefitted from the manufacturing industry within that space. This data was sparse but in order to gain best high-level understanind as possible, I created a map with gradient to visualize counties with highest and lowest VAM across our selected time period. Below are comparitive maps, one will all counties availble, the other with urban counties removed.
Table Of Contents
Tractor Diffuision by State and Region
Note about VAM Plots: Any counties with N/A value for VAM were replaced with 0. These are so few that I believe it has minimal effect on the trend lines.
#Function for producing EDU dataset
#
#
eduDataFunc <- function(state){
edu_data <- subset(oth_state, stateicp == state)
edu_data <- subset(edu_data, sex == "Male")
edu_data <- subset(edu_data, between(age, 18,22))
edu_data <- subset(edu_data, higrade < 20)
edu_data <- subset(edu_data, higrade > 0)
edu_data$higrade <- edu_data$higrade - 3
reduced_edu_data <- edu_data %>%
group_by(county) %>%
summarise(avg = mean(higrade))
return(reduced_edu_data)
}
#
#
#
eduMedianFunc <- function(state){
edu_data <- subset(oth_state, stateicp == state)
edu_data <- subset(edu_data, sex == "Male")
edu_data <- subset(edu_data, between(age, 18,22))
edu_data <- subset(edu_data, higrade < 20)
edu_data <- subset(edu_data, higrade > 0)
edu_data$higrade <- edu_data$higrade - 3
reduced_edu_data <- edu_data %>%
group_by(county) %>%
summarise(med = median(higrade))
return(reduced_edu_data)
}
#Function for producing Tractor set given State Abb
#
#
tractDataFunc <- function(stateAbbre){
stateDf <- subset(df30, stateAbb == stateAbbre)
stateDf$county <- stateDf$county * 10
stateDf$X <- NULL
return(stateDf)
}
#
#
#
#National Level
Looking at the national level of percent of farms with tractors over varying years. The percentages are measured on the county level.
% Farm Tractor 1940 vs 1930
df30_40 <- merge(df30, df40, by = c("name", "stateAbb"), all = FALSE)
ggplot(data = df30_40, aes(x=percent_farm_tractor.x, y=percent_farm_tractor.y)) + geom_point() + xlab(label = "% of Farms w/ Tractors 1930's") + ylab(label = "% of Farms w/ Tractors 1940's") + ggtitle(label = "National Level")

% Farm Tractor 1950 vs 1940
df40_50 <- merge(df40, df50, by = c("name", "stateAbb"), all = FALSE)
ggplot(data = df40_50, aes(x=percent_farm_tractor.x, y=percent_farm_tractor.y)) + geom_point() + xlab(label = "% of Farms w/ Tractors 1940's") + ylab(label = "% of Farms w/ Tractors 1950's") + ggtitle(label = "National Level")

% Farm Tractor 1954 vs 1950
df50_54 <- merge(df50, df54, by = c("name", "stateAbb"), all = FALSE)
ggplot(data = df50_54, aes(x=percent_farm_tractor.x, y=percent_farm_tractor.y)) + geom_point() + xlab(label = "% of Farms w/ Tractors 1950") + ylab(label = "% of Farms w/ Tractors 1954") + ggtitle(label = "National Level")

df50_54$logResponse <- log(df50_54$percent_farm_tractor.y)
df50_54$logX <- log(df50_54$percent_farm_tractor.x)
ggplot(data = df50_54, aes(x=percent_farm_tractor.x, y=logResponse)) + geom_point() + xlab(label = "% of Farms w/ Tractors 1950") + ylab(label = "Log(% of Farms w/ Tractors 1954)") + ggtitle(label = "National Level")

ggplot(data = df50_54, aes(x=logX, y=logResponse)) + geom_point() + xlab(label = "Log(% of Farms w/ Tractors 1950)") + ylab(label = "Log(% of Farms w/ Tractors 1954)") + ggtitle(label = "National Level")

% Farm Tractor 1964 vs 1954
df54_64 <- merge(df54, df64, by = c("name", "stateAbb"), all = FALSE)
ggplot(data = df54_64, aes(x=percent_farm_tractor.x, y=percent_farm_tractor.y)) + geom_point() + xlab(label = "% of Farms w/ Tractors 1954") + ylab(label = "% of Farms w/ Tractors 1964") + ggtitle(label = "National Level")

OHIO
Mean Edu vs. Tractor
Median Edu vs. Tractor
Value Added by Manufacturing
edu <- read.dta13("ohio1940.dta")
ohio1930 <- subset(df30, stateAbb == "OH")
#Adjust FIPS code representation to match the EDU dataset
ohio1930$county <- ohio1930$county * 10
#Remove Pickaway
ohio1930 <- subset(ohio1930, county != 129)
#SUBSET to White males between 18,22 and higrade < 20 which would be graduate school
edu <- subset(edu, sex == "Male")
edu <- subset(edu, between(age, 18, 22))
edu <- subset(edu, higrade < 20)
edu$higrade <- edu$higrade - 3
#Remove all values less than 1
edu <- edu[which(edu$higrade > 0),]
#THIS IS FINAL 1940 EDU, write out to new dataset
write.csv(edu, "ohio1940edu.csv")
newEdu <- edu %>%
group_by(county) %>%
summarise(avg = mean(higrade))
ohPlotSrc <- merge(newEdu, ohio1930, by = "county")
ggplot(data = ohPlotSrc, aes(x=percent_farm_tractor, y=avg)) +xlab(label = "% Farms w/ Tractors 1930") + ylab(label = "Avg Years Education") + ggtitle(label = "Ohio") + geom_point()

LOESS REGRESSION on 1940’s Avg EDU vs % Farms w/ Tractor 1930’s for Ohio
ggplot(ohPlotSrc,aes(x = percent_farm_tractor, y = avg)) + geom_point() + geom_smooth() + xlab("1930's % Farms w/ Tractors") + ylab("1940's Edu") + ggtitle("Ohio")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

#identify(ohio1930$percent_farm_tractor, newEdu$avg)
##Transform ohio edu response on Log Scale
Ohio Mean Education Model

ohPlotSrc$logAvg <- log(ohPlotSrc$avg)
ohEduMod <- lm(logAvg ~ percent_farm_tractor, data = ohPlotSrc)
ggplot(data = ohPlotSrc, aes(x=percent_farm_tractor, y=logAvg)) + geom_point() + geom_smooth(method = "lm") + xlab(label = "% Farms Tractor 1930's") + ylab(label = "Log(Avg Education 1940's)") + ggtitle(label = "Log-Linear(Ohio Edu vs Tractor)")
## `geom_smooth()` using formula 'y ~ x'

summary(ohEduMod)
##
## Call:
## lm(formula = logAvg ~ percent_farm_tractor, data = ohPlotSrc)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.121748 -0.021526 0.008973 0.029746 0.066242
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.280754 0.009254 246.458 < 2e-16 ***
## percent_farm_tractor 0.186915 0.035457 5.272 9.92e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04027 on 86 degrees of freedom
## Multiple R-squared: 0.2442, Adjusted R-squared: 0.2354
## F-statistic: 27.79 on 1 and 86 DF, p-value: 9.92e-07
Ohio - Median Education Plots
eduMedian <- edu %>%
group_by(county) %>%
summarise(med = median(higrade))
# Log of Education Median
eduMedian$logMed <- log(eduMedian$med)
ggplot(data = ohio1930, aes(x=jitter(percent_farm_tractor), y=jitter(eduMedian$med))) + geom_point() + xlab(label = "% Farms w/ Tractors 1930's") + ylab(label = "1940's Median Avg. Education")

ggplot(data = ohio1930, aes(x=jitter(percent_farm_tractor), y=jitter(eduMedian$logMed))) + geom_point() + xlab(label = "% Farms w/ Tractors 1930's") + ylab(label = "1940's Median Avg. Education")

Ohio - Median Education Model
ohPlotSrc_med <- merge(eduMedian, ohio1930, by = "county")
ohEduMod_med <- lm(logMed ~ percent_farm_tractor, data = ohPlotSrc_med)
summary(ohEduMod_med)
##
## Call:
## lm(formula = logMed ~ percent_farm_tractor, data = ohPlotSrc_med)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.23988 -0.01976 0.01381 0.03978 0.09668
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.29588 0.01401 163.899 <2e-16 ***
## percent_farm_tractor 0.24231 0.05367 4.515 2e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.06095 on 86 degrees of freedom
## Multiple R-squared: 0.1916, Adjusted R-squared: 0.1822
## F-statistic: 20.38 on 1 and 86 DF, p-value: 2.002e-05
oh_med_df <- as.data.frame(resid(ohEduMod_med))
oh_med_df$county <- ohPlotSrc_med$name
colnames(oh_med_df) <- c( "residual", "county")
Ohio Median Model County Residuals
head(oh_med_df)
## residual county
## 1 -0.11295981 ADAMS
## 2 0.04748636 ALLEN
## 3 0.05497776 ASHLAND
## 4 0.04637276 ASHTABULA
## 5 0.08549094 ATHENS
## 6 0.02526599 AUGLAIZE
OH EDU VS % Farm’s w/ Tractors S.S.E.
#Calculate SSE
pred_ohEdu <- predict(ohEduMod, ohPlotSrc)
#Undo log transformation
pred_ohEdu <- exp(pred_ohEdu)-1
##Sum of Square error
mean((ohPlotSrc$avg - pred_ohEdu)^2)
## [1] 1.174032
#Residuals
ohPlotSrc$modResid <- resid(ohEduMod)
ohPlotSrc$fitted <- predict(ohEduMod)
ggplot(ohPlotSrc, aes(x = fitted, y = modResid)) + geom_point() + geom_hline(yintercept = 0, colour = "red") + ggtitle(label = "OH Edu vs Tractor Residuals")

OHIO Value Added by Manufacturing
VAM 1947 vs % 1940
#SCATTER DIAGRAM AND REGRESSION FOR OHIO VAM VS EDU
#VAM dataset
vam_ohio <- subset(vam, state == "Ohio")
#write.csv(vam_ohio, "vamOhio.csv")
#VAM in 1947 vs % Tractors 1940
vam_ohio <- subset(vam_ohio, year == 1947)
ohio1940 <- subset(df40, stateAbb == "OH")
ohio1930 <- subset(df30, stateAbb == "OH")
ohio1940$county <- ohio1940$county * 10
ggplot(data = ohio1940, aes(x=percent_farm_tractor, y=vam_ohio$VAM)) + geom_point() + xlab(label = "% Farm's w/ Tractors 1940") + ylab(label = "VAM 1947") + ggtitle(label = "Ohio VAM vs. % Tractor") + geom_smooth(method = "lm")
## `geom_smooth()` using formula 'y ~ x'

VAM 1958 vs % 1950
ohio1950 <- subset(df50,stateAbb == "OH" )
vam_ohio1957 <- subset(vam, year == 1958 & state == "Ohio")
ohVam_mod <- lm(vam_ohio1957$VAM ~ ohio1950$percent_farm_tractor)
summary(ohVam_mod)
##
## Call:
## lm(formula = vam_ohio1957$VAM ~ ohio1950$percent_farm_tractor)
##
## Residuals:
## Min 1Q Median 3Q Max
## -569.63 -189.42 -25.26 194.35 616.99
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 188.7 119.4 1.581 0.11950
## ohio1950$percent_farm_tractor 546.8 188.4 2.902 0.00526 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 273.3 on 57 degrees of freedom
## (29 observations deleted due to missingness)
## Multiple R-squared: 0.1288, Adjusted R-squared: 0.1135
## F-statistic: 8.424 on 1 and 57 DF, p-value: 0.005256
ggplot(data = ohio1950, aes(x=percent_farm_tractor, y=vam_ohio1957$VAM)) + geom_point() + xlab(label = "% Farm's w/ Tractors 1950") + ylab(label = "VAM 1958") + ggtitle(label = "Ohio VAM vs. % Tractor") + geom_smooth(method = "lm")
## `geom_smooth()` using formula 'y ~ x'

VAM 1947 vs % 1930
ohVam_mod1947 <- lm(vam_ohio$VAM ~ ohio1930$percent_farm_tractor)
summary(ohVam_mod1947)
##
## Call:
## lm(formula = vam_ohio$VAM ~ ohio1930$percent_farm_tractor)
##
## Residuals:
## Min 1Q Median 3Q Max
## -447.78 -210.40 -43.62 235.08 562.43
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 214.00 62.37 3.431 0.000959 ***
## ohio1930$percent_farm_tractor 827.97 240.82 3.438 0.000938 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 269.5 on 79 degrees of freedom
## (7 observations deleted due to missingness)
## Multiple R-squared: 0.1302, Adjusted R-squared: 0.1191
## F-statistic: 11.82 on 1 and 79 DF, p-value: 0.0009376
ggplot(data = ohio1930, aes(x=percent_farm_tractor, y=vam_ohio$VAM)) + geom_point() + xlab(label = "% Farm's w/ Tractors 1930") + ylab(label = "VAM 1947") + ggtitle(label = "Ohio VAM vs. % Tractor") + geom_smooth(method = "lm")
## `geom_smooth()` using formula 'y ~ x'

1940 Census Dataset; Pull other midwest states from here
# NEW DATASET! -> EDU 1940 vs % Tractor 1930
#oth_state <- read.dta13("cen1940A.dta")
#saveRDS(oth_state, "midwest_state.rds")
# Restore the data object. Shit is huge...I was tired of loading it over and over...
oth_state = readRDS(file = "midwest_state.rds")
#States:
# Michigan, Wisconsin, Indiana, Illinois, Iowa, Minnesota
MICHIGAN
## [1] "use `stateicp` as locations\nage between 18,22\nhigrade < 20"
#LOESS REGRESSION on EDU vs % Tractor
ggplot(michPlotSrc,aes(x = percent_farm_tractor, y = avg)) + geom_point() + geom_smooth() + xlab("1930's % Tractor") + ylab("1940 Avg. Edu") + ggtitle("Michigan")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

#Transform response on Log Scale
michPlotSrc$logAvg <- log(michPlotSrc$avg)
ggplot(data = michPlotSrc, aes(x=percent_farm_tractor, y=logAvg)) + geom_point() + ggtitle(label = "Log-Linear")

michEduMod <- lm(logAvg ~ percent_farm_tractor, data = michPlotSrc)
summary(michEduMod)
##
## Call:
## lm(formula = logAvg ~ percent_farm_tractor, data = michPlotSrc)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.219400 -0.036979 0.000761 0.041468 0.100975
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.25242 0.01312 171.704 <2e-16 ***
## percent_farm_tractor 0.17868 0.07160 2.496 0.0146 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.05275 on 81 degrees of freedom
## Multiple R-squared: 0.07139, Adjusted R-squared: 0.05993
## F-statistic: 6.228 on 1 and 81 DF, p-value: 0.01461
#Residuals
michPlotSrc$modResid <- resid(michEduMod)
michPlotSrc$fitted <- predict(michEduMod)
ggplot(michPlotSrc, aes(x = fitted, y = modResid)) + geom_point() + geom_hline(yintercept = 0, colour = "red") + ggtitle(label = "MI Edu vs Tractor Residuals")

#Michigan - Median
michMedian <- eduMedianFunc("Michigan")
ggplot(data = mich_tract, aes(x=percent_farm_tractor, y=jitter(michMedian$med))) + geom_point() + ggtitle(label = "Michigan-Median")

michMedian$logMed <- log(michMedian$med)
ggplot(data = mich_tract, aes(x=jitter(percent_farm_tractor), y=jitter(michMedian$logMed))) + geom_point() + xlab("% 1930's") + ylab("Log Median years education") + ggtitle("Michigan-Log(Median)")

Michigan - Median Model
plotSrc_med_mich <- merge(michMedian, mich_tract, by = "county")
michEduMod_med <- lm(logMed ~ percent_farm_tractor, data = plotSrc_med_mich)
summary(michEduMod_med)
##
## Call:
## lm(formula = logMed ~ percent_farm_tractor, data = plotSrc_med_mich)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.25098 -0.06656 0.02238 0.05473 0.14252
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.23574 0.02284 97.892 < 2e-16 ***
## percent_farm_tractor 0.33943 0.12465 2.723 0.00792 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.09184 on 81 degrees of freedom
## Multiple R-squared: 0.08386, Adjusted R-squared: 0.07255
## F-statistic: 7.415 on 1 and 81 DF, p-value: 0.007921
mi_med_df <- as.data.frame(resid(michEduMod_med))
mi_med_df$county <- plotSrc_med_mich$name
colnames(mi_med_df) <- c( "residual", "county")
Michigan Median Model County Residuals
mi_med_df
## residual county
## 1 -0.187275178 ALCONA
## 2 -0.085014801 ALGER
## 3 0.008819658 ALLEGAN
## 4 -0.079919458 ALPENA
## 5 -0.077488032 ANTRIM
## 6 -0.200353056 ARENAC
## 7 0.025425850 BARAGA
## 8 0.110906381 BARRY
## 9 -0.012837076 BAY
## 10 0.031026497 BENZIE
## 11 0.107743596 BERRIEN
## 12 0.008707873 BRANCH
## 13 0.111953523 CALHOUN
## 14 0.120925711 CASS
## 15 0.036165627 CHARLEVOIX
## 16 -0.065297665 CHEBOYGAN
## 17 0.020599745 CHIPPEWA
## 18 0.033014005 CLARE
## 19 -0.042063207 CLINTON
## 20 0.014172802 CRAWFORD
## 21 0.012887339 DELTA
## 22 0.101942375 DICKINSON
## 23 0.072106438 EATON
## 24 -0.066056828 EMMET
## 25 0.055198261 GENESEE
## 26 -0.069792934 GLADWIN
## 27 0.142516707 GOGEBIC
## 28 0.011545445 GRAND TRAVERSE
## 29 -0.032579808 GRATIOT
## 30 0.100856541 HILLSDALE
## 31 0.039725469 HOUGHTON
## 32 -0.250981531 HURON
## 33 0.068038995 INGHAM
## 34 -0.128137603 IONIA
## 35 0.027636661 IOSCO
## 36 0.130371319 IRON
## 37 -0.101475239 ISABELLA
## 38 0.105659303 JACKSON
## 39 0.100879266 KALAMAZOO
## 40 -0.075924113 KALKASKA
## 41 0.104813894 KENT
## 42 -0.169611558 KEWEENAW
## 43 0.046514135 LAKE
## 44 -0.216084187 LAPEER
## 45 -0.075546269 LEELANAU
## 46 0.044995306 LENAWEE
## 47 0.076575847 LIVINGSTON
## 48 -0.001805789 LUCE
## 49 -0.093317083 MACKINAC/MICHILIM
## 50 0.061968971 MACOMB
## 51 0.039619066 MANISTEE
## 52 0.126050759 MARQUETTE
## 53 0.116743765 MASON
## 54 0.040682349 MECOSTA
## 55 0.025128222 MENOMINEE
## 56 -0.004886346 MIDLAND
## 57 -0.183214337 MISSAUKEE
## 58 -0.032406605 MONROE
## 59 0.024902446 MONTCALM
## 60 0.022383630 MONTMORENCY
## 61 0.114975307 MUSKEGON
## 62 -0.020210397 NEWAYGO
## 63 0.051362076 OAKLAND
## 64 0.027366523 OCEANA
## 65 -0.069876049 OGEMAW
## 66 -0.080123158 ONTONAGON
## 67 0.048305016 OSCEOLA
## 68 -0.010300206 OSCODA
## 69 -0.067060519 OTSEGO
## 70 0.025111206 OTTAWA
## 71 -0.192049448 PRESQUE ISLE
## 72 0.032899974 ROSCOMMON
## 73 -0.040745153 SAGINAW
## 74 -0.013826630 ST CLAIR
## 75 0.117125263 ST JOSEPH
## 76 -0.103298228 SANILAC
## 77 0.014549838 SCHOOLCRAFT
## 78 -0.043640768 SHIAWASSEE
## 79 -0.134888560 TUSCOLA
## 80 0.119313235 VAN BUREN
## 81 0.054252119 WASHTENAW
## 82 0.045686965 WAYNE
## 83 0.047936516 WEXFORD
#SCATTER DIAGRAM AND REGRESSION FOR MICHIGAN VAM VS EDU
mich_vam <- subset(vam, state == "Michigan" & year == 1947)
ggplot(data=mich_tract, aes(x=percent_farm_tractor, y=mich_vam$VAM)) + geom_point() + geom_smooth(method = "lm") + ggtitle("Michigan VAM 1947 vs % Tractor 1940")
## `geom_smooth()` using formula 'y ~ x'

outlierMich <- michPlotSrc[44,]
outlierMich
## county avg year state name percent_farm_tractor stateAbb fips
## 44 870 7.881265 1930 26 LAPEER 0.1761294 MI 26087
## colorID logAvg modResid fitted
## 44 3 2.064488 -0.2194003 2.283889
MI EDU VS % SSE
#Calculate SSE
pred_michEdu <- predict(michEduMod, michPlotSrc)
#Undo log transformation
pred_michEdu <- exp(pred_michEdu)-1
mean((michPlotSrc$avg - pred_michEdu)^2)
## [1] 1.276815
VAM 1958 vs % 1950
mich1930 <- subset(df30,stateAbb == "MI" )
mich1950 <- subset(df50,stateAbb == "MI" )
vam_mich1957 <- subset(vam, year == 1958 & state == "Michigan")
michVam_mod <- lm(vam_mich1957$VAM ~ mich1950$percent_farm_tractor)
summary(michVam_mod)
##
## Call:
## lm(formula = vam_mich1957$VAM ~ mich1950$percent_farm_tractor)
##
## Residuals:
## Min 1Q Median 3Q Max
## -365.46 -196.66 -25.15 184.44 503.44
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 243.3 202.0 1.204 0.233
## mich1950$percent_farm_tractor 140.9 297.2 0.474 0.637
##
## Residual standard error: 241.4 on 62 degrees of freedom
## (19 observations deleted due to missingness)
## Multiple R-squared: 0.003611, Adjusted R-squared: -0.01246
## F-statistic: 0.2247 on 1 and 62 DF, p-value: 0.6372
ggplot(data=mich1950, aes(x=percent_farm_tractor, y=vam_mich1957$VAM)) + geom_point() + ggtitle("Michigan - 1957 VAM vs 1950 %")

VAM 1947 vs % 1930
vam_mich1947 <- subset(vam, year == 1947 & state == "Michigan")
michVam_mod1947 <- lm(vam_mich1947$VAM ~ mich1930$percent_farm_tractor)
summary(michVam_mod1947)
##
## Call:
## lm(formula = vam_mich1947$VAM ~ mich1930$percent_farm_tractor)
##
## Residuals:
## Min 1Q Median 3Q Max
## -427.69 -191.52 -8.99 126.28 552.66
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 71.76 62.43 1.15 0.254
## mich1930$percent_farm_tractor 1542.55 349.03 4.42 3.24e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 243.3 on 76 degrees of freedom
## (5 observations deleted due to missingness)
## Multiple R-squared: 0.2045, Adjusted R-squared: 0.194
## F-statistic: 19.53 on 1 and 76 DF, p-value: 3.24e-05
ggplot(data=mich1930, aes(percent_farm_tractor, vam_mich1947$VAM)) + ggtitle("Michigan - 1947 VAM vs 1930 %") + geom_point()

WISCONSIN
#Wisconsin Edu
wis_edu <- eduDataFunc("Wisconsin")
#Wisconsin Tractor
wis_tract <- tractDataFunc("WI")
plot(wis_tract$percent_farm_tractor, wis_edu$avg, xlab = "1930 % Tractor", ylab = "1940 Avg. Edu", main = "Wisconsin")

plot(log(wis_tract$percent_farm_tractor), wis_edu$avg)

wisPlotSrc <- merge(wis_edu, wis_tract, by = "county")
#LOESS REGRESSION on EDU vs % Tractor
ggplot(wisPlotSrc,aes(x = percent_farm_tractor, y = avg)) + geom_point() + geom_smooth() + xlab("1930's % Tractor") + ylab("1940 Avg. Edu") + ggtitle("Wisconsin")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

#Transform response on Log Scale
wisPlotSrc$logAvg <- log(wisPlotSrc$avg)
wisPlotSrc$logX <- log(wisPlotSrc$percent_farm_tractor)
plot(wisPlotSrc$percent_farm_tractor, wisPlotSrc$logAvg, main = "Log-Linear")

wisEduMod <- lm(logAvg ~ percent_farm_tractor, data = wisPlotSrc)
summary(wisEduMod)
##
## Call:
## lm(formula = logAvg ~ percent_farm_tractor, data = wisPlotSrc)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.083969 -0.025500 -0.000201 0.020619 0.092817
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.265011 0.009967 227.256 < 2e-16 ***
## percent_farm_tractor 0.100321 0.035121 2.856 0.00566 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03962 on 69 degrees of freedom
## Multiple R-squared: 0.1057, Adjusted R-squared: 0.09279
## F-statistic: 8.159 on 1 and 69 DF, p-value: 0.005656
#Residuals
wisPlotSrc$modResid <- resid(wisEduMod)
wisPlotSrc$fitted <- predict(wisEduMod)
ggplot(wisPlotSrc, aes(x = fitted, y = modResid)) + geom_point() + geom_hline(yintercept = 0, colour = "red") + ggtitle(label = "WI Edu vs Tractor Residuals")

#SCATTER DIAGRAM AND REGRESSION FOR WISCONSIN VAM VS EDU
wis_vam <- subset(vam, state == "Wisconsin" & year == 1947)
wis_vam[is.na(wis_vam)] <- 0
plot(wis_tract$percent_farm_tractor, wis_vam$VAM, xlab = "% 1940", ylab = "VAM 1947", ylim = c(0,1000), main = "Wisconsin VAM vs % Tractor")
abline(lm(wis_vam$VAM ~ wis_tract$percent_farm_tractor))

WI EDU VS % SSE
#Calculate SSE
pred_wisEdu <- predict(wisEduMod, wisPlotSrc)
#Undo log transformation
pred_wisEdu <- exp(pred_wisEdu)-1
mean((wisPlotSrc$avg - pred_wisEdu)^2)
## [1] 1.164601
VAM 1958 vs % 1950
wis1950 <- subset(df50,stateAbb == "WI" )
wis1930 <- subset(df30,stateAbb == "WI" )
vam_wis1957 <- subset(vam, year == 1958 & state == "Wisconsin")
wisVam_mod <- lm(vam_wis1957$VAM ~ wis1950$percent_farm_tractor)
summary(wisVam_mod)
##
## Call:
## lm(formula = vam_wis1957$VAM ~ wis1950$percent_farm_tractor)
##
## Residuals:
## Min 1Q Median 3Q Max
## -333.26 -192.11 -27.48 103.04 676.52
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.341 237.072 -0.010 0.992
## wis1950$percent_farm_tractor 497.488 326.784 1.522 0.134
##
## Residual standard error: 251.5 on 54 degrees of freedom
## (15 observations deleted due to missingness)
## Multiple R-squared: 0.04115, Adjusted R-squared: 0.0234
## F-statistic: 2.318 on 1 and 54 DF, p-value: 0.1337
plot(wis1950$percent_farm_tractor, vam_wis1957$VAM, main = "Wisconsin - 1957 VAM vs 1950 %")
abline(wisVam_mod, col = "red")

VAM 1947 vs % 1930
vam_wis1947 <- subset(vam, year == 1947 & state == "Wisconsin")
wisVam_mod1947 <- lm(vam_wis1947$VAM ~ wis1930$percent_farm_tractor)
summary(wisVam_mod1947)
##
## Call:
## lm(formula = vam_wis1947$VAM ~ wis1930$percent_farm_tractor)
##
## Residuals:
## Min 1Q Median 3Q Max
## -484.24 -152.21 -90.96 135.29 674.42
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 55.46 62.58 0.886 0.379
## wis1930$percent_farm_tractor 1046.06 228.27 4.583 2.1e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 243.9 on 66 degrees of freedom
## (3 observations deleted due to missingness)
## Multiple R-squared: 0.2414, Adjusted R-squared: 0.2299
## F-statistic: 21 on 1 and 66 DF, p-value: 2.098e-05
plot(wis1930$percent_farm_tractor, vam_wis1947$VAM, main = "Wisconsin - 1947 VAM vs 1930 %")
abline(wisVam_mod1947, col = "red")

INDIANA
#Indiana Edu
in_edu <- eduDataFunc("Indiana")
#Indiana Tractor
in_tract <- tractDataFunc("IN")
plot(in_tract$percent_farm_tractor, in_edu$avg, xlab = "1930 % Tractor", ylab = "1940 Avg. Edu", main = "Indiana")

inPlotSrc <- merge(in_edu, in_tract, by = "county")
#LOESS REGRESSION on EDU vs % Tractor
ggplot(inPlotSrc,aes(x = percent_farm_tractor, y = avg)) + geom_point() + geom_smooth() + xlab("1930's % Tractor") + ylab("1940 Avg. Edu") + ggtitle("Indiana")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

#Transform response on Log Scale
inPlotSrc$logAvg <- log(inPlotSrc$avg)
plot(inPlotSrc$percent_farm_tractor, inPlotSrc$logAvg, main = "Log-Linear")

inEduMod <- lm(logAvg ~ percent_farm_tractor, data = inPlotSrc)
summary(inEduMod)
##
## Call:
## lm(formula = logAvg ~ percent_farm_tractor, data = inPlotSrc)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.128838 -0.020547 0.000339 0.027807 0.068493
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.271406 0.009384 242.049 < 2e-16 ***
## percent_farm_tractor 0.243673 0.038546 6.322 9.74e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03588 on 90 degrees of freedom
## Multiple R-squared: 0.3075, Adjusted R-squared: 0.2998
## F-statistic: 39.96 on 1 and 90 DF, p-value: 9.74e-09
#Residuals
inPlotSrc$modResid <- resid(inEduMod)
inPlotSrc$fitted <- predict(inEduMod)
ggplot(inPlotSrc, aes(x = fitted, y = modResid)) + geom_point() + geom_hline(yintercept = 0, colour = "red") + ggtitle(label = "IN Edu vs Tractor Residuals")

#SCATTER DIAGRAM AND REGRESSION FOR Indiana VAM VS EDU
in_vam <- subset(vam, state == "Indiana" & year == 1947)
rownames(in_vam) <- 1:nrow(in_vam)
in_vam <- in_vam[-75,]
rownames(in_vam) <- 1:nrow(in_vam)
in_vam[is.na(in_vam)] <- 0
plot(in_tract$percent_farm_tractor, in_vam$VAM, xlab = "% 1940", ylab = "VAM 1947", ylim = c(0,1000), main = "Indiana VAM vs % Tractor")
abline(lm(in_vam$VAM ~ in_tract$percent_farm_tractor))

#Indiana - Median
inMedian <- eduMedianFunc("Indiana")
plot(jitter(in_tract$percent_farm_tractor), jitter(inMedian$med), xlab = "% 1930", ylab ="Median years education", main = "Indiana- Median")

inMedian$logMed <- log(inMedian$med)
plot(jitter(in_tract$percent_farm_tractor), jitter(inMedian$logMed), xlab = "% 1930", ylab ="Median years education", main = "Indiana- LogMedian")

Indiana - Median Model
plotSrc_med_in <- merge(inMedian, in_tract, by = "county")
inEduMod_med <- lm(logMed ~ percent_farm_tractor, data = plotSrc_med_in)
summary(inEduMod_med)
##
## Call:
## lm(formula = logMed ~ percent_farm_tractor, data = plotSrc_med_in)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.217651 -0.021463 -0.000837 0.035208 0.135319
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.27671 0.01582 143.9 < 2e-16 ***
## percent_farm_tractor 0.38349 0.06500 5.9 6.3e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0605 on 90 degrees of freedom
## Multiple R-squared: 0.2789, Adjusted R-squared: 0.2709
## F-statistic: 34.81 on 1 and 90 DF, p-value: 6.296e-08
in_med_df <- as.data.frame(resid(inEduMod_med))
in_med_df$county <- plotSrc_med_in$name
colnames(in_med_df) <- c( "residual", "county")
Indiana Median Model County Residuals
in_med_df
## residual county
## 1 -0.0039011654 ADAMS
## 2 -0.0045533470 ALLEN
## 3 0.0485600175 BARTHOLOMEW
## 4 -0.0556421083 BENTON
## 5 0.0397637243 BLACKFORD
## 6 0.0172311300 BOONE
## 7 -0.0974781526 BROWN
## 8 -0.0349227644 CARROLL
## 9 0.0353692131 CASS
## 10 -0.0304488200 CLARK
## 11 0.0702713112 CLAY
## 12 -0.0050793908 CLINTON
## 13 -0.2176513073 CRAWFORD
## 14 -0.0297271964 DAVIESS
## 15 -0.0268200706 DEARBORN
## 16 -0.0124768965 DECATUR
## 17 0.0184916342 DE KALB
## 18 0.0351542336 DELAWARE
## 19 -0.1312904249 DUBOIS
## 20 0.0608813534 ELKHART
## 21 -0.1196820773 FAYETTE
## 22 -0.0241780504 FLOYD
## 23 0.0115811945 FOUNTAIN
## 24 -0.1688033073 FRANKLIN
## 25 0.0746289970 FULTON
## 26 0.0253089310 GIBSON
## 27 -0.0007629611 GRANT
## 28 0.0866655157 GREENE
## 29 -0.0014830516 HAMILTON
## 30 0.0116994290 HANCOCK
## 31 -0.0128748566 HARRISON
## 32 0.0177421484 HENDRICKS
## 33 0.0178206224 HENRY
## 34 -0.0025762048 HOWARD
## 35 0.1042119985 HUNTINGTON
## 36 -0.0313739026 JACKSON
## 37 0.0069599110 JASPER
## 38 0.0473013013 JAY
## 39 -0.0122367435 JEFFERSON
## 40 -0.0157585697 JENNINGS
## 41 -0.0086250188 JOHNSON
## 42 0.0122629645 KNOX
## 43 0.0662756138 KOSCIUSKO
## 44 -0.0317447280 LAGRANGE
## 45 -0.0050625322 LAKE
## 46 0.0047870347 LA PORTE
## 47 -0.0144954635 LAWRENCE
## 48 0.0113978554 MADISON
## 49 0.0066256475 MARION
## 50 0.0609437234 MARSHALL
## 51 -0.0171457002 MARTIN
## 52 0.0190393905 MIAMI
## 53 -0.0009467869 MONROE
## 54 0.0065750034 MONTGOMERY
## 55 0.0373660608 MORGAN
## 56 -0.0211253141 NEWTON
## 57 0.0415637340 NOBLE
## 58 0.0745590830 OHIO
## 59 -0.0223773748 ORANGE
## 60 0.0909766279 OWEN
## 61 0.0448138997 PARKE
## 62 -0.1093545276 PERRY
## 63 -0.0145292159 PIKE
## 64 0.0201064860 PORTER
## 65 -0.1790683618 POSEY
## 66 -0.0667393709 PULASKI
## 67 0.0619769431 PUTNAM
## 68 0.0232313852 RANDOLPH
## 69 -0.0342631411 RIPLEY
## 70 -0.0356922635 RUSH
## 71 0.0262023096 ST JOSEPH
## 72 -0.0171502153 SCOTT
## 73 0.0008867328 SHELBY
## 74 -0.0079500668 SPENCER
## 75 -0.0655714442 STARKE
## 76 0.1309497809 STEUBEN
## 77 0.0687769074 SULLIVAN
## 78 -0.1054192161 SWITZERLAND
## 79 -0.0023580977 TIPPECANOE
## 80 0.0011144066 TIPTON
## 81 -0.0256990445 UNION
## 82 0.0413615048 VANDERBURGH
## 83 0.0276761811 VERMILLION
## 84 0.0607328271 VIGO
## 85 0.0055664474 WABASH
## 86 -0.0211584038 WARREN
## 87 -0.0186099550 WARRICK
## 88 -0.0263503943 WASHINGTON
## 89 -0.0009102887 WAYNE
## 90 0.0916333530 WELLS
## 91 -0.0102948964 WHITE
## 92 0.1353186218 WHITLEY
IN EDU VS % SSE
#Calculate SSE
pred_inEdu <- predict(inEduMod, inPlotSrc)
#Undo log transformation
pred_inEdu <- exp(pred_inEdu)-1
mean((inPlotSrc$avg - pred_inEdu)^2)
## [1] 1.13942
VAM 1958 vs % 1950
in1950 <- subset(df50,stateAbb == "IN" )
in1930 <- subset(df30,stateAbb == "IN" )
vam_in1957 <- subset(vam, year == 1958 & state == "Indiana")
inVam_mod <- lm(vam_in1957$VAM ~ in1950$percent_farm_tractor)
summary(inVam_mod)
##
## Call:
## lm(formula = vam_in1957$VAM ~ in1950$percent_farm_tractor)
##
## Residuals:
## Min 1Q Median 3Q Max
## -450.65 -242.12 -46.76 201.85 583.08
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 142.0 173.0 0.821 0.415
## in1950$percent_farm_tractor 395.2 269.2 1.468 0.147
##
## Residual standard error: 289 on 69 degrees of freedom
## (21 observations deleted due to missingness)
## Multiple R-squared: 0.03028, Adjusted R-squared: 0.01623
## F-statistic: 2.155 on 1 and 69 DF, p-value: 0.1467
plot(in1950$percent_farm_tractor, vam_in1957$VAM, main = "Indiana - 1957 VAM vs 1950 %")
abline(inVam_mod, col = "red")

VAM 1947 vs % 1930
vam_in1947 <- subset(vam, year == 1947 & state == "Indiana")
vam_in1947 <- vam_in1947[-75,]
inVam_mod1947 <- lm(vam_in1947$VAM ~ in1930$percent_farm_tractor)
summary(inVam_mod1947)
##
## Call:
## lm(formula = vam_in1947$VAM ~ in1930$percent_farm_tractor)
##
## Residuals:
## Min 1Q Median 3Q Max
## -325.84 -212.46 -97.32 134.94 702.92
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 244.49 72.15 3.389 0.00108 **
## in1930$percent_farm_tractor 232.64 303.97 0.765 0.44627
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 269.6 on 82 degrees of freedom
## (8 observations deleted due to missingness)
## Multiple R-squared: 0.007092, Adjusted R-squared: -0.005016
## F-statistic: 0.5857 on 1 and 82 DF, p-value: 0.4463
plot(in1930$percent_farm_tractor, vam_in1947$VAM, main = "Indiana - 1947 VAM vs 1930 %")
abline(inVam_mod1947, col = "red")
Illinois
#Illinois Edu
il_edu <- eduDataFunc("Illinois")
il_edu_med <- eduMedianFunc("Illinois")
#Illinois Tractor
il_tract <- tractDataFunc("IL")
plot(il_tract$percent_farm_tractor, il_edu$avg, xlab = "1930 % Tractor", ylab = "1940 Avg. Edu", main = "Illinois")

plot(il_tract$percent_farm_tractor, il_edu_med$med, xlab = "1930 % Tractor", ylab = "1940 Med. Edu", main = "Illinois")

ilPlotSrc <- merge(il_edu, il_tract, by = "county")
#LOESS REGRESSION on EDU vs % Tractor
ggplot(ilPlotSrc,aes(x = percent_farm_tractor, y = avg)) + geom_point() + geom_smooth() + xlab("1930's % Tractor") + ylab("1940 Avg. Edu") + ggtitle("Illinois")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

#Transform response on Log Scale
ilPlotSrc$logAvg <- log(ilPlotSrc$avg)
plot(ilPlotSrc$percent_farm_tractor, ilPlotSrc$logAvg, main = "Log-Linear")

ilEduMod <- lm(logAvg ~ percent_farm_tractor, data = ilPlotSrc)
summary(ilEduMod)
##
## Call:
## lm(formula = logAvg ~ percent_farm_tractor, data = ilPlotSrc)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.228625 -0.017533 0.009969 0.029582 0.071982
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.22397 0.01067 208.505 < 2e-16 ***
## percent_farm_tractor 0.22051 0.03129 7.047 2.39e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04953 on 100 degrees of freedom
## Multiple R-squared: 0.3318, Adjusted R-squared: 0.3251
## F-statistic: 49.66 on 1 and 100 DF, p-value: 2.385e-10
#Residuals - Mean Model
ilPlotSrc$modResid <- resid(ilEduMod)
ilPlotSrc$fitted <- predict(ilEduMod)
ggplot(ilPlotSrc, aes(x = fitted, y = modResid)) + geom_point() + geom_hline(yintercept = 0, colour = "red") + ggtitle(label = "IL Edu vs Tractor Residuals")

#SCATTER DIAGRAM AND REGRESSION FOR Illinois VAM VS EDU
il_vam <- subset(vam, state == "Illinois" & year == 1947)
rownames(il_vam) <- 1:nrow(il_vam)
il_vam[is.na(il_vam)] <- 0
plot(il_tract$percent_farm_tractor, il_vam$VAM, xlab = "% 1940", ylab = "VAM 1947", ylim = c(0,1000), main = "Illinois VAM vs % Tractor")
abline(lm(il_vam$VAM ~ il_tract$percent_farm_tractor))

#Identify outliers
#outlier <- ilPlotSrc[c(35,54),]
#outlier
#Illinois - Median
ilMedian <- eduMedianFunc("Illinois")
plot(jitter(il_tract$percent_farm_tractor), jitter(ilMedian$med), xlab = "% 1930", ylab ="Median years education", main = "Illinois- Median")

ilMedian$logMed <- log(ilMedian$med)
plot(jitter(il_tract$percent_farm_tractor), jitter(ilMedian$logMed), xlab = "% 1930", ylab ="Median years education", main = "Illinois - LogMedian")

Illinois - Median Model
plotSrc_med_il <- merge(ilMedian, il_tract, by = "county")
ilEduMod_med <- lm(logMed ~ percent_farm_tractor, data = plotSrc_med_il)
summary(ilEduMod_med)
##
## Call:
## lm(formula = logMed ~ percent_farm_tractor, data = plotSrc_med_il)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.249730 -0.045203 0.009157 0.059213 0.177344
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.18273 0.01665 131.079 < 2e-16 ***
## percent_farm_tractor 0.40905 0.04885 8.373 3.56e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.07732 on 100 degrees of freedom
## Multiple R-squared: 0.4121, Adjusted R-squared: 0.4063
## F-statistic: 70.11 on 1 and 100 DF, p-value: 3.56e-13
il_med_df <- as.data.frame(resid(ilEduMod_med))
il_med_df$county <- plotSrc_med_il$name
colnames(il_med_df) <- c( "residual", "county")
Illinois Median Model County Residuals
il_med_df
## residual county
## 1 0.1159811021 ADAMS
## 2 -0.0219127354 ALEXANDER
## 3 0.0624933542 BOND
## 4 -0.0654303756 BOONE
## 5 -0.0574486943 BROWN
## 6 0.0242888950 BUREAU
## 7 -0.1739178024 CALHOUN
## 8 0.0087382195 CARROLL
## 9 0.0886314774 CASS
## 10 0.0685975463 CHAMPAIGN
## 11 0.0418905544 CHRISTIAN
## 12 0.0611637588 CLARK
## 13 0.0786393457 CLAY
## 14 -0.1803182776 CLINTON
## 15 -0.0684271244 COLES
## 16 0.0623256695 COOK
## 17 0.0456704740 CRAWFORD
## 18 -0.0679222565 CUMBERLAND
## 19 0.0003174398 DE KALB
## 20 0.0369388999 DE WITT
## 21 -0.0121108973 DOUGLAS
## 22 0.0011764439 DU PAGE
## 23 -0.0412973841 EDGAR
## 24 0.1528184335 EDWARDS
## 25 0.0628052418 EFFINGHAM
## 26 -0.0263809189 FAYETTE
## 27 -0.0386041590 FORD
## 28 0.0898362723 FRANKLIN
## 29 0.0884042556 FULTON
## 30 -0.1466355434 GALLATIN
## 31 0.0047574633 GREENE
## 32 -0.0950695919 GRUNDY
## 33 -0.1158571443 HAMILTON
## 34 0.1001316688 HANCOCK
## 35 -0.1132479844 HARDIN
## 36 -0.0219548279 HENDERSON
## 37 0.0420705665 HENRY
## 38 -0.0425400636 IROQUOIS
## 39 0.0372132464 JACKSON
## 40 -0.0460045118 JASPER
## 41 -0.0043966604 JEFFERSON
## 42 -0.0803185571 JERSEY
## 43 0.0003591042 JO DAVIESS
## 44 -0.0141440802 JOHNSON
## 45 -0.0135144444 KANE
## 46 -0.0472771443 KANKAKEE
## 47 -0.0259537624 KENDALL
## 48 0.0643324148 KNOX
## 49 0.0383327769 LAKE
## 50 -0.0812555084 LA SALLE
## 51 0.0350188042 LAWRENCE
## 52 -0.0578001067 LEE
## 53 -0.1113619253 LIVINGSTON
## 54 -0.0888060358 LOGAN
## 55 0.0665464003 MCDONOUGH
## 56 0.0518321475 MCHENRY
## 57 0.0181006265 MCLEAN
## 58 0.0459426482 MACON
## 59 0.0181559531 MACOUPIN
## 60 0.1161359591 MADISON
## 61 0.1773444851 MARION
## 62 0.0183762184 MARSHALL
## 63 -0.0209660543 MASON
## 64 -0.1304346238 MASSAC
## 65 -0.0118754277 MENARD
## 66 0.0700801133 MERCER
## 67 -0.2497303150 MONROE
## 68 0.0308361603 MONTGOMERY
## 69 -0.0019366361 MORGAN
## 70 -0.0427986618 MOULTRIE
## 71 0.0526453814 OGLE
## 72 0.0796242229 PEORIA
## 73 -0.0378438141 PERRY
## 74 -0.0068854448 PIATT
## 75 0.0363154089 PIKE
## 76 -0.1222652841 POPE
## 77 -0.0226174880 PULASKI
## 78 -0.0952511555 PUTNAM
## 79 -0.0634426356 RANDOLPH
## 80 0.0682259564 RICHLAND
## 81 0.0668327657 ROCK ISLAND
## 82 0.0123034269 ST CLAIR
## 83 0.0920240633 SALINE
## 84 0.0860022708 SANGAMON
## 85 0.0215368204 SCHUYLER
## 86 -0.0224760426 SCOTT
## 87 0.0249810386 SHELBY
## 88 0.0324385486 STARK
## 89 0.0598519133 STEPHENSON
## 90 0.0399708018 TAZEWELL
## 91 -0.0599806613 UNION
## 92 0.0572966517 VERMILION
## 93 0.1091509767 WABASH
## 94 0.0406126587 WARREN
## 95 -0.1551419151 WASHINGTON
## 96 -0.1299853400 WAYNE
## 97 0.0551008245 WHITE
## 98 -0.0288185389 WHITESIDE
## 99 -0.0702678619 WILL
## 100 0.0871343160 WILLIAMSON
## 101 0.0747176896 WINNEBAGO
## 102 0.0095765416 WOODFORD
IL EDU VS % SSE
#Calculate SSE
pred_ilEdu <- predict(ilEduMod, ilPlotSrc)
#Undo log transformation
pred_ilEdu <- exp(pred_ilEdu)-1
mean((ilPlotSrc$avg - pred_ilEdu)^2)
## [1] 1.239245
VAM 1958 vs % 1950
il1950 <- subset(df50,stateAbb == "IL" )
vam_il1957 <- subset(vam, year == 1958 & state == "Illinois")
ilVam_mod <- lm(vam_il1957$VAM ~ il1950$percent_farm_tractor)
summary(ilVam_mod)
##
## Call:
## lm(formula = vam_il1957$VAM ~ il1950$percent_farm_tractor)
##
## Residuals:
## Min 1Q Median 3Q Max
## -360.14 -182.35 -65.27 114.67 678.66
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -37.6 133.4 -0.282 0.7787
## il1950$percent_farm_tractor 455.0 186.6 2.438 0.0169 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 254.5 on 82 degrees of freedom
## (18 observations deleted due to missingness)
## Multiple R-squared: 0.06759, Adjusted R-squared: 0.05622
## F-statistic: 5.944 on 1 and 82 DF, p-value: 0.01693
plot(il1950$percent_farm_tractor, vam_il1957$VAM, main = "Illinois - 1957 VAM vs 1950 %")
abline(ilVam_mod, col = "red")

VAM 1947 vs % 1930
il1930 <- subset(df30, stateAbb == "IL")
ilVam_mod1947 <- lm(il_vam$VAM ~ il1930$percent_farm_tractor)
summary(ilVam_mod1947)
##
## Call:
## lm(formula = il_vam$VAM ~ il1930$percent_farm_tractor)
##
## Residuals:
## Min 1Q Median 3Q Max
## -360.67 -162.22 -84.38 119.71 713.75
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 90.09 52.08 1.730 0.08673 .
## il1930$percent_farm_tractor 436.14 152.79 2.855 0.00524 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 241.8 on 100 degrees of freedom
## Multiple R-squared: 0.07535, Adjusted R-squared: 0.0661
## F-statistic: 8.149 on 1 and 100 DF, p-value: 0.00524
plot(il1930$percent_farm_tractor, il_vam$VAM, main = "Illinois - 1947 VAM vs 1930 %")
abline(ilVam_mod1947, col = "red")

IOWA
#Iowa Edu
ia_edu <- eduDataFunc("Iowa")
#Iowa Tractor
ia_tract <- tractDataFunc("IA")
plot(ia_tract$percent_farm_tractor, ia_edu$avg, xlab = "1930 % Tractor", ylab = "1940 Avg. Edu", main = "Iowa")

iaPlotSrc <- merge(ia_edu, ia_tract, by = "county")
#LOESS REGRESSION on EDU vs % Tractor
ggplot(iaPlotSrc,aes(x = percent_farm_tractor, y = avg)) + geom_point() + geom_smooth() + xlab("1930's % Tractor") + ylab("1940 Avg. Edu") + ggtitle("Iowa")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

#Transform response on Log Scale
iaPlotSrc$logAvg <- log(iaPlotSrc$avg)
plot(iaPlotSrc$percent_farm_tractor, iaPlotSrc$logAvg, main = "Log-Linear")

iaEduMod <- lm(logAvg ~ percent_farm_tractor, data = iaPlotSrc)
summary(iaEduMod)
##
## Call:
## lm(formula = logAvg ~ percent_farm_tractor, data = iaPlotSrc)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.07421 -0.02169 0.00363 0.02606 0.07078
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.30712 0.01167 197.644 <2e-16 ***
## percent_farm_tractor 0.09378 0.03806 2.464 0.0155 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03359 on 97 degrees of freedom
## Multiple R-squared: 0.05892, Adjusted R-squared: 0.04922
## F-statistic: 6.073 on 1 and 97 DF, p-value: 0.01548
#Residuals
iaPlotSrc$modResid <- resid(iaEduMod)
iaPlotSrc$fitted <- predict(iaEduMod)
ggplot(iaPlotSrc, aes(x = fitted, y = modResid)) + geom_point() + geom_hline(yintercept = 0, colour = "red") + ggtitle(label = "IA Edu vs Tractor Residuals")

#SCATTER DIAGRAM AND REGRESSION FOR Indiana VAM VS EDU
ia_vam <- subset(vam, state == "Iowa" & year == 1947)
rownames(ia_vam) <- 1:nrow(ia_vam)
ia_vam[is.na(ia_vam)] <- 0
plot(ia_tract$percent_farm_tractor, ia_vam$VAM, xlab = "% 1940", ylab = "VAM 1947", ylim = c(0,1000), main = "Iowa VAM vs % Tractor")
abline(lm(ia_vam$VAM ~ ia_tract$percent_farm_tractor))

IA EDU VS % Sum of Squares Error
#Calculate SSE
pred_iaEdu <- predict(iaEduMod, iaPlotSrc)
#Undo log transformation
pred_iaEdu <- exp(pred_iaEdu)-1
mean((iaPlotSrc$avg - pred_iaEdu)^2)
## [1] 1.128052
VAM 1958 vs % 1950
ia1950 <- subset(df50,stateAbb == "IA" )
vam_ia1957 <- subset(vam, year == 1958 & state == "Iowa")
iaVam_mod <- lm(vam_ia1957$VAM ~ ia1950$percent_farm_tractor)
summary(iaVam_mod)
##
## Call:
## lm(formula = vam_ia1957$VAM ~ ia1950$percent_farm_tractor)
##
## Residuals:
## Min 1Q Median 3Q Max
## -186.93 -116.61 -54.65 28.86 770.95
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -132.8 197.6 -0.672 0.503
## ia1950$percent_farm_tractor 384.9 249.0 1.546 0.126
##
## Residual standard error: 190 on 86 degrees of freedom
## (11 observations deleted due to missingness)
## Multiple R-squared: 0.02703, Adjusted R-squared: 0.01572
## F-statistic: 2.389 on 1 and 86 DF, p-value: 0.1258
plot(ia1950$percent_farm_tractor, vam_ia1957$VAM, main = "Iowa - 1957 VAM vs 1950 %")
abline(iaVam_mod, col = "red")

VAM 1947 vs % 1930
ia1930 <- subset(df30, stateAbb == "IA")
iaVam_mod1947 <- lm(ia_vam$VAM ~ ia1930$percent_farm_tractor)
summary(iaVam_mod1947)
##
## Call:
## lm(formula = ia_vam$VAM ~ ia1930$percent_farm_tractor)
##
## Residuals:
## Min 1Q Median 3Q Max
## -132.30 -105.02 -80.41 -4.50 797.25
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 97.02 66.75 1.454 0.149
## ia1930$percent_farm_tractor 98.86 217.61 0.454 0.651
##
## Residual standard error: 192.1 on 97 degrees of freedom
## Multiple R-squared: 0.002123, Adjusted R-squared: -0.008164
## F-statistic: 0.2064 on 1 and 97 DF, p-value: 0.6506
plot(ia1930$percent_farm_tractor, ia_vam$VAM, main = "Iowa - 1947 VAM vs 1930 %")
abline(iaVam_mod1947, col = "red")

#Iowa - Median
iaMedian <- eduMedianFunc("Iowa")
plot(jitter(ia_tract$percent_farm_tractor), jitter(iaMedian$med), xlab = "% 1930", ylab ="Median years education", main = "Iowa - Median")

iaMedian$logMed <- log(iaMedian$med)
plot(jitter(ia_tract$percent_farm_tractor), jitter(iaMedian$logMed), xlab = "% 1930", ylab ="Median years education", main = "Iowa - LogMedian")

Iowa - Median Model
plotSrc_med_ia <- merge(iaMedian, ia_tract, by = "county")
iaEduMod_med <- lm(logMed ~ percent_farm_tractor, data = plotSrc_med_ia)
summary(iaEduMod_med)
##
## Call:
## lm(formula = logMed ~ percent_farm_tractor, data = plotSrc_med_ia)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.17834 -0.02710 0.01407 0.02660 0.12513
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.33110 0.02033 114.680 <2e-16 ***
## percent_farm_tractor 0.15680 0.06627 2.366 0.02 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.05849 on 97 degrees of freedom
## Multiple R-squared: 0.05457, Adjusted R-squared: 0.04482
## F-statistic: 5.599 on 1 and 97 DF, p-value: 0.01996
ia_med_df <- as.data.frame(resid(iaEduMod_med))
ia_med_df$county <- plotSrc_med_ia$name
colnames(ia_med_df) <- c( "residual", "county")
Iowa Median Model County Residuals
ia_med_df
## residual county
## 1 -0.0698771308 ADAIR
## 2 -0.0626695952 ADAMS
## 3 -0.0577524624 ALLAMAKEE
## 4 -0.0425195858 APPANOOSE
## 5 -0.0703218871 AUDUBON
## 6 0.0135565295 BENTON
## 7 0.0168034522 BLACK HAWK
## 8 0.0183828271 BOONE
## 9 0.0166698887 BREMER
## 10 0.0393247618 BUCHANAN
## 11 0.0876183606 BUENA VISTA
## 12 -0.0639751372 BUTLER
## 13 0.0068539027 CALHOUN
## 14 -0.0807633513 CARROLL
## 15 0.0240589330 CASS
## 16 0.0178388902 CEDAR
## 17 0.0205481719 CERRO GORDO
## 18 0.0012142122 CHEROKEE
## 19 -0.0586279740 CHICKASAW
## 20 0.0326358223 CLARKE
## 21 0.0875095949 CLAY
## 22 -0.0779393929 CLAYTON
## 23 0.0130968125 CLINTON
## 24 -0.0896821411 CRAWFORD
## 25 0.0067905428 DALLAS
## 26 -0.0489229649 DAVIS
## 27 0.0487135367 DECATUR
## 28 -0.0626068575 DELAWARE
## 29 0.0167590460 DES MOINES
## 30 0.0142671137 DICKINSON
## 31 -0.0653086956 DUBUQUE
## 32 0.0084627390 EMMET
## 33 0.0379903618 FAYETTE
## 34 0.0193618671 FLOYD
## 35 0.0102304642 FRANKLIN
## 36 0.0313653750 FREMONT
## 37 0.0089974723 GREENE
## 38 0.0144647967 GRUNDY
## 39 0.0279032762 GUTHRIE
## 40 0.0021000562 HAMILTON
## 41 -0.0890211419 HANCOCK
## 42 0.0154828035 HARDIN
## 43 0.0202798136 HARRISON
## 44 0.0247915724 HENRY
## 45 -0.0648661216 HOWARD
## 46 0.0007743202 HUMBOLDT
## 47 0.0078110812 IDA
## 48 0.0084270726 IOWA
## 49 -0.1111447334 JACKSON
## 50 0.0226355695 JASPER
## 51 0.0375409642 JEFFERSON
## 52 0.0178091783 JOHNSON
## 53 -0.0669817399 JONES
## 54 0.0338417038 KEOKUK
## 55 -0.0001827888 KOSSUTH
## 56 0.0275708975 LEE
## 57 0.1251309801 LINN
## 58 0.0104525307 LOUISA
## 59 0.0466248762 LUCAS
## 60 -0.1783359639 LYON/BUNCOMBE
## 61 0.0256228815 MADISON
## 62 -0.0621280035 MAHASKA
## 63 -0.0591260627 MARION
## 64 0.0109899381 MARSHALL
## 65 0.0186983518 MILLS
## 66 0.0217683251 MITCHELL
## 67 0.0002400920 MONONA
## 68 0.0518968927 MONROE
## 69 0.1055478112 MONTGOMERY
## 70 -0.0777806925 MUSCATINE
## 71 0.0042294249 O BRIEN
## 72 -0.0844676618 OSCEOLA
## 73 0.0227912237 PAGE
## 74 0.0126635703 PALO ALTO
## 75 -0.0917707364 PLYMOUTH
## 76 -0.0116855854 POCAHONTAS
## 77 0.1114975678 POLK
## 78 0.0099677272 POTTAWATTAMIE
## 79 0.0967193630 POWESHIEK
## 80 0.0373701459 RINGGOLD
## 81 0.0026975988 SAC
## 82 0.0035293703 SCOTT
## 83 0.0146905446 SHELBY
## 84 -0.1742005144 SIOUX
## 85 0.1005775055 STORY
## 86 0.0161494265 TAMA
## 87 0.0400091263 TAYLOR
## 88 0.0384390773 UNION
## 89 0.0414439818 VAN BUREN
## 90 0.0353212688 WAPELLO
## 91 0.1232005205 WARREN
## 92 0.0191529246 WASHINGTON
## 93 0.0420458200 WAYNE
## 94 -0.0061101406 WEBSTER
## 95 0.0212120364 WINNEBAGO
## 96 -0.1694246938 WINNESHIEK
## 97 0.0140698129 WOODBURY
## 98 0.0144878191 WORTH
## 99 -0.0015285639 WRIGHT
VAM REGRESSION
Regressing VAM on Logistic Params for years 1954 and 1958
params <- read.csv("Tractor_Raw_Data/TractorCoef.csv")
vam <- read.csv("vamFipsData.csv")
params[,1] <- NULL
vam[,1] <- NULL
Merge params DF and vam DF
INFO:
The dataset i’m using matches each county and year to their corresponding logistic params.(Slope, Ceiling, Mid) Any counties with a Negative slope have been removed Any counties with a ceiling > 1 have been removed. Any counties with a midpoint before 1900 or after 1980 have been removed Any counties with VAM as NA have been removed Any counties with VAM as 0 have been removed
#remove high ceilings
df <- df[df$Ceiling <= 1, ]
df <- df[df$Mid > 1940,]
df <- df[df$Mid < 1960,]
df <- df[!is.na(df$VAM),]
#Remove response variables that are 0
df <- df[which(df$VAM > 0),]
write.csv(df, "filtered_df.csv")
The next step necessary for fitting a logistic is to create a binary response
I’ve set the response as a binary value(1 = “Success”, 0 = “Failure”), this value was determined by wether a county had VAM of > 250 or not. This is arbritary and can be played with to see if it changes the result.
I chose 250 b/c the range of VAM is (1,999) however majority of these values are below 500 so it seemed like a good starting point.
df$response <- ifelse(df$VAM >= 250, 1, 0)
Before subsetting the data and get into modeling, I want to remove any N/A’s.
unique(is.na(df))
## fips Slope Ceiling Mid County.x State VAM year response
## 12 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
As seen above, none of the columns contain any N/A values
Below is a glimpse at the current dataset that will be used in the modeling. Next step is to reduce the observations to only rows that correspond to the years 1954 or 1958
head(df)
## fips Slope Ceiling Mid County.x State VAM year response
## 12 1009 0.2020403 0.9049963 1956.463 BLOUNT AL 15 1929 0
## 13 1009 0.2020403 0.9049963 1956.463 BLOUNT AL 140 1958 0
## 14 1009 0.2020403 0.9049963 1956.463 BLOUNT AL 38 1947 0
## 21 1015 0.1737442 0.9187005 1953.665 CALHOUN/BENTON AL 216 1929 0
## 22 1015 0.1737442 0.9187005 1953.665 CALHOUN/BENTON AL 751 1958 1
## 23 1015 0.1737442 0.9187005 1953.665 CALHOUN/BENTON AL 797 1954 1
df.54 <- subset(df, year == 1954)
rownames(df.54) <- 1:nrow(df.54)
df.58 <- subset(df, year == 1958)
rownames(df.58) <- 1:nrow(df.58)
Glimpse into 1954 dataset
head(df.54)
## fips Slope Ceiling Mid County.x State VAM year response
## 1 1015 0.1737442 0.9187005 1953.665 CALHOUN/BENTON AL 797 1954 1
## 2 1043 0.2265686 0.8037223 1954.990 CULLMAN AL 76 1954 0
## 3 1045 0.1738427 0.9419194 1956.008 DALE AL 128 1954 0
## 4 1049 0.1862319 0.8949618 1956.816 DE KALB AL 140 1954 0
## 5 1053 0.1624123 0.9736148 1957.202 ESCAMBIA AL 298 1954 1
## 6 1061 0.2448450 0.8566386 1953.288 GENEVA AL 244 1954 0
Glimpse into 1958 dataset
head(df.58)
## fips Slope Ceiling Mid County.x State VAM year response
## 1 1009 0.2020403 0.9049963 1956.463 BLOUNT AL 140 1958 0
## 2 1015 0.1737442 0.9187005 1953.665 CALHOUN/BENTON AL 751 1958 1
## 3 1019 0.1804967 0.8845261 1949.885 CHEROKEE AL 86 1958 0
## 4 1043 0.2265686 0.8037223 1954.990 CULLMAN AL 109 1958 0
## 5 1045 0.1738427 0.9419194 1956.008 DALE AL 289 1958 1
## 6 1049 0.1862319 0.8949618 1956.816 DE KALB AL 188 1958 0
#1954 Training and Test Set
smp_size <- floor(.7 * nrow(df.54))
set.seed(123)
train_ind <- sample(seq_len(nrow(df.54)), size = smp_size)
df.54.train <- df.54[train_ind,]
df.54.test <- df.54[-train_ind,]
1954 Model w/o Transformations and Plots
Below are some exploratory plots containing the predictor features for year 1954.
The main one that I want to point out is [VAM vs. Slope], it seems counter-intuitive to me that as the slope increase for the county, the VAM is decreasing.
Maybe you have some other insight into this??



## `geom_smooth()` using formula 'y ~ x'

## fips Slope Ceiling Mid County.x State
## "integer" "numeric" "numeric" "numeric" "character" "character"
## VAM year response
## "numeric" "integer" "numeric"
54 Model Summary
Below is the actual modeling for year 1954 on training set
model <- glm(response ~ Slope + Ceiling + Mid, data = df.54.train, family=binomial(link = "logit"))
summary(model)
##
## Call:
## glm(formula = response ~ Slope + Ceiling + Mid, family = binomial(link = "logit"),
## data = df.54.train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.2309 -0.9268 -0.7328 1.2693 2.1178
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 43.62342 35.72351 1.221 0.222
## Slope -8.05847 1.99785 -4.034 5.49e-05 ***
## Ceiling 1.69938 1.28813 1.319 0.187
## Mid -0.02283 0.01835 -1.244 0.213
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 796.35 on 629 degrees of freedom
## Residual deviance: 760.67 on 626 degrees of freedom
## AIC: 768.67
##
## Number of Fisher Scoring iterations: 4
predicted.54 <- plogis(predict(model, df.54.test))
Looking at the model summary above ^, this tells me that as both slope and mid increase independent of one another, VAM will decrease. However, as the Ceiling parameter increases, VAM will increase with. I’m not sure how to interpret this, maybe to be discussed??
## [1] 0.2593

This model is able to predict Success counties 30% of the time, doesn’t seem very good but at the same time we dont have much to work with
#Sensitivity(Truth Detection Rate)
sensitivity(df.54.test$response, predicted.54, threshold = optCutOff)
## [1] 0.1666667
confusionMatrix(df.54.test$response, predicted.54, threshold = optCutOff)
## 0 1
## 0 187 65
## 1 5 13
1954 Model w/ Transformations
Exploratory plots for predictor features log-transformed
trans54.df.train <- df.54.train
trans54.df.test <- df.54.test
trans54.df.train$Ceiling <- log(trans54.df.train$Ceiling)
trans54.df.test$Ceiling <- log(trans54.df.test$Ceiling)
trans54.df.train$Slope <- log(trans54.df.train$Slope)
trans54.df.test$Slope <- log(trans54.df.test$Slope)
#Plots for Transformed Predictors in Year 1954
plot(trans54.df.train$Slope, trans54.df.train$VAM, main = "1954", xlab = "Slope")

plot(trans54.df.train$Ceiling, trans54.df.train$VAM, main = "1954", xlab = "Ceiling")

plot(trans54.df.train$Mid, trans54.df.train$VAM, main = "1954", xlab = "Mid")

Modeling for transformed predicgtors for year 1954
We see below, same general results, VAM decreases with SLope and Mid but increases with Ceiling
model.trans <- glm(response ~ Slope + Ceiling + Mid, data = trans54.df.train, family=binomial(link="logit"))
summary(model.trans)
##
## Call:
## glm(formula = response ~ Slope + Ceiling + Mid, family = binomial(link = "logit"),
## data = trans54.df.train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3516 -0.9132 -0.7254 1.2635 2.0038
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 37.96754 36.14104 1.051 0.293
## Slope -1.45589 0.34681 -4.198 2.69e-05 ***
## Ceiling 1.54351 1.07370 1.438 0.151
## Mid -0.02112 0.01848 -1.142 0.253
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 796.35 on 629 degrees of freedom
## Residual deviance: 760.55 on 626 degrees of freedom
## AIC: 768.55
##
## Number of Fisher Scoring iterations: 4
Evaluating this transformed model’s goodness of fit below
predicted.54.trans <- plogis(predict(model.trans, trans54.df.test))
#Find Optimal Prediction cutoff
optCutOff <- optimalCutoff(trans54.df.test$response, predicted.54.trans)
#Misclassification Error
misClassError(trans54.df.test$response, predicted.54.trans, threshold = optCutOff)
## [1] 0.2704
#ROC Curve
plotROC(trans54.df.test, predicted.54.trans)

#Sensitivity(Truth Detection Rate)
sensitivity(trans54.df.test$response, predicted.54.trans, threshold = optCutOff)
## [1] 0.1025641
The transformed model above is able to predict Success counties 21% of the time so this is worse than the non-transformed model
1958
Model for year 1958 below
#1958 Training and Test Set
smp_size <- floor(.7 * nrow(df.58))
set.seed(123)
train_ind <- sample(seq_len(nrow(df.58)), size = smp_size)
df.58.train <- df.58[train_ind,]
df.58.test <- df.58[-train_ind,]
Exploratory plots for year 1958




## `geom_smooth()` using formula 'y ~ x'

1958 Modeling
model <- glm(response ~ Slope + Ceiling + Mid, data = df.58.train, family="binomial")
summary(model)
##
## Call:
## glm(formula = response ~ Slope + Ceiling + Mid, family = "binomial",
## data = df.58.train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.2986 -1.0471 -0.8424 1.2386 1.9531
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 27.75953 35.17439 0.789 0.43000
## Slope -5.27971 1.69721 -3.111 0.00187 **
## Ceiling 1.53451 1.21264 1.265 0.20572
## Mid -0.01466 0.01807 -0.811 0.41735
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 795.93 on 587 degrees of freedom
## Residual deviance: 774.04 on 584 degrees of freedom
## AIC: 782.04
##
## Number of Fisher Scoring iterations: 4
predicted.58 <- plogis(predict(model, df.58.test))
#Find Optimal Prediction cutoff
optCutOff <- optimalCutoff(df.58.test$response, predicted.58)
#Misclassification Error
misClassError(df.58.test$response, predicted.58, threshold = optCutOff)
## [1] 0.3597
#ROC Curve
plotROC(df.58.test, predicted.58)

#Sensitivity(Truth Detection Rate)
sensitivity(df.58.test$response, predicted.58, threshold = optCutOff)
## [1] 0.3396226
Interestingly enough thoughm the 1958 model is able to predict Success counties 47% of the time, this is by far the best.
MidWestern States training and test
#Oh, MI, MN, IL, IN, WI
midWest <- subset(df, State == "OH" | State == "MI" | State == "MN" | State == "IL" | State == "IN" | State == "WI")
smp_size <- floor(.7 * nrow(midWest))
set.seed(123)
train_ind <- sample(seq_len(nrow(midWest)), size = smp_size)
midWest.train <- midWest[train_ind,]
midWest.test <- midWest[-train_ind,]
Exploring only midwestern states
#Plots for Transformed Predictors in Year 1954
plot(midWest$Slope, midWest$VAM)

plot(midWest$Ceiling, midWest$VAM)

plot(midWest$Mid, midWest$VAM)

#Response on X-Axis
plot(midWest$VAM, midWest$Ceiling)

model_midWest <- glm(response ~ Slope + Ceiling + Mid, data = midWest.train, family=binomial(link = "logit"))
summary(model_midWest)
##
## Call:
## glm(formula = response ~ Slope + Ceiling + Mid, family = binomial(link = "logit"),
## data = midWest.train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.1438 -0.9978 -0.8943 1.3097 1.6475
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 144.32865 58.88672 2.451 0.0142 *
## Slope -1.01061 2.66422 -0.379 0.7044
## Ceiling -0.53062 2.01071 -0.264 0.7919
## Mid -0.07411 0.03032 -2.444 0.0145 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 833.46 on 626 degrees of freedom
## Residual deviance: 824.64 on 623 degrees of freedom
## AIC: 832.64
##
## Number of Fisher Scoring iterations: 4
predicted.midWest <- plogis(predict(model_midWest, midWest.test))
#Find Optimal Prediction cutoff
optCutOff <- optimalCutoff(midWest.test$response, predicted.midWest)
#Misclassification Error
misClassError(midWest.test$response, predicted.midWest, threshold = optCutOff)
## [1] 0.3815
#ROC Curve
plotROC(midWest.test, predicted.midWest)

#Sensitivity(Truth Detection Rate)
sensitivity(midWest.test$response, predicted.midWest, threshold = optCutOff)
## [1] 0.1724138
Midwest model is no good, 12% sucess in prediction, however there aren’t many observations so this was to be expected.
Tractor Diffusion Region
## Parsed with column specification:
## cols(
## X1 = col_double(),
## stateAbb = col_character(),
## percent = col_double(),
## year = col_double()
## )



Visualize diffusion by the 3 regions
————————————————————————————————————————————————————————————————-
Back To Top
Ohio Detailed Analysis
## Parsed with column specification:
## cols(
## X1 = col_double(),
## X = col_double(),
## year = col_double(),
## state = col_double(),
## county = col_double(),
## name = col_character(),
## percent_farm_tractor = col_double(),
## stateAbb = col_character(),
## fips = col_double(),
## colorID = col_double()
## )
## Parsed with column specification:
## cols(
## X1 = col_double(),
## X = col_double(),
## year = col_double(),
## state = col_double(),
## county = col_double(),
## name = col_character(),
## percent_farm_tractor = col_double(),
## stateAbb = col_character(),
## fips = col_double(),
## colorID = col_double()
## )
Exploratory Plots of Average Education by County Vs. % of Farms with Tractors
Ohio 1940 Average Education Vs. Ohio 1930 % of Farms with Tractor

Linear Model 1940 Avg. Edu vs 1930 % Farm w/ Tractor
ohEduModLinear <- lm(avg ~ percent_farm_tractor, data = df_40Edu30Tractor)
plot(df_40Edu30Tractor$percent_farm_tractor, df_40Edu30Tractor$avg, main = "Linear")
abline(ohEduModLinear, col = "red")

summary(ohEduModLinear)
##
## Call:
## lm(formula = avg ~ percent_farm_tractor, data = df_40Edu30Tractor)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.14966 -0.22211 0.06822 0.28365 0.67705
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.7760 0.0882 110.833 < 2e-16 ***
## percent_farm_tractor 2.0083 0.3401 5.906 7.03e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.383 on 85 degrees of freedom
## Multiple R-squared: 0.2909, Adjusted R-squared: 0.2826
## F-statistic: 34.88 on 1 and 85 DF, p-value: 7.025e-08
Log-Linear Model 1940 Avg. Edu vs 1930 % Farm w/ Tractor
df_40Edu30Tractor$logAvg <- log(df_40Edu30Tractor$avg)
ohEduModLogLinear <- lm(logAvg ~ percent_farm_tractor, data = df_40Edu30Tractor)
plot(df_40Edu30Tractor$percent_farm_tractor, df_40Edu30Tractor$logAvg, main = "Log-Linear")
abline(ohEduModLogLinear, col = "red")

summary(ohEduModLogLinear)
##
## Call:
## lm(formula = logAvg ~ percent_farm_tractor, data = df_40Edu30Tractor)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.121302 -0.021393 0.007532 0.027990 0.067698
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.279001 0.008841 257.76 < 2e-16 ***
## percent_farm_tractor 0.200427 0.034087 5.88 7.85e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03839 on 85 degrees of freedom
## Multiple R-squared: 0.2891, Adjusted R-squared: 0.2808
## F-statistic: 34.57 on 1 and 85 DF, p-value: 7.848e-08
Linear-Log Model 1940 Avg. Edu vs 1930 % Farm w/ Tractor
df_40Edu30Tractor$logPercent <- log(df_40Edu30Tractor$percent_farm_tractor)
ohEduModLinearLog <- lm(avg ~ logPercent, data = df_40Edu30Tractor)
plot(df_40Edu30Tractor$logPercent, df_40Edu30Tractor$avg, main = "Linear-Log")
abline(ohEduModLinearLog, col = "red")

summary(ohEduModLinearLog)
##
## Call:
## lm(formula = avg ~ logPercent, data = df_40Edu30Tractor)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.19632 -0.21516 0.05499 0.27234 0.98138
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.80745 0.09983 108.255 < 2e-16 ***
## logPercent 0.33845 0.05417 6.248 1.59e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3766 on 85 degrees of freedom
## Multiple R-squared: 0.3147, Adjusted R-squared: 0.3067
## F-statistic: 39.04 on 1 and 85 DF, p-value: 1.589e-08
Linear Model VAM 1947 vs Percent 1930
vam47Tractor30Mod <- lm(VAM ~ percent_farm_tractor, data = vam1947Tractor30)
plot(vam1947Tractor30$percent_farm_tractor, vam1947Tractor30$VAM, main = "Linear")
abline(vam47Tractor30Mod, col = "red")

summary(vam47Tractor30Mod)
##
## Call:
## lm(formula = VAM ~ percent_farm_tractor, data = vam1947Tractor30)
##
## Residuals:
## Min 1Q Median 3Q Max
## -462.10 -208.32 -45.39 235.30 556.35
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 208.46 62.12 3.356 0.001226 **
## percent_farm_tractor 873.07 241.54 3.615 0.000531 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 267.9 on 78 degrees of freedom
## Multiple R-squared: 0.1435, Adjusted R-squared: 0.1325
## F-statistic: 13.07 on 1 and 78 DF, p-value: 0.000531
Residuals from VAM 47 vs. Percent 30
vam47Residuals <-data.frame(vam47Tractor30Mod$residuals, vam1947Tractor30$county, vam1947Tractor30)
vam47Residuals
## vam47Tractor30Mod.residuals vam1947Tractor30.county fips county avg
## 1 -214.994273 10 39001 10 9.209961
## 2 428.070892 30 39003 30 10.623929
## 3 472.063509 50 39005 50 10.622720
## 4 84.058441 70 39007 70 10.609148
## 5 -169.993062 90 39009 90 10.169675
## 6 61.007663 110 39011 110 10.507224
## 7 -58.614924 130 39013 130 10.050313
## 8 -206.340345 150 39015 150 9.738122
## 9 401.695552 170 39017 170 10.196781
## 10 -8.633830 190 39019 190 10.186483
## 11 -125.520252 210 39021 210 10.054731
## 13 -265.328863 250 39025 250 9.655556
## 14 -316.779148 270 39027 270 10.287815
## 15 266.818266 290 39029 290 10.306936
## 16 435.831153 310 39031 310 10.322308
## 17 384.628013 330 39033 330 10.851064
## 19 -328.143794 370 39037 370 10.221900
## 20 -55.972853 390 39039 390 10.341207
## 21 -148.513357 410 39041 410 10.596730
## 22 306.763656 430 39043 430 10.685484
## 23 82.250417 450 39045 450 10.074779
## 24 -395.688823 470 39047 470 9.944751
## 25 79.681637 490 39049 490 10.758445
## 26 -157.187923 510 39051 510 10.567867
## 27 -214.256282 530 39053 530 9.078111
## 28 -138.113593 550 39055 550 10.072043
## 29 -327.773284 570 39057 570 10.698218
## 30 -36.279397 590 39059 590 10.159635
## 32 -58.745962 630 39063 630 10.913314
## 33 -293.189494 650 39065 650 10.281763
## 34 -99.582105 670 39067 670 9.872080
## 35 -462.100049 690 39069 690 10.158685
## 36 -139.027462 710 39071 710 10.189036
## 37 41.890732 730 39073 730 9.911968
## 38 -241.680920 750 39075 750 9.486194
## 39 -155.190608 770 39077 770 10.674342
## 40 76.527608 790 39079 790 9.540707
## 41 234.802219 810 39081 810 10.093175
## 42 325.634939 830 39083 830 10.511027
## 43 280.482618 850 39085 850 10.686385
## 44 368.636885 870 39087 870 9.411918
## 45 117.350774 890 39089 890 10.361612
## 46 -274.716377 910 39091 910 10.684086
## 47 349.053411 930 39093 930 10.763259
## 48 518.069785 950 39095 950 10.592837
## 49 -370.307338 970 39097 970 9.682366
## 50 522.121661 990 39099 990 10.567978
## 51 203.560118 1010 39101 1010 10.490716
## 52 8.646107 1030 39103 1030 10.699282
## 53 -201.757571 1050 39105 1050 9.727691
## 54 -93.612800 1070 39107 1070 9.997748
## 55 246.516899 1090 39109 1090 10.464348
## 56 -238.855979 1110 39111 1110 9.436264
## 58 -187.791582 1150 39115 1150 10.410557
## 59 -343.259852 1170 39117 1170 10.576296
## 60 320.415170 1190 39119 1190 10.002766
## 61 -178.672979 1210 39121 1210 10.497222
## 62 -62.817490 1230 39123 1230 10.310724
## 63 -456.480284 1250 39125 1250 10.157001
## 64 -45.595543 1270 39127 1270 10.021769
## 65 -283.910975 1310 39131 1310 8.820574
## 66 76.702578 1330 39133 1330 10.780847
## 67 -442.686375 1350 39135 1350 10.296046
## 68 -381.604269 1370 39137 1370 9.923810
## 70 -91.948146 1410 39141 1410 9.166793
## 71 241.547837 1430 39143 1430 10.595465
## 72 169.538758 1450 39145 1450 9.300681
## 73 397.767524 1470 39147 1470 10.475133
## 74 236.782384 1490 39149 1490 10.466939
## 75 556.350539 1510 39151 1510 10.589506
## 78 259.489252 1570 39157 1570 10.274903
## 79 -215.824224 1590 39159 1590 10.505435
## 80 -126.027018 1610 39161 1610 10.594572
## 81 -3.748685 1630 39163 1630 9.421053
## 82 83.648255 1650 39165 1650 10.028030
## 83 77.822668 1670 39167 1670 10.202782
## 84 139.775195 1690 39169 1690 10.387983
## 85 -45.180167 1710 39171 1710 10.532258
## 86 103.929477 1730 39173 1730 10.572193
## 87 -297.484334 1750 39175 1750 10.314286
## year.x state.x name percent_farm_tractor stateAbb colorID.x logAvg
## 1 1930 39 ADAMS 0.05903146 OH 1 2.220286
## 2 1930 39 ALLEN 0.22503726 OH 4 2.363109
## 3 1930 39 ASHLAND 0.19412039 OH 3 2.362995
## 4 1930 39 ASHTABULA 0.22963305 OH 4 2.361717
## 5 1930 39 ATHENS 0.06819313 OH 2 2.319410
## 6 1930 39 AUGLAIZE 0.31674026 OH 5 2.352063
## 7 1930 39 BELMONT 0.07692308 OH 2 2.307604
## 8 1930 39 BROWN 0.08691702 OH 2 2.276048
## 9 1930 39 BUTLER 0.36978534 OH 6 2.322072
## 10 1930 39 CARROLL 0.08954394 OH 2 2.321062
## 11 1930 39 CHAMPAIGN 0.27038425 OH 4 2.308043
## 13 1930 39 CLERMONT 0.13500993 OH 2 2.267533
## 14 1930 39 CLINTON 0.31764146 OH 5 2.330960
## 15 1930 39 COLUMBIANA 0.14056832 OH 3 2.332817
## 16 1930 39 COSHOCTON 0.08442982 OH 2 2.334307
## 17 1930 39 CRAWFORD 0.29655823 OH 5 2.384263
## 19 1930 39 DARKE 0.29056995 OH 5 2.324533
## 20 1930 39 DEFIANCE 0.39689471 OH 6 2.336137
## 21 1930 39 DELAWARE 0.22341568 OH 4 2.360545
## 22 1930 39 ERIE 0.36512580 OH 6 2.368886
## 23 1930 39 FAIRFIELD 0.20536013 OH 4 2.310035
## 24 1930 39 FAYETTE 0.40344168 OH 6 2.297045
## 25 1930 39 FRANKLIN 0.30222372 OH 5 2.375691
## 26 1930 39 FULTON 0.32727273 OH 5 2.357818
## 27 1930 39 GALLIA 0.03069700 OH 1 2.205866
## 28 1930 39 GEAUGA 0.25159236 OH 4 2.309764
## 29 1930 39 GREENE 0.30389016 OH 5 2.370077
## 30 1930 39 GUERNSEY 0.05706727 OH 1 2.318423
## 32 1930 39 HANCOCK 0.38403563 OH 6 2.389984
## 33 1930 39 HARDIN 0.27458694 OH 5 2.330372
## 34 1930 39 HARRISON 0.08375778 OH 2 2.289711
## 35 1930 39 HENRY 0.44056493 OH 7 2.318329
## 36 1930 39 HIGHLAND 0.16788079 OH 3 2.321312
## 37 1930 39 HOCKING 0.06030856 OH 1 2.293743
## 38 1930 39 HOLMES 0.14228546 OH 3 2.249838
## 39 1930 39 HURON 0.35018345 OH 6 2.367843
## 40 1930 39 JACKSON 0.04468912 OH 1 2.255568
## 41 1930 39 JEFFERSON 0.12913288 OH 2 2.311859
## 42 1930 39 KNOX 0.13619677 OH 2 2.352425
## 43 1930 39 LAKE 0.31161473 OH 5 2.368970
## 44 1930 39 LAWRENCE 0.03998243 OH 1 2.241977
## 45 1930 39 LICKING 0.13881301 OH 3 2.338108
## 46 1930 39 LOGAN 0.23510204 OH 4 2.368755
## 47 1930 39 LORAIN 0.37853851 OH 6 2.376138
## 48 1930 39 LUCAS 0.30750605 OH 5 2.360178
## 49 1930 39 MADISON 0.34115409 OH 6 2.270306
## 50 1930 39 MAHONING 0.25361367 OH 4 2.357828
## 51 1930 39 MARION 0.37223123 OH 6 2.350491
## 52 1930 39 MEDINA 0.27935677 OH 5 2.370177
## 53 1930 39 MEIGS 0.04959729 OH 1 2.274977
## 54 1930 39 MERCER 0.30599711 OH 5 2.302360
## 55 1930 39 MIAMI 0.31272085 OH 5 2.347974
## 56 1930 39 MONROE 0.03482003 OH 1 2.244560
## 58 1930 39 MORGAN 0.03703704 OH 1 2.342820
## 59 1930 39 MORROW 0.15440238 OH 3 2.358615
## 60 1930 39 MUSKINGUM 0.08147014 OH 2 2.302862
## 61 1930 39 NOBLE 0.02201122 OH 1 2.351111
## 62 1930 39 OTTAWA 0.43565976 OH 7 2.333185
## 63 1930 39 PAULDING 0.43870968 OH 7 2.318163
## 64 1930 39 PERRY 0.10553544 OH 2 2.304760
## 65 1930 39 PIKE 0.09673367 OH 2 2.177087
## 66 1930 39 PORTAGE 0.26211135 OH 4 2.377771
## 67 1930 39 PREBLE 0.33586132 OH 5 2.331760
## 68 1930 39 PUTNAM 0.29911504 OH 5 2.294937
## 70 1930 39 ROSS 0.21131448 OH 4 2.215588
## 71 1930 39 SANDUSKY 0.36766376 OH 6 2.360426
## 72 1930 39 SCIOTO 0.12027158 OH 2 2.230088
## 73 1930 39 SENECA 0.38230211 OH 6 2.349004
## 74 1930 39 SHELBY 0.37197665 OH 6 2.348222
## 75 1930 39 STARK 0.25793304 OH 4 2.359863
## 78 1930 39 TUSCARAWAS 0.13636364 OH 3 2.329704
## 79 1930 39 UNION 0.24324324 OH 4 2.351893
## 80 1930 39 VAN WERT 0.29730849 OH 5 2.360342
## 81 1930 39 VINTON 0.05302326 OH 1 2.242947
## 82 1930 39 WARREN 0.27248201 OH 5 2.305384
## 83 1930 39 WASHINGTON 0.05809575 OH 1 2.322660
## 84 1930 39 WAYNE 0.24255668 OH 4 2.340650
## 85 1930 39 WILLIAMS 0.29061161 OH 5 2.354443
## 86 1930 39 WOOD 0.47947411 OH 8 2.358227
## 87 1930 39 WYANDOT 0.27950617 OH 5 2.333530
## logPercent X.1 X polyname RUCC County VAM year.y
## 1 -2.8296848 10023 10023 ohio,adams 9 Adams County 45 1947
## 2 -1.4914893 10028 10028 ohio,allen 3 Allen County 833 1947
## 3 -1.6392767 10033 10033 ohio,ashland 6 Ashland County 850 1947
## 4 -1.4712727 10038 10038 ohio,ashtabula 4 Ashtabula County 493 1947
## 5 -2.6854114 10043 10043 ohio,athens 7 Athens County 98 1947
## 6 -1.1496732 10047 10047 ohio,auglaize 6 Auglaize County 546 1947
## 7 -2.5649494 10052 10052 ohio,belmont 2 Belmont County 217 1947
## 8 -2.4428014 10057 10057 ohio,brown 9 Brown County 78 1947
## 9 -0.9948326 10059 10059 ohio,butler 3 Butler County 933 1947
## 10 -2.4130259 10067 10067 ohio,carroll 6 Carroll County 278 1947
## 11 -1.3079112 10073 10073 ohio,champaign 6 Champaign County 319 1947
## 13 -2.0024070 10079 10079 ohio,clermont 8 Clermont County 61 1947
## 14 -1.1468320 10088 10088 ohio,clinton 6 Clinton County 169 1947
## 15 -1.9620617 10092 10092 ohio,columbiana 4 Columbiana County 598 1947
## 16 -2.4718346 10098 10098 ohio,coshocton 7 Coshocton County 718 1947
## 17 -1.2155117 10102 10102 ohio,crawford 7 Crawford County 852 1947
## 19 -1.2359109 10112 10112 ohio,darke 6 Darke County 134 1947
## 20 -0.9240842 10116 10116 ohio,defiance 7 Defiance County 499 1947
## 21 -1.4987212 10121 10121 ohio,delaware 6 Delaware County 255 1947
## 22 -1.0075133 10125 10125 ohio,erie 4 Erie County 834 1947
## 23 -1.5829901 10133 10133 ohio,fairfield 4 Fairfield County 470 1947
## 24 -0.9077233 10136 10136 ohio,fayette 6 Fayette County 165 1947
## 25 -1.1965877 10143 10143 ohio,franklin 2 Franklin County 552 1947
## 26 -1.1169614 10147 10147 ohio,fulton 6 Fulton County 337 1947
## 27 -3.4835903 10151 10151 ohio,gallia 6 Gallia County 21 1947
## 28 -1.3799451 10158 10158 ohio,geauga 4 Geauga County 290 1947
## 29 -1.1910890 10162 10162 ohio,greene 2 Greene County 146 1947
## 30 -2.8635245 10167 10167 ohio,guernsey 6 Guernsey County 222 1947
## 32 -0.9570199 10174 10174 ohio,hancock 4 Hancock County 485 1947
## 33 -1.2924873 10183 10183 ohio,hardin 6 Hardin County 155 1947
## 34 -2.4798262 10185 10185 ohio,harrison 6 Harrison County 182 1947
## 35 -0.8196974 10193 10193 ohio,henry 6 Henry County 131 1947
## 36 -1.7845011 10196 10196 ohio,highland 7 Highland County 216 1947
## 37 -2.8082813 10202 10202 ohio,hocking 7 Hocking County 303 1947
## 38 -1.9499200 10207 10207 ohio,holmes 6 Holmes County 91 1947
## 39 -1.0492981 10213 10213 ohio,huron 6 Huron County 359 1947
## 40 -3.1080252 10217 10217 ohio,jackson 6 Jackson County 324 1947
## 41 -2.0469134 10223 10223 ohio,jefferson 2 Jefferson County 556 1947
## 42 -1.9936546 10228 10228 ohio,knox 7 Knox County 653 1947
## 43 -1.1659877 10233 10233 ohio,lake 1 Lake County 761 1947
## 44 -3.2193153 10238 10238 ohio,lawrence 3 Lawrence County 612 1947
## 45 -1.9746275 10243 10243 ohio,licking 4 Licking County 447 1947
## 46 -1.4477356 10248 10248 ohio,logan 7 Logan County 139 1947
## 47 -0.9714375 10250 10250 ohio,lorain 3 Lorain County 888 1947
## 48 -1.1792605 10258 10258 ohio,lucas 2 Lucas County 995 1947
## 49 -1.0754210 10259 10259 ohio,madison 6 Madison County 136 1947
## 50 -1.3719432 10268 10268 ohio,mahoning 2 Mahoning County 952 1947
## 51 -0.9882400 10273 10273 ohio,marion 5 Marion County 737 1947
## 52 -1.2752656 10278 10278 ohio,medina 6 Medina County 461 1947
## 53 -3.0038191 10280 10280 ohio,meigs 7 Meigs County 50 1947
## 54 -1.1841796 10288 10288 ohio,mercer 7 Mercer County 382 1947
## 55 -1.1624443 10291 10291 ohio,miami 6 Miami County 728 1947
## 56 -3.3575624 10297 10297 ohio,monroe 8 Monroe County 0 1947
## 58 -3.2958369 10305 10305 ohio,morgan 9 Morgan County 53 1947
## 59 -1.8681932 10309 10309 ohio,morrow 9 Morrow County 0 1947
## 60 -2.5075187 10314 10314 ohio,muskingum 5 Muskingum County 600 1947
## 61 -3.8162029 10320 10320 ohio,noble 8 Noble County 49 1947
## 62 -0.8308937 10324 10324 ohio,ottawa 6 Ottawa County 526 1947
## 63 -0.8239174 10329 10329 ohio,paulding 9 Paulding County 135 1947
## 64 -2.2487085 10336 10336 ohio,perry 7 Perry County 255 1947
## 65 -2.3357938 10347 10347 ohio,pike 9 Pike County 9 1947
## 66 -1.3389859 10353 10353 ohio,portage 6 Portage County 514 1947
## 67 -1.0910569 10357 10357 ohio,preble 6 Preble County 59 1947
## 68 -1.2069270 10362 10362 ohio,putnam 6 Putnam County 88 1947
## 70 -1.5544078 10371 10371 ohio,ross 5 Ross County 301 1947
## 71 -1.0005864 10374 10374 ohio,sandusky 7 Sandusky County 771 1947
## 72 -2.1180029 10381 10381 ohio,scioto 4 Scioto County 483 1947
## 73 -0.9615441 10386 10386 ohio,seneca 7 Seneca County 940 1947
## 74 -0.9889242 10391 10391 ohio,shelby 7 Shelby County 770 1947
## 75 -1.3550553 10397 10397 ohio,stark 2 Stark County 990 1947
## 78 -1.9924302 10412 10412 ohio,tuscarawas 6 Tuscarawas County 587 1947
## 79 -1.4136933 10417 10417 ohio,union 6 Union County 205 1947
## 80 -1.2129850 10423 10423 ohio,van wert 6 Van Wert County 342 1947
## 81 -2.9370247 10428 10428 ohio,vinton 9 Vinton County 251 1947
## 82 -1.3001827 10432 10432 ohio,warren 6 Warren County 530 1947
## 83 -2.8456628 10438 10438 ohio,washington 7 Washington County 337 1947
## 84 -1.4165199 10443 10443 ohio,wayne 6 Wayne County 560 1947
## 85 -1.2357676 10448 10448 ohio,williams 7 Williams County 417 1947
## 86 -0.7350654 10451 10451 ohio,wood 6 Wood County 731 1947
## 87 -1.2747309 10457 10457 ohio,wyandot 7 Wyandot County 155 1947
## state.y state.low county.low colorID.y color
## 1 Ohio ohio adams county 1 #FF0000
## 2 Ohio ohio allen county 13 #3F0CF9
## 3 Ohio ohio ashland county 13 #3F0CF9
## 4 Ohio ohio ashtabula county 8 #A61DE0
## 5 Ohio ohio athens county 2 #F20420
## 6 Ohio ohio auglaize county 9 #951DF1
## 7 Ohio ohio belmont county 4 #D90C60
## 8 Ohio ohio brown county 2 #F20420
## 9 Ohio ohio butler county 14 #2A08FB
## 10 Ohio ohio carroll county 5 #CC1180
## 11 Ohio ohio champaign county 5 #CC1180
## 13 Ohio ohio clermont county 1 #FF0000
## 14 Ohio ohio clinton county 3 #E50840
## 15 Ohio ohio columbiana county 9 #951DF1
## 16 Ohio ohio coshocton county 11 #6A15F5
## 17 Ohio ohio crawford county 13 #3F0CF9
## 19 Ohio ohio darke county 2 #F20420
## 20 Ohio ohio defiance county 8 #A61DE0
## 21 Ohio ohio delaware county 4 #D90C60
## 22 Ohio ohio erie county 13 #3F0CF9
## 23 Ohio ohio fairfield county 7 #B319C0
## 24 Ohio ohio fayette county 3 #E50840
## 25 Ohio ohio franklin county 9 #951DF1
## 26 Ohio ohio fulton county 5 #CC1180
## 27 Ohio ohio gallia county 1 #FF0000
## 28 Ohio ohio geauga county 5 #CC1180
## 29 Ohio ohio greene county 3 #E50840
## 30 Ohio ohio guernsey county 4 #D90C60
## 32 Ohio ohio hancock county 8 #A61DE0
## 33 Ohio ohio hardin county 3 #E50840
## 34 Ohio ohio harrison county 3 #E50840
## 35 Ohio ohio henry county 2 #F20420
## 36 Ohio ohio highland county 4 #D90C60
## 37 Ohio ohio hocking county 5 #CC1180
## 38 Ohio ohio holmes county 2 #F20420
## 39 Ohio ohio huron county 6 #BF15A0
## 40 Ohio ohio jackson county 5 #CC1180
## 41 Ohio ohio jefferson county 9 #951DF1
## 42 Ohio ohio knox county 10 #8019F3
## 43 Ohio ohio lake county 12 #5511F7
## 44 Ohio ohio lawrence county 9 #951DF1
## 45 Ohio ohio licking county 7 #B319C0
## 46 Ohio ohio logan county 3 #E50840
## 47 Ohio ohio lorain county 14 #2A08FB
## 48 Ohio ohio lucas county 15 #1504FD
## 49 Ohio ohio madison county 2 #F20420
## 50 Ohio ohio mahoning county 14 #2A08FB
## 51 Ohio ohio marion county 11 #6A15F5
## 52 Ohio ohio medina county 7 #B319C0
## 53 Ohio ohio meigs county 1 #FF0000
## 54 Ohio ohio mercer county 6 #BF15A0
## 55 Ohio ohio miami county 11 #6A15F5
## 56 Ohio ohio monroe county 1 #FF0000
## 58 Ohio ohio morgan county 1 #FF0000
## 59 Ohio ohio morrow county 1 #FF0000
## 60 Ohio ohio muskingum county 9 #951DF1
## 61 Ohio ohio noble county 1 #FF0000
## 62 Ohio ohio ottawa county 8 #A61DE0
## 63 Ohio ohio paulding county 2 #F20420
## 64 Ohio ohio perry county 4 #D90C60
## 65 Ohio ohio pike county 1 #FF0000
## 66 Ohio ohio portage county 8 #A61DE0
## 67 Ohio ohio preble county 1 #FF0000
## 68 Ohio ohio putnam county 2 #F20420
## 70 Ohio ohio ross county 5 #CC1180
## 71 Ohio ohio sandusky county 12 #5511F7
## 72 Ohio ohio scioto county 8 #A61DE0
## 73 Ohio ohio seneca county 14 #2A08FB
## 74 Ohio ohio shelby county 12 #5511F7
## 75 Ohio ohio stark county 15 #1504FD
## 78 Ohio ohio tuscarawas county 9 #951DF1
## 79 Ohio ohio union county 4 #D90C60
## 80 Ohio ohio van wert county 6 #BF15A0
## 81 Ohio ohio vinton county 4 #D90C60
## 82 Ohio ohio warren county 8 #A61DE0
## 83 Ohio ohio washington county 5 #CC1180
## 84 Ohio ohio wayne county 9 #951DF1
## 85 Ohio ohio williams county 7 #B319C0
## 86 Ohio ohio wood county 11 #6A15F5
## 87 Ohio ohio wyandot county 3 #E50840
Linear Model VAM 1947 vs Percent 1930
vam54Tractor30Mod <- lm(VAM ~ percent_farm_tractor, data = vam1954Tractor30)
plot(vam1954Tractor30$percent_farm_tractor, vam1954Tractor30$VAM, main = "Linear")
abline(vam54Tractor30Mod, col = "red")

summary(vam54Tractor30Mod)
##
## Call:
## lm(formula = VAM ~ percent_farm_tractor, data = vam1954Tractor30)
##
## Residuals:
## Min 1Q Median 3Q Max
## -454.38 -169.97 -31.25 200.52 568.19
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 362.38 66.15 5.478 1.01e-06 ***
## percent_farm_tractor 668.33 278.94 2.396 0.0199 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 260.4 on 57 degrees of freedom
## Multiple R-squared: 0.0915, Adjusted R-squared: 0.07556
## F-statistic: 5.741 on 1 and 57 DF, p-value: 0.01988
Residuals from VAM 47 vs. Percent 30
vam54Residuals <-data.frame(vam54Tractor30Mod$residuals, vam1954Tractor30$county, vam1954Tractor30)
vam54Residuals
## vam54Tractor30Mod.residuals vam1954Tractor30.county fips county avg
## 1 -333.830662 10 39001 10 9.209961
## 3 458.885693 50 39005 50 10.622720
## 4 261.151609 70 39007 70 10.609148
## 5 -143.953658 90 39009 90 10.169675
## 6 424.935463 110 39011 110 10.507224
## 7 -145.788121 130 39013 130 10.050313
## 8 -299.467348 150 39015 150 9.738122
## 10 131.777011 190 39019 190 10.186483
## 11 -56.083542 210 39021 210 10.054731
## 13 -393.609159 250 39025 250 9.655556
## 14 -145.666833 270 39027 270 10.287815
## 15 351.676015 290 39029 290 10.306936
## 16 568.194913 310 39031 310 10.322308
## 19 -115.574195 370 39037 370 10.221900
## 21 65.306846 410 39041 410 10.596730
## 23 323.373865 450 39045 450 10.074779
## 24 -293.009476 470 39047 470 9.944751
## 26 -13.103671 510 39051 510 10.567867
## 27 -341.893966 530 39053 530 9.078111
## 28 -56.524400 550 39055 550 10.072043
## 29 -180.476459 570 39057 570 10.698218
## 30 -6.517941 590 39059 590 10.159635
## 32 107.960123 630 39063 630 10.913314
## 33 -140.892311 650 39065 650 10.281763
## 34 -134.355942 670 39067 670 9.872080
## 35 -292.819961 690 39069 690 10.158685
## 36 -188.577663 710 39071 710 10.189036
## 37 22.315820 730 39073 730 9.911968
## 38 -159.471598 750 39075 750 9.486194
## 39 61.584464 770 39077 770 10.674342
## 40 -31.245282 790 39079 790 9.540707
## 42 364.597642 830 39083 830 10.511027
## 46 -35.503469 910 39091 910 10.684086
## 49 -454.380967 970 39097 970 9.682366
## 51 360.849329 1010 39101 1010 10.490716
## 52 188.919881 1030 39103 1030 10.699282
## 53 -291.525545 1050 39105 1050 9.727691
## 54 212.115409 1070 39107 1070 9.997748
## 56 -369.649499 1110 39111 1110 9.436264
## 58 100.868815 1150 39115 1150 10.410557
## 59 89.430331 1170 39117 1170 10.576296
## 61 -158.089019 1210 39121 1210 10.497222
## 62 15.458297 1230 39123 1230 10.310724
## 63 -119.580046 1250 39125 1250 10.157001
## 64 -144.910548 1270 39127 1270 10.021769
## 65 318.971916 1310 39131 1310 8.820574
## 66 353.445466 1330 39133 1330 10.780847
## 67 -437.843665 1350 39135 1350 10.296046
## 68 310.714882 1370 39137 1370 9.923810
## 70 82.394415 1410 39141 1410 9.166793
## 72 111.240883 1450 39145 1450 9.300681
## 78 385.486120 1570 39157 1570 10.274903
## 79 -210.944458 1590 39159 1590 10.505435
## 80 14.922253 1610 39161 1610 10.594572
## 81 -120.815214 1630 39163 1630 9.421053
## 82 -279.485532 1650 39165 1650 10.028030
## 83 408.794698 1670 39167 1670 10.202782
## 85 243.397962 1710 39171 1710 10.532258
## 87 -243.179970 1750 39175 1750 10.314286
## year.x state.x name percent_farm_tractor stateAbb colorID.x logAvg
## 1 1930 39 ADAMS 0.05903146 OH 1 2.220286
## 3 1930 39 ASHLAND 0.19412039 OH 3 2.362995
## 4 1930 39 ASHTABULA 0.22963305 OH 4 2.361717
## 5 1930 39 ATHENS 0.06819313 OH 2 2.319410
## 6 1930 39 AUGLAIZE 0.31674026 OH 5 2.352063
## 7 1930 39 BELMONT 0.07692308 OH 2 2.307604
## 8 1930 39 BROWN 0.08691702 OH 2 2.276048
## 10 1930 39 CARROLL 0.08954394 OH 2 2.321062
## 11 1930 39 CHAMPAIGN 0.27038425 OH 4 2.308043
## 13 1930 39 CLERMONT 0.13500993 OH 2 2.267533
## 14 1930 39 CLINTON 0.31764146 OH 5 2.330960
## 15 1930 39 COLUMBIANA 0.14056832 OH 3 2.332817
## 16 1930 39 COSHOCTON 0.08442982 OH 2 2.334307
## 19 1930 39 DARKE 0.29056995 OH 5 2.324533
## 21 1930 39 DELAWARE 0.22341568 OH 4 2.360545
## 23 1930 39 FAIRFIELD 0.20536013 OH 4 2.310035
## 24 1930 39 FAYETTE 0.40344168 OH 6 2.297045
## 26 1930 39 FULTON 0.32727273 OH 5 2.357818
## 27 1930 39 GALLIA 0.03069700 OH 1 2.205866
## 28 1930 39 GEAUGA 0.25159236 OH 4 2.309764
## 29 1930 39 GREENE 0.30389016 OH 5 2.370077
## 30 1930 39 GUERNSEY 0.05706727 OH 1 2.318423
## 32 1930 39 HANCOCK 0.38403563 OH 6 2.389984
## 33 1930 39 HARDIN 0.27458694 OH 5 2.330372
## 34 1930 39 HARRISON 0.08375778 OH 2 2.289711
## 35 1930 39 HENRY 0.44056493 OH 7 2.318329
## 36 1930 39 HIGHLAND 0.16788079 OH 3 2.321312
## 37 1930 39 HOCKING 0.06030856 OH 1 2.293743
## 38 1930 39 HOLMES 0.14228546 OH 3 2.249838
## 39 1930 39 HURON 0.35018345 OH 6 2.367843
## 40 1930 39 JACKSON 0.04468912 OH 1 2.255568
## 42 1930 39 KNOX 0.13619677 OH 2 2.352425
## 46 1930 39 LOGAN 0.23510204 OH 4 2.368755
## 49 1930 39 MADISON 0.34115409 OH 6 2.270306
## 51 1930 39 MARION 0.37223123 OH 6 2.350491
## 52 1930 39 MEDINA 0.27935677 OH 5 2.370177
## 53 1930 39 MEIGS 0.04959729 OH 1 2.274977
## 54 1930 39 MERCER 0.30599711 OH 5 2.302360
## 56 1930 39 MONROE 0.03482003 OH 1 2.244560
## 58 1930 39 MORGAN 0.03703704 OH 1 2.342820
## 59 1930 39 MORROW 0.15440238 OH 3 2.358615
## 61 1930 39 NOBLE 0.02201122 OH 1 2.351111
## 62 1930 39 OTTAWA 0.43565976 OH 7 2.333185
## 63 1930 39 PAULDING 0.43870968 OH 7 2.318163
## 64 1930 39 PERRY 0.10553544 OH 2 2.304760
## 65 1930 39 PIKE 0.09673367 OH 2 2.177087
## 66 1930 39 PORTAGE 0.26211135 OH 4 2.377771
## 67 1930 39 PREBLE 0.33586132 OH 5 2.331760
## 68 1930 39 PUTNAM 0.29911504 OH 5 2.294937
## 70 1930 39 ROSS 0.21131448 OH 4 2.215588
## 72 1930 39 SCIOTO 0.12027158 OH 2 2.230088
## 78 1930 39 TUSCARAWAS 0.13636364 OH 3 2.329704
## 79 1930 39 UNION 0.24324324 OH 4 2.351893
## 80 1930 39 VAN WERT 0.29730849 OH 5 2.360342
## 81 1930 39 VINTON 0.05302326 OH 1 2.242947
## 82 1930 39 WARREN 0.27248201 OH 5 2.305384
## 83 1930 39 WASHINGTON 0.05809575 OH 1 2.322660
## 85 1930 39 WILLIAMS 0.29061161 OH 5 2.354443
## 87 1930 39 WYANDOT 0.27950617 OH 5 2.333530
## logPercent X.1 X polyname RUCC County VAM year.y
## 1 -2.8296848 10021 10021 ohio,adams 9 Adams County 68 1954
## 3 -1.6392767 10031 10031 ohio,ashland 6 Ashland County 951 1954
## 4 -1.4712727 10036 10036 ohio,ashtabula 4 Ashtabula County 777 1954
## 5 -2.6854114 10042 10042 ohio,athens 7 Athens County 264 1954
## 6 -1.1496732 10044 10044 ohio,auglaize 6 Auglaize County 999 1954
## 7 -2.5649494 10051 10051 ohio,belmont 2 Belmont County 268 1954
## 8 -2.4428014 10058 10058 ohio,brown 9 Brown County 121 1954
## 10 -2.4130259 10066 10066 ohio,carroll 6 Carroll County 554 1954
## 11 -1.3079112 10071 10071 ohio,champaign 6 Champaign County 487 1954
## 13 -2.0024070 10081 10081 ohio,clermont 8 Clermont County 59 1954
## 14 -1.1468320 10087 10087 ohio,clinton 6 Clinton County 429 1954
## 15 -1.9620617 10089 10089 ohio,columbiana 4 Columbiana County 808 1954
## 16 -2.4718346 10095 10095 ohio,coshocton 7 Coshocton County 987 1954
## 19 -1.2359109 10110 10110 ohio,darke 6 Darke County 441 1954
## 21 -1.4987212 10123 10123 ohio,delaware 6 Delaware County 577 1954
## 23 -1.5829901 10131 10131 ohio,fairfield 4 Fairfield County 823 1954
## 24 -0.9077233 10138 10138 ohio,fayette 6 Fayette County 339 1954
## 26 -1.1169614 10144 10144 ohio,fulton 6 Fulton County 568 1954
## 27 -3.4835903 10149 10149 ohio,gallia 6 Gallia County 41 1954
## 28 -1.3799451 10155 10155 ohio,geauga 4 Geauga County 474 1954
## 29 -1.1910890 10159 10159 ohio,greene 2 Greene County 385 1954
## 30 -2.8635245 10165 10165 ohio,guernsey 6 Guernsey County 394 1954
## 32 -0.9570199 10178 10178 ohio,hancock 4 Hancock County 727 1954
## 33 -1.2924873 10181 10181 ohio,hardin 6 Hardin County 405 1954
## 34 -2.4798262 10187 10187 ohio,harrison 6 Harrison County 284 1954
## 35 -0.8196974 10190 10190 ohio,henry 6 Henry County 364 1954
## 36 -1.7845011 10198 10198 ohio,highland 7 Highland County 286 1954
## 37 -2.8082813 10200 10200 ohio,hocking 7 Hocking County 425 1954
## 38 -1.9499200 10205 10205 ohio,holmes 6 Holmes County 298 1954
## 39 -1.0492981 10210 10210 ohio,huron 6 Huron County 658 1954
## 40 -3.1080252 10214 10214 ohio,jackson 6 Jackson County 361 1954
## 42 -1.9936546 10225 10225 ohio,knox 7 Knox County 818 1954
## 46 -1.4477356 10245 10245 ohio,logan 7 Logan County 484 1954
## 49 -1.0754210 10263 10263 ohio,madison 6 Madison County 136 1954
## 51 -0.9882400 10272 10272 ohio,marion 5 Marion County 972 1954
## 52 -1.2752656 10277 10277 ohio,medina 6 Medina County 738 1954
## 53 -3.0038191 10279 10279 ohio,meigs 7 Meigs County 104 1954
## 54 -1.1841796 10287 10287 ohio,mercer 7 Mercer County 779 1954
## 56 -3.3575624 10296 10296 ohio,monroe 8 Monroe County 16 1954
## 58 -3.2958369 10308 10308 ohio,morgan 9 Morgan County 488 1954
## 59 -1.8681932 10312 10312 ohio,morrow 9 Morrow County 555 1954
## 61 -3.8162029 10322 10322 ohio,noble 8 Noble County 219 1954
## 62 -0.8308937 10328 10328 ohio,ottawa 6 Ottawa County 669 1954
## 63 -0.8239174 10333 10333 ohio,paulding 9 Paulding County 536 1954
## 64 -2.2487085 10335 10335 ohio,perry 7 Perry County 288 1954
## 65 -2.3357938 10346 10346 ohio,pike 9 Pike County 746 1954
## 66 -1.3389859 10349 10349 ohio,portage 6 Portage County 891 1954
## 67 -1.0910569 10354 10354 ohio,preble 6 Preble County 149 1954
## 68 -1.2069270 10360 10360 ohio,putnam 6 Putnam County 873 1954
## 70 -1.5544078 10369 10369 ohio,ross 5 Ross County 586 1954
## 72 -2.1180029 10382 10382 ohio,scioto 4 Scioto County 554 1954
## 78 -1.9924302 10411 10411 ohio,tuscarawas 6 Tuscarawas County 839 1954
## 79 -1.4136933 10414 10414 ohio,union 6 Union County 314 1954
## 80 -1.2129850 10419 10419 ohio,van wert 6 Van Wert County 576 1954
## 81 -2.9370247 10426 10426 ohio,vinton 9 Vinton County 277 1954
## 82 -1.3001827 10433 10433 ohio,warren 6 Warren County 265 1954
## 83 -2.8456628 10435 10435 ohio,washington 7 Washington County 810 1954
## 85 -1.2357676 10446 10446 ohio,williams 7 Williams County 800 1954
## 87 -1.2747309 10456 10456 ohio,wyandot 7 Wyandot County 306 1954
## state.y state.low county.low colorID.y color
## 1 Ohio ohio adams county 1 #FF0000
## 3 Ohio ohio ashland county 14 #2A08FB
## 4 Ohio ohio ashtabula county 12 #5511F7
## 5 Ohio ohio athens county 4 #D90C60
## 6 Ohio ohio auglaize county 15 #1504FD
## 7 Ohio ohio belmont county 4 #D90C60
## 8 Ohio ohio brown county 2 #F20420
## 10 Ohio ohio carroll county 9 #951DF1
## 11 Ohio ohio champaign county 8 #A61DE0
## 13 Ohio ohio clermont county 1 #FF0000
## 14 Ohio ohio clinton county 7 #B319C0
## 15 Ohio ohio columbiana county 12 #5511F7
## 16 Ohio ohio coshocton county 15 #1504FD
## 19 Ohio ohio darke county 7 #B319C0
## 21 Ohio ohio delaware county 9 #951DF1
## 23 Ohio ohio fairfield county 13 #3F0CF9
## 24 Ohio ohio fayette county 5 #CC1180
## 26 Ohio ohio fulton county 9 #951DF1
## 27 Ohio ohio gallia county 1 #FF0000
## 28 Ohio ohio geauga county 7 #B319C0
## 29 Ohio ohio greene county 6 #BF15A0
## 30 Ohio ohio guernsey county 6 #BF15A0
## 32 Ohio ohio hancock county 11 #6A15F5
## 33 Ohio ohio hardin county 6 #BF15A0
## 34 Ohio ohio harrison county 5 #CC1180
## 35 Ohio ohio henry county 6 #BF15A0
## 36 Ohio ohio highland county 5 #CC1180
## 37 Ohio ohio hocking county 7 #B319C0
## 38 Ohio ohio holmes county 5 #CC1180
## 39 Ohio ohio huron county 10 #8019F3
## 40 Ohio ohio jackson county 6 #BF15A0
## 42 Ohio ohio knox county 13 #3F0CF9
## 46 Ohio ohio logan county 8 #A61DE0
## 49 Ohio ohio madison county 2 #F20420
## 51 Ohio ohio marion county 15 #1504FD
## 52 Ohio ohio medina county 11 #6A15F5
## 53 Ohio ohio meigs county 2 #F20420
## 54 Ohio ohio mercer county 12 #5511F7
## 56 Ohio ohio monroe county 1 #FF0000
## 58 Ohio ohio morgan county 8 #A61DE0
## 59 Ohio ohio morrow county 9 #951DF1
## 61 Ohio ohio noble county 4 #D90C60
## 62 Ohio ohio ottawa county 10 #8019F3
## 63 Ohio ohio paulding county 8 #A61DE0
## 64 Ohio ohio perry county 5 #CC1180
## 65 Ohio ohio pike county 11 #6A15F5
## 66 Ohio ohio portage county 14 #2A08FB
## 67 Ohio ohio preble county 3 #E50840
## 68 Ohio ohio putnam county 13 #3F0CF9
## 70 Ohio ohio ross county 9 #951DF1
## 72 Ohio ohio scioto county 9 #951DF1
## 78 Ohio ohio tuscarawas county 13 #3F0CF9
## 79 Ohio ohio union county 5 #CC1180
## 80 Ohio ohio van wert county 9 #951DF1
## 81 Ohio ohio vinton county 5 #CC1180
## 82 Ohio ohio warren county 4 #D90C60
## 83 Ohio ohio washington county 12 #5511F7
## 85 Ohio ohio williams county 12 #5511F7
## 87 Ohio ohio wyandot county 5 #CC1180
High Education and VAM(1947, 1954)
##1947
plot(vam1947Tractor30$VAM, vam1947Tractor30$avg, main = "Education vs. VAM (1947)")

##1954
plot(vam1954Tractor30$VAM, vam1954Tractor30$avg, main = "Education vs. VAM (1954)")

Notice the lowest points going across the X-Axis(On Plot with line connecting outlier counties). These counties identified left-to-right are:
- Pike County
- Ross County
- Madison County
- Fayette County
The 2 Counties which are overlaying each other and can be seen to the right of the outlier line are
- Paulding County
- Henry County
The objective was to look at the difference in education from 1930 and 1940 and plot that against the % of Farms with Tractors in 1930. The problem I encountered was that the 1930’s Education dataset has not features which indicate the highest grade completed for an individual.
Taking a closer look at the outlier counties listed above(Pike, Ross, Madison, Fayette) but removing everyone who is currently in school, as they could continue their education and in that case the current represented value would not be accurate.
Highest Grade Completed by Age for the 4 outlier counties
pike <- subset(pike, school == "No, not in school")
ggplot(pike, aes(x = age, y = higrade, group = age)) + geom_boxplot() + xlab("Age Groups") + ylab("Highest Grade Completed") + ggtitle("Pike County")

ross <- subset(ross, school == "No, not in school")
ggplot(ross, aes(x = age, y = higrade, group = age)) + geom_boxplot() + xlab("Age Groups") + ylab("Highest Grade Completed") + ggtitle("Ross County")

madison <- subset(madison, school == "No, not in school")
ggplot(madison, aes(x = age, y = higrade, group = age)) + geom_boxplot() + xlab("Age Groups") + ylab("Highest Grade Completed") + ggtitle("Madison County")

fayette <- subset(fayette, school == "No, not in school")
ggplot(fayette, aes(x = age, y = higrade, group = age)) + geom_boxplot() + xlab("Age Groups") + ylab("Highest Grade Completed") + ggtitle("Fayette County")

Tractor Diffusion Maps
County Level
VAM Maps
















