Tractor

73 minute read

Overview

This research was a part of a greater project headed by Dr. William White and Dr. Richard Steckel of The Ohio State University’s Department of Economics. The core focus of the project was to look at the ways education was affected by early tractorization and rural manufacturing.

The Data

Much of the data came from Michale Haine’s ICPSR Study Number 2896. However quite a bit of cleaning and formatting was required to create sets which allowed for our analysis.

The Focus

This was broken down into several parts, atttempting to answer several different questions.

Each of these questions focused around rural counties based on RUCC codes. Highly urban areas were not of interest as they naturally would not contain much farming and thus tractorization would be minimal to non existent, manufacturing was already present in these areas and in general education was high relative to the rural communities.

Did early adoption of tractors lead to more manufacturing in these communities? 1A. Did the injection of manufacturing in these rural communities have an effect on education?
Was there an increase in average education level in the counties which had early adoption of tractors.

Value Added By Manufacturing

VAM is a metric that describes the how economy of a region, in this case a county, benefitted from the manufacturing industry within that space. This data was sparse but in order to gain best high-level understanind as possible, I created a map with gradient to visualize counties with highest and lowest VAM across our selected time period. Below are comparitive maps, one will all counties availble, the other with urban counties removed.

Tractor/VAM Plots & Models:
Ohio
Michigan
Wisconsin
Indiana
Illinois
Iowa

VAM Regression

Tractor Diffuision by State and Region

Ohio Detailed Analysis

Tractor Diffusion Maps

VAM Maps

Note about VAM Plots: Any counties with N/A value for VAM were replaced with 0. These are so few that I believe it has minimal effect on the trend lines.

#Function for producing EDU dataset
#
#
eduDataFunc <- function(state){
  edu_data <- subset(oth_state, stateicp == state)
  edu_data <- subset(edu_data, sex == "Male")
  edu_data <- subset(edu_data, between(age, 18,22))
  edu_data <- subset(edu_data, higrade < 20)
  edu_data <- subset(edu_data, higrade > 0)
  edu_data$higrade <- edu_data$higrade - 3

  reduced_edu_data <- edu_data %>%
    group_by(county) %>%
    summarise(avg = mean(higrade))

  return(reduced_edu_data)
}
#
#
#

eduMedianFunc <- function(state){
  edu_data <- subset(oth_state, stateicp == state)
  edu_data <- subset(edu_data, sex == "Male")
  edu_data <- subset(edu_data, between(age, 18,22))
  edu_data <- subset(edu_data, higrade < 20)
  edu_data <- subset(edu_data, higrade > 0)
  edu_data$higrade <- edu_data$higrade - 3

  reduced_edu_data <- edu_data %>%
    group_by(county) %>%
    summarise(med = median(higrade))

  return(reduced_edu_data)
}


#Function for producing Tractor set given State Abb
#
#
tractDataFunc <- function(stateAbbre){
  stateDf <- subset(df30, stateAbb == stateAbbre)
  stateDf$county <- stateDf$county * 10
  stateDf$X <- NULL
  return(stateDf)
}
#
#
#

#National Level

Looking at the national level of percent of farms with tractors over varying years. The percentages are measured on the county level.

% Farm Tractor 1940 vs 1930

df30_40 <- merge(df30, df40, by = c("name", "stateAbb"), all = FALSE)
ggplot(data = df30_40, aes(x=percent_farm_tractor.x, y=percent_farm_tractor.y)) + geom_point() + xlab(label = "% of Farms w/ Tractors 1930's") + ylab(label = "% of Farms w/ Tractors 1940's") + ggtitle(label = "National Level")

% Farm Tractor 1950 vs 1940

df40_50 <- merge(df40, df50, by = c("name", "stateAbb"), all = FALSE)
ggplot(data = df40_50, aes(x=percent_farm_tractor.x, y=percent_farm_tractor.y)) + geom_point() + xlab(label = "% of Farms w/ Tractors 1940's") + ylab(label = "% of Farms w/ Tractors 1950's") + ggtitle(label = "National Level")

% Farm Tractor 1954 vs 1950

df50_54 <- merge(df50, df54, by = c("name", "stateAbb"), all = FALSE)
ggplot(data = df50_54, aes(x=percent_farm_tractor.x, y=percent_farm_tractor.y)) + geom_point() + xlab(label = "% of Farms w/ Tractors 1950") + ylab(label = "% of Farms w/ Tractors 1954") + ggtitle(label = "National Level")

df50_54$logResponse <- log(df50_54$percent_farm_tractor.y)
df50_54$logX <- log(df50_54$percent_farm_tractor.x)
ggplot(data = df50_54, aes(x=percent_farm_tractor.x, y=logResponse)) + geom_point() + xlab(label = "% of Farms w/ Tractors 1950") + ylab(label = "Log(% of Farms w/ Tractors 1954)") + ggtitle(label = "National Level")

ggplot(data = df50_54, aes(x=logX, y=logResponse)) + geom_point() + xlab(label = "Log(% of Farms w/ Tractors 1950)") + ylab(label = "Log(% of Farms w/ Tractors 1954)") + ggtitle(label = "National Level")

% Farm Tractor 1964 vs 1954

df54_64 <- merge(df54, df64, by = c("name", "stateAbb"), all = FALSE)
ggplot(data = df54_64, aes(x=percent_farm_tractor.x, y=percent_farm_tractor.y)) + geom_point() + xlab(label = "% of Farms w/ Tractors 1954") + ylab(label = "% of Farms w/ Tractors 1964") + ggtitle(label = "National Level")

OHIO

Mean Edu vs. Tractor
Median Edu vs. Tractor
Value Added by Manufacturing

edu <- read.dta13("ohio1940.dta")

ohio1930 <- subset(df30, stateAbb == "OH")
#Adjust FIPS code representation to match the EDU dataset
ohio1930$county <- ohio1930$county * 10
#Remove Pickaway
ohio1930 <- subset(ohio1930, county != 129)

#SUBSET to White males between 18,22 and higrade < 20 which would be graduate school
edu <- subset(edu, sex == "Male")
edu <- subset(edu, between(age, 18, 22))
edu <- subset(edu, higrade < 20)
edu$higrade <- edu$higrade - 3

#Remove all values less than 1
edu <- edu[which(edu$higrade > 0),]


#THIS IS FINAL 1940 EDU, write out to new dataset
write.csv(edu, "ohio1940edu.csv")


newEdu <- edu %>%
  group_by(county) %>%
  summarise(avg = mean(higrade))

ohPlotSrc <- merge(newEdu, ohio1930, by = "county")

ggplot(data = ohPlotSrc, aes(x=percent_farm_tractor, y=avg)) +xlab(label = "% Farms w/ Tractors 1930") + ylab(label = "Avg Years Education") + ggtitle(label = "Ohio") + geom_point()

LOESS REGRESSION on 1940’s Avg EDU vs % Farms w/ Tractor 1930’s for Ohio

ggplot(ohPlotSrc,aes(x = percent_farm_tractor, y = avg)) + geom_point()  + geom_smooth() + xlab("1930's % Farms w/ Tractors") + ylab("1940's Edu") + ggtitle("Ohio")

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

#identify(ohio1930$percent_farm_tractor, newEdu$avg)

##Transform ohio edu response on Log Scale

Ohio Mean Education Model

Linear Log Interpretation

ohPlotSrc$logAvg <- log(ohPlotSrc$avg)

ohEduMod <- lm(logAvg ~ percent_farm_tractor, data = ohPlotSrc)
ggplot(data = ohPlotSrc, aes(x=percent_farm_tractor, y=logAvg)) + geom_point() + geom_smooth(method = "lm") + xlab(label = "% Farms Tractor 1930's") + ylab(label = "Log(Avg Education 1940's)") + ggtitle(label = "Log-Linear(Ohio Edu vs Tractor)")

## `geom_smooth()` using formula 'y ~ x'

summary(ohEduMod)

##
## Call:
## lm(formula = logAvg ~ percent_farm_tractor, data = ohPlotSrc)
##
## Residuals:
##       Min        1Q    Median        3Q       Max
## -0.121748 -0.021526  0.008973  0.029746  0.066242
##
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)          2.280754   0.009254 246.458  < 2e-16 ***
## percent_farm_tractor 0.186915   0.035457   5.272 9.92e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04027 on 86 degrees of freedom
## Multiple R-squared:  0.2442,	Adjusted R-squared:  0.2354
## F-statistic: 27.79 on 1 and 86 DF,  p-value: 9.92e-07

Ohio - Median Education Plots

eduMedian <- edu %>%
  group_by(county) %>%
  summarise(med = median(higrade))
# Log of Education Median
eduMedian$logMed <- log(eduMedian$med)

ggplot(data = ohio1930, aes(x=jitter(percent_farm_tractor), y=jitter(eduMedian$med))) + geom_point() + xlab(label = "% Farms w/ Tractors 1930's") + ylab(label = "1940's Median Avg. Education")

ggplot(data = ohio1930, aes(x=jitter(percent_farm_tractor), y=jitter(eduMedian$logMed))) + geom_point() + xlab(label = "% Farms w/ Tractors 1930's") + ylab(label = "1940's Median Avg. Education")

Ohio - Median Education Model

ohPlotSrc_med <- merge(eduMedian, ohio1930, by = "county")
ohEduMod_med <- lm(logMed ~ percent_farm_tractor, data = ohPlotSrc_med)
summary(ohEduMod_med)

##
## Call:
## lm(formula = logMed ~ percent_farm_tractor, data = ohPlotSrc_med)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.23988 -0.01976  0.01381  0.03978  0.09668
##
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)           2.29588    0.01401 163.899   <2e-16 ***
## percent_farm_tractor  0.24231    0.05367   4.515    2e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.06095 on 86 degrees of freedom
## Multiple R-squared:  0.1916,	Adjusted R-squared:  0.1822
## F-statistic: 20.38 on 1 and 86 DF,  p-value: 2.002e-05

oh_med_df <- as.data.frame(resid(ohEduMod_med))
oh_med_df$county <- ohPlotSrc_med$name
colnames(oh_med_df) <- c( "residual", "county")

Ohio Median Model County Residuals

head(oh_med_df)

##      residual    county
## 1 -0.11295981     ADAMS
## 2  0.04748636     ALLEN
## 3  0.05497776   ASHLAND
## 4  0.04637276 ASHTABULA
## 5  0.08549094    ATHENS
## 6  0.02526599  AUGLAIZE

OH EDU VS % Farm’s w/ Tractors S.S.E.

#Calculate SSE
pred_ohEdu <- predict(ohEduMod, ohPlotSrc)
#Undo log transformation
pred_ohEdu <- exp(pred_ohEdu)-1

##Sum of Square error
mean((ohPlotSrc$avg - pred_ohEdu)^2)

## [1] 1.174032

#Residuals
ohPlotSrc$modResid <- resid(ohEduMod)
ohPlotSrc$fitted <- predict(ohEduMod)
ggplot(ohPlotSrc, aes(x = fitted, y = modResid)) + geom_point() + geom_hline(yintercept = 0, colour = "red") + ggtitle(label = "OH Edu vs Tractor Residuals")

OHIO Value Added by Manufacturing

VAM 1947 vs % 1940

#SCATTER DIAGRAM AND REGRESSION FOR OHIO VAM VS EDU
#VAM dataset
vam_ohio <- subset(vam, state == "Ohio")
#write.csv(vam_ohio, "vamOhio.csv")

#VAM in 1947 vs % Tractors 1940
vam_ohio <- subset(vam_ohio, year == 1947)
ohio1940 <- subset(df40, stateAbb == "OH")
ohio1930 <- subset(df30, stateAbb == "OH")
ohio1940$county <- ohio1940$county * 10

ggplot(data = ohio1940, aes(x=percent_farm_tractor, y=vam_ohio$VAM)) + geom_point() + xlab(label = "% Farm's w/ Tractors 1940") + ylab(label = "VAM 1947") + ggtitle(label = "Ohio VAM vs. % Tractor") + geom_smooth(method = "lm")

## `geom_smooth()` using formula 'y ~ x'

VAM 1958 vs % 1950

ohio1950 <- subset(df50,stateAbb == "OH" )
vam_ohio1957 <- subset(vam, year == 1958 & state == "Ohio")
ohVam_mod <- lm(vam_ohio1957$VAM ~ ohio1950$percent_farm_tractor)
summary(ohVam_mod)

##
## Call:
## lm(formula = vam_ohio1957$VAM ~ ohio1950$percent_farm_tractor)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -569.63 -189.42  -25.26  194.35  616.99
##
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)
## (Intercept)                      188.7      119.4   1.581  0.11950
## ohio1950$percent_farm_tractor    546.8      188.4   2.902  0.00526 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 273.3 on 57 degrees of freedom
##   (29 observations deleted due to missingness)
## Multiple R-squared:  0.1288,	Adjusted R-squared:  0.1135
## F-statistic: 8.424 on 1 and 57 DF,  p-value: 0.005256

ggplot(data = ohio1950, aes(x=percent_farm_tractor, y=vam_ohio1957$VAM)) + geom_point() + xlab(label = "% Farm's w/ Tractors 1950") + ylab(label = "VAM 1958") + ggtitle(label = "Ohio VAM vs. % Tractor") + geom_smooth(method = "lm")

## `geom_smooth()` using formula 'y ~ x'

VAM 1947 vs % 1930

ohVam_mod1947 <- lm(vam_ohio$VAM ~ ohio1930$percent_farm_tractor)
summary(ohVam_mod1947)

##
## Call:
## lm(formula = vam_ohio$VAM ~ ohio1930$percent_farm_tractor)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -447.78 -210.40  -43.62  235.08  562.43
##
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)
## (Intercept)                     214.00      62.37   3.431 0.000959 ***
## ohio1930$percent_farm_tractor   827.97     240.82   3.438 0.000938 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 269.5 on 79 degrees of freedom
##   (7 observations deleted due to missingness)
## Multiple R-squared:  0.1302,	Adjusted R-squared:  0.1191
## F-statistic: 11.82 on 1 and 79 DF,  p-value: 0.0009376

ggplot(data = ohio1930, aes(x=percent_farm_tractor, y=vam_ohio$VAM)) + geom_point() + xlab(label = "% Farm's w/ Tractors 1930") + ylab(label = "VAM 1947") + ggtitle(label = "Ohio VAM vs. % Tractor") + geom_smooth(method = "lm")

## `geom_smooth()` using formula 'y ~ x'

1940 Census Dataset; Pull other midwest states from here

# NEW DATASET! -> EDU 1940 vs % Tractor 1930
#oth_state <- read.dta13("cen1940A.dta")
#saveRDS(oth_state, "midwest_state.rds")

# Restore the data object. Shit is huge...I was tired of loading it over and over...
oth_state = readRDS(file = "midwest_state.rds")

#States:
# Michigan, Wisconsin, Indiana, Illinois, Iowa, Minnesota

MICHIGAN

## [1] "use `stateicp` as locations\nage between 18,22\nhigrade < 20"

#LOESS REGRESSION on EDU vs % Tractor

ggplot(michPlotSrc,aes(x = percent_farm_tractor, y = avg)) + geom_point()  + geom_smooth() + xlab("1930's % Tractor") + ylab("1940 Avg. Edu") + ggtitle("Michigan")

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

#Transform response on Log Scale

michPlotSrc$logAvg <- log(michPlotSrc$avg)

ggplot(data = michPlotSrc, aes(x=percent_farm_tractor, y=logAvg)) + geom_point() + ggtitle(label = "Log-Linear")

michEduMod <- lm(logAvg ~ percent_farm_tractor, data = michPlotSrc)
summary(michEduMod)

##
## Call:
## lm(formula = logAvg ~ percent_farm_tractor, data = michPlotSrc)
##
## Residuals:
##       Min        1Q    Median        3Q       Max
## -0.219400 -0.036979  0.000761  0.041468  0.100975
##
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)           2.25242    0.01312 171.704   <2e-16 ***
## percent_farm_tractor  0.17868    0.07160   2.496   0.0146 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.05275 on 81 degrees of freedom
## Multiple R-squared:  0.07139,	Adjusted R-squared:  0.05993
## F-statistic: 6.228 on 1 and 81 DF,  p-value: 0.01461

#Residuals
michPlotSrc$modResid <- resid(michEduMod)
michPlotSrc$fitted <- predict(michEduMod)
ggplot(michPlotSrc, aes(x = fitted, y = modResid)) + geom_point() + geom_hline(yintercept = 0, colour = "red") + ggtitle(label = "MI Edu vs Tractor Residuals")

#Michigan - Median

michMedian <- eduMedianFunc("Michigan")

ggplot(data = mich_tract, aes(x=percent_farm_tractor, y=jitter(michMedian$med))) + geom_point() + ggtitle(label = "Michigan-Median")

michMedian$logMed <- log(michMedian$med)

ggplot(data = mich_tract, aes(x=jitter(percent_farm_tractor), y=jitter(michMedian$logMed))) + geom_point() + xlab("% 1930's") + ylab("Log Median years education") + ggtitle("Michigan-Log(Median)")

Michigan - Median Model

plotSrc_med_mich <- merge(michMedian, mich_tract, by = "county")
michEduMod_med <- lm(logMed ~ percent_farm_tractor, data = plotSrc_med_mich)
summary(michEduMod_med)

##
## Call:
## lm(formula = logMed ~ percent_farm_tractor, data = plotSrc_med_mich)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.25098 -0.06656  0.02238  0.05473  0.14252
##
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)           2.23574    0.02284  97.892  < 2e-16 ***
## percent_farm_tractor  0.33943    0.12465   2.723  0.00792 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.09184 on 81 degrees of freedom
## Multiple R-squared:  0.08386,	Adjusted R-squared:  0.07255
## F-statistic: 7.415 on 1 and 81 DF,  p-value: 0.007921

mi_med_df <- as.data.frame(resid(michEduMod_med))
mi_med_df$county <- plotSrc_med_mich$name
colnames(mi_med_df) <- c( "residual", "county")

Michigan Median Model County Residuals

mi_med_df

##        residual            county
## 1  -0.187275178            ALCONA
## 2  -0.085014801             ALGER
## 3   0.008819658           ALLEGAN
## 4  -0.079919458            ALPENA
## 5  -0.077488032            ANTRIM
## 6  -0.200353056            ARENAC
## 7   0.025425850            BARAGA
## 8   0.110906381             BARRY
## 9  -0.012837076               BAY
## 10  0.031026497            BENZIE
## 11  0.107743596           BERRIEN
## 12  0.008707873            BRANCH
## 13  0.111953523           CALHOUN
## 14  0.120925711              CASS
## 15  0.036165627        CHARLEVOIX
## 16 -0.065297665         CHEBOYGAN
## 17  0.020599745          CHIPPEWA
## 18  0.033014005             CLARE
## 19 -0.042063207           CLINTON
## 20  0.014172802          CRAWFORD
## 21  0.012887339             DELTA
## 22  0.101942375         DICKINSON
## 23  0.072106438             EATON
## 24 -0.066056828             EMMET
## 25  0.055198261           GENESEE
## 26 -0.069792934           GLADWIN
## 27  0.142516707           GOGEBIC
## 28  0.011545445    GRAND TRAVERSE
## 29 -0.032579808           GRATIOT
## 30  0.100856541         HILLSDALE
## 31  0.039725469          HOUGHTON
## 32 -0.250981531             HURON
## 33  0.068038995            INGHAM
## 34 -0.128137603             IONIA
## 35  0.027636661             IOSCO
## 36  0.130371319              IRON
## 37 -0.101475239          ISABELLA
## 38  0.105659303           JACKSON
## 39  0.100879266         KALAMAZOO
## 40 -0.075924113          KALKASKA
## 41  0.104813894              KENT
## 42 -0.169611558          KEWEENAW
## 43  0.046514135              LAKE
## 44 -0.216084187            LAPEER
## 45 -0.075546269          LEELANAU
## 46  0.044995306           LENAWEE
## 47  0.076575847        LIVINGSTON
## 48 -0.001805789              LUCE
## 49 -0.093317083 MACKINAC/MICHILIM
## 50  0.061968971            MACOMB
## 51  0.039619066          MANISTEE
## 52  0.126050759         MARQUETTE
## 53  0.116743765             MASON
## 54  0.040682349           MECOSTA
## 55  0.025128222         MENOMINEE
## 56 -0.004886346           MIDLAND
## 57 -0.183214337         MISSAUKEE
## 58 -0.032406605            MONROE
## 59  0.024902446          MONTCALM
## 60  0.022383630       MONTMORENCY
## 61  0.114975307          MUSKEGON
## 62 -0.020210397           NEWAYGO
## 63  0.051362076           OAKLAND
## 64  0.027366523            OCEANA
## 65 -0.069876049            OGEMAW
## 66 -0.080123158         ONTONAGON
## 67  0.048305016           OSCEOLA
## 68 -0.010300206            OSCODA
## 69 -0.067060519            OTSEGO
## 70  0.025111206            OTTAWA
## 71 -0.192049448      PRESQUE ISLE
## 72  0.032899974         ROSCOMMON
## 73 -0.040745153           SAGINAW
## 74 -0.013826630          ST CLAIR
## 75  0.117125263         ST JOSEPH
## 76 -0.103298228           SANILAC
## 77  0.014549838       SCHOOLCRAFT
## 78 -0.043640768        SHIAWASSEE
## 79 -0.134888560           TUSCOLA
## 80  0.119313235         VAN BUREN
## 81  0.054252119         WASHTENAW
## 82  0.045686965             WAYNE
## 83  0.047936516           WEXFORD

#SCATTER DIAGRAM AND REGRESSION FOR MICHIGAN VAM VS EDU

mich_vam <- subset(vam, state == "Michigan" & year == 1947)
ggplot(data=mich_tract, aes(x=percent_farm_tractor, y=mich_vam$VAM)) + geom_point() + geom_smooth(method = "lm") + ggtitle("Michigan VAM 1947 vs % Tractor 1940")

## `geom_smooth()` using formula 'y ~ x'

outlierMich <- michPlotSrc[44,]
outlierMich

##    county      avg year state   name percent_farm_tractor stateAbb  fips
## 44    870 7.881265 1930    26 LAPEER            0.1761294       MI 26087
##    colorID   logAvg   modResid   fitted
## 44       3 2.064488 -0.2194003 2.283889

MI EDU VS % SSE

#Calculate SSE
pred_michEdu <- predict(michEduMod, michPlotSrc)
#Undo log transformation
pred_michEdu <- exp(pred_michEdu)-1

mean((michPlotSrc$avg - pred_michEdu)^2)

## [1] 1.276815

VAM 1958 vs % 1950

mich1930 <- subset(df30,stateAbb == "MI" )
mich1950 <- subset(df50,stateAbb == "MI" )
vam_mich1957 <- subset(vam, year == 1958 & state == "Michigan")
michVam_mod <- lm(vam_mich1957$VAM ~ mich1950$percent_farm_tractor)
summary(michVam_mod)

##
## Call:
## lm(formula = vam_mich1957$VAM ~ mich1950$percent_farm_tractor)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -365.46 -196.66  -25.15  184.44  503.44
##
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)
## (Intercept)                      243.3      202.0   1.204    0.233
## mich1950$percent_farm_tractor    140.9      297.2   0.474    0.637
##
## Residual standard error: 241.4 on 62 degrees of freedom
##   (19 observations deleted due to missingness)
## Multiple R-squared:  0.003611,	Adjusted R-squared:  -0.01246
## F-statistic: 0.2247 on 1 and 62 DF,  p-value: 0.6372

ggplot(data=mich1950, aes(x=percent_farm_tractor, y=vam_mich1957$VAM)) + geom_point() + ggtitle("Michigan - 1957 VAM vs 1950 %")

VAM 1947 vs % 1930

vam_mich1947 <- subset(vam, year == 1947 & state == "Michigan")
michVam_mod1947 <- lm(vam_mich1947$VAM ~ mich1930$percent_farm_tractor)
summary(michVam_mod1947)

##
## Call:
## lm(formula = vam_mich1947$VAM ~ mich1930$percent_farm_tractor)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -427.69 -191.52   -8.99  126.28  552.66
##
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)
## (Intercept)                      71.76      62.43    1.15    0.254
## mich1930$percent_farm_tractor  1542.55     349.03    4.42 3.24e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 243.3 on 76 degrees of freedom
##   (5 observations deleted due to missingness)
## Multiple R-squared:  0.2045,	Adjusted R-squared:  0.194
## F-statistic: 19.53 on 1 and 76 DF,  p-value: 3.24e-05

ggplot(data=mich1930, aes(percent_farm_tractor, vam_mich1947$VAM)) + ggtitle("Michigan - 1947 VAM vs 1930 %")  + geom_point()

WISCONSIN

#Wisconsin Edu
wis_edu <- eduDataFunc("Wisconsin")

#Wisconsin Tractor
wis_tract <- tractDataFunc("WI")

plot(wis_tract$percent_farm_tractor, wis_edu$avg, xlab = "1930 % Tractor", ylab = "1940 Avg. Edu", main = "Wisconsin")

plot(log(wis_tract$percent_farm_tractor), wis_edu$avg)

wisPlotSrc <- merge(wis_edu, wis_tract, by = "county")

#LOESS REGRESSION on EDU vs % Tractor
ggplot(wisPlotSrc,aes(x = percent_farm_tractor, y = avg)) + geom_point()  + geom_smooth() + xlab("1930's % Tractor") + ylab("1940 Avg. Edu") + ggtitle("Wisconsin")

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

#Transform response on Log Scale
wisPlotSrc$logAvg <- log(wisPlotSrc$avg)
wisPlotSrc$logX <- log(wisPlotSrc$percent_farm_tractor)

plot(wisPlotSrc$percent_farm_tractor, wisPlotSrc$logAvg, main = "Log-Linear")

wisEduMod <- lm(logAvg ~ percent_farm_tractor, data = wisPlotSrc)
summary(wisEduMod)

##
## Call:
## lm(formula = logAvg ~ percent_farm_tractor, data = wisPlotSrc)
##
## Residuals:
##       Min        1Q    Median        3Q       Max
## -0.083969 -0.025500 -0.000201  0.020619  0.092817
##
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)          2.265011   0.009967 227.256  < 2e-16 ***
## percent_farm_tractor 0.100321   0.035121   2.856  0.00566 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03962 on 69 degrees of freedom
## Multiple R-squared:  0.1057,	Adjusted R-squared:  0.09279
## F-statistic: 8.159 on 1 and 69 DF,  p-value: 0.005656

#Residuals
wisPlotSrc$modResid <- resid(wisEduMod)
wisPlotSrc$fitted <- predict(wisEduMod)
ggplot(wisPlotSrc, aes(x = fitted, y = modResid)) + geom_point() + geom_hline(yintercept = 0, colour = "red") + ggtitle(label = "WI Edu vs Tractor Residuals")

#SCATTER DIAGRAM AND REGRESSION FOR WISCONSIN VAM VS EDU
wis_vam <- subset(vam, state == "Wisconsin" & year == 1947)
wis_vam[is.na(wis_vam)] <- 0
plot(wis_tract$percent_farm_tractor, wis_vam$VAM, xlab = "% 1940", ylab = "VAM 1947", ylim = c(0,1000), main = "Wisconsin VAM vs % Tractor")
abline(lm(wis_vam$VAM ~ wis_tract$percent_farm_tractor))

WI EDU VS % SSE

#Calculate SSE
pred_wisEdu <- predict(wisEduMod, wisPlotSrc)
#Undo log transformation
pred_wisEdu <- exp(pred_wisEdu)-1

mean((wisPlotSrc$avg - pred_wisEdu)^2)

## [1] 1.164601

VAM 1958 vs % 1950

wis1950 <- subset(df50,stateAbb == "WI" )
wis1930 <- subset(df30,stateAbb == "WI" )
vam_wis1957 <- subset(vam, year == 1958 & state == "Wisconsin")
wisVam_mod <- lm(vam_wis1957$VAM ~ wis1950$percent_farm_tractor)
summary(wisVam_mod)

##
## Call:
## lm(formula = vam_wis1957$VAM ~ wis1950$percent_farm_tractor)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -333.26 -192.11  -27.48  103.04  676.52
##
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)
## (Intercept)                    -2.341    237.072  -0.010    0.992
## wis1950$percent_farm_tractor  497.488    326.784   1.522    0.134
##
## Residual standard error: 251.5 on 54 degrees of freedom
##   (15 observations deleted due to missingness)
## Multiple R-squared:  0.04115,	Adjusted R-squared:  0.0234
## F-statistic: 2.318 on 1 and 54 DF,  p-value: 0.1337

plot(wis1950$percent_farm_tractor, vam_wis1957$VAM, main = "Wisconsin - 1957 VAM vs 1950 %")
abline(wisVam_mod, col = "red")

VAM 1947 vs % 1930

vam_wis1947 <- subset(vam, year == 1947 & state == "Wisconsin")
wisVam_mod1947 <- lm(vam_wis1947$VAM ~ wis1930$percent_farm_tractor)
summary(wisVam_mod1947)

##
## Call:
## lm(formula = vam_wis1947$VAM ~ wis1930$percent_farm_tractor)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -484.24 -152.21  -90.96  135.29  674.42
##
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)
## (Intercept)                     55.46      62.58   0.886    0.379
## wis1930$percent_farm_tractor  1046.06     228.27   4.583  2.1e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 243.9 on 66 degrees of freedom
##   (3 observations deleted due to missingness)
## Multiple R-squared:  0.2414,	Adjusted R-squared:  0.2299
## F-statistic:    21 on 1 and 66 DF,  p-value: 2.098e-05

plot(wis1930$percent_farm_tractor, vam_wis1947$VAM, main = "Wisconsin - 1947 VAM vs 1930 %")
abline(wisVam_mod1947, col = "red")

INDIANA

#Indiana Edu
in_edu <- eduDataFunc("Indiana")

#Indiana Tractor
in_tract <- tractDataFunc("IN")

plot(in_tract$percent_farm_tractor, in_edu$avg, xlab = "1930 % Tractor", ylab = "1940 Avg. Edu", main = "Indiana")

inPlotSrc <- merge(in_edu, in_tract, by = "county")

#LOESS REGRESSION on EDU vs % Tractor
ggplot(inPlotSrc,aes(x = percent_farm_tractor, y = avg)) + geom_point()  + geom_smooth() + xlab("1930's % Tractor") + ylab("1940 Avg. Edu") + ggtitle("Indiana")

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

#Transform response on Log Scale

inPlotSrc$logAvg <- log(inPlotSrc$avg)

plot(inPlotSrc$percent_farm_tractor, inPlotSrc$logAvg, main = "Log-Linear")

inEduMod <- lm(logAvg ~ percent_farm_tractor, data = inPlotSrc)
summary(inEduMod)

##
## Call:
## lm(formula = logAvg ~ percent_farm_tractor, data = inPlotSrc)
##
## Residuals:
##       Min        1Q    Median        3Q       Max
## -0.128838 -0.020547  0.000339  0.027807  0.068493
##
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)          2.271406   0.009384 242.049  < 2e-16 ***
## percent_farm_tractor 0.243673   0.038546   6.322 9.74e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03588 on 90 degrees of freedom
## Multiple R-squared:  0.3075,	Adjusted R-squared:  0.2998
## F-statistic: 39.96 on 1 and 90 DF,  p-value: 9.74e-09

#Residuals
inPlotSrc$modResid <- resid(inEduMod)
inPlotSrc$fitted <- predict(inEduMod)
ggplot(inPlotSrc, aes(x = fitted, y = modResid)) + geom_point() + geom_hline(yintercept = 0, colour = "red") + ggtitle(label = "IN Edu vs Tractor Residuals")

#SCATTER DIAGRAM AND REGRESSION FOR Indiana VAM VS EDU

in_vam <- subset(vam, state == "Indiana" & year == 1947)
rownames(in_vam) <- 1:nrow(in_vam)
in_vam <- in_vam[-75,]
rownames(in_vam) <- 1:nrow(in_vam)
in_vam[is.na(in_vam)] <- 0
plot(in_tract$percent_farm_tractor, in_vam$VAM, xlab = "% 1940", ylab = "VAM 1947", ylim = c(0,1000), main = "Indiana VAM vs % Tractor")
abline(lm(in_vam$VAM ~ in_tract$percent_farm_tractor))

#Indiana - Median

inMedian <- eduMedianFunc("Indiana")

plot(jitter(in_tract$percent_farm_tractor), jitter(inMedian$med), xlab = "% 1930", ylab ="Median years education", main = "Indiana- Median")

inMedian$logMed <- log(inMedian$med)

plot(jitter(in_tract$percent_farm_tractor), jitter(inMedian$logMed), xlab = "% 1930", ylab ="Median years education", main = "Indiana- LogMedian")

Indiana - Median Model

plotSrc_med_in <- merge(inMedian, in_tract, by = "county")
inEduMod_med <- lm(logMed ~ percent_farm_tractor, data = plotSrc_med_in)
summary(inEduMod_med)

##
## Call:
## lm(formula = logMed ~ percent_farm_tractor, data = plotSrc_med_in)
##
## Residuals:
##       Min        1Q    Median        3Q       Max
## -0.217651 -0.021463 -0.000837  0.035208  0.135319
##
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)           2.27671    0.01582   143.9  < 2e-16 ***
## percent_farm_tractor  0.38349    0.06500     5.9  6.3e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0605 on 90 degrees of freedom
## Multiple R-squared:  0.2789,	Adjusted R-squared:  0.2709
## F-statistic: 34.81 on 1 and 90 DF,  p-value: 6.296e-08

in_med_df <- as.data.frame(resid(inEduMod_med))
in_med_df$county <- plotSrc_med_in$name
colnames(in_med_df) <- c( "residual", "county")

Indiana Median Model County Residuals

in_med_df

##         residual      county
## 1  -0.0039011654       ADAMS
## 2  -0.0045533470       ALLEN
## 3   0.0485600175 BARTHOLOMEW
## 4  -0.0556421083      BENTON
## 5   0.0397637243   BLACKFORD
## 6   0.0172311300       BOONE
## 7  -0.0974781526       BROWN
## 8  -0.0349227644     CARROLL
## 9   0.0353692131        CASS
## 10 -0.0304488200       CLARK
## 11  0.0702713112        CLAY
## 12 -0.0050793908     CLINTON
## 13 -0.2176513073    CRAWFORD
## 14 -0.0297271964     DAVIESS
## 15 -0.0268200706    DEARBORN
## 16 -0.0124768965     DECATUR
## 17  0.0184916342     DE KALB
## 18  0.0351542336    DELAWARE
## 19 -0.1312904249      DUBOIS
## 20  0.0608813534     ELKHART
## 21 -0.1196820773     FAYETTE
## 22 -0.0241780504       FLOYD
## 23  0.0115811945    FOUNTAIN
## 24 -0.1688033073    FRANKLIN
## 25  0.0746289970      FULTON
## 26  0.0253089310      GIBSON
## 27 -0.0007629611       GRANT
## 28  0.0866655157      GREENE
## 29 -0.0014830516    HAMILTON
## 30  0.0116994290     HANCOCK
## 31 -0.0128748566    HARRISON
## 32  0.0177421484   HENDRICKS
## 33  0.0178206224       HENRY
## 34 -0.0025762048      HOWARD
## 35  0.1042119985  HUNTINGTON
## 36 -0.0313739026     JACKSON
## 37  0.0069599110      JASPER
## 38  0.0473013013         JAY
## 39 -0.0122367435   JEFFERSON
## 40 -0.0157585697    JENNINGS
## 41 -0.0086250188     JOHNSON
## 42  0.0122629645        KNOX
## 43  0.0662756138   KOSCIUSKO
## 44 -0.0317447280    LAGRANGE
## 45 -0.0050625322        LAKE
## 46  0.0047870347    LA PORTE
## 47 -0.0144954635    LAWRENCE
## 48  0.0113978554     MADISON
## 49  0.0066256475      MARION
## 50  0.0609437234    MARSHALL
## 51 -0.0171457002      MARTIN
## 52  0.0190393905       MIAMI
## 53 -0.0009467869      MONROE
## 54  0.0065750034  MONTGOMERY
## 55  0.0373660608      MORGAN
## 56 -0.0211253141      NEWTON
## 57  0.0415637340       NOBLE
## 58  0.0745590830        OHIO
## 59 -0.0223773748      ORANGE
## 60  0.0909766279        OWEN
## 61  0.0448138997       PARKE
## 62 -0.1093545276       PERRY
## 63 -0.0145292159        PIKE
## 64  0.0201064860      PORTER
## 65 -0.1790683618       POSEY
## 66 -0.0667393709     PULASKI
## 67  0.0619769431      PUTNAM
## 68  0.0232313852    RANDOLPH
## 69 -0.0342631411      RIPLEY
## 70 -0.0356922635        RUSH
## 71  0.0262023096   ST JOSEPH
## 72 -0.0171502153       SCOTT
## 73  0.0008867328      SHELBY
## 74 -0.0079500668     SPENCER
## 75 -0.0655714442      STARKE
## 76  0.1309497809     STEUBEN
## 77  0.0687769074    SULLIVAN
## 78 -0.1054192161 SWITZERLAND
## 79 -0.0023580977  TIPPECANOE
## 80  0.0011144066      TIPTON
## 81 -0.0256990445       UNION
## 82  0.0413615048 VANDERBURGH
## 83  0.0276761811  VERMILLION
## 84  0.0607328271        VIGO
## 85  0.0055664474      WABASH
## 86 -0.0211584038      WARREN
## 87 -0.0186099550     WARRICK
## 88 -0.0263503943  WASHINGTON
## 89 -0.0009102887       WAYNE
## 90  0.0916333530       WELLS
## 91 -0.0102948964       WHITE
## 92  0.1353186218     WHITLEY

IN EDU VS % SSE

#Calculate SSE
pred_inEdu <- predict(inEduMod, inPlotSrc)
#Undo log transformation
pred_inEdu <- exp(pred_inEdu)-1

mean((inPlotSrc$avg - pred_inEdu)^2)

## [1] 1.13942

VAM 1958 vs % 1950

in1950 <- subset(df50,stateAbb == "IN" )
in1930 <- subset(df30,stateAbb == "IN" )
vam_in1957 <- subset(vam, year == 1958 & state == "Indiana")
inVam_mod <- lm(vam_in1957$VAM ~ in1950$percent_farm_tractor)
summary(inVam_mod)

##
## Call:
## lm(formula = vam_in1957$VAM ~ in1950$percent_farm_tractor)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -450.65 -242.12  -46.76  201.85  583.08
##
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)
## (Intercept)                    142.0      173.0   0.821    0.415
## in1950$percent_farm_tractor    395.2      269.2   1.468    0.147
##
## Residual standard error: 289 on 69 degrees of freedom
##   (21 observations deleted due to missingness)
## Multiple R-squared:  0.03028,	Adjusted R-squared:  0.01623
## F-statistic: 2.155 on 1 and 69 DF,  p-value: 0.1467

plot(in1950$percent_farm_tractor, vam_in1957$VAM, main = "Indiana - 1957 VAM vs 1950 %")
abline(inVam_mod, col = "red")

VAM 1947 vs % 1930

vam_in1947 <- subset(vam, year == 1947 & state == "Indiana")
vam_in1947 <- vam_in1947[-75,]

inVam_mod1947 <- lm(vam_in1947$VAM ~ in1930$percent_farm_tractor)
summary(inVam_mod1947)

##
## Call:
## lm(formula = vam_in1947$VAM ~ in1930$percent_farm_tractor)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -325.84 -212.46  -97.32  134.94  702.92
##
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)
## (Intercept)                   244.49      72.15   3.389  0.00108 **
## in1930$percent_farm_tractor   232.64     303.97   0.765  0.44627
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 269.6 on 82 degrees of freedom
##   (8 observations deleted due to missingness)
## Multiple R-squared:  0.007092,	Adjusted R-squared:  -0.005016
## F-statistic: 0.5857 on 1 and 82 DF,  p-value: 0.4463

plot(in1930$percent_farm_tractor, vam_in1947$VAM, main = "Indiana - 1947 VAM vs 1930 %")
abline(inVam_mod1947, col = "red")

Illinois

#Illinois Edu
il_edu <- eduDataFunc("Illinois")

il_edu_med <- eduMedianFunc("Illinois")

#Illinois Tractor
il_tract <- tractDataFunc("IL")

plot(il_tract$percent_farm_tractor, il_edu$avg, xlab = "1930 % Tractor", ylab = "1940 Avg. Edu", main = "Illinois")

plot(il_tract$percent_farm_tractor, il_edu_med$med, xlab = "1930 % Tractor", ylab = "1940 Med. Edu", main = "Illinois")

ilPlotSrc <- merge(il_edu, il_tract, by = "county")

#LOESS REGRESSION on EDU vs % Tractor

ggplot(ilPlotSrc,aes(x = percent_farm_tractor, y = avg)) + geom_point()  + geom_smooth() + xlab("1930's % Tractor") + ylab("1940 Avg. Edu") + ggtitle("Illinois")

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

#Transform response on Log Scale

ilPlotSrc$logAvg <- log(ilPlotSrc$avg)

plot(ilPlotSrc$percent_farm_tractor, ilPlotSrc$logAvg, main = "Log-Linear")

ilEduMod <- lm(logAvg ~ percent_farm_tractor, data = ilPlotSrc)
summary(ilEduMod)

##
## Call:
## lm(formula = logAvg ~ percent_farm_tractor, data = ilPlotSrc)
##
## Residuals:
##       Min        1Q    Median        3Q       Max
## -0.228625 -0.017533  0.009969  0.029582  0.071982
##
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)           2.22397    0.01067 208.505  < 2e-16 ***
## percent_farm_tractor  0.22051    0.03129   7.047 2.39e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04953 on 100 degrees of freedom
## Multiple R-squared:  0.3318,	Adjusted R-squared:  0.3251
## F-statistic: 49.66 on 1 and 100 DF,  p-value: 2.385e-10

#Residuals - Mean Model

ilPlotSrc$modResid <- resid(ilEduMod)
ilPlotSrc$fitted <- predict(ilEduMod)
ggplot(ilPlotSrc, aes(x = fitted, y = modResid)) + geom_point() + geom_hline(yintercept = 0, colour = "red") + ggtitle(label = "IL Edu vs Tractor Residuals")

#SCATTER DIAGRAM AND REGRESSION FOR Illinois VAM VS EDU

il_vam <- subset(vam, state == "Illinois" & year == 1947)
rownames(il_vam) <- 1:nrow(il_vam)
il_vam[is.na(il_vam)] <- 0
plot(il_tract$percent_farm_tractor, il_vam$VAM, xlab = "% 1940", ylab = "VAM 1947", ylim = c(0,1000), main = "Illinois VAM vs % Tractor")
abline(lm(il_vam$VAM ~ il_tract$percent_farm_tractor))

#Identify outliers
 #outlier <- ilPlotSrc[c(35,54),]
 #outlier

#Illinois - Median

ilMedian <- eduMedianFunc("Illinois")

plot(jitter(il_tract$percent_farm_tractor), jitter(ilMedian$med), xlab = "% 1930", ylab ="Median years education", main = "Illinois- Median")

ilMedian$logMed <- log(ilMedian$med)

plot(jitter(il_tract$percent_farm_tractor), jitter(ilMedian$logMed), xlab = "% 1930", ylab ="Median years education", main = "Illinois - LogMedian")

Illinois - Median Model

plotSrc_med_il <- merge(ilMedian, il_tract, by = "county")
ilEduMod_med <- lm(logMed ~ percent_farm_tractor, data = plotSrc_med_il)
summary(ilEduMod_med)

##
## Call:
## lm(formula = logMed ~ percent_farm_tractor, data = plotSrc_med_il)
##
## Residuals:
##       Min        1Q    Median        3Q       Max
## -0.249730 -0.045203  0.009157  0.059213  0.177344
##
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)           2.18273    0.01665 131.079  < 2e-16 ***
## percent_farm_tractor  0.40905    0.04885   8.373 3.56e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.07732 on 100 degrees of freedom
## Multiple R-squared:  0.4121,	Adjusted R-squared:  0.4063
## F-statistic: 70.11 on 1 and 100 DF,  p-value: 3.56e-13

il_med_df <- as.data.frame(resid(ilEduMod_med))
il_med_df$county <- plotSrc_med_il$name
colnames(il_med_df) <- c( "residual", "county")

Illinois Median Model County Residuals

il_med_df

##          residual      county
## 1    0.1159811021       ADAMS
## 2   -0.0219127354   ALEXANDER
## 3    0.0624933542        BOND
## 4   -0.0654303756       BOONE
## 5   -0.0574486943       BROWN
## 6    0.0242888950      BUREAU
## 7   -0.1739178024     CALHOUN
## 8    0.0087382195     CARROLL
## 9    0.0886314774        CASS
## 10   0.0685975463   CHAMPAIGN
## 11   0.0418905544   CHRISTIAN
## 12   0.0611637588       CLARK
## 13   0.0786393457        CLAY
## 14  -0.1803182776     CLINTON
## 15  -0.0684271244       COLES
## 16   0.0623256695        COOK
## 17   0.0456704740    CRAWFORD
## 18  -0.0679222565  CUMBERLAND
## 19   0.0003174398     DE KALB
## 20   0.0369388999     DE WITT
## 21  -0.0121108973     DOUGLAS
## 22   0.0011764439     DU PAGE
## 23  -0.0412973841       EDGAR
## 24   0.1528184335     EDWARDS
## 25   0.0628052418   EFFINGHAM
## 26  -0.0263809189     FAYETTE
## 27  -0.0386041590        FORD
## 28   0.0898362723    FRANKLIN
## 29   0.0884042556      FULTON
## 30  -0.1466355434    GALLATIN
## 31   0.0047574633      GREENE
## 32  -0.0950695919      GRUNDY
## 33  -0.1158571443    HAMILTON
## 34   0.1001316688     HANCOCK
## 35  -0.1132479844      HARDIN
## 36  -0.0219548279   HENDERSON
## 37   0.0420705665       HENRY
## 38  -0.0425400636    IROQUOIS
## 39   0.0372132464     JACKSON
## 40  -0.0460045118      JASPER
## 41  -0.0043966604   JEFFERSON
## 42  -0.0803185571      JERSEY
## 43   0.0003591042  JO DAVIESS
## 44  -0.0141440802     JOHNSON
## 45  -0.0135144444        KANE
## 46  -0.0472771443    KANKAKEE
## 47  -0.0259537624     KENDALL
## 48   0.0643324148        KNOX
## 49   0.0383327769        LAKE
## 50  -0.0812555084    LA SALLE
## 51   0.0350188042    LAWRENCE
## 52  -0.0578001067         LEE
## 53  -0.1113619253  LIVINGSTON
## 54  -0.0888060358       LOGAN
## 55   0.0665464003   MCDONOUGH
## 56   0.0518321475     MCHENRY
## 57   0.0181006265      MCLEAN
## 58   0.0459426482       MACON
## 59   0.0181559531    MACOUPIN
## 60   0.1161359591     MADISON
## 61   0.1773444851      MARION
## 62   0.0183762184    MARSHALL
## 63  -0.0209660543       MASON
## 64  -0.1304346238      MASSAC
## 65  -0.0118754277      MENARD
## 66   0.0700801133      MERCER
## 67  -0.2497303150      MONROE
## 68   0.0308361603  MONTGOMERY
## 69  -0.0019366361      MORGAN
## 70  -0.0427986618    MOULTRIE
## 71   0.0526453814        OGLE
## 72   0.0796242229      PEORIA
## 73  -0.0378438141       PERRY
## 74  -0.0068854448       PIATT
## 75   0.0363154089        PIKE
## 76  -0.1222652841        POPE
## 77  -0.0226174880     PULASKI
## 78  -0.0952511555      PUTNAM
## 79  -0.0634426356    RANDOLPH
## 80   0.0682259564    RICHLAND
## 81   0.0668327657 ROCK ISLAND
## 82   0.0123034269    ST CLAIR
## 83   0.0920240633      SALINE
## 84   0.0860022708    SANGAMON
## 85   0.0215368204    SCHUYLER
## 86  -0.0224760426       SCOTT
## 87   0.0249810386      SHELBY
## 88   0.0324385486       STARK
## 89   0.0598519133  STEPHENSON
## 90   0.0399708018    TAZEWELL
## 91  -0.0599806613       UNION
## 92   0.0572966517   VERMILION
## 93   0.1091509767      WABASH
## 94   0.0406126587      WARREN
## 95  -0.1551419151  WASHINGTON
## 96  -0.1299853400       WAYNE
## 97   0.0551008245       WHITE
## 98  -0.0288185389   WHITESIDE
## 99  -0.0702678619        WILL
## 100  0.0871343160  WILLIAMSON
## 101  0.0747176896   WINNEBAGO
## 102  0.0095765416    WOODFORD

IL EDU VS % SSE

#Calculate SSE
pred_ilEdu <- predict(ilEduMod, ilPlotSrc)
#Undo log transformation
pred_ilEdu <- exp(pred_ilEdu)-1

mean((ilPlotSrc$avg - pred_ilEdu)^2)

## [1] 1.239245

VAM 1958 vs % 1950

il1950 <- subset(df50,stateAbb == "IL" )
vam_il1957 <- subset(vam, year == 1958 & state == "Illinois")
ilVam_mod <- lm(vam_il1957$VAM ~ il1950$percent_farm_tractor)
summary(ilVam_mod)

##
## Call:
## lm(formula = vam_il1957$VAM ~ il1950$percent_farm_tractor)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -360.14 -182.35  -65.27  114.67  678.66
##
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)
## (Intercept)                    -37.6      133.4  -0.282   0.7787
## il1950$percent_farm_tractor    455.0      186.6   2.438   0.0169 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 254.5 on 82 degrees of freedom
##   (18 observations deleted due to missingness)
## Multiple R-squared:  0.06759,	Adjusted R-squared:  0.05622
## F-statistic: 5.944 on 1 and 82 DF,  p-value: 0.01693

plot(il1950$percent_farm_tractor, vam_il1957$VAM, main = "Illinois - 1957 VAM vs 1950 %")
abline(ilVam_mod, col = "red")

VAM 1947 vs % 1930

il1930 <- subset(df30, stateAbb == "IL")
ilVam_mod1947 <- lm(il_vam$VAM ~ il1930$percent_farm_tractor)
summary(ilVam_mod1947)

##
## Call:
## lm(formula = il_vam$VAM ~ il1930$percent_farm_tractor)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -360.67 -162.22  -84.38  119.71  713.75
##
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)
## (Intercept)                    90.09      52.08   1.730  0.08673 .
## il1930$percent_farm_tractor   436.14     152.79   2.855  0.00524 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 241.8 on 100 degrees of freedom
## Multiple R-squared:  0.07535,	Adjusted R-squared:  0.0661
## F-statistic: 8.149 on 1 and 100 DF,  p-value: 0.00524

plot(il1930$percent_farm_tractor, il_vam$VAM, main = "Illinois - 1947 VAM vs 1930 %")
abline(ilVam_mod1947, col = "red")

IOWA

#Iowa Edu
ia_edu <- eduDataFunc("Iowa")

#Iowa Tractor
ia_tract <- tractDataFunc("IA")

plot(ia_tract$percent_farm_tractor, ia_edu$avg, xlab = "1930 % Tractor", ylab = "1940 Avg. Edu", main = "Iowa")

iaPlotSrc <- merge(ia_edu, ia_tract, by = "county")

#LOESS REGRESSION on EDU vs % Tractor

ggplot(iaPlotSrc,aes(x = percent_farm_tractor, y = avg)) + geom_point()  + geom_smooth() + xlab("1930's % Tractor") + ylab("1940 Avg. Edu") + ggtitle("Iowa")

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

#Transform response on Log Scale
iaPlotSrc$logAvg <- log(iaPlotSrc$avg)

plot(iaPlotSrc$percent_farm_tractor, iaPlotSrc$logAvg, main = "Log-Linear")

iaEduMod <- lm(logAvg ~ percent_farm_tractor, data = iaPlotSrc)
summary(iaEduMod)

##
## Call:
## lm(formula = logAvg ~ percent_farm_tractor, data = iaPlotSrc)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.07421 -0.02169  0.00363  0.02606  0.07078
##
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)           2.30712    0.01167 197.644   <2e-16 ***
## percent_farm_tractor  0.09378    0.03806   2.464   0.0155 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03359 on 97 degrees of freedom
## Multiple R-squared:  0.05892,	Adjusted R-squared:  0.04922
## F-statistic: 6.073 on 1 and 97 DF,  p-value: 0.01548

#Residuals

iaPlotSrc$modResid <- resid(iaEduMod)
iaPlotSrc$fitted <- predict(iaEduMod)
ggplot(iaPlotSrc, aes(x = fitted, y = modResid)) + geom_point() + geom_hline(yintercept = 0, colour = "red") + ggtitle(label = "IA Edu vs Tractor Residuals")

#SCATTER DIAGRAM AND REGRESSION FOR Indiana VAM VS EDU

ia_vam <- subset(vam, state == "Iowa" & year == 1947)
rownames(ia_vam) <- 1:nrow(ia_vam)
ia_vam[is.na(ia_vam)] <- 0
plot(ia_tract$percent_farm_tractor, ia_vam$VAM, xlab = "% 1940", ylab = "VAM 1947", ylim = c(0,1000), main = "Iowa VAM vs % Tractor")
abline(lm(ia_vam$VAM ~ ia_tract$percent_farm_tractor))

IA EDU VS % Sum of Squares Error

#Calculate SSE
pred_iaEdu <- predict(iaEduMod, iaPlotSrc)
#Undo log transformation
pred_iaEdu <- exp(pred_iaEdu)-1

mean((iaPlotSrc$avg - pred_iaEdu)^2)

## [1] 1.128052

VAM 1958 vs % 1950

ia1950 <- subset(df50,stateAbb == "IA" )
vam_ia1957 <- subset(vam, year == 1958 & state == "Iowa")
iaVam_mod <- lm(vam_ia1957$VAM ~ ia1950$percent_farm_tractor)
summary(iaVam_mod)

##
## Call:
## lm(formula = vam_ia1957$VAM ~ ia1950$percent_farm_tractor)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -186.93 -116.61  -54.65   28.86  770.95
##
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)
## (Intercept)                   -132.8      197.6  -0.672    0.503
## ia1950$percent_farm_tractor    384.9      249.0   1.546    0.126
##
## Residual standard error: 190 on 86 degrees of freedom
##   (11 observations deleted due to missingness)
## Multiple R-squared:  0.02703,	Adjusted R-squared:  0.01572
## F-statistic: 2.389 on 1 and 86 DF,  p-value: 0.1258

plot(ia1950$percent_farm_tractor, vam_ia1957$VAM, main = "Iowa - 1957 VAM vs 1950 %")
abline(iaVam_mod, col = "red")

VAM 1947 vs % 1930

ia1930 <- subset(df30, stateAbb == "IA")
iaVam_mod1947 <- lm(ia_vam$VAM ~ ia1930$percent_farm_tractor)
summary(iaVam_mod1947)

##
## Call:
## lm(formula = ia_vam$VAM ~ ia1930$percent_farm_tractor)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -132.30 -105.02  -80.41   -4.50  797.25
##
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)
## (Intercept)                    97.02      66.75   1.454    0.149
## ia1930$percent_farm_tractor    98.86     217.61   0.454    0.651
##
## Residual standard error: 192.1 on 97 degrees of freedom
## Multiple R-squared:  0.002123,	Adjusted R-squared:  -0.008164
## F-statistic: 0.2064 on 1 and 97 DF,  p-value: 0.6506

plot(ia1930$percent_farm_tractor, ia_vam$VAM, main = "Iowa - 1947 VAM vs 1930 %")
abline(iaVam_mod1947, col = "red")

#Iowa - Median

iaMedian <- eduMedianFunc("Iowa")

plot(jitter(ia_tract$percent_farm_tractor), jitter(iaMedian$med), xlab = "% 1930", ylab ="Median years education", main = "Iowa - Median")

iaMedian$logMed <- log(iaMedian$med)

plot(jitter(ia_tract$percent_farm_tractor), jitter(iaMedian$logMed), xlab = "% 1930", ylab ="Median years education", main = "Iowa - LogMedian")

Iowa - Median Model

plotSrc_med_ia <- merge(iaMedian, ia_tract, by = "county")
iaEduMod_med <- lm(logMed ~ percent_farm_tractor, data = plotSrc_med_ia)
summary(iaEduMod_med)

##
## Call:
## lm(formula = logMed ~ percent_farm_tractor, data = plotSrc_med_ia)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.17834 -0.02710  0.01407  0.02660  0.12513
##
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)           2.33110    0.02033 114.680   <2e-16 ***
## percent_farm_tractor  0.15680    0.06627   2.366     0.02 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.05849 on 97 degrees of freedom
## Multiple R-squared:  0.05457,	Adjusted R-squared:  0.04482
## F-statistic: 5.599 on 1 and 97 DF,  p-value: 0.01996

ia_med_df <- as.data.frame(resid(iaEduMod_med))
ia_med_df$county <- plotSrc_med_ia$name
colnames(ia_med_df) <- c( "residual", "county")

Iowa Median Model County Residuals

ia_med_df

##         residual        county
## 1  -0.0698771308         ADAIR
## 2  -0.0626695952         ADAMS
## 3  -0.0577524624     ALLAMAKEE
## 4  -0.0425195858     APPANOOSE
## 5  -0.0703218871       AUDUBON
## 6   0.0135565295        BENTON
## 7   0.0168034522    BLACK HAWK
## 8   0.0183828271         BOONE
## 9   0.0166698887        BREMER
## 10  0.0393247618      BUCHANAN
## 11  0.0876183606   BUENA VISTA
## 12 -0.0639751372        BUTLER
## 13  0.0068539027       CALHOUN
## 14 -0.0807633513       CARROLL
## 15  0.0240589330          CASS
## 16  0.0178388902         CEDAR
## 17  0.0205481719   CERRO GORDO
## 18  0.0012142122      CHEROKEE
## 19 -0.0586279740     CHICKASAW
## 20  0.0326358223        CLARKE
## 21  0.0875095949          CLAY
## 22 -0.0779393929       CLAYTON
## 23  0.0130968125       CLINTON
## 24 -0.0896821411      CRAWFORD
## 25  0.0067905428        DALLAS
## 26 -0.0489229649         DAVIS
## 27  0.0487135367       DECATUR
## 28 -0.0626068575      DELAWARE
## 29  0.0167590460    DES MOINES
## 30  0.0142671137     DICKINSON
## 31 -0.0653086956       DUBUQUE
## 32  0.0084627390         EMMET
## 33  0.0379903618       FAYETTE
## 34  0.0193618671         FLOYD
## 35  0.0102304642      FRANKLIN
## 36  0.0313653750       FREMONT
## 37  0.0089974723        GREENE
## 38  0.0144647967        GRUNDY
## 39  0.0279032762       GUTHRIE
## 40  0.0021000562      HAMILTON
## 41 -0.0890211419       HANCOCK
## 42  0.0154828035        HARDIN
## 43  0.0202798136      HARRISON
## 44  0.0247915724         HENRY
## 45 -0.0648661216        HOWARD
## 46  0.0007743202      HUMBOLDT
## 47  0.0078110812           IDA
## 48  0.0084270726          IOWA
## 49 -0.1111447334       JACKSON
## 50  0.0226355695        JASPER
## 51  0.0375409642     JEFFERSON
## 52  0.0178091783       JOHNSON
## 53 -0.0669817399         JONES
## 54  0.0338417038        KEOKUK
## 55 -0.0001827888       KOSSUTH
## 56  0.0275708975           LEE
## 57  0.1251309801          LINN
## 58  0.0104525307        LOUISA
## 59  0.0466248762         LUCAS
## 60 -0.1783359639 LYON/BUNCOMBE
## 61  0.0256228815       MADISON
## 62 -0.0621280035       MAHASKA
## 63 -0.0591260627        MARION
## 64  0.0109899381      MARSHALL
## 65  0.0186983518         MILLS
## 66  0.0217683251      MITCHELL
## 67  0.0002400920        MONONA
## 68  0.0518968927        MONROE
## 69  0.1055478112    MONTGOMERY
## 70 -0.0777806925     MUSCATINE
## 71  0.0042294249       O BRIEN
## 72 -0.0844676618       OSCEOLA
## 73  0.0227912237          PAGE
## 74  0.0126635703     PALO ALTO
## 75 -0.0917707364      PLYMOUTH
## 76 -0.0116855854    POCAHONTAS
## 77  0.1114975678          POLK
## 78  0.0099677272 POTTAWATTAMIE
## 79  0.0967193630     POWESHIEK
## 80  0.0373701459      RINGGOLD
## 81  0.0026975988           SAC
## 82  0.0035293703         SCOTT
## 83  0.0146905446        SHELBY
## 84 -0.1742005144         SIOUX
## 85  0.1005775055         STORY
## 86  0.0161494265          TAMA
## 87  0.0400091263        TAYLOR
## 88  0.0384390773         UNION
## 89  0.0414439818     VAN BUREN
## 90  0.0353212688       WAPELLO
## 91  0.1232005205        WARREN
## 92  0.0191529246    WASHINGTON
## 93  0.0420458200         WAYNE
## 94 -0.0061101406       WEBSTER
## 95  0.0212120364     WINNEBAGO
## 96 -0.1694246938    WINNESHIEK
## 97  0.0140698129      WOODBURY
## 98  0.0144878191         WORTH
## 99 -0.0015285639        WRIGHT

VAM REGRESSION

Regressing VAM on Logistic Params for years 1954 and 1958

params <- read.csv("Tractor_Raw_Data/TractorCoef.csv")
vam <- read.csv("vamFipsData.csv")
params[,1] <- NULL
vam[,1] <- NULL

Merge params DF and vam DF

INFO:

The dataset i’m using matches each county and year to their corresponding logistic params.(Slope, Ceiling, Mid) Any counties with a Negative slope have been removed Any counties with a ceiling > 1 have been removed. Any counties with a midpoint before 1900 or after 1980 have been removed Any counties with VAM as NA have been removed Any counties with VAM as 0 have been removed

#remove high ceilings
df <- df[df$Ceiling <= 1, ]
df <- df[df$Mid > 1940,]
df <- df[df$Mid < 1960,]
df <- df[!is.na(df$VAM),]
#Remove response variables that are 0
df <- df[which(df$VAM > 0),]
write.csv(df, "filtered_df.csv")

The next step necessary for fitting a logistic is to create a binary response

I’ve set the response as a binary value(1 = “Success”, 0 = “Failure”), this value was determined by wether a county had VAM of > 250 or not. This is arbritary and can be played with to see if it changes the result. I chose 250 b/c the range of VAM is (1,999) however majority of these values are below 500 so it seemed like a good starting point.

df$response <- ifelse(df$VAM >= 250, 1, 0)

Before subsetting the data and get into modeling, I want to remove any N/A’s.

unique(is.na(df))

##     fips Slope Ceiling   Mid County.x State   VAM  year response
## 12 FALSE FALSE   FALSE FALSE    FALSE FALSE FALSE FALSE    FALSE

As seen above, none of the columns contain any N/A values

Below is a glimpse at the current dataset that will be used in the modeling. Next step is to reduce the observations to only rows that correspond to the years 1954 or 1958

head(df)

##    fips     Slope   Ceiling      Mid       County.x State VAM year response
## 12 1009 0.2020403 0.9049963 1956.463         BLOUNT    AL  15 1929        0
## 13 1009 0.2020403 0.9049963 1956.463         BLOUNT    AL 140 1958        0
## 14 1009 0.2020403 0.9049963 1956.463         BLOUNT    AL  38 1947        0
## 21 1015 0.1737442 0.9187005 1953.665 CALHOUN/BENTON    AL 216 1929        0
## 22 1015 0.1737442 0.9187005 1953.665 CALHOUN/BENTON    AL 751 1958        1
## 23 1015 0.1737442 0.9187005 1953.665 CALHOUN/BENTON    AL 797 1954        1

df.54 <- subset(df, year == 1954)
rownames(df.54) <- 1:nrow(df.54)
df.58 <- subset(df, year == 1958)
rownames(df.58) <- 1:nrow(df.58)

Glimpse into 1954 dataset

head(df.54)

##   fips     Slope   Ceiling      Mid       County.x State VAM year response
## 1 1015 0.1737442 0.9187005 1953.665 CALHOUN/BENTON    AL 797 1954        1
## 2 1043 0.2265686 0.8037223 1954.990        CULLMAN    AL  76 1954        0
## 3 1045 0.1738427 0.9419194 1956.008           DALE    AL 128 1954        0
## 4 1049 0.1862319 0.8949618 1956.816        DE KALB    AL 140 1954        0
## 5 1053 0.1624123 0.9736148 1957.202       ESCAMBIA    AL 298 1954        1
## 6 1061 0.2448450 0.8566386 1953.288         GENEVA    AL 244 1954        0

Glimpse into 1958 dataset

head(df.58)

##   fips     Slope   Ceiling      Mid       County.x State VAM year response
## 1 1009 0.2020403 0.9049963 1956.463         BLOUNT    AL 140 1958        0
## 2 1015 0.1737442 0.9187005 1953.665 CALHOUN/BENTON    AL 751 1958        1
## 3 1019 0.1804967 0.8845261 1949.885       CHEROKEE    AL  86 1958        0
## 4 1043 0.2265686 0.8037223 1954.990        CULLMAN    AL 109 1958        0
## 5 1045 0.1738427 0.9419194 1956.008           DALE    AL 289 1958        1
## 6 1049 0.1862319 0.8949618 1956.816        DE KALB    AL 188 1958        0

#1954 Training and Test Set

smp_size <- floor(.7 * nrow(df.54))
set.seed(123)
train_ind <- sample(seq_len(nrow(df.54)), size = smp_size)

df.54.train <- df.54[train_ind,]

df.54.test <- df.54[-train_ind,]

1954 Model w/o Transformations and Plots

Below are some exploratory plots containing the predictor features for year 1954.

The main one that I want to point out is [VAM vs. Slope], it seems counter-intuitive to me that as the slope increase for the county, the VAM is decreasing.

Maybe you have some other insight into this??

## `geom_smooth()` using formula 'y ~ x'

##        fips       Slope     Ceiling         Mid    County.x       State
##   "integer"   "numeric"   "numeric"   "numeric" "character" "character"
##         VAM        year    response
##   "numeric"   "integer"   "numeric"

54 Model Summary

Below is the actual modeling for year 1954 on training set

model <- glm(response ~ Slope + Ceiling + Mid,  data = df.54.train, family=binomial(link = "logit"))
summary(model)

##
## Call:
## glm(formula = response ~ Slope + Ceiling + Mid, family = binomial(link = "logit"),
##     data = df.54.train)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -1.2309  -0.9268  -0.7328   1.2693   2.1178
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept) 43.62342   35.72351   1.221    0.222
## Slope       -8.05847    1.99785  -4.034 5.49e-05 ***
## Ceiling      1.69938    1.28813   1.319    0.187
## Mid         -0.02283    0.01835  -1.244    0.213
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 796.35  on 629  degrees of freedom
## Residual deviance: 760.67  on 626  degrees of freedom
## AIC: 768.67
##
## Number of Fisher Scoring iterations: 4

predicted.54 <- plogis(predict(model, df.54.test))

Looking at the model summary above ^, this tells me that as both slope and mid increase independent of one another, VAM will decrease. However, as the Ceiling parameter increases, VAM will increase with. I’m not sure how to interpret this, maybe to be discussed??

## [1] 0.2593

This model is able to predict Success counties 30% of the time, doesn’t seem very good but at the same time we dont have much to work with

#Sensitivity(Truth Detection Rate)
sensitivity(df.54.test$response, predicted.54, threshold = optCutOff)

## [1] 0.1666667

confusionMatrix(df.54.test$response, predicted.54, threshold = optCutOff)

##     0  1
## 0 187 65
## 1   5 13

1954 Model w/ Transformations

Exploratory plots for predictor features log-transformed

trans54.df.train <- df.54.train
trans54.df.test <- df.54.test
trans54.df.train$Ceiling <- log(trans54.df.train$Ceiling)
trans54.df.test$Ceiling <- log(trans54.df.test$Ceiling)
trans54.df.train$Slope <- log(trans54.df.train$Slope)
trans54.df.test$Slope <- log(trans54.df.test$Slope)


#Plots for Transformed Predictors in Year 1954
plot(trans54.df.train$Slope, trans54.df.train$VAM, main = "1954", xlab = "Slope")

plot(trans54.df.train$Ceiling, trans54.df.train$VAM, main = "1954", xlab = "Ceiling")

plot(trans54.df.train$Mid, trans54.df.train$VAM, main = "1954", xlab = "Mid")

Modeling for transformed predicgtors for year 1954

We see below, same general results, VAM decreases with SLope and Mid but increases with Ceiling

model.trans <- glm(response ~ Slope + Ceiling + Mid,  data = trans54.df.train, family=binomial(link="logit"))
summary(model.trans)

##
## Call:
## glm(formula = response ~ Slope + Ceiling + Mid, family = binomial(link = "logit"),
##     data = trans54.df.train)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -1.3516  -0.9132  -0.7254   1.2635   2.0038
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept) 37.96754   36.14104   1.051    0.293
## Slope       -1.45589    0.34681  -4.198 2.69e-05 ***
## Ceiling      1.54351    1.07370   1.438    0.151
## Mid         -0.02112    0.01848  -1.142    0.253
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 796.35  on 629  degrees of freedom
## Residual deviance: 760.55  on 626  degrees of freedom
## AIC: 768.55
##
## Number of Fisher Scoring iterations: 4

Evaluating this transformed model’s goodness of fit below

predicted.54.trans <- plogis(predict(model.trans, trans54.df.test))

#Find Optimal Prediction cutoff
optCutOff <- optimalCutoff(trans54.df.test$response, predicted.54.trans)

#Misclassification Error
misClassError(trans54.df.test$response, predicted.54.trans, threshold = optCutOff)

## [1] 0.2704

#ROC Curve
plotROC(trans54.df.test, predicted.54.trans)

#Sensitivity(Truth Detection Rate)
sensitivity(trans54.df.test$response, predicted.54.trans, threshold = optCutOff)

## [1] 0.1025641

The transformed model above is able to predict Success counties 21% of the time so this is worse than the non-transformed model

1958

Model for year 1958 below

#1958 Training and Test Set

smp_size <- floor(.7 * nrow(df.58))
set.seed(123)
train_ind <- sample(seq_len(nrow(df.58)), size = smp_size)

df.58.train <- df.58[train_ind,]

df.58.test <- df.58[-train_ind,]

Exploratory plots for year 1958

## `geom_smooth()` using formula 'y ~ x'

1958 Modeling

model <- glm(response ~ Slope + Ceiling + Mid,  data = df.58.train, family="binomial")
summary(model)

##
## Call:
## glm(formula = response ~ Slope + Ceiling + Mid, family = "binomial",
##     data = df.58.train)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -1.2986  -1.0471  -0.8424   1.2386   1.9531
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept) 27.75953   35.17439   0.789  0.43000
## Slope       -5.27971    1.69721  -3.111  0.00187 **
## Ceiling      1.53451    1.21264   1.265  0.20572
## Mid         -0.01466    0.01807  -0.811  0.41735
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 795.93  on 587  degrees of freedom
## Residual deviance: 774.04  on 584  degrees of freedom
## AIC: 782.04
##
## Number of Fisher Scoring iterations: 4

predicted.58 <- plogis(predict(model, df.58.test))

#Find Optimal Prediction cutoff
optCutOff <- optimalCutoff(df.58.test$response, predicted.58)

#Misclassification Error
misClassError(df.58.test$response, predicted.58, threshold = optCutOff)

## [1] 0.3597

#ROC Curve
plotROC(df.58.test, predicted.58)

#Sensitivity(Truth Detection Rate)
sensitivity(df.58.test$response, predicted.58, threshold = optCutOff)

## [1] 0.3396226

Interestingly enough thoughm the 1958 model is able to predict Success counties 47% of the time, this is by far the best.

MidWestern States training and test

#Oh, MI, MN, IL, IN, WI
midWest <- subset(df, State == "OH" | State == "MI" | State == "MN" | State == "IL" | State == "IN" | State == "WI")
smp_size <- floor(.7 * nrow(midWest))
set.seed(123)
train_ind <- sample(seq_len(nrow(midWest)), size = smp_size)

midWest.train <- midWest[train_ind,]

midWest.test <- midWest[-train_ind,]

Exploring only midwestern states

#Plots for Transformed Predictors in Year 1954
plot(midWest$Slope, midWest$VAM)

plot(midWest$Ceiling, midWest$VAM)

plot(midWest$Mid, midWest$VAM)

#Response on X-Axis
plot(midWest$VAM, midWest$Ceiling)

model_midWest <- glm(response ~ Slope + Ceiling + Mid,  data = midWest.train, family=binomial(link = "logit"))
summary(model_midWest)

##
## Call:
## glm(formula = response ~ Slope + Ceiling + Mid, family = binomial(link = "logit"),
##     data = midWest.train)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -1.1438  -0.9978  -0.8943   1.3097   1.6475
##
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept) 144.32865   58.88672   2.451   0.0142 *
## Slope        -1.01061    2.66422  -0.379   0.7044
## Ceiling      -0.53062    2.01071  -0.264   0.7919
## Mid          -0.07411    0.03032  -2.444   0.0145 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 833.46  on 626  degrees of freedom
## Residual deviance: 824.64  on 623  degrees of freedom
## AIC: 832.64
##
## Number of Fisher Scoring iterations: 4

predicted.midWest <- plogis(predict(model_midWest, midWest.test))

#Find Optimal Prediction cutoff
optCutOff <- optimalCutoff(midWest.test$response, predicted.midWest)

#Misclassification Error
misClassError(midWest.test$response, predicted.midWest, threshold = optCutOff)

## [1] 0.3815

#ROC Curve
plotROC(midWest.test, predicted.midWest)

#Sensitivity(Truth Detection Rate)
sensitivity(midWest.test$response, predicted.midWest, threshold = optCutOff)

## [1] 0.1724138

Midwest model is no good, 12% sucess in prediction, however there aren’t many observations so this was to be expected.

Tractor Diffusion Region

## Parsed with column specification:
## cols(
##   X1 = col_double(),
##   stateAbb = col_character(),
##   percent = col_double(),
##   year = col_double()
## )

Visualize diffusion by the 3 regions

————————————————————————————————————————————————————————————————- Back To Top

Ohio Detailed Analysis

## Parsed with column specification:
## cols(
##   X1 = col_double(),
##   X = col_double(),
##   year = col_double(),
##   state = col_double(),
##   county = col_double(),
##   name = col_character(),
##   percent_farm_tractor = col_double(),
##   stateAbb = col_character(),
##   fips = col_double(),
##   colorID = col_double()
## )
## Parsed with column specification:
## cols(
##   X1 = col_double(),
##   X = col_double(),
##   year = col_double(),
##   state = col_double(),
##   county = col_double(),
##   name = col_character(),
##   percent_farm_tractor = col_double(),
##   stateAbb = col_character(),
##   fips = col_double(),
##   colorID = col_double()
## )

Exploratory Plots of Average Education by County Vs. % of Farms with Tractors

Ohio 1940 Average Education Vs. Ohio 1930 % of Farms with Tractor

Linear Model 1940 Avg. Edu vs 1930 % Farm w/ Tractor

ohEduModLinear <- lm(avg ~ percent_farm_tractor, data = df_40Edu30Tractor)
plot(df_40Edu30Tractor$percent_farm_tractor, df_40Edu30Tractor$avg, main = "Linear")
abline(ohEduModLinear, col = "red")

summary(ohEduModLinear)

##
## Call:
## lm(formula = avg ~ percent_farm_tractor, data = df_40Edu30Tractor)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1.14966 -0.22211  0.06822  0.28365  0.67705
##
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)            9.7760     0.0882 110.833  < 2e-16 ***
## percent_farm_tractor   2.0083     0.3401   5.906 7.03e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.383 on 85 degrees of freedom
## Multiple R-squared:  0.2909,	Adjusted R-squared:  0.2826
## F-statistic: 34.88 on 1 and 85 DF,  p-value: 7.025e-08

Log-Linear Model 1940 Avg. Edu vs 1930 % Farm w/ Tractor

df_40Edu30Tractor$logAvg <- log(df_40Edu30Tractor$avg)
ohEduModLogLinear <- lm(logAvg ~ percent_farm_tractor, data = df_40Edu30Tractor)
plot(df_40Edu30Tractor$percent_farm_tractor, df_40Edu30Tractor$logAvg, main = "Log-Linear")
abline(ohEduModLogLinear, col = "red")

summary(ohEduModLogLinear)

##
## Call:
## lm(formula = logAvg ~ percent_farm_tractor, data = df_40Edu30Tractor)
##
## Residuals:
##       Min        1Q    Median        3Q       Max
## -0.121302 -0.021393  0.007532  0.027990  0.067698
##
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)          2.279001   0.008841  257.76  < 2e-16 ***
## percent_farm_tractor 0.200427   0.034087    5.88 7.85e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03839 on 85 degrees of freedom
## Multiple R-squared:  0.2891,	Adjusted R-squared:  0.2808
## F-statistic: 34.57 on 1 and 85 DF,  p-value: 7.848e-08

Linear-Log Model 1940 Avg. Edu vs 1930 % Farm w/ Tractor

df_40Edu30Tractor$logPercent <- log(df_40Edu30Tractor$percent_farm_tractor)
ohEduModLinearLog <- lm(avg ~ logPercent, data = df_40Edu30Tractor)
plot(df_40Edu30Tractor$logPercent, df_40Edu30Tractor$avg, main = "Linear-Log")
abline(ohEduModLinearLog, col = "red")

summary(ohEduModLinearLog)

##
## Call:
## lm(formula = avg ~ logPercent, data = df_40Edu30Tractor)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1.19632 -0.21516  0.05499  0.27234  0.98138
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.80745    0.09983 108.255  < 2e-16 ***
## logPercent   0.33845    0.05417   6.248 1.59e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3766 on 85 degrees of freedom
## Multiple R-squared:  0.3147,	Adjusted R-squared:  0.3067
## F-statistic: 39.04 on 1 and 85 DF,  p-value: 1.589e-08

Linear Model VAM 1947 vs Percent 1930

vam47Tractor30Mod <- lm(VAM ~ percent_farm_tractor, data = vam1947Tractor30)
plot(vam1947Tractor30$percent_farm_tractor, vam1947Tractor30$VAM, main = "Linear")
abline(vam47Tractor30Mod, col = "red")

summary(vam47Tractor30Mod)

##
## Call:
## lm(formula = VAM ~ percent_farm_tractor, data = vam1947Tractor30)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -462.10 -208.32  -45.39  235.30  556.35
##
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)            208.46      62.12   3.356 0.001226 **
## percent_farm_tractor   873.07     241.54   3.615 0.000531 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 267.9 on 78 degrees of freedom
## Multiple R-squared:  0.1435,	Adjusted R-squared:  0.1325
## F-statistic: 13.07 on 1 and 78 DF,  p-value: 0.000531

Residuals from VAM 47 vs. Percent 30

vam47Residuals <-data.frame(vam47Tractor30Mod$residuals, vam1947Tractor30$county, vam1947Tractor30)
vam47Residuals

##    vam47Tractor30Mod.residuals vam1947Tractor30.county  fips county       avg
## 1                  -214.994273                      10 39001     10  9.209961
## 2                   428.070892                      30 39003     30 10.623929
## 3                   472.063509                      50 39005     50 10.622720
## 4                    84.058441                      70 39007     70 10.609148
## 5                  -169.993062                      90 39009     90 10.169675
## 6                    61.007663                     110 39011    110 10.507224
## 7                   -58.614924                     130 39013    130 10.050313
## 8                  -206.340345                     150 39015    150  9.738122
## 9                   401.695552                     170 39017    170 10.196781
## 10                   -8.633830                     190 39019    190 10.186483
## 11                 -125.520252                     210 39021    210 10.054731
## 13                 -265.328863                     250 39025    250  9.655556
## 14                 -316.779148                     270 39027    270 10.287815
## 15                  266.818266                     290 39029    290 10.306936
## 16                  435.831153                     310 39031    310 10.322308
## 17                  384.628013                     330 39033    330 10.851064
## 19                 -328.143794                     370 39037    370 10.221900
## 20                  -55.972853                     390 39039    390 10.341207
## 21                 -148.513357                     410 39041    410 10.596730
## 22                  306.763656                     430 39043    430 10.685484
## 23                   82.250417                     450 39045    450 10.074779
## 24                 -395.688823                     470 39047    470  9.944751
## 25                   79.681637                     490 39049    490 10.758445
## 26                 -157.187923                     510 39051    510 10.567867
## 27                 -214.256282                     530 39053    530  9.078111
## 28                 -138.113593                     550 39055    550 10.072043
## 29                 -327.773284                     570 39057    570 10.698218
## 30                  -36.279397                     590 39059    590 10.159635
## 32                  -58.745962                     630 39063    630 10.913314
## 33                 -293.189494                     650 39065    650 10.281763
## 34                  -99.582105                     670 39067    670  9.872080
## 35                 -462.100049                     690 39069    690 10.158685
## 36                 -139.027462                     710 39071    710 10.189036
## 37                   41.890732                     730 39073    730  9.911968
## 38                 -241.680920                     750 39075    750  9.486194
## 39                 -155.190608                     770 39077    770 10.674342
## 40                   76.527608                     790 39079    790  9.540707
## 41                  234.802219                     810 39081    810 10.093175
## 42                  325.634939                     830 39083    830 10.511027
## 43                  280.482618                     850 39085    850 10.686385
## 44                  368.636885                     870 39087    870  9.411918
## 45                  117.350774                     890 39089    890 10.361612
## 46                 -274.716377                     910 39091    910 10.684086
## 47                  349.053411                     930 39093    930 10.763259
## 48                  518.069785                     950 39095    950 10.592837
## 49                 -370.307338                     970 39097    970  9.682366
## 50                  522.121661                     990 39099    990 10.567978
## 51                  203.560118                    1010 39101   1010 10.490716
## 52                    8.646107                    1030 39103   1030 10.699282
## 53                 -201.757571                    1050 39105   1050  9.727691
## 54                  -93.612800                    1070 39107   1070  9.997748
## 55                  246.516899                    1090 39109   1090 10.464348
## 56                 -238.855979                    1110 39111   1110  9.436264
## 58                 -187.791582                    1150 39115   1150 10.410557
## 59                 -343.259852                    1170 39117   1170 10.576296
## 60                  320.415170                    1190 39119   1190 10.002766
## 61                 -178.672979                    1210 39121   1210 10.497222
## 62                  -62.817490                    1230 39123   1230 10.310724
## 63                 -456.480284                    1250 39125   1250 10.157001
## 64                  -45.595543                    1270 39127   1270 10.021769
## 65                 -283.910975                    1310 39131   1310  8.820574
## 66                   76.702578                    1330 39133   1330 10.780847
## 67                 -442.686375                    1350 39135   1350 10.296046
## 68                 -381.604269                    1370 39137   1370  9.923810
## 70                  -91.948146                    1410 39141   1410  9.166793
## 71                  241.547837                    1430 39143   1430 10.595465
## 72                  169.538758                    1450 39145   1450  9.300681
## 73                  397.767524                    1470 39147   1470 10.475133
## 74                  236.782384                    1490 39149   1490 10.466939
## 75                  556.350539                    1510 39151   1510 10.589506
## 78                  259.489252                    1570 39157   1570 10.274903
## 79                 -215.824224                    1590 39159   1590 10.505435
## 80                 -126.027018                    1610 39161   1610 10.594572
## 81                   -3.748685                    1630 39163   1630  9.421053
## 82                   83.648255                    1650 39165   1650 10.028030
## 83                   77.822668                    1670 39167   1670 10.202782
## 84                  139.775195                    1690 39169   1690 10.387983
## 85                  -45.180167                    1710 39171   1710 10.532258
## 86                  103.929477                    1730 39173   1730 10.572193
## 87                 -297.484334                    1750 39175   1750 10.314286
##    year.x state.x       name percent_farm_tractor stateAbb colorID.x   logAvg
## 1    1930      39      ADAMS           0.05903146       OH         1 2.220286
## 2    1930      39      ALLEN           0.22503726       OH         4 2.363109
## 3    1930      39    ASHLAND           0.19412039       OH         3 2.362995
## 4    1930      39  ASHTABULA           0.22963305       OH         4 2.361717
## 5    1930      39     ATHENS           0.06819313       OH         2 2.319410
## 6    1930      39   AUGLAIZE           0.31674026       OH         5 2.352063
## 7    1930      39    BELMONT           0.07692308       OH         2 2.307604
## 8    1930      39      BROWN           0.08691702       OH         2 2.276048
## 9    1930      39     BUTLER           0.36978534       OH         6 2.322072
## 10   1930      39    CARROLL           0.08954394       OH         2 2.321062
## 11   1930      39  CHAMPAIGN           0.27038425       OH         4 2.308043
## 13   1930      39   CLERMONT           0.13500993       OH         2 2.267533
## 14   1930      39    CLINTON           0.31764146       OH         5 2.330960
## 15   1930      39 COLUMBIANA           0.14056832       OH         3 2.332817
## 16   1930      39  COSHOCTON           0.08442982       OH         2 2.334307
## 17   1930      39   CRAWFORD           0.29655823       OH         5 2.384263
## 19   1930      39      DARKE           0.29056995       OH         5 2.324533
## 20   1930      39   DEFIANCE           0.39689471       OH         6 2.336137
## 21   1930      39   DELAWARE           0.22341568       OH         4 2.360545
## 22   1930      39       ERIE           0.36512580       OH         6 2.368886
## 23   1930      39  FAIRFIELD           0.20536013       OH         4 2.310035
## 24   1930      39    FAYETTE           0.40344168       OH         6 2.297045
## 25   1930      39   FRANKLIN           0.30222372       OH         5 2.375691
## 26   1930      39     FULTON           0.32727273       OH         5 2.357818
## 27   1930      39     GALLIA           0.03069700       OH         1 2.205866
## 28   1930      39     GEAUGA           0.25159236       OH         4 2.309764
## 29   1930      39     GREENE           0.30389016       OH         5 2.370077
## 30   1930      39   GUERNSEY           0.05706727       OH         1 2.318423
## 32   1930      39    HANCOCK           0.38403563       OH         6 2.389984
## 33   1930      39     HARDIN           0.27458694       OH         5 2.330372
## 34   1930      39   HARRISON           0.08375778       OH         2 2.289711
## 35   1930      39      HENRY           0.44056493       OH         7 2.318329
## 36   1930      39   HIGHLAND           0.16788079       OH         3 2.321312
## 37   1930      39    HOCKING           0.06030856       OH         1 2.293743
## 38   1930      39     HOLMES           0.14228546       OH         3 2.249838
## 39   1930      39      HURON           0.35018345       OH         6 2.367843
## 40   1930      39    JACKSON           0.04468912       OH         1 2.255568
## 41   1930      39  JEFFERSON           0.12913288       OH         2 2.311859
## 42   1930      39       KNOX           0.13619677       OH         2 2.352425
## 43   1930      39       LAKE           0.31161473       OH         5 2.368970
## 44   1930      39   LAWRENCE           0.03998243       OH         1 2.241977
## 45   1930      39    LICKING           0.13881301       OH         3 2.338108
## 46   1930      39      LOGAN           0.23510204       OH         4 2.368755
## 47   1930      39     LORAIN           0.37853851       OH         6 2.376138
## 48   1930      39      LUCAS           0.30750605       OH         5 2.360178
## 49   1930      39    MADISON           0.34115409       OH         6 2.270306
## 50   1930      39   MAHONING           0.25361367       OH         4 2.357828
## 51   1930      39     MARION           0.37223123       OH         6 2.350491
## 52   1930      39     MEDINA           0.27935677       OH         5 2.370177
## 53   1930      39      MEIGS           0.04959729       OH         1 2.274977
## 54   1930      39     MERCER           0.30599711       OH         5 2.302360
## 55   1930      39      MIAMI           0.31272085       OH         5 2.347974
## 56   1930      39     MONROE           0.03482003       OH         1 2.244560
## 58   1930      39     MORGAN           0.03703704       OH         1 2.342820
## 59   1930      39     MORROW           0.15440238       OH         3 2.358615
## 60   1930      39  MUSKINGUM           0.08147014       OH         2 2.302862
## 61   1930      39      NOBLE           0.02201122       OH         1 2.351111
## 62   1930      39     OTTAWA           0.43565976       OH         7 2.333185
## 63   1930      39   PAULDING           0.43870968       OH         7 2.318163
## 64   1930      39      PERRY           0.10553544       OH         2 2.304760
## 65   1930      39       PIKE           0.09673367       OH         2 2.177087
## 66   1930      39    PORTAGE           0.26211135       OH         4 2.377771
## 67   1930      39     PREBLE           0.33586132       OH         5 2.331760
## 68   1930      39     PUTNAM           0.29911504       OH         5 2.294937
## 70   1930      39       ROSS           0.21131448       OH         4 2.215588
## 71   1930      39   SANDUSKY           0.36766376       OH         6 2.360426
## 72   1930      39     SCIOTO           0.12027158       OH         2 2.230088
## 73   1930      39     SENECA           0.38230211       OH         6 2.349004
## 74   1930      39     SHELBY           0.37197665       OH         6 2.348222
## 75   1930      39      STARK           0.25793304       OH         4 2.359863
## 78   1930      39 TUSCARAWAS           0.13636364       OH         3 2.329704
## 79   1930      39      UNION           0.24324324       OH         4 2.351893
## 80   1930      39   VAN WERT           0.29730849       OH         5 2.360342
## 81   1930      39     VINTON           0.05302326       OH         1 2.242947
## 82   1930      39     WARREN           0.27248201       OH         5 2.305384
## 83   1930      39 WASHINGTON           0.05809575       OH         1 2.322660
## 84   1930      39      WAYNE           0.24255668       OH         4 2.340650
## 85   1930      39   WILLIAMS           0.29061161       OH         5 2.354443
## 86   1930      39       WOOD           0.47947411       OH         8 2.358227
## 87   1930      39    WYANDOT           0.27950617       OH         5 2.333530
##    logPercent   X.1     X        polyname RUCC            County VAM year.y
## 1  -2.8296848 10023 10023      ohio,adams    9      Adams County  45   1947
## 2  -1.4914893 10028 10028      ohio,allen    3      Allen County 833   1947
## 3  -1.6392767 10033 10033    ohio,ashland    6    Ashland County 850   1947
## 4  -1.4712727 10038 10038  ohio,ashtabula    4  Ashtabula County 493   1947
## 5  -2.6854114 10043 10043     ohio,athens    7     Athens County  98   1947
## 6  -1.1496732 10047 10047   ohio,auglaize    6   Auglaize County 546   1947
## 7  -2.5649494 10052 10052    ohio,belmont    2    Belmont County 217   1947
## 8  -2.4428014 10057 10057      ohio,brown    9      Brown County  78   1947
## 9  -0.9948326 10059 10059     ohio,butler    3     Butler County 933   1947
## 10 -2.4130259 10067 10067    ohio,carroll    6    Carroll County 278   1947
## 11 -1.3079112 10073 10073  ohio,champaign    6  Champaign County 319   1947
## 13 -2.0024070 10079 10079   ohio,clermont    8   Clermont County  61   1947
## 14 -1.1468320 10088 10088    ohio,clinton    6    Clinton County 169   1947
## 15 -1.9620617 10092 10092 ohio,columbiana    4 Columbiana County 598   1947
## 16 -2.4718346 10098 10098  ohio,coshocton    7  Coshocton County 718   1947
## 17 -1.2155117 10102 10102   ohio,crawford    7   Crawford County 852   1947
## 19 -1.2359109 10112 10112      ohio,darke    6      Darke County 134   1947
## 20 -0.9240842 10116 10116   ohio,defiance    7   Defiance County 499   1947
## 21 -1.4987212 10121 10121   ohio,delaware    6   Delaware County 255   1947
## 22 -1.0075133 10125 10125       ohio,erie    4       Erie County 834   1947
## 23 -1.5829901 10133 10133  ohio,fairfield    4  Fairfield County 470   1947
## 24 -0.9077233 10136 10136    ohio,fayette    6    Fayette County 165   1947
## 25 -1.1965877 10143 10143   ohio,franklin    2   Franklin County 552   1947
## 26 -1.1169614 10147 10147     ohio,fulton    6     Fulton County 337   1947
## 27 -3.4835903 10151 10151     ohio,gallia    6     Gallia County  21   1947
## 28 -1.3799451 10158 10158     ohio,geauga    4     Geauga County 290   1947
## 29 -1.1910890 10162 10162     ohio,greene    2     Greene County 146   1947
## 30 -2.8635245 10167 10167   ohio,guernsey    6   Guernsey County 222   1947
## 32 -0.9570199 10174 10174    ohio,hancock    4    Hancock County 485   1947
## 33 -1.2924873 10183 10183     ohio,hardin    6     Hardin County 155   1947
## 34 -2.4798262 10185 10185   ohio,harrison    6   Harrison County 182   1947
## 35 -0.8196974 10193 10193      ohio,henry    6      Henry County 131   1947
## 36 -1.7845011 10196 10196   ohio,highland    7   Highland County 216   1947
## 37 -2.8082813 10202 10202    ohio,hocking    7    Hocking County 303   1947
## 38 -1.9499200 10207 10207     ohio,holmes    6     Holmes County  91   1947
## 39 -1.0492981 10213 10213      ohio,huron    6      Huron County 359   1947
## 40 -3.1080252 10217 10217    ohio,jackson    6    Jackson County 324   1947
## 41 -2.0469134 10223 10223  ohio,jefferson    2  Jefferson County 556   1947
## 42 -1.9936546 10228 10228       ohio,knox    7       Knox County 653   1947
## 43 -1.1659877 10233 10233       ohio,lake    1       Lake County 761   1947
## 44 -3.2193153 10238 10238   ohio,lawrence    3   Lawrence County 612   1947
## 45 -1.9746275 10243 10243    ohio,licking    4    Licking County 447   1947
## 46 -1.4477356 10248 10248      ohio,logan    7      Logan County 139   1947
## 47 -0.9714375 10250 10250     ohio,lorain    3     Lorain County 888   1947
## 48 -1.1792605 10258 10258      ohio,lucas    2      Lucas County 995   1947
## 49 -1.0754210 10259 10259    ohio,madison    6    Madison County 136   1947
## 50 -1.3719432 10268 10268   ohio,mahoning    2   Mahoning County 952   1947
## 51 -0.9882400 10273 10273     ohio,marion    5     Marion County 737   1947
## 52 -1.2752656 10278 10278     ohio,medina    6     Medina County 461   1947
## 53 -3.0038191 10280 10280      ohio,meigs    7      Meigs County  50   1947
## 54 -1.1841796 10288 10288     ohio,mercer    7     Mercer County 382   1947
## 55 -1.1624443 10291 10291      ohio,miami    6      Miami County 728   1947
## 56 -3.3575624 10297 10297     ohio,monroe    8     Monroe County   0   1947
## 58 -3.2958369 10305 10305     ohio,morgan    9     Morgan County  53   1947
## 59 -1.8681932 10309 10309     ohio,morrow    9     Morrow County   0   1947
## 60 -2.5075187 10314 10314  ohio,muskingum    5  Muskingum County 600   1947
## 61 -3.8162029 10320 10320      ohio,noble    8      Noble County  49   1947
## 62 -0.8308937 10324 10324     ohio,ottawa    6     Ottawa County 526   1947
## 63 -0.8239174 10329 10329   ohio,paulding    9   Paulding County 135   1947
## 64 -2.2487085 10336 10336      ohio,perry    7      Perry County 255   1947
## 65 -2.3357938 10347 10347       ohio,pike    9       Pike County   9   1947
## 66 -1.3389859 10353 10353    ohio,portage    6    Portage County 514   1947
## 67 -1.0910569 10357 10357     ohio,preble    6     Preble County  59   1947
## 68 -1.2069270 10362 10362     ohio,putnam    6     Putnam County  88   1947
## 70 -1.5544078 10371 10371       ohio,ross    5       Ross County 301   1947
## 71 -1.0005864 10374 10374   ohio,sandusky    7   Sandusky County 771   1947
## 72 -2.1180029 10381 10381     ohio,scioto    4     Scioto County 483   1947
## 73 -0.9615441 10386 10386     ohio,seneca    7     Seneca County 940   1947
## 74 -0.9889242 10391 10391     ohio,shelby    7     Shelby County 770   1947
## 75 -1.3550553 10397 10397      ohio,stark    2      Stark County 990   1947
## 78 -1.9924302 10412 10412 ohio,tuscarawas    6 Tuscarawas County 587   1947
## 79 -1.4136933 10417 10417      ohio,union    6      Union County 205   1947
## 80 -1.2129850 10423 10423   ohio,van wert    6   Van Wert County 342   1947
## 81 -2.9370247 10428 10428     ohio,vinton    9     Vinton County 251   1947
## 82 -1.3001827 10432 10432     ohio,warren    6     Warren County 530   1947
## 83 -2.8456628 10438 10438 ohio,washington    7 Washington County 337   1947
## 84 -1.4165199 10443 10443      ohio,wayne    6      Wayne County 560   1947
## 85 -1.2357676 10448 10448   ohio,williams    7   Williams County 417   1947
## 86 -0.7350654 10451 10451       ohio,wood    6       Wood County 731   1947
## 87 -1.2747309 10457 10457    ohio,wyandot    7    Wyandot County 155   1947
##    state.y state.low        county.low colorID.y   color
## 1     Ohio      ohio      adams county         1 #FF0000
## 2     Ohio      ohio      allen county        13 #3F0CF9
## 3     Ohio      ohio    ashland county        13 #3F0CF9
## 4     Ohio      ohio  ashtabula county         8 #A61DE0
## 5     Ohio      ohio     athens county         2 #F20420
## 6     Ohio      ohio   auglaize county         9 #951DF1
## 7     Ohio      ohio    belmont county         4 #D90C60
## 8     Ohio      ohio      brown county         2 #F20420
## 9     Ohio      ohio     butler county        14 #2A08FB
## 10    Ohio      ohio    carroll county         5 #CC1180
## 11    Ohio      ohio  champaign county         5 #CC1180
## 13    Ohio      ohio   clermont county         1 #FF0000
## 14    Ohio      ohio    clinton county         3 #E50840
## 15    Ohio      ohio columbiana county         9 #951DF1
## 16    Ohio      ohio  coshocton county        11 #6A15F5
## 17    Ohio      ohio   crawford county        13 #3F0CF9
## 19    Ohio      ohio      darke county         2 #F20420
## 20    Ohio      ohio   defiance county         8 #A61DE0
## 21    Ohio      ohio   delaware county         4 #D90C60
## 22    Ohio      ohio       erie county        13 #3F0CF9
## 23    Ohio      ohio  fairfield county         7 #B319C0
## 24    Ohio      ohio    fayette county         3 #E50840
## 25    Ohio      ohio   franklin county         9 #951DF1
## 26    Ohio      ohio     fulton county         5 #CC1180
## 27    Ohio      ohio     gallia county         1 #FF0000
## 28    Ohio      ohio     geauga county         5 #CC1180
## 29    Ohio      ohio     greene county         3 #E50840
## 30    Ohio      ohio   guernsey county         4 #D90C60
## 32    Ohio      ohio    hancock county         8 #A61DE0
## 33    Ohio      ohio     hardin county         3 #E50840
## 34    Ohio      ohio   harrison county         3 #E50840
## 35    Ohio      ohio      henry county         2 #F20420
## 36    Ohio      ohio   highland county         4 #D90C60
## 37    Ohio      ohio    hocking county         5 #CC1180
## 38    Ohio      ohio     holmes county         2 #F20420
## 39    Ohio      ohio      huron county         6 #BF15A0
## 40    Ohio      ohio    jackson county         5 #CC1180
## 41    Ohio      ohio  jefferson county         9 #951DF1
## 42    Ohio      ohio       knox county        10 #8019F3
## 43    Ohio      ohio       lake county        12 #5511F7
## 44    Ohio      ohio   lawrence county         9 #951DF1
## 45    Ohio      ohio    licking county         7 #B319C0
## 46    Ohio      ohio      logan county         3 #E50840
## 47    Ohio      ohio     lorain county        14 #2A08FB
## 48    Ohio      ohio      lucas county        15 #1504FD
## 49    Ohio      ohio    madison county         2 #F20420
## 50    Ohio      ohio   mahoning county        14 #2A08FB
## 51    Ohio      ohio     marion county        11 #6A15F5
## 52    Ohio      ohio     medina county         7 #B319C0
## 53    Ohio      ohio      meigs county         1 #FF0000
## 54    Ohio      ohio     mercer county         6 #BF15A0
## 55    Ohio      ohio      miami county        11 #6A15F5
## 56    Ohio      ohio     monroe county         1 #FF0000
## 58    Ohio      ohio     morgan county         1 #FF0000
## 59    Ohio      ohio     morrow county         1 #FF0000
## 60    Ohio      ohio  muskingum county         9 #951DF1
## 61    Ohio      ohio      noble county         1 #FF0000
## 62    Ohio      ohio     ottawa county         8 #A61DE0
## 63    Ohio      ohio   paulding county         2 #F20420
## 64    Ohio      ohio      perry county         4 #D90C60
## 65    Ohio      ohio       pike county         1 #FF0000
## 66    Ohio      ohio    portage county         8 #A61DE0
## 67    Ohio      ohio     preble county         1 #FF0000
## 68    Ohio      ohio     putnam county         2 #F20420
## 70    Ohio      ohio       ross county         5 #CC1180
## 71    Ohio      ohio   sandusky county        12 #5511F7
## 72    Ohio      ohio     scioto county         8 #A61DE0
## 73    Ohio      ohio     seneca county        14 #2A08FB
## 74    Ohio      ohio     shelby county        12 #5511F7
## 75    Ohio      ohio      stark county        15 #1504FD
## 78    Ohio      ohio tuscarawas county         9 #951DF1
## 79    Ohio      ohio      union county         4 #D90C60
## 80    Ohio      ohio   van wert county         6 #BF15A0
## 81    Ohio      ohio     vinton county         4 #D90C60
## 82    Ohio      ohio     warren county         8 #A61DE0
## 83    Ohio      ohio washington county         5 #CC1180
## 84    Ohio      ohio      wayne county         9 #951DF1
## 85    Ohio      ohio   williams county         7 #B319C0
## 86    Ohio      ohio       wood county        11 #6A15F5
## 87    Ohio      ohio    wyandot county         3 #E50840

Linear Model VAM 1947 vs Percent 1930

vam54Tractor30Mod <- lm(VAM ~ percent_farm_tractor, data = vam1954Tractor30)
plot(vam1954Tractor30$percent_farm_tractor, vam1954Tractor30$VAM, main = "Linear")
abline(vam54Tractor30Mod, col = "red")

summary(vam54Tractor30Mod)

##
## Call:
## lm(formula = VAM ~ percent_farm_tractor, data = vam1954Tractor30)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -454.38 -169.97  -31.25  200.52  568.19
##
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)            362.38      66.15   5.478 1.01e-06 ***
## percent_farm_tractor   668.33     278.94   2.396   0.0199 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 260.4 on 57 degrees of freedom
## Multiple R-squared:  0.0915,	Adjusted R-squared:  0.07556
## F-statistic: 5.741 on 1 and 57 DF,  p-value: 0.01988

Residuals from VAM 47 vs. Percent 30

vam54Residuals <-data.frame(vam54Tractor30Mod$residuals, vam1954Tractor30$county, vam1954Tractor30)
vam54Residuals

##    vam54Tractor30Mod.residuals vam1954Tractor30.county  fips county       avg
## 1                  -333.830662                      10 39001     10  9.209961
## 3                   458.885693                      50 39005     50 10.622720
## 4                   261.151609                      70 39007     70 10.609148
## 5                  -143.953658                      90 39009     90 10.169675
## 6                   424.935463                     110 39011    110 10.507224
## 7                  -145.788121                     130 39013    130 10.050313
## 8                  -299.467348                     150 39015    150  9.738122
## 10                  131.777011                     190 39019    190 10.186483
## 11                  -56.083542                     210 39021    210 10.054731
## 13                 -393.609159                     250 39025    250  9.655556
## 14                 -145.666833                     270 39027    270 10.287815
## 15                  351.676015                     290 39029    290 10.306936
## 16                  568.194913                     310 39031    310 10.322308
## 19                 -115.574195                     370 39037    370 10.221900
## 21                   65.306846                     410 39041    410 10.596730
## 23                  323.373865                     450 39045    450 10.074779
## 24                 -293.009476                     470 39047    470  9.944751
## 26                  -13.103671                     510 39051    510 10.567867
## 27                 -341.893966                     530 39053    530  9.078111
## 28                  -56.524400                     550 39055    550 10.072043
## 29                 -180.476459                     570 39057    570 10.698218
## 30                   -6.517941                     590 39059    590 10.159635
## 32                  107.960123                     630 39063    630 10.913314
## 33                 -140.892311                     650 39065    650 10.281763
## 34                 -134.355942                     670 39067    670  9.872080
## 35                 -292.819961                     690 39069    690 10.158685
## 36                 -188.577663                     710 39071    710 10.189036
## 37                   22.315820                     730 39073    730  9.911968
## 38                 -159.471598                     750 39075    750  9.486194
## 39                   61.584464                     770 39077    770 10.674342
## 40                  -31.245282                     790 39079    790  9.540707
## 42                  364.597642                     830 39083    830 10.511027
## 46                  -35.503469                     910 39091    910 10.684086
## 49                 -454.380967                     970 39097    970  9.682366
## 51                  360.849329                    1010 39101   1010 10.490716
## 52                  188.919881                    1030 39103   1030 10.699282
## 53                 -291.525545                    1050 39105   1050  9.727691
## 54                  212.115409                    1070 39107   1070  9.997748
## 56                 -369.649499                    1110 39111   1110  9.436264
## 58                  100.868815                    1150 39115   1150 10.410557
## 59                   89.430331                    1170 39117   1170 10.576296
## 61                 -158.089019                    1210 39121   1210 10.497222
## 62                   15.458297                    1230 39123   1230 10.310724
## 63                 -119.580046                    1250 39125   1250 10.157001
## 64                 -144.910548                    1270 39127   1270 10.021769
## 65                  318.971916                    1310 39131   1310  8.820574
## 66                  353.445466                    1330 39133   1330 10.780847
## 67                 -437.843665                    1350 39135   1350 10.296046
## 68                  310.714882                    1370 39137   1370  9.923810
## 70                   82.394415                    1410 39141   1410  9.166793
## 72                  111.240883                    1450 39145   1450  9.300681
## 78                  385.486120                    1570 39157   1570 10.274903
## 79                 -210.944458                    1590 39159   1590 10.505435
## 80                   14.922253                    1610 39161   1610 10.594572
## 81                 -120.815214                    1630 39163   1630  9.421053
## 82                 -279.485532                    1650 39165   1650 10.028030
## 83                  408.794698                    1670 39167   1670 10.202782
## 85                  243.397962                    1710 39171   1710 10.532258
## 87                 -243.179970                    1750 39175   1750 10.314286
##    year.x state.x       name percent_farm_tractor stateAbb colorID.x   logAvg
## 1    1930      39      ADAMS           0.05903146       OH         1 2.220286
## 3    1930      39    ASHLAND           0.19412039       OH         3 2.362995
## 4    1930      39  ASHTABULA           0.22963305       OH         4 2.361717
## 5    1930      39     ATHENS           0.06819313       OH         2 2.319410
## 6    1930      39   AUGLAIZE           0.31674026       OH         5 2.352063
## 7    1930      39    BELMONT           0.07692308       OH         2 2.307604
## 8    1930      39      BROWN           0.08691702       OH         2 2.276048
## 10   1930      39    CARROLL           0.08954394       OH         2 2.321062
## 11   1930      39  CHAMPAIGN           0.27038425       OH         4 2.308043
## 13   1930      39   CLERMONT           0.13500993       OH         2 2.267533
## 14   1930      39    CLINTON           0.31764146       OH         5 2.330960
## 15   1930      39 COLUMBIANA           0.14056832       OH         3 2.332817
## 16   1930      39  COSHOCTON           0.08442982       OH         2 2.334307
## 19   1930      39      DARKE           0.29056995       OH         5 2.324533
## 21   1930      39   DELAWARE           0.22341568       OH         4 2.360545
## 23   1930      39  FAIRFIELD           0.20536013       OH         4 2.310035
## 24   1930      39    FAYETTE           0.40344168       OH         6 2.297045
## 26   1930      39     FULTON           0.32727273       OH         5 2.357818
## 27   1930      39     GALLIA           0.03069700       OH         1 2.205866
## 28   1930      39     GEAUGA           0.25159236       OH         4 2.309764
## 29   1930      39     GREENE           0.30389016       OH         5 2.370077
## 30   1930      39   GUERNSEY           0.05706727       OH         1 2.318423
## 32   1930      39    HANCOCK           0.38403563       OH         6 2.389984
## 33   1930      39     HARDIN           0.27458694       OH         5 2.330372
## 34   1930      39   HARRISON           0.08375778       OH         2 2.289711
## 35   1930      39      HENRY           0.44056493       OH         7 2.318329
## 36   1930      39   HIGHLAND           0.16788079       OH         3 2.321312
## 37   1930      39    HOCKING           0.06030856       OH         1 2.293743
## 38   1930      39     HOLMES           0.14228546       OH         3 2.249838
## 39   1930      39      HURON           0.35018345       OH         6 2.367843
## 40   1930      39    JACKSON           0.04468912       OH         1 2.255568
## 42   1930      39       KNOX           0.13619677       OH         2 2.352425
## 46   1930      39      LOGAN           0.23510204       OH         4 2.368755
## 49   1930      39    MADISON           0.34115409       OH         6 2.270306
## 51   1930      39     MARION           0.37223123       OH         6 2.350491
## 52   1930      39     MEDINA           0.27935677       OH         5 2.370177
## 53   1930      39      MEIGS           0.04959729       OH         1 2.274977
## 54   1930      39     MERCER           0.30599711       OH         5 2.302360
## 56   1930      39     MONROE           0.03482003       OH         1 2.244560
## 58   1930      39     MORGAN           0.03703704       OH         1 2.342820
## 59   1930      39     MORROW           0.15440238       OH         3 2.358615
## 61   1930      39      NOBLE           0.02201122       OH         1 2.351111
## 62   1930      39     OTTAWA           0.43565976       OH         7 2.333185
## 63   1930      39   PAULDING           0.43870968       OH         7 2.318163
## 64   1930      39      PERRY           0.10553544       OH         2 2.304760
## 65   1930      39       PIKE           0.09673367       OH         2 2.177087
## 66   1930      39    PORTAGE           0.26211135       OH         4 2.377771
## 67   1930      39     PREBLE           0.33586132       OH         5 2.331760
## 68   1930      39     PUTNAM           0.29911504       OH         5 2.294937
## 70   1930      39       ROSS           0.21131448       OH         4 2.215588
## 72   1930      39     SCIOTO           0.12027158       OH         2 2.230088
## 78   1930      39 TUSCARAWAS           0.13636364       OH         3 2.329704
## 79   1930      39      UNION           0.24324324       OH         4 2.351893
## 80   1930      39   VAN WERT           0.29730849       OH         5 2.360342
## 81   1930      39     VINTON           0.05302326       OH         1 2.242947
## 82   1930      39     WARREN           0.27248201       OH         5 2.305384
## 83   1930      39 WASHINGTON           0.05809575       OH         1 2.322660
## 85   1930      39   WILLIAMS           0.29061161       OH         5 2.354443
## 87   1930      39    WYANDOT           0.27950617       OH         5 2.333530
##    logPercent   X.1     X        polyname RUCC            County VAM year.y
## 1  -2.8296848 10021 10021      ohio,adams    9      Adams County  68   1954
## 3  -1.6392767 10031 10031    ohio,ashland    6    Ashland County 951   1954
## 4  -1.4712727 10036 10036  ohio,ashtabula    4  Ashtabula County 777   1954
## 5  -2.6854114 10042 10042     ohio,athens    7     Athens County 264   1954
## 6  -1.1496732 10044 10044   ohio,auglaize    6   Auglaize County 999   1954
## 7  -2.5649494 10051 10051    ohio,belmont    2    Belmont County 268   1954
## 8  -2.4428014 10058 10058      ohio,brown    9      Brown County 121   1954
## 10 -2.4130259 10066 10066    ohio,carroll    6    Carroll County 554   1954
## 11 -1.3079112 10071 10071  ohio,champaign    6  Champaign County 487   1954
## 13 -2.0024070 10081 10081   ohio,clermont    8   Clermont County  59   1954
## 14 -1.1468320 10087 10087    ohio,clinton    6    Clinton County 429   1954
## 15 -1.9620617 10089 10089 ohio,columbiana    4 Columbiana County 808   1954
## 16 -2.4718346 10095 10095  ohio,coshocton    7  Coshocton County 987   1954
## 19 -1.2359109 10110 10110      ohio,darke    6      Darke County 441   1954
## 21 -1.4987212 10123 10123   ohio,delaware    6   Delaware County 577   1954
## 23 -1.5829901 10131 10131  ohio,fairfield    4  Fairfield County 823   1954
## 24 -0.9077233 10138 10138    ohio,fayette    6    Fayette County 339   1954
## 26 -1.1169614 10144 10144     ohio,fulton    6     Fulton County 568   1954
## 27 -3.4835903 10149 10149     ohio,gallia    6     Gallia County  41   1954
## 28 -1.3799451 10155 10155     ohio,geauga    4     Geauga County 474   1954
## 29 -1.1910890 10159 10159     ohio,greene    2     Greene County 385   1954
## 30 -2.8635245 10165 10165   ohio,guernsey    6   Guernsey County 394   1954
## 32 -0.9570199 10178 10178    ohio,hancock    4    Hancock County 727   1954
## 33 -1.2924873 10181 10181     ohio,hardin    6     Hardin County 405   1954
## 34 -2.4798262 10187 10187   ohio,harrison    6   Harrison County 284   1954
## 35 -0.8196974 10190 10190      ohio,henry    6      Henry County 364   1954
## 36 -1.7845011 10198 10198   ohio,highland    7   Highland County 286   1954
## 37 -2.8082813 10200 10200    ohio,hocking    7    Hocking County 425   1954
## 38 -1.9499200 10205 10205     ohio,holmes    6     Holmes County 298   1954
## 39 -1.0492981 10210 10210      ohio,huron    6      Huron County 658   1954
## 40 -3.1080252 10214 10214    ohio,jackson    6    Jackson County 361   1954
## 42 -1.9936546 10225 10225       ohio,knox    7       Knox County 818   1954
## 46 -1.4477356 10245 10245      ohio,logan    7      Logan County 484   1954
## 49 -1.0754210 10263 10263    ohio,madison    6    Madison County 136   1954
## 51 -0.9882400 10272 10272     ohio,marion    5     Marion County 972   1954
## 52 -1.2752656 10277 10277     ohio,medina    6     Medina County 738   1954
## 53 -3.0038191 10279 10279      ohio,meigs    7      Meigs County 104   1954
## 54 -1.1841796 10287 10287     ohio,mercer    7     Mercer County 779   1954
## 56 -3.3575624 10296 10296     ohio,monroe    8     Monroe County  16   1954
## 58 -3.2958369 10308 10308     ohio,morgan    9     Morgan County 488   1954
## 59 -1.8681932 10312 10312     ohio,morrow    9     Morrow County 555   1954
## 61 -3.8162029 10322 10322      ohio,noble    8      Noble County 219   1954
## 62 -0.8308937 10328 10328     ohio,ottawa    6     Ottawa County 669   1954
## 63 -0.8239174 10333 10333   ohio,paulding    9   Paulding County 536   1954
## 64 -2.2487085 10335 10335      ohio,perry    7      Perry County 288   1954
## 65 -2.3357938 10346 10346       ohio,pike    9       Pike County 746   1954
## 66 -1.3389859 10349 10349    ohio,portage    6    Portage County 891   1954
## 67 -1.0910569 10354 10354     ohio,preble    6     Preble County 149   1954
## 68 -1.2069270 10360 10360     ohio,putnam    6     Putnam County 873   1954
## 70 -1.5544078 10369 10369       ohio,ross    5       Ross County 586   1954
## 72 -2.1180029 10382 10382     ohio,scioto    4     Scioto County 554   1954
## 78 -1.9924302 10411 10411 ohio,tuscarawas    6 Tuscarawas County 839   1954
## 79 -1.4136933 10414 10414      ohio,union    6      Union County 314   1954
## 80 -1.2129850 10419 10419   ohio,van wert    6   Van Wert County 576   1954
## 81 -2.9370247 10426 10426     ohio,vinton    9     Vinton County 277   1954
## 82 -1.3001827 10433 10433     ohio,warren    6     Warren County 265   1954
## 83 -2.8456628 10435 10435 ohio,washington    7 Washington County 810   1954
## 85 -1.2357676 10446 10446   ohio,williams    7   Williams County 800   1954
## 87 -1.2747309 10456 10456    ohio,wyandot    7    Wyandot County 306   1954
##    state.y state.low        county.low colorID.y   color
## 1     Ohio      ohio      adams county         1 #FF0000
## 3     Ohio      ohio    ashland county        14 #2A08FB
## 4     Ohio      ohio  ashtabula county        12 #5511F7
## 5     Ohio      ohio     athens county         4 #D90C60
## 6     Ohio      ohio   auglaize county        15 #1504FD
## 7     Ohio      ohio    belmont county         4 #D90C60
## 8     Ohio      ohio      brown county         2 #F20420
## 10    Ohio      ohio    carroll county         9 #951DF1
## 11    Ohio      ohio  champaign county         8 #A61DE0
## 13    Ohio      ohio   clermont county         1 #FF0000
## 14    Ohio      ohio    clinton county         7 #B319C0
## 15    Ohio      ohio columbiana county        12 #5511F7
## 16    Ohio      ohio  coshocton county        15 #1504FD
## 19    Ohio      ohio      darke county         7 #B319C0
## 21    Ohio      ohio   delaware county         9 #951DF1
## 23    Ohio      ohio  fairfield county        13 #3F0CF9
## 24    Ohio      ohio    fayette county         5 #CC1180
## 26    Ohio      ohio     fulton county         9 #951DF1
## 27    Ohio      ohio     gallia county         1 #FF0000
## 28    Ohio      ohio     geauga county         7 #B319C0
## 29    Ohio      ohio     greene county         6 #BF15A0
## 30    Ohio      ohio   guernsey county         6 #BF15A0
## 32    Ohio      ohio    hancock county        11 #6A15F5
## 33    Ohio      ohio     hardin county         6 #BF15A0
## 34    Ohio      ohio   harrison county         5 #CC1180
## 35    Ohio      ohio      henry county         6 #BF15A0
## 36    Ohio      ohio   highland county         5 #CC1180
## 37    Ohio      ohio    hocking county         7 #B319C0
## 38    Ohio      ohio     holmes county         5 #CC1180
## 39    Ohio      ohio      huron county        10 #8019F3
## 40    Ohio      ohio    jackson county         6 #BF15A0
## 42    Ohio      ohio       knox county        13 #3F0CF9
## 46    Ohio      ohio      logan county         8 #A61DE0
## 49    Ohio      ohio    madison county         2 #F20420
## 51    Ohio      ohio     marion county        15 #1504FD
## 52    Ohio      ohio     medina county        11 #6A15F5
## 53    Ohio      ohio      meigs county         2 #F20420
## 54    Ohio      ohio     mercer county        12 #5511F7
## 56    Ohio      ohio     monroe county         1 #FF0000
## 58    Ohio      ohio     morgan county         8 #A61DE0
## 59    Ohio      ohio     morrow county         9 #951DF1
## 61    Ohio      ohio      noble county         4 #D90C60
## 62    Ohio      ohio     ottawa county        10 #8019F3
## 63    Ohio      ohio   paulding county         8 #A61DE0
## 64    Ohio      ohio      perry county         5 #CC1180
## 65    Ohio      ohio       pike county        11 #6A15F5
## 66    Ohio      ohio    portage county        14 #2A08FB
## 67    Ohio      ohio     preble county         3 #E50840
## 68    Ohio      ohio     putnam county        13 #3F0CF9
## 70    Ohio      ohio       ross county         9 #951DF1
## 72    Ohio      ohio     scioto county         9 #951DF1
## 78    Ohio      ohio tuscarawas county        13 #3F0CF9
## 79    Ohio      ohio      union county         5 #CC1180
## 80    Ohio      ohio   van wert county         9 #951DF1
## 81    Ohio      ohio     vinton county         5 #CC1180
## 82    Ohio      ohio     warren county         4 #D90C60
## 83    Ohio      ohio washington county        12 #5511F7
## 85    Ohio      ohio   williams county        12 #5511F7
## 87    Ohio      ohio    wyandot county         5 #CC1180

High Education and VAM(1947, 1954)

##1947

plot(vam1947Tractor30$VAM, vam1947Tractor30$avg, main = "Education vs. VAM (1947)")

##1954

plot(vam1954Tractor30$VAM, vam1954Tractor30$avg, main = "Education vs. VAM (1954)")

Notice the lowest points going across the X-Axis(On Plot with line connecting outlier counties). These counties identified left-to-right are:

Pike County
Ross County
Madison County
Fayette County

The 2 Counties which are overlaying each other and can be seen to the right of the outlier line are

Paulding County
Henry County

The objective was to look at the difference in education from 1930 and 1940 and plot that against the % of Farms with Tractors in 1930. The problem I encountered was that the 1930’s Education dataset has not features which indicate the highest grade completed for an individual.

Taking a closer look at the outlier counties listed above(Pike, Ross, Madison, Fayette) but removing everyone who is currently in school, as they could continue their education and in that case the current represented value would not be accurate.

Highest Grade Completed by Age for the 4 outlier counties

pike <- subset(pike, school == "No, not in school")
ggplot(pike, aes(x = age, y = higrade, group = age)) + geom_boxplot() + xlab("Age Groups") + ylab("Highest Grade Completed") + ggtitle("Pike County")

ross <- subset(ross, school == "No, not in school")
ggplot(ross, aes(x = age, y = higrade, group = age)) + geom_boxplot() + xlab("Age Groups") + ylab("Highest Grade Completed") + ggtitle("Ross County")

madison <- subset(madison, school == "No, not in school")
ggplot(madison, aes(x = age, y = higrade, group = age)) + geom_boxplot() + xlab("Age Groups") + ylab("Highest Grade Completed") + ggtitle("Madison County")

fayette <- subset(fayette, school == "No, not in school")
ggplot(fayette, aes(x = age, y = higrade, group = age)) + geom_boxplot() + xlab("Age Groups") + ylab("Highest Grade Completed") + ggtitle("Fayette County")