Introduction

Transit Oriented Development has been a trend that promotes more housing and mixed-use amenities near public transportation. While such development can potentially increase efficiency of transit system operation, it can also be beneficial for land value capitalization which would generate more property tax revenue for the city. To examine whether TOD development would be suitable for our city, we studied TOD in Chicago by comparing socio-demographic information, rent, and crime data between TOD and non-TOD areas across 2009 and 2017. We also examined the relationship between distance to transit stations and rent.

1. Set up

First, we prepared the data we needed for the analysis. We used 2009 and 2017 ACS 5-year data and tracts for Chicago, IL by selecting those from Cook County, IL, and used 2015 CTA ‘L’-lines from Chicago Data Portal. ## 1.1 Setting up libraries, map theme, quintile map breaks, and color pallete

#setwd("C:/Users/m1861/OneDrive/Documents/GitHub/MUSA508_Assignment1")
#install.packages("rgdal")
#install.packages("tmap")
#install.packages("RSocrata")
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
library(knitr)
library(RSocrata)
library(tidyverse)
library(sf)
library(ggplot2)
library(tidycensus)
library(kableExtra)
library(tmap)
library(rgdal)
library(rmarkdown)

options(scipen=999)
options(tigris_class = "sf")

#census_api_key("7e8429f6b3dde9bda3914ce0fdd7f57e1c3d445e", overwrite = TRUE)

mapTheme <- function(base_size = 12) {
  theme(
    text = element_text( color = "black"),
    plot.title = element_text(size = 16,colour = "black"),
    plot.subtitle=element_text(face="italic"),
    plot.caption=element_text(hjust=0),
    axis.ticks = element_blank(),
    panel.background = element_blank(),axis.title = element_blank(),
    axis.text = element_blank(),
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    panel.grid.minor = element_blank(),
    panel.border = element_rect(colour = "black", fill=NA, size=2),
    strip.text.x = element_text(size = 14))
}

plotTheme <- function(base_size = 12) {
  theme(
    text = element_text( color = "black"),
    plot.title = element_text(size = 16,colour = "black"),
    plot.subtitle = element_text(face="italic"),
    plot.caption = element_text(hjust=0),
    axis.ticks = element_blank(),
    panel.background = element_blank(),
    panel.grid.major = element_line("grey80", size = 0.1),
    panel.grid.minor = element_blank(),
    panel.border = element_rect(colour = "black", fill=NA, size=2),
    strip.background = element_rect(fill = "grey80", color = "white"),
    strip.text = element_text(size=12),
    axis.title = element_text(size=12),
    axis.text = element_text(size=10),
    plot.background = element_blank(),
    legend.background = element_blank(),
    legend.title = element_text(colour = "black", face = "italic"),
    legend.text = element_text(colour = "black", face = "italic"),
    strip.text.x = element_text(size = 14)
    )
}

#qBr and q5 functions are used to create quintile map breaks
qBr <- function(df, variable, rnd) {
  if (missing(rnd)) {
    as.character(quantile(round(df[[variable]],0),
                          c(.01,.2,.4,.6,.8), na.rm=T))
  } else if (rnd == FALSE | rnd == F) {
    as.character(formatC(quantile(df[[variable]]), digits = 3),
                          c(.01,.2,.4,.6,.8), na.rm=T)
  }
}

qBr2 <- function(df, variable, rnd) { # quintile break for percentages
  if (missing(rnd)) {
    as.character(quantile(round(df[[variable]],2),
                          c(.01,.2,.4,.6,.8), na.rm=T))# breaks at the 1st, 20th, 40th, 60th, 80th
  } else if (rnd == FALSE | rnd == F) {
    as.character(formatC(quantile(df[[variable]]), digits = 3),
                 c(.01,.2,.4,.6,.8), na.rm=T)
  }
}

q5 <- function(variable) {as.factor(ntile(variable, 5))}

palette5 <- c("#f0f9e8","#bae4bc","#7bccc4","#43a2ca","#0868ac")

1.2 Downloading and wrangling Census data for Chicago, IL

#2009
acs_variable_list.2009 <- load_variables(2009,#year
                                         "acs5",#5 year ACS estimates
                                         cache = TRUE)
initial_tracts09 <-  
  get_acs(geography = "tract", 
          variables = c("B01001_001E", #total pop
                        "B19326_001E", #median inc
                        "B10057_001E", #total marriage
                        "B10057_002E", #now married
                        "B01002_001E", #median age
                        "B08141_001E", #total vehicle user 
                        "B08141_002E", #no vehicle available
                        "B25058_001E"),#median contract rent
          year=2009, 
          state=17, 
          county=031, survey = "acs5",
          geometry=T)
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |=                                                                     |   2%
  |                                                                            
  |==                                                                    |   3%
  |                                                                            
  |===                                                                   |   5%
  |                                                                            
  |====                                                                  |   6%
  |                                                                            
  |=====                                                                 |   8%
  |                                                                            
  |=======                                                               |   9%
  |                                                                            
  |=======                                                               |  10%
  |                                                                            
  |========                                                              |  12%
  |                                                                            
  |=========                                                             |  13%
  |                                                                            
  |==========                                                            |  14%
  |                                                                            
  |===========                                                           |  16%
  |                                                                            
  |============                                                          |  16%
  |                                                                            
  |=============                                                         |  18%
  |                                                                            
  |=============                                                         |  19%
  |                                                                            
  |==============                                                        |  20%
  |                                                                            
  |===============                                                       |  22%
  |                                                                            
  |================                                                      |  23%
  |                                                                            
  |==================                                                    |  25%
  |                                                                            
  |===================                                                   |  27%
  |                                                                            
  |====================                                                  |  28%
  |                                                                            
  |====================                                                  |  29%
  |                                                                            
  |======================                                                |  31%
  |                                                                            
  |======================                                                |  32%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |=========================                                             |  35%
  |                                                                            
  |==========================                                            |  38%
  |                                                                            
  |============================                                          |  39%
  |                                                                            
  |=============================                                         |  41%
  |                                                                            
  |=============================                                         |  42%
  |                                                                            
  |==============================                                        |  43%
  |                                                                            
  |===============================                                       |  44%
  |                                                                            
  |================================                                      |  46%
  |                                                                            
  |=================================                                     |  47%
  |                                                                            
  |==================================                                    |  48%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |===================================                                   |  51%
  |                                                                            
  |====================================                                  |  52%
  |                                                                            
  |======================================                                |  54%
  |                                                                            
  |=======================================                               |  56%
  |                                                                            
  |========================================                              |  57%
  |                                                                            
  |=========================================                             |  58%
  |                                                                            
  |==========================================                            |  60%
  |                                                                            
  |===========================================                           |  61%
  |                                                                            
  |============================================                          |  63%
  |                                                                            
  |=============================================                         |  65%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |================================================                      |  69%
  |                                                                            
  |=================================================                     |  70%
  |                                                                            
  |===================================================                   |  73%
  |                                                                            
  |=====================================================                 |  76%
  |                                                                            
  |======================================================                |  77%
  |                                                                            
  |========================================================              |  80%
  |                                                                            
  |==========================================================            |  82%
  |                                                                            
  |===========================================================           |  84%
  |                                                                            
  |============================================================          |  85%
  |                                                                            
  |============================================================          |  86%
  |                                                                            
  |=============================================================         |  88%
  |                                                                            
  |==============================================================        |  89%
  |                                                                            
  |===============================================================       |  90%
  |                                                                            
  |================================================================      |  92%
  |                                                                            
  |=================================================================     |  92%
  |                                                                            
  |==================================================================    |  94%
  |                                                                            
  |==================================================================    |  95%
  |                                                                            
  |====================================================================  |  96%
  |                                                                            
  |===================================================================== |  99%
  |                                                                            
  |======================================================================| 100%
tracts09 <- initial_tracts09 %>%
  st_transform('ESRI:102728') %>%
  dplyr::select( -NAME, -moe) %>%
  spread(variable, estimate) %>% # transpose 
  dplyr::select(-geometry) %>%
  rename(TotalPop = B01001_001,
         MedianIncome = B19326_001,
         TotalMarriage = B10057_001, 
         Married = B10057_002,
         MedianAge = B01002_001, 
         NoVehicle=B08141_002,
         MedianRent = B25058_001,
         TotalVehicle= B08141_001) %>%
  mutate(Married_perc = ifelse(TotalMarriage > 0, Married / TotalMarriage, 0),# prevents divide by 0 errors
         NoVehicle_perc = ifelse(TotalVehicle > 0, NoVehicle / TotalVehicle, 0), 
         year = "2009")%>%
dplyr::select(-Married, -NoVehicle, -TotalVehicle, -TotalMarriage)

#if need to select only chicago tracts, use this code:
ChicagoTracts <- read.csv("ChicagoTracts.csv")
ChicagoGEOID <- ChicagoTracts$GEOID10
tracts09 <- subset(tracts09, tracts09$GEOID %in% ChicagoGEOID)

#2017
initial_tracts17 <- 
  get_acs(geography = "tract", 
          variables = c("B01001_001E", #total pop
                        "B19326_001E", #median inc
                        "B10057_001E", #total marriage
                        "B10057_002E", #now married
                        "B01002_001E", #median age
                        "B08141_001E", #total vehicle user 
                        "B08141_002E", #no vehicle available
                        "B25058_001E"),#median contract rent
          year=2017, 
          state=17, 
          county=031, survey = "acs5",
          geometry=T) %>%
  st_transform('ESRI:102728')
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |=                                                                     |   1%
  |                                                                            
  |==                                                                    |   3%
  |                                                                            
  |===                                                                   |   5%
  |                                                                            
  |====                                                                  |   6%
  |                                                                            
  |=====                                                                 |   8%
  |                                                                            
  |=======                                                               |   9%
  |                                                                            
  |=======                                                               |  11%
  |                                                                            
  |=========                                                             |  12%
  |                                                                            
  |==========                                                            |  14%
  |                                                                            
  |===========                                                           |  15%
  |                                                                            
  |============                                                          |  17%
  |                                                                            
  |=============                                                         |  19%
  |                                                                            
  |===============                                                       |  22%
  |                                                                            
  |================                                                      |  23%
  |                                                                            
  |==================                                                    |  26%
  |                                                                            
  |====================                                                  |  28%
  |                                                                            
  |======================                                                |  31%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |=========================                                             |  36%
  |                                                                            
  |==========================                                            |  38%
  |                                                                            
  |============================                                          |  40%
  |                                                                            
  |==============================                                        |  42%
  |                                                                            
  |================================                                      |  45%
  |                                                                            
  |=================================                                     |  47%
  |                                                                            
  |==================================                                    |  48%
  |                                                                            
  |===================================                                   |  49%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |====================================                                  |  52%
  |                                                                            
  |======================================                                |  54%
  |                                                                            
  |======================================                                |  55%
  |                                                                            
  |=======================================                               |  56%
  |                                                                            
  |========================================                              |  58%
  |                                                                            
  |=========================================                             |  59%
  |                                                                            
  |==========================================                            |  59%
  |                                                                            
  |==========================================                            |  60%
  |                                                                            
  |===========================================                           |  61%
  |                                                                            
  |============================================                          |  62%
  |                                                                            
  |============================================                          |  63%
  |                                                                            
  |=============================================                         |  64%
  |                                                                            
  |==============================================                        |  66%
  |                                                                            
  |================================================                      |  69%
  |                                                                            
  |=================================================                     |  70%
  |                                                                            
  |===================================================                   |  73%
  |                                                                            
  |====================================================                  |  75%
  |                                                                            
  |=====================================================                 |  75%
  |                                                                            
  |=====================================================                 |  76%
  |                                                                            
  |======================================================                |  78%
  |                                                                            
  |=======================================================               |  78%
  |                                                                            
  |========================================================              |  80%
  |                                                                            
  |=========================================================             |  81%
  |                                                                            
  |==========================================================            |  82%
  |                                                                            
  |==========================================================            |  83%
  |                                                                            
  |===========================================================           |  84%
  |                                                                            
  |===========================================================           |  85%
  |                                                                            
  |============================================================          |  86%
  |                                                                            
  |=============================================================         |  87%
  |                                                                            
  |=============================================================         |  88%
  |                                                                            
  |==============================================================        |  89%
  |                                                                            
  |===============================================================       |  90%
  |                                                                            
  |================================================================      |  92%
  |                                                                            
  |=================================================================     |  92%
  |                                                                            
  |=================================================================     |  93%
  |                                                                            
  |==================================================================    |  94%
  |                                                                            
  |===================================================================   |  95%
  |                                                                            
  |===================================================================   |  96%
  |                                                                            
  |====================================================================  |  97%
  |                                                                            
  |===================================================================== |  98%
  |                                                                            
  |===================================================================== |  99%
  |                                                                            
  |======================================================================| 100%
tracts17 <- initial_tracts17 %>%
  dplyr::select( -NAME, -moe) %>% 
  spread(variable, estimate) %>% # transpose 
  dplyr::select(-geometry) %>%
  rename(TotalPop = B01001_001,
         MedianIncome = B19326_001,
         TotalMarriage = B10057_001, 
         Married = B10057_002,
         MedianAge = B01002_001, 
         NoVehicle=B08141_002,
         MedianRent = B25058_001,
         TotalVehicle= B08141_001) %>%
  mutate(Married_perc = ifelse(TotalMarriage > 0, Married / TotalMarriage, 0),# prevents divide by 0 errors
         NoVehicle_perc = ifelse(TotalVehicle > 0, NoVehicle / TotalVehicle, 0),  
         year = "2017") %>%
  dplyr::select(-Married, -NoVehicle, -TotalVehicle, -TotalMarriage)
tracts17 <- subset(tracts17, tracts17$GEOID %in% ChicagoGEOID)

#Combine them
Tracts0917 <- rbind(tracts09,tracts17)

1.3 Wrangling transit open data for Chicago Transportation Authority

CTAStops <- 
    st_read("CTA_RailStations.kml") %>% 
  rename(Station=Name)%>%
  mutate(CTA_Rail="CTA_Rail")%>%
  st_transform(st_crs(tracts09))  
## Reading layer `CTA_RailStations' from data source `C:\Users\m1861\OneDrive\Documents\GitHub\MUSA508_Assignment1\CTA_RailStations.kml' using driver `KML'
## Simple feature collection with 144 features and 2 fields
## geometry type:  POINT
## dimension:      XYZ
## bbox:           xmin: -87.90185 ymin: 41.72238 xmax: -87.60586 ymax: 42.07315
## z_range:        zmin: 0 zmax: 0
## geographic CRS: WGS 84
#ggplot() + 
#  geom_sf(data=st_union(tracts09)) +
#  geom_sf(data=CTAStops, 
#          aes(colour = CTA_Rail), # assign color to the CTA_Rail point attribute
#          show.legend = "point", size= 2) +
#  scale_colour_manual(values = c("blue")) +
#  labs(title="CTA Rail Stations", 
#       subtitle="Cook County,IL", 
#       caption="Figure x") +
#  mapTheme()

CTABuffers <- 
  rbind(
    st_buffer(CTAStops, 2640) %>%
      mutate(Legend = "Buffer") %>% # give individual buffers a legend column named "buffer"
      dplyr::select(Legend),# necessary step? because legend is already selected
    st_union(st_buffer(CTAStops, 2640)) %>%
      st_sf() %>% # the st_sf convert the union output into an sf layer
      mutate(Legend = "Unioned Buffer")) # give the 1 unioned buffer geom a column named "Unioned Buffer"

# Create an sf object with ONLY the unioned buffer
buffer <- filter(CTABuffers, Legend=="Unioned Buffer")

#population and CTA, not used for assignment
totalPop09 <- initial_tracts09 %>%
  filter(variable=="B01001_001")

# Do a centroid-in-polygon join to see which tracts have their centroid in the buffer
# Note the st_centroid call creating centroids for each feature
selectCentroids09 <-
  st_centroid(tracts09)[buffer,] %>% # get centroid of tracts within the buffer
  st_drop_geometry() %>% # converts the sf to a data frame
  left_join(dplyr::select(tracts09, GEOID)) %>% # join the sf back to the tracts09 based on GEOID
  st_sf() %>% # convert data frame to sf with polygon geometries
  dplyr::select(TotalPop) %>% # add the TopPop as an output column
  mutate(Selection_Type = "Select by Centroids") # add a column to identify the selection type

selectCentroids17 <-
  st_centroid(tracts17)[buffer,] %>% # get centroid of tracts within the buffer
  st_drop_geometry() %>% # converts the sf to a data frame
  left_join(dplyr::select(tracts17, GEOID)) %>% # join the sf back to the tracts09 based on GEOID
  st_sf() %>% # convert data frame to sf with polygon geometries
  dplyr::select(TotalPop) %>% # add the TopPop as an output column
  mutate(Selection_Type = "Select by Centroids")

1.4. Distinguishing TOD and non-TOD tracts

Tracts0917.group <- 
  rbind(
    st_centroid(Tracts0917)[buffer,] %>%
      st_drop_geometry() %>%
      left_join(Tracts0917) %>%
      st_sf() %>%
      mutate(TOD = "TOD"),# assign TOD in the TOD field for the buffered tracts
    st_centroid(Tracts0917)[buffer, op = st_disjoint] %>% # disjoint- inverse the selection
      st_drop_geometry() %>%
      left_join(Tracts0917) %>%
      st_sf() %>%
      mutate(TOD = "Non-TOD")) %>% #assign non-TOD in the TOD field for the buffered tracts
  mutate(MedianRent.inf = ifelse(year == "2009", MedianRent * 1.14, MedianRent))%>% # adjust for inflation to make it comparable across time
mutate(MedianIncome.inf = ifelse(year == "2009", MedianIncome * 1.14, MedianIncome))%>%
mutate(NoVehicle_perc1 = NoVehicle_perc * 100, NoVehicle_perc,
       Married_perc1 = Married_perc * 100, Married_perc)

# compare TOD and Non-TOD Areas Only 2009-2017
ggplot(Tracts0917.group)+
  geom_sf(data = st_union(tracts09))+# create basemap through union of tracts
  geom_sf(aes(fill = TOD)) +# apply fill to TOD only
  labs(title = "TOD map of Chicago, 2009 vs 2017",
       subtitle = "by Census tract",
       caption="Figure 1: TOD map of Chicago") +
  facet_wrap(~year)+ # maps side by side based on year attribute
  mapTheme() + # apply default format settings for map
  theme(plot.title = element_text(size=22))

2. Demographics information of residents living in TOD and Non-TOD tracts, 2009 vs 2017

Our hypothesis is that residents will pay higher rents if they wish to live in areas close to transit stations relative to places at distance. But first, we must understand the characteristics of people that are living near transit in Chicago right now. Therefore, we selected four indicators - age, median income, vehicle ownership, population - to make this interpretation.

4 small-multiple visualizations comparing 4 selected Census variables across time and space

As the red border denotes areas close to CTA stations, the maps (figure 1.1-1.4) illustrate the characteristics of citizens across space (TOD and Non-TOD tracts) and time (2009 and 2017).

med_income_0917_maps<- ggplot(Tracts0917.group)+
  geom_sf(data = st_union(tracts09))+
  geom_sf(aes(fill = q5(MedianIncome.inf))) + # inlcude the inflation-adjusted income
  geom_sf(data = buffer, fill = "transparent", color = "red")+ # buffer represents the TOD area
  scale_fill_manual(values = palette5,
                    labels = qBr(Tracts0917.group, "MedianIncome.inf"),
                    name = "Median Income\n(Quintile Breaks)") +
  labs(title = "Median Income 2009 vs 2017", subtitle = "Real Dollars; The red border denotes areas close to suway stations",
       caption="Figure 1.1: Median income, TOD vs non-TOD ") +
  facet_wrap(~year)+
  mapTheme() + 
  theme(plot.title = element_text(size=22))
med_income_0917_maps

novehicle_0917_maps<- ggplot(Tracts0917.group)+
  geom_sf(data = st_union(tracts09))+
  geom_sf(aes(fill = q5(NoVehicle_perc1))) + 
  geom_sf(data = buffer, fill = "transparent", color = "red")+
  scale_fill_manual(values = palette5,
                    labels = qBr2(Tracts0917.group, "NoVehicle_perc1"),
                    name = "Percentage No Vehicle\n(Quintile Breaks)") +
  labs(title = "No Vehicle Ownership 2009 vs 2017", subtitle = "as a %; The red border denotes areas close to suway stations",
       caption="Figure 1.2: No vehicle ownership, TOD vs non-TOD") +
  facet_wrap(~year)+
  mapTheme() + 
  theme(plot.title = element_text(size=22))
novehicle_0917_maps

med_age_0917_maps<- ggplot(Tracts0917.group)+
  geom_sf(data = st_union(tracts09))+
  geom_sf(aes(fill = q5(MedianAge))) + 
  geom_sf(data = buffer, fill = "transparent", color = "red")+
  scale_fill_manual(values = palette5,
                    labels = qBr(Tracts0917.group, "MedianAge"),
                    name = "Median Age\n(Quintile Breaks)") +
  labs(title = "Median Age 2009 vs 2017", subtitle = "Years Old; The red border denotes areas close to suway stations", caption="Figure 1.3: Median age, TOD vs non-TOD") +
  facet_wrap(~year)+
  mapTheme() + 
  theme(plot.title = element_text(size=22))
med_age_0917_maps

pop_0917_maps<-ggplot(Tracts0917.group)+
  geom_sf(data = st_union(tracts09))+
  geom_sf(aes(fill = q5(TotalPop))) + 
  geom_sf(data = buffer, fill = "transparent", color = "red")+ # buffer represents the TOD area
  scale_fill_manual(values = palette5,
                    labels = qBr(Tracts0917.group, "TotalPop"),
                    name = "Population\n(Quintile Breaks)") +
  labs(title = "Population 2009 vs 2017", subtitle = "by tract; The red border denotes areas close to suway stations", caption="Figure 1.4: Population, TOD vs. non-TOD ") +
  facet_wrap(~year)+
  mapTheme() + 
  theme(plot.title = element_text(size=22))
pop_0917_maps

We also generate an additional map illustrating median rent across space (TOD and Non-TOD tracts) and time (2009 and 2017)

rent_0917_maps<-ggplot(Tracts0917.group)+
  geom_sf(data = st_union(tracts09))+
  geom_sf(aes(fill = q5(MedianRent))) + 
  geom_sf(data = buffer, fill = "transparent", color = "red")+ # buffer represents the TOD area
  scale_fill_manual(values = palette5,
                    labels = qBr(Tracts0917.group, "TotalPop"),
                    name = "Median Rent\n(Quintile Breaks)") +
  labs(title = "Median Rent 2009 vs 2017", subtitle = "by tract",
       captions = "Figure 1,5: Median rent, TOD vs non-TOD") +
  facet_wrap(~year)+
  mapTheme() + 
  theme(plot.title = element_text(size=22))
rent_0917_maps

TOD indicator table

Table 1 summarizes the average age, average income, average share of no vehicle ownership, average population, and average rent of all transit-rich (TOD) tracts and non-TOD tracts, for both 2009 and 2017.

Tracts0917.Summary <- # variable for the summary stats
  st_drop_geometry(Tracts0917.group) %>%
  group_by(year, TOD) %>% # show diff across year and TOD field type (TOD or no TOD)
  summarize(Rent = mean(MedianRent.inf, na.rm = T), # na.rm remove missing values
            Population = mean(TotalPop, na.rm = T),
            Income = mean(MedianIncome.inf, na.rm = T),
            Percent_NoVehicle = mean(NoVehicle_perc1, na.rm = T),
            Age = mean(MedianAge, na.rm = T))

#kable(Tracts0917.Summary) %>% # generate table with kable
#  kable_styling() %>%
#  footnote(general_title = "\n",
#           general = "Summary Statistics for Selected Census Variables 2009-2017 for TOD and Non-TOD")

Tracts0917.Summary %>%
  unite(year.TOD, year, TOD, sep = ": ", remove = T) %>%
  gather(Variable, estimate, -year.TOD) %>% # note that "estimate" represent the value for each field
  mutate(estimate = round(estimate, 2)) %>% #"gather" converts data to long form using year.TOD as the grouping variable
  spread(year.TOD, estimate) %>%
  kable() %>%
  kable_styling() %>%
  footnote(general_title = "\n",
           general = "Table 1: Summary Statistics for Selected Census Variables 2009-2017 for TOD and No-TOD")
Variable 2009: Non-TOD 2009: TOD 2017: Non-TOD 2017: TOD
Age 34.05 32.09 35.79 33.74
Income 28564.40 34917.47 27917.09 34831.83
Percent_NoVehicle 12.39 18.11 13.59 21.81
Population 3670.91 3083.30 3617.77 3121.05
Rent 830.36 940.15 859.81 1023.54

Table 1: Summary Statistics for Selected Census Variables 2009-2017 for TOD and No-TOD

TOD indicator plots

The indicators are visualized in a grouped bar plot, as shown in Figure 2.

Tracts0917SummaryPlots<-Tracts0917.Summary %>%
  gather(Variable, estimate, -year, -TOD) %>% # move data into the long form using year and TOD as grouping variables
  ggplot(aes(year, estimate, fill = TOD)) + # year is the x axis, bar color filled by TOD
  geom_bar(stat = "identity", position = "dodge") + # tells that the y-axis estimate is provided, position the bars side-by-side
  facet_wrap(~Variable, scales = "free", ncol=5) +# small multiple plots across each variable, scales=free allows y-axis to vary with the scale of each variable
  scale_fill_manual(values = c("#bae4bc", "#0868ac")) +
  labs(title = "Indicator differences 2009 vs 2017 for TOD and Non-TOD",
       caption = "Figure 2: Bar plot for indicators") +
  plotTheme() + theme(legend.position="bottom",
                      strip.text.x = element_text(size=8))
Tracts0917SummaryPlots

From the maps, the table, and the plots, we understand more about TOD in Chicago. Between 2009 and 2017, the City incorporated more tracts. While its population and citizens’ income have not changed much, rents have generally increased and more people have relied less on private vehicles. In 2009, the rent difference between TOD and non-TOD tracts is 109.79 dollars, and in 2017 this has increased by 49% to 163.73 dollars. Meanwhile, in 2009, the income difference between TOD and non-TOD tracts is 6353.07 dollars, and in 2017 this has only increased by 9% to 6914.74 dollars. This shows that for people living near transit, rent has shared a larger percentage of their income.

Comparing TOD and non-TOD residents, for both years, although more people live in non-TOD neighborhoods, people living in transit-rich neighborhoods tend to be younger in age, higher in income, and more inclined to not owning a vehicle. This indicates that people living in transit-rich areas have paid more rents.

However, transit-access is not the only factor for higher rents of those areas. Some of the transit-rich areas are within the central business district and more affluent neighborhoods in Chicago. Rather than the transit facilities, the location and other amenities in these neighborhoods might be the reason of the higher rent. Additionally, since CTA was not built in 2009, we are not comparing the rents before and after the establishments of CTA.

3. Are Chicagoans willing to pay more for TOD?

Two graduated symbol maps of population and rent within 0.5 mile of each transit station

It is important to understand the relationship between transit access and rent in context of the location of the transit stations and the transit-rich neighborhood. With the blue dots denoting CTA stations and the blue circles denoting the estimate, these graduate symbol maps show the estimated population and median rent for tracts within 0.5 mile of each CTA station.

From Figure 3.1, we can see that population of areas near the neighborhood has increased. However, the North side of the city has seen a larger increase than the South side of the city.

CTAStops_new <- CTAStops
CTAStops_new <- st_join(CTAStops_new, Tracts0917.group, join = st_intersects) %>%
  dplyr::mutate(lat = sf::st_coordinates(.)[,1],
                lon = sf::st_coordinates(.)[,2]) %>%
  na.omit()

#pop
graduate_pop <- ggplot(CTAStops_new)+
  geom_sf(data = Tracts0917) +
  geom_point(data=CTAStops_new, aes(lat,lon, size=TotalPop), color="#0868ac", fill="black", alpha=0.5, na.rm = T) +
  scale_size_area(max_size = 4) +
  labs(title = "Graduate symbol map of population \nwithin 1/2mi. of CTA stations, 2009 vs 2017",
       caption = "Figure 3.1, graduate symbol map of population") +
  facet_wrap(~year)+ # maps side by side based on year attribute
  mapTheme() + # apply default format settings for map
  theme(plot.title = element_text(size=22)) +
  geom_sf(data=CTAStops, 
          aes(colour = CTA_Rail), # assign color to the CTA_Rail point attribute
          show.legend = "point", size= 1) +
  scale_colour_manual(values = c("blue"))  
graduate_pop

From Figure 3.2, we can see that the median rent of areas near the neighborhood has increased. Same as this for population, the North side of the city has seen a larger increase than the South side of the city.

#rent
graduate_rent <- ggplot(CTAStops_new)+
  geom_sf(data = Tracts0917) +
  geom_point(data=CTAStops_new, aes(lat,lon, size=MedianRent.inf), color="#0868ac", fill="black", alpha=0.5, na.rm = T) +
  scale_size_area(max_size = 4) +
  labs(title = "Graduate symbol map of median rent \nwithin 1/2mi. of CTA stations, 2009 vs 2017",
       caption = "Figure 3.2: graduate symbol map of median rent") +
  facet_wrap(~year)+ # maps side by side based on year attribute
  mapTheme() + # apply default format settings for map
  theme(plot.title = element_text(size=22)) +
  geom_sf(data=CTAStops, 
          aes(colour = CTA_Rail), # assign color to the CTA_Rail point attribute
          show.legend = "point", size= 1) +
  scale_colour_manual(values = c("blue"))  
graduate_rent

Despite that these maps provide evidence for our analysis, it is important to know that they have unavoidable scale bias. Estimating the population and median rent for the area within 0.5 mile of CTA stations is done by joining the estimates of the tracts the stations are located in. This is definitely not the most accurate number since we didn’t survey the actual population within the areas. Nonetheless, the maps still provide useful information for planners.

Relationship between rent and distance to CTA stations

To take a closer look at the relationship between rent and distance to transit, we used multiple ring buffers to generate distances to transit and plot them with respect to the average median rent of the intersected tracts. The plot suggests that generally, rent decreases as distance to transit increases. However, this relationship is most significant for those within 1-mile of CTA stations. As distance increases, the trends of rent vary.

multipleRingBuffer <- function(inputPolygon, maxDistance, interval) 
{
  #create a list of distances that we'll iterate through to create each ring
  distances <- seq(0, maxDistance, interval)
  #we'll start with the second value in that list - the first is '0'
  distancesCounter <- 2
  #total number of rings we're going to create
  numberOfRings <- floor(maxDistance / interval)
  #a counter of number of rings
  numberOfRingsCounter <- 1
  #initialize an otuput data frame (that is not an sf)
  allRings <- data.frame()
  
  #while number of rings  counteris less than the specified nubmer of rings
  while (numberOfRingsCounter <= numberOfRings) 
  {
    #if we're interested in a negative buffer and this is the first buffer
    #(ie. not distance = '0' in the distances list)
    if(distances[distancesCounter] < 0 & distancesCounter == 2)
    {
      #buffer the input by the first distance
      buffer1 <- st_buffer(inputPolygon, distances[distancesCounter])
      #different that buffer from the input polygon to get the first ring
      buffer1_ <- st_difference(inputPolygon, buffer1)
      #cast this sf as a polygon geometry type
      thisRing <- st_cast(buffer1_, "POLYGON")
      #take the last column which is 'geometry'
      thisRing <- as.data.frame(thisRing[,ncol(thisRing)])
      #add a new field, 'distance' so we know how far the distance is for a give ring
      thisRing$distance <- distances[distancesCounter]
    }
    
    
    #otherwise, if this is the second or more ring (and a negative buffer)
    else if(distances[distancesCounter] < 0 & distancesCounter > 2) 
    {
      #buffer by a specific distance
      buffer1 <- st_buffer(inputPolygon, distances[distancesCounter])
      #create the next smallest buffer
      buffer2 <- st_buffer(inputPolygon, distances[distancesCounter-1])
      #This can then be used to difference out a buffer running from 660 to 1320
      #This works because differencing 1320ft by 660ft = a buffer between 660 & 1320.
      #bc the area after 660ft in buffer2 = NA.
      thisRing <- st_difference(buffer2,buffer1)
      #cast as apolygon
      thisRing <- st_cast(thisRing, "POLYGON")
      #get the last field
      thisRing <- as.data.frame(thisRing$geometry)
      #create the distance field
      thisRing$distance <- distances[distancesCounter]
    }
    
    #Otherwise, if its a positive buffer
    else 
    {
      #Create a positive buffer
      buffer1 <- st_buffer(inputPolygon, distances[distancesCounter])
      #create a positive buffer that is one distance smaller. So if its the first buffer
      #distance, buffer1_ will = 0. 
      buffer1_ <- st_buffer(inputPolygon, distances[distancesCounter-1])
      #difference the two buffers
      thisRing <- st_difference(buffer1,buffer1_)
      #cast as a polygon
      thisRing <- st_cast(thisRing, "POLYGON")
      #geometry column as a data frame
      thisRing <- as.data.frame(thisRing[,ncol(thisRing)])
      #add teh distance
      thisRing$distance <- distances[distancesCounter]
    }  
    
    #rbind this ring to the rest of the rings
    allRings <- rbind(allRings, thisRing)
    #iterate the distance counter
    distancesCounter <- distancesCounter + 1
    #iterate the number of rings counter
    numberOfRingsCounter <- numberOfRingsCounter + 1
  }
  
  #convert the allRings data frame to an sf data frame
  allRings <- st_as_sf(allRings)
}
    
buffers <- multipleRingBuffer(buffer, 100000, 1320)
#clip butters to median rent, calculate rent

#copy original tract
RentTracts <- Tracts0917.group
#join buffer to rent tracts
RentTracts_joined <- st_join(RentTracts, buffers, join = st_intersects)
#summarize average median rent by distance by year
Rent.Summary <- # variable for the summary stats
  st_drop_geometry(RentTracts_joined) %>%
  group_by(year, distance) %>% # show diff across year and TOD field type (TOD or no TOD)
  summarize(AverageMedianRent = mean(MedianRent, na.rm = T)) %>%
  na.omit() %>%
  mutate(miles=distance/5280)

RentPlot <-
  ggplot(data=Rent.Summary, aes(x=miles, y=AverageMedianRent, group=year, color=year)) +# year is the x axis, bar color filled by TOD
  geom_line(size=2) + 
  scale_fill_manual(values = c("#bae4bc", "#0868ac")) +
  labs(title = "Rent as a function of distance to subway stations", subtitle = "Census tracts",
       caption="Figure 4: Relationship between distance to subway stations and rent")  +
  plotTheme() +
  theme(legend.position="bottom",
                      strip.text.x = element_text(size=8))
RentPlot 

Relationship between theft, rent, and distance to CTA stations

First, we downloaded and wrangled crime data from Chicago Data Portal. We selected theft as the primary type and kept the data for 2009 and 2017. We then joined theft points to tracts and counted the number of theft per tract. Figure 5.1 shows the theft count by census tracts, with the red border denoting areas close to CTA stations. It provides a preliminary evidence that transit access and theft risk could be related, as the areas close to CTA stations overlay with plenty of tracts with high theft counts. However, some tracts that are not close to CTA stations also demonstrate high crime counts, especially in the South side of the city.

#theft data as crime type
theft <- 
  read.socrata("https://data.cityofchicago.org/resource/ijzp-q8t2.json?primary_type=THEFT")
theft.v1 <- 
  theft %>%
  filter(year%in%c('2009','2017')) %>%
  na.omit() %>%
  st_as_sf(coords = c("longitude","latitude"), crs=st_crs(4326), agr="constant") %>%
  st_transform("ESRI:102728")

#copy tract to save old tract data before joining theft counts
TractsTheft <- Tracts0917.group

#joining theft points to tracts
TheftTracts_joined <- st_join(TractsTheft, theft.v1, join = st_intersects)
TheftCount <- count(as_tibble(TheftTracts_joined), GEOID)
TheftCount.tract <- left_join(TheftCount, TractsTheft) %>%
  rename("TheftCount" = n) %>%
  st_as_sf()

#plot theft
theft.locations <- ggplot(TheftCount.tract[st_union(tracts09),]) +
  geom_sf(data = st_union(tracts09)) +
  geom_sf(aes(fill = q5(TheftCount))) + 
  geom_sf(data = buffer, fill = "transparent", color = "red")+ # buffer represents the TOD area
  scale_fill_manual(values = palette5,
                    labels = qBr(TheftCount.tract, "TheftCount"),
                    name = "Theft Count\n(Quintile Breaks)") +
  labs(title = "Theft Map, 2009 vs 2017", subtitle = "by tract",
       caption = "Figure 5.1: Theft map") +
  facet_wrap(~year) +
  mapTheme() + 
  theme(plot.title = element_text(size=22))
theft.locations

For further analysis, we first summarize the average median rent and average theft count by TOD and non-TOD tracts, as shown in Table 2 and visualized in Figure 5.2.

#theft summary
Theft.Summary <- # variable for the summary stats
  st_drop_geometry(TheftCount.tract) %>%
  group_by(year, TOD) %>% # show diff across year and TOD field type (TOD or no TOD)
  summarize(TheftCount = mean(TheftCount, na.rm = T), # na.rm remove missing values
            MedianRent = mean(MedianRent, na.rm = T))

#kable(Theft.Summary) %>% # generate table with kable
#  kable_styling() %>%
#  footnote(general_title = "\n",
#           general = "Theft Count and Median Rent for TOD and non-TOD")

Theft.Summary.plot <-
  Theft.Summary %>%
  unite(year.TOD, year, TOD, sep = ": ", remove = T) %>%
  gather(Variable, estimate, -year.TOD) %>% # note that "estimate" represent the value for each field
  mutate(estimate = round(estimate, 2)) %>% #"gather" converts data to long form using year.TOD as the grouping variable
  spread(year.TOD, estimate) %>%
  kable() %>%
  kable_styling() %>%
  footnote(general_title = "\n",
           general = "Table 2: Theft Count and Median Rent for TOD and non-TOD")
Theft.Summary.plot
Variable 2009: Non-TOD 2009: TOD 2017: Non-TOD 2017: TOD
MedianRent 728.38 824.70 859.81 1023.54
TheftCount 270.64 401.69 248.03 355.91

Table 2: Theft Count and Median Rent for TOD and non-TOD
TheftBar <-
  Theft.Summary %>%
  gather(Variable, estimate, -year, -TOD) %>% # move data into the long form using year and TOD as grouping variables
  ggplot(aes(year, estimate, fill = TOD)) + # year is the x axis, bar color filled by TOD
  geom_bar(stat = "identity", position = "dodge") + # tells that the y-axis estimate is provided, position the bars side-by-side
  facet_wrap(~Variable, scales = "free", ncol=5) +# small multiple plots across each variable, scales=free allows y-axis to vary with the scale of each variable
  scale_fill_manual(values = c("#bae4bc", "#0868ac")) +
  labs(title = "Median Rent and Theft Occurance 2009-2017, \nfor TOD and Non-TOD",
       caption = "Figure 5.2: Bar plot for theft count and median rent for TOD and non-TOD") +
  plotTheme() + theme(legend.position="bottom",
                      strip.text.x = element_text(size=8))
TheftBar

Then, we examined the relationship between distance to transit and theft occurrence as well as theft and rent, as the relationship between distance to transit and rent was already analyzed as shown in Figure 4.

According to Figure 5.3, the general relationship between distance to transit and theft occurrence is that, as distance increase, theft occurrence decrease. However, this trend is not a perfect fit for the data distributed, as it is obvious that the trend starts to inverse after roughly 4.2 miles. However, it is interesting that similar to rent and distance, this relationship is most significant for those within around 1-mile of CTA stations.

#line plot based on distance, rent, theft occurrence
#already has distance and rent
#RentPlot
#theft count and distance
TheftBuffer_joined <- st_join(TheftCount.tract, buffers, join = st_intersects)
#summarize average median rent by distance by year
TheftBuffer.Summary <- # variable for the summary stats
  st_drop_geometry(TheftBuffer_joined) %>%
  group_by(year, distance) %>% # show diff across year and TOD field type (TOD or no TOD)
  summarize(AverageTheftCount = mean(TheftCount, na.rm = T)) %>%
  na.omit() %>%
  mutate(miles=distance/5280)

TheftBufferPlot <-
  ggplot(data=TheftBuffer.Summary, aes(x=miles, y=AverageTheftCount, group=year, color=year)) +# year is the x axis, bar color filled by TOD
  geom_line(size=2) + 
  scale_fill_manual(values = c("#bae4bc", "#0868ac")) +
  labs(title = "Theft occurrence as a function of distance to subway stations", subtitle = "Census tracts",
       caption="Figure 5.3: Relationship between distance to subway stations and theft occurrence")  +
  plotTheme() +
  theme(legend.position="bottom",
                      strip.text.x = element_text(size=8))
TheftBufferPlot

As for theft occurrence and rent, Figure 5.4 indicates a strong positive association between them. However, the data are very skewed in this figure. After zooming in to most of the data points which are those tracts with theft occurrence smaller than 500, as Figure 5.5 shows, their association is actually very small, as the slope is almost flat.

#how about theft and rent?
TheftRentPlot <-
  ggplot(data=TheftBuffer_joined, aes(x=TheftCount, y=MedianRent.inf, group=year, color=year)) +# year is the x axis, bar color filled by TOD
  geom_point(size=1) + 
  geom_smooth(method=lm,se=F)+
  scale_fill_manual(values = c("#bae4bc", "#0868ac")) +
  labs(title = "Rent as a function of theft occurrences", subtitle = "Census tracts",
       caption="Figure 5.4: Relationship between theft and rent")  +
  plotTheme() +
  theme(legend.position="bottom",
                      strip.text.x = element_text(size=8))

TheftRentPlot_closer <-
  ggplot(data=TheftBuffer_joined, aes(x=TheftCount, y=MedianRent.inf, group=year, color=year)) +# year is the x axis, bar color filled by TOD
  geom_point(size=1) + 
  geom_smooth(method=lm,se=F)+
  scale_x_continuous(breaks = seq(0, 2000, by = 100)) +
  xlim(0,500) +
  scale_fill_manual(values = c("#bae4bc", "#0868ac")) +
  labs(title = "Rent as a function of theft occurrences", subtitle = "Census tracts",
       caption="Figure 5.5: Relationship between theft and rent, for theft count smaller than 500")  +
  plotTheme() +
  theme(legend.position="bottom",
                      strip.text.x = element_text(size=8))
TheftRentPlot

TheftRentPlot_closer

Our preliminary analysis suggests that theft occurrence and rent generally decreases as distance to transit increase, though exceptions exist. The central tendency indicator for both theft occurrence and rent between TOD and non-TOD is more obvious, as the median theft occurrence and median rent are much higher in TOD tracts than those in non-TOD tracts.

Conclusion

In conclusion, Chicagoans tend to be willing to pay more rents in order to live in transit-rich neighborhoods. Comparing results for 2009 and those for 2017, the population living in TOD tracts has increased even as the rent also has increased. For those younger, higher-income, inclined to not own vehicles, not objected to higher-density, they have been enjoying the convenience of living near CTA stations. For the relationship between theft, transit access and rents, it is evident that theft occurrence and median rent are higher for transit-rich neighborhoods. For tracts within ~1 mile radius of transit, theft occurrence and rent increase as distance to transit increases. Yet no significant relationship between rent and theft was found.

Some limitations to this analysis include: First, there are scale biases when joining ACS estimates to points and shapes. Second, the TOD area we use for 2009 and 2017 is the same, as the CTA data are only available for one given year. Third, CTA was developed long before 2009, so our analysis was not able to compare the data before and after CTA was constructed, which reduces the significance for the comparison across time.

Nonetheless, our analysis of Chicago’s TOD case provides evidence that land value should increase for the transit-rich neighborhood, and people are willing to live in such places.