New York City Subway

Matteo Rulli, FlairBit

3/10/2019

Credits

NY Subway Quick Facts

  • 36 rail lines
  • 472 stations, 470 are served 24 hours a day
  • Enter / Exit turnstile with transit data available in CSV files

The Goal

  • Show how to prepare and manipulate data with R and dplyr
  • Classify the turnstile data over the time into differrent transit patterns with DTW
  • Characterize station access and exit flows to plan shifts and improve service
  • Predict the access and exit flows

Turnstile data

station longitude latitude day hour datetime weekday ENTRIESn_hourly EXITSn_hourly
CYPRESS HILLS (R003) -73.87256 40.68995 1 0 2011-05-01 00:00:00 0 0 0
CYPRESS HILLS (R003) -73.87256 40.68995 1 4 2011-05-01 04:00:00 0 0 0
CYPRESS HILLS (R003) -73.87256 40.68995 1 12 2011-05-01 12:00:00 0 0 0
CYPRESS HILLS (R003) -73.87256 40.68995 1 16 2011-05-01 16:00:00 0 0 0
CYPRESS HILLS (R003) -73.87256 40.68995 1 20 2011-05-01 20:00:00 0 0 0
CYPRESS HILLS (R003) -73.87256 40.68995 2 0 2011-05-02 00:00:00 1 15 34
CYPRESS HILLS (R003) -73.87256 40.68995 2 4 2011-05-02 04:00:00 1 19 40
CYPRESS HILLS (R003) -73.87256 40.68995 2 8 2011-05-02 08:00:00 1 488 118
CYPRESS HILLS (R003) -73.87256 40.68995 2 12 2011-05-02 12:00:00 1 490 132
CYPRESS HILLS (R003) -73.87256 40.68995 2 16 2011-05-02 16:00:00 1 231 232
CYPRESS HILLS (R003) -73.87256 40.68995 2 20 2011-05-02 20:00:00 1 235 405
CYPRESS HILLS (R003) -73.87256 40.68995 3 0 2011-05-03 00:00:00 1 74 164
CYPRESS HILLS (R003) -73.87256 40.68995 3 4 2011-05-03 04:00:00 1 20 33
CYPRESS HILLS (R003) -73.87256 40.68995 3 12 2011-05-03 12:00:00 1 975 234
CYPRESS HILLS (R003) -73.87256 40.68995 3 16 2011-05-03 16:00:00 1 267 257
CYPRESS HILLS (R003) -73.87256 40.68995 3 20 2011-05-03 20:00:00 1 277 398

Dynamic time warping

Dynamic time warping

  • Create a local cost matrix (lcm), which has n × m dimensions. Such a matrix must be created for every pair of compared series;
  • DTW algorithm finds the path that minimizes the alignment between x and y by iteratively stepping through the LCM
  • Limit the computational cost defining lower bounds (see next slide)
  • Clustering: use DTW distance as dissimilarity measure
  • Prototyping: define a time-series that effectively summarizes the most important characteristics of all series in a given cluster

DTW Clustering

  • k-Shape clustering
  • Hierarchical
  • TADPole clustering (k-medoids/Partitioning Around Medoids)

Clustering: k-means and PAM

Clustering: Hierarchical

Similarity of two observations is based on the location on the vertical axis where branches containing those two observations first are fused. Dissimilarities measure uses the DTW distance/norm

DTW: lower bounds

The green dashed line represents the upper envelope, while the red dashed line represents the lower envelope.

Tidyverse

The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.

– https://www.tidyverse.org/

  • Tidying: tidyr
  • Transform and prepare: dplyr
  • Visualize: ggvis
  • Modelling: your choice, today is DTW

Original dataset

Load and transform with dyplr

data.df <- readRDS('../data/improved-data.rds')
  data.df <- tbl_df(data.df) %>%
    mutate(at = ymd_hms(datetime, tz = "UTC")) %>%
    mutate(month = as.integer(month(at))) %>%
    mutate(day = as.integer(day(at))) %>%
    mutate(station = paste(station, 
                        ' (', UNIT, ')', sep = '')) %>%
    select(station, 
           longitude, 
           latitude, 
           day, hour, day_week, weekday, 
           ENTRIESn_hourly, EXITSn_hourly)
  saveRDS(data.df, '../data/data.df.rds')

Daily aggragation

data.daily.df <- data.df %>% 
  group_by(station, latitude, longitude, day) %>%
  dplyr::summarise(
    ENTRIESn_hourly = sum(ENTRIESn_hourly), 
    EXITSn_hourly = sum(EXITSn_hourly)) %>%
  ungroup()
saveRDS(data.daily.df, '../data/data.daily.df.rds')

Classify the turnstile data

  • Select the clustering variable: ENTRIESn_hourly vs EXITSn_hourly
  • Select map bounds to perform clustering in a specific subset of the subway network
  • Select working days vs weekend days to perform clustering
  • Decide the number of clusters
# "ENTRIESn_hourly" to analyse entries, 
# "EXITSn_hourly" to analyse exits
variableClustering <- "EXITSn_hourly" 

# Geo-fence we use to filter access data
clusteringMap_bounds <- as.list(setNames(
  c(40.75505950577882, 
    -73.95103454589844, 
    40.69001034095325, 
    -74.04338836669922), 
  c("north", "east", "south", "west")))

# weekday = 1 means Week, 0 means weekend:
weekday = 1

# the number of classes we want to use to classify access patterns
numGroups = 3

Parameters struct

input <- structure (list(variableClustering, 
                         clusteringMap_bounds,
                         weekday, 
                         numGroups),
           names=c("variableClustering", 
                   "clusteringMap_bounds", 
                   "weekday", 
                   "numGroups"))

Further data arrangements and DTW

  • Filter data based on lat-lon limits
  • Re-arrange turnstile data and group stations according to daily profile
  • Produce a table with different access patterns (3 classes in this presentation)
# Apply input conditions as filters and select a time span:
f_data.filtered.df <- function(minDay, maxDay) {
  bounds <- input$clusteringMap_bounds  
  latRng <- range(bounds$north, bounds$south)
  lngRng <- range(bounds$east, bounds$west)
  data.df['count'] = data.df[input$variableClustering]
  if (missing(minDay) == F & missing(maxDay) == F) { 
    minDay <- minDay; 
    maxDay <- maxDay;
  } else { 
    minDay <- 0; 
    maxDay <- 31 
  }
  data.df %>%
    filter(latitude >= latRng[1] & 
           latitude <= latRng[2] & 
           longitude >= lngRng[1] & 
           longitude <= lngRng[2] ) %>%
    filter(day > minDay & day <= maxDay)
}

Groups of stations according to daily profile

  • Introduce the concept of melting and casting DF
  • Apply this concept to the dataset to create DTW timeseries
  • Apply DTW and create clusters

Melting and casting

Melting:

x = data.frame(
  id   = c(1, 2, 3, 4),
  blue = c(1, 0, 5, 0),
  red  = c(0, 3, 0, 7)
)
x
##   id blue red
## 1  1    1   0
## 2  2    0   3
## 3  3    5   0
## 4  4    0   7
melted <- melt(data = x, id.vars = "id", 
               measure.vars = c("blue", "red"))
melted
##   id variable value
## 1  1     blue     1
## 2  2     blue     0
## 3  3     blue     5
## 4  4     blue     0
## 5  1      red     0
## 6  2      red     3
## 7  3      red     0
## 8  4      red     7

Cast Functions Cast A Molten Data Frame Into An Array Or Data Frame:

dcast(melted, id  ~ variable, sum)
##   id blue red
## 1  1    1   0
## 2  2    0   3
## 3  3    5   0
## 4  4    0   7

Applying casting to our station records:

Before:

head(f_data.filtered.df() %>%
      filter(weekday == as.numeric(input$weekday)) %>%
      group_by(station, hour) %>%
      dplyr::summarise(count = mean(count)) %>%
      ungroup())
## # A tibble: 6 x 3
##   station       hour count
##   <chr>        <int> <dbl>
## 1 1 AVE (R248)     0 3275.
## 2 1 AVE (R248)     4  474.
## 3 1 AVE (R248)     8 1484.
## 4 1 AVE (R248)    12 6186.
## 5 1 AVE (R248)    16 3575.
## 6 1 AVE (R248)    20 5472.

After casting:

## # A tibble: 6 x 51
##   `1 AVE (R248)` `14 ST (R105)` `18 ST (R321)` `2 AVE (R300)`
##            <dbl>          <dbl>          <dbl>          <dbl>
## 1          3275.          1329.           415.          2343.
## 2           474.           288.           110            263.
## 3          1484.           786.           235.           504.
## 4          6186.          3637.          1735.          3008.
## 5          3575.          2549.           978.          2346.
## 6          5472.          3549.          1240.          4749.
## # ... with 47 more variables: `23 ST (R111)` <dbl>, `23 ST-5 AVE
## #   (R083)` <dbl>, `23 ST-6 AVE (R203)` <dbl>, `23 ST-6 AVE (R453)` <dbl>,
## #   `28 ST-BROADWAY (R082)` <dbl>, `3 AVE (R330)` <dbl>, `34 ST-HERALD SQ
## #   (R022)` <dbl>, `34 ST-HERALD SQ (R023)` <dbl>, `34 ST-PENN STA
## #   (R012)` <dbl>, `34 ST-PENN STA (R013)` <dbl>, `34 ST-PENN STA
## #   (R031)` <dbl>, `42 ST-BRYANT PK (R021)` <dbl>, `42 ST-GRD CNTRL
## #   (R046)` <dbl>, `5 AVE-BRYANT PK (R054)` <dbl>, `6 AVE (R163)` <dbl>,
## #   `8 ST-B'WAY NYU (R085)` <dbl>, `BEDFORD AVE (R235)` <dbl>, `BLEECKER
## #   ST (R194)` <dbl>, `BOROUGH HALL/CT (R108)` <dbl>, `BOWERY
## #   (R311)` <dbl>, `BOWLING GREEN (R041)` <dbl>, `BOWLING GREEN
## #   (R042)` <dbl>, `CANAL ST (R139)` <dbl>, `CHAMBERS ST (R030)` <dbl>,
## #   `CHAMBERS ST (R044)` <dbl>, `CHRISTOPHER ST (R189)` <dbl>, `CLARK ST
## #   (R224)` <dbl>, `DEKALB AVE (R099)` <dbl>, `EAST BROADWAY
## #   (R257)` <dbl>, `GRAND ST (R240)` <dbl>, `GREENPOINT AVE (R239)` <dbl>,
## #   `HIGH ST (R252)` <dbl>, `HOYT ST (R225)` <dbl>, `HOYT ST
## #   (R456)` <dbl>, `JAY ST-METROTEC (R089)` <dbl>, `JAY ST-METROTEC
## #   (R127)` <dbl>, `MURRAY ST-B'WAY (R087)` <dbl>, `NASSAU AV
## #   (R256)` <dbl>, `PRINCE ST-B'WAY (R086)` <dbl>, `RECTOR ST
## #   (R227)` <dbl>, `RECTOR ST (R304)` <dbl>, `SPRING ST (R282)` <dbl>,
## #   `SPRING ST (R322)` <dbl>, `VERNON/JACKSON (R276)` <dbl>, `WALL ST
## #   (R027)` <dbl>, `WALL ST (R043)` <dbl>, `WORLD TRADE CTR (R029)` <dbl>

And now transpose the matrix:

head(t(data.tsclust.df))
##                         [,1]     [,2]      [,3]     [,4]     [,5]     [,6]
## 1 AVE (R248)       3275.2273 474.5000 1484.3182 6186.045 3575.091 5471.636
## 14 ST (R105)       1328.8636 288.4545  785.8182 3637.182 2549.273 3549.045
## 18 ST (R321)        414.6818 110.0000  234.7273 1734.545  978.500 1240.318
## 2 AVE (R300)       2343.1818 263.0909  503.5909 3008.318 2346.318 4748.545
## 23 ST (R111)       1457.3636 420.9091 1064.3182 4705.364 2652.727 3778.864
## 23 ST-5 AVE (R083)  652.4545 180.1364  707.3636 5787.955 2435.136 2598.318

DTW Classification

f_groupsProfiles <- function() {
    data.hourly.avg.df <- f_data.filtered.df() %>%
      filter(weekday == as.numeric(input$weekday)) %>%
      group_by(station, hour) %>%
      dplyr::summarise(count = mean(count)) %>%
      ungroup()
    data.tsclust.df <- tbl_df(dcast(
      data.hourly.avg.df, hour ~ station)) %>% select(-hour)
    hc.alt <- tsclust(
      series = t(data.tsclust.df), 
      type = 'h', distance = 'dtw', 
      preproc = zscore)
    nGroups <- input$numGroups
    groups.alt <- cutree(tree = hc.alt, k = nGroups)
}

lut__groups <- f_groupsProfiles()

lut__groups now contains stations groups:

x
1 AVE (R248) 1
14 ST (R105) 1
18 ST (R321) 2
2 AVE (R300) 1
23 ST (R111) 2
23 ST-5 AVE (R083) 2
23 ST-6 AVE (R203) 2
23 ST-6 AVE (R453) 2
28 ST-BROADWAY (R082) 2
3 AVE (R330) 2
34 ST-HERALD SQ (R022) 2
34 ST-HERALD SQ (R023) 2
34 ST-PENN STA (R012) 1
34 ST-PENN STA (R013) 1
34 ST-PENN STA (R031) 1
42 ST-BRYANT PK (R021) 2
42 ST-GRD CNTRL (R046) 2
5 AVE-BRYANT PK (R054) 2
6 AVE (R163) 2
8 ST-B’WAY NYU (R085) 1
BEDFORD AVE (R235) 3
BLEECKER ST (R194) 1
BOROUGH HALL/CT (R108) 2
BOWERY (R311) 2
BOWLING GREEN (R041) 2
BOWLING GREEN (R042) 2
CANAL ST (R139) 2
CHAMBERS ST (R030) 2
CHAMBERS ST (R044) 2
CHRISTOPHER ST (R189) 1
CLARK ST (R224) 1
DEKALB AVE (R099) 2
EAST BROADWAY (R257) 2
GRAND ST (R240) 2
GREENPOINT AVE (R239) 1
HIGH ST (R252) 1
HOYT ST (R225) 1
HOYT ST (R456) 2
JAY ST-METROTEC (R089) 2
JAY ST-METROTEC (R127) 2
MURRAY ST-B’WAY (R087) 2
NASSAU AV (R256) 1
PRINCE ST-B’WAY (R086) 1
RECTOR ST (R227) 2
RECTOR ST (R304) 2
SPRING ST (R282) 2
SPRING ST (R322) 1
VERNON/JACKSON (R276) 1
WALL ST (R027) 2
WALL ST (R043) 2
WORLD TRADE CTR (R029) 2

Technical data manipulations (1 of 3)

Now we merge the lut__groups with the daily average profiles for each corresp. station:

# Daily average profiles for each station
data.avg.profiles.df <- f_data.filtered.df() %>%
  filter(weekday == as.numeric(input$weekday)) %>%
  group_by(station, hour) %>%
  dplyr::summarise(count = mean(count)) %>%
  ungroup()

# Merge classes dataframe (groups) with the average access 
# DF by station name:
tableData <- merge(
  as.data.frame(lut__groups), 
  data.avg.profiles.df, 
  by.x=0, by.y="station") 
colnames(tableData) <- c("stations", "groups", "hour", "count")
  
# put the group id in front of the DF 
# ([-2] means except column number 2):
tableDataSortedCols <- tableData[, c(2, (1:ncol(tableData))[-2])]
groups stations hour count
1 1 AVE (R248) 0 3275.22727
1 1 AVE (R248) 4 474.50000
1 1 AVE (R248) 8 1484.31818
1 1 AVE (R248) 12 6186.04545
1 1 AVE (R248) 16 3575.09091
1 1 AVE (R248) 20 5471.63636
1 14 ST (R105) 0 1328.86364
1 14 ST (R105) 4 288.45455
1 14 ST (R105) 8 785.81818
1 14 ST (R105) 12 3637.18182
1 14 ST (R105) 16 2549.27273
1 14 ST (R105) 20 3549.04545
2 18 ST (R321) 0 414.68182
2 18 ST (R321) 4 110.00000
2 18 ST (R321) 8 234.72727
2 18 ST (R321) 12 1734.54545
2 18 ST (R321) 16 978.50000
2 18 ST (R321) 20 1240.31818
1 2 AVE (R300) 0 2343.18182
1 2 AVE (R300) 4 263.09091
1 2 AVE (R300) 8 503.59091
1 2 AVE (R300) 12 3008.31818
1 2 AVE (R300) 16 2346.31818
1 2 AVE (R300) 20 4748.54545
2 23 ST (R111) 0 1457.36364
2 23 ST (R111) 4 420.90909
2 23 ST (R111) 8 1064.31818
2 23 ST (R111) 12 4705.36364
2 23 ST (R111) 16 2652.72727
2 23 ST (R111) 20 3778.86364
2 23 ST-5 AVE (R083) 0 652.45455
2 23 ST-5 AVE (R083) 4 180.13636
2 23 ST-5 AVE (R083) 8 707.36364
2 23 ST-5 AVE (R083) 12 5787.95455
2 23 ST-5 AVE (R083) 16 2435.13636
2 23 ST-5 AVE (R083) 20 2598.31818
2 23 ST-6 AVE (R203) 0 577.31818
2 23 ST-6 AVE (R203) 4 39.59091
2 23 ST-6 AVE (R203) 8 505.77778
2 23 ST-6 AVE (R203) 12 5178.23810
2 23 ST-6 AVE (R203) 16 2170.38095
2 23 ST-6 AVE (R203) 20 2334.27273
2 23 ST-6 AVE (R453) 0 1071.61905
2 23 ST-6 AVE (R453) 4 76.42857
2 23 ST-6 AVE (R453) 8 850.92857
2 23 ST-6 AVE (R453) 12 5289.36842
2 23 ST-6 AVE (R453) 16 2975.84211
2 23 ST-6 AVE (R453) 20 4093.10526
2 28 ST-BROADWAY (R082) 0 743.95455
2 28 ST-BROADWAY (R082) 4 80.00000
2 28 ST-BROADWAY (R082) 8 446.42857
2 28 ST-BROADWAY (R082) 12 5491.81818
2 28 ST-BROADWAY (R082) 16 2951.00000
2 28 ST-BROADWAY (R082) 20 3008.45455
2 3 AVE (R330) 0 865.90909
2 3 AVE (R330) 4 208.72727
2 3 AVE (R330) 8 401.05556
2 3 AVE (R330) 12 2421.90909
2 3 AVE (R330) 16 1273.28571
2 3 AVE (R330) 20 1569.57143
2 34 ST-HERALD SQ (R022) 0 4222.72727
2 34 ST-HERALD SQ (R022) 4 1255.68182
2 34 ST-HERALD SQ (R022) 8 3327.04545
2 34 ST-HERALD SQ (R022) 12 18711.77273
2 34 ST-HERALD SQ (R022) 16 12092.86364
2 34 ST-HERALD SQ (R022) 20 13158.09091
2 34 ST-HERALD SQ (R023) 0 1637.27273
2 34 ST-HERALD SQ (R023) 4 155.36364
2 34 ST-HERALD SQ (R023) 8 1275.54545
2 34 ST-HERALD SQ (R023) 12 6813.90909
2 34 ST-HERALD SQ (R023) 16 3696.36364
2 34 ST-HERALD SQ (R023) 20 5495.18182
1 34 ST-PENN STA (R012) 0 3545.13636
1 34 ST-PENN STA (R012) 4 697.50000
1 34 ST-PENN STA (R012) 8 2498.63636
1 34 ST-PENN STA (R012) 12 10362.04545
1 34 ST-PENN STA (R012) 16 7734.18182
1 34 ST-PENN STA (R012) 20 14571.31818
1 34 ST-PENN STA (R013) 0 1663.86364
1 34 ST-PENN STA (R013) 4 126.95455
1 34 ST-PENN STA (R013) 8 1372.90909
1 34 ST-PENN STA (R013) 12 7873.86364
1 34 ST-PENN STA (R013) 16 4891.90909
1 34 ST-PENN STA (R013) 20 9607.40909
1 34 ST-PENN STA (R031) 0 1908.22727
1 34 ST-PENN STA (R031) 4 251.00000
1 34 ST-PENN STA (R031) 8 1083.13636
1 34 ST-PENN STA (R031) 12 5875.50000
1 34 ST-PENN STA (R031) 16 5693.72727
1 34 ST-PENN STA (R031) 20 10581.45455
2 42 ST-BRYANT PK (R021) 0 1338.81818
2 42 ST-BRYANT PK (R021) 4 359.50000
2 42 ST-BRYANT PK (R021) 8 2506.52381
2 42 ST-BRYANT PK (R021) 12 16189.40909
2 42 ST-BRYANT PK (R021) 16 5079.13636
2 42 ST-BRYANT PK (R021) 20 4477.63636
2 42 ST-GRD CNTRL (R046) 0 2473.77273
2 42 ST-GRD CNTRL (R046) 4 635.22727
2 42 ST-GRD CNTRL (R046) 8 3813.59091
2 42 ST-GRD CNTRL (R046) 12 14510.50000
2 42 ST-GRD CNTRL (R046) 16 6837.68182
2 42 ST-GRD CNTRL (R046) 20 9843.72727
2 5 AVE-BRYANT PK (R054) 0 624.68182
2 5 AVE-BRYANT PK (R054) 4 80.36364
2 5 AVE-BRYANT PK (R054) 8 589.00000
2 5 AVE-BRYANT PK (R054) 12 5662.45455
2 5 AVE-BRYANT PK (R054) 16 1915.13636
2 5 AVE-BRYANT PK (R054) 20 1445.90909
2 6 AVE (R163) 0 1798.90909
2 6 AVE (R163) 4 821.31818
2 6 AVE (R163) 8 1620.95455
2 6 AVE (R163) 12 7532.36364
2 6 AVE (R163) 16 4721.81818
2 6 AVE (R163) 20 5858.13636
1 8 ST-B’WAY NYU (R085) 0 1242.54545
1 8 ST-B’WAY NYU (R085) 4 260.57143
1 8 ST-B’WAY NYU (R085) 8 472.33333
1 8 ST-B’WAY NYU (R085) 12 3430.77273
1 8 ST-B’WAY NYU (R085) 16 2707.95455
1 8 ST-B’WAY NYU (R085) 20 4251.63636
3 BEDFORD AVE (R235) 0 2527.40909
3 BEDFORD AVE (R235) 4 628.54545
3 BEDFORD AVE (R235) 8 220.57143
3 BEDFORD AVE (R235) 12 1202.54545
3 BEDFORD AVE (R235) 16 1403.04545
3 BEDFORD AVE (R235) 20 3760.95455
1 BLEECKER ST (R194) 0 1002.86364
1 BLEECKER ST (R194) 4 313.45455
1 BLEECKER ST (R194) 8 367.25000
1 BLEECKER ST (R194) 12 2345.86364
1 BLEECKER ST (R194) 16 1987.85714
1 BLEECKER ST (R194) 20 2599.04762
2 BOROUGH HALL/CT (R108) 0 1923.81818
2 BOROUGH HALL/CT (R108) 4 295.31818
2 BOROUGH HALL/CT (R108) 8 2149.09091
2 BOROUGH HALL/CT (R108) 12 9748.09091
2 BOROUGH HALL/CT (R108) 16 5571.09091
2 BOROUGH HALL/CT (R108) 20 6674.68182
2 BOWERY (R311) 0 278.77273
2 BOWERY (R311) 4 67.81818
2 BOWERY (R311) 8 134.47368
2 BOWERY (R311) 12 1117.09091
2 BOWERY (R311) 16 610.57143
2 BOWERY (R311) 20 598.95238
2 BOWLING GREEN (R041) 0 608.50000
2 BOWLING GREEN (R041) 4 152.54545
2 BOWLING GREEN (R041) 8 2040.90909
2 BOWLING GREEN (R041) 12 8626.18182
2 BOWLING GREEN (R041) 16 2425.22727
2 BOWLING GREEN (R041) 20 1828.90909
2 BOWLING GREEN (R042) 0 707.86364
2 BOWLING GREEN (R042) 4 368.22727
2 BOWLING GREEN (R042) 8 378.38889
2 BOWLING GREEN (R042) 12 4218.36364
2 BOWLING GREEN (R042) 16 1893.22727
2 BOWLING GREEN (R042) 20 2303.13636
2 CANAL ST (R139) 0 515.95455
2 CANAL ST (R139) 4 182.00000
2 CANAL ST (R139) 8 1110.36364
2 CANAL ST (R139) 12 5284.54545
2 CANAL ST (R139) 16 2238.04545
2 CANAL ST (R139) 20 1884.54545
2 CHAMBERS ST (R030) 0 1344.27273
2 CHAMBERS ST (R030) 4 251.36364
2 CHAMBERS ST (R030) 8 2001.09091
2 CHAMBERS ST (R030) 12 8113.95455
2 CHAMBERS ST (R030) 16 4217.95455
2 CHAMBERS ST (R030) 20 4481.45455
2 CHAMBERS ST (R044) 0 1507.40909
2 CHAMBERS ST (R044) 4 451.45455
2 CHAMBERS ST (R044) 8 2389.13636
2 CHAMBERS ST (R044) 12 13143.18182
2 CHAMBERS ST (R044) 16 5549.50000
2 CHAMBERS ST (R044) 20 3845.72727
1 CHRISTOPHER ST (R189) 0 1003.90909
1 CHRISTOPHER ST (R189) 4 176.63636
1 CHRISTOPHER ST (R189) 8 197.05556
1 CHRISTOPHER ST (R189) 12 1384.36364
1 CHRISTOPHER ST (R189) 16 1470.77273
1 CHRISTOPHER ST (R189) 20 2638.27273
1 CLARK ST (R224) 0 924.81818
1 CLARK ST (R224) 4 97.00000
1 CLARK ST (R224) 8 132.88235
1 CLARK ST (R224) 12 1275.40909
1 CLARK ST (R224) 16 987.45455
1 CLARK ST (R224) 20 2119.59091
2 DEKALB AVE (R099) 0 1245.45455
2 DEKALB AVE (R099) 4 270.22727
2 DEKALB AVE (R099) 8 685.09091
2 DEKALB AVE (R099) 12 4648.86364
2 DEKALB AVE (R099) 16 1706.95455
2 DEKALB AVE (R099) 20 2602.27273
2 EAST BROADWAY (R257) 0 1064.72727
2 EAST BROADWAY (R257) 4 290.18182
2 EAST BROADWAY (R257) 8 368.31818
2 EAST BROADWAY (R257) 12 2342.36364
2 EAST BROADWAY (R257) 16 1533.18182
2 EAST BROADWAY (R257) 20 1923.36364
2 GRAND ST (R240) 0 1170.72727
2 GRAND ST (R240) 4 371.50000
2 GRAND ST (R240) 8 835.60000
2 GRAND ST (R240) 12 5995.72727
2 GRAND ST (R240) 16 3036.54545
2 GRAND ST (R240) 20 2715.90909
1 GREENPOINT AVE (R239) 0 422.77273
1 GREENPOINT AVE (R239) 4 124.59091
1 GREENPOINT AVE (R239) 8 96.04545
1 GREENPOINT AVE (R239) 12 621.45455
1 GREENPOINT AVE (R239) 16 548.59091
1 GREENPOINT AVE (R239) 20 1094.72727
1 HIGH ST (R252) 0 1155.95455
1 HIGH ST (R252) 4 116.68182
1 HIGH ST (R252) 8 210.52381
1 HIGH ST (R252) 12 2437.63636
1 HIGH ST (R252) 16 1379.72727
1 HIGH ST (R252) 20 2382.31818
1 HOYT ST (R225) 0 208.76190
1 HOYT ST (R225) 4 32.50000
1 HOYT ST (R225) 8 99.36842
1 HOYT ST (R225) 12 738.50000
1 HOYT ST (R225) 16 524.40909
1 HOYT ST (R225) 20 739.40909
2 HOYT ST (R456) 0 229.77273
2 HOYT ST (R456) 4 40.45455
2 HOYT ST (R456) 8 321.31579
2 HOYT ST (R456) 12 2220.90909
2 HOYT ST (R456) 16 953.04545
2 HOYT ST (R456) 20 669.81818
2 JAY ST-METROTEC (R089) 0 179.63636
2 JAY ST-METROTEC (R089) 4 29.00000
2 JAY ST-METROTEC (R089) 8 113.09091
2 JAY ST-METROTEC (R089) 12 1350.09091
2 JAY ST-METROTEC (R089) 16 565.18182
2 JAY ST-METROTEC (R089) 20 438.50000
2 JAY ST-METROTEC (R127) 0 1532.59091
2 JAY ST-METROTEC (R127) 4 336.31818
2 JAY ST-METROTEC (R127) 8 2086.36364
2 JAY ST-METROTEC (R127) 12 9700.59091
2 JAY ST-METROTEC (R127) 16 4278.40909
2 JAY ST-METROTEC (R127) 20 4401.27273
2 MURRAY ST-B’WAY (R087) 0 205.68182
2 MURRAY ST-B’WAY (R087) 4 118.77273
2 MURRAY ST-B’WAY (R087) 8 346.23810
2 MURRAY ST-B’WAY (R087) 12 2255.68182
2 MURRAY ST-B’WAY (R087) 16 935.86364
2 MURRAY ST-B’WAY (R087) 20 871.13636
1 NASSAU AV (R256) 0 747.50000
1 NASSAU AV (R256) 4 238.50000
1 NASSAU AV (R256) 8 275.09524
1 NASSAU AV (R256) 12 1155.63636
1 NASSAU AV (R256) 16 990.77273
1 NASSAU AV (R256) 20 1922.18182
1 PRINCE ST-B’WAY (R086) 0 1416.72727
1 PRINCE ST-B’WAY (R086) 4 240.40909
1 PRINCE ST-B’WAY (R086) 8 507.40909
1 PRINCE ST-B’WAY (R086) 12 4965.95455
1 PRINCE ST-B’WAY (R086) 16 4571.13636
1 PRINCE ST-B’WAY (R086) 20 4956.86364
2 RECTOR ST (R227) 0 206.59091
2 RECTOR ST (R227) 4 41.40909
2 RECTOR ST (R227) 8 418.40909
2 RECTOR ST (R227) 12 2052.68182
2 RECTOR ST (R227) 16 614.77273
2 RECTOR ST (R227) 20 600.27273
2 RECTOR ST (R304) 0 262.22727
2 RECTOR ST (R304) 4 37.09091
2 RECTOR ST (R304) 8 223.14286
2 RECTOR ST (R304) 12 1444.09091
2 RECTOR ST (R304) 16 747.86364
2 RECTOR ST (R304) 20 810.22727
2 SPRING ST (R282) 0 210.50000
2 SPRING ST (R282) 4 49.04545
2 SPRING ST (R282) 8 104.14286
2 SPRING ST (R282) 12 1391.36364
2 SPRING ST (R282) 16 675.95455
2 SPRING ST (R282) 20 734.00000
1 SPRING ST (R322) 0 1045.86364
1 SPRING ST (R322) 4 366.00000
1 SPRING ST (R322) 8 423.55000
1 SPRING ST (R322) 12 4155.09091
1 SPRING ST (R322) 16 2990.66667
1 SPRING ST (R322) 20 3638.71429
1 VERNON/JACKSON (R276) 0 1193.27273
1 VERNON/JACKSON (R276) 4 236.27273
1 VERNON/JACKSON (R276) 8 502.81818
1 VERNON/JACKSON (R276) 12 1349.86364
1 VERNON/JACKSON (R276) 16 1176.81818
1 VERNON/JACKSON (R276) 20 2701.59091
2 WALL ST (R027) 0 971.86364
2 WALL ST (R027) 4 147.90909
2 WALL ST (R027) 8 3019.63636
2 WALL ST (R027) 12 12496.04545
2 WALL ST (R027) 16 2842.09091
2 WALL ST (R027) 20 2765.72727
2 WALL ST (R043) 0 556.86364
2 WALL ST (R043) 4 60.09091
2 WALL ST (R043) 8 2654.72727
2 WALL ST (R043) 12 9535.09091
2 WALL ST (R043) 16 3035.40909
2 WALL ST (R043) 20 2450.50000
2 WORLD TRADE CTR (R029) 0 1420.27273
2 WORLD TRADE CTR (R029) 4 534.50000
2 WORLD TRADE CTR (R029) 8 4873.63636
2 WORLD TRADE CTR (R029) 12 12770.27273
2 WORLD TRADE CTR (R029) 16 5092.86364
2 WORLD TRADE CTR (R029) 20 5312.36364

Technical data manipulations (2 of 3)

f_collapse_rows_df <- function(df, gr_var) {
  group_var <- enquo(gr_var) # Make use of tidyeval (see appendix)
  df %>%
    group_by(!! group_var) %>% # bang-bang op. evaluates the enquosure
    mutate(groupRow = 1:n()) %>%
    ungroup() %>%
    # `:=` op. adds or updates columns by reference:
    mutate(!! quo_name(group_var) := # This means the name of the variable, i.e. `groups`
             ifelse(groupRow == 1, as.character(!! group_var), "")) %>%
    select(-c(groupRow))
}
intermediateDF <- tableDataSortedCols %>%
  group_by(groups, stations) %>%
  summarise(
    trend = 
      as.character(
        htmltools::as.tags(
          sparkline::sparkline(factor(c(count))))
      )
    ) %>%
  group_by(groups) %>%
  select(groups, everything()) %>%
  distinct %>%
  # in pipes implicit 1st parameter defaults to `.`:
  f_collapse_rows_df(groups) 

Technical data manipulations (3 of 3)

Compute the groups prototypes:

mgavg <- function(i) {
  data.avg.profiles.df %>%
       filter(station %in% names(lut__groups[lut__groups == i])) %>%
       group_by(station) %>%
       mutate(count = rescale(count, to = c(0, 1000))) %>%
       ungroup() %>%
       group_by(hour) %>%
       dplyr::summarise(count = mean(count)) %>%
       ungroup()
}

Visualization! (Tools)

We’ll use sparkline lib and htmlwidgets to represent groups prototypes.

A sparkline is a very small line chart, typically drawn without axes or coordinates. It presents the general shape of the variation (typically over time) in some measurement, such as temperature or stock market price, in a simple and highly condensed way.

– Wikipedia

Visualization! (Code)

doSpark <- function(g) {
  lapply(g, function(i) {
    as.character(
      htmltools::as.tags(
        sparkline::sparkline(mgavg(i)$count, type = 'line')
      )
    )
  })
}

# add sparkline to cells:
intermediateDF$groups[nchar(intermediateDF$groups) > 0] <-   
  paste(intermediateDF$groups[nchar(intermediateDF$groups) > 0], 
        doSpark(intermediateDF$groups[nchar(intermediateDF$groups) > 0]), sep = " ")

tmpFormatTab <- intermediateDF %>%
  formattable() %>%
  formattable::as.htmlwidget()

tmpFormatTab$dependencies <- c(
  tmpFormatTab$dependencies,
  htmlwidgets:::widget_dependencies("sparkline", "sparkline")
)

Visualization! (Prize)

Shinyapp Demo

Main References