3 Results

3.1 Part0: Background: Inspection Scores/Grades


# Summary statistics of inspection scores
count <- length(data$SCORE)                  # number of rows, including rows where SCORE is missing
mean_val <- mean(data$SCORE, na.rm = TRUE)   # named mean_val rather than `mean` to avoid shadowing mean()
std_dev <- sd(data$SCORE, na.rm = TRUE)
min_val <- min(data$SCORE, na.rm = TRUE)
max_val <- max(data$SCORE, na.rm = TRUE)

# Display the statistics of the inspection scores
cat("Count:", count, "\nMean:", mean_val, "\nStandard Deviation:", std_dev, "\nMinimum:", min_val, "\nMaximum:", max_val, "\n")

Count: 209461 
Mean: 22.82137 
Standard Deviation: 17.54135 
Minimum: 0 
Maximum: 168


data %>%
  drop_na(SCORE) %>%
  ggplot(aes(x = SCORE)) +
    geom_histogram(color = "#80593D", fill = "#9FC29F", alpha = .5, bins = 20, boundary = 0) +
    labs(title = "NYC restaurants Inspection Scores Distribution",
         x = "Inspection Scores") +
    theme_grey(13)

The distribution of inspection scores is right-skewed. The mean is about 23 (standard deviation about 17.5), the bulk of the scores sits at the lower end of the range, and a long tail extends to the maximum of 168. Note that a low inspection score corresponds to a good grade and a high score to a bad grade.
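The direction of skew can also be checked numerically by comparing the mean with the median and computing a moment-based skewness coefficient. The sketch below uses a small made-up vector of scores, not the inspection data. The `skewness()` helper is hand-written for illustration and is not part of the report; on the real data it would be called as `skewness(data$SCORE)`.

```r
# Sketch: a right-skewed sample has mean > median and positive skewness.
# `scores` is a made-up vector, not the inspection data.
scores <- c(0, 5, 9, 10, 12, 12, 13, 13, 19, 24, 38, 70)

# Moment-based sample skewness: mean cubed deviation divided by SD^3
# (population SD, i.e. dividing by n); missing values are dropped first
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))
  mean((x - m)^3) / s^3
}

c(mean = mean(scores), median = median(scores), skew = skewness(scores))
```

A positive coefficient together with mean > median is what "right-skewed" means here. The long tail of high scores pulls the mean above the typical value.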


data %>%
  drop_na(GRADE) %>%
  filter(GRADE %in% c("A", "B", "C")) %>%
  ggplot(aes(x = GRADE)) +
    geom_bar(fill="#9FC29F") +
    labs(title = "NYC restaurants Inspection Grades Distribution",
         x = "Inspection Grades") +
    theme_grey(13)

Note that around 50% of the values in GRADE are missing. Among the inspection records that do have a grade, most received an A.
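The share of missing grades can be computed directly instead of being read off the data. The sketch below uses a made-up `grade` vector; on the real data the same call would be `mean(is.na(data$GRADE))`.

```r
# Sketch: proportion of missing values in a column (made-up data)
grade <- c("A", NA, "A", "B", NA, NA, "A", "C", NA, "A")

missing_share <- mean(is.na(grade))   # is.na() gives TRUE/FALSE; mean() gives the share of TRUE
missing_share

# Counts per grade, with the missing values shown as their own category
table(grade, useNA = "ifany")
```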


data %>%
  drop_na(SCORE, GRADE) %>%
  filter(GRADE %in% c("A", "B", "C")) %>%
  ggplot(aes(x = SCORE, y = fct_reorder(GRADE, SCORE, median))) +
    geom_boxplot(fill="#9FC29F") +
    labs(title="Inspection Scores by Grades",
         x="Inspection Scores",
         y="Grades") +
    theme_grey(13)

The box plot is consistent with the score ranges expected for each grade (A for 0-13, B for 14-27, C for 28 and above). Grade codes other than A, B, and C, such as N, P, and Z, are dropped from the plot. Note that grade C has some low-score outliers, i.e. scores that would normally correspond to an A.
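The score-to-grade ranges stated above can be written as a small lookup, which also makes it easy to flag records whose recorded grade disagrees with their score, such as the low-score C outliers. This is a sketch that assumes exactly the ranges quoted in the text, not the city's full grading procedure. `score_to_grade()` and the `toy` rows are hypothetical.

```r
# Sketch: map scores to letter grades using the stated ranges
# A: 0-13, B: 14-27, C: 28 and above (assumed from the text)
score_to_grade <- function(score) {
  as.character(cut(score,
                   breaks = c(-Inf, 13, 27, Inf),   # intervals (-Inf,13], (13,27], (27,Inf]
                   labels = c("A", "B", "C"),
                   right = TRUE))
}

score_to_grade(c(0, 13, 14, 27, 28, 168))
# returns c("A", "A", "B", "B", "C", "C")

# Flag records whose recorded grade differs from the score-implied grade (toy data)
toy <- data.frame(SCORE = c(10, 20, 5, 35), GRADE = c("A", "B", "C", "C"))
mismatch <- toy[toy$GRADE != score_to_grade(toy$SCORE), ]
mismatch   # the C-graded record with a score of 5
```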

3.2 Part1: Inspection Results by Restaurant Location


data %>%
  drop_na(SCORE, BORO) %>%
  ggplot(aes(x = SCORE, y = reorder(BORO,SCORE,median))) +
    geom_boxplot(fill="#9FC29F") +
    labs(title="Inspection Scores by BORO",
         x="Inspection Scores",
         y="BORO") +
    theme_grey(13)


test1 <- chisq.test(data$SCORE, data$BORO)
print(test1)


    Pearson's Chi-squared test

data:  data$SCORE and data$BORO
X-squared = 4509.1, df = 516, p-value < 2.2e-16

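All boroughs have very similar median inspection scores and score distributions. Brooklyn, Manhattan, and Queens show some outliers, the highest of which is in Manhattan. The box plot alone does not show a clear association between borough and inspection score. The chi-square test, however, gives p < 2.2e-16, which indicates a statistically significant association between inspection scores and boroughs.

Two caveats apply. First, the test treats each distinct score value as a separate category, so many expected cell counts are small and the chi-square approximation is questionable. Second, with more than 200,000 records, even a very small difference between boroughs produces a tiny p-value.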
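Because SCORE is numeric, a rank-based test can compare score distributions across boroughs without treating each score value as its own category. The sketch below swaps in the Kruskal-Wallis rank-sum test (`kruskal.test()` from base R's stats package). This is a different test from the chi-square test used in the report. The scores and borough labels are made up.

```r
# Sketch: Kruskal-Wallis test of a numeric score across groups (made-up data)
toy <- data.frame(
  SCORE = c(10, 12, 13, 20, 25,
            9, 11, 14, 30, 7,
            12, 13, 18, 21, 40),
  BORO  = rep(c("Bronx", "Brooklyn", "Queens"), each = 5)
)

kw <- kruskal.test(SCORE ~ BORO, data = toy)
kw$statistic   # Kruskal-Wallis H statistic
kw$parameter   # degrees of freedom = number of groups - 1
kw$p.value

# On the real data (assuming the same column names):
# kruskal.test(SCORE ~ BORO, data = data)
```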


subset1 <- data %>%
  filter(GRADE %in% c("A", "B", "C"))

vcd::mosaic(GRADE ~ BORO, subset1, direction = c("v", "h"), highlighting_fill = c("#35E445", "#1B3BF2", "#F2281B"))


test2 <- chisq.test(data$GRADE, data$BORO)
print(test2)


    Pearson's Chi-squared test

data:  data$GRADE and data$BORO
X-squared = 392.16, df = 20, p-value < 2.2e-16

Staten Island has the highest proportion of grade A and the lowest proportion of grade C, which appears to be the best result among the boroughs. However, grade A is the majority in every borough, and the mosaic plot shows similar grade proportions across boroughs, so the plot alone does not reveal a clear association between grades and boroughs. The chi-square test gives p < 2.2e-16, indicating a statistically significant association between inspection grades and boroughs.

Note that the test uses all GRADE values, including N, P, and Z, whereas the mosaic plot shows only A, B, and C. As with the score test, the very large sample makes even small differences significant.
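A p-value alone does not show how strong an association is. One common effect-size measure for a contingency table is Cramér's V, V = sqrt(X^2 / (n * (min(r, c) - 1))), which ranges from 0 (no association) to 1. The `cramers_v()` helper below is a hypothetical illustration applied to a made-up table. On the real data it could be applied to `table(subset1$BORO, subset1$GRADE)`.

```r
# Sketch: Cramér's V for a contingency table (made-up counts)
tbl <- matrix(c(800, 120, 80,
                850, 100, 50,
                900,  70, 30),
              nrow = 3, byrow = TRUE,
              dimnames = list(BORO = c("X", "Y", "Z"), GRADE = c("A", "B", "C")))

cramers_v <- function(tbl) {
  # Pearson chi-square statistic without continuity correction
  x2 <- suppressWarnings(chisq.test(tbl, correct = FALSE))$statistic
  n <- sum(tbl)
  unname(sqrt(x2 / (n * (min(dim(tbl)) - 1))))
}

cramers_v(tbl)
```

A value close to 0 alongside a tiny p-value would indicate a real but weak association.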

3.2.1 Average Inspection Scores by Districts


# Extract the year from GRADE DATE (rows without a grade date get Year = NA)
data$Year <- as.Date(data$`GRADE DATE`, format = "%m/%d/%Y")
data$Year <- format(data$Year, format = "%Y")
data$Year <- as.numeric(data$Year)

# Keep graded inspections after 2015 with complete location information
data1 <- data %>%
  filter(!is.na(Year), Year > 2015,
         !is.na(SCORE), !is.na(BORO), !is.na(`Council District`), !is.na(DBA),
         !is.na(Latitude), !is.na(Longitude),
         Longitude != 0, Latitude != 0)


# Average inspection score per council district
avg_scores <- data1 %>%
  group_by(`Council District`) %>%
  summarize(AvgScore = mean(SCORE))
# Convert the district key to character so it can be joined to coun_dist in the GeoJSON
avg_scores$`Council District` <- as.character(as.numeric(avg_scores$`Council District`))

# Read the GeoJSON for NYC; it must include the council district boundaries
nyc_districts <- st_read("NYC_City_Council_Districts.geojson", quiet = TRUE)

# Join the average scores onto the district polygons
nyc_districts <- left_join(nyc_districts, avg_scores, by = c("coun_dist" = "Council District"))

# Plot the choropleth. The scale must target `fill`, because a scale_color_* scale
# has no effect on polygons filled via aes(fill = ...).
# With viridis, lighter (yellow) districts have higher, i.e. worse, average scores.
ggplot(data = nyc_districts) +
  geom_sf(aes(fill = AvgScore)) +
  scale_fill_viridis_c() +
  theme_minimal() +
  labs(title = "NYC Council Districts Map with AvgScore")

On the choropleth map, the lighter districts in Queens indicate higher average scores than in the other boroughs. Since higher scores mean worse inspection results, the sanitation standards in these areas may require attention and improvement.
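The aggregation and join behind the map follow a common pattern: compute a group mean, then left-join it onto a table of all districts. The join keys must have the same type, which is why the report converts the district key to character. The sketch below shows the pattern in base R with made-up values. The `districts` data frame is a stand-in for the GeoJSON attributes.

```r
# Sketch (made-up values): mean score per district, then a left join onto
# a district table whose key is stored as character
inspections <- data.frame(
  district = c(1, 1, 2, 3, 3, 3),
  SCORE    = c(10, 20, 15, 30, 24, 18)
)
districts <- data.frame(coun_dist = c("1", "2", "3", "4"))  # stand-in for the GeoJSON attributes

avg <- aggregate(SCORE ~ district, data = inspections, FUN = mean)
names(avg)[names(avg) == "SCORE"] <- "AvgScore"
avg$district <- as.character(avg$district)   # match the character key type

# all.x = TRUE keeps districts without inspections (AvgScore = NA), like left_join()
merged <- merge(districts, avg, by.x = "coun_dist", by.y = "district", all.x = TRUE)
merged
```

Districts with no matching rows come back as NA, which geom_sf draws in the scale's `na.value` color.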

3.2.2 Average Inspection Scores by Borough and Year


# Rounded average score per borough and year (data1 is already filtered to Year > 2015)
avgscore_bar <- data1 %>%
  group_by(BORO, Year) %>%
  summarize(Avg_Score = round(mean(SCORE, na.rm = TRUE), 0), .groups = "drop")

ggplot(avgscore_bar, aes(fill = BORO, y = Avg_Score, x = Year)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
  geom_text(aes(label = Avg_Score), vjust = -0.3, position = position_dodge(width = 0.9), size = 2) +
  labs(title = "Boroughs - Average Score vs. Year", x = "Years", y = "Average Score") +
  scale_x_continuous(breaks = 2016:2023, labels = 2016:2023) +
  scale_fill_brewer(palette = "RdYlBu") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

Across all boroughs, the average score fluctuates from year to year, with no clear pattern of consistent improvement or decline. Queens stands out with an average score of 18 in 2023, which suggests room for improvement in recent inspections.
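The year on the x-axis is parsed from the "%m/%d/%Y" strings in GRADE DATE, and each bar is a rounded group mean. The sketch below reproduces these steps in base R on made-up rows. It shows that records without a grade date drop out of the yearly averages.

```r
# Sketch (made-up rows): parse "%m/%d/%Y" dates, keep the year, and compute
# the rounded mean score per borough and year
toy <- data.frame(
  BORO = c("Queens", "Queens", "Queens", "Bronx", "Bronx"),
  `GRADE DATE` = c("03/15/2022", "11/02/2022", "06/30/2023", "01/10/2023", NA),
  SCORE = c(12, 17, 18, 9, 30),
  check.names = FALSE   # keep the space in the column name
)

toy$Year <- as.numeric(format(as.Date(toy$`GRADE DATE`, format = "%m/%d/%Y"), "%Y"))

# Rows without a grade date get Year = NA and are dropped before averaging
toy <- toy[!is.na(toy$Year) & toy$Year > 2015, ]
avg <- aggregate(SCORE ~ BORO + Year, data = toy, FUN = function(s) round(mean(s)))
avg
```

R's `round()` rounds halves to the even digit, so the Queens 2022 mean of 14.5 becomes 14. This is worth keeping in mind when reading the bar labels.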

4 Part2: Inspection results by Cuisines


data <- data %>%
  rename(CUISINE = `CUISINE DESCRIPTION`)

data %>%
  drop_na(CUISINE) %>%
  group_by(CUISINE) %>%
  dplyr::summarise(
    cuisine_count = n()
  ) %>%
  arrange(desc(cuisine_count)) %>%
  head(10)

# A tibble: 10 × 2
   CUISINE                  cuisine_count
   <chr>                            <int>
 1 American                         34492
 2 Chinese                          20389
 3 Coffee/Tea                       13956
 4 Pizza                            12773
 5 Latin American                    8312
 6 Mexican                           8123
 7 Bakery Products/Desserts          7978
 8 Caribbean                         7600
 9 Japanese                          7087
10 Italian                           7026


top_cuisines <- data %>%
  drop_na(CUISINE) %>%
  group_by(CUISINE) %>%
  dplyr::summarise(
    cuisine_count = n()
  ) %>%
  arrange(desc(cuisine_count)) %>%
  head(20) 

data %>%
  drop_na(CUISINE, SCORE) %>%
  filter(CUISINE %in% top_cuisines$CUISINE) %>%
  ggplot(aes(x = SCORE, y = reorder(CUISINE,SCORE,median))) +
    geom_boxplot(fill="#9FC29F") +
    labs(title="Inspection Scores by CUISINE",
         x="Inspection Scores",
         y="CUISINE") +
    theme_grey(13)


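# NOTE: this call tests SCORE against BORO, i.e. it repeats test1; it does not
# test SCORE against CUISINE, which would be chisq.test(data$SCORE, data$CUISINE)
test3 <- chisq.test(data$SCORE, data$BORO)
print(test3)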


    Pearson's Chi-squared test

data:  data$SCORE and data$BORO
X-squared = 4509.1, df = 516, p-value < 2.2e-16

According to the box plot, Indian, Spanish, Latin American, Chinese, and Caribbean cuisines have the five highest median inspection scores, i.e. the least favorable inspection results. Sandwiches, American, Coffee/Tea, Hamburgers, and Donuts have the five lowest median scores, i.e. the best results.

Sandwiches, Coffee/Tea, Hamburgers, and Donuts generally involve simpler food preparation than the other cuisines, which may explain why they appear to have fewer sanitation problems. American food also scores well. One possible explanation is that the American category covers similar items such as sandwiches, coffee, hamburgers, and donuts.

Note that the chi-square test shown above was run on data$SCORE and data$BORO, so it repeats the borough test from Part1. It does not test the association between inspection scores and cuisines, which would require data$CUISINE in place of data$BORO.
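The "five highest" and "five lowest" cuisines come from ranking groups by their median score. That is the ordering `reorder(CUISINE, SCORE, median)` applies to the box plot's y-axis. The sketch below shows the same ranking in base R on made-up scores.

```r
# Sketch (made-up scores): rank groups by median score
toy <- data.frame(
  CUISINE = c("Donuts", "Donuts", "Donuts",
              "Indian", "Indian", "Indian",
              "Pizza", "Pizza", "Pizza"),
  SCORE = c(5, 9, 12, 20, 26, 31, 11, 13, 22)
)

medians <- tapply(toy$SCORE, toy$CUISINE, median)
ranked <- sort(medians, decreasing = TRUE)   # highest (worst) median first
ranked
names(ranked)[1:2]                           # the two "worst" cuisines in this toy data

# reorder() sorts the factor levels by ascending median, as on the plot's y-axis
levels(reorder(factor(toy$CUISINE), toy$SCORE, median))
```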

4.0.1 Critical flags by Cuisines


subset5 <- data %>%
  rename(critical_flag = 'CRITICAL FLAG') %>%
  drop_na(critical_flag, CUISINE)

subset5 %>%
  filter(CUISINE %in% top_cuisines$CUISINE) %>%
  group_by(CUISINE) %>%
  count(critical_flag) %>%
  pivot_wider(
    names_from = critical_flag, values_from = n
  ) %>%
  rename(Not_Critical = `Not Critical`, Not_Applicable = `Not Applicable`) %>%
  mutate(cf_prop = Critical/(Critical + Not_Critical + Not_Applicable)) %>%
  arrange(desc(cf_prop))

# A tibble: 20 × 5
# Groups:   CUISINE [20]
   CUISINE                        Critical Not_Applicable Not_Critical cf_prop
   <chr>                             <int>          <int>        <int>   <dbl>
 1 Indian                             1847             16         1260   0.591
 2 Asian/Asian Fusion                 2143             32         1471   0.588
 3 Chinese                           11920            119         8350   0.585
 4 Spanish                            3305             82         2347   0.576
 5 Thai                               1790             12         1326   0.572
 6 Italian                            3997             51         2978   0.569
 7 Latin American                     4711            131         3470   0.567
 8 Japanese                           4015             44         3028   0.567
 9 Mexican                            4501             91         3531   0.554
10 Jewish/Kosher                      1644             20         1325   0.550
11 Pizza                              7025            133         5615   0.550
12 Caribbean                          4151             74         3375   0.546
13 Bakery Products/Desserts           4340             62         3576   0.544
14 American                          18726            520        15246   0.543
15 Sandwiches                         2220             38         1866   0.538
16 Juice, Smoothies, Fruit Salads     1922             90         1700   0.518
17 Chicken                            2943             81         2667   0.517
18 Coffee/Tea                         7209            208         6539   0.517
19 Donuts                             1938             46         1936   0.494
20 Hamburgers                         1594             41         1685   0.480

Consistent with the inspection scores by cuisine, Indian, Asian/Asian Fusion, Chinese, Spanish, and Thai cuisines have the five highest critical flag proportions, while Coffee/Tea, Donuts, and Hamburgers are among the lowest. However, the differences are modest: all of the top 20 cuisines fall between about 48% (Hamburgers) and 59% (Indian).
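As a worked check of how cf_prop is defined, the sketch below recomputes it for the Indian and Hamburgers rows, using the counts printed in the table above.

```r
# cf_prop = Critical / (Critical + Not_Critical + Not_Applicable),
# using two rows copied from the table above
counts <- data.frame(
  CUISINE        = c("Indian", "Hamburgers"),
  Critical       = c(1847, 1594),
  Not_Applicable = c(16, 41),
  Not_Critical   = c(1260, 1685)
)
counts$cf_prop <- with(counts, Critical / (Critical + Not_Critical + Not_Applicable))
round(counts$cf_prop, 3)
# prints [1] 0.591 0.480
```

For Indian this is 1847 / 3123, and for Hamburgers 1594 / 3320. These are the two ends of the range quoted above.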

4.1 Part3: Identifying Restaurants by Critical Flags and Inspection Scores

4.2 Ranking by Critical Flags


subset7 <- data %>%
  rename(critical_flag = 'CRITICAL FLAG') %>%
  drop_na(DBA, critical_flag) %>%
  group_by(DBA) %>%
  count(critical_flag) %>%
  pivot_wider(
    names_from = critical_flag, values_from = n
  ) %>%
  rename(Not_Critical = `Not Critical`, Not_Applicable = `Not Applicable`) %>%
  mutate(
    Not_Applicable = ifelse(is.na(Not_Applicable), 0, Not_Applicable),
    Not_Critical = ifelse(is.na(Not_Critical), 0, Not_Critical),
    Critical = ifelse(is.na(Critical), 0, Critical),
    Total = Critical + Not_Critical + Not_Applicable) %>%
  mutate(cf_prop = Critical/(Critical + Not_Critical + Not_Applicable)) %>%
  arrange(desc(cf_prop))


mean(subset7$Critical)

[1] 5.137253


mean(subset7$Total)

[1] 9.418297

On average, each restaurant name (DBA) has about 5 records flagged as critical out of about 9.4 inspection records in total over the last three years (2020-2023). Because the grouping is by DBA, all branches of a chain are counted together.


subset7 %>%
  filter(Total > 10) %>%
  head(20)

# A tibble: 20 × 6
# Groups:   DBA [20]
   DBA                        Critical Not_Critical Not_Applicable Total cf_prop
   <chr>                         <dbl>        <dbl>          <dbl> <dbl>   <dbl>
 1 WE VILLAGE CORP                  12            0              0    12   1    
 2 PEPERINO                         13            1              0    14   0.929
 3 IN CHA/TASTE OF GUILIN           12            1              0    13   0.923
 4 Boishakhi Restaurant             11            1              0    12   0.917
 5 SWEET CATS CAFE                  11            1              0    12   0.917
 6 DA NICO RESTAURANT               10            1              0    11   0.909
 7 JJ BROWN CUP                     10            1              0    11   0.909
 8 XI JIANG QIAN HU NOODLE I…       10            1              0    11   0.909
 9 KENKA                            13            2              0    15   0.867
10 QUEENS BUFFET                    13            2              0    15   0.867
11 SWEET HOUR                       13            2              0    15   0.867
12 BAAR BAAR                        19            3              0    22   0.864
13 ESTRELLITA POBLANA # 1           12            2              0    14   0.857
14 MASALA KING                      12            2              0    14   0.857
15 QUATORZE                         12            2              0    14   0.857
16 TEA CUP CAFE                     12            2              0    14   0.857
17 VILLAGE SQUARE PIZZA             12            2              0    14   0.857
18 YONG KANG STREET                 12            2              0    14   0.857
19 BOBO RESTAURANT                  11            2              0    13   0.846
20 DA NONNA ROSA                    11            2              0    13   0.846

These are the 20 restaurants with the highest critical flag proportions among those with more than 10 inspection records over the last three years.
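The `Total > 10` filter keeps names with very few records from dominating the ranking, because a proportion based on 3 records can easily be 1. The sketch below shows the filter-then-rank step in base R. "TINY CAFE" is a made-up row; the other rows are taken from the tables in this section.

```r
# Sketch: rank by critical flag proportion among names with more than 10 records
# ("TINY CAFE" is made up; the other rows come from the tables above)
flags <- data.frame(
  DBA      = c("TINY CAFE", "WE VILLAGE CORP", "PEPERINO", "RADIO CITY MUSIC HALL"),
  Critical = c(3, 12, 13, 2),
  Total    = c(3, 12, 14, 23),
  stringsAsFactors = FALSE
)
flags$cf_prop <- flags$Critical / flags$Total

eligible <- flags[flags$Total > 10, ]                              # minimum-volume filter
ranked <- eligible[order(eligible$cf_prop, decreasing = TRUE), ]   # highest proportion first
head(ranked, 2)
tail(ranked, 1)
```

Many names share the same proportion (for example, several at 0.857 in the table above). The cut-off at exactly 20 rows therefore depends on how ties happen to be ordered.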


subset7 %>%
  filter(Total > 10) %>%
  tail(20)

# A tibble: 20 × 6
# Groups:   DBA [20]
   DBA                        Critical Not_Critical Not_Applicable Total cf_prop
   <chr>                         <dbl>        <dbl>          <dbl> <dbl>   <dbl>
 1 NORTH COAST SHARK AND BAKE        3            9              0    12  0.25  
 2 ORCHARD BEACH SNACK BAR S…        3            9              0    12  0.25  
 3 PANDA EXPRESS #2622               3            9              0    12  0.25  
 4 QUICKLY                           3            7              2    12  0.25  
 5 The Oma                           3            9              0    12  0.25  
 6 Y J FRIED FISH                    3            8              1    12  0.25  
 7 ZIBETTO ESPRESSO BAR              3            9              0    12  0.25  
 8 ARAMARK @ ACCENTURE #19261        3            8              2    13  0.231 
 9 CABANA JORGE RESTAURANT           3            9              1    13  0.231 
10 CIBO EXPRESS GOURMET MARK…        4           12              3    19  0.211 
11 BRONSON'S BURGERS                 3           10              2    15  0.2   
12 HALAL GRILL EATS                  3            9              3    15  0.2   
13 CITI FIELD STERLING KITCH…        2            8              1    11  0.182 
14 KEKI MODERN CAKES                 2            8              1    11  0.182 
15 LAZY SUNDAES                      2            9              0    11  0.182 
16 PROPER FOOD                       3           14              0    17  0.176 
17 CRISPY                            2            9              1    12  0.167 
18 PENNYLANE COFFEE                  2           10              0    12  0.167 
19 AGUA E' COCO BAR & GRILL          1            7              3    11  0.0909
20 RADIO CITY MUSIC HALL             2           10             11    23  0.0870

These are the 20 restaurants with the lowest critical flag proportions among those with more than 10 inspection records over the last three years.


#mean of the critical flag proportion for all inspected restaurants
mean(subset7$cf_prop)

[1] 0.5042503


subset7 %>%
  arrange(desc(Critical))

# A tibble: 22,178 × 6
# Groups:   DBA [22,178]
   DBA                        Critical Not_Critical Not_Applicable Total cf_prop
   <chr>                         <dbl>        <dbl>          <dbl> <dbl>   <dbl>
 1 DUNKIN                         1278         1271             33  2582   0.495
 2 SUBWAY                          779          747             11  1537   0.507
 3 MCDONALD'S                      492          558              8  1058   0.465
 4 STARBUCKS                       444          731             14  1189   0.373
 5 KENNEDY FRIED CHICKEN           375          341              9   725   0.517
 6 BURGER KING                     307          310              9   626   0.490
 7 POPEYES                         293          378              7   678   0.432
 8 CROWN FRIED CHICKEN             264          201              5   470   0.562
 9 GOLDEN KRUST CARIBBEAN BA…      251          251              5   507   0.495
10 DUNKIN'                         205          229              2   436   0.470
# ℹ 22,168 more rows

These are the 10 restaurants with the most accumulated critical flags over the last three years. Most of them are chains with many locations in NYC. Because the records are grouped by DBA, all of a chain's locations are counted together, so high accumulated counts are expected. Name variants such as DUNKIN and DUNKIN' are also counted as separate restaurants.

Moreover, seven of these ten chains have a critical flag proportion below the mean for all inspected restaurants (about 0.50). There is therefore no evidence here that chain restaurants receive proportionally more critical flags than other restaurants.
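The overall mean of 0.504 is an unweighted average of per-name proportions, so a name with 12 records counts as much as DUNKIN with 2,582. The sketch below uses the ten rows above to compare that kind of average with the pooled proportion (total critical flags divided by total records) and to count how many chains fall below 0.504.

```r
# Worked comparison using the ten rows printed above
chains <- data.frame(
  Critical = c(1278, 779, 492, 444, 375, 307, 293, 264, 251, 205),
  Total    = c(2582, 1537, 1058, 1189, 725, 626, 678, 470, 507, 436)
)
chains$cf_prop <- chains$Critical / chains$Total

pooled <- sum(chains$Critical) / sum(chains$Total)   # 4688 / 9808
unweighted <- mean(chains$cf_prop)                   # each chain weighted equally
below_overall <- sum(chains$cf_prop < 0.5042503)     # overall mean cf_prop reported above

round(c(pooled = pooled, unweighted = unweighted), 3)
below_overall
```

Both summaries for these chains (about 0.48) sit below the overall unweighted mean. The comparison is still approximate, because 0.504 averages over names of very different sizes.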


# 40 highest and 40 lowest critical flag proportions among names with more than 10 records
bd_by_cfprop <- subset7 %>%
  filter(Total > 10) %>%
  head(40)
gd_by_cfprop <- subset7 %>%
  filter(Total > 10) %>%
  tail(40)

# Attach one location per DBA (the first row found for each name)
loc <- data[!duplicated(data$DBA), c("DBA", "Latitude", "Longitude")]
bd_by_cfprop <- left_join(bd_by_cfprop, loc, by = "DBA")
gd_by_cfprop <- left_join(gd_by_cfprop, loc, by = "DBA")

# Label the two groups and stack them into one table for mapping
bd_by_cfprop <- bd_by_cfprop[, c("DBA", "cf_prop", "Latitude", "Longitude")]
bd_by_cfprop$Grade <- "bad"
gd_by_cfprop <- gd_by_cfprop[, c("DBA", "cf_prop", "Latitude", "Longitude")]
gd_by_cfprop$Grade <- "good"
merge_bdgd <- rbind(bd_by_cfprop, gd_by_cfprop)
colnames(merge_bdgd)[colnames(merge_bdgd) == "cf_prop"] <- "Critical_Flag_Proportion"
colnames(merge_bdgd)[colnames(merge_bdgd) == "DBA"] <- "Restaurant_Name"
#write.csv(merge_bdgd, "tidydata.csv", row.names=FALSE)
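Using `!duplicated(data$DBA)` keeps only the first row for each name. A chain with many branches is therefore mapped to a single, essentially arbitrary location. The sketch below shows this on made-up rows. Grouping by a per-establishment identifier instead of DBA would avoid the problem, assuming such a column is available.

```r
# Sketch (made-up rows): !duplicated() keeps only the first row per name
rows <- data.frame(
  DBA       = c("CHAIN X", "CHAIN X", "CHAIN X", "SOLO DINER"),
  Latitude  = c(40.71, 40.76, 40.65, 40.80),
  Longitude = c(-74.00, -73.98, -73.95, -73.94),
  stringsAsFactors = FALSE
)

loc <- rows[!duplicated(rows$DBA), c("DBA", "Latitude", "Longitude")]
loc   # CHAIN X keeps only its first coordinates (40.71, -74.00)
```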

4.2.1 Interactive Map of Restaurants by Critical Flag Proportion


# Interactive map of the "bad" and "good" restaurants by critical flag proportion
tmap_data <- merge_bdgd

# Convert the restaurant data to an sf point object (longitude/latitude, WGS 84)
tmapdata_sf <- st_as_sf(tmap_data, coords = c("Longitude", "Latitude"), crs = 4326)

# Red for the highest critical flag proportions, green for the lowest
tmapdata_sf$color <- ifelse(tmapdata_sf$Grade == "bad", "red", "green")

# Switch tmap to interactive (view) mode
tmap_mode("view")

tmap mode set to interactive viewing


# Base map: neighborhood polygons (nyc_neighborhoods is not created in this section;
# it is assumed to be an sf polygon layer loaded earlier)
tm_base <- tm_shape(nyc_neighborhoods) +
  tm_borders() +
  tm_fill(col = "grey", alpha = 0.5) +
  tm_layout(frame = FALSE)

# Restaurant points, colored by the precomputed color column
tm_restaurants <- tm_shape(tmapdata_sf) +
  tm_symbols(
    size = 0.1,          # fixed symbol size for all points
    col = "color",       # color column created from Grade above
    border.col = "black",
    border.alpha = 0.5,
    title.col = "Restaurant Score",
    shape = 21,          # shape 21 is a filled circle with a border
    popup.vars = c("Restaurant_Name" = "Restaurant_Name", "Grade" = "Grade", "Critical_Flag_Proportion" = "Critical_Flag_Proportion")
  )

# Combine the layers, add a manual legend, and print the map
tm_map <- tm_base + tm_restaurants +
  tm_add_legend("fill", col = c("red", "green"),
                labels = c("Bad", "Good"),
                title = "Legend",
                size = 1)
tm_map

# tmap_save(tm_map, "try_map.html")

4.3 Ranking by Inspection Scores


subset8 <- data %>%
  drop_na(DBA, SCORE) %>%
  filter(GRADE %in% c("A", "B", "C")) %>%
  group_by(DBA) %>%
  dplyr::summarise(cases = n(), avg_score = mean(SCORE)) %>%
  arrange(desc(avg_score))


subset8 %>%
  filter(cases > 10) %>%
  select(DBA, avg_score)

# A tibble: 750 × 2
   DBA                   avg_score
   <chr>                     <dbl>
 1 NEW RED LANTERN            88  
 2 SPICY PALACE               85  
 3 TWIN THUMB RESTAURANT      82.4
 4 PI GREEK BAKERIE           79.4
 5 GAMMEEOK                   79  
 6 ASIAN FOOD LTD             71.2
 7 HUNDRED TASTE              70.4
 8 GOTTA GETTA BAGEL          66  
 9 SLICE                      65  
10 CAFFE NAPOLI               61.5
# ℹ 740 more rows

These are the restaurants with the highest average inspection scores (a high score is bad) among those with more than 10 graded inspection records over the last three years. The first 10 of 750 are shown.


subset8 %>%
  filter(cases > 10) %>%
  tail(20)

# A tibble: 20 × 3
   DBA                            cases avg_score
   <chr>                          <int>     <dbl>
 1 HARDEE                            14      5.71
 2 TWO HANDS                         17      5.53
 3 FUSHIMI                           12      5.17
 4 CIBO EXPRESS GOURMET MARKET       19      5.16
 5 NEW APOLLO DINER                  12      5   
 6 NINO'S PIZZA                      11      4.73
 7 MIKE'S DAKOTA DINER               14      4.71
 8 BAR & GRILL 43                    11      4.36
 9 HUDSON FOOD COURT                 11      4.36
10 SHANGHAI YOU GARDEN               24      3.12
11 10TH AVENUE PIZZA & CAFE          11      3   
12 MASALA GRILL                      11      3   
13 Pick-A-Bagel                      11      3   
14 FELIX                             11      2.73
15 POKE FRESH SUSHI                  13      2.54
16 RADIO CITY MUSIC HALL             22      2   
17 ROSA'S PIZZA & PASTA              15      2   
18 SPICE UP SWEETS AND RESTAURANT    12      2   
19 LA BARCA RESTAURANT               11      1.82
20 NEW CHOI HEE                      12      1.5

These are the 20 restaurants with the lowest average inspection scores (a low score is good) among those with more than 10 graded inspection records over the last three years.

4.4 Part4: Violation description by word cloud


rest_cf <- subset7 %>%
  filter(Total > 10) %>%
  head(100) %>%
  select(DBA)

rest_score <- subset8 %>%
  filter(cases > 10) %>%
  head(100) %>%
  select(DBA)

rests <- c(rest_cf$DBA,rest_score$DBA)


subset4 <- data %>%
  rename(VIOLATION = "VIOLATION DESCRIPTION") %>%
  drop_na(VIOLATION) %>%
  filter(DBA %in% rests)

# Split each violation description into lower-case words, dropping English stopwords
words <- tokenize_words(subset4$VIOLATION, stopwords = stopwords::stopwords("en"))

# The word list was flattened once and cached to violation.csv
# (write.csv() on a vector produces a single column named x):
# violation <- unlist(words)
# write.csv(violation, "violation.csv", row.names = FALSE)

df <- read_csv("violation.csv")

Rows: 33310 Columns: 1
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): x

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


df <- df %>%
  rename(violation = x) %>%
  count(violation) %>%
  filter(!violation %in% c("food", "held", "f", "non", "properly")) %>%
  arrange(desc(n)) %>%
  rename(freq = n) 
wordcloud2(data = df, size = 0.75, shape = 'circle', minSize = 10)

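This word cloud is built from the violation descriptions of two groups of restaurants: the 100 with the highest critical flag proportions and the 100 with the highest average inspection scores (a high score means a bad result). In both cases, only restaurants with more than 10 inspection records over the last three years are included. English stopwords and a few uninformative words (food, held, f, non, properly) are removed before counting.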
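The steps behind the word cloud are: lower-case the text, split it into words, drop stopwords and excluded terms, and count frequencies. The sketch below reproduces them in base R on made-up descriptions, without the tokenizers package. The short `stop_words` vector is a tiny stand-in for `stopwords::stopwords("en")`. The result has the word-then-frequency column layout that `wordcloud2()` takes as input.

```r
# Sketch (made-up descriptions): tokenize, filter, and count words in base R
violations <- c(
  "Cold food item held above 41F.",
  "Evidence of mice or live mice present.",
  "Food contact surface not properly washed."
)
stop_words <- c("of", "or", "not", "above")             # tiny stand-in for stopwords("en")
excluded   <- c("food", "held", "f", "non", "properly") # extra words removed in the report

# Lower-case, then split on any run of non-alphanumeric characters
tokens <- unlist(strsplit(tolower(violations), "[^a-z0-9]+"))
tokens <- tokens[tokens != "" & !tokens %in% c(stop_words, excluded)]

# Frequency table, most common word first
freq <- sort(table(tokens), decreasing = TRUE)
word_freq <- data.frame(word = names(freq), freq = as.integer(freq))
head(word_freq)
```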