For this case study we assume that our audience is the CEO and CFO of Budweiser (our client). They have hired us to answer 7 questions and beyond those general questions we will speculate / anticipate what may be of interest to them.
We will start by importing the following data for analysis:
Beers.csv: Name: Name of the beer. Beer_ID: Unique identifier of the beer. ABV: Alcohol by volume of the beer. IBU: International Bitterness Units of the beer. Brewery_ID: Brewery id associated with the beer. Style: Style of the beer. Ounces: Ounces of beer.
Breweries.csv: Brew_ID: Unique identifier of the brewery. Name: Name of the brewery. City: City where the brewery is located. State: U.S. State where the brewery is located.
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.1.0 v stringr 1.4.0
## v tidyr 1.1.3 v forcats 0.5.1
## v readr 1.4.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
##
## Attaching package: 'mice'
## The following object is masked from 'package:stats':
##
## filter
## The following objects are masked from 'package:base':
##
## cbind, rbind
## Loading required package: colorspace
## Loading required package: grid
## VIM is ready to use.
## Suggestions and bug-reports can be submitted at: https://github.com/statistikat/VIM/issues
##
## Attaching package: 'VIM'
## The following object is masked from 'package:datasets':
##
## sleep
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
## Loading required package: lattice
##
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
##
## lift
##
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
##
## map
##
## Attaching package: 'table1'
## The following objects are masked from 'package:base':
##
## units, units<-
Load Theme for plots
Load and Merge initial data sets provided
## [1] 2410 7
## [1] 558 4
## [1] 0
## [1] 0
## Brewery_id Beer_Name Beer_ID ABV IBU
## 1 1 Get Together 2692 0.045 50
## 2 1 Maggie's Leap 2691 0.049 26
## 3 1 Wall's End 2690 0.048 19
## 4 1 Pumpion 2689 0.060 38
## 5 1 Stronghold 2688 0.060 25
## 6 1 Parapet ESB 2687 0.056 47
## Style Ounces Brewery_Name City
## 1 American IPA 16 NorthGate Brewing Minneapolis
## 2 Milk / Sweet Stout 16 NorthGate Brewing Minneapolis
## 3 English Brown Ale 16 NorthGate Brewing Minneapolis
## 4 Pumpkin Ale 16 NorthGate Brewing Minneapolis
## 5 American Porter 16 NorthGate Brewing Minneapolis
## 6 Extra Special / Strong Bitter (ESB) 16 NorthGate Brewing Minneapolis
## State
## 1 MN
## 2 MN
## 3 MN
## 4 MN
## 5 MN
## 6 MN
Add Features to the individual beer level data frame
Investigate NA values to determine what needs resolution
##
## Variables sorted by number of missings:
## Variable Count
## IBU 0.41701245
## ABV 0.02572614
## Brewery_id 0.00000000
## Beer_Name 0.00000000
## Beer_ID 0.00000000
## Style 0.00000000
## Ounces 0.00000000
## Brewery_Name 0.00000000
## City 0.00000000
## State 0.00000000
## Class 0.00000000
Method #1 for imputing NA values, with Predictive Mean Mean Matching
##
## iter imp variable
## 1 1 ABV IBU
## 1 2 ABV IBU
## 1 3 ABV IBU
## 1 4 ABV IBU
## 1 5 ABV IBU
## 2 1 ABV IBU
## 2 2 ABV IBU
## 2 3 ABV IBU
## 2 4 ABV IBU
## 2 5 ABV IBU
## 3 1 ABV IBU
## 3 2 ABV IBU
## 3 3 ABV IBU
## 3 4 ABV IBU
## 3 5 ABV IBU
## 4 1 ABV IBU
## 4 2 ABV IBU
## 4 3 ABV IBU
## 4 4 ABV IBU
## 4 5 ABV IBU
## 5 1 ABV IBU
## 5 2 ABV IBU
## 5 3 ABV IBU
## 5 4 ABV IBU
## 5 5 ABV IBU
## 6 1 ABV IBU
## 6 2 ABV IBU
## 6 3 ABV IBU
## 6 4 ABV IBU
## 6 5 ABV IBU
## 7 1 ABV IBU
## 7 2 ABV IBU
## 7 3 ABV IBU
## 7 4 ABV IBU
## 7 5 ABV IBU
## 8 1 ABV IBU
## 8 2 ABV IBU
## 8 3 ABV IBU
## 8 4 ABV IBU
## 8 5 ABV IBU
## 9 1 ABV IBU
## 9 2 ABV IBU
## 9 3 ABV IBU
## 9 4 ABV IBU
## 9 5 ABV IBU
## 10 1 ABV IBU
## 10 2 ABV IBU
## 10 3 ABV IBU
## 10 4 ABV IBU
## 10 5 ABV IBU
## 11 1 ABV IBU
## 11 2 ABV IBU
## 11 3 ABV IBU
## 11 4 ABV IBU
## 11 5 ABV IBU
## 12 1 ABV IBU
## 12 2 ABV IBU
## 12 3 ABV IBU
## 12 4 ABV IBU
## 12 5 ABV IBU
## 13 1 ABV IBU
## 13 2 ABV IBU
## 13 3 ABV IBU
## 13 4 ABV IBU
## 13 5 ABV IBU
## 14 1 ABV IBU
## 14 2 ABV IBU
## 14 3 ABV IBU
## 14 4 ABV IBU
## 14 5 ABV IBU
## 15 1 ABV IBU
## 15 2 ABV IBU
## 15 3 ABV IBU
## 15 4 ABV IBU
## 15 5 ABV IBU
## 16 1 ABV IBU
## 16 2 ABV IBU
## 16 3 ABV IBU
## 16 4 ABV IBU
## 16 5 ABV IBU
## 17 1 ABV IBU
## 17 2 ABV IBU
## 17 3 ABV IBU
## 17 4 ABV IBU
## 17 5 ABV IBU
## 18 1 ABV IBU
## 18 2 ABV IBU
## 18 3 ABV IBU
## 18 4 ABV IBU
## 18 5 ABV IBU
## 19 1 ABV IBU
## 19 2 ABV IBU
## 19 3 ABV IBU
## 19 4 ABV IBU
## 19 5 ABV IBU
## 20 1 ABV IBU
## 20 2 ABV IBU
## 20 3 ABV IBU
## 20 4 ABV IBU
## 20 5 ABV IBU
## 21 1 ABV IBU
## 21 2 ABV IBU
## 21 3 ABV IBU
## 21 4 ABV IBU
## 21 5 ABV IBU
## 22 1 ABV IBU
## 22 2 ABV IBU
## 22 3 ABV IBU
## 22 4 ABV IBU
## 22 5 ABV IBU
## 23 1 ABV IBU
## 23 2 ABV IBU
## 23 3 ABV IBU
## 23 4 ABV IBU
## 23 5 ABV IBU
## 24 1 ABV IBU
## 24 2 ABV IBU
## 24 3 ABV IBU
## 24 4 ABV IBU
## 24 5 ABV IBU
## 25 1 ABV IBU
## 25 2 ABV IBU
## 25 3 ABV IBU
## 25 4 ABV IBU
## 25 5 ABV IBU
## 26 1 ABV IBU
## 26 2 ABV IBU
## 26 3 ABV IBU
## 26 4 ABV IBU
## 26 5 ABV IBU
## 27 1 ABV IBU
## 27 2 ABV IBU
## 27 3 ABV IBU
## 27 4 ABV IBU
## 27 5 ABV IBU
## 28 1 ABV IBU
## 28 2 ABV IBU
## 28 3 ABV IBU
## 28 4 ABV IBU
## 28 5 ABV IBU
## 29 1 ABV IBU
## 29 2 ABV IBU
## 29 3 ABV IBU
## 29 4 ABV IBU
## 29 5 ABV IBU
## 30 1 ABV IBU
## 30 2 ABV IBU
## 30 3 ABV IBU
## 30 4 ABV IBU
## 30 5 ABV IBU
## 31 1 ABV IBU
## 31 2 ABV IBU
## 31 3 ABV IBU
## 31 4 ABV IBU
## 31 5 ABV IBU
## 32 1 ABV IBU
## 32 2 ABV IBU
## 32 3 ABV IBU
## 32 4 ABV IBU
## 32 5 ABV IBU
## 33 1 ABV IBU
## 33 2 ABV IBU
## 33 3 ABV IBU
## 33 4 ABV IBU
## 33 5 ABV IBU
## 34 1 ABV IBU
## 34 2 ABV IBU
## 34 3 ABV IBU
## 34 4 ABV IBU
## 34 5 ABV IBU
## 35 1 ABV IBU
## 35 2 ABV IBU
## 35 3 ABV IBU
## 35 4 ABV IBU
## 35 5 ABV IBU
## 36 1 ABV IBU
## 36 2 ABV IBU
## 36 3 ABV IBU
## 36 4 ABV IBU
## 36 5 ABV IBU
## 37 1 ABV IBU
## 37 2 ABV IBU
## 37 3 ABV IBU
## 37 4 ABV IBU
## 37 5 ABV IBU
## 38 1 ABV IBU
## 38 2 ABV IBU
## 38 3 ABV IBU
## 38 4 ABV IBU
## 38 5 ABV IBU
## 39 1 ABV IBU
## 39 2 ABV IBU
## 39 3 ABV IBU
## 39 4 ABV IBU
## 39 5 ABV IBU
## 40 1 ABV IBU
## 40 2 ABV IBU
## 40 3 ABV IBU
## 40 4 ABV IBU
## 40 5 ABV IBU
## 41 1 ABV IBU
## 41 2 ABV IBU
## 41 3 ABV IBU
## 41 4 ABV IBU
## 41 5 ABV IBU
## 42 1 ABV IBU
## 42 2 ABV IBU
## 42 3 ABV IBU
## 42 4 ABV IBU
## 42 5 ABV IBU
## 43 1 ABV IBU
## 43 2 ABV IBU
## 43 3 ABV IBU
## 43 4 ABV IBU
## 43 5 ABV IBU
## 44 1 ABV IBU
## 44 2 ABV IBU
## 44 3 ABV IBU
## 44 4 ABV IBU
## 44 5 ABV IBU
## 45 1 ABV IBU
## 45 2 ABV IBU
## 45 3 ABV IBU
## 45 4 ABV IBU
## 45 5 ABV IBU
## 46 1 ABV IBU
## 46 2 ABV IBU
## 46 3 ABV IBU
## 46 4 ABV IBU
## 46 5 ABV IBU
## 47 1 ABV IBU
## 47 2 ABV IBU
## 47 3 ABV IBU
## 47 4 ABV IBU
## 47 5 ABV IBU
## 48 1 ABV IBU
## 48 2 ABV IBU
## 48 3 ABV IBU
## 48 4 ABV IBU
## 48 5 ABV IBU
## 49 1 ABV IBU
## 49 2 ABV IBU
## 49 3 ABV IBU
## 49 4 ABV IBU
## 49 5 ABV IBU
## 50 1 ABV IBU
## 50 2 ABV IBU
## 50 3 ABV IBU
## 50 4 ABV IBU
## 50 5 ABV IBU
## Warning: Number of logged events: 6
## /\ /\
## { `---' }
## { O O }
## ==> V <== No need for mice. This data set is completely observed.
## \ \|/ /
## `-----'
## Brewery_id Beer_Name Beer_ID ABV IBU Style Ounces Brewery_Name City State
## 2410 1 1 1 1 1 1 1 1 1 1
## 0 0 0 0 0 0 0 0 0 0
## Class
## 2410 1 0
## 0 0
##
## Variables sorted by number of missings:
## Variable Count
## Brewery_id 0
## Beer_Name 0
## Beer_ID 0
## ABV 0
## IBU 0
## Style 0
## Ounces 0
## Brewery_Name 0
## City 0
## State 0
## Class 0
Method #2 for imputing values: Study distributions of IBU and ABV by Class of beer rather than using predictive mean for distribution of all beer style combined
## Warning: Removed 1005 rows containing non-finite values (stat_boxplot).
## Warning: Removed 62 rows containing non-finite values (stat_boxplot).
## Warning: Removed 37 rows containing missing values (geom_point).
Create State Level table of summary statistics
Add external data and features to the State level tables including, consumption, population, and consumption per capita. This will be used to explore additional analysis beyond the questions Budweiser asked us.
## Warning: NAs introduced by coercion
Create a State Mapping data frame so that all summary statistics can be geographically plotted
Question 1: How many breweries are present in each state?
Answer 1: Colorado has the most breweries with 47. Other than Colorado, states with most breweries are clustered along US border and coasts. The top 5 brewery count states account for 31% of total craft breweris in the US.
Question 2: Merge beer data with the breweries data. Print the first 6 observations and the last six observations to check the merged file. (RMD only, this does not need to be included in the presentation or the deck.)
## # A tibble: 6 x 15
## # Groups: Class [5]
## Brewery_id Beer_Name Beer_ID ABV IBU Style Ounces Brewery_Name City State
## <int> <chr> <int> <dbl> <int> <chr> <dbl> <chr> <chr> <chr>
## 1 1 Get Toge~ 2692 0.045 50 Amer~ 16 "NorthGate ~ Minn~ " MN"
## 2 1 Maggie's~ 2691 0.049 26 Milk~ 16 "NorthGate ~ Minn~ " MN"
## 3 1 Wall's E~ 2690 0.048 19 Engl~ 16 "NorthGate ~ Minn~ " MN"
## 4 1 Pumpion 2689 0.06 38 Pump~ 16 "NorthGate ~ Minn~ " MN"
## 5 1 Strongho~ 2688 0.06 25 Amer~ 16 "NorthGate ~ Minn~ " MN"
## 6 1 Parapet ~ 2687 0.056 47 Extr~ 16 "NorthGate ~ Minn~ " MN"
## # ... with 5 more variables: Class <chr>, ABV.pmm.imputed <dbl>,
## # IBU.pmm.imputed <int>, IBU.class.imputed <dbl>, ABV.class.imputed <dbl>
## # A tibble: 6 x 15
## # Groups: Class [5]
## Brewery_id Beer_Name Beer_ID ABV IBU Style Ounces Brewery_Name City State
## <int> <chr> <int> <dbl> <int> <chr> <dbl> <chr> <chr> <chr>
## 1 556 Pilsner ~ 98 0.055 NA Germ~ 12 Ukiah Brewi~ Ukiah " CA"
## 2 557 Heinniew~ 52 0.049 NA Hefe~ 12 Butternuts ~ Garr~ " NY"
## 3 557 Snapperh~ 51 0.068 NA Amer~ 12 Butternuts ~ Garr~ " NY"
## 4 557 Moo Thun~ 50 0.049 NA Milk~ 12 Butternuts ~ Garr~ " NY"
## 5 557 Porkslap~ 49 0.043 NA Amer~ 12 Butternuts ~ Garr~ " NY"
## 6 558 Urban Wi~ 30 0.049 NA Engl~ 12 Sleeping La~ Anch~ " AK"
## # ... with 5 more variables: Class <chr>, ABV.pmm.imputed <dbl>,
## # IBU.pmm.imputed <int>, IBU.class.imputed <dbl>, ABV.class.imputed <dbl>
Question 3: Address the missing values in each column.See above for imputing using two methods. Here we will compare methods and decide which is best to use for subsequent analyses
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 1042 rows containing non-finite values (stat_bin).
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 62 rows containing non-finite values (stat_bin).
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 1042 rows containing non-finite values (stat_smooth).
## Warning: Removed 1042 rows containing missing values (geom_point).
Answer 3: The PMM method of imputing mimcs the overall distribution of available data points but does not seem appropriate for each invidual missing point as it replaces with values not accurate for the type of beer that has missing values. The Class-mean method seems more appropriate and will be used for subsequent distribution analysis and summary statitistics. However it does not seema appropriate for relationship analysis, so we will omit NA’s for correlations and models
Question 4: Compute the median alcohol content and international bitterness unit for each state. Plot a bar chart to compare.
Answer 4: For states with sufficiently larger sample sizes median IBU in the mid-30’s. The two outlier states with high median IBU are small samples. They only have 2 beers per state, which happen to be high IBU styles. If they brewed more beers of multiple varieties, they would likely have median IBU more similar to the rest of states.
Almost all states have a median ABV in the 5% to 6% Range. Utah, the one outlier state with low ABV has to do with a state law limiting the max ABV to 4%
Question 5: Which state has the maximum alcoholic (ABV) beer? Which state has the most bitter (IBU) beer?
## # A tibble: 1 x 6
## # Groups: Class [1]
## Beer_Name ABV Class Style State Brewery_Name
## <chr> <dbl> <chr> <chr> <chr> <chr>
## 1 Lee Hill Series Vol. 5 - Belgi~ 0.128 Other Quadrupel~ " CO" Upslope Brewing ~
## # A tibble: 1 x 6
## # Groups: Class [1]
## Beer_Name IBU Class Style State Brewery_Name
## <chr> <int> <chr> <chr> <chr> <chr>
## 1 Bitter Bitch Imper~ 138 IPA American Double / Im~ " OR" Astoria Brewing C~
Answer 5: Max ABV beer = 12.8% by Upslope Brewing Company, Colorado Lee Hill Series Vol. 5 - Belgian Style Quadrupel Ale Brewed with a select strain of Belgian yeast and traditional Belgian candy syrup, the beer matured over six months to mellow and soften its character.
Max IBU Beer = 138 by Astoria Brewing Company, Oregon Bitter Bitch Imperial IPA A big IPA with a huge bite. Peoples Award Winner 3 Years running at the Spring Beer & Wine Festival, Portland, OR.
Question 6: Comment on the summary statistics and distribution of the ABV variable.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00100 0.05000 0.05600 0.05977 0.06700 0.12800 62
## Warning: Removed 62 rows containing non-finite values (stat_boxplot).
## Warning: Removed 62 rows containing missing values (geom_point).
## Adding missing grouping variables: `Class`
## Warning: Removed 62 rows containing non-finite values (stat_bin).
## Adding missing grouping variables: `Class`
## Warning: Removed 62 rows containing non-finite values (stat_bin).
## Warning: Removed 62 rows containing non-finite values (stat_density).
Overall (N=2410) |
|
---|---|
ABV | |
Mean (SD) | 0.0598 (0.0135) |
Median [Min, Max] | 0.0560 [0.00100, 0.128] |
Missing | 62 (2.6%) |
Answer 6: ABV distribution is relatively normal with a right skew IPA’s contribute most to the right tail of high ABV values, explains mean > median The minimum value is a non-alcoholic beer, excluding this one reduces range to 2.7% - 12.8%
Question 7: Is there an apparent relationship between the bitterness of the beer and its alcoholic content? Draw a scatter plot. Make your best judgment of a relationship and EXPLAIN your answer.
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 1005 rows containing non-finite values (stat_smooth).
## Warning: Removed 1005 rows containing missing values (geom_point).
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 179 rows containing non-finite values (stat_smooth).
## Warning: Removed 179 rows containing missing values (geom_point).
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 411 rows containing non-finite values (stat_smooth).
## Warning: Removed 411 rows containing missing values (geom_point).
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 590 rows containing non-finite values (stat_smooth).
## Warning: Removed 590 rows containing missing values (geom_point).
Answer 7: ABV and IBU are positively correlated with 45% of the variation in ABV explained by IBU The IBU/ABV relationship is different or shifted for different styles of beer For similar alcohol level, IPA is more bitter than Ale
Question 8: Budweiser would also like to investigate the difference with respect to IBU and ABV between IPAs (India Pale Ales) and other types of Ale (any beer with “Ale” in its name other than IPA). You decide to use KNN classification to investigate this relationship. Provide statistical evidence one way or the other.
#KNN - This model omits NA values and loops to find average confusion matrix parameters over iterations and plots for optimal k value. See comments after plots where single confusion matrix created using optimal k
## classifications
## Ale IPA
## Ale 367 7
## IPA 20 253
## Confusion Matrix and Statistics
##
## classifications
## Ale IPA
## Ale 367 7
## IPA 20 253
##
## Accuracy : 0.9583
## 95% CI : (0.9399, 0.9723)
## No Information Rate : 0.5981
## P-Value [Acc > NIR] : < 2e-16
##
## Kappa : 0.9139
##
## Mcnemar's Test P-Value : 0.02092
##
## Sensitivity : 0.9483
## Specificity : 0.9731
## Pos Pred Value : 0.9813
## Neg Pred Value : 0.9267
## Prevalence : 0.5981
## Detection Rate : 0.5672
## Detection Prevalence : 0.5781
## Balanced Accuracy : 0.9607
##
## 'Positive' Class : Ale
##
#KNN - This model omits NA values then scales ABV/IBU parameters to reduce leverage of nearest neighbor distances due to differing scales. Then it loops to find average confusion matrix parameters over iterations and plots for optimal k value. See comments after plots where single confusion matrix created using optimal k
## classifications
## Ale IPA
## Ale 137 23
## IPA 12 113
## Confusion Matrix and Statistics
##
## classifications
## Ale IPA
## Ale 137 23
## IPA 12 113
##
## Accuracy : 0.8772
## 95% CI : (0.8334, 0.9129)
## No Information Rate : 0.5228
## P-Value [Acc > NIR] : < 2e-16
##
## Kappa : 0.753
##
## Mcnemar's Test P-Value : 0.09097
##
## Sensitivity : 0.9195
## Specificity : 0.8309
## Pos Pred Value : 0.8562
## Neg Pred Value : 0.9040
## Prevalence : 0.5228
## Detection Rate : 0.4807
## Detection Prevalence : 0.5614
## Balanced Accuracy : 0.8752
##
## 'Positive' Class : Ale
##
#KNN - Unlike above models where NA values were omitted, this model uses the imputed IBU and ABV values using the Redictive Mean Matching (PMM) discussed in sections above.
#KNN - PMM - Scale ABV/IBU
#KNN - CLASS
#KNN - CLASS - Scale ABV/IBU
Answer 8: We looked at multiple KNN models with three different approaches on values: omit NA’s, PMM imputed missing values, Class-mean imputed missing values. Using imputed values lowered accuracy which makes some sense since they are not true IBU/ABV paired data. Omitting NA’s produced the highest accuracy.
Since the range of values for ABV and IBU are so different, we looked at scaling these axes for potential improvement to model accuracy. In this case, scaling did not improve our models.
Finally we ran loops to plot the optimal k (number of nearest neighbors for classification) and observed k=1 to provide the highest accuracy. Using k=1 could cause concerns of overfitting the model, so we looped through 500 iterations of reassignments to the train and test data sets. The mean accuracy was still highest with k=1 leading us to conclude it was not an abberation based on one assignment of the training set.
Our best model can classify Ales vs. IPA with 95% accuracy using their IBU and ABV measurements alone. If overfitting is a concern, higher values of k still produce accuracy of 85% which is still a good model.
Question 9: Knock their socks off! Find one other useful inference from the data that you feel Budweiser may be able to find value in. You must convince them why it is important and back up your conviction with appropriate statistical evidence.
We will analyze alcohol consumption data as a proxy for demand and consumer interest in beer vs. the location of existing breweries to look for opportunities of high demand with lower supply.
## Warning: Removed 2 rows containing missing values (geom_point).
## Warning: Removed 2 rows containing missing values (geom_point).
## Warning: ggrepel: 13 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
## city lat lon
## 1 baldwinsville 43.15870 -76.33270
## 2 cartersville 34.16510 -84.79994
## 3 cartersville 34.16510 -84.79994
## 4 columbus 39.96118 -82.99879
## 5 losangeles 34.05223 -118.24368
## 6 newark 40.73566 -74.17237
## 7 williamsburg 37.27070 -76.70746
## 8 fairfield 38.24936 -122.03997
## 9 fortcollins 40.58526 -105.08442
## 10 houston 29.76328 -95.36327
## 11 jacksonville 30.33218 -81.65565
## 12 merrimack 42.86509 -71.49340
## 13 stlouis 38.62727 -90.19789
Answer 9: Some states with the highest alcohol consumption rate have very few breweries. There may be an opportunity to launch new beer with a brewery in these areas to create regional pride and increase demand for a new budweiser beer in a ripe market
Preliminary Analysis shows some states such as Nevada, South Dakota and North Dakota warrant further investigation in order to expand Budweiser’s share of craft brew market. These states demonstrate potential to support Budweiser Brewery Experiential Site
We’d recommend adding beer sales data provided by Budweiser so we can refine market analysis with more precise data. We would also like to include consumer taste preference for Beer with respect to IBU / ABV.
Session Information
## R version 4.0.4 (2021-02-15)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19042)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] grid stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] table1_1.2.1 ggthemes_4.2.4 mapproj_1.2.7 maps_3.3.0
## [5] class_7.3-18 e1071_1.7-5 caret_6.0-86 lattice_0.20-41
## [9] ggpubr_0.4.0 plotly_4.9.3 VIM_6.1.0 colorspace_2.0-0
## [13] mice_3.13.0 usmap_0.5.2 GGally_2.1.1 visdat_0.5.3
## [17] forcats_0.5.1 stringr_1.4.0 purrr_0.3.4 readr_1.4.0
## [21] tidyr_1.1.3 tibble_3.1.0 ggplot2_3.3.3 tidyverse_1.3.0
## [25] dplyr_1.0.5
##
## loaded via a namespace (and not attached):
## [1] ggsignif_0.6.1 ellipsis_0.3.1 rio_0.5.26
## [4] fs_1.5.0 rstudioapi_0.13 proxy_0.4-25
## [7] farver_2.1.0 ggrepel_0.9.1 prodlim_2019.11.13
## [10] fansi_0.4.2 lubridate_1.7.10 ranger_0.12.1
## [13] xml2_1.3.2 splines_4.0.4 codetools_0.2-18
## [16] robustbase_0.93-7 knitr_1.31 Formula_1.2-4
## [19] jsonlite_1.7.2 pROC_1.17.0.1 broom_0.7.5
## [22] dbplyr_2.1.0 compiler_4.0.4 httr_1.4.2
## [25] backports_1.2.1 assertthat_0.2.1 Matrix_1.3-2
## [28] lazyeval_0.2.2 cli_2.3.1 htmltools_0.5.1.1
## [31] tools_4.0.4 gtable_0.3.0 glue_1.4.2
## [34] reshape2_1.4.4 Rcpp_1.0.6 carData_3.0-4
## [37] cellranger_1.1.0 jquerylib_0.1.3 vctrs_0.3.6
## [40] nlme_3.1-152 iterators_1.0.13 lmtest_0.9-38
## [43] timeDate_3043.102 xfun_0.22 gower_0.2.2
## [46] laeken_0.5.1 openxlsx_4.2.3 rvest_1.0.0
## [49] lifecycle_1.0.0 rstatix_0.7.0 DEoptimR_1.0-8
## [52] MASS_7.3-53 zoo_1.8-9 scales_1.1.1
## [55] ipred_0.9-11 hms_1.0.0 RColorBrewer_1.1-2
## [58] yaml_2.2.1 curl_4.3 sass_0.3.1
## [61] rpart_4.1-15 reshape_0.8.8 stringi_1.5.3
## [64] highr_0.8 foreach_1.5.1 boot_1.3-26
## [67] zip_2.1.1 lava_1.6.9 rlang_0.4.10
## [70] pkgconfig_2.0.3 evaluate_0.14 labeling_0.4.2
## [73] recipes_0.1.15 htmlwidgets_1.5.3 tidyselect_1.1.0
## [76] plyr_1.8.6 magrittr_2.0.1 R6_2.5.0
## [79] generics_0.1.0 DBI_1.1.1 mgcv_1.8-33
## [82] pillar_1.5.1 haven_2.3.1 foreign_0.8-81
## [85] withr_2.4.1 survival_3.2-7 abind_1.4-5
## [88] sp_1.4-5 nnet_7.3-15 modelr_0.1.8
## [91] crayon_1.4.1 car_3.0-10 utf8_1.2.1
## [94] rmarkdown_2.7 readxl_1.3.1 data.table_1.14.0
## [97] ModelMetrics_1.2.2.2 vcd_1.4-8 reprex_1.0.0
## [100] digest_0.6.27 stats4_4.0.4 munsell_0.5.0
## [103] viridisLite_0.3.0 bslib_0.2.4