Data Science Interview

21 Jan 2018

Under the Weather, Still Grinding

About a week ago, while dealing with what turned out to be a nasty cold, I came across an interesting tweet.

Now, if you’re a regular reader (Do I even have any regular readers?!), you already know that I’ve been working on machine learning and data science skills. The LinkedIn post in the tweet outlined a one-size-fits-all interview kit for testing and hiring for data skills. It was a well-written article, and instantly piqued my interest. When I found out that the dataset in the test is beer ratings from Beer Advocate, I knew I had to give this test a shot.

The Data & The Questions

This dataset consists of over 1.5 million rows of review data in CSV (comma-separated values) format. The size was carefully chosen to be too large for MS Excel, but small enough to be managed by most laptops using Python or R. There were four questions to answer in this test:

  1. Which brewery produces the strongest beers by ABV%?
  2. If you had to pick 3 beers to recommend using only this data, which would you pick?
  3. Which of the factors (aroma, taste, appearance, palate) are most important in determining the overall quality of a beer?
  4. Lastly, if I typically enjoy a beer due to its aroma and appearance, which beer style should I try?
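
To give a feel for the kind of work involved, here is a minimal pandas sketch of how question 1 could be approached. It is not the code from my notebook. The column names (`brewery_name`, `beer_name`, `beer_abv`) are my assumption about how the CSV is laid out, the helper `strongest_breweries` is hypothetical, and the tiny DataFrame is made-up toy data standing in for the real file.

```python
import pandas as pd

# Toy stand-in for the review data. Column names are assumed and the
# values are made up purely for illustration.
reviews = pd.DataFrame({
    "brewery_name": ["A", "A", "B", "B", "C"],
    "beer_name": ["a1", "a2", "b1", "b1", "c1"],
    "beer_abv": [5.0, 7.0, 12.0, 12.0, None],
})


def strongest_breweries(df, top_n=3):
    """Rank breweries by the mean ABV of their distinct beers."""
    # Drop reviews with no ABV, then count each beer once so that
    # heavily reviewed beers do not dominate a brewery's average.
    beers = (
        df.dropna(subset=["beer_abv"])
          .drop_duplicates(subset=["brewery_name", "beer_name"])
    )
    return (
        beers.groupby("brewery_name")["beer_abv"]
             .mean()
             .sort_values(ascending=False)
             .head(top_n)
    )


ranking = strongest_breweries(reviews)
print(ranking)
```

Whether "strongest" should mean the highest average ABV or the single strongest beer is a judgment call. Swapping `.mean()` for `.max()` answers the second reading.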

Results and Insights

I was hoping to plow right through this project in a day, but ended up needing a few hours spread over a few days to complete the test. I’m not sure how long I’d be allowed to work on this in a real-world interview scenario, so that’s something I’d need to keep in mind. At this point, you should reference the Jupyter Notebook, which shows my work from initial exploration through to the answers that I came up with. If you just want to know the solutions, skip to the last section of this post.

One thing I learned that blew me away is the answer to question #1. Apparently, there’s a brewery out there that makes some extremely potent brews! Schorschbräu in Germany makes beers with ABVs in the 40s and 50s, topping out at 57%! Side note: this brewery is actually near the town I stayed in during a work trip back in 2006. Had I known about this place back then, I would’ve made a trek out there!

Something that I’m particularly proud of is my effort in answering question #2. In one of the online classes I took, I learned about building a basic recommender system using movie ratings as an example. I was able to successfully adapt that example into a recommender system for beers. I’m satisfied with the results, but at the same time, I would really like to continue to build out this recommender. There is much more that can be done to refine it. Time to add another project to the backlog!

I enjoyed working on this test. I was able to put my data analysis and data science skills to work, and I’m really starting to get in a groove. I definitely feel like I could have gone into more depth with these questions, but that can also be a slippery slope. On one question, I started working on a scatter matrix plot, and then tried a hexbin plot. Ultimately, my first solution, using correlation, turned out to be a perfectly adequate answer. As cool as some advanced charts and other visualizations are, the task at hand is to extract useful information from the data. Coming up with a quick answer is sometimes preferable to fancy charts, especially when time is of the essence.
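
For anyone curious what a "quick answer using correlation" can look like, here is a minimal sketch, not the exact code from my notebook. It assumes the rating columns are named `review_overall`, `review_aroma`, `review_taste`, `review_appearance`, and `review_palate`. The helper `factor_correlations` is hypothetical, and the numbers are made-up toy values that say nothing about the real dataset.

```python
import pandas as pd

# Toy ratings with assumed column names. The values are invented for
# illustration only and are not taken from the Beer Advocate data.
reviews = pd.DataFrame({
    "review_overall":    [4.0, 3.0, 5.0, 2.0, 4.5, 3.5],
    "review_aroma":      [3.5, 3.5, 4.0, 3.0, 4.0, 3.0],
    "review_taste":      [4.0, 3.0, 5.0, 2.5, 4.5, 3.5],
    "review_appearance": [4.0, 4.0, 4.0, 3.0, 3.5, 4.0],
    "review_palate":     [3.5, 3.0, 4.5, 3.0, 4.0, 4.0],
})


def factor_correlations(df, target="review_overall"):
    """Pearson correlation of every other column with the target rating."""
    # Take the target's column of the correlation matrix and drop the
    # target's correlation with itself (always 1.0).
    corr = df.corr()[target].drop(target)
    return corr.sort_values(ascending=False)


factors = factor_correlations(reviews)
print(factors)
```

Correlation only measures how closely each factor moves with the overall score. It is a fast first look, not proof that one factor drives the rating.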

TL;DR Answers

  1. Which brewery produces the strongest beers by ABV%?
    • Schorschbräu
  2. If you had to pick 3 beers to recommend using only this data, which would you pick?
    • Founders Pale Ale: not only did it have the highest similarity score, but I’ve also heard of Founders and I like their beer.
    • Ithaca Beer Company Cascazilla: it’s an Amber/Red Ale, had a high aggregate score, and has a cool name.
    • Dogfish Head Shelter Pale Ale: this one should be easy to find, and it matched well to my input beer.
  3. Which of the factors (aroma, taste, appearance, palate) are most important in determining the overall quality of a beer?
    • Taste is most important when determining the overall quality
  4. Lastly, if I typically enjoy a beer due to its aroma and appearance, which beer style should I try?
    • American Double / Imperial Stout (Quadrupel for the adventurous folks!)
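
A note on the "similarity score" mentioned in answer 2: one common way to get such a score is item-to-item similarity, where two beers count as similar if the same reviewers tend to rate them alike. The sketch below shows that idea with Pearson correlation. It illustrates the general technique and is not a copy of my notebook. The column names (`review_profilename`, `beer_name`, `review_overall`), the helper `similar_beers`, and the `min_reviews` cutoff are all assumptions, and the ratings are made up.

```python
import pandas as pd

# Made-up ratings from four hypothetical reviewers for three hypothetical beers.
ratings = pd.DataFrame({
    "review_profilename": ["u1", "u1", "u1", "u2", "u2", "u2",
                           "u3", "u3", "u3", "u4", "u4", "u4"],
    "beer_name": ["Pale", "Amber", "Stout"] * 4,
    "review_overall": [5.0, 4.5, 2.0, 4.0, 4.0, 3.0,
                       2.0, 2.5, 5.0, 3.0, 3.0, 4.0],
})


def similar_beers(df, liked_beer, min_reviews=3):
    """Rank other beers by how their ratings correlate with liked_beer."""
    # Reviewers as rows, beers as columns, mean overall rating as values.
    matrix = df.pivot_table(
        index="review_profilename",
        columns="beer_name",
        values="review_overall",
    )
    # Correlate every beer's rating column with the liked beer's column.
    similarity = matrix.corrwith(matrix[liked_beer]).drop(liked_beer)
    # Ignore beers with too few reviews, since their scores are noisy.
    counts = df.groupby("beer_name")["review_overall"].count()
    popular = counts[counts >= min_reviews].index
    similarity = similarity[similarity.index.isin(popular)]
    return similarity.sort_values(ascending=False)


matches = similar_beers(ratings, "Pale")
print(matches)
```

On the full dataset the review-count cutoff matters a lot, because a beer with two reviews can look like a perfect match purely by chance.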