A Classification Model Showdown at the Huang-Page Buffet

by Jiawen Huang
with Greg Page

Background:  Customer Satisfaction at the Huang-Page International Super Buffet 

We recently opened the Huang-Page International Super Buffet (HP Buffet) in a location just north of Boston.  Since opening, we have received rave reviews from restaurant critics and fans of casual dining.  

For the sake of simplicity, the HP Buffet offers a limited range of items:  California rolls, crab rangoon, pizza, deviled eggs, sesame chicken, spare ribs, Texas toast, Italian sausage, bulgogi, egg rolls, and soft drinks.  For a flat rate of $11.95 per person, guests may eat as many servings of the above items as they wish to consume. 

In theory, this simple, all-you-can-eat setup should optimize happiness among our visitors — people can come to the buffet hungry, eat as much as they wish to, and then leave once they feel full.  

Thankfully, our observations from the first few weeks of operations show us that most patrons of the HP Buffet are quite happy with our service and food.  However, we also notice a pattern that troubles us a bit: it seems that some of our visitors overeat at the buffet, and when they do, they seem very dissatisfied with their experience.  

Understanding Customer Satisfaction: Exploratory Data Analysis 

In order to better understand the relationship between food consumption and overall satisfaction at HP, we conducted a simple exit survey for patrons.  For a three-week period, as our guests left the premises, we simply asked them whether they felt satisfied, with the only possible responses being “Yes” or “No.”  After recording each guest’s answer, we then directed our staff of interns to review the footage from our surveillance cameras to painstakingly record each of these patrons’ item-by-item consumption totals during their visit.  We matched the survey answers from each patron with his/her consumption stats in order to generate the dataframe used for this analysis.  

The data indicated that nearly 70 percent of the 5847 guests in our survey left the buffet feeling satisfied.  Of course, we can improve on this, but these numbers look encouraging:
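For readers who want to follow along in code, here is a minimal sketch of this first check, assuming the matched survey records live in a pandas DataFrame with a ‘satisfied’ column coded 1 for “Yes” and 0 for “No” (the file name and column names here are our own placeholders, not the actual ones):

```python
import pandas as pd

# Load the matched survey + consumption records (hypothetical file name)
buffet = pd.read_csv("hp_buffet_survey.csv")

# Share of guests who answered "Yes" (1) vs. "No" (0) on the exit survey
print(buffet["satisfied"].value_counts(normalize=True))
```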

As we dug deeper into the satisfaction survey results, we discovered some patterns that confirmed our earlier suspicions:  for many menu items, consumers’ average satisfaction levels simply rose as more units were consumed.  For some other menu items, however, customers’ happiness initially increased with additional consumption, before leveling off and then even decreasing once a guest passed some particular threshold of units consumed.  

For instance, let’s take a look at crab rangoon consumption:

At first, additional units of crab rangoon consumption suggest a higher likelihood of patron satisfaction; note the improved satisfaction average among guests who had more than two pieces of crab rangoon.  However, average satisfaction diminishes sharply once a patron exceeds eight servings.  Why?  We’re not entirely sure.  Perhaps over-consumption of crab rangoon makes our guests feel bloated, tired, or nauseous.  We will defer the “why” question for another time; here, we will concentrate on what the data reveals.  

With some menu items, the initial rise in satisfaction wasn’t as dramatic.  With bulgogi, for instance, average satisfaction hovers near the overall restaurant mean for patrons who have eaten 0, 1, or 2 servings.  A third serving of bulgogi, however, is associated with a precipitous decline in guest satisfaction, as shown in the barplot below.  
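Barplots like these can be produced with a simple group-by on each item’s serving count.  Continuing with the buffet DataFrame from the earlier sketch (the column names are again our own assumptions):

```python
import matplotlib.pyplot as plt

# Average satisfaction (share of "Yes" answers) at each consumption level,
# with a dashed line marking the overall restaurant mean
for item in ["crab_rangoon", "bulgogi"]:
    plt.figure()
    buffet.groupby(item)["satisfied"].mean().plot(kind="bar")
    plt.axhline(buffet["satisfied"].mean(), linestyle="--", color="gray")
    plt.title(f"Mean satisfaction by servings of {item}")
    plt.ylabel("Mean satisfaction")
    plt.show()
```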

Logistic Regression:  Model Performance 

First, we used logistic regression to model the relationship between consumption and satisfaction.  After randomly assigning 70 percent of our records to a training set and 30 percent to a test set, we used the LogisticRegression module from scikit-learn to fit a model.  The model included all of the food items as inputs, with ‘satisfaction’ as the outcome variable.   
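In code, that setup looks roughly like this (a sketch under the same assumed column names; the random_state value is our arbitrary choice):

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# All food-item columns as inputs, 'satisfied' as the outcome
X = buffet.drop(columns="satisfied")
y = buffet["satisfied"]

# 70/30 random split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

logit = LogisticRegression(max_iter=1000)
logit.fit(X_train, y_train)
```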

The table below shows the coefficients associated with each of our menu items:  
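A coefficient table like this one can be pulled from the fitted model as follows (continuing the sketch above):

```python
import pandas as pd

# Pair each menu item with its fitted coefficient
coef_table = pd.DataFrame({
    "item": X.columns,
    "coefficient": logit.coef_[0],
}).sort_values("coefficient", ascending=False)
print(coef_table)
```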

We will say more about these coefficients, and their interpretation, in the “Comparing the Results” section.  

The model’s accuracy of 74.64 percent compares favorably to the null rate of 68.43 percent, but not by a wide margin.  The model’s precision rate, shown below, indicates that when the model predicts a customer will answer “Yes” to the satisfaction survey, it is correct 76.24 percent of the time.  
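These metrics can be reproduced with scikit-learn’s built-in scoring functions; the null rate is simply the accuracy of always predicting the majority class.  A sketch, again assuming ‘satisfied’ is coded 1/0:

```python
from sklearn.metrics import accuracy_score, precision_score

preds = logit.predict(X_test)

# Null rate: accuracy of a model that always predicts the majority class
null_rate = max(y_test.mean(), 1 - y_test.mean())
print(f"Null rate: {null_rate:.4f}")
print(f"Accuracy:  {accuracy_score(y_test, preds):.4f}")
print(f"Precision: {precision_score(y_test, preds):.4f}")
```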

The Receiver Operating Characteristic (ROC) Curve, shown below, informs us about the model’s True Positive Rate and False Positive Rate at various classification thresholds.  We will use a separate article to cover the ROC curve in greater depth, but for now, we will simply note that an ideal model’s ROC curve would rise steeply from the lower left corner to the upper left corner, before moving across to the upper right corner.  The diagonal line on the graph shown below represents the performance of a model that randomly assigns records to either class.  The more closely the model approximates this ideal, the higher its Area Under the Curve (AUC) metric will be.  As indicated below, our model’s AUC value is 0.7282.
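Here is a sketch of how the curve and the AUC value might be generated from the fitted model:

```python
from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt

# Predicted probabilities of the "satisfied" class
logit_probs = logit.predict_proba(X_test)[:, 1]
print(f"Logistic regression AUC: {roc_auc_score(y_test, logit_probs):.4f}")

# ROC curve, with the diagonal marking random assignment
fpr, tpr, _ = roc_curve(y_test, logit_probs)
plt.plot(fpr, tpr, label="logistic regression")
plt.plot([0, 1], [0, 1], linestyle="--", label="random assignment")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()
```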

Random Forest:  Model Performance 

Next, we built a random forest model, using the RandomForestClassifier module from scikit-learn, with the same set of variables.  
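A sketch of that fit, reusing the split and metric functions from the earlier sketches (the n_estimators value is our arbitrary choice, not a tuned setting):

```python
from sklearn.ensemble import RandomForestClassifier

# Same predictors, same 70/30 split as before
rf = RandomForestClassifier(n_estimators=500, random_state=42)
rf.fit(X_train, y_train)
rf_preds = rf.predict(X_test)

print(f"Accuracy:  {accuracy_score(y_test, rf_preds):.4f}")
print(f"Precision: {precision_score(y_test, rf_preds):.4f}")
```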

As shown below, the random forest model demonstrated considerably better performance, as measured both by its accuracy and its precision.

After calculating those basic classification performance metrics, we went on to construct the ROC curve shown below.  By the AUC metric, the superior performance of our random forest model, compared to the logistic regression model, becomes evident.  The random forest AUC score was a 25 percent improvement over the AUC score from the logistic regression model.
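The comparison can be made explicit in code (continuing the sketches above):

```python
# AUC for each model, plus the relative improvement
rf_probs = rf.predict_proba(X_test)[:, 1]
rf_auc = roc_auc_score(y_test, rf_probs)
logit_auc = roc_auc_score(y_test, logit_probs)

print(f"Random forest AUC:       {rf_auc:.4f}")
print(f"Logistic regression AUC: {logit_auc:.4f}")
print(f"Relative improvement:    {100 * (rf_auc - logit_auc) / logit_auc:.1f}%")
```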

Comparing the Results   

To understand why a tree-based model delivers superior results to the logistic regression model when working with this dataset, we should take a moment to compare and contrast the way these models are built.  

A logistic regression model is fit to the data through a process called Maximum Likelihood Estimation (MLE).  MLE chooses the set of coefficients that maximizes the likelihood of the outcomes actually observed in the training data.  The coefficient value associated with each predictor tells us how much the log-odds of a “1” class outcome change with a one-unit increase in the predictor’s value; when using the method shown here, each predictor can take only a single coefficient value.  
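To make this interpretation concrete, a coefficient can be converted into an odds multiplier.  The value below is purely illustrative, not one of our actual estimates:

```python
import numpy as np

# Hypothetical coefficient: suppose the model assigned pizza a value of 0.30.
# Each additional serving of pizza then adds 0.30 to the log-odds of
# satisfaction, which multiplies the odds themselves by e^0.30.
beta_pizza = 0.30  # illustrative value only
print(np.exp(beta_pizza))  # roughly 1.35: odds rise ~35% per extra serving
```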

This “one-size-fits-all” limitation with logistic regression coefficients explains why this type of model falls short when trying to predict outcomes at the HP Buffet.  For some of our foods, logistic regression works well.  For instance, with pizza or with spare ribs, more consumption is associated with a greater likelihood of satisfaction in a predictable, linear way.   

For the foods that make our guests happy when consumed within a particular range, though, the logistic regression model just gets confused.  The negative coefficients for crab rangoon, sesame chicken, bulgogi, and egg rolls suggest that any consumption of these items decreases the log-odds of satisfaction.  

However, a close inspection of what really happens with the data tells a different, more nuanced story.  Let’s take a look at the relationship between sesame chicken consumption and satisfaction:

In our entire sample, there were only two patrons who ate fewer than three pieces of sesame chicken, so the first value here can be dismissed as insignificant.  But look at what happens as a person’s sesame chicken consumption rises from three to four to five pieces: satisfaction increases remarkably.  In fact, a person who consumes five pieces of sesame chicken is far likelier to be satisfied than is a randomly selected visitor from the whole dataset.  

A random forest is an ensemble of tree models.  With tree models, there is no assumption of linearity; a tree simply splits records into groups based on rules.  These rules are not constrained by any parametric assumptions, but are determined through a splitting process that maximizes the homogeneity of the resulting groups.  With the HP Buffet data, a rule could say “IF crab_rangoon <= 8, patron will be satisfied, but IF crab_rangoon > 8, patron will not be satisfied.”  The same general concept could be applied to egg rolls, sesame chicken, bulgogi, or anything else; the structure of tree models enables the type of nuance that the logistic regression model misses.  
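A single shallow decision tree, fit to the same training data, makes these threshold rules visible; the random forest simply aggregates the votes of many such trees, each grown on a random sample of records and features.  A sketch:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# One shallow tree shows the kind of threshold rules the forest is built from
tree = DecisionTreeClassifier(max_depth=2, random_state=42)
tree.fit(X_train, y_train)
print(export_text(tree, feature_names=list(X.columns)))
```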

Conclusions   

For handling the HP Buffet consumption data in order to predict guest satisfaction, the random forest is clearly superior to logistic regression. However, every dataset is unique, and we do not mean to suggest that random forests are necessarily “better” than logistic regression models.  

A big advantage of logistic regression models is their interpretability.  When the input-outcome relationships are more linear and straightforward, the coefficients generated through the MLE process are quite useful, as they quantify the relationship between input variable values and expected outcomes.  A logistic regression model can tell us not only whether some particular predictor is more likely to push a record towards the “0” class or the “1” class, but also how strongly this predictor influences the log-odds.  While a random forest model often delivers impressive overall results and useful information about each feature’s overall importance in determining outcomes, it does not offer the same detail about the input-outcome relationship that logistic regression does.  When a dataset presents quirky, non-linear variable relationships like the ones at the HP Buffet, however, logistic regression models are likely to miss the mark.  

The author is a Master’s Degree student in Applied Business Analytics.  He will graduate in Fall 2020.  His co-author is a Senior Lecturer in Applied Business Analytics at Boston University.