I like charts, graphs, plots, and other visualizations, and so do business users. There’s only so much of a story you can tell with raw numbers; you have to understand the context in which the numbers exist and what they represent. Math is, of course, essential for establishing objective facts about a dataset. But no statistical calculation or model can replace the human element of critical thinking. When it comes to analyzing data, you must start with the context before throwing equations and algorithms at it.
There is no better illustration of the pitfalls of summary statistics than Anscombe’s Quartet. Constructed in 1973 by the English statistician Francis Anscombe, these four datasets demonstrate the importance of context, as well as the effect that outliers have on summary statistics.
Let’s look at Anscombe’s Quartet:
Here, we’re given four datasets, each containing 11 (x, y) points just waiting to be analyzed. The interesting thing about these sets is that their basic statistics are virtually identical.
The following statistics match across all four sets to two or three decimal places:
- Mean of x and y
- Variance of x and y
- Simple linear regression line (slope and intercept)
- Correlation between x and y
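To check the list above yourself, here is a minimal Python sketch (not the Power BI workflow used in this post) that computes each statistic for the four sets with plain arithmetic. The data values are the published quartet; the `summarize` helper and the dictionary layout are our own choices for illustration.

```python
# Anscombe's quartet (Anscombe, 1973): four sets of 11 (x, y) points.
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # sets One to Three share x
X_4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

QUARTET = {
    "One": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "Two": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "Three": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "Four": (X_4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return means, sample variances, least-squares line, and Pearson r."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "mean_y": my,
        "var_x": sxx / (n - 1),  # sample variance (n - 1 denominator)
        "var_y": syy / (n - 1),
        "slope": slope,
        "intercept": my - slope * mx,
        "r": sxy / (sxx * syy) ** 0.5,
    }


for name, (xs, ys) in QUARTET.items():
    s = summarize(xs, ys)
    print(
        f"{name:>5}: mean_x={s['mean_x']:.2f} mean_y={s['mean_y']:.2f} "
        f"var_x={s['var_x']:.2f} var_y={s['var_y']:.2f} "
        f"y={s['intercept']:.2f}+{s['slope']:.3f}x r={s['r']:.3f}"
    )
```

Each set comes out with a mean x of 9, a mean y of about 7.50, an x variance of 11, a y variance of roughly 4.12 to 4.13, a line close to y = 3.00 + 0.500x, and r close to 0.816. Nothing in these numbers hints at how different the scatter plots look.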
When we actually plot these points in Power BI, however, some interesting patterns emerge:
Clearly, these datasets tell wildly different stories, even though their basic statistics are the same.
Dataset “One” is a typical scatter showing a positive linear trend. Set “Two” is of particular note if you imagine the data is real and organic. Let’s say the x-axis is the age of a machine and the y-axis is its production rate. In that context, we can make an educated guess that there is a break-in period, followed by declining performance as the machine gets older.
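One way to see the shape of set “Two” without a chart is to look at the residuals of its best-fit line. The sketch below is plain Python with a helper of our own, `fit_line`; it fits y = intercept + slope * x by ordinary least squares and prints the sign of each residual in order of x.

```python
# Set "Two" of Anscombe's quartet.
xs = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ys = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


intercept, slope = fit_line(xs, ys)

# Residuals ordered by x; a good linear fit leaves no pattern in their signs.
residuals = [round(y - (intercept + slope * x), 2) for x, y in sorted(zip(xs, ys))]
signs = "".join("+" if r > 0 else "-" for r in residuals)
print(residuals)
print(signs)
```

The residuals are negative at both ends and positive in the middle (`--+++++++--`), the signature of a curved relationship that a straight line cannot capture. In the machine example, that is exactly the rise-then-decline shape described above.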
“Three” and “Four” show the power of outliers: in each set, a single point pulls the regression line and correlation away from what the rest of the data suggests, and averages are just as sensitive. To further illustrate this phenomenon, I analyzed the performance of a particular staff member on an IT helpdesk. Let’s call him “Bill”.
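Before turning to Bill’s tickets, the effect of a single outlier can be measured directly on set “Three”. This sketch fits a least-squares line with and without the one point that sits far above the others (the largest y value). The `fit` helper is our own and returns the intercept, slope, and Pearson r.

```python
# Set "Three" of Anscombe's quartet.
xs = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ys = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]


def fit(xs, ys):
    """Least-squares line plus Pearson r, returned as (intercept, slope, r)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope, sxy / (sxx * syy) ** 0.5


with_outlier = fit(xs, ys)

# Drop the single point that sits far above the rest (the largest y value).
i = ys.index(max(ys))
without_outlier = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])

for label, (a, b, r) in [("with outlier", with_outlier), ("without", without_outlier)]:
    print(f"{label:>12}: y = {a:.2f} + {b:.3f}x, r = {r:.3f}")
```

With all 11 points, the fit is about y = 3.00 + 0.50x with r near 0.82. Dropping that one point gives roughly y = 4.01 + 0.345x with r very close to 1, because the remaining ten points lie almost exactly on a line. In set “Four”, removing its outlier leaves every x equal to 8, so no line can be fit at all: the entire correlation comes from one point.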
First, I asked Power BI Q&A how long it takes Bill to resolve a ticket on average:
These results are not great. Typically, we want to see tickets resolved within a business day or two, and here the average is approaching three business days.
Does this really tell the whole story of how quickly Bill is resolving tickets?
By looking at how his resolution times are distributed, we see that the vast majority of his tickets (~78%) are resolved in two or fewer business days. Remember: the essence of the business problem here is that people are happier when problems are solved quickly.
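The gap between the average and the typical ticket is easy to reproduce. The durations below are invented for illustration (they are not Bill’s actual tickets): most are resolved within two business days, but two long-running tickets drag the mean up. The sketch uses Python’s standard `statistics` module.

```python
import statistics

# Hypothetical resolution times in business days (illustrative only, not real data).
durations = [0.5, 1, 1, 1.5, 2, 0.5, 1, 2, 9, 12]

mean_days = statistics.mean(durations)
median_days = statistics.median(durations)
# Share of tickets closed within two business days.
share_fast = sum(d <= 2 for d in durations) / len(durations)

print(f"mean:   {mean_days:.2f} business days")
print(f"median: {median_days:.2f} business days")
print(f"within 2 days: {share_fast:.0%}")
```

On this made-up sample the mean is just over three business days, while the median is 1.25 and 80% of tickets close within two days, so the mean alone paints the gloomiest picture.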
What about problems we know are going to take a while?
Let’s say there’s a scheduled on-site visit, or he’s waiting on a vendor. Typically, we would have a system in place for pausing the timer on these tickets; here, let’s instead filter out tickets marked as “non-time sensitive”, as well as a few outliers whose recorded durations were longer than the work actually took:
With a few context-based decisions, we have a much clearer picture of how quickly Bill is closing time-sensitive IT issues. At this point, I can see that Bill is closing well over 90% of his tickets in about two business days, which is much better than the grim picture painted by a simple question posed to our AI assistant.
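The filtering step can be sketched the same way. The `Ticket` fields, the `timer_error` flag, and the sample records are all hypothetical; in practice the equivalent filters would be applied in Power BI, and deciding which tickets to exclude is a judgment call based on context rather than a formula.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    days_to_resolve: float     # business days from open to close
    time_sensitive: bool       # False for scheduled on-site visits, vendor waits, etc.
    timer_error: bool = False  # True if the recorded duration is known to be too long


# Hypothetical tickets, for illustration only.
tickets = [
    Ticket(0.5, True), Ticket(1.0, True), Ticket(1.0, True), Ticket(1.5, True),
    Ticket(2.0, True), Ticket(0.5, True), Ticket(1.0, True), Ticket(2.0, True),
    Ticket(3.0, True),
    Ticket(9.0, False),                    # waiting on a vendor
    Ticket(12.0, True, timer_error=True),  # left open after the fix was done
]


def share_within(tickets, days=2.0):
    """Fraction of tickets resolved within the given number of business days."""
    return sum(t.days_to_resolve <= days for t in tickets) / len(tickets)


# Keep only time-sensitive tickets whose recorded duration is trustworthy.
relevant = [t for t in tickets if t.time_sensitive and not t.timer_error]

print(f"all tickets:           {share_within(tickets):.0%}")
print(f"time-sensitive, clean: {share_within(relevant):.0%}")
```

On this made-up sample, the share closed within two business days rises from about 73% to about 89% once the two tickets that do not reflect response speed are set aside.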
The moral of the story? To get the most out of your organization’s data, don’t just throw math at it and call it a day. Look at it. Play with it. Give it the human element that adds context. Situational awareness, coupled with well-kept statistics, can tell you everything you need to know about the essence of the business problem and help you make changes that work better for your business.
To learn more about how to improve your company’s business intelligence platform and practices, call the BI professionals at IronEdge Group today at (832) 910-9222.