We are constantly manipulated by statistics, some from the government, some from vendors, often from those we trust. However, sometimes those parties are not worthy of that trust. Sometimes they are being manipulated themselves and so the figures they present can be dangerously misleading.
Given how often we are manipulated by statistics, it would seem that, for our own defense, the education system should have stepped up and given us more of the basics on how to interpret them. The statistics I am referring to this week are those recently disseminated by the Open Source Development Labs (OSDL). I may disagree with OSDL from time to time but, up until now, I have never doubted the group’s integrity.
That changed last week when I received a link to the group’s “Get the Truth on Linux Management” report. I read it and, to be frank, I would be hard-pressed to find a report as inherently inaccurate as this one, even from a vendor or a politician. I would like to assume this is because the group simply doesn’t understand statistics, but I fear it is because they liked the results so much they didn’t look critically at the method.
This is an all-too-common error. I’ll use this report to make a point on just how dangerous an intentionally invalid study can be.
‘Get the Facts’
Let’s start with some history. Microsoft, in its aggressive move against open source, created a program called “Get the Facts,” which provided a list of well-funded resources that positioned Windows favorably against Linux.
OSDL raised objections to this research, successfully for the most part, by attacking the funding source (Microsoft), and did a decent job of getting folks not to read the reports. Microsoft, meanwhile, went to ever greater lengths to improve the quality of the reports, which, honestly, did little to address the distrust surrounding any vendor-funded research.
Now, you would think that if OSDL was going to do a research report of its own it would produce one that couldn’t easily be challenged or, worse, invalidated, because of obvious bias. You would think that the group’s leadership would realize that doing exactly what they accused Microsoft of doing would damage their own credibility — to Microsoft’s benefit. If you had made these assumptions, however, you would have been wrong.
The study released by OSDL was prepared by Enterprise Management Associates (EMA). Before I explain why I contend it is not worth the paper it was printed on, let’s cover some basics.
The Danger of Bad Market Studies
First, be aware that programmers probably don’t have to study statistics and marketing studies in school the way business majors do, but there are folks in most companies who actually know how studies are supposed to be done to be valid, and these folks are generally in senior positions with titles like CMO, CFO, CEO, and COO. In addition, internal auditors generally have business and finance degrees and use statistical sampling as part of their jobs; it is how they test for a number of problems.
It is generally considered somewhat suicidal for a company to take an obviously biased study from any one research firm and use it to make an important decision. The result, regardless of the decision, will generally be a substantial reduction in credibility and perceived competence at best. At worst, it could put a decision maker on a much less favorable career path rather quickly.
Poorly done studies put advocates at risk. One of the strongest advocacy groups in the marketplace, second only to Apple’s contingent, is the one that surrounds Linux. Unlike the unified front of Apple advocates, Linux advocates are more commonly found at war with each other on a variety of subjects. But that is no reason to put them all, as a group, at risk.
The Invalidity of the ‘Get the Truth’ Study
For a market study to be good (i.e., valid), you need a defined population, a sample that represents that population, an unbiased data path, and an unbiased analysis of the data resulting in conclusions that can be supported by that data. Some bias won’t kill the study as long as you can identify it and factor it out. The goal of the study needs to be an accurate measurement of something, and all aspects of the work need to be “open” for review and scrutiny to assure this goal.
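To make the sampling point concrete, here is a minimal sketch, using hypothetical sample sizes of my own choosing (nothing below comes from the EMA report), of the margin-of-error arithmetic you can honestly quote for a simple random sample. With a self-selecting sample there is no comparable calculation to fall back on.

```python
# A minimal sketch (my illustration, not from the report): the margin of
# error you can legitimately quote for a simple random sample. No such
# calculation exists for a self-selecting sample.
import math

def margin_of_error(sample_size: int, proportion: float = 0.5,
                    z: float = 1.96) -> float:
    """95% margin of error for a proportion from a simple random sample."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

# Hypothetical sample sizes, purely for illustration.
for n in (50, 200, 1000):
    print(f"n={n:5d}  margin of error ~ +/-{margin_of_error(n) * 100:.1f}%")
# n=   50  margin of error ~ +/-13.9%
# n=  200  margin of error ~ +/-6.9%
# n= 1000  margin of error ~ +/-3.1%
```

The exact numbers don’t matter; what matters is that this calculation is only meaningful when the sample was drawn randomly from a defined population, which is exactly what the study in question lacks.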
A bad, or invalid, study is one where the results aren’t supported by the data. A good, short book I read in graduate school is Flaws and Fallacies in Statistical Thinking by Stephen K. Campbell of the University of Denver. At fewer than 200 pages, it is a fun read, illustrated and full of examples, including a study done in the Middle Ages estimating the number of demons in the world. (The total at the time was estimated to be around 7.5 million.)
Now, back to the “Get the Truth on Linux Management” study. This report falls so far onto the bad side that I’m having a hard time believing even a novice wouldn’t notice. But it’s out there, so let’s point out what I consider to be its most obvious flaws:
- Population tested is unclear (IT organizations, generic respondents, CIOs and MIS Managers?)
- Sampling method is inconsistent, largely self-selecting, and inherently biased (i.e., not blind)
- Respondents are not of a consistent level of responsibility. (They might not even consistently know the answers to the questions asked; many may not even be employed; we have no way of knowing.)
- Company respondents appear to span vastly different locations and business types, ranging from hosting companies to banks, yet there appears to be no test of consistency of product use. In other words, differences attributed to the products could have had more to do with how the products were used. (Nearly half of the company respondents fell into the smallest bucket, with under $5 million in revenue.)
- Results in the body of the report are supported largely by anecdotal rather than statistical data (i.e., not representative).
- Results in terms of pricing are outside the survey’s scope and intentionally misleading (a one-time package price is compared against a single-year subscription price, for example; see the sketch after this list).
- No test ensures that respondents consistently ran both Linux and Windows-based platforms or accounted for various versions or distributions. The report said 68 percent ran fewer than 20 Linux servers and 43 percent fewer than 10, but no server numbers were provided for Windows use.
- A large percentage of the statistical results were Linux only and not compared to Windows except anecdotally (the entire Management Cost section is Linux only, for instance).
- Conclusions are not supported by the data. There is very little Windows data, sparse enterprise data, but a great deal of anecdotal commentary.
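On the pricing point above, here is a hedged illustration with purely hypothetical prices of my own (not figures from the report) of why a one-time package price and an annual subscription are only comparable when totaled over the same number of years.

```python
# A hedged illustration with purely hypothetical prices (not figures from
# the report): a one-time package price and an annual subscription are only
# comparable when totaled over the same number of years.
def total_cost(one_time: float, annual: float, years: int) -> float:
    """Total spend over a given number of years (ignoring support, upgrades, etc.)."""
    return one_time + annual * years

PACKAGE_PRICE = 999.0       # hypothetical one-time license
SUBSCRIPTION_PRICE = 349.0  # hypothetical annual subscription

for years in (1, 3, 5):
    pkg = total_cost(PACKAGE_PRICE, 0.0, years)
    sub = total_cost(0.0, SUBSCRIPTION_PRICE, years)
    print(f"{years} yr: package ${pkg:,.0f} vs. subscription ${sub:,.0f}")
# 1 yr: package $999 vs. subscription $349
# 3 yr: package $999 vs. subscription $1,047
# 5 yr: package $999 vs. subscription $1,745
```

Whichever side the multi-year totals end up favoring, the point is that a one-year snapshot tells you almost nothing.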
Intentionally misleading people is not something that should be encouraged. Microsoft has been accused of doing this in the past, but many of the firm’s recent reports have shown great improvement. Because this study, from EMA, is advocated by the nonprofit, seemingly benevolent OSDL, however, people might actually use it, and they are in danger of regretting that decision.
From my perspective, OSDL has given Microsoft the best thing the software giant could hope for: a way to easily discredit any of OSDL’s claims, present or future, about Linux superiority. This has certainly caused me to view OSDL differently.
Strangely, many of my own peers have not yet openly challenged this thing. I don’t know if that is because they didn’t read it or because they fear the infamous Linux reprisals. It really worries me when a poor-quality report like this gets accepted at face value. What else are we collectively missing that we should, in fact, be actively challenging?
Market Study Advice
Bad statistical reports are commonly used to manipulate people, most notably in politics, but the practice occurs in a wide variety of consumer venues as well; the automotive market, for example, is somewhat famous for it. Don’t take any report from anyone at face value if you’re going to depend on the results. Look underneath and behind the results in every case, and make sure you aren’t being played.
If you want to get a scary sense of all this, pick up More Damned Lies and Statistics by Joel Best (a sequel to the excellent Damned Lies and Statistics), which focuses on how governments manipulate us with statistical reports.
Furthermore, if you are going to conduct a study, do it right or don’t do it at all. The most dangerous reports I’ve seen in my life are those done by internal competitive analysis groups designed to tell management what they want to hear rather than what they need to know. The worst I have seen recently was one done by Time Warner to support the belief that HD-DVD would beat out Blu-ray in the market. To get the response it wanted, the company actually marketed to the respondents in an attempt to shift the results in its favor. I pointed this out to Time Warner, and the company later moved to support both formats.
We are always going to be surrounded by companies and people who want to manipulate us and will trade their integrity for our favorable decisions. I strongly disagree with the practice. Regardless of your position on open source, you should at least be willing to spend a little time understanding statistics and how you can be manipulated by them before you put any stock in a statistical report’s conclusions.
Rob Enderle, a TechNewsWorld columnist, is the Principal Analyst for the Enderle Group, a consultancy that focuses on personal technology products and trends.