The Future of Big Data
What is the future of so-called ‘Big Data’? Big data is a large volume of information under analysis by computers in an effort to comprehend human behavioral patterns. More often than not, the behaviors that are being observed and studied are related to either spending or voting.
I spent over a decade of my career as a data analyst, so here are my predictions. Come along with me to ten years from now.
Have you ever struck up a conversation with a stranger on an airplane? If so, then at some point, you might have compared what you paid for airfare. For your trip to, say, Albuquerque, you might have paid $450. Your seatmate might have paid $250.
The people in the two-seater across the aisle from you might have paid via frequent flyer miles, and they may have earned those miles on flights that cost as much as $1,000 or as little as $100. What’s going on here?
First Degree Price Discrimination
Everybody on the airplane is subject to what’s called ‘first degree price discrimination’, which, according to Adam Ozimek of Forbes, “involves charging every individual customer a price based on their individual willingness to pay.”
Now, you probably would have preferred paying $250 to $450, particularly when it seems that your seatmate’s experience is identical to your own. But you might not have been given the opportunity to do so.
Or maybe you were, depending on when you booked or where you booked from (your IP address, or the site you surfed in from), and you didn’t know it at the time.
But here we are, ten years from now. And guess what? First degree price discrimination is the rule, and not the exception. You go to buy groceries, and you are shown choices.
Green bananas cost more than yellow ones, because you can store them longer, and so can the supermarket. A mixed salad costs more than the fixings not only because of the labor involved in putting it together, but also because you are willing to pay extra for the convenience.
You have a choice between a whole chicken and one that’s cut into pieces. Combine it with broccoli and pasta and you get offers for soy sauce or tomato sauce, and the prices depend not only on what you paid last week, but also on your spending habits. Are you more likely to cook Italian or Chinese-style food?
That will also determine which prices the market offers you, as will the supermarket’s stock and the expiration dates for the sauces.
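To make that concrete, here is a toy sketch in Python of how such a pricing engine might combine those signals. Everything in it is an assumption for illustration: the Shopper fields, the offer_price function, the weights, and the dollar amounts are all invented, and it does not describe any real supermarket’s system.

```python
from dataclasses import dataclass


@dataclass
class Shopper:
    last_price_paid: float   # what this shopper paid for the item last week
    italian_affinity: float  # 0.0 to 1.0, hypothetical score from past baskets


def offer_price(base_price: float, shopper: Shopper,
                days_to_expiry: int, units_in_stock: int) -> float:
    """Toy personalized price for a jar of tomato sauce (all weights invented)."""
    price = base_price
    # Shoppers who usually cook Italian are assumed willing to pay a bit more.
    price *= 1.0 + 0.20 * shopper.italian_affinity
    # Anchor on last week's price: never offer less than 90% of it.
    price = max(price, 0.90 * shopper.last_price_paid)
    # Mark down stock that is close to expiring or piling up on the shelf.
    if days_to_expiry <= 3:
        price *= 0.70
    if units_in_stock > 100:
        price *= 0.95
    return round(price, 2)


pasta_fan = Shopper(last_price_paid=3.00, italian_affinity=0.9)
stir_fry_fan = Shopper(last_price_paid=2.50, italian_affinity=0.1)

# Same jar, same shelf, same day, two different offers.
print(offer_price(2.50, pasta_fan, days_to_expiry=30, units_in_stock=40))
print(offer_price(2.50, stir_fry_fan, days_to_expiry=30, units_in_stock=40))
```

The shopper who cooks Italian more often is offered the higher price for the identical jar, and both offers drop once the jar gets close to its expiration date.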
Messing with the System
You can really throw a monkey wrench into things if you step out of character and throw a party, and shop for it. Suddenly the system might think you have a dozen teenagers, based on all the pizza and chips you bought.
In some ways, it’s the electronic equivalent of an outdoor market. But instead of people haggling over rugs or spices, the supermarket uses big data to predict what you’ll pay, what you’ll buy, and what will keep you coming back. How do you beat it? Current conventional wisdom is to clear cookies, surf privately, be patient, and watch for changes.
But what if you need it now? And what if this is all happening in the grocery aisles or at the checkout counter? About the only things you can do are to pay in cash or put your purchases back, thereby opting out completely.
Here we are, still ten years from now. And there’s even more social stratification. The rich are richer. The poor haven’t budged much. The middle class is even more squeezed. Why?
This is another issue with big data – biases. So much attention is paid to the quantity of data that its quality can sometimes be overlooked, as can its relevance, or the reason for the quantity. Take the tweeting that goes on after a disaster. During the manhunt for the Boston Marathon bomber, all sorts of tweets came from Boston and Cambridge.
But how many tweets are there currently about a tsunami (a kind of disaster that often, although not always, strikes poorer countries) compared with the Germanwings crash?
The bias is heavily in favor of more tweets about the Germanwings air disaster, versus a nonspecific tsunami. When just looking at raw numbers, Germanwings looks like a far more important news story. But is it? Or are there just more tweets about it because (a) it’s new (as of the writing of this blog post) and (b) it happened in the West, where there’s more use of Twitter?
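One rough way to look past raw numbers is to compare tweets against how many people in the affected region use Twitter at all. The Python sketch below does that. The function name is my own, and the counts are made up purely for illustration; they are not real figures for Germanwings, any tsunami, or any other event.

```python
def tweets_per_thousand_users(tweet_count: int, active_users: int) -> float:
    """Tweets about an event per 1,000 Twitter users in the affected region."""
    if active_users <= 0:
        raise ValueError("active_users must be positive")
    return 1000 * tweet_count / active_users


# Made-up numbers, for illustration only: (tweets about the event, local users).
events = {
    "event in a region with many Twitter users": (500_000, 50_000_000),
    "event in a region with few Twitter users": (20_000, 400_000),
}

for name, (tweets, users) in events.items():
    rate = tweets_per_thousand_users(tweets, users)
    print(f"{name}: {tweets} tweets, {rate:.1f} per 1,000 users")
```

With these invented figures, the first event wins easily on raw tweet count, but the second draws five times as many tweets per user. Which number matters depends on the question you are asking, and that is the point.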
Big Data and Selection Bias
It’s a bit of a selection bias, too. Readers might select or retweet information about the Germanwings crash because they’ve heard of it, and then more people retweet it. It becomes a self-fulfilling prophecy that the information will continue to spread more rapidly than news of tsunamis.
Ten years from now, we might not even notice the selection biases going on around us, or that we ourselves have made. After all, we’ve told Facebook or its successor, and all news outlets, that news about, say, dolphins, is important to us.
Hence we see more and more tales of dolphins. Stories of famine or elections or the like aren’t served up quite as quickly, because we, and a sizable portion of our peers, continue to choose fluff pieces and familiar storylines over hard news, particularly if it’s about faraway places.
Make Big Data Smaller
How do we get off this train? Let’s come back to the present time. And let’s deliver hard news even if it’s not necessarily requested, because it matters. Let’s make pricing more transparent to consumers. And let’s look for reasons for data quantity and popularity that go beyond numbers. Just because there’s more of something doesn’t make it better or more important. It just means there’s more of it.