First off, a special thanks goes out to the Social Media Club of Boston, which was able to finagle some tickets. Their blog post about the event includes the complete audio from the session. Chuck Tanowitz and Todd Van Hoosear were gracious hosts as always. Please be sure to vote for them as one of the best up-and-coming blogs.
Mr. Baker started off by asking the audience how many zeroes there are in the number of bytes in an exabyte. That is, it’s a one followed by how many zeroes? If you said eighteen, you’re right.
There is a coming avalanche of data, and we are all going to have to deal with it, process it, store it, synthesize it, analyze it and, on balance, triage it. It cannot possibly all fit into our heads.
An analogy was drawn between the eating habits of hunter-gatherer societies versus the effects of our current diet. Way back when, we could and did eat everything we came across, because food was scarce and you quite literally didn’t know where your next meal was coming from. Evolution favored gluttony.
But times have changed and we can no longer just grab everything out there. Ours is a far more sedentary lifestyle, to be sure, but the issue is also one of exceptional plenty. Our cave ancestors would have happily dined on our table scraps. Now we can have a buffet banquet virtually every night if we want one. If we wish to safeguard our health, we cannot consume it all.
And the same is true of data. We cannot get it all in, and it’s no good for us if we even try.
Next the talk turned to understanding data and making sense of it. We have all heard the tired expression (and probably used it), garbage in, garbage out. And while that is still essentially true, when you are dealing with a tidal wave of data, it may be enough to understand just a little tiny piece of the puzzle. Understanding what is true today is vital, even if all you get is an inkling.
You can create and elicit detail, but don’t overdo it. The detail still needs to be useful: just because you can get some detail or a particular metric does not mean you need it, or that it can produce any value for you whatsoever.
Then the talk turned to Mr. Baker’s main interest, which is privacy and data mining/metrics. There are two areas which are getting rather exciting.
The first is human sensors. Imagine, for example, that you have elderly parents, and you don’t live near them (which may very well be true). Would you want the peace of mind that could come with knowing more about their whereabouts at any given time? Wouldn’t you wish to know if, one day, they failed to get out of bed? For example, there is a British insurance company, Norwich Union, which is offering black boxes for automobiles. Would you accept one if your elderly parents’ driving could be monitored?
What if you’ve got teenagers? Would you accept a sensor to monitor whether they went to school or work? Would you leap at the chance to get a black box in the car they drive so as to enhance their safety? And, not coincidentally, reduce your liability rates?
Now, what about you? Would you allow such sensors to be tracking and tagging you?
The audience’s answers were understandable — they’d accept a sensor to assure more peace of mind and, hopefully, more security for their aging parents. And they would tag their children with sensors if it meant enhanced safety and reduced premiums. As for themselves, they were more reluctant to volunteer, but the promise of a reduction in auto insurance rates made the offer more tempting to some.
Essentially, we are willing to impose this technology on the people we’re responsible for. And, when the time comes, the younger generation may very well impose it upon us. By that time, if we have been monitoring them, they will see it as normal, and a longstanding tradition of privacy, which goes back to the Bible, could be wiped out in a generation or two.
But there’s more. Sense Networks is a company that analyzes cell phone data and aggregates people into tribes based upon their behaviors. For example, there might be a group which regularly goes to a nightclub or restaurant, so the marketing possibilities leap out. However, the data is easy to misinterpret (or at least it does not lend itself to easy observations): that nightspot visited at 2 AM every night by one tribe might be a hospital. Are they doctors? Patients? Visitors? Brawlers? It’s hard to determine that without context.
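To make the tribe idea concrete, here is a toy sketch of grouping people by where and when they show up. It is only an illustration under invented assumptions, not Sense Networks’ actual method: the records, the place names and the `group_into_tribes` function are all made up.

```python
from collections import defaultdict

# Hypothetical location pings: (person, place_id, hour_of_day).
# All of this data is invented for illustration.
PINGS = [
    ("ana", "place_17", 2),
    ("ben", "place_17", 2),
    ("cy", "place_17", 2),
    ("dee", "place_42", 12),
    ("eli", "place_42", 12),
]


def group_into_tribes(pings):
    """Group people who appear at the same place at the same hour."""
    tribes = defaultdict(set)
    for person, place, hour in pings:
        tribes[(place, hour)].add(person)
    return dict(tribes)


tribes = group_into_tribes(PINGS)
for (place, hour), members in sorted(tribes.items()):
    print(place, hour, sorted(members))
```

Note what the grouping does not tell you: nothing in it says whether `place_17` is a nightclub or a hospital. That context has to come from somewhere else.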
The second is computers that understand human language, such as Watson, the IBM system built to play Jeopardy! against human contestants. Jeopardy! is particularly challenging because the questions are often multipartite and rather convoluted. Watson, like any human contestant, would be expected to parse what seem to be unrelated pieces of information and synthesize them into a whole. Here’s a somewhat more straightforward clue (it’s the Clue of the Day from June 22, 2010): the category is 12-letter words: an archaeologist who specializes in the land of the pharaohs and its artifacts. If Watson (or any other contestant) spots the key words (in this case, they’re archaeologist and pharaohs), the answer comes fairly readily: Egyptologist.
But Watson doesn’t always get that, and can have trouble making linkages. Human language, which was the real subject being covered, has elements that are huge stumbling blocks for computers, namely, the aforementioned context and intonation.
A typical sentence might be: I didn’t say she stole my money. This simple seven-word sentence has seven possible meanings and they are solely based upon intonation:
- *I* didn’t say she stole my money. – It wasn’t me who said it, it was him.
- I *didn’t* say she stole my money. – A denial.
- I didn’t *say* she stole my money. – I wrote it down.
- I didn’t say *she* stole my money. – It was her brother.
- I didn’t say she *stole* my money. – She just borrowed it.
- I didn’t say she stole *my* money. – It was your money.
- I didn’t say she stole my *money*. – It was my car.
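As a rough illustration of why this is hard for a machine, the sketch below generates the seven stressed variants and then strips the stress marks. The asterisk marking and the `stress_variants` helper are my own assumptions for the example and have nothing to do with how Watson actually works.

```python
SENTENCE = "I didn't say she stole my money"


def stress_variants(sentence):
    """Return one variant per word, with that word wrapped in asterisks."""
    words = sentence.split()
    variants = []
    for i in range(len(words)):
        marked = words[:i] + ["*" + words[i] + "*"] + words[i + 1:]
        variants.append(" ".join(marked) + ".")
    return variants


variants = stress_variants(SENTENCE)
for v in variants:
    print(v)

# Remove the stress marks, as plain text does, and only one string is left.
plain = {v.replace("*", "") for v in variants}
print(len(variants), "spoken variants,", len(plain), "written form")
```

Seven spoken meanings collapse into a single written sentence, so a program reading only the text has nothing to tell them apart.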
We understand these things readily. Watson, on the other hand, has a lot of trouble. As data consumers, we need to figure out how to get value from sensors, and Watson the computer needs to figure out how to get value from the nuances of language.
Finally, what do we humans need to know, as the computers that surround us become smarter and smarter, and store more and more material? Our personal internal question must be: how can we best optimize the torrent of data that is swirling about our ears?
I’ve got his book on my Amazon wish list now.