On episode #15 we briefly discussed
“Big Data. Is it real or a buzzword?”
Zack and Bill tied Big Data’s definition to volume. Jordan also mentioned volume of data, and started to say something about technology. My position was (and remains), “I have no idea. I’ll deal with whatever my clients hand me.”
(Scroll to the bottom of this page to see the discussion.)
As Data Science and Big Data increase in sexiness, getting a definition for Big Data is akin to asking questions of the Cheshire Cat in Alice in Wonderland: the response is likely to be riddles and more confusion.
Recently, Jenna Dutcher published an interesting blogpost compiling 43 responses to the question:
Jenna is community relations manager for UC Berkeley School of Information’s online master’s program in data science.
The 43 responses came from a good mix of people: university professors, data scientists, CIOs, an economist, a science blogger … an overall good group of people who know their way around data. As I read through their definitions I noticed some recurring themes:
I decided to parse the responses into categories, allowing a definition to fit into multiple categories; e.g., if a definition included size and complexity, I added it to both categories.
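For anyone who wants to try the same exercise, here is a minimal sketch of that kind of multi-category tally. I sorted the responses by reading them, so everything in this sketch is an assumption made for illustration: the keyword lists, the `categorize` and `tally` helpers, and the two sample definitions, which are placeholders and not quotes from the 43 responses.

```python
from collections import defaultdict

# Hypothetical keyword lists, made up for illustration only.
CATEGORY_KEYWORDS = {
    "size": ["volume", "massive", "large"],
    "complexity": ["complex", "unstructured"],
    "access to everything": ["everything", "all the data"],
}


def categorize(definition):
    """Return every category whose keywords appear in the definition."""
    text = definition.lower()
    return [
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]


def tally(definitions):
    """Count definitions per category; one definition may count in several."""
    counts = defaultdict(int)
    for definition in definitions:
        for category in categorize(definition):
            counts[category] += 1
    return dict(counts)


# Placeholder definitions, not actual quotes from the blog post.
samples = [
    "Data so large and complex that ordinary tools struggle.",
    "Being able to record everything.",
]

for category, count in tally(samples).items():
    print(f"{category}: {count}")
```

Because a definition can land in more than one category, the category counts can add up to more than the number of responses.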
Ten responses describe Big Data as a condition more than a thing. The Access to Everything category summarizes the “if you want it, come and get it” perspective. Like this quote from Rohan Deuskar:
Peter Skomoroch, formerly with LinkedIn:
… Many features and signals can only be observed by collecting massive amounts of data (for example, the relationships across an entire social network), and would not be detected using smaller samples …
This points to something very interesting about working with data: do you have enough to derive meaningful insights?
Peter suggests that Big Data is a situation where you need so much data that the minimum that you need begins to create problems. Therefore, Big Data isn’t just a massive frikken amount of details just because you can get them. No. It’s got to serve a purpose and be necessary.
Deirdre Mulligan, Associate Professor at UC Berkeley School of Information
Big data: Endless possibilities or cradle-to-grave shackles, depending upon the political, ethical, and legal choices we make.
That stands by itself. No comments from me.
Raymond Yee, PhD
Software Developer at unglue.it
Big data enchants us with the promise of new insights. Let’s not forget the knowledge hidden in the small data right before us.
Ah! The “shiny new thing” alert. So, it’s not so much a definition as it is a reminder to not lose perspective. Sure, we can access all manner of data, take our inquiries down curious alleys, look for unexpected correlations, and load excessive required fields on forms, but we have to ask:
And a big concern of mine:
Yup! There’s knowledge hidden in the small data … if we have the means to dig it out.
Similar to dealing with the Cheshire Cat, we’ve got no clear answer to what Big Data is. We have more possibilities and more questions. However, for the moment, let’s agree that Big Data is a condition. Most of us won’t ever lay hands on it.
The real issue of Big Data is the information that’s available to those who do have the resources to access anything they want (ethically or unethically), manage the complexity, do the analysis, and use means that weren’t available yesterday. That impacts people who don’t sling data and openly hate Excel. Privacy, identity theft, and those creepy ads that follow you from website to website … that’s the wrong side, the dark side of whatever we call Big Data: when a regular person starts wondering, “how the hell do they do that, and why can’t I make it stop?”
And that brings us back to Deirdre Mulligan’s definition: endless possibilities or cradle-to-grave shackles.
The Big Data discussion runs from 26:50 to 32:20.
Cheshire Cat image courtesy of nightgrowler