On episode #15 we briefly discussed
“Big Data. Is it real or a buzzword?”
Zack and Bill tied Big Data’s definition to volume. Jordan also mentioned volume of data, and started to say something about technology. My position was (and remains), “I have no idea. I’ll deal with whatever my clients hand me.”
(Scroll to the bottom of this page to see the discussion.)
As Data Science and Big Data increase in sexiness, getting a definition for Big Data is something akin to asking questions of the Cheshire Cat in Alice in Wonderland, the response is likely to be riddles and more confusion.
Recently, Jenna Dutcher published an interesting blogpost compiling 43 responses to the question:
43 Definitions of Big Data
Jenna is community relations manager for datascience@berkeley, UC Berkeley School of Information’s online masters in data science.
The 43 responses came from a good mix of people; university professors, data scientists, CIOs, an economist, a science blogger … an overall good group of people who know their way around data. As I read through their definitions I noticed some recurring themes:
- A few were reluctant to answer
- Requiring new skill or methods
I decided to parse the responses into categories, allowing a definition to fit into multiple categories; e.g., if a definition included size and complexity, I added it to both categories.
QUICK-&-DIRTY SUMMARY OF THE BIG DATA DEFINITIONS
10 responses describe Big Data as a condition more than a thing. The Access to Everything category summarizes the “if you want it, come and get it” perspective. Like this quote from Rohan Deuskar:
MOST INTERESTING DEFINITIONS OF BIG DATA
Peter Skomoroch, formerly with LinkedIn:
… Many features and signals can only be observed by collecting massive amounts of data (for example, the relationships across an entire social network), and would not be detected using smaller samples …
This points to something very interesting about working with data: do you have enough to derive meaningful insights?
Peter suggests that Big Data is a situation where you need so much data that the minimum that you need begins to create problems. Therefore, Big Data isn’t just a massive frikken amount of details just because you can get them. No. It’s got to serve a purpose and be necessary.
Deirdre Mulligan Associate Professor at UC Berkeley School of Information
Big data: Endless possibilities or cradle-to-grave shackles, depending upon the political, ethical, and legal choices we make.
That stands by itself. No comments from me.
MY FAVORITE BIG DATA DEFINITION
Raymond Yee, PhD
Software Developer at unglue.it
Big data enchants us with the promise of new insights. Let’s not forget the knowledge hidden in the small data right before us.
Ah! The “shiny new thing” alert. So, it’s not so much a definition as it is a reminder to not lose perspective. Sure, we can access all manner of data, take our inquiries down curious alleys, look for unexpected correlations, and load excessive required fields on forms, but we have to ask:
- Do we really need to?
- Is it worth the costs of collection, cleansing, storing and analyzing?
And a big concern of mine:
- There’s a shortage of people who have the skill to collect, cleanse, store and analyze the data that’s already available to us. I cover this in a blogpost: Big Data Schmig Data.
Yup! There’s knowledge hidden in the small data … if we have the means to dig it out.
CLOSING THOUGHT ABOUT BIG DATA
Similar to dealing with the Cheshire Cat, we’ve got no clear answer to what Big Data is. We have more possibilities and more questions. However, for the moment, let’s agree that Big Data is a condition. Most of us won’t ever lay hands on it.
The real issue of Big Data is the information that’s available to those who do have the resources to access anything they want (ethically or unethically), manage the complexity, do the analysis, and use means that weren’t available yesterday. That impacts people who don’t sling data, and openly hate Excel. Privacy, identity theft, and those creepy ads that follow you from website to website … that’s the wrong side, the dark side of whatever we call Big Data: when a regular person starts wondering,”how the hell do they do that, and why can’t I make it stop?”
And that brings us back to Deirdre Mulligan’s definition: endless possibilities or cradle-to-grave shackles.
The Big Data discussion is 26:50 to 32:20
Cheshire Cat image courtesy of nightgrowler