IoT and Big Data: The Dichotomy Between Too Much and Too Little

The Internet of Things (IoT) is all about data – lots of data. In fact, there is now so much data that data centers are becoming overwhelmed, as a previous blog post discussed.

Let’s cycle back a minute and think about the whole concept of collecting data. Gartner Research says 85% of Fortune 500 organizations will be unable to exploit big data for a competitive advantage. A report from McKinsey & Company echoes this, using an example of an oil rig that has 30,000 sensors collecting data. Typically only 1% of it is examined and that data is mostly used for anomaly detection (alarms) and control, leaving many other potential uses untapped.

So are we collecting too much data? Brewster McCracken, the president and CEO of Pecan Street, an organization dedicated to advancing university research and accelerating innovation in water and energy, thinks so. He and his team built Dataport, the world’s largest source of disaggregated customer energy data for university researchers around the world. “We’ve taken a consumer-grade data measurement tool for solar panels, and are using it to operate the world’s largest research database on customer energy use,” he says.

Here’s McCracken’s perspective on data collection. “I tell people to get as little data as possible to get the job done,” he suggests. His reasoning is that data needs to be downloaded, stored, organized, curated, analyzed, and converted into something meaningful – and of course, it must be protected. “If you’re getting more data than you need to do a particular job, there is a great deal of costly effort being expended. That’s a major diversion from a company’s objective for why it’s collecting the data in the first place.”

McCracken advises that determining what data to collect is a critical step. “It all depends on the job that you are trying to do,” he says. “For example, in the case of advanced metering, the first questions to ask are ‘what do you want to meter’ and ‘why.’ If it’s for billing, then you can pull data from a smart meter once a month, and anything beyond that is more data than needed. However, if the goal is to detect water leaks in homes, then a smart meter will need to measure tenths of a gallon at least hourly and report back in real time.”
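To put rough numbers on the two metering objectives in the quote above, here is a minimal Python sketch. The objective names, the `readings_per_year` helper, and the fixed-interval assumption are illustrative only and do not come from Pecan Street; the intervals themselves (monthly for billing, hourly for leak detection) are the ones McCracken mentions.

```python
# Illustrative sketch: how many readings one meter produces per year
# under each objective. Names and structure are hypothetical.

HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def readings_per_year(interval_hours: float) -> int:
    """Readings produced in one year when a meter is read at a fixed interval."""
    return round(HOURS_PER_YEAR / interval_hours)


# Read intervals taken from the examples in the quote:
# billing needs one read per month, leak detection at least one per hour.
READ_INTERVAL_HOURS = {
    "billing": HOURS_PER_YEAR / 12,
    "leak_detection": 1,
}

for objective, interval in READ_INTERVAL_HOURS.items():
    print(f"{objective}: {readings_per_year(interval)} readings per meter per year")
```

Under these assumptions, the same meter yields 12 readings a year for billing and 8,760 for hourly leak detection, which is why the objective has to be settled before the collection schedule.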

These are very different use cases that require very different amounts of data to be managed. Another example of the too-little, too-much dilemma involves personally identifiable information. McCracken cites keeping data anonymized as a good example of following his golden rule of getting as little data as possible. “If you’re using data for feedback, there is absolutely zero need to have people’s identities attached to that,” says McCracken. “You don’t need to add the complexity, risk, or intrusion of getting people’s identity if you’re simply trying to get data on use patterns to make a product work better. People’s identifiable information does not help make products work better. In fact, it makes the job of making the product work better more difficult unnecessarily.”
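As a rough illustration of keeping identities out of usage data, the Python sketch below drops identifying fields from a record before it is stored. The field names and the `strip_identity` helper are hypothetical, not part of any real device payload or Pecan Street system, and removing direct identifiers is only a first step: it may not be enough on its own to make a dataset anonymous.

```python
# Illustrative sketch: keep only the usage fields a product team needs.
# Field names are hypothetical; a real device payload will differ.

IDENTITY_FIELDS = {"name", "email", "address", "account_id"}


def strip_identity(record: dict) -> dict:
    """Return a copy of the record without the fields listed in IDENTITY_FIELDS."""
    return {key: value for key, value in record.items() if key not in IDENTITY_FIELDS}


raw_reading = {
    "name": "Example Customer",
    "email": "customer@example.com",
    "gallons": 3.4,
    "hour": "2015-06-01T14:00",
}

usage_only = strip_identity(raw_reading)
print(usage_only)  # only the "gallons" and "hour" fields remain
```

Filtering at the point of collection, rather than after storage, follows the same rule of gathering as little data as the job requires.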

Okay, collect as little data as possible, right? But wait, here is the dichotomy. While McCracken is firmly rooted in the belief of collecting as little data as possible, he also believes that most IoT devices do not collect enough data to do a job well.

“My perspective on the IoT side of things is that generally the devices do not get enough data to do the job well,” says McCracken. “Let’s use the example of the water meter and the goal of billing data versus leak detection. There is a fair amount of effort expended in reading water usage daily, a task that is certainly worthwhile for billing purposes. But a little more data from the same meter adds a very high-value service for people, such as finding out they have leaks in their home. This is something that people care about, as it’s a problem that many experience. So, simply through the addition of a little more data, there is a lot of value added.”

The moral of the story? Ask whether you are achieving your objective. If the answer is yes, you’re on the right track, though if you’re using only a small portion of the data collected to reach your goal, you may be capturing too much and adding a lot of unnecessary expense to the initiative. If the answer is no, you might need to collect a bit more. And beyond achieving your objective, it’s key to ask the question, “What other value can be added here?”

It’s a tricky balance, but one worth examining, for sure!

Image by James Moran on Flickr (CC BY 2.0)