IoT Brings Big Benefits but More Challenges for Big Data

Better data produces better drivers.

That sounds a lot better than explaining that your job involves “modeling trains.” That might seem like you have an HO-gauge tabletop railroad – when actually, a large corporation took exabytes of data from Internet of Things-enabled freight trains and produced significant bottom-line results.

From its Watertown, N.Y. headquarters, New York Air Brake manufactures control equipment that engineers use to run, stop and track freight trains. Data gathered from IoT systems (throttle, braking, weight, speed and other operations) proved to be as useful to corporate office executives as to locomotive engineers. Using real-time data collected from thousands of train trips, NYAB identified operational best practices and, ultimately, showed how those habits saved fuel and produced greater efficiency.

One key was identifying “behavioral characteristics that define what is a good driving strategy,” said Greg Hrebek, director of engineering, in a corporate video. “You have a coach so that every driver is like your best driver.”

He outlined ways Splunk was used to crunch massive amounts of user data. One result was savings of as much as 12 percent on customers' fuel purchases. Simulators now teach these habits and share the knowledge.

Integrating, standardizing and digesting all that data is a new frontier for companies like Splunk, Tamr, Authentica and others, which are replacing familiar IT names that used to be on the equipment itself, such as IBM, Oracle and SAP. These newer firms integrate data from various sources and formats and automate the process.

And better data begets more data…

Are you ready for terms like zettabyte? That’s a billion terabytes or a thousand exabytes. Big Data has grown up and now it gets really BIG, with added complexity in the number, format, location and accuracy of data. And a really massive data interchange creates new challenges to produce fresh insights.
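To make those scales concrete, here is a minimal Python sketch of the unit arithmetic. It assumes decimal (SI) units, where each step up is a factor of 1,000; the `UNITS` table and `convert` helper are illustrative names, not part of any tool mentioned in this article.

```python
# Decimal (SI) storage units expressed in bytes (an assumption; binary
# units such as the zebibyte would give slightly different numbers).
UNITS = {
    "gigabyte": 10**9,
    "terabyte": 10**12,
    "exabyte": 10**18,
    "zettabyte": 10**21,
}


def convert(amount, from_unit, to_unit):
    """Convert an amount from one unit in UNITS to another."""
    return amount * UNITS[from_unit] / UNITS[to_unit]


# One zettabyte expressed in terabytes and in exabytes.
zb_in_tb = convert(1, "zettabyte", "terabyte")
zb_in_eb = convert(1, "zettabyte", "exabyte")
print(zb_in_tb)
print(zb_in_eb)
```

Run as written, the two conversions come out to a billion terabytes and a thousand exabytes, matching the figures above.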

Data silos that were called warehouses or marts are now being rechristened data lakes or ponds or streams. The watery analogies are apt as volumes rise, fall and flow through an organization. And unless levels are managed, they can overflow their banks just like a flooded river.

Humans will handle some of that oversight, identifying the actions and decisions supported by data. And a tandem of people and algorithms will manage the cleaning of inaccurate or badly formatted data that comes in from multiple systems. One big promise of Big Data is the ability to take huge volumes of information to find patterns, anomalies and new details.

Consider how easy it is to mistake change for inaccuracy. If a website or database lists two stores at the same address, does it mean that one closed and the other replaced it? That happens all the time. Context, such as time or location details, makes all the difference.
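The two-stores question can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the listings, the field layout and the `classify_shared_addresses` helper are all hypothetical, and the rule it applies (non-overlapping dates suggest a replacement, overlapping dates go to a person) is a simplification rather than any vendor's method.

```python
from collections import defaultdict
from datetime import date

# Hypothetical listings: (store name, address, first seen, last seen).
listings = [
    ("Corner Books", "12 Main St", date(2010, 1, 5), date(2014, 3, 1)),
    ("Main St Coffee", "12 Main St", date(2014, 6, 10), date(2015, 9, 30)),
    ("Oak Tools", "40 Oak Ave", date(2012, 2, 1), date(2015, 9, 30)),
    ("Oak Tool Co", "40 Oak Ave", date(2013, 7, 15), date(2015, 9, 30)),
]


def classify_shared_addresses(rows):
    """Label each address that is shared by two or more listings."""
    by_address = defaultdict(list)
    for row in rows:
        by_address[row[1]].append(row)

    results = {}
    for address, group in by_address.items():
        if len(group) < 2:
            continue  # only one store here, nothing to disambiguate
        group.sort(key=lambda r: r[2])  # order by first-seen date
        # If one listing is still active when the next one starts, the
        # time context alone cannot settle it: possible duplicate or typo.
        overlaps = any(
            earlier[3] >= later[2] for earlier, later in zip(group, group[1:])
        )
        results[address] = (
            "needs human review" if overlaps else "likely replacement"
        )
    return results


labels = classify_shared_addresses(listings)
print(labels)
```

With these sample rows, the Main St address is labeled a likely replacement because the date ranges do not overlap, while the Oak Ave pair overlaps and is routed to a person.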

“The analysis of sensor data is going to require an automated approach to connecting, aligning and qualifying a tremendous amount of data from many thousands of new sources that are constantly changing. The only way to tackle this level of automated integration is using a machine-driven, human-guided approach to data curation,” said Andy Palmer, co-founder and CEO of Tamr.

Big Data projects are coming from all over the enterprise and are not always led by the CIO, now that every department is using data to guide daily decisions. CIO magazine reports on just how many different hands are involved with Big Data.

One estimate by analyst firm IDC found that only 10 percent of available corporate data gets used. The remaining 90 percent is “dark data” – out-of-date, hidden, unstructured, untagged and untapped data that sits in data repositories and has not been analyzed or processed.

IDC projected that IoT will generate 4.4 trillion gigabytes of data worldwide by 2020 and fully 10 percent of that will be machine-generated.
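For a sense of scale, the arithmetic behind those figures can be worked through in Python. The sketch below takes the two numbers exactly as quoted above and assumes decimal units (1 gigabyte = 10^9 bytes); the variable names are illustrative only.

```python
GIGABYTE = 10**9
EXABYTE = 10**18
ZETTABYTE = 10**21

# Figures as quoted in the paragraph above.
total_gigabytes = 4_400_000_000_000  # 4.4 trillion gigabytes
machine_share_percent = 10

total_bytes = total_gigabytes * GIGABYTE
machine_bytes = total_bytes * machine_share_percent // 100

# Integer division keeps the results exact at these round numbers.
total_exabytes = total_bytes // EXABYTE
machine_exabytes = machine_bytes // EXABYTE

print(total_exabytes, "exabytes in total")
print(total_bytes / ZETTABYTE, "zettabytes in total")
print(machine_exabytes, "exabytes machine-generated")
```

In other words, 4.4 trillion gigabytes is 4,400 exabytes, or 4.4 zettabytes, and a 10 percent share of that is 440 exabytes.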

In its five-year forecast for IoT growth, IDC projected that nearly 30 percent of server shipments will be required to handle the workload of ingesting and analyzing IoT data alone. Measuring, authenticating and exchanging data will lead to a standardization process by 2018 as machine-to-machine communication increases, said IDC’s Vernon Turner.

Image by Walter on Flickr (CC BY 2.0)