Scaling Data Science: How to Deal with Growing Volume, Complexity, and Speed

Machines generate data constantly: their components, RFID tags, applications, servers, and sensors all produce it. Collected and analyzed, this data could be of immense help in making business decisions.

Much of this machine data was originally generated for local and specific uses, such as troubleshooting, monitoring, debugging, compliance, and fraud protection. As a result, the protocols and formats are often idiosyncratic and proprietary.

What is needed is a way for professionals who have specific business goals, but who aren’t analytics experts, to work with massive amounts of data and get the information they need from it in a reasonable amount of time and at a sensible cost.

Fortunately, a number of partners are finding ways to support their clients with analytics at scale.

Machine learning for machine data

Machine learning algorithms learn from the data they process and get better at extracting useful information without explicit instruction from the programmer. But machine learning isn’t a simple universal solution: different machine learning techniques work best on different kinds of data.

Machine logs are sometimes called “unstructured” data, which can be misleading. What data is considered “structured” or “unstructured” varies from domain to domain—and the IoT is unifying a wide range of different domains and industries, reducing the value of the older terminology.

Machine logs have a structure. It just isn’t a database-ready structure—and it varies from one type of log to another. Taking that structure and making it more widely usable plays to machine learning’s strengths.
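
To make this concrete, here is a minimal sketch in Python of turning log lines into uniform records. The log format, the field names, and the helper parse_line are all assumptions invented for illustration and do not come from any particular product. The pattern here is also written by hand, whereas a machine learning approach would aim to infer it from the logs themselves.

```python
import re

# Hypothetical log format, assumed for illustration only:
#   <timestamp> <device id> <LEVEL> <key=value pairs...>
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<device>\S+)\s+(?P<level>[A-Z]+)\s+(?P<body>.*)$"
)
PAIR_PATTERN = re.compile(r"(\w+)=(\S+)")


def parse_line(line):
    """Return a flat dict for one log line, or None if it does not match."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    # Expand the free-form tail into one field per key=value pair.
    body = record.pop("body")
    for key, value in PAIR_PATTERN.findall(body):
        record[key] = value
    return record


sample = "2016-03-01T12:00:05Z pump-17 WARN temp=81.5 rpm=1450"
print(parse_line(sample))
```

Each record comes out as a flat set of named fields, which is much closer to what a database or analytics tool expects than the raw line.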

Usability is key

Analytics skills at the required level will be in short supply for the foreseeable future. Fortunately, the world of IoT is one of partnership: no one will solve every problem with internal resources.

Partnership with an IoT platform can supply the analytics support needed, while allowing the business to focus on its own goals.

In addition, other ecosystem partners can turn the design and deployment of analytics solutions into something that non-analytics staff can use and improve. There are many different ways to approach these problems.

Consider the choices of two different companies, National Instruments and Glassbeam.

National Instruments helps scientists and engineers without much programming experience manage and analyze the data from large networks of sensors. How much processing should occur at the sensor itself? What features should be extracted and sent on? NI provides a development environment with analytic functions that allows the user to focus on desired end results.
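
One way to picture the trade-off behind those two questions is a sensor node that sends a short summary of each window of readings instead of every raw sample. The Python below is only a sketch under that assumption. The function extract_features and the features it computes (mean, RMS, peak) are hypothetical choices for illustration, not part of NI's development environment.

```python
import math


def extract_features(samples):
    """Summarize one window of raw sensor readings as a few numbers."""
    if not samples:
        raise ValueError("window must contain at least one sample")
    n = len(samples)
    mean = sum(samples) / n
    # Root mean square: a common measure of overall signal level.
    rms = math.sqrt(sum(x * x for x in samples) / n)
    return {
        "count": n,
        "mean": mean,
        "rms": rms,
        "peak": max(abs(x) for x in samples),
    }


# A window of raw readings is reduced to four values before being sent on.
window = [0.0, 3.0, -4.0, 1.0]
features = extract_features(window)
print(features)
```

Which features are worth extracting depends on the question being asked. Sending only summaries saves bandwidth, but discards detail that cannot be recovered later.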

Glassbeam automates the process of converting machine logs into analytics-ready format, again allowing the user to focus on the problem to be solved, rather than the mechanics of analyzing it. Its analytics can compare fields and determine file structure without the need for human intervention.

Scale will only continue to grow

Several years ago, Gartner noted that while data would grow at 40 percent per year, IT resources would grow by only five percent per year. Simply throwing more resources at the problem will yield limited returns. Partnering with someone who can provide smarter machine learning algorithms will allow analytics capabilities to keep up with data.
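
The gap between those two growth rates compounds quickly. As a rough sketch, and assuming both rates stay constant from year to year (a simplification, not part of the Gartner figures), the Python below computes how much more data there is per unit of IT resources after a given number of years. The function name growth_gap is made up for this example.

```python
def growth_gap(years, data_rate=0.40, resource_rate=0.05):
    """Ratio of data growth to IT resource growth after `years` years.

    Assumes both rates compound annually and stay constant.
    """
    data = (1 + data_rate) ** years
    resources = (1 + resource_rate) ** years
    return data / resources


for years in (1, 3, 5):
    print(years, round(growth_gap(years), 2))
```

Under these assumptions, after five years each unit of IT resources has to handle a little more than four times as much data as it does today.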

This report from O’Reilly discusses some of the techniques used at ThingWorx and two of its partners—Glassbeam and National Instruments—to automate and speed up analytics on IoT projects.