Buried deep inside data warehouses and Big Data clouds are some heavy questions and assumptions about the future of the Internet of Things. Aging data, and the metadata about IoT devices, carries its own unseen, built-in bias.
Recognizing those flaws is a hot topic, getting its 15 minutes of fame as researchers and computer scientists caution us about the dangers of becoming too reliant on data. In “Weapons of Math Destruction,” data scientist Cathy O’Neil warns that algorithms decide what we see – or don’t see – on the Internet, and that users often have little control over their IoT equipment and the logs it generates, complete with location, intention, and preferences.
O’Neil was a quant analyst on Wall Street before turning her sights on how models and databases can be misleading. Her blog explores data’s impact on everything from careers to finance and industrial forecasting.
She suggests that having more data doesn’t ensure better IoT analytics. A human still needs to ask challenging questions, identify additional data sources to shore up the information, and refine it into knowledge.
According to ZoomInfo, 30 percent of people change jobs each year, and 37 percent of email addresses change – factors that can kill a B2B contact database.
The US Department of Labor projects a turnover rate in corporate America as high as 40 percent each year. If people are that mobile and changeable, what about the data in your industrial or operations set? What new categories of data can you gather now that weren’t possible a few years ago to improve client knowledge and experience, or to pressure-test your existing data?
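As a rough illustration of why churn like this kills a contact database, consider how the article’s 30 percent annual job-change rate compounds over a few years. The independence assumption here is a simplification of mine, not a claim from ZoomInfo or the Labor Department:

```python
# Sketch: how annual churn compounds against a static contact list.
# The 30% job-change rate comes from the article; assuming changes are
# independent from year to year is a simplification for illustration.

def fraction_still_valid(annual_churn: float, years: int) -> float:
    """Fraction of records untouched after `years` of compounding churn."""
    return (1.0 - annual_churn) ** years

for years in range(1, 4):
    pct = fraction_still_valid(0.30, years)
    print(f"After {years} year(s): {pct:.0%} of contacts still valid")
```

Under these assumptions, barely a third of a contact list survives three years untouched, which is why a database that is merely "a few years old" can already be mostly stale.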
Another weakness lies with the original, human database builders. Data left unquestioned can carry social or non-technical bias into wrong conclusions, says Internet researcher danah boyd. Consider a data set of people who were arrested, and how that data shifts over time with changing definitions of crime, economic factors, demographics, and other circumstances.
“We all bring with us our baggage and we all bring with us cultural biases,” she told a Techonomy conference this past spring. “We need ways of auditing change over time and a need to root out discrimination and unintended consequences.”
Some social or economic assumptions based on historical data can be wrong because of values imposed on them by programmers and users. For example, a database of purchase records shows only outcomes – not the total number of interactions where no action was taken. Other flaws include blind spots in artificial intelligence, where the same word carries different meanings depending on whether an organization views it from finance, operations, sales, or marketing.
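A toy sketch of the outcome-only flaw described above (the products and numbers are invented for illustration): ranked by purchase counts alone, one product looks like the clear winner, but the full interaction log, including visits that ended without action, tells a different story.

```python
# Hypothetical data: the purchase table records only outcomes, while the
# interaction log also counts visits where no action was taken.
purchases = {"A": 90, "B": 30}         # completed purchases per product
interactions = {"A": 3000, "B": 300}   # all interactions, incl. no-action

# Purchase counts alone suggest A outperforms B three to one; the
# conversion rate, which needs the no-action records, says the opposite.
for product in purchases:
    rate = purchases[product] / interactions[product]
    print(f"{product}: {purchases[product]} purchases, {rate:.0%} conversion")
```

Product A shows three times the purchases but converts only 3 percent of interactions, while B converts 10 percent; without the no-action records, that comparison is impossible to make.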
Sometimes you just need a different kind of data to solve the puzzle. Last year at LiveWorx 16, a demonstration presented a “Living Predictive Model” that pairs defined outcomes with statistical models. For Diebold Corp., that meant looking not just at the performance of its equipment, but also for other factors affecting the failure of automated teller machines (ATMs).
Diebold wanted to predict service needs for ATMs and credit card readers. Usage count turned out to be the main driver, but humidity became a factor if it had rained within the last 24 hours – making weather a leading indicator.
Even the location of data can be as important as verifying its security and accuracy, according to a Bloomberg Businessweek article, “Building a National Fortress in the Cloud.” Sharing a language, cultural background, and other factors with the data’s custodians made a significant difference.
Auto parts supplier Robert Bosch GmbH began offering data storage and stewardship to its customers and quickly drew the attention of other companies asking for such a service. By providing a trusted source of data security, the company created an opportunity to safeguard data while securely sharing selected details.
Bosch developed a ‘Connected Industry’ initiative that keeps supplier data in Germany and adheres to Europe’s strict privacy laws.
There’s always more data, but adding different models and predictions allows companies to both find their flaws and uncover unexpected correlations. What’s lurking in your data?