Over the past few decades, computers and machines have progressively improved their abilities to process and analyze different forms of data. Initially, software applications and the data they harnessed and transmitted revolved around the 2D desktop computer -- things like numbers, text, and email. Data representing the physical world had to be manually input into industrial machinery, and people made decisions without full context of their environment.
The recent proliferation of the Internet of Things (IoT), accompanied by artificial intelligence (AI) and ubiquitous computing, is fueling rapid advancements in computer vision (CV) and corresponding opportunities for innovative applications.
Computer vision enables machines and people to see, perceive, interpret, and interact with the physical world by processing visual data into digitally driven insights. CV typically requires vision-based sensors or systems, like cameras, to capture pixels of the physical world, but it can also leverage other types of images. Examples include MRI scans, imaging radar, building designs, CAD files, or satellite imagery.
Typically, this visual or image data is used to train deep learning models built on neural networks, producing a trained AI model that runs inference in the real world.
PTC Computer Vision Lead John Schavemaker explains further that “AI for CV” can run in two modes: “recognize the known” or “structure the unknown,” based on visual features. The first mode is trained “supervised,” with (human-)labeled examples; in the second, trained “unsupervised,” the machine “clusters” data into similar patterns itself.
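To make the two modes concrete, here is a minimal, illustrative sketch (not PTC code) in plain NumPy: a nearest-centroid classifier stands in for “recognize the known” (supervised, labeled examples), and a small k-means loop stands in for “structure the unknown” (unsupervised clustering). Real CV systems operate on learned image features rather than raw 2D points, but the contrast between the two modes is the same.

```python
import numpy as np

def nearest_centroid_classify(train_X, train_y, X):
    """'Recognize the known': supervised mode. Each query is assigned the
    label of the nearest class centroid computed from human-labeled examples."""
    classes = np.unique(train_y)
    centroids = np.array([train_X[train_y == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

def kmeans_cluster(X, k, n_iter=20):
    """'Structure the unknown': unsupervised mode. Samples are grouped into
    k clusters of similar feature patterns, with no labels at all."""
    # Farthest-point initialization keeps this sketch deterministic.
    centroids = [X[0].astype(float)]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()].astype(float))
    centroids = np.array(centroids)
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels
```

Note that the supervised path needs `train_y` (the human labels), while the unsupervised path discovers structure from `X` alone -- exactly the distinction Schavemaker draws.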
These AI-driven techniques are particularly powerful for processing the immense and complex unstructured visual data in the world and are spurring the next generation of computer vision applications from autonomous driving to disease detection.
In this blog, we’ll dive into various computer vision applications and analyze their impacts on humans, machines, and spaces.
The idea of CV impinging on human vision can be daunting for many to accept, since ‘seeing is believing’, right? Well, that phrase is actually inaccurate -- our visual perception follows Gestalt patterns, which often paint a picture of the world from incomplete data. CV can help humans cut through this limited view and provide predictions as to what may happen next, potentially saving lives.
For example, there are around 100,000 known vehicle accidents involving drowsy driving each year, resulting in over 1,500 fatalities. Computer vision systems embedded in in-vehicle dash cameras are increasingly capable of detecting drowsy driving by analyzing eye blinking intervals, head positioning, yawning, and other human behaviors. Detecting these behaviors can trigger an audio alert to wake the driver.
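One common way such systems detect eye closure is the eye aspect ratio (EAR) computed over facial landmarks: the ratio collapses toward zero when the eyelid closes. The sketch below is a simplified illustration of that idea; the threshold and frame-count values are plausible assumptions, not production settings.

```python
import numpy as np

def eye_aspect_ratio(eye):
    """eye: six (x, y) landmarks around one eye, ordered corner-to-corner
    as in common facial-landmark models. EAR drops toward 0 as the lid closes."""
    p1, p2, p3, p4, p5, p6 = np.asarray(eye, dtype=float)
    vertical = np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)
    horizontal = np.linalg.norm(p1 - p4)
    return vertical / (2.0 * horizontal)

def drowsiness_alert(ear_series, threshold=0.2, consec_frames=48):
    """Trigger an alert when EAR stays below threshold for consec_frames
    consecutive frames (roughly 2 seconds at 24 fps) -- a sustained closure,
    not an ordinary blink."""
    run = 0
    for ear in ear_series:
        run = run + 1 if ear < threshold else 0
        if run >= consec_frames:
            return True
    return False
```

A brief blink resets the counter, so only sustained closure fires the alert -- which is why blink *intervals*, not single blinks, are the signal the article describes.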
It can also inform another in-vehicle CV application: advanced driver-assistance systems (ADAS). ADAS uses a mix of sensors, increasingly including outward-facing cameras and LiDAR, to detect events in the driving environment and take immediate action -- such as audibly or visually alerting the driver to an impending collision, or emergency braking -- in fractions of a second, far quicker than a driver's potential reaction time. ADAS is estimated to have the potential to prevent over 20,000 deaths a year.
These CV examples illustrate how AI and sensors can work together to perceive events far more keenly than humans can alone, and ultimately create a safer world.
While computer vision can be embedded onto our machines (like vehicles) to detect human behavior, it can also be overlaid directly onto our eyesight through augmented reality (AR).
Augmented reality is emerging as the frontline worker's native, 3D-aware digital lens into the physical world, powered by computer vision. While you don't necessarily need AI to create a CV application that drives business value, its involvement will only increase the breadth of capabilities.
Take this computer vision example of real-time spare part recognition from PTC's Vuforia Innovation Runway team: using computer vision and AI to recognize the exact part (fuse, car rim, screw, bracket, etc.), we can see its specific characteristics, how many are in stock, and potentially other information.
Senior Software Development Engineer and innovation runway project owner Eva Agapaki further explains: "We trained this computer vision model based entirely on CAD data, which drastically reduced the amount of AI model development time that would typically require a large data set of real world images of spare parts. Industrial companies have an abundance of CAD data at their disposal and can use this IP to train AI models and computer vision to automatically recognize an array of products, parts, objects, and images."
The industrial world is complex, and it is a daily occurrence for frontline workers to interface with the unknown, creating business challenges and hindering productivity. In this example, a technician would traditionally have to think back to training manuals or locate the exact paper-based operating procedure to find the right information on a single spare part among thousands of very similar ones.
With an AI-driven computer vision algorithm operating through augmented reality, the technician can instantly see all of the information they need and reduce the points of failure in the process.
Computer vision is here to augment our visual abilities, not replace them. With smart applications, it can make us safer and more productive than ever before.
Machines have traditionally been unaware of their physical surroundings and depended on specific human inputs to instruct or program their actions. Increasingly through sensorization, the Industrial Internet of Things, AI, and spatial computing, machine situational awareness is improving.
Computer vision is a capability available to machines enabled by these converging and emerging technologies to understand and navigate through the physical world.
The autonomous vehicle (AV) is a great example of a complex computer vision application revolutionizing automotive competitive landscapes and the transportation ecosystem. An AV must ingest volumes of dynamic (vehicles, pedestrians) and static (infrastructure, trees, signs) visual data and actuate motorized decisions in fractions of a second.
Sourcing the extensive volumes of real-world data needed to train AI models on every driving scenario requires significant resources, creating AV testing bottlenecks. One resolution for AVs, and for other AI training data shortcomings, Schavemaker points out, is that "these AI systems can be trained for a large part from synthetic data: simulated digital environments (like roadways) that closely mimic the physical world." Autonomous mobile robots trained in similar ways pick and place objects in synchronization with workers in warehouses.
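The appeal of synthetic data is that labels come for free: because the system renders the scene itself, it already knows exactly what is in it and where. This toy sketch (a deliberately simplified stand-in for a real simulator, with hypothetical shape classes) generates labeled "scenes" with NumPy; each sample arrives with its class and bounding box, no human annotation required.

```python
import numpy as np

def synthetic_sample(rng, size=32):
    """Render one toy scene and return (image, label, bounding_box).
    Label 0 = square 'sign', label 1 = disc 'pedestrian' -- hypothetical
    classes for illustration only."""
    img = rng.normal(0.0, 0.05, (size, size))   # sensor-noise background
    shape = int(rng.integers(0, 2))
    cx, cy = rng.integers(8, size - 8, size=2)  # keep the object in frame
    r = int(rng.integers(3, 6))
    yy, xx = np.mgrid[0:size, 0:size]
    if shape == 0:
        mask = (np.abs(xx - cx) <= r) & (np.abs(yy - cy) <= r)
    else:
        mask = (xx - cx) ** 2 + (yy - cy) ** 2 <= r ** 2
    img[mask] = 1.0                             # the rendered object
    bbox = (cx - r, cy - r, cx + r, cy + r)     # ground truth, for free
    return img, shape, bbox

def synthetic_dataset(n, seed=0):
    """Generate n labeled training samples from the simulator."""
    rng = np.random.default_rng(seed)
    samples = [synthetic_sample(rng) for _ in range(n)]
    X = np.stack([s[0] for s in samples])
    y = np.array([s[1] for s in samples])
    return X, y
```

A real AV simulator renders photorealistic roadways rather than shapes on noise, but the economics are identical: every frame ships with perfect ground truth.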
Not every machine-oriented CV application needs this comprehensive computational overhaul and sophisticated movement to create business value. Machines have been used for quality inspection and verification for some time now (sometimes called machine vision), and these applications are increasingly improved by AI through deep learning.
By training machines on what a product or object should look like at a given point in a manufacturing process, they can quickly detect and remove anything with defects. This is a critical CV application for fast-moving production lines with large volumes of shifting items, where maintaining uptime and product yield is a competitive requirement. Expanding this machine CV quality use case could look like an agricultural tractor knowing to pick only crops of a certain ripeness, or a drone detecting a rotting portion of a field's crops.
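At its simplest, an inspection check compares each part image against a "golden" reference of a known-good part and rejects parts that deviate beyond a tolerance. The sketch below shows that classical baseline (with assumed threshold values); deep-learning inspection systems generalize the same pass/reject decision by learning what "should look like" means from examples instead of a fixed rule.

```python
import numpy as np

def defect_score(image, golden, threshold=0.25):
    """Pixelwise comparison against a known-good 'golden' reference.
    Returns (fraction of defective pixels, defect mask)."""
    diff = np.abs(np.asarray(image, dtype=float) - np.asarray(golden, dtype=float))
    defect_mask = diff > threshold          # pixels outside tolerance
    return defect_mask.mean(), defect_mask

def inspect(image, golden, max_defect_fraction=0.01):
    """Pass/reject decision for a production line: reject if more than
    1% of pixels (an assumed tolerance) deviate from the reference."""
    frac, _ = defect_score(image, golden)
    return "pass" if frac <= max_defect_fraction else "reject"
```

In practice the images would first be aligned and normalized for lighting; the returned mask also localizes the defect, which is what lets the line remove the offending item automatically.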
CV vastly improves machines' perception of their surrounding environment, which can drive an array of business and societal value across quality, productivity, and mobility.
Another emerging area where CV is starting to gain clarity is its application over spaces or environments. Analyzing the web of interactions between spaces and things (people, machines, objects, etc.) unlocks opportunities for optimizations and efficiencies.
Schavemaker explains the emerging spatial concept as a holistic approach where “you are going to optimize the ‘whole’ by carefully ‘optimizing’ entities and their interactions as part of the whole, where small optimizations may result in large benefits for the whole system.”
The previously mentioned agricultural drone example also pertains to similar bird's-eye-view use cases beyond quality: improving crop yield, monitoring and controlling pesticides, or tracking livestock. Embedding a CV system into our transportation infrastructure provides traffic flow analysis for governmental agencies aiming to lower a city's carbon footprint. CV can determine parking occupancy and automate ticketing, increasing city revenues (maybe not everyone's favorite CV use case). In retail, it tracks customer movements to optimize product placement and sales or discounting opportunities.
In manufacturing, the workforce represents a massive productivity challenge as well as an opportunity to generate efficiencies. Now, with computer vision, we can capture workflow data and, with spatial analytics, identify patterns for optimization and reduce bottlenecks.
In the below video, PTC President and CEO Jim Heppelmann explains this computer vision example and the potentially massive opportunity spatial analytics can have on workforce performance measurement.
CV can also detect potentially hazardous events common in the industrial world and ensure the safety of frontline workers through alerts and other mitigation techniques. Computer vision intertwined with spatial computing can further understand these 3D environments creating powerful digital twins of places with critical insights.
There is a lot of overlap between these computer vision examples across humans, machines, and environments, but the resounding theme is putting the CV lens on the world to better understand it and predict important elements within it. That may be a human recognizing a unique object in an industrial environment, a machine making complex autonomous movements in an instant, or a system analyzing objects' behaviors in a space.
Computer vision will become increasingly pervasive in industry and society. Aligning winning use cases within a digital transformation strategy provides a path forward to capitalize on these technological innovations.
David Immerman is a Consulting Analyst for the TMT Consulting team based in Boston, MA. Prior to S&P Market Intelligence, David ran competitive intelligence for a supply chain risk management software startup and provided thought leadership and market research for an industrial software provider. Previously, David was an industry analyst in 451 Research's Internet of Things channel, primarily covering the smart transportation and automotive technology markets.