Video Surveillance & Physical Security Industry Viewpoints
May 9th, 2017

What Does Deep Learning Enable in Video Analytics?

In my last blog post, I talked about the cost efficiency of GPU-powered video analytics, and how a major advantage of GPU processing is that it enables deep learning techniques. In today’s post, I’d like to delve deeper into what deep learning is and what it enables video analytics solutions to achieve.

Deep Learning Techniques

Deep learning techniques use deep neural networks (DNNs) to train computer systems, imitating the way a human is taught and learns. Deep learning has been possible since the 1980s, but it has only recently gained traction in video analytics because CPUs were too slow to train neural networks effectively.

Today, deep learning running on GPUs can be used to efficiently detect, classify and recognize features and objects in video. These capabilities have transformed the video analytics industry by allowing security applications to work out-of-the-box across a broad spectrum of scenarios. Increased coverage and cost-efficient processing allow systems to continuously process more cameras and aggregate metadata over time, making video more accessible. This, in turn, helps users gain deeper insights from previously unused video. Beyond video analytics, deep learning techniques “are crucial to unleashing improvements in robotics, autonomous drones, and, of course, self-driving cars” (Source: Why Deep Learning is Suddenly Changing Your Life).

Deep learning is a great development tool because a single approach can handle many tasks at once. Multiple specialized algorithms were once needed to compute different aspects of video analysis, but a deep learning model can address many of these problems together and, as it is trained on more data, it becomes better equipped to solve more complex problems over time.

The main challenge of deep learning is the large amount of annotated data required for effective training. The annotation process often involves labor-intensive, repetitive manual work, so it is often worthwhile to invest in annotation tools and in automatically generating annotation proposals. In addition, there is significant research in unsupervised learning that aims to reduce the need for manual annotation.
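To make the idea of automatically generated annotation proposals more concrete, here is a minimal Python sketch. It is not a description of any particular product: the `Proposal` type, the `triage_proposals` function and both thresholds are hypothetical names and values chosen for illustration. It assumes an existing model already emits a label and a confidence score per detection, and it simply decides which proposals can be accepted as-is and which should go to a human annotator.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    """A model-generated annotation proposal (hypothetical structure)."""
    frame_id: int
    label: str
    confidence: float  # model score, assumed to be in [0, 1]


def triage_proposals(proposals, accept_at=0.9, discard_below=0.3):
    """Split proposals into auto-accepted ones and ones needing human review.

    Proposals scoring below `discard_below` are dropped entirely.
    Both thresholds are illustrative assumptions, not recommended values.
    """
    accepted, review = [], []
    for p in proposals:
        if p.confidence >= accept_at:
            accepted.append(p)
        elif p.confidence >= discard_below:
            review.append(p)
    return accepted, review


proposals = [
    Proposal(frame_id=1, label="person", confidence=0.97),
    Proposal(frame_id=1, label="vehicle", confidence=0.55),
    Proposal(frame_id=2, label="person", confidence=0.12),
]
accepted, review = triage_proposals(proposals)
print(len(accepted), "accepted,", len(review), "sent for review")
```

In a workflow like this, annotators would spend their time only on the uncertain middle band instead of labeling every frame from scratch, which is where the savings in manual work would come from.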

Here are some other challenges to consider when adopting deep learning:

  • While it’s beneficial for the system to solve problems independently, this means there is less visibility into how the problem was solved
  • If the system isn’t exposed to a broad enough variety of data, it could reach wrong, often unexpected conclusions
  • The GPU processing needed to enable deep learning can be demanding and expensive to run
  • The technology is rapidly evolving, so developers need to follow academic research and frequently re-assess their algorithms, which calls for an agile approach
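The data-variety challenge in the list above can be checked, at least roughly, before training begins. The following Python sketch is one simple way to do that, under stated assumptions: the function name `underrepresented_labels`, the 5% threshold and the example class names are all hypothetical, and counting labels only measures class balance, not other kinds of variety such as lighting or camera angle.

```python
from collections import Counter


def underrepresented_labels(labels, expected, min_share=0.05):
    """Return the expected classes whose share of the dataset is below min_share.

    `labels` is the list of annotated class names in the training set.
    `expected` lists the classes the system should handle in the field.
    The 5% default is an illustrative assumption, not a recommended value.
    """
    counts = Counter(labels)
    total = len(labels)
    flagged = []
    for name in expected:
        share = counts[name] / total if total else 0.0
        if share < min_share:
            flagged.append(name)
    return flagged


# Toy dataset: heavily skewed toward one class, with one class missing.
labels = ["person"] * 90 + ["vehicle"] * 9 + ["bicycle"] * 1
print(underrepresented_labels(labels, ["person", "vehicle", "bicycle", "animal"]))
# prints ['bicycle', 'animal']
```

A report like this does not fix the problem, but it points to the classes where more footage should be collected or annotated before the system is trusted with them.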

It is clear that the benefits of deep learning in video analytics, and in many other fields, greatly outweigh the challenges. It will be interesting to see how the technology develops as processing and automation technology improve. Perhaps in the future, systems will be so well trained that machines will be able to predict and interpret unfamiliar scenarios independently, and help provide further insight for improving security, business intelligence and quality of life.

If you’re attending the GPU Technology Conference this week, you can learn more about Leveraging Deep Learning and GPUs to Accelerate Surveillance Video to Insight in a session with my colleague, Amit Gavish, BriefCam General Manager of Americas.