Earlier this week, we expanded our AI Platform into an exciting new domain – event detection on streaming audio data. Dataminr’s AI Platform now transforms the live audio transmissions of publicly available emergency dispatch broadcasts into early alerts on active safety threats.
This initiative is already delivering substantial value for our clients. This week, one of our streaming-audio-based alerts identified an active shooter situation in New York. Dataminr delivered this alert to clients nearly a minute in advance of the first online social media post about the incident. Clients who received the alert were able to quickly understand the event by reading an accurate machine-generated event summary, and could even listen to the audio segment itself within the application.
As Dataminr’s Co-founder and CTO, I have been excited to see the capacity of our AI platform grow so dramatically over the past decade to meet the expanding needs of our clients. As global enterprises have adopted our products for Corporate Risk, we have worked to deliver a broader set of early alerts in major cities for a range of Corporate Risk use cases, including office protection and employee safety. Consider, for example, the task of safely transporting key corporate personnel through densely populated urban areas – diverting a transit route a few blocks away from an immediate but otherwise unanticipated threat can mean the difference between individuals arriving safely at their intended destination and being put in harm’s way.
Historically, Dataminr’s event detection AI has focused on text classification and computer vision. More recently, we added a number of capabilities around Anomaly Detection on sensor-derived data streams. Over the course of this year, Dataminr’s AI and Research teams have been aggressively testing new AI approaches on streaming audio data.
Automatically transforming police and emergency dispatch broadcasts into actionable alerts presents a number of unique challenges for audio processing. Audio quality is often poor, with substantial background noise. Domain-specific jargon and shorthand are prevalent, and transmitter and receiver hardware varies substantially from location to location. This is a complex dataset that has required the development of new platform components and algorithms.
In order to tackle the challenge of transforming raw audio streams into a discrete set of actionable alerts, we combine approaches from a number of distinct scientific disciplines. Dataminr’s Audio Processing Pipeline leverages a variety of frequency filtering and signal processing techniques in conjunction with Deep Learning (Transformer and LSTM neural networks) to identify breaking events, partition segments of conversation, perform topic classification, and measure intensity, reach, and impact. Extracting key features at the ingest and signal processing layers allows us to leverage a mixture of Statistical Analysis and Machine Learning approaches (Anomaly Detection, Clustering) to segment streaming audio into discrete events. We leverage neural networks to identify named locations of interest, and sequence-to-sequence models to generate descriptive alert headlines.
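To make that flow concrete, here is a minimal, illustrative sketch of what such a pipeline can look like in Python. Every function name, threshold, and model interface below is a hypothetical stand-in for the kinds of components described above – a simple band-pass filter for noise suppression, an energy-based heuristic in place of the statistical segmentation, and placeholder model callables for transcription, topic classification, and headline generation. It is not Dataminr’s production code.

```python
# Illustrative sketch of a streaming-audio alerting pipeline.
# All names, thresholds, and model hooks are hypothetical.
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass_filter(audio: np.ndarray, sample_rate: int,
                    low_hz: float = 300.0, high_hz: float = 3400.0) -> np.ndarray:
    """Suppress out-of-band noise; dispatch speech sits roughly in the voice band."""
    sos = butter(4, [low_hz, high_hz], btype="bandpass", fs=sample_rate, output="sos")
    return sosfiltfilt(sos, audio)


def segment_transmissions(audio: np.ndarray, sample_rate: int,
                          energy_threshold: float = 0.02) -> list[np.ndarray]:
    """Split the stream into candidate transmissions using a simple
    energy-based voice-activity heuristic (a stand-in for statistical
    segmentation of the stream into discrete events)."""
    frame_len = int(0.5 * sample_rate)  # 0.5-second frames
    frames = [audio[i:i + frame_len] for i in range(0, len(audio), frame_len)]
    active = [np.sqrt(np.mean(f ** 2)) > energy_threshold for f in frames]
    segments, current = [], []
    for frame, is_active in zip(frames, active):
        if is_active:
            current.append(frame)
        elif current:
            segments.append(np.concatenate(current))
            current = []
    if current:
        segments.append(np.concatenate(current))
    return segments


def process_stream(audio: np.ndarray, sample_rate: int,
                   transcriber, topic_classifier, headline_generator) -> list[dict]:
    """End-to-end flow: filter -> segment -> transcribe -> classify -> summarize.
    The three model arguments stand in for the deep learning components."""
    clean = bandpass_filter(audio, sample_rate)
    alerts = []
    for segment in segment_transmissions(clean, sample_rate):
        text = transcriber(segment, sample_rate)      # speech-to-text
        topic, score = topic_classifier(text)         # topic label + confidence
        if score > 0.8:                                # keep only alert-worthy events
            alerts.append({"topic": topic,
                           "headline": headline_generator(text),
                           "audio": segment})
    return alerts
```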
What excites me most about this new streaming audio initiative is how it showcases the exponential growth of Dataminr’s AI Platform. Our use of Transfer Learning is one of the ways our AI Platform achieves dataset network effects and expands the value of our products for Dataminr’s clients.
The basis for the runaway success of this newest AI initiative was our pre-existing archive of carefully labeled event data, developed through the processing of thousands of public data sources over a decade, together with the neural networks we have trained for sequence-to-sequence, classification, and NER tasks. As is always the case with large-scale supervised learning problems, data collection and labeling are among the most resource-intensive elements of developing an effective classification or generative model. Our use of Transfer Learning for this initiative dramatically accelerates this process, allowing our new AI algorithms for streaming audio to perform in ways that would not otherwise be possible if we were starting from scratch.
Let me provide a specific example of how some of our existing algorithms, developed across our previously integrated data sources, have enabled our rapid expansion into this new domain of streaming audio. We apply Deep Learning neural networks for topic classification across our disparate text-based datasets. Our Named Entity Recognition (NER) models, deployed across all of our alerts, have been trained on our vast 10-year archive of annotated social-media-based alerts generated from the trillions of public data units we have processed.
Transfer Learning – taking an existing neural network trained on related (but larger) labeled data, adding one or more new layers to the network, and retraining the resultant network on a smaller set of labeled data from the new domain – allows us to adapt these pre-existing NER models, with great success, to the new domain of streaming audio. This process enables us to build rapidly on the already highly accurate models integrated throughout the stages of our current data pipeline, and to quickly generate new AI models that perform well on a previously out-of-domain dataset.
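For readers curious what this recipe looks like in code, here is a minimal PyTorch sketch under the same assumptions the paragraph describes: a pre-trained encoder is reused, a new output layer is added for the target label set, and only a small labeled set from the new domain is used for retraining. The encoder, label set, and data batches are placeholders, not our production models.

```python
# Minimal sketch of the transfer-learning recipe: reuse a pre-trained encoder,
# add a new output layer, and fine-tune on a small in-domain labeled set.
import torch
import torch.nn as nn


class TransferredNERTagger(nn.Module):
    def __init__(self, pretrained_encoder: nn.Module,
                 hidden_size: int, num_new_labels: int):
        super().__init__()
        self.encoder = pretrained_encoder                         # trained on the large source-domain corpus
        self.new_head = nn.Linear(hidden_size, num_new_labels)    # new layer for the new domain

        # Freeze the pre-trained weights so the small target-domain dataset
        # only has to fit the new layer (the encoder can be unfrozen later
        # for full fine-tuning once the head has converged).
        for param in self.encoder.parameters():
            param.requires_grad = False

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        hidden_states = self.encoder(token_ids)   # (batch, seq_len, hidden_size)
        return self.new_head(hidden_states)       # per-token label logits


def fine_tune(model: TransferredNERTagger, dispatch_batches, num_epochs: int = 3):
    """Retrain on a comparatively small labeled set from the new domain."""
    optimizer = torch.optim.AdamW(
        (p for p in model.parameters() if p.requires_grad), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(num_epochs):
        for token_ids, labels in dispatch_batches:
            logits = model(token_ids)                       # (batch, seq_len, num_labels)
            loss = loss_fn(logits.flatten(0, 1), labels.flatten())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```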
The foundation provided by all the great work we have done over the past 10 years is enabling Dataminr’s AI platform to detect valuable new types of signals within altogether new public data domains. In other words, our powerful pre-existing AI algorithms provided the critical foundation for these newest AI algorithms to be effective for streaming audio event detection. The capacity of our AI platform to rapidly adapt to new datasets opens up many exciting new possibilities for the future as we continue to aggressively explore new frontiers of public data.