Discover how the various types of AI models work and their potential to transform the homeland security landscape
This is the second article of a multi-part series about AI and how it can empower Home Team operations.
Artificial Intelligence (AI) models are programs that analyse and find patterns within datasets. By learning these patterns, AI models can make predictions.
While there are many AI models, this article focuses on five common ones that have wide uses across industries.
Machine learning (ML) models
ML models learn from datasets and improve their predictions as they are fed more data. They can be taught via supervised or unsupervised learning. ML is an umbrella term that encompasses almost all modern AI models.
Under supervised learning, data scientists feed the model labelled datasets. These datasets are specially curated for the model, teaching it to interpret data and identify patterns the way the data scientist intends.
Once the model can consistently identify patterns in the training data, it is considered “trained” and can identify similar patterns in unknown datasets.
On the flip side, unsupervised learning requires little human intervention. Under unsupervised learning, the model is trained on raw (i.e. uncurated) datasets and learns to find patterns and groupings in the data on its own.
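To make the difference concrete, here is a minimal sketch using the scikit-learn library and its built-in iris flower dataset. The choice of models (logistic regression for the supervised case, k-means clustering for the unsupervised one) is an illustrative assumption, not the only option.

```python
# A minimal sketch of supervised vs. unsupervised learning using
# scikit-learn and its built-in iris flower dataset.
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)  # X: flower measurements, y: species labels

# Supervised: the model learns from inputs paired with the "right answers" (y)
classifier = LogisticRegression(max_iter=1000).fit(X, y)
print(classifier.predict(X[:3]))   # predicts species labels for new samples

# Unsupervised: the model sees only X and must group similar flowers itself
clusterer = KMeans(n_clusters=3, n_init=10).fit(X)
print(clusterer.labels_[:3])       # cluster IDs discovered without any labels
```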
Deep learning (DL) models
DL models are a subset of ML models and contain a “brain”—also known as a neural network. This “brain” helps the model interpret and analyse data without human supervision.
Neural networks consist of multiple layers: an input layer, many hidden layers, and an output layer. These many hidden layers are what put the “deep” in deep learning.
The input layer receives raw data and passes it on to the hidden layers of the neural network for processing. These hidden layers form the core of the neural network, where the data is processed and analysed by neurons. These neurons “weigh” the data to determine which aspects of it are most important and use activation functions to decide whether to pass on this processed information to the next layer.
Each hidden layer refines the information until it reaches the output layer. Here, the network makes its final prediction and compares it with the “correct” answer, or expected output. It then adjusts the weights of the hidden layer neurons to improve future predictions.
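The toy network below, written in plain NumPy, sketches this flow end to end. The layer sizes, learning rate, and target value are arbitrary choices for illustration; real networks are vastly larger and train on many examples rather than one.

```python
# A toy neural network in plain NumPy, sketched to show the flow described
# above; real networks have far more neurons, layers, and training data.
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(4)          # input layer: 4 raw data values
W1 = rng.random((3, 4))    # weights into 3 hidden neurons
W2 = rng.random(3)         # weights into 1 output neuron
target = 1.0               # the "correct" answer for this input

for step in range(200):
    h = np.maximum(0.0, W1 @ x)  # hidden layer: weigh inputs, ReLU activation
    y = W2 @ h                   # output layer: the network's prediction
    error = y - target           # compare prediction with expected output
    # Adjust the weights slightly against the error (a simplified gradient step)
    W2 -= 0.05 * error * h
    W1 -= 0.05 * error * np.outer(W2 * (h > 0), x)

print(round(float(y), 3))  # the prediction approaches 1.0 as weights improve
```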
Deep-learning AI models form the backbone of online recommender systems.
Use cases: While there are many uses for deep learning models, you might have seen them in recommender systems used on sites like Netflix and YouTube. With enough data, these deep learning models can accurately predict the kind of videos you want to watch based on your previous activity on the site.
These recommendation systems are also capable of learning from mistakes. When you select the “not interested” option, the model understands which videos to avoid suggesting to you and improves its future recommendations.
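Here is a toy sketch of that feedback loop. The video titles and taste vectors are invented for the example; real recommender systems learn such vectors from millions of interactions.

```python
# A toy sketch of a recommender learning from "not interested" feedback;
# the video titles and taste vectors below are invented for illustration.
import numpy as np

videos = {
    "cooking tutorial": np.array([0.9, 0.1]),
    "gaming stream":    np.array([0.2, 0.8]),
    "baking show":      np.array([0.8, 0.2]),
}
user = np.array([0.4, 0.6])  # the user's learned taste profile

def recommend():
    # rank videos by similarity (dot product) between video and user vectors
    return max(videos, key=lambda title: videos[title] @ user)

print(recommend())                     # -> "gaming stream"
user -= 0.5 * videos["gaming stream"]  # user clicks "not interested"
print(recommend())                     # -> "cooking tutorial"
```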
Large language models (LLMs)
LLMs are large deep learning models pre-trained on vast amounts of data, helping them understand and generate natural (i.e. human-like) language.
LLMs are built on a transformer architecture, such as the generative pre-trained transformer (GPT).
Transformers are neural networks that contain an encoder and a decoder, both of which help the LLM make sense of textual inputs.
Once a textual input is received, the transformer pre-processes its words into embeddings, which are mathematical representations of words. When these embeddings are encoded in vector space, words with similar meanings are mapped closer together, helping the LLM understand the context of words and phrases in an input. The LLM then uses the decoder to generate human-like responses.
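A toy example helps make “vector space” concrete. The three-number vectors below are made up for illustration; real LLM embeddings have hundreds or thousands of dimensions learned during training.

```python
# Toy word embeddings to illustrate vector space; these 3-number vectors
# are invented, whereas real LLM embeddings have thousands of dimensions.
import numpy as np

embeddings = {
    "king":   np.array([0.9, 0.8, 0.1]),
    "queen":  np.array([0.9, 0.7, 0.2]),
    "banana": np.array([0.1, 0.2, 0.9]),
}

def similarity(a, b):
    # cosine similarity: closer to 1.0 means closer together in vector space
    va, vb = embeddings[a], embeddings[b]
    return va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb))

print(f"king vs queen:  {similarity('king', 'queen'):.2f}")   # high: related words
print(f"king vs banana: {similarity('king', 'banana'):.2f}")  # low: unrelated words
```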
ChatGPT brought LLMs into the spotlight.
Use cases: The best-known example of an LLM is ChatGPT. OpenAI’s chatbot took the world by storm in 2022, understanding prompts and replying to them in a human-like manner using the company’s GPT-3.5 LLM.
Computer vision models
Computer vision models use deep learning to interpret and understand visual information, such as images and videos. This means the model can recognise and distinguish various objects, making these models excellent at picking out objects of interest from vast swathes of visual data.
Convolutional neural networks (CNNs) help the model understand images by breaking them down into pixels and scanning groups of pixels for visual features, such as edges and shapes. These features help the CNN interpret the image data.
Like other neural networks, CNNs compare their predictions with the correct answer and iteratively fine-tune their algorithm to improve future predictions.
To understand videos, computer vision models use recurrent neural networks (RNNs), which help the model understand how the frames of a video relate to one another.
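As a rough illustration, here is a minimal CNN sketched with the PyTorch library. The layer sizes, image dimensions, and ten-class setup are assumptions made for the example; the network is untrained, so its prediction is random.

```python
# A minimal convolutional neural network in PyTorch, sketched for
# illustration: it maps 28x28 greyscale images to 10 class scores.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),  # filters scan the image for features
    nn.ReLU(),
    nn.MaxPool2d(2),                            # shrink the feature map
    nn.Flatten(),
    nn.Linear(8 * 14 * 14, 10),                 # map features to 10 class scores
)

image = torch.randn(1, 1, 28, 28)   # a stand-in for one greyscale image
scores = model(image)
print(scores.argmax().item())       # the (untrained) model's predicted class
```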
Self-driving vehicles see and understand the road ahead through computer vision models.
Use cases: Autonomous or self-driving vehicles use computer vision to navigate and respond to their surroundings in real time. For instance, the car will swerve or brake if it “sees” a child running across the road.
Large multimodal models (LMMs)
LMMs can interpret and generate different types of data, such as text, images, audio, and video. Think of them as LLMs that can go beyond text. They are trained using multiple unimodal neural networks, each of which interprets a specific form of data, such as text, images, or audio. This allows the model to train on multiple forms of data rather than a single data type.
For instance, while an LLM might have a single neural network that trains on textual data, a text-to-image multimodal model has two neural networks: one that trains on images and one that trains on text. This allows the model to understand both visual and textual information, letting it interpret what a textual prompt means and predict the kind of image it should generate.
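The sketch below illustrates this two-encoder idea. The “encoders” are untrained random matrices standing in for real neural networks, so the numbers are meaningless; the point is that text and images land in the same vector space, where the model can compare them.

```python
# A sketch of the two-encoder idea behind multimodal models: one network
# embeds text, another embeds images, and both map into a shared vector
# space. These "encoders" are random stand-ins, not trained networks.
import numpy as np

rng = np.random.default_rng(0)
text_encoder = rng.random((4, 8))    # maps 8 text features into 4-dim shared space
image_encoder = rng.random((4, 16))  # maps 16 image features into the same space

text_features = rng.random(8)        # stand-in for a tokenised prompt
image_features = rng.random(16)      # stand-in for pixel data

text_vec = text_encoder @ text_features
image_vec = image_encoder @ image_features

# Once both live in the same space, a similarity score tells the model
# how well a given image matches a given prompt.
score = text_vec @ image_vec / (np.linalg.norm(text_vec) * np.linalg.norm(image_vec))
print(round(float(score), 3))
```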
Use cases: While there are various types of multimodal models, you might be most familiar with Generative AI art models like DALL-E. These models are also known as text-to-image models and can generate images based on your prompts.
A different type of multimodal model is image captioning. Unlike DALL-E, which creates images from text, image captioning models do the opposite. They take an image as input and generate a text description that matches it.
Another application of multimodal models is video captioning. Video language models can understand the relationship between visual information, sounds, and events occurring in the video. They then generate text descriptions based on the context of the video.
Deepfakes are AI-generated media that can alter an individual’s image or voice in digital content.
How can AI models augment homeland security?
Deep learning models
These models excel at spotting what the human eye misses. For example, when DL models are used in deepfake detectors, they can identify whether a piece of digital media is a deepfake. These models are trained on vast datasets of genuine and manipulated images and videos, and learn to identify subtle anomalies that are imperceptible to humans, such as lighting inconsistencies.
Deepfake detectors enable homeland security officers to quickly identify and take down deepfake news and scams, protecting the public from misinformation and fraud.
Large language models
Within the realm of homeland security, LLMs boost productivity. From automatically filling out forms to summarising meeting minutes, LLMs complete mundane administrative tasks and free up homeland security officers’ time to focus on more pressing matters.
Computer vision models
Computer vision models can enhance border security by providing immigration officers with an extra pair of “eyes”. For example, image and video analytics capabilities can be implemented in X-ray machines at the border to identify suspicious items hidden deep within cargo or luggage.
Computer vision models can identify objects of interest from CCTV footage.
Video analytics capabilities can also augment crime-solving efforts. These AI models can trawl through hours of video evidence, such as CCTV footage, and immediately identify objects of interest for investigative officers—ensuring that no evidence gets overlooked.
Large multimodal models
While computer vision models can identify objects of interest for investigative officers, LMM video captioning models go a step further and describe the object of interest as well.
For example, a computer vision model could mark a suspicious bag in a crowded train station. Meanwhile, a video captioning model would not only draw the investigative officer’s attention to the bag but also give a detailed account of the potential evidence, such as the colour of the bag.
Video captioning models can also be trained to generate descriptions in specific formats, making them excellent tools in helping investigative officers quickly fill out crime scene reports.
Stay tuned for the next article in the series in which we’ll show examples of how AI is empowering our innovation and the way we work at HTX!
Reference List:
Machine learning models
1) https://www.ibm.com/topics/ai-model
Deep learning models
1) https://www.freecodecamp.org/news/deep-learning-neural-networks-explained-in-plain-english/
2) https://aws.amazon.com/what-is/neural-network/
LLM
1) https://www.ibm.com/topics/large-language-models
2) https://aws.amazon.com/what-is/large-language-model/
Computer vision
1) https://aws.amazon.com/what-is/computer-vision/
2) https://www.ibm.com/topics/computer-vision
LMM
1) https://cloud.google.com/use-cases/multimodal-ai
2) https://aws.amazon.com/blogs/machine-learning/fine-tune-large-multimodal-models-using-amazon-sagemaker/