As Artificial Intelligence (AI) advances towards new frontiers in reasoning and autonomy, the demand for greater accountability, transparency and reliability grows in tandem. Addressing this imperative, two HTX teams wrote research papers that will be presented at the 39th Annual Conference on Neural Information Processing Systems (NeurIPS 2025).
Widely regarded as the top international AI conference, NeurIPS gathers the brightest minds in machine learning, neuroscience and AI to showcase cutting-edge research. This year’s event will take place in San Diego, USA, from 2-7 December.
The two papers, by the Q Team Centre of Expertise (CoE) and the Sense-making & Surveillance (S&S) CoE, are "Restoring Pruned Large Language Models via Lost Component Compensation" and "SemScore: Practical Explainable AI through Quantitative Methods to Measure Semantic Spuriosity" respectively.
(From left) Ng Gee Wah, Deryl Chua and Lee Onn Mak from the Q Team. (Photo: HTX/Deryl Chua)
Notably, the Q Team CoE’s paper – authored by Ng Gee Wah, Feng Zijian, Zhou Hanzhang, Zhu Zixiao, Li Tianjiao, Deryl Chua, Lee Onn Mak and Kezhi Mao, in collaboration with Nanyang Technological University – was spotlighted among the top 10% of submissions by conference reviewers.
Their work tackles a longstanding challenge: shrinking large AI models without sacrificing accuracy, so they can respond more quickly, use less energy and be deployed efficiently in frontline applications where speed and reliability are critical – including virtual assistants and security screening.
When large AI models are “pruned” – that is, trimmed by removing less important parameters and connections – they become smaller and faster, but can also lose some of their ability to focus on key information. This focus comes from the model’s attention mechanism – the process that helps it understand relationships between words in a sentence, much like how a person pays attention to certain details while reading.
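To make the two ideas concrete, here is a minimal, illustrative sketch – not the team's code – assuming simple magnitude-based pruning of a projection matrix and standard scaled dot-product attention. It shows how removing small-magnitude weights shifts the attention output, the "lost focus" described above.

```python
# Illustrative sketch only (not the team's implementation): magnitude-based
# pruning of a weight matrix, and the scaled dot-product attention it can degrade.
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights until `sparsity` fraction is removed."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Standard scaled dot-product attention: softmax(QK^T / sqrt(d)) V."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
w_q = rng.normal(size=(64, 64))          # query projection weights
w_q_pruned = magnitude_prune(w_q, 0.5)   # half the parameters zeroed out
x = rng.normal(size=(8, 64))             # 8 token embeddings

# The pruned projection shifts the attention pattern relative to the full model.
out_full = attention(x @ w_q, x, x)
out_pruned = attention(x @ w_q_pruned, x, x)
print("mean output drift after pruning:", np.abs(out_full - out_pruned).mean())
```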
Q Team Director Ng Gee Wah explained that their solution was sparked by a simple observation: “We realised that you can recover much of the diminished performance from pruning by simply adding back the lost information, sparking the idea to build an intelligent system to do so.”
Instead of spending weeks retraining pruned models to regain their accuracy, the team’s AI restoration tool, RestoreLCC, pinpoints which parts of the attention process are degraded during pruning and intelligently reinstates the missing information – much like restoring key memories without rebuilding the entire brain.
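The article does not describe RestoreLCC's actual algorithm, so the sketch below only illustrates the general idea of compensation under an assumed setup: capture what pruning removed from a layer and add back a compact, low-rank correction instead of retraining. All names and the rank choice here are hypothetical.

```python
# Generic compensation illustration, not RestoreLCC itself: approximate the weights
# removed by pruning with a small low-rank correction and add it back at inference.
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=(64, 64))                              # original layer weights
w_pruned = np.where(np.abs(w) < np.quantile(np.abs(w), 0.5), 0.0, w)

delta = w - w_pruned                                       # what pruning removed
u, s, vt = np.linalg.svd(delta, full_matrices=False)
rank = 8                                                   # hypothetical budget
delta_lowrank = (u[:, :rank] * s[:rank]) @ vt[:rank]       # compact correction term
w_restored = w_pruned + delta_lowrank

x = rng.normal(size=(8, 64))                               # new inputs at inference
print("error, pruned only:      ", np.abs(x @ w - x @ w_pruned).mean())
print("error, with compensation:", np.abs(x @ w - x @ w_restored).mean())
```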
The method achieved up to 3.56% higher accuracy than other recovery approaches on heavily pruned models.
Having demonstrated RestoreLCC’s scalability across multiple Large Language Models (LLMs), Q Team CoE plans to further refine the tool and harness it to deploy more efficient AI models for its projects. “We hope that having our work validated at NeurIPS will lead to wider adoption of our method and spark new research collaborations,” said Gee Wah.
Championing AI explainability
(From left) Jovin Leong, Chen Wei May and Tan Tiong Kai have developed a toolkit that assesses AI reasoning. (Photo: HTX)
In the same vein of ensuring reliability in AI systems, S&S CoE's paper – authored by Jovin Leong, Chen Wei May and Tan Tiong Kai – addresses another pressing challenge: explainability, which refers to the extent to which the decision-making of AI systems can be understood by humans. Explainability and trust are interlinked.
With AI increasingly used in high-stakes areas such as healthcare, public safety and finance, blind trust in models with opaque decision-making can have serious consequences. An example is the COMPAS algorithm, which displayed racial bias in predicting criminal recidivism in the US.
To enable more trustworthy systems, the S&S team developed SemScore, a toolkit that evaluates how AI models reason when interpreting visual cues. The team explained that AI models can sometimes rely on flawed logic – such as identifying a dog in a photo of a park not because it recognises the animal, but because it has learned that dogs often appear in parks.
SemScore, a self-driven initiative co-developed by the engineers, quantifies how closely an AI model’s reasoning aligns with human understanding. This departs from traditional saliency maps – visualisations of where AI “looks” when making decisions – which are slow and subjective to inspect.
“When you inspect saliency maps, you need an expert to analyse each image one at a time – it’s very tedious and time-consuming. We wanted a metric that’s more objective,” said Chen Wei May, Engineer (Video Analytics R&D).
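The article does not give SemScore's formula, so the snippet below is only a hedged sketch of one generic way to turn that manual inspection into a number: measuring how much of a model's saliency falls inside a human-annotated region of interest. The function name, image size and mask are all illustrative assumptions, not SemScore's actual metric.

```python
# Generic illustration (not SemScore's metric): score how much of a model's
# saliency mass lands on pixels a human marked as semantically relevant.
import numpy as np

def semantic_alignment(saliency: np.ndarray, human_mask: np.ndarray) -> float:
    """Fraction of total saliency falling inside the human-annotated region."""
    saliency = np.clip(saliency, 0.0, None)
    total = saliency.sum()
    return float((saliency * human_mask).sum() / total) if total > 0 else 0.0

rng = np.random.default_rng(2)
saliency_map = rng.random((224, 224))      # e.g. output of a saliency method
human_mask = np.zeros((224, 224))
human_mask[60:160, 60:160] = 1.0           # human-labelled "dog" region
print("alignment score:", semantic_alignment(saliency_map, human_mask))
```

A score like this can be computed automatically over an entire dataset, which is the kind of objectivity and scale the team contrasts with inspecting saliency maps one image at a time.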
The team also noted that SemScore can enhance AI trustworthiness, which is crucial for Home Team applications such as vehicle screening and vape detection, where model reliability directly affects public safety and confidence.
The paper was also shortlisted for the RegML Workshop (3rd Workshop on Regulatable Machine Learning) at NeurIPS 2025, which will be attended by academics from prestigious institutions.
Reflecting on his learning journey, Tan Tiong Kai, Engineer (Video Analytics R&D), shared: “Our interaction with industry experts challenged us to deepen our understanding and elevate our work to a standard of rigour and expertise that would merit recognition at a top-tier conference.”
Moving forward, the team plans to release SemScore as an open-source toolkit, hoping it could become a standard benchmark for evaluating AI models across Home Team applications and beyond.