Intelligent systems

Attentional Factorisation Machine Model for Better Predictive Analytics

PI: Chua Tat-seng


The market for artificial intelligence (AI) continues to grow rapidly as more businesses turn to cognitive systems and AI software to create personalised customer experiences, supercharge sales and marketing efforts, and automate their manufacturing and supply chain processes. Global spending on these intelligent systems is forecast to reach US$77.6 billion by 2022, an impressive compound annual growth rate (CAGR) of 37.3% between 2017 and 2022. This demand is driving the development of ever-smarter systems.

Factorisation machines (FMs) are a model class that combines the advantages of support vector machines (a popular predictor in machine learning and data mining) with those of factorisation models. Since its introduction in 2010, the FM has been recognised as the most effective linear embedding method for sparse-data prediction. It is, however, hindered by modelling all feature interactions with the same weight, even though not all feature interactions are equally useful and predictive. In particular, modelling interactions that involve useless features can degrade overall performance.
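To make the FM's uniform treatment of interactions concrete, the sketch below computes an FM prediction in NumPy. The function name, shapes, and sample values are illustrative assumptions, not part of the invention; the pairwise term uses the standard O(nk) reformulation of the sum over feature pairs.

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Factorisation machine prediction for one feature vector x.

    x : (n,) feature vector (mostly zeros for sparse data)
    w0: global bias; w: (n,) linear weights
    V : (n, k) latent factor matrix; the dot product <V[i], V[j]>
        is the weight of the interaction between features i and j.
        Note: every pair gets this fixed weight -- the limitation
        the attentional model addresses.
    """
    linear = w0 + w @ x
    # Efficient form of sum_{i<j} <v_i, v_j> x_i x_j:
    # 0.5 * sum_f [ (sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2 ]
    vx = V.T @ x                      # (k,)
    v2x2 = (V ** 2).T @ (x ** 2)      # (k,)
    pairwise = 0.5 * np.sum(vx ** 2 - v2x2)
    return linear + pairwise
```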


This invention is a novel model named the Attentional Factorisation Machine (AFM). Designed as an improvement over the existing FM model, it discriminates among feature interactions and automatically learns their importance via a neural attention network, without the need for any human domain knowledge.

AFM introduces a new pair-wise interaction layer and an attention-aware pooling layer into neural network modelling. Its prediction method receives a set of predictor values comprising a plurality of features. Each feature is projected onto a dense vector representation to obtain a set of embedding vectors. From these, it computes a set of interacted vectors, each being the element-wise product of two embedding vectors. A weighted sum of the interacted vectors is then taken using attention scores, each of which corresponds to the interaction between a pair of features. Finally, this weighted sum is projected to obtain the prediction score.
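The steps above can be sketched as a forward pass in NumPy. This is a minimal illustration under assumptions of my own: the attention network is taken to be a one-hidden-layer MLP with ReLU and a softmax over the pairwise interactions, and all names and shapes (`afm_predict`, `W_attn`, `h_attn`, `p`) are hypothetical, not the invention's actual parameters.

```python
import numpy as np
from itertools import combinations

def afm_predict(x_idx, embeddings, W_attn, h_attn, p):
    """Attentional FM forward pass (sketch; assumes >= 2 active features).

    x_idx      : indices of the active (non-zero) features
    embeddings : (n, k) embedding table (dense vector per feature)
    W_attn     : (t, k) and h_attn: (t,) -- attention network parameters
    p          : (k,) projection vector producing the final score
    """
    E = embeddings[x_idx]                        # (m, k) active embeddings
    m = len(x_idx)
    # Pair-wise interaction layer: element-wise product of every pair
    pairs = np.array([E[i] * E[j]
                      for i, j in combinations(range(m), 2)])   # (P, k)
    # Attention network: score each interacted vector, softmax-normalise
    hidden = np.maximum(W_attn @ pairs.T, 0.0)   # ReLU, (t, P)
    a = h_attn @ hidden                          # raw scores, (P,)
    a = np.exp(a - a.max())
    a /= a.sum()                                 # attention weights sum to 1
    # Attention-aware pooling: weighted sum of the interacted vectors
    pooled = a @ pairs                           # (k,)
    return p @ pooled                            # scalar prediction score
```

Because the softmax weights differ per interaction, informative feature pairs can dominate the pooled vector, whereas a plain FM would sum all pairs with equal weight.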



Technology Readiness Level (TRL)


Minimum Viable Product built in the laboratory

Applications & Advantages

  • 01

    Better performance, simpler structure and fewer model parameters compared to deep learning methods like Wide&Deep and Deep-Cross

  • 02

    Particularly effective for prediction tasks involving categorical predictor variables

  • 03

    Able to power ranking engines and predict click-through rates for online advertising systems

  • 04

    Useful in social network data analysis and stock market predictions

  • 05

    Automatic learning of the feature interactions that are more important for the prediction

  • 06

    Insights into important feature interactions enable deeper analyses without the need for human domain knowledge