GPTIPS

Symbolic Machine Learning Platform for MATLAB

GPTIPS is a free Explainable-AI machine learning platform and interactive modelling environment for MATLAB

It is driven by the Hypothesis-ML machine learning engine

It provides a new and unique approach to building accurate and intrinsically explainable (XAI) non-linear regression models


A fix for the Symbolic Math Toolbox issue currently affecting the use of GPTIPS (v2.0) in MATLAB R2018a onwards will be released soon.

A workaround is to install a version of MATLAB prior to R2018a alongside your current version; check whether your MATLAB license allows this (it probably does).

This issue is caused by major changes in the MATLAB Symbolic Math Toolbox from R2018a onwards.

8th March 2021

"Numerical examples show the superb accuracy, efficiency, and great potential of MGGP. Contrary to artificial neural networks and many other soft computing tools, MGGP provides constitutive prediction equations" 1

*MGGP is the GPTIPS Hypothesis-ML engine

"The developed model equation is found to be more compact compared to the MARS and other AI models and can easily be used by the professionals with the help of a spreadsheet without going into the complexity of model development " 7

Features


  • No more black boxes! Uses machine learning driven explainable AI (XAI) to automatically learn compact, explainable and accurate non-linear equations from your data. These models are not black boxes - they look, feel and act like regular equations, for example:


y = 4.94 x2 - 1.08 x1 + 2.24 log(x3 - x1) + 0.41 x3 x4 - 0.753 x3^2 + 7.54


  • No machine learning expertise required. Aimed at ordinary scientists, engineers, analysts, students and other professionals who need/love to build models.

  • Soothes the pain of deployment - your models can be used outside the model building environment with zero dependencies.

  • Out of the box. GPTIPS builds non-linear symbolic regression models when you don't know the 'true' underlying structure.*

  • Automatically identifies key predictive features even when your data is noisy and highly correlated with many superfluous features.

  • Optimises your models' accuracy-simplicity ratio (ASR) - GPTIPS automatically generates a model portfolio containing models of different levels of complexity and predictive quality. You choose the models that best suit your use case and can fine tune their structure if you want.*

  • GPTIPS is completely open source, written in standard MATLAB, and has a pluggable architecture - it is easy to write new functions to solve your own problems with the GPTIPS Hypothesis-ML engine. But if you just want to build non-linear regression models, that's fine - it's all built in.

*Requires MATLAB Symbolic Math Toolbox
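
To give a flavour of the workflow, a typical session looks something like the sketch below. The function names are from the GPTIPS 2 distribution and my_config is a placeholder for your own run configuration file - see the Help pages and the bundled demos for the exact syntax.

    gp = rungp(@my_config);  % run symbolic regression using your configuration file
    summary(gp)              % summary plots of the run
    runtree(gp, 'best')      % evaluate and plot the best model on the training data
    gppretty(gp, 'best')     % display the simplified model equation (Symbolic Math Toolbox)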

GPTIPS is a widely used tech platform across a diverse range of commercial and research application areas. It has repeatedly been shown to outperform established machine learning methods such as neural networks and support vector machines.

Have a look at the testimonials!

Why use GPTIPS?

Your model is an equation

It's pretty much as simple as that - you can just take the equation and copy and paste it wherever you like, or easily edit it to work in a spreadsheet. Or you can use GPTIPS functions to export it to a variety of different formats, including LaTeX for use in professional documents.
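
For example, something along these lines exports the 'best' model of a run (held in the struct gp). This is a sketch that assumes the Symbolic Math Toolbox is installed - check the Help pages for the exact function names and options.

    s = gpmodel2sym(gp, 'best');          % model as a Symbolic Math object
    latex(s)                              % LaTeX string of the model equation
    gpmodel2mfile(gp, 'best', 'mymodel')  % write the model to a standalone file mymodel.m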

No preconceived assumptions of model structure

GPTIPS automatically creates both the structure and the parameters of regression models using the supplied input features and simple mathematical "building blocks".

These building blocks (plus, minus, sqrt, log, exp, -x, sin, cos, square, cube etc.) are iteratively assembled by GPTIPS to form trees representing the models.

For instance, the simple model y = tanh(x3 - x1) can be represented by a single equation tree, as shown below.

Equation tree representation of the simple symbolic regression model y = tanh(x3 - x1), generated using the drawtrees function (tree weight = 1 and bias term = 0 not shown). This model contains just one tree; models usually contain multiple trees.

There are more than 30 built-in building blocks in GPTIPS and it is very easy to add your own. Trees can be visualised (as above) using the drawtrees function (see the Help page for details or type help drawtrees at the MATLAB command line).
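
In a run configuration file, the building-block set is simply a list of node names, along the lines of the sketch below (the field and node names follow the convention used in the bundled demo configurations; adding your own building block is just a matter of writing a MATLAB function and listing it here).

    gp.nodes.functions.name = {'times','minus','plus','sqrt','square', ...
                               'sin','cos','exp','log','neg','cube','tanh'};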

Interpretable portable symbolic regression models - what you see is what you get

GPTIPS models are interpretable mathematical equations - you can see what's going on inside them. This often gives you new insight into the systems or processes you got your data from.

Unlike many ML models - such as neural networks - no specialised modelling software environment is required to deploy the trained models. They can be easily and rapidly implemented in any modern computing language - or deployed in a spreadsheet - by a non-modelling expert.

It is easy to export your best GPTIPS models and there are a variety of functions provided to expedite this process.
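
For instance, the example model shown earlier can be deployed as a plain MATLAB function with no dependency on GPTIPS or any toolbox:

    function y = example_model(x1, x2, x3, x4)
    % Standalone implementation of the example equation above - nothing
    % beyond base MATLAB is needed at run time (inputs can be scalars or vectors).
    y = 4.94*x2 - 1.08*x1 + 2.24*log(x3 - x1) + 0.41*x3.*x4 - 0.753*x3.^2 + 7.54;
    end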

Optimise your models' accuracy-simplicity ratio

In regression, GPTIPS considers both the model predictive performance and model complexity in an attempt to create models that perform well but are as simple as possible.

The trade-off surface of models (the 'Pareto front') consists of the models that are not beaten by any other model on both predictive performance and complexity - that is, no other model in the population is both more accurate and simpler. These models are usually of the most interest. An example of a typical Pareto front is shown below as green circles. The purple/blue circles represent models not on the Pareto front - these are usually destined for the trash compactor.

Hint

You can in fact literally trash these non-Pareto models using the gpmodelfilter function - leaving just the Pareto front models in your population. You can filter on other model properties too. See the help pages for more details.

A key idea here is that you are not just building a single 'best' equation - you are building a library of them - some more accurate but more complex and some a bit less accurate but simpler and more interpretable. You can choose which ones you want - depending on the demands of your use case.

Pareto optimal models - illustrating the complexity/accuracy tradeoff. The green dots represent the set of models you should explore further.

You can plot (and select) the Pareto optimal models like the graph above using the popbrowser function.
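
For example (assuming the run data structure returned by rungp is held in gp):

    popbrowser(gp)   % interactive plot of the model population and its Pareto front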

It is also easy to select the model you want from the Pareto optimal set using an HTML report generated by the paretoreport function as shown below.

Pareto optimal models can be sorted by complexity or predictive performance (here measured by R2) by clicking on the appropriate column header.
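
Generating the report is a one-liner, e.g. (again assuming the run data is held in gp; see the Help pages for options):

    paretoreport(gp)   % sortable HTML report of the Pareto optimal models in run gp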

Note

R2 is the coefficient of determination. It is a number computed for a model, usually between 0 and 1, that describes the proportion of the variation observed in the target/output variable that is predicted by the model. In GPTIPS it is computed separately for the training, validation and test data sets.
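
For reference, the standard definition in MATLAB-style notation (with y the observed outputs and ypred the model predictions) is:

    R2 = 1 - sum((y - ypred).^2) / sum((y - mean(y)).^2);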

A snippet of a paretoreport example showing the accuracy-simplicity tradeoff spectrum in the model population expressed in terms of predictive performance R2 and model complexity. Note that - in this example - the more accurate models (higher R2) tend to have a higher complexity.

What's more, you can use one of the GPTIPS visual analytic tools (the genebrowser) to fine-tune and tailor the structure of your regression models to adjust their accuracy-simplicity ratio - see Visual Analytic Tools for further details.

Automatic feature selection

GPTIPS automatically selects the input variables ('features') that best predict the output variable (target) of interest.

GPTIPS has been shown to be effective at feature selection even when there are > 1000 irrelevant input variables.

What's more, GPTIPS can easily be used as a feature selection method in its own right - to select features or non-linear combinations of features as inputs for any other modelling method.
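
One way to see which inputs a given model actually uses is the gpmodelvars function, e.g. (check the Help pages for the exact syntax):

    gpmodelvars(gp, 'best')   % frequency of the input variables appearing in the 'best' model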

Detailed HTML symbolic regression model reports

In GPTIPS you can quickly generate a detailed, interactive HTML model report using the gpmodelreport function.

The report includes details of the run configuration, the tree structures of the model, the performance of the model on the data and more.
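
For example, to generate a report for the best model of a run (see the Help pages for the available options):

    gpmodelreport(gp, 'best')   % standalone HTML report for the 'best' model of run gp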

Excerpts of a typical report are shown below.

Updated: 28th March 2021. © Dominic Searson 2009 - 2021