Symbolic regression features

These are implemented in the sym-xai module of GPTIPS.

Symbolic non-linear regression demos (gpdemo1, gpdemo2, gpdemo3, gpdemo4) plus config files for some synthetic non-linear regression problems to experiment with (cubic_config, uball_config, ripple_config, salustowicz1d_config).

Automatic mathematical simplification of symbolic regression models. For instance, gppretty displays simplified models at the command line. Models can be selected by numerical ID or as the best model on the training, validation or test data set.

Tight integration with MATLAB's Symbolic Math engine - allows extensive analytical manipulation of any symbolic regression model in MATLAB (gpmodel2sym).

Configurable regression model filter object (gpmodelfilter) enables the progressive refinement of model libraries according to model performance, model complexity and other user criteria.
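
To illustrate the idea of progressively refining a model library, here is a minimal sketch in Python. It is not GPTIPS code and does not reproduce the gpmodelfilter interface; the Model class, the filter_models function and the threshold names are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Model:
    model_id: int
    r2: float          # goodness of fit on the chosen data set
    complexity: int    # e.g. node count or expressional complexity

def filter_models(models, min_r2=0.0, max_complexity=float("inf")):
    """Keep only models that meet both the accuracy and complexity thresholds."""
    return [m for m in models if m.r2 >= min_r2 and m.complexity <= max_complexity]

library = [Model(1, 0.95, 40), Model(2, 0.90, 12), Model(3, 0.60, 5)]

# Progressive refinement: filter on accuracy first, then on complexity.
accurate = filter_models(library, min_r2=0.85)
compact = filter_models(accurate, max_complexity=20)
print([m.model_id for m in compact])  # [2]
```

Each pass works on the output of the previous one, so criteria can be tightened step by step.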

Extract all the unique regression model trees (genes) from a population or a filtered population (uniquegenes).

genebrowser: a graphical utility that allows inspection of the complexity and structure of all trees in the population as well as in a specified model. It identifies trees suitable for removal from a model (and the performance impact removal would have), allowing you to tune the accuracy-simplicity ratio (ASR) of your models.

Construct new models using the unique trees (genes) in a population (genes2gpmodel).

Export any model directly to a standard MATLAB Symbolic Math object (gpmodel2sym).

Export any model directly to an HTML formatted equation (HTMLequation).

Convert any model to an anonymous function (gpmodel2func).

Export individual trees (genes) of any regression model to a standalone M file for use outside GPTIPS (gpgenes2mfile).

Export any regression model to LaTeX format (gppretty or latex).

Export any regression model to a standalone M file for use outside GPTIPS (gpmodel2mfile).

Export any regression model to an optimised C code snippet for use outside MATLAB (use gpmodel2sym then ccode).

Two measures of model complexity: simple tree 'node count' and the more fine-grained 'expressional complexity', which promotes flatter, less complex model tree structures.
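
The difference between the two measures can be sketched in a few lines of Python. This is an illustration rather than GPTIPS code: it assumes the commonly used definition of expressional complexity as the sum of the node counts of every subtree, and the nested-tuple tree representation is hypothetical.

```python
# Illustrative only: trees are nested tuples (function, child, ...) and
# leaves are plain strings. This is not GPTIPS's internal representation.

def node_count(tree):
    """Total number of nodes in the tree."""
    if isinstance(tree, str):
        return 1
    return 1 + sum(node_count(child) for child in tree[1:])

def expressional_complexity(tree):
    """Sum of the node counts of every subtree (one subtree per node)."""
    if isinstance(tree, str):
        return 1
    return node_count(tree) + sum(expressional_complexity(c) for c in tree[1:])

flat = ("plus", ("times", "x1", "x2"), "x3")
deep = ("sin", ("sin", ("sin", ("sin", "x1"))))

print(node_count(flat), expressional_complexity(flat))  # 5 11
print(node_count(deep), expressional_complexity(deep))  # 5 15
```

Both trees have five nodes, but the deeply nested one scores higher on expressional complexity, which is why that measure favours flatter structures.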

HTML model report generator enables a comprehensive performance and statistical analysis of any regression model in the population to be exported to HTML for later reference. The HTML report contains interactive graphical displays of model performance and model structure (gpmodelreport). The report can be sent to anyone - they don't need MATLAB or GPTIPS.

Create a data structure containing highly detailed information - including prediction error metrics - on any regression model as well as the individual predictions on training, test and validation data (gpmodel2struct).

Regression Error Characteristic (REC) curves allow a simple graphical comparison of the predictive performance of selected regression models (comparemodelsREC).
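
For readers unfamiliar with REC curves, the following Python sketch shows how one is computed: for each error tolerance, it reports the fraction of predictions whose absolute error falls within that tolerance. It is an illustration only, not the comparemodelsREC implementation; the function name and sample data are made up.

```python
def rec_curve(y_true, y_pred, tolerances):
    """Fraction of samples with absolute error <= each tolerance."""
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    n = len(errors)
    return [sum(e <= tol for e in errors) / n for tol in tolerances]

y_true = [1.0, 2.0, 3.0, 4.0]
model_a = [1.1, 2.5, 2.0, 4.0]   # small errors on most points
model_b = [1.0, 2.0, 3.0, 6.0]   # exact on three points, one large miss
tolerances = [0.0, 0.25, 0.75, 1.5]

print(rec_curve(y_true, model_a, tolerances))  # [0.25, 0.5, 0.75, 1.0]
print(rec_curve(y_true, model_b, tolerances))  # [0.75, 0.75, 0.75, 0.75]
```

Plotting accuracy against tolerance for several models gives curves that can be compared directly: a curve that rises faster and higher indicates better predictive performance.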

Interactive population browser shows regression models in terms of model performance (1 - R²) and either simple node count or expressional complexity (popbrowser). Pareto optimal models are highlighted. The performance can be shown on the training, validation or test data set, and hovering over a model shows its simplified regression equation.
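
What "Pareto optimal" means here can be sketched in Python as follows. A model is Pareto optimal if no other model is at least as good on both performance (1 - R²) and complexity and strictly better on one of them. The code is illustrative and is not taken from popbrowser; the tuple layout is an assumption.

```python
def dominates(a, b):
    """True if model a is no worse than b on both objectives and better on one.
    Each model is a tuple (one_minus_r2, complexity); lower is better for both."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])

def pareto_front(models):
    """Models that are not dominated by any other model."""
    return [m for m in models if not any(dominates(o, m) for o in models)]

models = [(0.05, 40), (0.10, 12), (0.12, 30), (0.40, 5), (0.40, 9)]
print(pareto_front(models))  # [(0.05, 40), (0.1, 12), (0.4, 5)]
```

The two models left out are each beaten on both axes, or matched on one and beaten on the other, by another model in the list.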

Additional mathematical building blocks for improved modelling performance, e.g. square, cube, add3, mult3, negexp, step, thresh, gauss, greater than (gth) and less than (lth).
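
As a rough guide to what these building blocks compute, here are scalar Python versions. The definitions are assumptions inferred from the function names (for example, whether step and thresh use >= or >), so check the toolbox source for the exact behaviour.

```python
import math

# Assumed definitions, inferred from the names; not copied from GPTIPS.
def square(x):      return x * x
def cube(x):        return x ** 3
def add3(a, b, c):  return a + b + c
def mult3(a, b, c): return a * b * c
def negexp(x):      return math.exp(-x)
def gauss(x):       return math.exp(-x * x)
def step(x):        return 1.0 if x >= 0 else 0.0   # unit step at zero
def thresh(x, t):   return 1.0 if x >= t else 0.0   # step at threshold t
def gth(a, b):      return 1.0 if a > b else 0.0    # greater than
def lth(a, b):      return 1.0 if a < b else 0.0    # less than

print(square(3), add3(1, 2, 3), gauss(0), gth(2, 1))  # 9 6 1.0 1.0
```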

Graphical display of input frequency across all regression models satisfying user-specified R² and model complexity constraints (gpmodelvars and gppopvars).
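
The idea behind an input frequency display can be sketched in Python. The example counts, for each input variable, how many of the models that pass the constraints use it. Whether GPTIPS counts per model or per occurrence is not stated here, so the per-model count is an assumption, as are the function name and the (expression, R², complexity) tuple layout.

```python
import re
from collections import Counter

def input_frequency(models, min_r2, max_complexity):
    """Count the models (among those passing the constraints) that use each input."""
    counts = Counter()
    for expression, r2, complexity in models:
        if r2 >= min_r2 and complexity <= max_complexity:
            # set() so an input used twice in one model is counted once
            counts.update(set(re.findall(r"\bx\d+\b", expression)))
    return counts

models = [
    ("x1*x2 + x1", 0.90, 5),
    ("sin(x2) + x3", 0.80, 4),
    ("x4", 0.20, 1),          # fails the R² constraint below
]
counts = input_frequency(models, min_r2=0.5, max_complexity=10)
print(sorted(counts.items()))  # [('x1', 1), ('x2', 2), ('x3', 1)]
```

Inputs that appear in many well-performing, low-complexity models are good candidates for being genuinely relevant.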

Detailed graphical display of model performance (runtree and gpmodelreport).

Runtime model validation on a 'holdout' data set.

* Requires the Symbolic Math Toolbox.

General genetic programming (GP) features

These are implemented in the hypothesis-ML module of GPTIPS.

Pareto tournaments - promotes discovery of low complexity/high performance solutions.
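
A Pareto tournament can be sketched in Python as follows. This is one common formulation, assumed here rather than taken from the GPTIPS source: a random group of individuals is drawn, the ones not dominated on (fitness, complexity) are kept, and the winner is picked at random from those. All names are hypothetical.

```python
import random

def dominates(a, b):
    """a and b are (fitness, complexity) tuples; lower is better for both."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])

def pareto_tournament(population, size, rng):
    """Pick a random winner from the non-dominated members of a random group."""
    contestants = rng.sample(population, size)
    front = [c for c in contestants
             if not any(dominates(other, c) for other in contestants)]
    return rng.choice(front)

rng = random.Random(0)
population = [(0.1, 5), (0.2, 6), (0.3, 7)]
# (0.1, 5) dominates both of the others, so it always wins a full-size tournament.
print(pareto_tournament(population, size=3, rng=rng))  # (0.1, 5)
```

Because complexity is an objective in its own right, a simple model can win against a slightly more accurate but much larger one, which is how this selection scheme encourages compact solutions.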

Automatic support for the Parallel Computing Toolbox: fitness and complexity calculations are split across multiple cores allowing significant run speedup.

Multiple independent runs which are then automatically merged.

Automatic caching.

Steady-state GP.

Two measures of tree complexity: node count and expressional complexity (promotes low complexity solutions).

Highly customisable (using simple CSS) visualisation of the tree(s) comprising any individual as an HTML report (drawtrees).

Runtime command line display of the inputs used in the best individual.

Multiple tree (multigene) individuals.

Regular tournament selection and lexicographic tournament selection.
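
The difference between the two schemes can be sketched in Python. In this sketch a regular tournament picks the fittest member of a random group, breaking ties at random, while the lexicographic version breaks fitness ties in favour of the less complex individual. This is an assumed, simplified formulation and not GPTIPS code; all names are hypothetical.

```python
import random

def tournament(population, size, rng, lexicographic=False):
    """population holds (fitness, complexity) tuples; lower fitness is better."""
    contestants = rng.sample(population, size)
    best_fitness = min(fitness for fitness, _ in contestants)
    tied = [c for c in contestants if c[0] == best_fitness]
    if lexicographic:
        # Among equally fit individuals, keep only the least complex ones.
        least = min(complexity for _, complexity in tied)
        tied = [c for c in tied if c[1] == least]
    return rng.choice(tied)

rng = random.Random(0)
population = [(0.1, 9), (0.1, 3), (0.5, 1)]
print(tournament(population, size=3, rng=rng, lexicographic=True))  # (0.1, 3)
```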

High level crossover and low level (standard subtree) crossover operators.
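
In multigene GP, high level crossover exchanges whole trees (genes) between individuals, whereas low level crossover exchanges subtrees within genes. The Python sketch below shows a deliberately simplified high level variant that swaps a single gene; the exact scheme GPTIPS uses may differ, and the function name and string-based gene representation are hypothetical.

```python
import random

def high_level_crossover(parent_a, parent_b, rng):
    """Swap one randomly chosen whole gene between two multigene parents."""
    child_a, child_b = list(parent_a), list(parent_b)   # leave parents intact
    i = rng.randrange(len(child_a))
    j = rng.randrange(len(child_b))
    child_a[i], child_b[j] = child_b[j], child_a[i]
    return child_a, child_b

rng = random.Random(0)
a = ["x1*x2", "sin(x3)"]
b = ["x4+1", "x2^2", "exp(x1)"]
child_a, child_b = high_level_crossover(a, b, rng)
print(child_a, child_b)
```

Each child keeps its parent's gene count, and no gene is altered internally; only its owner changes.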

Ephemeral random constants (ERCs). You can use integers, reals or a mixture of both.
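
A mixture of integer and real constants can be generated along the following lines. This Python sketch is illustrative only; the parameter names low, high and p_int (the probability of drawing an integer) are assumptions chosen for the example, not documented GPTIPS settings.

```python
import math
import random

def random_erc(rng, low=-10.0, high=10.0, p_int=0.5):
    """Draw one constant: an integer with probability p_int, otherwise a real."""
    if rng.random() < p_int:
        return float(rng.randint(math.ceil(low), math.floor(high)))
    return rng.uniform(low, high)

rng = random.Random(42)
integers_only = [random_erc(rng, p_int=1.0) for _ in range(5)]
reals_only = [random_erc(rng, p_int=0.0) for _ in range(5)]
print(integers_only)
print(reals_only)
```

Setting p_int to 1 or 0 gives integers only or reals only; values in between give a mixture.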

Elitism (the good kind).

Early run termination criteria: target loss function/fitness value achieved and/or maximum run time exceeded.
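
The combined "and/or" stopping rule can be sketched in Python as a check made once per generation. This is an illustration, not GPTIPS code: the parameter names are hypothetical and the sketch assumes that lower fitness is better.

```python
def should_terminate(best_fitness, elapsed_s, target_fitness=None, timeout_s=None):
    """Return True when any configured stopping criterion is met.
    Lower fitness is assumed to be better (an error-style loss)."""
    if target_fitness is not None and best_fitness <= target_fitness:
        return True
    if timeout_s is not None and elapsed_s >= timeout_s:
        return True
    return False

print(should_terminate(0.02, 12.0, target_fitness=0.05))              # True
print(should_terminate(0.20, 12.0, target_fitness=0.05, timeout_s=60))  # False
print(should_terminate(0.20, 75.0, target_fitness=0.05, timeout_s=60))  # True
```

Leaving a criterion as None disables it, so either rule can be used alone or both together.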

Graphical summary of fitness over GP run (summary).

Six different mutation operators.

Highly configurable run settings.