The MASA Approach to Machine Learning
In a nutshell, Machine Learning refers to any technique that automatically extracts information from a set of input data. The extracted information can take various forms: mathematical formulae, a set of rules, or any other logical or mathematical structure associated with the data.
MASA has developed, and continues to update and improve, Machine Learning algorithms based on the statistical analysis of numerical data. MASA’s approach consists of representing information through mathematical models, that is, formulae. Because our library algorithms can treat any problem whose inputs can be translated into sets of real vectors, MASA’s framework can handle a very large class of problems. The R&D work that MASA has carried out since 2001 focuses on the creation and application of learning algorithms, and on the study of evaluation methods for assessing the quality of the models.
The BK Machine Learning Toolkit
The BlueKaizen Machine Learning Toolkit is a framework of statistical learning components that can be used inside applications to solve problems in Supervised Classification, Clustering, Regression and Novelty Detection. It also contains components for feature extraction (the extraction of relevant information from the data) and for model selection. The notion of statistical risk, or quality of a model, is essential for choosing the best model from a set of candidate models. Feature extraction and the selection of the best model are steps common to all machine learning paradigms, as depicted in the accompanying figure.
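
To make the idea of risk-based model selection concrete, here is a minimal sketch in plain Python. It does not use the BlueKaizen API; the function names (`fit_constant`, `fit_linear`, `empirical_risk`, `select_model`) and the toy data are assumptions made for illustration only. The risk is estimated as the mean squared error on held-out data, which is one common choice among several.

```python
from statistics import mean

def fit_constant(xs, ys):
    """Hypothetical candidate 1: always predict the training mean."""
    m = mean(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Hypothetical candidate 2: ordinary least-squares line y = a*x + b."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return lambda x: a * x + b

def empirical_risk(model, xs, ys):
    """Mean squared error on held-out data, used here as the risk estimate."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))

def select_model(candidates, train, valid):
    """Fit every candidate on `train`; return the one with the lowest risk on `valid`."""
    scored = []
    for name, fit in candidates.items():
        model = fit(*train)
        scored.append((empirical_risk(model, *valid), name, model))
    risk, name, model = min(scored, key=lambda item: item[0])
    return name, model, risk

# Toy data (made up for this sketch): roughly y = 2x + 1.
train = ([0.0, 1.0, 2.0, 3.0], [1.1, 2.9, 5.2, 6.8])
valid = ([4.0, 5.0], [9.1, 10.9])

best_name, best_model, best_risk = select_model(
    {"constant": fit_constant, "linear": fit_linear}, train, valid
)
print(best_name, round(best_risk, 4))
```

The candidates are fitted on one part of the data and compared on another, so that the selected model is the one expected to generalize best rather than the one that fits the training data most closely.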

Some features of BlueKaizen Machine Learning:
- Architecture: the general architecture of the toolkit is modular.
The algorithms are organized into hierarchies of independent object-oriented software components that share a common protocol, which lets each component evolve independently.
Extending the framework with new algorithms, or composing existing ones, is also straightforward and robust.
- Development Environment: the core of BlueKaizen Machine Learning is written in C++, a mainstream object-oriented language widely used for building industrial-strength applications. Specific software layers (called bridges) allow the toolkit to be used transparently from scripting languages such as MATLAB and Python.
- Delivering Libraries: the toolkit is designed so that stand-alone customized libraries can be created and packaged with little effort.
A customer-specific learning library contains only the modules actually used by the application, which minimizes the memory footprint.
The learning services in a stand-alone library can also be called from an existing environment through a minimal API, so the customer does not need to link against the framework and can quickly try out what it offers.
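
The modular architecture described in the first item above (independent components sharing a common protocol, and composition of algorithms) can be sketched as follows. This is an illustration in Python, not the toolkit's actual C++ interface: the `Component` protocol, the `fit`/`apply` method names and the three classes are all hypothetical, and the novelty score is a deliberately simple distance to the training centroid.

```python
from typing import Protocol, Sequence

Vector = Sequence[float]

class Component(Protocol):
    """Hypothetical common protocol: every component can be fitted and applied."""
    def fit(self, data: Sequence[Vector]) -> "Component": ...
    def apply(self, x: Vector) -> Vector: ...

class Standardizer:
    """Feature-extraction step: center and scale each coordinate."""
    def fit(self, data):
        n = len(data)
        dims = range(len(data[0]))
        self.means = [sum(row[j] for row in data) / n for j in dims]
        # Fall back to a scale of 1.0 for constant coordinates.
        self.scales = [
            (sum((row[j] - self.means[j]) ** 2 for row in data) / n) ** 0.5 or 1.0
            for j in dims
        ]
        return self

    def apply(self, x):
        return [(v - m) / s for v, m, s in zip(x, self.means, self.scales)]

class DistanceNoveltyScore:
    """Novelty-detection step: Euclidean distance to the training centroid."""
    def fit(self, data):
        n = len(data)
        self.centroid = [sum(row[j] for row in data) / n for j in range(len(data[0]))]
        return self

    def apply(self, x):
        return [sum((v - c) ** 2 for v, c in zip(x, self.centroid)) ** 0.5]

class Chain:
    """Composition: a chain of components satisfies the same protocol."""
    def __init__(self, *steps):
        self.steps = steps

    def fit(self, data):
        for step in self.steps:
            step.fit(data)
            data = [step.apply(row) for row in data]  # feed the next step
        return self

    def apply(self, x):
        for step in self.steps:
            x = step.apply(x)
        return x

# Toy training vectors (made up for this sketch).
training = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
detector = Chain(Standardizer(), DistanceNoveltyScore()).fit(training)

typical = detector.apply([2.0, 20.0])[0]   # close to the training data
unusual = detector.apply([9.0, 90.0])[0]   # far from the training data
print(typical, unusual)
```

Because `Chain` exposes the same `fit` and `apply` methods as its parts, a composed algorithm can be used anywhere a single component is expected, and each step can be replaced without touching the others.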
The products
The first industrial applications considered by MASA for the BlueKaizen Machine Learning Toolkit are related to process control and quality monitoring, more specifically in the microelectronics and semiconductor industries.
BlueKaizen Machine Learning solutions have been tested on projects in Fault Detection and Classification, Equipment Monitoring and Drift Detection, and Advanced Process Control.
