Extraction Optimization Models from Data: an Approach based on Decision Trees and Forests.

Donskoy V. Extraction Optimization Models from Data: an Approach based on Decision Trees and Forests. // Taurida Journal of Computer Science Theory and Mathematics, – 2017. – Vol. 16. – No. 4. – P. 59-
DOI https://doi.org/10.37279/1729-3901-2017-16-4-59-86

The evolution of mathematical methods of classification and regression based on building decision trees and forests has made it possible to apply these methods to more complex problems of non-classical information modeling: extracting from data the models used to select the best solutions. In this approach, the mathematical model is not specified a priori but is synthesized automatically from the available empirical information. Classification and regression algorithms based on decision trees and forests can automatically extract both linear and non-linear models; these models implement a piecewise approximation of the objective functions and of the surfaces that separate admissible solutions from inadmissible ones (those violating the constraints). In this paper we develop two approaches to synthesizing solution-choice models from empirical data. The first approach synthesizes a 'joint' decision tree model that performs both the regression and the classification of decision variants into admissible and inadmissible ones. The second approach builds separate models: a regression tree that approximates the objective function and a classification tree that selects admissible solutions. Extracting the model of the objective function and the model of the admissible solution region separately allows any known model suited to the task to serve as the regression model: random forests, bagging and boosting regression forests, regression equations (if appropriate additional a priori information is available), or neural networks.
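The second approach (separate models) can be illustrated with a minimal sketch. It assumes a small, hypothetical pure-Python CART-style tree builder with axis-aligned predicates `x[j] <= t`, variance impurity for the regression tree and Gini impurity for the classification tree; the names `build_tree`, `predict` and `best_admissible` and the synthetic 5x5 data set are illustrative assumptions, not the author's algorithm. The regression tree models the objective, the classification tree models the admissible region, and the chosen variant is the admissible candidate with the largest predicted objective.

```python
from collections import Counter
from statistics import mean


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def variance(values):
    """Mean squared deviation, used as the regression impurity."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def majority(labels):
    """Most frequent class label in a leaf."""
    return Counter(labels).most_common(1)[0][0]


def build_tree(X, y, impurity, leaf_value, max_depth=10, min_size=2):
    """Greedy CART-style tree over predicates of the form x[j] <= t."""
    if max_depth == 0 or len(y) < min_size or impurity(y) == 0.0:
        return ("leaf", leaf_value(y))
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if not left or not right:
                continue
            # Size-weighted impurity of the two children.
            score = (len(left) * impurity([y[i] for i in left])
                     + len(right) * impurity([y[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None:  # no predicate separates the points
        return ("leaf", leaf_value(y))
    _, j, t, left, right = best

    def child(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          impurity, leaf_value, max_depth - 1, min_size)

    return ("node", j, t, child(left), child(right))


def predict(tree, x):
    """Follow the predicates from the root down to a leaf value."""
    while tree[0] == "node":
        _, j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree[1]


def best_admissible(candidates, f_tree, g_tree):
    """Maximize the regression-tree objective over candidates the
    classification tree labels admissible (class 1)."""
    feasible = [x for x in candidates if predict(g_tree, x) == 1]
    return max(feasible, key=lambda x: predict(f_tree, x)) if feasible else None


# Hypothetical empirical data: observed decision variants on a 5x5 grid,
# a non-linear objective to maximize, and an admissibility flag.
grid = [(a, b) for a in range(5) for b in range(5)]
objective = [-(a - 4) ** 2 - (b - 4) ** 2 for a, b in grid]
admissible = [int(a <= 3 and b <= 2) for a, b in grid]

f_tree = build_tree(grid, objective, variance, mean)    # objective model
g_tree = build_tree(grid, admissible, gini, majority)   # admissibility model
best = best_admissible(grid, f_tree, g_tree)
print(best, predict(f_tree, best))
```

Because the two models are independent, `f_tree` could be swapped for any other regressor with a `predict`-style interface (for example, a forest average) without touching the admissibility model, which is the flexibility the separate-model approach is meant to provide.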

Classification decision trees yield a logical description of the region of admissible solutions in the form of a disjunctive normal form (DNF) over a selected set of feature predicates. The paper shows how the construction of these DNFs can be made more accurate by using, instead of a single decision tree, either a decision forest based on areas of competence or a so-called 'full' decision tree.
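How a single classification tree turns into a DNF can be shown with a short sketch: every root-to-leaf path ending in the admissible class gives one conjunction of predicates, and the disjunction of those conjunctions describes the admissible region. The tree below is a hypothetical hand-written example in a nested-tuple format `("node", j, t, left, right)` / `("leaf", label)`, with predicates `x[j] <= t` and their negations `x[j] > t`; the function names are illustrative, and the forest-based and 'full'-tree refinements described in the paper are not shown.

```python
def predict(tree, x):
    """Follow predicates x[j] <= t down to a leaf label."""
    while tree[0] == "node":
        _, j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree[1]


def tree_to_dnf(tree, path=()):
    """Collect one conjunction per root-to-leaf path that ends in class 1."""
    if tree[0] == "leaf":
        return [path] if tree[1] == 1 else []
    _, j, t, left, right = tree
    return (tree_to_dnf(left, path + ((j, "<=", t),))
            + tree_to_dnf(right, path + ((j, ">", t),)))


def eval_dnf(dnf, x):
    """True if at least one conjunction has all of its predicates true."""
    ops = {"<=": lambda a, b: a <= b, ">": lambda a, b: a > b}
    return any(all(ops[op](x[j], t) for j, op, t in conj) for conj in dnf)


def dnf_to_str(dnf, names):
    """Render the DNF with & for conjunction and | for disjunction."""
    terms = (" & ".join(f"{names[j]} {op} {t}" for j, op, t in conj)
             for conj in dnf)
    return " | ".join(f"({term})" for term in terms)


# Hypothetical classification tree over two decision variables x1, x2
# (label 1 = admissible, label 0 = inadmissible).
tree = ("node", 0, 3,
        ("node", 1, 2, ("leaf", 1), ("leaf", 0)),
        ("node", 1, 0, ("leaf", 1), ("leaf", 0)))

dnf = tree_to_dnf(tree)
print(dnf_to_str(dnf, ["x1", "x2"]))
# prints: (x1 <= 3 & x2 <= 2) | (x1 > 3 & x2 <= 0)
```

Since each point reaches exactly one leaf, the DNF obtained this way is logically equivalent to the tree's own admissibility decision; its accuracy is therefore limited by the accuracy of the tree itself.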

The results obtained in this article are intended for use in developing intelligent control algorithms, and they form the theoretical basis of the information technology called Building Optimization Models from Data (BOMD).

Keywords: Building Optimization Models from Data, Decision Trees, Decision Forests, BOMD technology