Decision Trees

The decision tree is a very common classification method. In decision analysis, it is a method based on the known probabilities of various situations: by building a tree of decisions, one can compute the probability that the net present value is greater than or equal to zero, evaluate a project's risk, and judge its feasibility. Because the decision branches are drawn like the branches of a tree, the method is called a 'decision tree'. Each internal node of a decision tree represents a test on one attribute, each branch represents an outcome of that test, and each leaf node represents a class. In machine learning, a decision tree is a predictive model that represents a mapping between an object's attributes and its value: each internal node tests an attribute, each branch corresponds to a possible attribute value, and each leaf node holds the value reached by following the path from the root to that leaf, as sketched below.
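
To make this structure concrete, here is a minimal sketch (not from the original text) that fits a shallow tree and prints its branches; it assumes scikit-learn is installed and uses the bundled iris data purely for illustration:

    # Minimal sketch: fit a shallow decision tree and print its structure.
    # scikit-learn and the iris data are assumptions for illustration only.
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    iris = load_iris()
    tree = DecisionTreeClassifier(max_depth=2, random_state=0)
    tree.fit(iris.data, iris.target)

    # Each internal node prints as a test on one attribute, each branch as a
    # test outcome, and each leaf as a predicted class, as described above.
    print(export_text(tree, feature_names=iris.feature_names))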

Decision trees have many advantages. First, they are logically clear and easy to understand: the tree directly reflects the data, so readers need little background knowledge. Second, data preparation is often simple or unnecessary, and the ability to handle numerical and categorical attributes simultaneously makes it possible to obtain feasible, effective results from large data sources in a relatively short time. Moreover, the credibility of the model can be measured, and given an observed model it is easy to derive logical expressions from the resulting tree. On the other hand, continuous-valued fields are difficult to predict, data with a chronological order require extensive preprocessing, and when there are too many classes the errors may accumulate quickly.

Our Services

Our statistical experts will help you build a clear and appropriate decision tree suited to your study. To this end, we will choose the most suitable of the decision-tree algorithms below, helping you obtain feasible and effective results in your study.

  • ID3 algorithm

The ID3 algorithm builds a decision tree based on the information gain computed from the training instances and then uses the tree to classify the test data. The reduction in information entropy Ent(D) (Figure 1) is taken as the criterion for selecting the test attribute: at each node, among the attributes not yet used for splitting, the one with the highest information gain Gain(D, a) (Figure 2) is chosen as the partition standard. ID3 then continues this process until the generated tree classifies the training samples perfectly. A short sketch of both quantities appears after Figure 2 below.

At present, the ID3 algorithm plays an increasingly important role in clinical diagnosis. For example, based on a large number of historical cases, ID3 has been used to establish rules that identify symptoms and syndromes in TCM (traditional Chinese medicine) clinical diagnosis data and to uncover the most essential relationships between symptoms and syndromes.

Figure 1. The calculation of information entropy: Ent(D) = -Σ_k p_k log₂ p_k, where Ent(D) is the information entropy of sample set D and p_k is the proportion of class-k samples in D.

Figure 2. The calculation of information gain: Gain(D, a) = Ent(D) - Σ_v (|D^v|/|D|) Ent(D^v), where Gain(D, a) is the information gain obtained by dividing sample set D by attribute a, and D^v is the subset of D taking value v on a.
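
For concreteness, the two quantities in Figures 1 and 2 can be computed in a few lines. The following NumPy sketch is illustrative and not part of the original page; the toy arrays y and a are hypothetical:

    import numpy as np

    def entropy(labels):
        # Ent(D) = -sum_k p_k * log2(p_k), with p_k the class proportions.
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def information_gain(labels, attribute):
        # Gain(D, a) = Ent(D) - sum_v (|D^v| / |D|) * Ent(D^v).
        gain = entropy(labels)
        for v in np.unique(attribute):
            mask = attribute == v
            gain -= mask.mean() * entropy(labels[mask])
        return gain

    # ID3 would split on whichever attribute yields the highest gain.
    y = np.array([0, 0, 0, 1, 1, 1])
    a = np.array(["x", "x", "x", "y", "y", "y"])
    print(information_gain(y, a))  # 1.0: attribute a separates the classes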

  • C4.5 algorithm

The C4.5 algorithm, developed from ID3, is one of the most popular decision-tree classification algorithms, favored for several advantages: it can process both continuous and discrete numeric data, it can handle missing attribute values, and it generates rules that are easy to interpret, while being among the fastest of the classification algorithms that run in main memory.

The C4.5 algorithm can be used for data mining, the process of extracting effective, novel, potentially useful, and ultimately comprehensible knowledge, models, or rules from large amounts of data. For example, C4.5 has been applied to clinical data on type 2 diabetes, where classification rules were established; in testing, the average rate of correctly recognizing healthy people and diabetic patients was 97%.
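
C4.5's split criterion is the gain ratio, which normalises the information gain by the intrinsic value of the split; the page does not give the formula, so the sketch below is an illustrative reconstruction (NumPy assumed, toy data hypothetical):

    import numpy as np

    def entropy(labels):
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def gain_ratio(labels, attribute):
        # Gain ratio = Gain(D, a) / IV(a), where the intrinsic value is
        # IV(a) = -sum_v (|D^v| / |D|) * log2(|D^v| / |D|).
        gain, iv = entropy(labels), 0.0
        for v in np.unique(attribute):
            frac = np.mean(attribute == v)
            gain -= frac * entropy(labels[attribute == v])
            iv -= frac * np.log2(frac)
        return gain / iv if iv > 0 else 0.0

    y = np.array([0, 0, 1, 1])
    a = np.array(["low", "low", "high", "high"])
    print(gain_ratio(y, a))  # 1.0 on this toy split

For a continuous attribute, C4.5 sorts the observed values, evaluates binary splits at candidate thresholds between adjacent values, and applies the same criterion to choose the best threshold.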

  • Classification and regression trees (CART)

Classification and regression trees are machine-learning methods for constructing data prediction models. The models are obtained by recursively partitioning the data space and fitting a simple prediction model within each partition; the partitioning can therefore be represented graphically as a decision tree. Classification trees are designed for dependent variables that take a finite number of unordered values, with prediction error measured in terms of misclassification cost. Regression trees are for dependent variables that take continuous or ordered discrete values, with prediction error typically measured by the squared difference between observed and predicted values. Our experts can use the squared-error minimization criterion to build a regression tree or the Gini index minimization criterion to build a classification tree, as sketched below.
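
Here is a brief sketch of the two variants just described (scikit-learn assumed; the criterion names follow recent scikit-learn releases, and the toy data are hypothetical):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

    X = np.array([[1.0], [2.0], [3.0], [4.0]])

    # Classification tree: unordered class labels, Gini index criterion.
    clf = DecisionTreeClassifier(criterion="gini").fit(X, [0, 0, 1, 1])

    # Regression tree: continuous responses, squared-error criterion.
    reg = DecisionTreeRegressor(criterion="squared_error").fit(X, [1.1, 2.1, 2.9, 4.2])

    print(clf.predict([[2.5]]), reg.predict([[2.5]]))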

CART is currently applied mostly to analyzing patient status. For example, CART has been used for survival analysis of colon cancer patients according to tumor cell proliferation. CART can also be used to analyze the total hospitalization costs of common pediatric diseases.

We guarantee the confidentiality of our customers' sensitive data. We are committed to providing you with timely and high-quality deliverables. At the same time, we guarantee cost-effective, complete, and concise reports.

If you are unable to find the specific service you are looking for, please feel free to contact us.

Are you looking for a professional advisor for your trials?
