Diagnostics Post Processing
Analyze the quality of the Fit, also referred to as the response surface or predictive model.
 From the Post Processing step, click the Diagnostics tab.
 In the work area, select the output response to analyze.

Click the tabs, below the output responses, to change the diagnostics used to
analyze the selected output response.
 Detailed Diagnostics displays diagnostic information for the Input Matrix, Cross-Validation Matrix, and Testing Matrix.
 Regression Terms displays the confidence intervals, which consist of an upper and a lower bound on each coefficient of the regression equation.
Bounds represent the confidence that the true value of the coefficient lies within the bounds, based on the given sample.
Change the confidence value from the % Confidence setting. A higher confidence value results in wider bounds; a 95% confidence interval is typically used.
Note: Only available for Least Squares Regression.
 Regression Equation displays the complete formula for the predictive model as a function of the input variables.
Note: Only available for Least Squares Regression.
 ANOVA estimates the error variance and determines the relative importance of various factors.
ANOVA is often used to identify which variables explain the variance in the data. This is done by examining the resulting increase in the unexplained error when variables are removed.
Note: Only available for Least Squares Regression.
 Confusion Matrix summarizes the performance of a classifier. Correctly identified data is listed on the diagonal, and misclassifications are presented on the off-diagonals.
Tip: Click to toggle the confusion display from absolute count to percentages. Also, click to control the display of the confusion matrix between the input, cross-validation, and testing data sets.
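As an illustration of how a confusion matrix is assembled, the following minimal Python sketch counts actual versus predicted classes and converts the counts to percentages. The function names, the sample labels, and the choice to normalize by the total number of samples are assumptions made for this example; the tool may normalize differently (for example, per row).

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Return counts[i][j]: samples whose actual class is labels[i]
    and whose predicted class is labels[j]."""
    pair_counts = Counter(zip(actual, predicted))
    return [[pair_counts[(a, p)] for p in labels] for a in labels]


def to_percent(counts):
    """Convert absolute counts to percentages of all samples (assumed normalization)."""
    total = sum(sum(row) for row in counts)
    return [[100.0 * c / total for c in row] for row in counts]


# Hypothetical classifier results for eight samples.
labels = ["pass", "fail"]
actual = ["pass", "pass", "fail", "fail", "pass", "fail", "pass", "pass"]
predicted = ["pass", "fail", "fail", "fail", "pass", "pass", "pass", "pass"]

counts = confusion_matrix(actual, predicted, labels)
percent = to_percent(counts)

for label, row in zip(labels, counts):
    print(label, row)
```

Correct classifications appear on the diagonal of `counts`, and the two misclassified samples appear in the off-diagonal entries.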
Diagnostic Tab Settings
Settings to configure the diagnostics displayed in the Diagnostic post processing tab.
 % Confidence
 Change the confidence value.
Note: Only available for Regression Terms diagnostics.
Diagnostic Definitions
Definitions used to describe diagnostic concepts.
For a given set of $n$ input values, denoted as ${y}_{i}$, the Fit predictions at the same points are denoted as ${\overline{y}}_{i}$. The mean of the input values is expressed as $\overline{y}$. For a Least Squares Regression, $p$ is the number of unknown coefficients in the regression.
 Total Sum of Squares
 Explained Sum of Squares
 Residual Sum of Squares
 Average Absolute Error
 Standard Deviation
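Using the notation above, these quantities are conventionally written as shown below. These are the standard textbook forms, offered as an assumption about what is computed rather than a confirmed definition. The symbols $SS_{exp}$, $\delta_{avg}$, and $\sigma$ are introduced here for illustration, and the normalization of the standard deviation by $n$ (rather than $n-1$) is assumed.

```latex
\begin{aligned}
SS_{tot} &= \sum_{i=1}^{n} \left( y_i - \overline{y} \right)^2
  && \text{(Total Sum of Squares)} \\
SS_{exp} &= \sum_{i=1}^{n} \left( \overline{y}_i - \overline{y} \right)^2
  && \text{(Explained Sum of Squares)} \\
SS_{err} &= \sum_{i=1}^{n} \left( y_i - \overline{y}_i \right)^2
  && \text{(Residual Sum of Squares)} \\
\delta_{avg} &= \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \overline{y}_i \right|
  && \text{(Average Absolute Error)} \\
\sigma &= \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \overline{y} \right)^2 }
  && \text{(Standard Deviation)}
\end{aligned}
```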
Detailed Diagnostic
Data displayed in the Detailed Diagnostic tab of the Diagnostics post process tool.
Input Matrix
The Input Matrix column shows the diagnostic information using only the input matrix. For methods that pass exactly through the data points, such as HyperKriging or Radial Basis Functions, input matrix diagnostics are not useful, because the residuals at those points are zero by construction.
Cross-Validation Matrix
The Cross-Validation Matrix column shows the diagnostic information using a k-fold scheme, which means the input data is broken into k groups. Each group in turn is used as a validation set for a new approximate model built using only the data from the other k-1 groups. This allows for diagnostic information without the need for a testing matrix.
Testing Matrix
The Testing Matrix column compares the approximate model, which was built using the input matrix, against a separate set of user-supplied points. Using a Testing Matrix is the best method to get accurate diagnostic information.
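The k-fold scheme described for the cross-validation column can be sketched as follows. This is a minimal illustration, not the tool's implementation: the straight-line model, the helper names `fit_line` and `k_fold_predictions`, and the rule that assigns point `i` to group `i % k` are all assumptions chosen to keep the example short.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def k_fold_predictions(xs, ys, k):
    """Predict every point from a model trained on the other k-1 groups."""
    n = len(xs)
    predictions = [None] * n
    for fold in range(k):
        # Assumption: point i belongs to group i % k (no shuffling).
        held_out = [i for i in range(n) if i % k == fold]
        training = [i for i in range(n) if i % k != fold]
        a, b = fit_line([xs[i] for i in training], [ys[i] for i in training])
        for i in held_out:
            predictions[i] = a + b * xs[i]
    return predictions


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]

cv_predictions = k_fold_predictions(xs, ys, k=3)
cv_residuals = [y - p for y, p in zip(ys, cv_predictions)]
print(cv_residuals)
```

Every point is predicted by a model that never saw it, so the resulting residuals can feed the same diagnostics that would otherwise require separate testing data. Because the sample data is exactly linear, the residuals here are zero up to rounding.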
Criterion
 R-Square
 Commonly called the coefficient of determination, R-Square is a measure of how well the Fit can reproduce known data points. Graphically, this can be visualized by scatter plotting the known values versus the predicted values. If the model perfectly predicts the known values, R-Square will have its maximum possible value of 1.0, and the scatter points will lie on a perfect diagonal line, as shown in Figure 1. More typically, the Fit introduces modeling error, and the scatter points will deviate from the straight diagonal line, as shown in Figure 2.
 R-Square Adjusted
 Due to its formulation, adding a variable to the model will never decrease R-Square. R-Square Adjusted is a modification of R-Square that adjusts for the number of explanatory terms in the model. Unlike R-Square, R-Square Adjusted increases only if the new term improves the model more than would be expected by chance. R-Square Adjusted can be negative, and will always be less than or equal to R-Square. If R-Square and R-Square Adjusted differ dramatically, it indicates that non-significant terms may have been included in the model.
 Multiple R
 The multiple correlation coefficient between actual and predicted values; in most cases it is the square root of R-Square. It is an indication of the relationship between two variables.
Note: Only available for Least Squares Regression.
 Relative Average Absolute Error
 The ratio of the average absolute error to the standard deviation. A low ratio is more desirable, as it indicates that the variance in the Fit's predicted values is dominated by the actual variance in the data and not by modeling error.
 Maximum Absolute Error
 The maximum difference, in absolute value, between the observed and predicted values. For the input and validation matrices, this value can also be observed in the Residuals tab.
 Root Mean Square Error
 A measure of weighted average error. A higher quality Fit will have a lower value.
 Number of Samples
 The number of data points used in the diagnostic computations.
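The criteria above can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than the tool's exact formulas: the function name `fit_criteria` and the sample data are hypothetical, the adjusted R-Square uses the common form that divides each sum of squares by its degrees of freedom, and both the standard deviation and the root mean square error are assumed to divide by the number of samples.

```python
import math


def fit_criteria(observed, predicted, p):
    """Quality criteria for a set of predictions.

    p is the number of unknown regression coefficients."""
    n = len(observed)
    mean = sum(observed) / n
    residuals = [y - f for y, f in zip(observed, predicted)]
    ss_tot = sum((y - mean) ** 2 for y in observed)
    ss_err = sum(r ** 2 for r in residuals)
    r_square = 1.0 - ss_err / ss_tot
    # Assumed adjustment: each sum of squares is divided by its degrees of freedom.
    r_square_adj = 1.0 - (ss_err / (n - p)) / (ss_tot / (n - 1))
    avg_abs_error = sum(abs(r) for r in residuals) / n
    std_dev = math.sqrt(ss_tot / n)  # assumed population form (divide by n)
    return {
        "R-Square": r_square,
        "R-Square Adjusted": r_square_adj,
        "Relative Average Absolute Error": avg_abs_error / std_dev,
        "Maximum Absolute Error": max(abs(r) for r in residuals),
        "Root Mean Square Error": math.sqrt(ss_err / n),
        "Number of Samples": n,
    }


# Hypothetical observed and predicted values for a two-coefficient model.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 2.0, 2.5, 4.0, 5.0]
criteria = fit_criteria(observed, predicted, p=2)

for name, value in criteria.items():
    print(name, value)
```

For this data the adjusted value is lower than R-Square, consistent with the statement that R-Square Adjusted never exceeds R-Square.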
Regression Terms
Data displayed in the Regression Terms tab of the Diagnostics post process tool.
p-values are computed using the standard error and t-value to perform a Student's t-test. The p-value indicates the statistical probability that the quantity in the Value column could have resulted from a random sample and that the real value of the coefficient is actually zero (the null hypothesis). A low value, typically less than 0.05, leads to a rejection of the null hypothesis, meaning the term is statistically significant.
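The relationship between the coefficient value, its standard error, the t-value, the p-value, and the confidence bounds can be written as below. The notation is introduced here for illustration: $\hat{\beta}_j$ is the quantity in the Value column, $SE(\hat{\beta}_j)$ its standard error, $F_{t,\,n-p}$ the cumulative distribution function of Student's t distribution, and $\alpha$ is one minus the confidence level. The use of $n-p$ degrees of freedom and a two-sided test follows the usual convention and is an assumption, not something stated above.

```latex
\begin{aligned}
t_j &= \frac{\hat{\beta}_j}{SE(\hat{\beta}_j)} \\
p_j &= 2 \left( 1 - F_{t,\,n-p}\left( \left| t_j \right| \right) \right) \\
\text{bounds} &= \hat{\beta}_j \pm t_{1-\alpha/2,\,n-p} \; SE(\hat{\beta}_j)
\end{aligned}
```

Under these conventions, a coefficient whose p-value is below 0.05 has a 95% confidence interval that excludes zero.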
ANOVA
Data displayed in the ANOVA (Analysis of Variance) tab of the Diagnostics post process tool.
 Degrees of Freedom
 Number of terms in the regression associated with the variable. All degrees of freedom not associated with a variable are retained in the Error assessment. More degrees of freedom associated with the error increase the statistical certainty of the results: the p-values. Higher order terms have more degrees of freedom; for example, a second order polynomial will have two degrees of freedom for a variable: one each for the linear and quadratic terms.
 Sum of Squares
 For each variable, the quantity shown is the increase in unexplained
variance if the variable were to be removed from the regression. A
variable which has a small value is less critical in explaining the data
variance than a variable which has a larger value.
The Error row represents the variance not explained by the model, which is $SS_{err}$.
The Total row, which is $SS_{tot}$, will generally not equal the sum of the other rows.
 Mean Squares
 The ratio between unexplained error increase and degrees of freedom, computed as the Sum of Squares divided by the associated degrees of freedom.
 Mean Squares Percent
 Interpreted as the relative contribution of the variables to the Fit quality, computed as the ratio of the Mean Square to the summed total of the Mean Squares. A variable with a higher percentage is more critical to explaining the variance in the given data than a variable with a lower percentage.
 F-value
 The quotient of the mean squares from the variable to the mean squares from the error. This is a relative measure of the variable's explanatory variance to overall unexplained variance.
 p-value
 The result of an F-test on the corresponding F-value. The p-value indicates the statistical probability that the same pattern of relative variable importance could have resulted from a random sample and that the variable actually has no effect at all (the null hypothesis). A low value, typically less than 0.05, leads to a rejection of the null hypothesis, meaning the variable is statistically significant.
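The arithmetic linking the ANOVA columns can be sketched in Python as follows. The function name `anova_rows`, the variable names, and the numbers are hypothetical. The sketch also assumes that the Mean Squares Percent is taken over the variables' Mean Squares only, excluding the Error row, which the description above does not specify.

```python
def anova_rows(variable_ss, variable_df, ss_err, df_err):
    """Build ANOVA rows from per-variable sums of squares and degrees of freedom."""
    ms_err = ss_err / df_err
    mean_squares = {
        name: variable_ss[name] / variable_df[name] for name in variable_ss
    }
    # Assumption: the percentage is taken over the variables' Mean Squares only.
    ms_total = sum(mean_squares.values())
    rows = {}
    for name, ms in mean_squares.items():
        rows[name] = {
            "Mean Squares": ms,
            "Mean Squares Percent": 100.0 * ms / ms_total,
            "F-value": ms / ms_err,
        }
    return rows


# Hypothetical sums of squares for two variables and the Error row.
rows = anova_rows(
    variable_ss={"x1": 90.0, "x2": 10.0},
    variable_df={"x1": 2, "x2": 1},
    ss_err=8.0,
    df_err=4,
)

for name, row in rows.items():
    print(name, row)
```

The p-value would follow from evaluating the F distribution at each F-value with the variable's and the error's degrees of freedom. That step is omitted here because the Python standard library does not provide an F distribution.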