Statistical Glossary

Founded 27-Jul-2003
Last update 12-Dec-2004


This brief glossary is intended to aid readers of the presented weight studies who are not familiar with some statistical terms.


Box-percentile plot
The box-percentile plot is a modified version of the well-known boxplot. At any height the width of the irregular “box” is proportional to the percentile of that height, up to the 50th percentile, and above the 50th percentile the width is proportional to 100 minus the percentile. Thus, the width at any given height is proportional to the percent of observations that are more extreme in that direction. As in boxplots, the median, 25th, and 75th percentiles are marked with line segments across the box. For details see Warren W. Esty and Jeffrey D. Banfield, The Box-Percentile Plot, Journal of Statistical Software, Volume 8, Number 17, 2003, pp. 1-14.
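The width rule described above can be sketched in a few lines. This is only an illustration: the function name `box_percentile_width` is hypothetical, and the percentile of a height is taken here as the percentage of observations less than or equal to it, which may differ in detail from the convention used by Esty and Banfield.

```python
def box_percentile_width(data, height):
    """Relative width (0 to 50) of the box-percentile plot at a given height."""
    n = len(data)
    # Assumed convention: percentile = percent of observations <= height.
    percentile = 100.0 * sum(1 for x in data if x <= height) / n
    # Below the median the width grows with the percentile;
    # above it the width shrinks as 100 minus the percentile.
    return percentile if percentile <= 50.0 else 100.0 - percentile


weights = list(range(1, 11))  # illustrative data: 1, 2, ..., 10
for h in (2, 5, 9):
    print(h, box_percentile_width(weights, h))
```

The plot is widest at the median and narrows symmetrically in percentile terms toward both extremes.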
Confidence interval
A 100(1-α)% confidence interval is an interval produced by a procedure such that, over a large number of independent samples, approximately 100(1-α)% of the intervals constructed in the same way contain the parameter being estimated. Usual values of α are 0.1 (90% confidence level), 0.05 (95% confidence level) and 0.01 (99% confidence level).
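The repeated-sampling interpretation can be checked with a small simulation. The sketch below assumes normally distributed data with a known standard deviation, so that a simple z-interval for the mean applies. The names `z_interval` and `coverage` and all parameter values are illustrative assumptions.

```python
import random
from statistics import NormalDist, fmean


def z_interval(sample, sigma, alpha=0.05):
    """100(1-alpha)% interval for the mean, assuming sigma is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / len(sample) ** 0.5
    centre = fmean(sample)
    return centre - half_width, centre + half_width


def coverage(mu=70.0, sigma=10.0, n=30, alpha=0.05, reps=2000, seed=1):
    """Fraction of simulated intervals that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        low, high = z_interval(sample, sigma, alpha)
        hits += low <= mu <= high
    return hits / reps


print(coverage())  # should be close to 0.95
```

Any single interval either contains the parameter or does not. The confidence level describes the long-run behaviour of the procedure.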
Empirical cumulative distribution function
For each potential value x, the empirical cumulative distribution function is equal to the proportion of observations less than or equal to x.
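A direct transcription of this definition, as a minimal sketch (the function name `ecdf` is an illustrative choice):

```python
def ecdf(data, x):
    """Proportion of observations in data that are less than or equal to x."""
    return sum(1 for value in data if value <= x) / len(data)


sample = [1, 2, 2, 3]
print(ecdf(sample, 2))  # 3 of the 4 observations are <= 2
```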
Kurtosis
Kurtosis is a measure of how outlier-prone a distribution is. The kurtosis of the normal distribution is 3. Distributions that are more outlier-prone than the normal distribution have kurtosis greater than 3; distributions that are less outlier-prone have kurtosis less than 3.
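Since the normal distribution is assigned the value 3, the definition above corresponds to the fourth central moment divided by the squared variance. A minimal sketch of the sample version follows. It assumes the simple moment estimator without bias correction, and the function name `kurtosis` is illustrative.

```python
def kurtosis(data):
    """Sample kurtosis: fourth central moment over squared second central moment."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m4 / m2 ** 2


flat = [1, 2, 3, 4, 5]                          # evenly spread, no outliers
heavy = [-10, -1, -1, 0, 0, 0, 1, 1, 10]        # two extreme values
print(kurtosis(flat), kurtosis(heavy))
```

The evenly spread sample gives a value below 3, while the sample with two extreme observations gives a value above 3.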
Normal probability plot
A normal probability plot is a graph for assessing whether data come from a normal distribution. The plot has three graphical elements. The plus signs ‘+’ show the empirical probability versus the data value for each point in the sample. The solid line connects the 25th and 75th percentiles of the data and represents a robust linear fit of the sample order statistics. The dotted line extends the solid line to the ends of the sample to help evaluate the linearity of the data. If all the data points fall near the line, the assumption of normality is reasonable. The y-axis values are probabilities and, as such, lie between zero and one, but the scale of the y-axis is not uniform: the distance between the tick marks matches the distance between the quantiles of a normal distribution.
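The coordinates behind such a plot can be sketched as follows. The plotting position (i - 0.5)/n for the i-th smallest observation is an assumption (other conventions exist), and the function name `normal_plot_points` is illustrative. Placing each point at the standard normal quantile of its probability produces the non-uniform y-axis.

```python
from statistics import NormalDist


def normal_plot_points(data):
    """Return (value, empirical probability, normal quantile) for each sorted observation."""
    values = sorted(data)
    n = len(values)
    # Assumed plotting position for the i-th order statistic: (i - 0.5) / n.
    probs = [(i - 0.5) / n for i in range(1, n + 1)]
    # The y-axis position of a probability p is the standard normal quantile of p.
    quantiles = [NormalDist().inv_cdf(p) for p in probs]
    return list(zip(values, probs, quantiles))


for value, prob, quantile in normal_plot_points([3.1, 1.2, 2.4]):
    print(value, round(prob, 3), round(quantile, 3))
```

If the data are approximately normal, the values plotted against the quantiles fall close to a straight line.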
p-value
The p-value is the probability, computed under the assumption that the null hypothesis is true, that the test statistic would assume a value at least as extreme as the one actually observed. Equivalently, the p-value is the smallest significance level at which the null hypothesis would be rejected for the given sample. Note that the p-value does not measure the probability that the null hypothesis is true. Also note that the p-value is not the probability of rejecting a true hypothesis; that probability is determined by the chosen significance level α.
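As a concrete illustration, the sketch below computes the p-value for a test statistic that follows the standard normal distribution under the null hypothesis (a z-test). The choice of test and the function name `z_test_p_value` are assumptions made for the example.

```python
from statistics import NormalDist


def z_test_p_value(z, two_sided=True):
    """p-value for an observed z statistic under a standard normal null distribution."""
    if two_sided:
        # Probability of a statistic at least as far from zero as |z|.
        return 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    # One-sided: probability of a statistic greater than or equal to z.
    return 1.0 - NormalDist().cdf(z)


print(z_test_p_value(1.96))                   # close to 0.05
print(z_test_p_value(1.96, two_sided=False))  # close to 0.025
```

With α = 0.05, a two-sided p-value below 0.05 leads to rejection of the null hypothesis.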
Significance level α
The significance level of a statistical hypothesis test is a fixed probability of wrongly rejecting the null hypothesis when it is in fact true. In other words, the significance level α is the probability of a type I error. (The probability of a type II error, that is, of not rejecting the hypothesis when it is in fact false, is denoted β.) Usually, the significance level is chosen to be α = 0.05 = 5% or α = 0.01 = 1%.
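Both error probabilities can be estimated by simulation. The sketch below assumes a two-sided z-test of the hypothesis that the mean equals `mu0`, with a known standard deviation. The function name `rejection_rate`, the sample size and the alternative mean 0.5 are illustrative assumptions.

```python
import random
from statistics import NormalDist, fmean


def rejection_rate(true_mu, mu0=0.0, sigma=1.0, n=25, alpha=0.05, reps=4000, seed=7):
    """Fraction of simulated samples for which the hypothesis mean == mu0 is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
        rejections += abs(z) > z_crit
    return rejections / reps


type_i = rejection_rate(true_mu=0.0)         # hypothesis true: estimates alpha
type_ii = 1.0 - rejection_rate(true_mu=0.5)  # hypothesis false: estimates beta
print(type_i, type_ii)
```

The first estimate should be close to the chosen α, whereas β depends on how far the true mean lies from the hypothesized value and on the sample size.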
Skewness
Skewness is a measure of the asymmetry of a distribution. If skewness is negative, the data are spread out more to the left of the mean than to the right. If skewness is positive, the data are spread out more to the right. The skewness of any symmetric distribution, when it is defined, is zero.
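A minimal sketch of the sample skewness, assuming the usual moment estimator (third central moment divided by the 3/2 power of the second central moment, without bias correction). The function name `skewness` is illustrative.

```python
def skewness(data):
    """Sample skewness: third central moment over second central moment to the power 3/2."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 1, 1, 10]   # one large value stretches the right side
left_tail = [-10, 1, 1, 1]   # one small value stretches the left side
print(skewness(symmetric), skewness(right_tail), skewness(left_tail))
```

The symmetric sample gives zero, the sample with a long right tail gives a positive value, and the sample with a long left tail gives a negative value.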