Last month we discussed that learning the language of statistics is crucial to communicating your research. Now it is time to look a bit more closely at some ambiguous terms that are often thrown around in conversations about the analysis of research data. This is not an exhaustive list, of course; if you would like to check out some other statistical dictionaries, Oxford Reference and the UC Berkeley Glossary of Statistical Terms are good starting points.
Kaplan et al. (2009) have published a table of 36 lexically ambiguous words, of which they discuss five in more detail in their paper. I randomly selected another five to put under the looking glass here.
Bias
In English, bias refers to the inclination or prejudice for or against one person or group, especially in a way considered to be unfair. In the context of dressmaking it means a direction diagonal to the weave of a fabric (e.g. a silk dress cut on the bias).
In statistics, bias refers to a systematic, as opposed to a random, discrepancy between a statistic (estimated from the data) and the truth (expected in the population). So, put simply, a measurement procedure is said to be biased if, on average, it gives an answer that differs from the truth. Statistical bias can be unknown, unintended or deliberate. And while bias is often undesirable, the term carries no inherent negative connotation in statistics the way it does in English.
Error
When somebody makes an error, it’s commonly understood as making a mistake. In statistics, an error (or residual) is not a mistake but rather the difference between a computed, estimated, or measured value and the accepted true, specified, or theoretically correct value. Errors often contain useful information about distributional assumptions and model fit.
We also often talk about Type I and Type II errors. These errors arise in hypothesis testing when the conclusion of the test does not align with the underlying truth. In this context, the error is indeed a “mistake”. A Type I error occurs if the null hypothesis is rejected when in fact it is true (i.e. a false positive), while a Type II error is not rejecting a false null hypothesis (i.e. a false negative).
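To make the false-positive idea concrete, here is a minimal simulation sketch (the test, sample size and 5% significance level are illustrative assumptions, not from Kaplan et al.): when the null hypothesis really is true, a test run at level α = 0.05 will still reject it — commit a Type I error — in roughly 5% of repeated experiments.

```python
import math
import random

random.seed(42)

def z_test_p(sample, mu0=0.0, sigma=1.0):
    """Two-sided z-test p-value for H0: population mean == mu0 (sigma known)."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    # Normal CDF via the error function: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

alpha = 0.05
trials = 10_000

# Simulate many experiments in which H0 is actually true (mean really is 0),
# and count how often the test nevertheless rejects it.
false_positives = sum(
    z_test_p([random.gauss(0, 1) for _ in range(30)]) < alpha
    for _ in range(trials)
)
print(false_positives / trials)  # hovers around alpha = 0.05
```

The long-run false-positive rate matches α by construction; reducing α buys fewer Type I errors at the price of more Type II errors.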
Mode
The English dictionary defines mode as a manner of acting or doing, or a particular type or form of something. It can also be a designated condition or status, as for performing a task or responding to a problem (e.g. a machine in automatic mode). In philosophy, a mode is an appearance, form or disposition taken by a thing, or by one of its essential properties or attributes. And in music, mode refers to any of various arrangements of the diatonic tones of an octave, differing from one another in the order of the whole and half steps.
In statistics the mode is a measure of central tendency: the value that appears most often in a set of data values. In a normal distribution the mode coincides with the mean and median, but in highly skewed distributions the three can differ considerably. And while the mean requires numeric data, the mode applies to all measurement scales, including nominal and ordinal data.
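A quick sketch with Python’s standard statistics module illustrates both points (the data sets are made up for illustration):

```python
from statistics import mean, median, mode

# Roughly symmetric data: mean, median and mode coincide.
symmetric = [2, 3, 3, 4, 4, 4, 5, 5, 6]
print(mean(symmetric), median(symmetric), mode(symmetric))  # 4.0 4 4

# Highly skewed data: the three measures drift apart.
skewed = [1, 1, 1, 2, 2, 3, 9, 15, 30]
print(mean(skewed), median(skewed), mode(skewed))  # 7.11... 2 1

# The mode also works on nominal data, where a mean is meaningless.
colours = ["red", "blue", "blue", "green"]
print(mode(colours))  # blue
```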
Null
Null means without value, effect, consequence or significance. In law, it refers to having no legal or binding force. In electronics, it is a point of minimum signal reception, as on a radio direction finder or other electronic meter. In mathematics, the word null is often associated with the concept of zero or the concept of nothing. And it is probably this association which drives the misconception that the statistical null hypothesis is by definition a test against 0. However, a null hypothesis is a type of conjecture used in statistics that proposes that there is no difference/relationship between certain characteristics of a population or data-generating process.
Significant
Something significant in English is sufficiently great or important to be worthy of attention. In research the term statistically significant is used when the null hypothesis is rejected with a sufficiently small p-value. The confusion arises when a statistically significant result is advertised as being significant (i.e. important) and meaning is attached to the size of the p-value. Much ink has flowed over the pros and cons of this practice and the misconceptions arising from it. But as a principle, you as a researcher need to keep in mind that a statistically significant effect is not, by definition, an important effect.
From experience talking to clients, I would like to add two more words to this list that often cause confusion because they indicate different things in different areas, or even in different software packages.
Factor
In English a factor can be a circumstance, fact, or influence that contributes to a result. In mathematics it is also a number or quantity that, when multiplied with another, produces a given number or expression. In statistics a factor can take on different meanings depending on the context. In an experiment, a factor (also called an independent variable) is an explanatory variable manipulated by the experimenter. In a broader context, especially in software packages like SPSS and R, a factor refers to an independent categorical variable. In factor analysis, a factor is a latent (unmeasured) variable that expresses itself through its relationship with other measured variables. The purpose of factor analysis is to analyse patterns of response as a way of getting at this underlying factor.
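To see what software does with a factor, here is a minimal sketch (the variable and level names are hypothetical) of the dummy coding that packages such as R and SPSS apply behind the scenes to a categorical explanatory variable before it enters a model:

```python
# A hypothetical experimental factor with three levels.
treatment = ["drug", "placebo", "drug", "control", "placebo"]

# Each level becomes a 0/1 indicator column for use in a model.
levels = sorted(set(treatment))
dummies = {lvl: [int(x == lvl) for x in treatment] for lvl in levels}

print(levels)           # ['control', 'drug', 'placebo']
print(dummies["drug"])  # [1, 0, 1, 0, 0]
```

In practice one level is usually dropped as a reference category, so the remaining indicator columns are not linearly dependent on the intercept.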
Covariate
Covariate has no particular meaning in everyday English, but in statistics two connotations are attached to it. Note that mathematically there is no distinction between the two interpretations.
Covariates are often taken to mean continuous predictor variables; SPSS is particularly guilty of that. In its original sense, however, a covariate is a control variable, and some researchers use the term for any control variable, including controlling for the effects of a categorical variable. In SPSS, though, you would enter a categorical covariate as a fixed factor. Still following?
At the end of the day, lexical ambiguity can be resolved by being careful in setting up hypotheses, running analyses and interpreting results. This requires knowledge of the language as well as an understanding of the context in which you are applying statistics. Just as Rome wasn’t built in a day, these are just baby steps in the right direction. Through interaction with peers and experts, you will find your way and develop your statistical language.
References
Kaplan, J.J., Fisher, D.G., and Rogness, N.T. (2009). Lexical ambiguity in statistics: what do students know about the words association, average, confidence, random and spread? Journal of Statistics Education 17(3).
Marijke joined the Statistical Consulting Unit in May 2019. She is passionate about explaining statistics, especially to those who deem themselves not statistically gifted.