What is Considered to Be a “Strong” Correlation?

The closer the coefficient is to -1.0, the stronger the negative relationship; the closer it is to +1.0, the stronger the positive relationship. A correlation coefficient of zero, or close to zero, indicates no meaningful linear relationship between the variables. A coefficient of -1.0 or +1.0 indicates a perfect correlation, where a change in one variable perfectly predicts the change in the other.
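To make the endpoints of the scale concrete, here is a minimal sketch (with made-up data; the variable names are purely illustrative) showing a coefficient of exactly +1 for a perfectly linear increasing relationship, exactly -1 for a perfectly linear decreasing one, and a value near zero for unrelated noise:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(1, 21, dtype=float)

y_perfect_pos = 3 * x + 2           # exact linear increase -> r = +1
y_perfect_neg = -0.5 * x + 10       # exact linear decrease -> r = -1
y_noise = rng.normal(size=x.size)   # unrelated noise       -> r near 0

for label, y in [("+1 expected", y_perfect_pos),
                 ("-1 expected", y_perfect_neg),
                 ("~0 expected", y_noise)]:
    r = np.corrcoef(x, y)[0, 1]     # Pearson correlation coefficient
    print(f"{label}: r = {r:+.3f}")
```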

It’s important to note that two variables could have a strong positive correlation or a strong negative correlation. When we look at the matrix graph or the pairwise Pearson correlations table, we see that we have six possible pairwise combinations (every possible pairing of the four variables). Let’s say we wanted to examine the relationship between exercise and height. We would find the row in the pairwise Pearson correlations table where these two variables appear as Sample 1 and Sample 2. The correlation between exercise and height is 0.118 and the p-value is 0.026.

In this course, we have been using Pearson’s \(r\) as a measure of the correlation between two quantitative variables. Label your variables \(x\) and \(y\), since letters are easier to work with than full variable names. In this example, denote ‘test score (out of 10)’ by \(x\) and ‘hours playing video games per week’ by \(y\). Experimentation is an important complement to statistical measures and can be used to determine whether a strong correlation reflects a cause-and-effect relationship. For example, before the effects of smoking were better known, we could not have said that smoking causes lung cancer if we were only given that there was a strong correlation between the two. Further experimentation was needed to confirm that smoking does indeed cause lung cancer.
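As a quick illustration (the numbers below are hypothetical, not data from the course), Pearson’s \(r\) and its p-value for such an \(x\)–\(y\) pair can be computed directly; note that even a strongly negative \(r\) here would not, by itself, establish causation:

```python
from scipy import stats

# Hypothetical data: x = test score (out of 10), y = hours playing video games per week.
x = [9, 8, 7, 8, 6, 5, 7, 4, 6, 3]
y = [2, 4, 5, 3, 8, 10, 6, 12, 7, 14]

r, p = stats.pearsonr(x, y)
print(f"Pearson's r = {r:.3f}, p-value = {p:.4f}")   # a strong negative correlation here
```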

  • However, the definition of a “strong” correlation can vary from one field to the next.
  • As the numbers approach 1 or -1, the values demonstrate the strength of a relationship; for example, 0.92 or -0.97 would show, respectively, a strong positive and negative correlation.
  • When reporting correlation coefficients and describing their strength, take particular care to avoid misunderstandings.
  • Cramér’s V is an alternative to the phi coefficient for contingency tables larger than 2 × 2 (a short computational sketch follows this list).
  • All a strong correlation between two variables means is that the paired values tend to move together: in the same direction for positive \(r\), in opposite directions for negative \(r\).
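As a rough sketch of the Cramér’s V computation mentioned above (the contingency table is made up for illustration), the statistic can be derived from the chi-square statistic of the table:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 3 x 2 contingency table of counts (e.g., three groups vs. two outcomes).
table = np.array([[30, 10],
                  [25, 15],
                  [10, 30]])

chi2, p, dof, expected = chi2_contingency(table)
n = table.sum()
k = min(table.shape)                       # smaller of (number of rows, number of columns)
cramers_v = np.sqrt(chi2 / (n * (k - 1)))  # Cramer's V, usable for tables larger than 2 x 2
print(f"Cramer's V = {cramers_v:.3f}, chi-square p-value = {p:.4f}")
```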

Age (in years) and height (in centimeters) are both quantitative variables. From the scatterplot below we can see that the relationship is at least approximately linear. Remember that the Pearson correlation detects only a linear relationship! For coefficients that can detect other types of relationship, see our correlation calculator. There is a positive, moderately strong relationship between WileyPlus scores and midterm exam scores in this sample.
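To see why the linearity warning matters, the following sketch (purely illustrative) constructs a perfect but non-linear relationship, for which Pearson’s \(r\) is essentially zero:

```python
import numpy as np

# y is completely determined by x, but the relationship is quadratic, not linear.
x = np.linspace(-5, 5, 101)
y = x ** 2

r = np.corrcoef(x, y)[0, 1]
print(f"Pearson's r for y = x^2 on a symmetric range: {r:.3f}")   # approximately 0
```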

For non-normal distributions (i.e., data with extreme values or outliers), correlation coefficients should be calculated from the ranks of the data, not from the actual values. The coefficients designed for this purpose are Spearman’s rho (denoted \(r_s\)) and Kendall’s tau. Strictly speaking, normality matters for the significance tests and confidence intervals, not for the correlation coefficient itself. Kendall’s tau is preferred when the same rank is repeated many times in a small dataset. Some authors suggest that Kendall’s tau generalizes to the population more accurately than Spearman’s rho. When writing a manuscript, we often use words such as perfect, strong, good or weak to describe the strength of the relationship between variables.
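The sketch below (with made-up data containing one extreme value) illustrates the point: the rank-based coefficients are far less affected by the outlier than Pearson’s \(r\):

```python
from scipy import stats

# Hypothetical data; the last y value is an outlier.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 3, 3, 5, 6, 7, 8, 40]

r, _ = stats.pearsonr(x, y)       # sensitive to the extreme value
rho, _ = stats.spearmanr(x, y)    # Spearman's rho, computed from ranks
tau, _ = stats.kendalltau(x, y)   # Kendall's tau, also rank-based

print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}, Kendall tau = {tau:.3f}")
```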

Beware of Non-Linear Relationships

Also, keep in mind that even weak correlations can be statistically significant, as you will learn shortly. To obtain the rank variables, you just need to order the observations (in each sample separately) from lowest to highest. The smallest observation then gets rank 1, the second-smallest rank 2, and so on – the highest observation will have rank n. You only need to be careful when the same value appears in the data set more than once (we say there are ties).
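Here is a small sketch of the ranking step using SciPy’s rankdata (the sample values are made up); with the conventional treatment, tied observations share the average of the ranks they occupy:

```python
from scipy.stats import rankdata

# Hypothetical sample with a tie: the value 7 appears twice.
sample = [3, 7, 1, 7, 9]

ranks = rankdata(sample, method="average")   # tied values get the average of their ranks
print(ranks)                                 # [2.  3.5 1.  3.5 5. ]
```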

  • There are many other mind-blowing examples of crazy correlations between unrelated variables.
  • The calculator allows you to easily compute all of the different coefficients in no time.
  • Ranking observations from lowest to highest is necessary in many statistical procedures; we’ve already covered it, e.g., in our Wilcoxon rank-sum test calculator.
  • The linear regression equation, in this case, will be a reliable model for future forecasts or predictions.
  • Authors of those definitions are from different research areas and specialties.

However, it is unclear where a “good” relationship turns into a “strong” one. The same value of \(r\) is labeled differently by different researchers. It is therefore essential to explicitly report both the strength and the direction of \(r\) when presenting correlation coefficients in manuscripts. One of the most commonly used correlation coefficients measures the strength of a linear relationship between two variables. It is known as the Pearson correlation coefficient, or Pearson’s \(r\), and is denoted as \(r\).

Correlation coefficient

The more data there is, the less likely it is that an outlier will skew the results to any significant degree. You may have noticed that all three of the regressions shown above also report an \(r^2\) (or \(R^2\)) value. Note also in the plot above that there are two individuals with apparent heights of 88 and 99 inches. A height of 88 inches (7 feet 4 inches) is plausible, but unlikely, and a height of 99 inches is certainly a coding error. Obvious coding errors should be excluded from the analysis, since they can have an inordinate effect on the results. It’s always a good idea to look at the raw data in order to identify any gross mistakes in coding.
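The sketch below (with simulated data standing in for the real dataset, which is not reproduced here) shows how a single gross coding error can noticeably shift the correlation coefficient:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated stand-in data: x loosely predicts y, as in the age/height example.
x = rng.normal(40, 10, 60)
y = 150 + 0.5 * x + rng.normal(0, 5, 60)

r_clean = np.corrcoef(x, y)[0, 1]

# Inject one implausible observation, mimicking a coding error.
x_err = np.append(x, 25)
y_err = np.append(y, 250)
r_with_error = np.corrcoef(x_err, y_err)[0, 1]

print(f"r without the coding error: {r_clean:.3f}")
print(f"r with one coding error:    {r_with_error:.3f}")
```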

Generally, whenever the term “correlation coefficient” is used without specifying the type, this is the coefficient being referenced. It is calculated using different formulas depending on whether the collected data represent a population or a sample. We previously created a scatterplot of quiz averages and final exam scores and observed a linear relationship. Here, we will compute the correlation between these two variables.
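For reference, the two standard formulas are shown below: the population coefficient \(\rho\) is defined from the population covariance and standard deviations, while the sample coefficient \(r\) is computed from deviations about the sample means.

\[
\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y},
\qquad
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
\]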

Examples of Positive and Negative Correlation Coefficients

[Figure: scatterplot of systolic and diastolic blood pressures of a study group, by sex.]

The correlation between two variables is considered to be strong if the absolute value of \(r\) is greater than 0.75. However, the definition of a “strong” correlation can vary from one field to the next. These data are from the Journal of Statistics Education data archive. The Pearson correlation coefficient is most often denoted by \(r\) (and so this coefficient is also referred to as Pearson’s \(r\)). Our next step is to multiply each student’s WileyPlus \(z\) score by his or her midterm exam \(z\) score.

From this scatterplot we can determine that the relationship may be weak, but that it is reasonable to consider a linear relationship. If we were to draw a line of best fit through this scatterplot, we would draw a straight line with a slight upward slope. The first step is to convert every WileyPlus score to a \(z\) score and every midterm score to a \(z\) score. When we constructed the scatterplot in Minitab, we were also provided with summary statistics, including the mean and standard deviation for each variable, which we need to compute the \(z\) scores. This Pearson correlation calculator helps you determine Pearson’s \(r\) for any given two-variable dataset.
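A minimal sketch of this \(z\)-score computation follows (the scores below are hypothetical, not the course’s actual WileyPlus and midterm data):

```python
import numpy as np

# Hypothetical WileyPlus and midterm scores.
wileyplus = np.array([82, 90, 75, 88, 95, 70, 85, 92])
midterm   = np.array([74, 85, 70, 80, 91, 65, 78, 88])
n = len(wileyplus)

# Step 1: convert each variable to z-scores using the sample mean and standard deviation.
z_w = (wileyplus - wileyplus.mean()) / wileyplus.std(ddof=1)
z_m = (midterm - midterm.mean()) / midterm.std(ddof=1)

# Step 2: multiply each student's pair of z-scores, sum, and divide by n - 1.
r = np.sum(z_w * z_m) / (n - 1)
print(f"Pearson's r = {r:.3f}")

# Cross-check against NumPy's built-in computation.
assert np.isclose(r, np.corrcoef(wileyplus, midterm)[0, 1])
```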

There is quite a lot of scatter, and the large number of data points makes it difficult to fully evaluate the correlation, but the trend is reasonably linear. Cramér’s V is an alternative to the phi coefficient for tables larger than 2 × 2; for Cramér’s V, however, a value greater than 0.25 is already described as a very strong relationship (Table 2). In this section, we show you step-by-step how to calculate Spearman’s correlation coefficient by hand.
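Here is a sketch of the by-hand calculation (with hypothetical, tie-free data), using the shortcut formula \(r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}\), which is valid only when there are no ties:

```python
from scipy.stats import rankdata, spearmanr

# Hypothetical paired observations with no tied values.
x = [86, 97, 99, 100, 101, 103, 106, 110, 112, 113]
y = [ 2, 20, 28,  27,  50,  29,   7,  17,   6,  12]

rx = rankdata(x)
ry = rankdata(y)
d = rx - ry                                   # rank differences
n = len(x)

rho_by_hand = 1 - 6 * sum(d**2) / (n * (n**2 - 1))
rho_scipy, _ = spearmanr(x, y)                # should agree when there are no ties
print(f"by hand: {rho_by_hand:.3f}, scipy: {rho_scipy:.3f}")
```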
