How to Calculate the Correlation Coefficient in Excel

Last reviewed: November 4, 1994
Article ID: Q66403

SUMMARY

The formula for the correlation coefficient is as follows:

{=SQRT(SUM((TREND(Y's,X's)-AVERAGE(Y's))^2/SUM((Y's-AVERAGE(Y's))^2))}

This MUST be entered as an array formula.

The example assumes the following:

   Known-Ys      Known-Xs
   --------      --------

   A1: 3100     B1: 3000
   A2: 4500     B2: 4000
   A3: 4400     B3: 5000
   A4: 5400     B4: 6000
   A5: 7500     B5: 7000
   A6: 8100     B6: 8000

{=SQRT(SUM((TREND(A1:A6,B1-B6)-AVERAGE(A1:A6)) ^2/SUM((A1:A6-AVERAGE(A1:A6))^2)))}

In this case, the known "Y" values are close to the estimated "Y" values. Applying the above formula for the correlation coefficient returns a value of .9335. This value is quite close to +1, indicating that the TREND function returns Y values that are good predictions.

MORE INFORMATION

You may want to know how closely the TREND function predicts new Ys according to your actual data. The correlation coefficient is a measure used primarily in psychological and educational research and helps to predict the "error" of the New Ys. The correlation coefficient is plus or minus the square root of the coefficient of determination. The correlation coefficient gives an approximation of the error of the Known-Ys in relation to the New-Ys (which are returned by the TREND function). The correlation coefficient is a number between -1 and +1.

A value close to 0 means that the residuals (distance between the Known-Ys and New-Ys) are quite large, and that the plotted points are relatively far from the regression line. In other words, the New-Ys do not predict the Known-Ys very well, and the model (best-fit line) is a failure. Conversely, if the coefficient is close to +1, the residuals are relatively small and the plotted points are close to the regression line.

In such a situation, the model is a relatively good predictor of the dependent variable (Known-Ys).


Additional reference words: 2.00 2.01 2.10 2.10c 2.10d 3.00 2.20 2.21
2.0 2.1 2.1c 2.1d 3.0 2.2 r r-squared


THE INFORMATION PROVIDED IN THE MICROSOFT KNOWLEDGE BASE IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND. MICROSOFT DISCLAIMS ALL WARRANTIES, EITHER EXPRESS OR IMPLIED, INCLUDING THE WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT SHALL MICROSOFT CORPORATION OR ITS SUPPLIERS BE LIABLE FOR ANY DAMAGES WHATSOEVER INCLUDING DIRECT, INDIRECT, INCIDENTAL, CONSEQUENTIAL, LOSS OF BUSINESS PROFITS OR SPECIAL DAMAGES, EVEN IF MICROSOFT CORPORATION OR ITS SUPPLIERS HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. SOME STATES DO NOT ALLOW THE EXCLUSION OR LIMITATION OF LIABILITY FOR CONSEQUENTIAL OR INCIDENTAL DAMAGES SO THE FOREGOING LIMITATION MAY NOT APPLY.

Last reviewed: November 4, 1994
© 1998 Microsoft Corporation. All rights reserved. Terms of Use.