Excel: Calculating Linear Regression

Last reviewed: November 2, 1994
Article ID: Q51026

SUMMARY

Microsoft Excel can be very useful when you need to make predictions of future data based on historical data. One of the most common techniques used for making such predictions is that of linear regression.

MORE INFORMATION

For example, assume you have accumulated eight years' worth of sales figures for a company. These figures are as follows:

     |   A  |       B
   ------------------------
   1 | Year | Quantity Sold
   2 |   1  |     1250
   3 |   2  |     1540
   4 |   3  |     1798
   5 |   4  |     3300
   6 |   5  |     4000
   7 |   6  |     3700
   8 |   7  |     3100
   9 |   8  |     3484

From these numbers, you want to predict the sales for the next years (years 9, 10, and 11). To accomplish this, use the Excel TREND() function. The TREND() function accepts arrays of known y's and known x's, and returns a linear trend of values along the best fit line for these x's and y's. (Note: The cells in column C are formatted as "0.00".)

      |   A  |        B      |     C
    -------------------------------------
    1 | Year | Quantity Sold | Predicted
    2 |   1  |      1250     |  1528.00
    3 |   2  |      1540     |  1883.29
    4 |   3  |      1798     |  2238.57
    5 |   4  |      3300     |  2593.86
    6 |   5  |      4000     |  2949.14
    7 |   6  |      3700     |  3304.43
    8 |   7  |      3100     |  3659.71
    9 |   8  |      3484     |  4015.00
   10 |   9  |               |  4370.29
   11 |  10  |               |  4725.57
   12 |  11  |               |  5080.86

In cells C2:C12, enter the formula =TREND(B2:B9,A2:A9,A2:A12) as an array (by selecting C2:C12, typing the formula, and pressing COMMAND+ENTER). In this formula, cells B2:B9 are the known y's, A2:A9 are the known x's, and A2:A12 are the new x's (the points on the line for which we want to have y values returned).

You can now chart the data to view your results. If you're using Excel 2.20, do the following:

  1. Select the area A1:C12.

  2. From the Edit menu, choose Copy to copy this area to the Clipboard.

  3. From the File menu, choose New, select Chart, and click OK.

  4. From the Edit menu, choose Paste Special (Values in Columns, Series Names in First Row, Categories in First Column).

  5. From the Gallery menu, choose Line.

If you're using Excel version 1.00, 1.03, 1.04, 1.06, or 1.50, do the following:

  1. Select the area C1:C12.

  2. From the Edit menu, choose Copy to copy this area to the Clipboard.

  3. From the File menu, choose New, select Chart, and click OK.

  4. From the Edit menu, choose Paste.

  5. Use the Gallery menu to make a line chart.

  6. Using the Window menu, activate your worksheet.

  7. Select the area B1:B9.

  8. From the Edit menu, choose Copy.

  9. Using the Window menu, activate the chart you created in Step 3.

  10. From the Edit menu, choose Paste.

The chart created should look similar to the following (without the labels):

  6000.00|
         |
         |
  5000.00|     Predicted ==> !
         |              ___ /
         |             /  /\  % <== Actual
  4000.00|            / /   \/
         |           |/
         |          /|
  3000.00|        / /
         |      /  |
         |    /   /
  2000.00|  /   _/
         |/  _/
         ! /
  1000.00%
         |
         |
     0.00|
         ----------------------
         1 2 3 4 5 6 7 8 9 10 11

After predicting the values, you need to determine how accurately the predicted line reflects the actual data. To do this, calculate the coefficient of determination, or the "R squared" value, as follows:

   r2=sum(Yest-Yavg)2/sum(Y-Yavg)2

To begin with, calculate the average for the y's (Quantity Sold). In cell B15, enter the formula =AVERAGE(B2:B9) and receive the result of 2771.50. The worksheet now looks like the following:

      |   A  |       B       |       C
    --------------------------------------
    1 | Year | Quantity Sold |  Predicted
    2 |   1  |     1250      |   1528.00
    3 |   2  |     1540      |   1883.29
    4 |   3  |     1798      |   2238.57
    5 |   4  |     3300      |   2593.86
    6 |   5  |     4000      |   2949.14
    7 |   6  |     3700      |   3304.43
    8 |   7  |     3100      |   3659.71
    9 |   8  |     3484      |   4015.00
   10 |   9  |               |   4370.29
   11 |  10  |               |   4725.57
   12 |  11  |               |   5080.86
   13 |      |               |
   14 |      |               |
   15 | Yavg |               |   2771.5

Next, establish two columns to represent the two parts of your equation. In column D, enter the formula for (Yest-Yavg)2, and in column E, enter the formula for (Y-Yavg)2. In cell D2, enter the formula =(C2-$B$15)^2 and then fill the formula through cell D9. In cell E2, enter the formula =(B2-$B$15)^2 and then fill this formula down through cell E9. Now, calculate the coefficient of determination in cell E15 by entering the formula =SUM(D2:D9)/SUM(E2:E9). (Note: Columns C, D, and E are formatted as "0.00", and cell E15 is formatted as "General".)

|   |   A  |       B       |     C     |       D       |      E     |
| 1 | Year | Quantity Sold | Predicted | (Yest-Yavg)^2 | (Y-Yavg)^2 |
| 2 |   1  |     1250      |  1528.00  |  1546292.25   | 2314962.25 |
| 3 |   2  |     1540      |  1883.29  |   788924.62   | 1516592.25 |
| 4 |   3  |     1798      |  2238.57  |   284012.86   |  947702.25 |
| 5 |   4  |     3300      |  2593.86  |    31556.98   |  279312.25 |
| 6 |   5  |     4000      |  2949.14  |    31556.98   | 1509212.25 |
| 7 |   6  |     3700      |  3304.43  |   284012.86   |  862112.25 |
| 8 |   7  |     3100      |  3659.71  |   788924.62   |  107912.25 |
| 9 |   8  |     3484      |  4015.00  |  1546292.25   |  507656.25 |
|10 |   9  |               |  4370.29  |               |            |
|11 |  10  |               |  4725.57  |               |            |
|12 |  11  |               |  5080.86  |               |            |
|13 |      |               |           |               |            |
|14 |      |               |           |               |            |
|15 | Yavg |     2771.5    |           |               | 0.658952019|

The closer the R squared value comes to 1, the better the fit of data points to the predicted values. In this case, the R squared value returned is .65895, which represents a fairly good fit, but also indicates that our predicted values are not very reliable.


KBCategory: kbother
KBSubcategory:

Additional reference words: 1.00 1.03 1.04 1.06 1.50 2.20


THE INFORMATION PROVIDED IN THE MICROSOFT KNOWLEDGE BASE IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND. MICROSOFT DISCLAIMS ALL WARRANTIES, EITHER EXPRESS OR IMPLIED, INCLUDING THE WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT SHALL MICROSOFT CORPORATION OR ITS SUPPLIERS BE LIABLE FOR ANY DAMAGES WHATSOEVER INCLUDING DIRECT, INDIRECT, INCIDENTAL, CONSEQUENTIAL, LOSS OF BUSINESS PROFITS OR SPECIAL DAMAGES, EVEN IF MICROSOFT CORPORATION OR ITS SUPPLIERS HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. SOME STATES DO NOT ALLOW THE EXCLUSION OR LIMITATION OF LIABILITY FOR CONSEQUENTIAL OR INCIDENTAL DAMAGES SO THE FOREGOING LIMITATION MAY NOT APPLY.

Last reviewed: November 2, 1994
© 1998 Microsoft Corporation. All rights reserved. Terms of Use.