Using regression analysis to derive a demand curve, also t-stats, F-stat, R-squared, adjusted R-square

Regression analysis is a tool used to find the line that best fits a series of data points. In the usual linear case this is done by ordinary least squares, which minimizes the sum of the squared vertical distances between the points and the line. For example, if a company has data on the quantities of an item sold at various prices, a regression analysis can be used to calculate a best-fit line through this data, giving an estimated demand function. A spreadsheet package such as Microsoft Excel can typically be used to calculate the actual regression equation. For example, suppose one has the following data:
|| Units Sold || Price ||
|| 400 || $50 ||
|| 482 || $45 ||
|| 525 || $40 ||
|| 600 || $35 ||
|| 643 || $30 ||
|| 702 || $25 ||
|| 749 || $20 ||
|| 800 || $15 ||

A linear regression can be performed on this data in Excel by clicking Tools --> Data Analysis --> Regression. In Excel 2007 and later the path is Data --> Data Analysis --> Regression, and the Analysis ToolPak add-in must be enabled for the Data Analysis option to appear. Next, enter the Y range (the dependent variable, in this case units sold) and the X range (the independent variable, in this case price), then click OK. Excel then produces the following output.
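
To check the Excel coefficients without a spreadsheet, one can compute the slope and intercept directly from the standard least-squares formulas. The sketch below uses plain Python with no libraries. The helper name `fit_line` is a hypothetical choice for illustration, and the data are copied from the table above.

```python
# Data from the table above: price (X, independent) and units sold (Y, dependent).
prices = [50, 45, 40, 35, 30, 25, 20, 15]
units = [400, 482, 525, 600, 643, 702, 749, 800]


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x (one explanatory variable)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared X deviations.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    # The least squares line always passes through the point (x_mean, y_mean).
    intercept = y_mean - slope * x_mean
    return intercept, slope


intercept, slope = fit_line(prices, units)
print(f"Units Sold = {intercept:.2f} + ({slope:.2f}) * Price")
# prints: Units Sold = 977.01 + (-11.21) * Price
```

The values match the Intercept and X Variable 1 coefficients that Excel reports in the output that follows.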


|| Regression Statistics || ||
|| Multiple R || 0.99722459 ||
|| R Square || 0.994456883 ||
|| Adjusted R Square || 0.99353303 ||
|| Standard Error || 11.07343812 ||
|| Observations || 8 ||


|| || df || SS || MS || F || Significance F ||
|| Regression || 1 || 131992.1488 || 131992.1 || 1076.423 || 5.33356E-08 ||
|| Residual || 6 || 735.7261905 || 122.621 || || ||
|| Total || 7 || 132727.875 || || || ||


|| || Coefficients || Standard Error || t Stat || P-value ||
|| Intercept || 977.0119048 || 11.77618562 || 82.96506 || 2.07E-10 ||
|| X Variable 1 || -11.21190476 || 0.341733719 || -32.8089 || 5.33E-08 ||

The output contains a lot of information, and it takes a few moments to understand what it is saying. The first values to look at are the Intercept and X Variable 1 coefficients, which give the equation of the line: the Y intercept is 977.01 and the slope is -11.21. In other words, the estimated demand equation is Units Sold = 977.01 - (11.21 * Price), so each $1 increase in price is associated with about 11.21 fewer units sold.

With the equation of the line known, the next step is to determine whether the coefficients are statistically significant. Either the t Stat or the P-value can be used for this. The P-value is calculated from the t Stat, so the two carry the same information. The typical rule of thumb is that a coefficient is significant if the absolute value of its t Stat is greater than 2 or its P-value is less than 0.05. Here both coefficients pass easily, with t Stats of 82.97 and -32.81 and P-values far below 0.05. These rules of thumb are based on a 95 percent confidence level (a 5 percent significance level). The |t| > 2 cutoff is only an approximation: with the 6 residual degrees of freedom in this example, the exact 5 percent cutoff is about 2.447, so for small samples the P-value is the more reliable guide. Note also that the choice of significance level is somewhat subjective. Depending on the type of data being evaluated, other confidence levels, typically 90 or 99 percent, can be used, and the values needed for significance change accordingly. A full explanation of this is beyond the scope of this article, but it is important to be aware of it.
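
As a rough check of the t Stat column, each t Stat is the coefficient divided by its standard error. The sketch below is plain Python, and its variable names are illustrative rather than taken from Excel. It assumes the usual textbook standard-error formulas for a regression with one explanatory variable, then applies the |t| > 2 rule of thumb.

```python
import math

# Data from the table above.
prices = [50, 45, 40, 35, 30, 25, 20, 15]
units = [400, 482, 525, 600, 643, 702, 749, 800]
n = len(prices)
k = 2  # estimated coefficients: intercept and slope

x_mean = sum(prices) / n
y_mean = sum(units) / n
sxx = sum((x - x_mean) ** 2 for x in prices)
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(prices, units))
slope = sxy / sxx
intercept = y_mean - slope * x_mean

# Residual standard error s = sqrt(SSE / (n - k)); Excel labels it "Standard Error".
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(prices, units))
s = math.sqrt(sse / (n - k))

# Standard errors of the coefficients for a one-variable regression.
se_slope = s / math.sqrt(sxx)
se_intercept = s * math.sqrt(1 / n + x_mean ** 2 / sxx)

t_slope = slope / se_slope
t_intercept = intercept / se_intercept

for name, t in [("Intercept", t_intercept), ("Price", t_slope)]:
    verdict = "significant" if abs(t) > 2 else "not significant"
    print(f"{name}: t = {t:.4f} ({verdict} by the |t| > 2 rule)")
```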

The remaining output indicates how well the regression line actually fits the real-world data. The first value to inspect is R-square, which gives the percentage of the total variation in the dependent variable that is explained by the regression (Baye 100). In this case, the estimated demand equation explains about 99.45 percent of the total variation in units sold (R Square = 0.9945). For a regression that includes an intercept, R-square ranges from 0 to 1, with 1 being a perfect fit. As the output above shows, this example fits the data extremely well.
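
R-square can be reproduced from the SS column of the Excel output. The total variation (Total SS) splits into the part explained by the regression (Regression SS) and the leftover (Residual SS), and R-square is the explained share. The plain-Python sketch below refits the line so that it runs on its own. The variable names `sst`, `ssr`, and `sse` are illustrative.

```python
# Data from the table above.
prices = [50, 45, 40, 35, 30, 25, 20, 15]
units = [400, 482, 525, 600, 643, 702, 749, 800]
n = len(prices)

x_mean = sum(prices) / n
y_mean = sum(units) / n
slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(prices, units))
         / sum((x - x_mean) ** 2 for x in prices))
intercept = y_mean - slope * x_mean
fitted = [intercept + slope * x for x in prices]

# Decompose the total variation in units sold (Excel's "SS" column).
sst = sum((y - y_mean) ** 2 for y in units)               # Total
ssr = sum((f - y_mean) ** 2 for f in fitted)              # Regression (explained)
sse = sum((y - f) ** 2 for y, f in zip(units, fitted))    # Residual (unexplained)

r_squared = ssr / sst   # share of total variation explained by the regression
print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
print(f"R-square = {r_squared:.6f}")   # R-square = 0.994457
```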

The next value to examine is the adjusted R-square. This value adjusts R-square for the number of estimated coefficients relative to the number of observations (Baye 101), and the number of estimated coefficients must be smaller than the number of observations. Adding explanatory variables never lowers R-square, so a regression that estimates many coefficients from few observations can show a high R-square without being a good model. If a large number of coefficients had been estimated relative to the number of observations, the adjusted R-square could be dramatically lower than R-square and could even become negative. In this example the adjusted R-square (0.9935) is very close to R-square (0.9945), so one would conclude that the regression does not estimate too many coefficients for its eight observations.
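
The adjustment is usually written as 1 - (1 - R-square)(n - 1)/(n - k), where n is the number of observations and k the number of estimated coefficients, counting the intercept. The sketch below is a minimal Python version of that standard formula. The function name is hypothetical, and the second call uses made-up numbers only to show how the value can turn negative.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R-square for n observations and k estimated coefficients
    (k counts the intercept, so a one-variable regression has k = 2)."""
    if n <= k:
        raise ValueError("n must exceed the number of estimated coefficients k")
    return 1 - (1 - r_squared) * (n - 1) / (n - k)


# The demand example: 8 observations, intercept + slope, R Square from the Excel output.
print(round(adjusted_r_squared(0.994456883, n=8, k=2), 6))   # 0.993533

# Hypothetical illustration: a weaker fit (R-square 0.60) using 6 coefficients
# estimated from the same 8 observations gives a negative adjusted R-square.
print(round(adjusted_r_squared(0.60, n=8, k=6), 6))          # -0.4
```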

The final value to examine is the F-statistic. This value compares the amount of total variation that the regression explains to the variation it cannot explain (Baye 102). Simply put, the greater this value, the better the regression line fits the data. The regression in the example above has an F-statistic of 1076.42. Although this is a large number, by itself it means little, because how it should be read depends on the degrees of freedom. The value that does carry meaning is Significance F, which is 5.33E-08 (0.0000000533). Loosely speaking, this means that if price actually had no effect on units sold, there would be only about a 0.00000533 percent chance of getting an F-statistic this large purely by coincidence. The typical rule of thumb is that a Significance F below 5 percent (0.05) is considered significant. Based on this, the regression in the above example is significant. With only one explanatory variable, Significance F equals the P-value of the slope coefficient, which is why both are 5.33E-08 in the output.
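
The sketch below computes the F-statistic as (explained variation / (k - 1)) / (unexplained variation / (n - k)) and then its Significance F without a statistics library. It leans on two assumptions that hold for this example but not in general. First, with one explanatory variable, F(1, n - k) is the square of a t statistic with n - k degrees of freedom. Second, for an even number of degrees of freedom (6 here), the t distribution has a closed-form series. For models with several explanatory variables or odd degrees of freedom, a proper F distribution routine would be needed instead. The helper `t_two_tailed_p` is a hypothetical name.

```python
import math

# Data from the table above.
prices = [50, 45, 40, 35, 30, 25, 20, 15]
units = [400, 482, 525, 600, 643, 702, 749, 800]
n = len(prices)
k = 2  # estimated coefficients: intercept and slope

x_mean = sum(prices) / n
y_mean = sum(units) / n
sxx = sum((x - x_mean) ** 2 for x in prices)
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(prices, units))
slope = sxy / sxx
intercept = y_mean - slope * x_mean
fitted = [intercept + slope * x for x in prices]

ssr = sum((f - y_mean) ** 2 for f in fitted)               # explained variation
sse = sum((y - f) ** 2 for y, f in zip(units, fitted))     # unexplained variation

# F = (explained variation / (k - 1)) / (unexplained variation / (n - k))
df_reg, df_res = k - 1, n - k
f_stat = (ssr / df_reg) / (sse / df_res)


def t_two_tailed_p(t, df):
    """Two-tailed p-value P(|T| >= |t|) for Student's t with an EVEN df, via the
    closed-form series P(|T| < t) = sin(theta) * sum_j c_j * cos(theta)**(2j),
    where tan(theta) = |t| / sqrt(df), c_0 = 1 and c_j = c_(j-1) * (2j - 1) / (2j)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only handles positive, even degrees of freedom")
    cos2 = df / (df + t * t)                     # cos(theta) ** 2
    sin_theta = abs(t) / math.sqrt(df + t * t)   # sin(theta)
    term = total = 1.0
    for j in range(1, df // 2):
        term *= cos2 * (2 * j - 1) / (2 * j)
        total += term
    return 1.0 - sin_theta * total


# With a single explanatory variable (df_reg == 1), an F(1, df_res) statistic is the
# square of a t(df_res) statistic, so Significance F is a two-tailed t p-value.
significance_f = t_two_tailed_p(math.sqrt(f_stat), df_res)
print(f"F = {f_stat:.3f}, Significance F = {significance_f:.3e}")
```

The same `t_two_tailed_p(2.447, 6)` call returns roughly 0.05, which is where the 2.447 cutoff for 6 degrees of freedom comes from.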

Regression analysis has a wide variety of real-world uses. As in the example above, it can be used to estimate the demand curve for a particular item. If a company has data on prices and units sold, it can run a regression and use the resulting curve to predict what would happen to sales if the price of that product were raised or lowered. With those predicted sales, the company could then estimate whether a price change would increase or decrease its total revenue (price times units sold).
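
As an illustration, the estimated demand equation can be plugged into total revenue = price * units sold to compare a few price changes. The sketch below assumes a hypothetical current price of $30 and only tries prices between $15 and $50, the range the data actually cover. It also measures revenue, not profit, because the data say nothing about costs. The function names are illustrative.

```python
# Coefficients from the Excel output above (intercept and price slope).
INTERCEPT = 977.0119048
SLOPE = -11.21190476


def predicted_units(price):
    """Units sold predicted by the estimated demand equation."""
    return INTERCEPT + SLOPE * price


def revenue(price):
    """Total revenue = price * predicted units sold."""
    return price * predicted_units(price)


current_price = 30  # hypothetical starting price
for new_price in (25, 35):
    change = revenue(new_price) - revenue(current_price)
    direction = "rise" if change > 0 else "fall"
    print(f"${current_price} -> ${new_price}: revenue would {direction} "
          f"by about ${abs(change):,.0f}")
```

Under this estimated curve, raising the price from $30 to $35 raises predicted revenue, while cutting it to $25 lowers it.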


**Sample Questions:**

1: Based on the rules of thumb, the absolute value of the T-stat should be greater than which number for a coefficient to be significant? A. 1 B. 1.50 C. 2 D. 2.5

2: Based on the rules of thumb, the P-value should be less than which value in order for the coefficient to be considered significant? A. .20 B. .15 C. .10 D. .05

3: The possible values for R-square fall in what range? A. -1 to 1 B. -1 to 0 C. 0 to 1 D. 0 to infinity

4: True or False? The difference between the R-square value and the adjusted R-square value is that the adjusted R-square takes into consideration the number of estimated coefficients compared to the number of observations.

5: True or False? Based on the rule of thumb, if a regression has a Significance F equal to .13, then the regression is significant.

Answers: 1: C 2: D 3: C 4: True 5: False


**Works Cited**

Baye, Michael R. __Managerial Economics and Business Strategy__. New York: McGraw-Hill Irwin, 2006.