A Primer on Regression

Regression Analysis
© P.V. Viswanath, 2002

In regression analysis, we start with the set of points that we have in the graph above. We then posit a linear relation between the variable on the y-axis (r_Tr) and the variable on the x-axis (r_Ma): $r_{Tr} = a + b\,r_{Ma}$.

However, it is obvious that this relationship is not exact. Hence, we add a further term, called the residual, that takes account of this departure from linearity. Symbolically, we write: $r_{Tr,t} = a + b\,r_{Ma,t} + \varepsilon_t$, where t indexes the observations in the sample.

We then choose values for the constants a and b such that the importance of the residual or error term, $\varepsilon_t$, is minimized. To find a unique set of values for a and b, we impose the restriction that the sum of the squared residuals, $\sum_t \varepsilon_t^2$, be minimized, so that no individual residual becomes too large. This defines a unique value for b, which is related to the covariance:

$\hat{b} = \dfrac{\mathrm{Cov}(r_{Tr}, r_{Ma})}{\mathrm{Var}(r_{Ma})}$.

We put a hat over the b to indicate that this is an estimate of the true value of b in the linear relationship between r_Tr and r_Ma in the population.
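For readers who want the intermediate step, here is a sketch of the standard least-squares derivation. It assumes the observations are indexed by t and that bars denote sample means; this notation is introduced for the sketch and is not taken from the original figures.

```latex
\begin{aligned}
S(a,b) &= \sum_t \left(r_{Tr,t} - a - b\,r_{Ma,t}\right)^2 \\
\frac{\partial S}{\partial a} &= -2\sum_t \left(r_{Tr,t} - a - b\,r_{Ma,t}\right) = 0
  \;\Longrightarrow\; \hat{a} = \bar{r}_{Tr} - \hat{b}\,\bar{r}_{Ma} \\
\frac{\partial S}{\partial b} &= -2\sum_t r_{Ma,t}\left(r_{Tr,t} - a - b\,r_{Ma,t}\right) = 0
  \;\Longrightarrow\; \hat{b}
  = \frac{\sum_t (r_{Ma,t}-\bar{r}_{Ma})(r_{Tr,t}-\bar{r}_{Tr})}{\sum_t (r_{Ma,t}-\bar{r}_{Ma})^2}
  = \frac{\mathrm{Cov}(r_{Tr}, r_{Ma})}{\mathrm{Var}(r_{Ma})}
\end{aligned}
```

Setting both partial derivatives to zero gives two linear equations in a and b, which is why the minimizing pair is unique whenever r_Ma is not constant in the sample.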

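The estimate of a is $\hat{a} = \bar{r}_{Tr} - \hat{b}\,\bar{r}_{Ma}$, where the bars denote sample means. Then, for any given value of r_Ma, the predicted value of r_Tr is simply $\hat{r}_{Tr} = \hat{a} + \hat{b}\,r_{Ma}$. For our sample, this estimated relationship is shown in the output from a spreadsheet regression command: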

Constant = 0.0044271
X Coefficient(s) = 0.4557292
R Squared = 0.6995842
Std Err of Coef. = 0.1055852

The return on Trump stock, r_Tr, is the dependent variable, since we are predicting it, while r_Ma, the return on Marriott stock, is the independent variable. The Y-variable refers to the dependent variable, and the X-variable refers to the independent variable.

From this output, we see that the estimated regression equation is:

r_Tr = 0.004427 + 0.45573 r_Ma.
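As a minimal sketch of the estimation steps described above, the following Python computes the slope as the ratio of the sample covariance to the sample variance and then backs out the intercept. The return series are made-up numbers for illustration, not the Trump/Marriott sample, and the function names `ols_fit` and `sum_squared_residuals` are hypothetical.

```python
from statistics import mean


def ols_fit(x, y):
    """Least-squares intercept and slope for the line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sums of cross-products and squared deviations; the 1/(n-1) factors in
    # the sample covariance and variance cancel in the ratio.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b_hat = s_xy / s_xx
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat


def sum_squared_residuals(x, y, a, b):
    """Sum of squared vertical distances between y and the line a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical monthly returns (illustration only, not the sample in the text).
r_ma = [0.02, -0.01, 0.03, 0.00, -0.02, 0.04]
r_tr = [0.012, 0.001, 0.022, 0.003, -0.008, 0.024]

a_hat, b_hat = ols_fit(r_ma, r_tr)
print(f"r_Tr = {a_hat:.6f} + {b_hat:.6f} r_Ma")
print("Sum of squared residuals:", sum_squared_residuals(r_ma, r_tr, a_hat, b_hat))
```

Moving either estimate away from the fitted values and recomputing `sum_squared_residuals` gives a larger total, which is the sense in which the fitted line is the least-squares line.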

The standard error of the estimated coefficient, 0.1055852, indicates the degree of our confidence in the coefficient estimate: the smaller the standard error, the more precise the estimate. Just as the given sample is just one of the many samples that could have been generated by our posited underlying relationship between r_Tr and r_Ma, i.e. $r_{Tr,t} = a + b\,r_{Ma,t} + \varepsilon_t$, we can also think of this regression as one of the many corresponding regressions.

With this approach, the estimated coefficient for r_Ma is the estimated mean of the probability distribution of coefficients that such repeated regressions would generate. The standard error, 0.1056, is then like the estimated standard deviation of this probability distribution of coefficients.
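The repeated-regressions idea can be illustrated with a small simulation. This is a sketch under stated assumptions: the "true" parameters, the noise level, and the sample sizes are hypothetical values chosen for illustration, the r_Ma values are held fixed across samples, and the errors are drawn as independent normal variables.

```python
import random
from statistics import mean, stdev


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


# Hypothetical population relationship and simulation settings.
TRUE_A, TRUE_B, SIGMA = 0.004, 0.45, 0.03
N_OBS, N_SAMPLES = 24, 2000

rng = random.Random(0)
# The independent variable is drawn once and reused in every sample.
r_ma = [rng.gauss(0.01, 0.05) for _ in range(N_OBS)]

slopes = []
for _ in range(N_SAMPLES):
    # Each sample is one possible realization of the posited relationship.
    r_tr = [TRUE_A + TRUE_B * x + rng.gauss(0.0, SIGMA) for x in r_ma]
    slopes.append(ols_slope(r_ma, r_tr))

# Textbook standard error of the slope when the noise level is known.
x_bar = mean(r_ma)
theoretical_se = SIGMA / sum((x - x_bar) ** 2 for x in r_ma) ** 0.5

print("mean of estimated slopes:", mean(slopes))
print("std. dev. of estimated slopes:", stdev(slopes))
print("theoretical standard error:", theoretical_se)
```

The spread of the simulated slopes plays the role that the reported standard error plays for a single regression, where only one sample is available and the noise level has to be estimated from the residuals.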

This graph shows the actual relationship in the sample between r_Tr and r_Ma, and the estimated relationship:

As we saw earlier in our discussion of linear functions, the estimated intercept â is the predicted value of r_Tr if r_Ma were zero. The estimated coefficient of r_Ma in the regression equation, $\hat{b}$, is simply the slope of the estimated linear relationship.

For each pair of values, r_Ma and r_Tr, in the sample, we can compute the residual, which is equal to $r_{Tr} - \hat{r}_{Tr} = r_{Tr} - (\hat{a} + \hat{b}\,r_{Ma})$. In the graph, this is simply the vertical distance between each actual value and its predicted value.
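To make the predicted value and the residual concrete, here is a short sketch that plugs the estimated coefficients from the spreadsheet output into the regression equation. The single observation used at the end is hypothetical, since the underlying sample is not reproduced in the text, and the function names are illustrative.

```python
# Estimated coefficients copied from the spreadsheet output in the text.
A_HAT = 0.0044271  # Constant (intercept)
B_HAT = 0.4557292  # X Coefficient (slope on r_Ma)


def predict(r_ma):
    """Predicted return on Trump stock for a given return on Marriott stock."""
    return A_HAT + B_HAT * r_ma


def residual(r_ma, r_tr):
    """Actual minus predicted r_Tr: the vertical distance to the fitted line."""
    return r_tr - predict(r_ma)


# Hypothetical observation: Marriott returned 2.0% and Trump returned 1.5%.
obs_r_ma, obs_r_tr = 0.02, 0.015
print("predicted r_Tr:", predict(obs_r_ma))
print("residual:", residual(obs_r_ma, obs_r_tr))
print("prediction when r_Ma is zero:", predict(0.0))
```

A positive residual means the actual point lies above the fitted line, and a negative one means it lies below.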

Example:

2. Here are the results of a regression of the returns on Cisco Systems (CSCO) on the returns on a portfolio tracking the S&P 500, using data from February 1994 to December 1999:

Beta Regression for CSCO

Regression Statistics
Multiple R           0.510354
R Square             0.260461
Adjusted R Square    0.249743
Standard Error       0.093309
Observations         71

ANOVA
              df    SS          MS          F           Significance F
Regression    1     0.211581    0.211581    24.30136    5.45E-06
Residual      69    0.600752    0.008707
Total         70    0.812332

              Coefficients    Standard Error    t Stat      P-value     Lower 95%    Upper 95%
Intercept     0.027427        0.012218          2.244826    0.027988    0.003053     0.051801
R_CSCO        1.387554        0.281472          4.929641    5.45E-06    0.826034     1.949075

a. What is the estimated beta of CSCO? (5 points)
b. What percentage of the return on CSCO cannot be explained by movements in the S&P 500? (5 points)
c. Can you reject the hypothesis that the average return on CSCO can be explained by the CAPM over the sample period, taking into account the uncertainty in the system? Use the point estimate of the CSCO beta to answer this question. (10 points)
d. The beta estimation for CSCO over an earlier period, April 1990 to December 1993, is given below:

Regression using April 1990 to Dec. 1993

Regression Statistics
Multiple R           0.447362
R Square             0.200133
Adjusted R Square    0.181531
Standard Error       0.121373
Observations         45

ANOVA
              df    SS          MS          F          Significance F
Regression    1     0.158494    0.158494    10.7589    0.002062
Residual      43    0.633449    0.014731
Total         44    0.791943

              Coefficients    Standard Error    t Stat      P-value     Lower 95%    Upper 95%
Intercept     0.062248        0.018806          3.309956    0.001895    0.024321     0.100174
R_CSCO        1.62435         0.495217          3.280077    0.002062    0.62565      2.623049

g. (Bonus question) Did the true beta risk of CSCO change from the 1990-1993 period to the 1994-1999 period? Explain your answer. If you believe it did change, what, in your opinion, caused the change? (5 points)

Solution:

2. a. The estimated beta can be read off the Coefficients column as 1.387554, or approximately 1.39.

2. b. The R² of a regression measures the proportion of the variance of the dependent variable that can be explained by the independent variable. Hence, in this case, the percentage of the variation in the return on CSCO that cannot be explained by movements in the S&P 500 is 1 - 0.260461 = 0.739539, or approximately 74%.

2. c. The predicted value of the intercept under the CAPM, assuming that the estimated beta is correct, is (1 - 1.3876)(0.004432) = -0.001718, which is below the lower 95% confidence limit for the estimated intercept (0.003053) and therefore outside the confidence interval. This suggests that the CAPM has underestimated the return on CSCO for the period.

2. d. It certainly looks like the beta changed; the point estimate went from 1.624 in the earlier period to 1.388 in the later period. However, the 1.624 number is within the 95% confidence interval for the later period beta; the 1.388 in the later period is also within the 95% confidence interval for the earlier period beta. This makes it more difficult to insist that the beta has decreased. Similarly, it would probably be difficult to claim that the operations of CSCO have changed from the first period to the second. On the other hand, CSCO was a new stock in the earlier period with an unproven technology; in the last five to ten years, it has matured somewhat, and it is conceivable that this has decreased the sensitivity of CSCO's stock price to market movements.
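The arithmetic behind solutions 2.b to 2.d can be checked with a short sketch that uses only figures copied from the two regression outputs. Two things here are assumptions rather than part of the original solution: the value 0.004432 is read as the monthly risk-free rate, which the text does not label, and the z-statistic for the change in beta is an added rough check that treats the two estimates as independent because the sample periods do not overlap. The solution itself only compares point estimates with confidence intervals.

```python
import math

# Figures copied from the regression output for February 1994 to December 1999.
ss_regression, ss_total = 0.211581, 0.812332
beta_late, se_beta_late = 1.387554, 0.281472
intercept_ci_late = (0.003053, 0.051801)

# Figures copied from the regression output for April 1990 to December 1993.
beta_early, se_beta_early = 1.62435, 0.495217

# 2.b: share of the variance of CSCO returns not explained by the S&P 500.
r_squared = ss_regression / ss_total
unexplained_share = 1.0 - r_squared

# 2.c: intercept implied by the CAPM, using the rounded beta from the solution.
# ASSUMPTION: 0.004432 is the monthly risk-free rate; the text does not say so.
risk_free = 0.004432
capm_intercept = (1.0 - 1.3876) * risk_free
low, high = intercept_ci_late
capm_inside_ci = low <= capm_intercept <= high

# 2.d: rough z-statistic for the change in beta between the two periods.
# ASSUMPTION: the two estimates are independent (non-overlapping samples).
se_difference = math.sqrt(se_beta_early ** 2 + se_beta_late ** 2)
z_change = (beta_early - beta_late) / se_difference

print(f"R squared = {r_squared:.6f}, unexplained share = {unexplained_share:.4f}")
print(f"CAPM-implied intercept = {capm_intercept:.6f}, inside 95% CI: {capm_inside_ci}")
print(f"z for change in beta = {z_change:.3f}")
```

Under these assumptions the z-statistic is well below 1.96 in absolute value, which is consistent with the caution in 2.d about insisting that the beta decreased.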
