Regression (Chapters 5 and 14)

Regression

- We use regression to identify the line that best describes the relationship between two variables.

- Data points won’t all fall on a line, but regression gives us the one line that best fits the data.

- We can also use regression to see how one variable can be predicted or estimated from the other.

Y = criterion variable

X = predictor variable

- This regression line has a form that is similar to the linear model we talked about before:

Ŷ = a + bX

(This is known as the regression equation.)

- Ŷ (“Y hat”) is the predicted value of Y based on a given X score; b is the slope and a is the Y-intercept.

- We use predicted Y instead of actual Y in this equation because there is not a perfectly linear relationship between the two variables.

Calculation of the Slope and Intercept (defining the regression line)

- If we know the slope and Y-intercept, we can put in any X value and get the predicted Y value.
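As a minimal sketch of plugging X values into a regression equation, the snippet below uses a hypothetical slope and intercept (the numbers are chosen only for illustration and do not come from any data set in these notes):

```python
def predict_y(x, slope, intercept):
    """Return the predicted Y (Y hat) for a given X score."""
    return intercept + slope * x


# Hypothetical coefficients, chosen only for illustration.
slope = 0.5
intercept = 2.0

# Any X value can be put into the equation to get a predicted Y.
predictions = [predict_y(x, slope, intercept) for x in (0, 4, 10)]
print(predictions)  # [2.0, 4.0, 7.0]
```

Note that when X = 0 the predicted Y equals the intercept, which is what “Y-intercept” means.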

Slope (b):

b = SCP / SS_X

SCP is the numerator from our calculation of the correlation coefficient, and stands for the sum of cross products.

SS_X is the sum of squares for our X scores and is in the denominator of our correlation coefficient.

Intercept (a):

a = M_Y - b(M_X), where M_X and M_Y are the means of the X and Y scores

- To calculate the intercept, we first need to calculate the mean of each of our variables; the value of the slope (b) also goes into this equation.
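The slope and intercept calculations can be sketched in a few lines of Python. The function name `fit_line` and the five data points are made up for illustration; the steps follow the definitions above (SCP divided by SS_X for the slope, then the two means and the slope for the intercept).

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the regression line predicting Y from X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # SCP: sum of cross products of the deviations from each mean.
    scp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # SS_X: sum of squared deviations of the X scores.
    ss_x = sum((x - mean_x) ** 2 for x in xs)

    slope = scp / ss_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up scores, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_line(xs, ys)
print(round(slope, 2), round(intercept, 2))  # 0.6 2.2
```

For these scores SCP = 6 and SS_X = 10, so b = 0.6; the means are 3 and 4, so a = 4 - 0.6(3) = 2.2.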

There is some error that we will encounter when we use a regression equation to predict an individual’s Y value based on their X value.

- We could think of this error as the discrepancy between the person’s actual Y score and their predicted Y score:

error = Y - Ŷ (actual Y minus predicted Y)

- It is important to note that the stronger the correlation (the closer r is to +1 or -1), the closer together these two values will be.

- The least squares criterion defines the regression line so that the sum of the squared discrepancies is as small as possible across all data points.

- If we want to know about how much error will result when using our equation to make predictions, we calculate the standard error of estimate.

- The standard error of estimate is a statistic that we use as our index of how much error, on average, will result when we use the regression equation to predict (or calculate) Y scores from X scores.

- We can compute the estimated SE of estimate by using this formula:

estimated SE of estimate = √[ Σ(Y - Ŷ)² / (n - 2) ]

(Chapter 5 gave a formula for the SE of estimate, which describes the amount of prediction error in the sample only. The formula above, from Chapter 14, is for the estimated SE of estimate, which describes the amount of error in predicting Y from the equation for people beyond the sample; this is the formula you should use for the exam. The difference between the two formulas is analogous to the difference between the formula for the standard deviation and the formula for the standard deviation estimate.)

- Remember, the better X is at predicting Y, the smaller the estimated SE of estimate will be.

- Relating this to correlation: the stronger your correlation, the lower your estimated SE of estimate will be.
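The error ideas above can be sketched in Python. The function name and the data are made up for illustration, and the n - 2 in the denominator is an assumption based on the usual definition of the estimated SE of estimate. The sketch also checks a standard identity linking the error to the correlation: the sum of squared errors equals (1 - r²) times SS_Y, which is why a stronger correlation gives a smaller SE.

```python
import math


def regression_error_summary(xs, ys):
    """Return (estimated SE of estimate, SS of the errors, r, SS_Y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    scp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)

    slope = scp / ss_x
    intercept = mean_y - slope * mean_x

    # Error for each person: actual Y minus predicted Y.
    errors = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    ss_error = sum(e ** 2 for e in errors)

    se_estimate = math.sqrt(ss_error / (n - 2))
    r = scp / math.sqrt(ss_x * ss_y)
    return se_estimate, ss_error, r, ss_y


# Made-up scores, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

se, ss_error, r, ss_y = regression_error_summary(xs, ys)
print(round(se, 3))                     # 0.894
print(round(ss_error, 3))               # 2.4
print(round((1 - r ** 2) * ss_y, 3))    # 2.4
```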

Issues Associated with the Use of Correlation and Regression

Predicting X from Y

- The regression equation predicting X from Y is not the same as the regression equation predicting Y from X. For example, we would get a different slope and intercept if we were to calculate a regression equation predicting GRE scores from 1st-year GPA rather than 1st-year GPA from GRE scores.

- From a statistical standpoint, making one variable X and one variable Y is arbitrary.

- On a conceptual level, which variable is the predictor (X) and which is the criterion (Y) is important. If we want to try to predict whether or not people will be successful in a particular job, we would use an aptitude test for that job as a predictor of success, and not the reverse.
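To see that the two regression equations really do differ, the sketch below fits both directions on the same made-up scores (the helper `fit_line` and the data are hypothetical, for illustration only). It also shows that the X-from-Y line is not what you get by simply rearranging the Y-from-X equation.

```python
def fit_line(predictor, criterion):
    """Return (slope, intercept) for predicting the criterion from the predictor."""
    n = len(predictor)
    mean_p = sum(predictor) / n
    mean_c = sum(criterion) / n
    scp = sum((p - mean_p) * (c - mean_c) for p, c in zip(predictor, criterion))
    ss_p = sum((p - mean_p) ** 2 for p in predictor)
    slope = scp / ss_p
    return slope, mean_c - slope * mean_p


# Made-up scores, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b_yx, a_yx = fit_line(xs, ys)   # predicting Y from X
b_xy, a_xy = fit_line(ys, xs)   # predicting X from Y

print(round(b_yx, 2), round(a_yx, 2))   # 0.6 2.2
print(round(b_xy, 2), round(a_xy, 2))   # 1.0 -1.0

# Rearranging Y = 2.2 + 0.6X to solve for X would give a slope of 1 / 0.6,
# which is not the slope of the X-from-Y regression line.
print(round(1 / b_yx, 2))               # 1.67
```

The SCP is the same in both directions; what changes is the sum of squares in the denominator (SS_X versus SS_Y) and which mean is being predicted.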

Non-Linear Relationships

- Correlation can only be used to assess linear relationships; however, variables can be related without being linearly related (e.g., curvilinear relationships).

- The linear regression technique is only appropriate for looking at relationships that we assume to be linear.

- Just like with correlation, this procedure is not appropriate for relationships that are curvilinear.
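A small made-up example illustrates the problem. In the sketch below Y is determined exactly by X (Y = X²), yet both the correlation and the regression slope come out to zero, so the fitted line predicts the mean of Y for everyone. The data and variable names are hypothetical.

```python
import math

# Made-up curvilinear data: Y is exactly X squared.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

scp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
ss_x = sum((x - mean_x) ** 2 for x in xs)
ss_y = sum((y - mean_y) ** 2 for y in ys)

r = scp / math.sqrt(ss_x * ss_y)
slope = scp / ss_x
intercept = mean_y - slope * mean_x

# r and the slope are both 0 for this data, even though Y depends
# perfectly on X; the line just predicts the mean of Y (4.0) everywhere.
print(r, slope, intercept)
```

A scatterplot of the data is usually the quickest way to spot this kind of pattern before fitting a line.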