Psychology Textbook Unit 9 Simple Linear Regression


Unit 9. Simple Linear Regression

CASTRO AND TOBY

Summary

This unit further explores the relationship between two variables with linear regression analysis. In a simple linear regression, we examine whether a linear relationship can be established between one predictor (or explanatory) variable and one outcome (or response) variable, such as when you want to see if time spent socializing can predict life satisfaction. Linear regression is a statistical analysis and the foundation for more advanced statistical techniques.

Prerequisite Units

Introduction to Statistical Significance
Correlational Measures

Linear Regression Concept

Imagine that we examine the relationship between social media use and anxiety in adolescents and, when analyzing the data, we obtain a correlation coefficient r indicating a positive correlation between social media use and anxiety.

The data points in the scatterplot on the top of Figure 9.1 illustrate this positive relationship. The relationship is even clearer and faster to grasp when you see the line included on the plot on the bottom. This line summarizes the relationship among all the data points, and it is called the regression line. This line is the best-fitting line for predicting anxiety based on the amount of social media use. It also allows us to better understand the relationship between social media use and anxiety. We will see in this unit how to obtain this line, and what exactly it means.

Figure 9.1. Scatterplots showing the relationship between social media use (in minutes) and anxiety scores. The two scatterplots are exactly the same, except that the one on the bottom has the regression line added.

A linear regression analysis, as the name indicates, tries to capture a linear relationship between the variables included in the analysis. When conducting a correlational analysis, you do not have to think about the underlying nature of the relationship between the two variables: it does not matter which of the two variables you call X and which you call Y. You will get the same correlation coefficient if you swap the roles of the two variables. However, the decision of which variable you call X and which variable you call Y does matter in regression, as you will get a different line if you swap the two variables. That is, the line that best predicts Y from X is not the same as the line that best predicts X from Y.

In the basic linear regression model, we are trying to predict, or trying to explain, a quantitative variable Y on the basis of a quantitative variable X. Thus:

The X variable in the relationship is typically named the predictor or explanatory variable. This is the variable that may be responsible for the values on the outcome variable.

The Y variable in the relationship is typically named the outcome, response, or criterion variable. This is the variable that we are trying to predict or explain.

In this unit, we will focus on the case of one single predictor variable; that is why this unit is called simple linear regression. But we can also have multiple possible predictors of an outcome; if that is the case, the analysis to conduct is called multiple linear regression. For now, let's see how things work when we have one possible predictor of one outcome variable.

Linear Regression Equation

You may be interested in whether the amount of caffeine intake (predictor) before a run can predict or explain faster running times (outcome), or whether the number of hours studying (predictor) can predict or explain better school grades (outcome). When you are doing a linear regression analysis, you model the outcome variable as a function of the predictor variable. For example, you can model school grades as a function of study hours. The formula to model this relationship looks like this:

Ŷ = b0 + b1X

And this is the meaning of each of its elements:

Ŷ (read as "Y-hat"): the predicted value of the Y variable
b0: the intercept, or the value of the Y variable when the predictor variable X equals zero
b1: the slope of the regression line, or the amount of change in Y for each change of one unit in X
X: the value of the predictor variable

When we conduct a study, the data from our two variables X and Y are represented by the points in the scatterplot, showing the different values of X and Y that we obtained. The line represents the Ŷ values, that is, the predicted Y values when we use the X and Y values that we measured in our sample to calculate the regression equation. Figure 9.2 shows a regression line depicting the relationship between number of study hours and grades, on which you can identify the intercept and the slope of the regression line.
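Before looking at the figure, it may help to see the equation expressed as code. The following is a minimal Python sketch of the prediction formula; the coefficient values are hypothetical, chosen only to illustrate how Ŷ is computed from X.

```python
# Minimal sketch of the simple linear regression equation: Y-hat = b0 + b1 * X.
# The coefficients below are hypothetical values chosen purely for illustration.

def predict(x, b0, b1):
    """Return the predicted value (Y-hat) for a given value of the predictor X."""
    return b0 + b1 * x

intercept = 72.0   # hypothetical b0: predicted Y when X = 0
slope = 2.5        # hypothetical b1: change in predicted Y per one-unit change in X

for hours in [0, 2, 5, 10]:
    print(f"Predicted grade after {hours} study hours: {predict(hours, intercept, slope):.1f}")
```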

Figure 9.2. Regression line depicting the relationship between number of study hours and grades. For the sake of clarity, only a few observations are included.

Here, the intercept is equal to 72, and the slope is equal to 2.5. You can check and see that, regardless of where along the line you measure the slope, its value is always the same. We have defined the intercept as the value of Y when X equals zero. Note that this value may be meaningful in some situations but not in others, depending on whether or not having a zero value for X has meaning, and on whether that zero value is very near or within the range of X values that we used to estimate the intercept. Analyzing the relationship between the number of hours studying and grades, the intercept would be the grade obtained when the number of study hours is equal to zero. Here, the intercept tells us the expected grade in the absence of any time studying. That is helpful to know.

But let's say, for example, that we are examining the relationship between weight in pounds (X variable) and body image satisfaction (Y variable). The intercept will be the score on our body image satisfaction scale when weight is equal to zero. Whatever the value of the intercept may be in this case, it is meaningless, given that it is impossible that someone weighs zero pounds. Thus, you need to pay attention to the possible values of X to decide whether the value of the intercept is telling you something meaningful. Also, make sure that you look carefully at the X-axis scale. If the X-axis does not start at zero, the intercept value is not depicted; that is, the value at the point in which the regression line touches the Y-axis is not the intercept if the X-axis does not start at zero.

The critical term in the regression equation is b1, the slope. We have defined the slope as the amount of increase or decrease in the value of the Y variable for a one-unit increase or decrease in the X variable. Therefore, the slope is a measure of the predicted rate of change in Y. Let's say that the slope in the linear regression with number of study hours and grades is equal to 2.5. That would mean that for each additional hour of study, the grade is expected to increase by 2.5 points. And, because the equation is modeling a linear relationship, this increase of 2.5 points will happen with a one-hour increase at any point within our range of X values; that is, whether the number of study hours increases by one hour at the low end of the range or at the high end, in both cases the predicted grade will increase by 2.5 points.

The different meanings of β

The letter b is used to represent a sample estimate of the β coefficient in the population. Thus b0 is the sample estimate of β0, and b1 is the sample estimate of β1.

As we mentioned in a previous unit, for sample statistics we typically use Roman letters, whereas for the corresponding population parameters we use Greek letters. However, be aware that β is also used to refer to the standardized regression coefficient, a sample statistic. Standardized data, or z-scores, are those that have been transformed so as to have a mean of zero and a standard deviation of one. So, when you encounter the symbol β, make sure you know to what it refers.

How to Find the Values for the Intercept and the Slope

The statistical procedure to determine the best-fitting straight line connecting two variables is called least-squares regression. The question now is to determine what we mean by the best-fitting line. It is the line that minimizes the distance to our data points or, in different words, the one that minimizes the error between the Y values that we obtained in our study and the Ŷ values predicted by the regression model. Visually, it is the line that minimizes the vertical distances from each of the individual points to the regression line. The distance or error between each predicted value and the actual value observed in the data is easy to calculate:

error = Y − Ŷ

This error, or residual as it is also typically called, is not an error in the sense of a mistake. It tries to capture variations in the outcome variable that may occur due to unpredicted or unknown factors. If the observed data point lies above the line, the error is positive, and the line underestimates the actual data value for Y.

If the observed data point lies below the line, the error is negative, and the line overestimates that actual data value for Y (see the error for each data point in green in Figure 9.3). For example, in Figure 9.3, our regression model overestimates the grade for some of the data points, whereas it underestimates the grade when the number of study hours is 10.

Figure 9.3. The same regression line as in Figure 9.2, depicting the relationship between number of study hours and grades. Here, the errors or residuals, that is, the distances between each predicted value (on the regression line) and the actual value observed in the data (data points in grey), are indicated with the green lines.

Because some data points are above the line and some are below the line, some error values will be positive while others will be negative, so that the sum of all the errors will be zero.

Because it is not possible to conduct meaningful calculations with zero values, we need to calculate the sum of all the squared errors. The result will be a measure of overall or total squared error between the data and the line:

total squared error = Σ(Y − Ŷ)²

So, the best-fitting line will be the one that has the smallest total squared error, or the smallest sum of squared residuals. In addition, the residuals or error terms are also very useful for checking the linear regression model assumptions, as we shall see below. Getting back to our calculations, we know that our regression equation will be the one that minimizes the total squared error. So, how do we find the specific values of b0 and b1 that generate the best-fitting line?
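To make the idea of total squared error concrete, here is a minimal Python sketch that computes the residuals and their squared sum for a candidate line. The study-hours and grade values, as well as the two candidate lines, are hypothetical and serve only to illustrate the criterion.

```python
# Hypothetical data: study hours (X) and grades (Y), made up purely for illustration.
hours = [2, 4, 6, 8, 10, 12]
grades = [75, 82, 84, 90, 94, 99]

def total_squared_error(x_values, y_values, b0, b1):
    """Sum of squared residuals (Y - Y-hat)**2 for a candidate line Y-hat = b0 + b1 * X."""
    predictions = [b0 + b1 * x for x in x_values]
    residuals = [y - y_hat for y, y_hat in zip(y_values, predictions)]
    # For the true least-squares line, the plain sum of residuals is exactly zero.
    print("Sum of residuals:", round(sum(residuals), 2))
    return sum(e ** 2 for e in residuals)

# The candidate line with the smaller total squared error fits the data better.
print(total_squared_error(hours, grades, b0=72, b1=2.3))   # closer fit
print(total_squared_error(hours, grades, b0=60, b1=4.0))   # poorer fit
```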

We start by calculating b1, the slope:

b1 = r (sY / sX)

where sX is the standard deviation of the X values, sY is the standard deviation of the Y values (which you learned to calculate in a previous unit), and r is the correlation coefficient between X and Y (which you also learned to calculate previously). Once we have found the value for the slope, it is easy to calculate the intercept:

b0 = MY − b1 MX

where MY is the mean of the Y values and MX is the mean of the X values.

Practice

Any statistical software will make these calculations for you. But, before learning how to do it with statistical software or Excel, let's find the intercept and the slope of a regression equation for the small dataset containing the number of study hours before an exam and the grade obtained in that exam, for 15 participants, that we used previously. These are the data:

Hours: 16, 14, 12, 15, 18, 20, 16, 17, 13, 12, 14
Grade: 78, 89, 85, 84, 86, 95, 96, 83, 81, 93, 92, 84, 83, 88

As you can tell from the formulas above, we need to know the mean and the standard deviation for each of the variables (see an earlier unit). From our previous calculations, we know the mean number of hours (MX) and the mean grade obtained in the exam (MY).

We also know the standard deviation for hours (sX) and the standard deviation for grade (sY). In addition, from our earlier calculations we know the correlation coefficient r for the relationship between study hours and grade. So, let's calculate first the slope of our regression line:

b1 = r (sY / sX)

Once we have b1, we calculate the intercept from the mean grade in the exam (MY) and the mean number of study hours (MX):

b0 = MY − b1 MX = 65.47

So, our regression equation is Ŷ = 65.47 + b1X, with b1 equal to the slope we just calculated.
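As a sanity check on these formulas, here is a minimal Python sketch that computes the slope and intercept from raw data using b1 = r(sY/sX) and b0 = MY − b1 MX. The hours and grades arrays below are hypothetical stand-ins rather than the paired dataset from the table above.

```python
import statistics as stats

# Hypothetical study-hours (X) and grade (Y) data, used only to illustrate the formulas.
hours = [16, 14, 12, 15, 18, 20, 16, 17, 13, 12, 14]
grades = [78, 89, 85, 84, 86, 95, 96, 83, 81, 93, 92]

# Sample means and standard deviations.
mean_x, mean_y = stats.mean(hours), stats.mean(grades)
sd_x, sd_y = stats.stdev(hours), stats.stdev(grades)

# Pearson correlation coefficient r between X and Y (requires Python 3.10+).
r = stats.correlation(hours, grades)

# Slope and intercept of the least-squares regression line.
b1 = r * (sd_y / sd_x)
b0 = mean_y - b1 * mean_x

print(f"slope b1 = {b1:.2f}, intercept b0 = {b0:.2f}")
print(f"regression equation: Y-hat = {b0:.2f} + {b1:.2f} * X")
```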

This means that the expected grade of someone who studies zero hours will be 65.47 (the intercept), and that for each additional hour of study, a student is expected to increase their grade by b1 points (the slope).

Linear Regression as a Tool to Predict Individual Outcomes

The regression equation is useful to make predictions about the expected value of the outcome variable given some specific value of the predictor variable.

Following with our example, if a student has studied for 15 hours, their predicted grade is obtained by plugging X = 15 into the regression equation. Of course, this prediction will not be perfect. As you have seen in the figures above, our data points do not fit perfectly on the line. Normally, there is some error between the actual data points and the data predicted by the regression model. The closer the data points are to the line, the smaller the error is. In addition, be aware that we can only calculate the Ŷ value for values of X that are within the range of X values that we included in the calculation of the regression model (that is, we can interpolate). We cannot know if values that are smaller or larger than our range of X values will display the same relationship. So, we cannot make predictions for values outside the range of X values that we have (that is, we cannot extrapolate).

Linear Regression as an Explanatory Tool

Note that, despite the possibility of making predictions, most of the time that we use regression in psychological research we are not interested in making actual predictions for specific cases. We typically are more concerned with finding general principles than with making individual predictions. We want to know if studying for a longer amount of time will lead to better grades (although you probably already know this), or we want to know if social media use leads to increased anxiety in adolescents. The linear regression analysis provides us with an estimate of the magnitude of the impact of a change in one variable on another. This way, we can better understand the overall relationship.
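The interpolation caveat discussed above is easy to enforce in code. Below is a minimal Python sketch of a prediction helper that refuses to extrapolate beyond the range of X values used to fit the model; the coefficients and the observed range are hypothetical values chosen for illustration.

```python
# Hypothetical fitted coefficients and the range of X values observed in the sample.
B0, B1 = 65.47, 1.4          # intercept and slope (illustrative values only)
X_MIN, X_MAX = 10, 20        # hypothetical range of study hours in the data

def predict_grade(hours):
    """Predict a grade from study hours, but only within the observed range (interpolation)."""
    if not X_MIN <= hours <= X_MAX:
        raise ValueError(f"{hours} hours lies outside the observed range [{X_MIN}, {X_MAX}]; "
                         "the model should not be used to extrapolate.")
    return B0 + B1 * hours

print(predict_grade(15))     # interpolation within the observed range: fine
# predict_grade(40)          # extrapolation outside the range: would raise ValueError
```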

Linear Regression as a Statistical Tool in Both Correlational and Experimental Research

Linear regression is a statistical technique that is independent of the design of your study. That is, whether your study is correlational or experimental, if you have two numerical variables, you could use a linear regression analysis. You need to be aware that, as mentioned at different points in this book, if the study is correlational, you cannot make causal statements.

Assumptions

In order to conduct a linear regression analysis, our data should meet certain assumptions. If one or more of these assumptions are violated, then the results of our linear regression may be unreliable or even misleading. These are the four assumptions:

1) The relationship between the variables is linear. The values of the outcome variable can be expressed as a linear function of the predictor variable. The easiest way to find out if this assumption is met is to examine the scatterplot with the data from the two variables, and see whether or not the data points fall along a straight line.

2) Observations are independent. The observations should be independent of one another and, therefore, the errors should also be independent. That is, the error (the distance from the obtained Y to the Ŷ predicted by the model) for one data point should not be predictable from knowledge of the error for another data point. Knowing whether this assumption is met requires knowledge of the study design, method, and procedure. If the linear regression model is not adequate, this does not mean that the data cannot be analyzed; rather, other analyses are required to take into account the dependence among the data observations.

3) Constant variance at every value of X. Another assumption of linear regression is that the residuals should have constant variance at every value of the X variable. In other words, the variation of the observations around the regression line is constant. This is known as homoscedasticity. You can see on the scatterplot on the left side of Figure 9.4 that the average distance between the data points above and below the line is quite similar regardless of the value of the X variable. That is what homoscedasticity means. However, on the scatterplot on the right, you can see that the data points are close to the line for the smaller values of X, so that variance is small at these values; but, as the value of X increases, the values of Y vary a lot, so some data points are close to the line while others are more spread out. In this case, the constant variance assumption is not met, and we say that the data show heteroscedasticity.

Figure 9.4. On the left, a scatterplot showing homoscedasticity; that is, the variance is similar at all levels of the X variable. On the right, a scatterplot showing heteroscedasticity; in this case, the variance of the observations for the lower values of the X variable is much smaller (the data points are tighter and closer to the line) than the variance for the higher values of the X variable.

When heteroscedasticity (that is, the variance is not constant but varies depending on the value of the predictor variable) occurs, the results of the analysis become less reliable because the underlying statistical procedures assume that homoscedasticity is true. Alternative methods must be used when the data show heteroscedasticity. When we have a linear regression model with just one predictor, it may be possible to see whether or not the constant variance assumption is met just by looking at the scatterplot of X and Y. However, this is not so easy when we have multiple predictors, that is, when we are conducting a multiple linear regression. So, in general, in order to evaluate whether this assumption is met, once you fit the regression line to a set of data, you can generate a plot that shows the fitted values of the model against the residuals of those fitted values. This plot is called a residual by fit plot, a residual by predicted plot, or, simply, a residuals plot. In a residual plot, the X-axis depicts the predicted or fitted values (Ŷs), whereas the Y-axis depicts the residuals or errors, as you can see in Figure 9.5.

If the assumption of constant variance is met, the residuals will be randomly scattered around the center line of zero, with no obvious pattern; that is, the residuals will look like an unstructured cloud of points, with a mean around zero. If you see some different pattern, heteroscedasticity is present.

Figure 9.5. On the left, a residuals plot where the errors are evenly distributed without showing any pattern, an indication of homoscedasticity. On the right, a residuals plot where the errors show different patterns depending on the fitted values of Ŷ, a sign of heteroscedasticity.

4) Errors or residuals are normally distributed. The distribution of the outcome variable Y for each value of the predictor variable X should be normal; in other words, the errors or residuals are normally distributed. The same as with the assumption of constant variance, it may be possible to visually identify whether this assumption is met by looking at the scatterplot of X and Y. In Figure 9.4 above, for example, you can see that in both scatterplots the distance between the actual values of Y and the predicted values of Ŷ is quite evenly distributed for each value of the X variable, suggesting therefore a normal distribution of the errors at each value of X. So, note that, even if the constant variance assumption is not met, the residuals can still be normally distributed.

A normal probability plot, or Q-Q (quantile-quantile) plot, of all of the residuals is a good way to check this assumption. In a Q-Q plot, the Y-axis depicts the ordered, observed, standardized residuals. On the X-axis are the ordered theoretical residuals, that is, the expected residuals if the errors are normally distributed (see Figure 9.6). If the points on the plot form a fairly straight diagonal line, then the normality assumption is met.

Figure 9.6. On the left, a Q-Q plot showing a normal distribution of the residuals, so that this assumption is met. On the right, a Q-Q plot showing a non-normal distribution of the residuals, so that this assumption is not met.

It is important to check that your data meet these four assumptions. But you should also know that regression is reasonably robust to the equal variance assumption. Moderate degrees of violation will not be problematic. Regression is also quite robust to the normality assumption. So, in reality, you only need to worry about severe violations.
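As a practical illustration of the two diagnostic plots described above, here is a minimal Python sketch that fits a least-squares line to hypothetical data and then draws a residuals-versus-fitted plot and a normal Q-Q plot of the residuals. It assumes numpy, scipy, and matplotlib are available; the data are generated at random purely for illustration.

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Hypothetical data: study hours (X) and grades (Y), generated only for illustration.
rng = np.random.default_rng(0)
hours = rng.uniform(10, 20, size=50)
grades = 65 + 1.4 * hours + rng.normal(0, 2, size=50)

# Least-squares slope and intercept, fitted values, and residuals.
b1, b0 = np.polyfit(hours, grades, deg=1)
fitted = b0 + b1 * hours
residuals = grades - fitted

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Residuals plot: an unstructured cloud around zero suggests constant variance.
ax1.scatter(fitted, residuals)
ax1.axhline(0, linestyle="--")
ax1.set_xlabel("Fitted values (Y-hat)")
ax1.set_ylabel("Residuals (Y - Y-hat)")
ax1.set_title("Residuals plot")

# Normal Q-Q plot: points along the diagonal suggest normally distributed residuals.
stats.probplot(residuals, dist="norm", plot=ax2)
ax2.set_title("Normal Q-Q plot of residuals")

plt.tight_layout()
plt.show()
```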