Problem Statement

Mileage is often thought of as a good predictor of the sale price of a used car. Does this conjecture hold for so-called “luxury cars”: Porsches, Jaguars, and BMWs? More precisely, do the slopes and intercepts differ when comparing mileage and price for these three brands of cars? To answer this question, data were randomly selected from an Internet car sale site. (Tweaked a bit from Cannon et al. 2013 [Chapter 1 and Chapter 4])
Competing Hypotheses

There are many hypothesis tests to run here. It’s important to first think about the model we will fit to address these questions. We want to predict price from mileage, but we are dealing with a more complicated example in this case: we also need to include dummy variables for car type in the model: \[\hat{Price} = b_0 + b_1 \cdot Mileage + b_2 \cdot Porsche + b_3 \cdot Jaguar.\] This is not exactly what the problem is asking for, though. It also wants us to see whether there is a difference in the slopes of the fitted lines for the three car types. To do so, we need to incorporate interaction terms between mileage and the dummy variables: \[\hat{Price} = b_0 + b_1 \cdot Mileage + b_2 \cdot Porsche + b_3 \cdot Jaguar + b_4 \cdot Mileage \cdot Jaguar + b_5 \cdot Mileage \cdot Porsche.\] In words, the null hypothesis for each coefficient is that the corresponding population coefficient is zero; the alternative is that it is not.
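Since the actual car-sale sample is not reproduced here, the interaction model above can be sketched in Python on simulated data. Everything below (coefficient values, ranges, seed) is made up for illustration; the data are generated noiselessly so that least squares recovers the coefficients exactly:

```python
import numpy as np

# Hypothetical illustration: build the design matrix for the interaction
# model and fit it by least squares on simulated (NOT the actual) data.
rng = np.random.default_rng(0)
n = 90
mileage = rng.uniform(5, 80, n)            # thousands of miles (assumed range)
brand = rng.integers(0, 3, n)              # 0 = BMW (baseline), 1 = Porsche, 2 = Jaguar
porsche = (brand == 1).astype(float)       # dummy variable for Porsche
jaguar = (brand == 2).astype(float)        # dummy variable for Jaguar

# Assumed "true" coefficients b_0 ... b_5 used to generate prices
b = np.array([40.0, -0.3, 20.0, -5.0, -0.2, -0.1])
X = np.column_stack([np.ones(n), mileage, porsche, jaguar,
                     mileage * jaguar, mileage * porsche])
price = X @ b                              # noiseless, so the fit is exact

# Least-squares estimates of the six model coefficients
b_hat, *_ = np.linalg.lstsq(X, price, rcond=None)
print(np.round(b_hat, 2))                  # recovers b exactly here
```

With real (noisy) data the estimates would of course differ from the population coefficients, which is exactly why the hypothesis tests below are needed.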
In symbols (with annotations): \(H_0: \beta_i = 0\) versus \(H_A: \beta_i \neq 0\), where each \(\beta_i\) is a population coefficient in the model above.
Set \(\alpha\)

It’s important to set the significance level before starting the testing using the data. Let’s set the significance level at 5% here.

Exploring the sample data
The scatterplot below shows the relationship between mileage, price, and car type.
Guess about statistical significance

It seems that there is a difference in the intercepts of the linear regressions for the three car types, since Porsches tend to sit above BMWs, which tend to sit above Jaguars. BMWs and Jaguars are a bit more clustered together, though. It’s hard to tell from the scatterplot alone whether the slopes will also be statistically significantly different. We add the fitted lines below:
Based on the plot, we might guess that at least one of the coefficients will be statistically different from zero, since the BMW line does not appear to be parallel with the others.

Check conditions

Remember that in order to use the shortcut (formula-based, theoretical) approach, we need to check that some conditions are met.
Test statistic

The test statistics are random variables based on the sample data. Here, we want a way to estimate the population coefficients \(\beta_i\). A good guess is the sample coefficients \(B_i\). Recall that these sample coefficients are actually random variables that vary as different samples are (theoretically, would be) collected. We next look at the fitted regression coefficients from our sample of data:
We are looking to see how likely it is for us to have observed sample coefficients \(b_{i, obs}\) or more extreme, assuming that the population coefficients are 0 (that is, assuming the null hypothesis is true). If the conditions are met and \(H_0\) is true, we can “standardize” the original test statistics \(B_i\) into \(T\) statistics that follow a \(t\) distribution with degrees of freedom \(df = n - k\), where \(k\) is the number of parameters in the model: \[ T = \dfrac{B_i - 0}{{SE}_i} \sim t(df = n - k) \] where \({SE}_i\) represents the standard deviation of the distribution of the sample coefficient \(B_i\).

Observed test statistic

While one could compute these observed test statistics by “hand”, the focus here is on the set-up of the problem and on understanding which formula for the test statistic applies. We can read the observed values from the fitted model’s regression table.
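As a small numeric sketch of the standardization above (the coefficient estimate and standard error below are hypothetical, not values from the actual regression table):

```python
# Hypothetical numbers, for illustration only:
b_i_obs = -0.211   # assumed observed sample coefficient
se_i = 0.061       # assumed standard error of that coefficient

n, k = 90, 6       # sample size and number of model parameters
df = n - k         # degrees of freedom for the t distribution

# Standardized (observed) test statistic: T = (B_i - 0) / SE_i
t_obs = (b_i_obs - 0) / se_i
print(df, round(t_obs, 2))   # df = 84, t_obs = -3.46
```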
Interpretations of the coefficients here also need to take the other terms in the model into account. We address a couple of the \(b_i\) interpretations below:
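For instance, the slope on mileage is not a single number in this model: each interaction term shifts it for the corresponding brand. A sketch with hypothetical coefficient values (following the equation above, where \(b_4\) goes with the Jaguar interaction and \(b_5\) with the Porsche interaction):

```python
# Hypothetical coefficient values (illustration only, not the fitted values):
b1 = -0.35   # mileage slope for the baseline brand (BMW)
b4 = -0.22   # Mileage * Jaguar interaction: shift in slope for Jaguars
b5 = 0.05    # Mileage * Porsche interaction: shift in slope for Porsches

# The slope on mileage depends on which brand dummies are switched on:
bmw_slope = b1            # both dummies are 0 for a BMW
jaguar_slope = b1 + b4    # Jaguar dummy = 1 switches on the b4 term
porsche_slope = b1 + b5   # Porsche dummy = 1 switches on the b5 term

print(round(bmw_slope, 2), round(jaguar_slope, 2), round(porsche_slope, 2))
# -0.35 -0.57 -0.3
```

This is why the test on \(b_4\) and \(b_5\) answers the “do the slopes differ?” part of the question: if both population interaction coefficients were zero, all three brands would share the slope \(b_1\).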
Note that an interpretation of the observed intercept can also be done:
We should be a little cautious of this prediction, though, since there are no cars in our sample of used cars that have zero mileage.

Compute \(p\)-values

The \(p\)-values correspond to the probability of observing a \(t_{90 - 6}\) value as extreme as, or more extreme than, the observed statistic \(t_{i, obs}\) in the null distribution. We show below how one of these \(p\)-values can be obtained.
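An exact \(p\)-value would use the \(t_{84}\) distribution; since 84 degrees of freedom makes the \(t\) distribution very close to standard normal, a stdlib-only sketch can approximate the two-sided \(p\)-value with the normal CDF. The observed statistic below is hypothetical, not the one from the actual fit:

```python
import math

# Hypothetical observed test statistic (not the actual value from the text).
t_obs = -3.46
df = 90 - 6   # large enough that t(df) is close to standard normal

def normal_cdf(z):
    # Phi(z) = 0.5 * (1 + erf(z / sqrt(2))) for the standard normal
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Two-sided p-value: probability of a statistic at least this extreme
# in either tail, under H_0.
p_value = 2.0 * (1.0 - normal_cdf(abs(t_obs)))
print(round(p_value, 4))   # well below alpha = 0.05, so we would reject H_0
```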
State conclusion

We, therefore, have sufficient evidence to reject the null hypothesis for at least one of the coefficients.

Cannon, Ann R., George W. Cobb, Bradley A. Hartlaub, Julie M. Legler, Robin H. Lock, Thomas L. Moore, Allan J. Rossman, and Jeffrey A. Witmer. 2013. STAT2: Building Models for a World of Data.