
A parameter has been saved by insisting on the continuity of the response at the join.
We might question which fit is preferable in this particular instance. For the high pop15 countries, we see that the imposition of continuity causes a change in sign for the slope of the fit. We might argue that the two groups of countries are so different, and that there are so few countries in the middle region, that we might not want to impose continuity at all.
We can have more than one knotpoint simply by defining more pairs of basis functions with different
knotpoints. Broken stick regression is sometimes called segmented regression. Allowing the knotpoints to
be parameters is worth considering but this will result in a nonlinear model.
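For instance, a continuous piecewise linear fit with two knotpoints can be coded with truncated "hinge" basis functions. The following is only a sketch on the savings data; the hinge parameterization and the knot locations 25 and 40 are illustrative choices, not values taken from the text:
> rhs <- function(x,c) ifelse(x > c, x-c, 0)   # the hinge function (x-c)+
> gb2 <- lm(sr ~ pop15 + rhs(pop15,25) + rhs(pop15,40), savings)
> summary(gb2)
Each hinge term switches on at its knotpoint, so the slope can change there while the fitted line remains continuous.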
8.2.2 Polynomials
Another way of generalizing the $X\beta$ part of the model is to add polynomial terms. In the one-predictor case, we have

$y = \beta_0 + \beta_1 x + \cdots + \beta_d x^d + \varepsilon$

which allows for a more flexible relationship, although we usually don't believe it exactly represents any underlying reality.
There are two ways to choose d:
1. Keep adding terms until the added term is not statistically significant.
2. Start with a large d and eliminate terms that are not statistically significant, starting with the highest-order term.
Warning: Do not eliminate lower-order terms from the model even if they are not statistically significant. An additive change in scale would change the t-statistics of all but the highest-order term. We would not want the conclusions of our study to be sensitive to changes of scale that ought to be inconsequential.
Let's see if we can use polynomial regression on the ddpi variable in the savings data. First fit a linear model:
> summary(lm(sr ~ ddpi, savings))
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.883 1.011 7.80 4.5e-10
ddpi 0.476 0.215 2.22 0.031
Residual standard error: 4.31 on 48 degrees of freedom
Multiple R-Squared: 0.0929, Adjusted R-squared: 0.074
F-statistic: 4.92 on 1 and 48 degrees of freedom, p-value: 0.0314
The p-value of ddpi is significant, so move on to a quadratic term:
> summary(lm(sr ~ ddpi + I(ddpi^2), savings))
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.1304 1.4347 3.58 0.00082
ddpi 1.7575 0.5377 3.27 0.00203
I(ddpi^2) -0.0930 0.0361 -2.57 0.01326
Residual standard error: 4.08 on 47 degrees of freedom
Multiple R-Squared: 0.205, Adjusted R-squared: 0.171
F-statistic: 6.06 on 2 and 47 degrees of freedom, p-value: 0.00456
Again the p-value of ddpi^2 is significant, so move on to a cubic term:
> summary(lm(sr ~ ddpi + I(ddpi^2) + I(ddpi^3), savings))
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.145360 2.198606 2.34 0.024
ddpi 1.746017 1.380455 1.26 0.212
I(ddpi^2) -0.090967 0.225598 -0.40 0.689
I(ddpi^3) -0.000085 0.009374 -0.01 0.993
Residual standard error: 4.12 on 46 degrees of freedom
Multiple R-Squared: 0.205, Adjusted R-squared: 0.153
F-statistic: 3.95 on 3 and 46 degrees of freedom, p-value: 0.0137
The p-value of ddpi^3 is not significant, so stick with the quadratic. What do you notice about the other p-values? Why do we find a quadratic model when the previous analysis on transforming predictors found that the ddpi variable did not need transformation? Check that starting from a large model (including the fourth power) and working downwards gives the same result.
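One way to carry out that check, sketched here without showing the output, is to fit the quartic directly and use update() to drop the highest-order term whenever it is not significant:
> g4 <- lm(sr ~ ddpi + I(ddpi^2) + I(ddpi^3) + I(ddpi^4), savings)
> summary(g4)                        # examine the I(ddpi^4) term first
> g3 <- update(g4, . ~ . - I(ddpi^4))
> summary(g3)                        # then I(ddpi^3), and so on down to the quadratic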
To illustrate the point about the significance of lower order terms, suppose we transform ddpi by
subtracting 10 and refit the quadratic model:
> savings <- data.frame(savings, mddpi = savings$ddpi - 10)
> summary(lm(sr ~ mddpi + I(mddpi^2), savings))
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 13.4070 1.4240 9.41 2.2e-12
mddpi -0.1022 0.3027 -0.34 0.737
I(mddpi^2) -0.0930 0.0361 -2.57 0.013
Residual standard error: 4.08 on 47 degrees of freedom
Multiple R-Squared: 0.205, Adjusted R-squared: 0.171
F-statistic: 6.06 on 2 and 47 degrees of freedom, p-value: 0.00456
We see that the quadratic term remains unchanged but the linear term is now insignificant. Since there is
often no necessary importance to zero on a scale of measurement, there is no good reason to remove the
linear term in this model but not in the previous version. No advantage would be gained.
You have to refit the model each time a term is removed, and for large d there can be problems with numerical stability. Orthogonal polynomials get round this problem by defining
$z_1 = a_1 + b_1 x$
$z_2 = a_2 + b_2 x + c_2 x^2$
$z_3 = a_3 + b_3 x + c_3 x^2 + d_3 x^3$
and so on, where the coefficients $a, b, c, \ldots$ are chosen so that $z_i^T z_j = 0$ when $i \neq j$. The $z_i$ are called orthogonal polynomials. The value of orthogonal polynomials has declined with advances in computing speed, although they are still worth knowing about because of their numerical stability and ease of use. The poly() function constructs orthogonal polynomials.
> g <- lm(sr ~ poly(ddpi,4), savings)
> summary(g)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.6710 0.5846 16.54
poly(ddpi, 4)1 9.5590 4.1338 2.31 0.025
poly(ddpi, 4)2 -10.4999 4.1338 -2.54 0.015
poly(ddpi, 4)3 -0.0374 4.1338 -0.01 0.993
poly(ddpi, 4)4 3.6120 4.1338 0.87 0.387
Residual standard error: 4.13 on 45 degrees of freedom
Multiple R-Squared: 0.218, Adjusted R-squared: 0.149
F-statistic: 3.14 on 4 and 45 degrees of freedom, p-value: 0.0232
Can you see how we come to the same conclusion as above with just this summary? We can verify the
orthogonality of the design matrix when using orthogonal polynomials:
> x <- model.matrix(g)
> dimnames(x) <- list(NULL, c("Int","power1","power2","power3","power4"))
> round(t(x) %*% x,3)
Int power1 power2 power3 power4
Int 50 0 0 0 0
power1 0 1 0 0 0
power2 0 0 1 0 0
power3 0 0 0 1 0
power4 0 0 0 0 1
You can have more than one predictor, as can be seen in this response surface model:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2$
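As an illustration only (this particular model is not fit in the text), such a response surface could be fit with I() terms; the choice of pop15 and ddpi as the two predictors is an assumption made just for the example:
> grs <- lm(sr ~ pop15 + ddpi + I(pop15^2) + I(ddpi^2) + I(pop15*ddpi), savings)
> summary(grs)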
8.3 Regression Splines
Polynomials have the advantage of smoothness but the disadvantage that each data point affects the fit
globally. This is because the power functions used for the polynomials take non-zero values across the
whole range of the predictor. In contrast, the broken stick regression method localizes the influence of each
data point to its particular segment which is good but we do not have the same smoothness as with the
polynomials. There is a way we can combine the beneficial aspects of both these methods, smoothness and local influence, by using B-spline basis functions.
We may define a cubic B-spline basis on the interval $[a, b]$ by the following requirements on the interior basis functions with knotpoints at $t_1, \ldots, t_k$.
1. A given basis function is non-zero on an interval defined by four successive knots and zero elsewhere. This property ensures local influence.
2. The basis function is a cubic polynomial for each sub-interval between successive knots.
3. The basis function is continuous, and is also continuous in its first and second derivatives, at each knotpoint. This property ensures the smoothness of the fit.
4. The basis function integrates to one over its support.
The basis functions at the ends of the interval are defined a little differently to ensure continuity in derivatives at the edge of the interval. A full definition of B-splines and more details about their properties may be found in "A Practical Guide to Splines" by Carl de Boor.
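In R, a cubic B-spline basis of this kind can be generated with the bs() function in the splines package. A minimal sketch, with the interval and the knot locations chosen purely for illustration:
> library(splines)
> z <- seq(0, 1, by=0.01)
> B <- bs(z, knots=c(0.25,0.5,0.75), degree=3, intercept=TRUE)
> matplot(z, B, type="l", ylab="B-spline basis")   # each basis function is non-zero only locally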
Let's see how the competing methods do on a constructed example. Suppose we know the true model is

$y = \sin^3(2\pi x^3) + \varepsilon, \qquad \varepsilon \sim N(0, (0.1)^2)$
The advantage of using simulated data is that we can see how close our methods come to the truth. We
generate the data and display it in Figure 8.3.
> funky <- function(x) sin(2*pi*x^3)^3
> x <- seq(0, 1, by=0.01)
> y <- funky(x) + 0.1*rnorm(length(x))
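The comparison of the competing methods might then be sketched as follows; the polynomial degree and the spline degrees of freedom below are illustrative assumptions, not the choices used for the figures in the text:
> library(splines)
> gp <- lm(y ~ poly(x,12))        # a high-order orthogonal polynomial fit
> gs <- lm(y ~ bs(x, df=12))      # a cubic B-spline fit
> plot(x, y)
> lines(x, funky(x), lty=1)       # the true function
> lines(x, predict(gp), lty=2)
> lines(x, predict(gs), lty=3)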
