* STT 646, Section 6.9 example to illustrate
* multiple regression using a first order model;
* ch06eg.sas;
options ls=76;
;
filename data1 'ch06eg.dat';
;
data dwaine;
  infile data1;
  input x1 x2 y;
  * second-order regressors;
  x1x2 = x1*x2; x1x1=x1**2; x2x2=x2**2;
  * variables for testing LOF;
  A1=x1; A2=x2;
;
* print the raw data, sorted by x1 and x2;
proc sort; by x1 x2;
proc print;
  title 'Example from Section 6.9, first-order model, 2 predictors';
  title2 'The raw data.';
;
* Fit the first order regression model, printing various statistics
* of interest, and creating an output data set for residual analysis;
proc reg;
  model y = x1 x2 / alpha=0.05 clb covb cli clm p; *** r;
  * we get residuals. the option r would cause analysis of the residuals;
  output out=B p=yhat r=e;
  title2 'Regression analysis.';
;
* check for model Lack-Of-Fit;
proc plot;
  plot e*x1 e*x2 e*x1x2 / vref=0 vpos=18;
  title2 'Check for model LOF.';
;
* attempt generic LOF test (but fail for lack of d.f. for pure error);
proc glm;
  class A1 A2;
  model y = x1 x2 A1*A2;
  title2 'Failed attempt to do a generic LOF test.';
;
* test for LOF due to second-order terms;
proc reg;
  model y = x1 x2 x1x2 x1x1 x2x2 / ss1;
  title2 'Testing for LOF due to missing second-order terms.';
;
* duplicate test for LOF due to second-order terms;
proc glm;
  model y = x1 x2 x1*x2 x1*x1 x2*x2;
  title2 'Testing for LOF due to missing second-order terms.';
  contrast 'LOF' x1*x2 1, x1*x1 1, x2*x2 1;
;
* check the constant-variance assumption;
proc plot;
  plot e*yhat / vref=0 vpos=20;
  title2 'Check the constant-variance assumption.';
;
* check the normality assumption;
proc rank normal=Blom out=B;
  var e;
  ranks nscore;
proc plot;
  plot e*nscore / vref=0 href=0 vpos=20;
  title2 'Check the normality assumption.';
;
proc corr nosimple;
  var e nscore;
  title2 'Sample correlation for testing normality -- see Table B.6.';


Example from Section 6.9, first-order model, 2 predictors
The raw data.
15:43 Tuesday, October 7, 2003

Obs      x1      x2        y       x1x2       x1x1      x2x2      A1      A2
  1    38.4    16.0    137.2     614.40    1474.56    256.00    38.4    16.0
  2    41.3    16.5    146.4     681.45    1705.69    272.25    41.3    16.5
  3    42.9    15.8    145.3     677.82    1840.41    249.64    42.9    15.8
  4    45.2    16.8    164.4     759.36    2043.04    282.24    45.2    16.8
  5    46.9    17.3    181.6     811.37    2199.61    299.29    46.9    17.3
  6    47.8    16.3    154.6     779.14    2284.84    265.69    47.8    16.3
  7    48.9    16.6    145.4     811.74    2391.21    275.56    48.9    16.6
  8    49.5    15.9    152.8     787.05    2450.25    252.81    49.5    15.9
  9    51.7    16.3    144.0     842.71    2672.89    265.69    51.7    16.3
 10    52.0    17.2    163.2     894.40    2704.00    295.84    52.0    17.2
 11    52.3    16.0    166.5     836.80    2735.29    256.00    52.3    16.0
 12    52.5    17.8    161.1     934.50    2756.25    316.84    52.5    17.8
 13    53.1    17.7       .      939.87    2819.61    313.29    53.1    17.7
 14    65.4    17.6       .     1151.04    4277.16    309.76    65.4    17.6
 15    66.1    18.2    207.5    1203.02    4369.21    331.24    66.1    18.2
 16    68.5    16.7    174.4    1143.95    4692.25    278.89    68.5    16.7
 17    72.8    17.1    191.1    1244.88    5299.84    292.41    72.8    17.1
 18    82.7    19.1    224.1    1579.57    6839.29    364.81    82.7    19.1
 19    85.7    18.4    209.7    1576.88    7344.49    338.56    85.7    18.4
 20    87.9    18.3    241.9    1608.57    7726.41    334.89    87.9    18.3
 21    88.4    17.4    232.0    1538.16    7814.56    302.76    88.4    17.4
 22    89.6    18.1    232.6    1621.76    8028.16    327.61    89.6    18.1
 23    91.3    18.2    244.2    1661.66    8335.69    331.24    91.3    18.2


Regression analysis.

The REG Procedure
Model: MODEL1
Dependent Variable: y

                        Analysis of Variance

                               Sum of          Mean
Source             DF         Squares        Square   F Value   Pr > F
Model               2           24015         12008     99.10   <.0001
Error              18      2180.92741     121.16263
Corrected Total    20           26196

Root MSE           11.00739    R-Square    0.9167
Dependent Mean    181.90476    Adj R-Sq    0.9075
Coeff Var           6.05118

                        Parameter Estimates

                   Parameter      Standard
Variable     DF     Estimate         Error   t Value   Pr > |t|
Intercept     1    -68.85707      60.01695     -1.15     0.2663
x1            1      1.45456       0.21178      6.87     <.0001
x2            1      9.36550       4.06396      2.30     0.0333

Variable     DF      95% Confidence Limits
Intercept     1   -194.94801      57.23387
x1            1      1.00962       1.89950
x2            1      0.82744      17.90356

                      Covariance of Estimates

Variable         Intercept              x1              x2
Intercept     3602.0346742    8.7459395806    -241.4229923
x1            8.7459395806    0.0448515096    -0.672442604
x2            -241.4229923    -0.672442604    16.515755794

                         Output Statistics

         Dep Var    Predicted      Std Error
Obs            y        Value   Mean Predict         95% CL Mean
  1     137.2000     136.8460         4.0074    128.4268    145.2653
  2     146.4000     145.7470         3.7331    137.9041    153.5899
  3     145.3000     141.5184         4.1735    132.7502    150.2866
  4     164.4000     154.2294         3.5558    146.7591    161.6998
  5     181.6000     161.3849         4.4300    152.0778    170.6921
  6     154.6000     153.3285         3.2331    146.5361    160.1210
  7     145.4000     157.7382         2.9628    151.5136    163.9628
  8     152.8000     152.0551         4.1696    143.2952    160.8150
  9     144.0000     159.0013         3.2529    152.1672    165.8354
 10     163.2000     167.8666         3.3310    160.8684    174.8649
 11     166.5000     157.0644         4.0792    148.4944    165.6344
 12     161.1000     174.2132         5.0377    163.6294    184.7971
 13            .     174.1494         4.5986    164.4881    183.8107
 14            .     191.1039         2.7668    185.2911    196.9168
 15     207.5000     197.7414         4.3786    188.5424    206.9404
 16     174.4000     187.1841         3.8409    179.1146    195.2536
 17     191.1000     197.1849         3.4109    190.0188    204.3510
 18     224.1000     230.3161         5.8120    218.1054    242.5267
 19     209.7000     228.1239         4.1214    219.4652    236.7826
 20     241.9000     230.3874         4.2012    221.5610    239.2137
 21     232.0000     222.6857         5.3808    211.3810    233.9904
 22     232.6000     230.9870         4.4176    221.7059    240.2681
 23     244.2000     234.3963         4.5882    224.7569    244.0358

Obs       95% CL Predict        Residual
  1    112.2354    161.4566       0.3540
  2    121.3276    170.1664       0.6530
  3    116.7863    166.2506       3.7816
  4    129.9271    178.5317      10.1706
  5    136.4566    186.3132      20.2151
  6    129.2260    177.4311       1.2715
  7    133.7895    181.6869     -12.3382
  8    127.3259    176.7843       0.7449
  9    134.8870    183.1157     -15.0013
 10    143.7053    192.0280      -4.6666
 11    132.4018    181.7270       9.4356
 12    148.7807    199.6458     -13.1132
 13    149.0867    199.2121            .
 14    167.2589    214.9490            .
 15    172.8533    222.6295       9.7586
 16    162.6910    211.6772     -12.7841
 17    172.9744    221.3954      -6.0849
 18    204.1647    256.4675      -6.2161
 19    203.4304    252.8174     -18.4239
 20    205.6346    255.1402      11.5126
 21    196.9448    248.4266       9.3143
 22    206.0684    255.9056       1.6130
 23    209.3421    259.4506       9.8037

Sum of Residuals                           0
Sum of Squared Residuals          2180.92741
Predicted Residual SS (PRESS)     3002.92331


Check for model LOF.

[Plot of e*x1: residual e (vertical axis, -20 to 20, reference line at 0) versus x1 (horizontal axis, 30 to 100).]
NOTE: 2 obs had missing values.

[Plot of e*x2: residual e (vertical axis, -20 to 20, reference line at 0) versus x2 (horizontal axis, 15 to 20).]
NOTE: 2 obs had missing values.

[Plot of e*x1x2: residual e (vertical axis, -20 to 20, reference line at 0) versus x1x2 (horizontal axis, 600 to 1800).]
NOTE: 2 obs had missing values.


Failed attempt to do a generic LOF test.

The GLM Procedure

                      Class Level Information

Class    Levels    Values
A1           23    38.4 41.3 42.9 45.2 46.9 47.8 48.9 49.5 51.7 52 52.3 52.5
                   53.1 65.4 66.1 68.5 72.8 82.7 85.7 87.9 88.4 89.6 91.3
A2           20    15.8 15.9 16 16.3 16.5 16.6 16.7 16.8 17.1 17.2 17.3 17.4
                   17.6 17.7 17.8 18.1 18.2 18.3 18.4 19.1

Number of observations    23

NOTE: Due to missing values, only 21 observations can be used in this
      analysis.

The GLM Procedure
Dependent Variable: y

                              Sum of
Source             DF        Squares    Mean Square   F Value   Pr > F
Model              20    26196.20952     1309.81048         .        .
Error               0        0.00000              .
Corrected Total    20    26196.20952

R-Square    Coeff Var    Root MSE      y Mean
1.000000            .           .    181.9048

Source    DF      Type I SS    Mean Square   F Value   Pr > F
x1         1    23371.80630    23371.80630         .        .
x2         1      643.47581      643.47581         .        .
A1*A2     18     2180.92741      121.16263         .        .

Source    DF    Type III SS    Mean Square   F Value   Pr > F
x1         0       0.000000              .         .        .
x2         0       0.000000              .         .        .
A1*A2     18    2180.927411     121.162634         .        .

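
The zero error degrees of freedom in the GLM output above arise because no (x1, x2) combination is replicated among the 21 usable observations, so there is nothing left over to estimate pure error. As a rough cross-check outside SAS, the Python sketch below counts the distinct combinations. The pairs are copied from the raw-data listing (obs 13 and 14, which have missing y, are left out); the variable names are illustrative and not part of the original program.

```python
from collections import Counter

# (x1, x2) pairs for the 21 observations with non-missing y,
# copied from the raw-data listing (obs 13 and 14 are excluded).
pairs = [
    (38.4, 16.0), (41.3, 16.5), (42.9, 15.8), (45.2, 16.8), (46.9, 17.3),
    (47.8, 16.3), (48.9, 16.6), (49.5, 15.9), (51.7, 16.3), (52.0, 17.2),
    (52.3, 16.0), (52.5, 17.8), (66.1, 18.2), (68.5, 16.7), (72.8, 17.1),
    (82.7, 19.1), (85.7, 18.4), (87.9, 18.3), (88.4, 17.4), (89.6, 18.1),
    (91.3, 18.2),
]

n = len(pairs)                 # usable observations
c = len(Counter(pairs))        # distinct (x1, x2) combinations
p = 3                          # parameters in the first-order model

pure_error_df = n - c          # d.f. available for pure error
lack_of_fit_df = c - p         # d.f. assigned to lack of fit (the A1*A2 term)

print("n =", n, " distinct cells =", c)
print("pure-error d.f. =", pure_error_df)
print("lack-of-fit d.f. =", lack_of_fit_df)
```

With every combination unique, the whole residual sum of squares is absorbed by the A1*A2 term, which is why the F values and p-values print as missing.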

Testing for LOF due to missing second-order terms.

The REG Procedure
Model: MODEL1
Dependent Variable: y

                        Analysis of Variance

                               Sum of          Mean
Source             DF         Squares        Square   F Value   Pr > F
Model               5           24450    4890.03845     42.01   <.0001
Error              15      1746.01729     116.40115
Corrected Total    20           26196

Root MSE           10.78894    R-Square    0.9333
Dependent Mean    181.90476    Adj R-Sq    0.9111
Coeff Var           5.93109

                        Parameter Estimates

                   Parameter       Standard
Variable     DF     Estimate          Error   t Value   Pr > |t|    Type I SS
Intercept     1   1255.19625     1788.35038      0.70     0.4935       694876
x1            1      9.82840        9.84668      1.00     0.3340        23372
x2            1   -175.34203      239.25906     -0.73     0.4749    643.47581
x1x2          1     -0.81196        0.73130     -1.11     0.2844      8.43363
x1x1          1      0.04329        0.02552      1.70     0.1104    344.43998
x2x2          1      6.84045        8.14816      0.84     0.4144     82.03651

The GLM Procedure

Number of observations    23

NOTE: Due to missing values, only 21 observations can be used in this
      analysis.

The GLM Procedure
Dependent Variable: y

                              Sum of
Source             DF        Squares    Mean Square   F Value   Pr > F
Model               5    24450.19223     4890.03845     42.01   <.0001
Error              15     1746.01729      116.40115
Corrected Total    20    26196.20952

R-Square    Coeff Var    Root MSE      y Mean
0.933348     5.931091    10.78894    181.9048

Source    DF      Type I SS    Mean Square   F Value   Pr > F
x1         1    23371.80630    23371.80630    200.79   <.0001
x2         1      643.47581      643.47581      5.53   0.0328
x1*x2      1        8.43363        8.43363      0.07   0.7915
x1*x1      1      344.43998      344.43998      2.96   0.1060
x2*x2      1       82.03651       82.03651      0.70   0.4144

Source    DF    Type III SS    Mean Square   F Value   Pr > F
x1         1    115.9695479    115.9695479      1.00   0.3340
x2         1     62.5161970     62.5161970      0.54   0.4749
x1*x2      1    143.4952938    143.4952938      1.23   0.2844
x1*x1      1    335.1044771    335.1044771      2.88   0.1104
x2*x2      1     82.0365082     82.0365082      0.70   0.4144

Contrast    DF    Contrast SS    Mean Square   F Value   Pr > F
LOF          3    434.9101168    144.9700389      1.25   0.3283

                                 Standard
Parameter        Estimate           Error   t Value   Pr > |t|
Intercept     1255.196246     1788.350378      0.70     0.4935
x1               9.828403        9.846675      1.00     0.3340
x2            -175.342031      239.259063     -0.73     0.4749
x1*x2           -0.811962        0.731300     -1.11     0.2844
x1*x1            0.043293        0.025516      1.70     0.1104
x2*x2            6.840448        8.148161      0.84     0.4144


Check the constant-variance assumption.

[Plot of e*yhat: residual e (vertical axis, -20 to 20, reference line at 0) versus Predicted Value of y (horizontal axis, 120 to 240).]
NOTE: 2 obs had missing values.


Check the normality assumption.

[Plot of e*nscore: residual e (vertical axis, -20 to 20, reference line at 0) versus Rank for Variable e (normal scores; horizontal axis, -2 to 2, reference line at 0).]
NOTE: 2 obs had missing values.


Sample correlation for testing normality -- see Table B.6.

The CORR Procedure

2 Variables:    e    nscore

Pearson Correlation Coefficients, N = 21
Prob > |r| under H0: Rho=0

                                 e       nscore
e                          1.00000      0.97993
Residual                                 <.0001

nscore                     0.97993      1.00000
Rank for Variable e         <.0001
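
The correlation reported above is between the residuals and their normal scores from PROC RANK with normal=Blom. As a cross-check outside SAS, the Python sketch below recomputes it from the 21 residuals printed in the Output Statistics table. It assumes the usual Blom formula, score = Phi^{-1}((rank - 3/8)/(n + 1/4)), and it uses residuals rounded to four decimals, so agreement with the listing is only expected to a few decimal places. All names in the sketch are illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean

# Residuals for the 21 observations with non-missing y,
# copied from the Output Statistics table (rounded to 4 decimals).
residuals = [
    0.3540, 0.6530, 3.7816, 10.1706, 20.2151, 1.2715, -12.3382,
    0.7449, -15.0013, -4.6666, 9.4356, -13.1132, 9.7586, -12.7841,
    -6.0849, -6.2161, -18.4239, 11.5126, 9.3143, 1.6130, 9.8037,
]
n = len(residuals)

# Blom normal scores (assumed formula): Phi^{-1}((rank - 3/8) / (n + 1/4)).
# There are no tied residuals here, so plain ranks 1..n suffice.
order = sorted(range(n), key=lambda i: residuals[i])
nscore = [0.0] * n
for rank, i in enumerate(order, start=1):
    nscore[i] = NormalDist().inv_cdf((rank - 0.375) / (n + 0.25))


def pearson(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)


r = pearson(residuals, nscore)
print("correlation between e and nscore:", round(r, 5))
```

The value of r is then compared with the critical value for n = 21 in Table B.6; the p-value printed by PROC CORR tests Rho = 0 and is not the relevant test of normality.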