* ch03ta08.sas;
* example of Box-Cox transformation;
options ls=75;

data plasma;
  input age y;  * y is plasma level;
  lines;
0 13.44
0 12.84
0 11.91
0 20.09
0 15.60
1 10.11
1 11.38
1 10.28
1 8.96
1 8.59
2 9.83
2 9.00
2 8.65
2 7.85
2 8.88
3 7.94
3 6.01
3 5.14
3 6.90
3 6.77
4 4.86
4 5.10
4 5.67
4 5.75
4 6.23
;

proc plot;
  plot y*age / vpos=22;
  title 'Modelling plasma level (y) as a function of age.';

* compute sample means and variances by age;
proc sort;
  by age;
  title2 'Determine an appropriate Box-Cox transformation.';
proc means noprint mean var;
  var y;
  by age;
  output out=stats mean=mean var=var;

* compute natural log of sample means and variances;
data ln_stats;
  set stats;
  ln_mean=log(mean);
  ln_var=log(var);
proc print;
proc plot;
  plot ln_var*ln_mean / vpos=22;

* obtain the LSE of slope as an estimate of q;
proc reg;
  model ln_var = ln_mean;

* transform the data, using q=3 based on prior results;
data p2;
  set plasma;
  hy = 1/sqrt(y);
proc plot;
  plot hy*age / vpos=22;
  title2 'Analysis of the transformed data.';

* try simple linear regression on the transformed observations,
* and recheck the model assumptions;
proc reg;
  model hy = age;
  output out=stats2 p=pred r=e;

* create the normal score variable "nscore" for NP plots;
proc rank normal=Blom;
  var e;
  ranks nscore;

* generate residual plots;
proc plot;
  plot e*age / vpos=17 vref=0;
  plot e*pred / vpos=17 vref=0;
  plot e*nscore / vpos=17 vref=0 href=0;


Modelling plasma level (y) as a function of age.                         1

Plot of y*age.  Legend: A = 1 obs, B = 2 obs, etc.

[Plot: y (axis 5 to 20) versus age (axis 0 to 4)]


Modelling plasma level (y) as a function of age.                         2
Determine an appropriate Box-Cox transformation.

Obs    age    _TYPE_    _FREQ_      mean        var    ln_mean      ln_var
  1     0        0         5      14.776    10.6661    2.69300     2.36707
  2     1        0         5       9.864     1.2430    2.28889     0.21755
  3     2        0         5       8.842     0.5059    2.17951    -0.68148
  4     3        0         5       6.552     1.0957    1.87977     0.09137
  5     4        0         5       5.522     0.2979    1.70874    -1.21110


Modelling plasma level (y) as a function of age.                         3
Determine an appropriate Box-Cox transformation.

Plot of ln_var*ln_mean.  Legend: A = 1 obs, B = 2 obs, etc.

[Plot: ln_var (axis -2 to 3) versus ln_mean (axis 1.6 to 2.8)]


Modelling plasma level (y) as a function of age.                         4
Determine an appropriate Box-Cox transformation.

The REG Procedure
Dependent Variable: ln_var

                        Parameter Estimates

                  Parameter     Standard
Variable    DF     Estimate        Error    t Value    Pr > |t|
Intercept    1     -6.50326      2.25465      -2.88      0.0633
ln_mean      1      3.09767      1.03571       2.99      0.0581


Modelling plasma level (y) as a function of age.                         5
Analysis of the transformed data.

Plot of hy*age.  Legend: A = 1 obs, B = 2 obs, etc.

[Plot: hy (axis 0.2 to 0.5) versus age (axis 0 to 4)]


Modelling plasma level (y) as a function of age.                         6
Analysis of the transformed data.

The REG Procedure
Dependent Variable: hy

                        Analysis of Variance

                            Sum of          Mean
Source             DF      Squares        Square    F Value    Pr > F
Model               1      0.08025       0.08025     149.22    <.0001
Error              23      0.01237    0.00053778
Corrected Total    24      0.09262

Root MSE    0.02319    R-Square    0.8665

                        Parameter Estimates

                  Parameter     Standard
Variable    DF     Estimate        Error    t Value    Pr > |t|
Intercept    1      0.26803      0.00803      33.36      <.0001
age          1      0.04006      0.00328      12.22      <.0001


Modelling plasma level (y) as a function of age.                         7
Analysis of the transformed data.

Plot of e*age.  Legend: A = 1 obs, B = 2 obs, etc.

[Plot: Residual (axis -0.050 to 0.050, reference line at 0) versus age (axis 0 to 4)]


Modelling plasma level (y) as a function of age.                         8
Analysis of the transformed data.

Plot of e*pred.  Legend: A = 1 obs, B = 2 obs, etc.

[Plot: Residual (axis -0.050 to 0.050, reference line at 0) versus Predicted Value of hy (axis 0.2680 to 0.4283)]


Modelling plasma level (y) as a function of age.                         9
Analysis of the transformed data.

Plot of e*nscore.  Legend: A = 1 obs, B = 2 obs, etc.

[Plot: Residual (axis -0.050 to 0.050, reference line at 0) versus Rank for Variable e (axis -2 to 2, reference line at 0)]
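
The program sets hy = 1/sqrt(y) after estimating the slope q = 3.09767 from the regression of ln_var on ln_mean, but it does not show how q leads to that choice. A minimal sketch of the link follows, assuming the usual variance-stabilizing rule: if the variance of y is proportional to mean**q, the stabilizing power is 1 - q/2, with the natural log taking its place when q = 2. The names p2_check, q, power and hy_check are illustrative and do not appear in the original program.

```sas
* sketch only: p2_check, q, power and hy_check are illustrative names;
* assumes the data set plasma created earlier in this program;
data p2_check;
  set plasma;
  q = 3;                      * fitted slope 3.09767 rounded to 3;
  power = 1 - q/2;            * equals -0.5 when q = 3;
  if power = 0 then hy_check = log(y);   * q = 2 is the log case;
  else hy_check = y**power;   * same as 1/sqrt(y) when q = 3;
run;

* list a few rows to compare hy_check with hy from data set p2;
proc print data=p2_check (obs=5);
  var age y hy_check;
run;
```

With q = 3 the power is -0.5, so hy_check should agree with the hy variable computed in data set p2. Changing q in this step is a quick way to try a neighbouring transformation, such as the log for q = 2, before refitting the regression.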