R version 2.5.1 (2007-06-27) Copyright (C) 2007 The R Foundation for Statistical Computing ISBN 3-900051-07-0 R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. Natural language support but running in an English locale R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R. > # I use a program called R for tasks like this. You can download it for > # free from http://cran.r-project.org/ > # Input to the program is given here; lines beginning with # are comments. > # > # I first exported the data from Excell into a CSV file, and removed the > # quotes. This split some fields into two, fortunately in a regular manner. > # So my file now contains five fields that I will ignore, followed by months > # and miles. The function scan will read in the data, with a first argument > # my file name, the second argument a sampling of the data types for the > # fields, and the third the field seperator. The syntax <- is an assignment > # operator; data sets and the output of model fits are all assigned to > # objects. The data are put into the data set rr; variables within a > # data set are accessed by data set name, $, and the variable name; hence > # rr$monthssq is the months squared. > # I then call the regression > # fitter. R does this in two steps; the first fits the model, and the > # second formats the results for easy viewing. The fitter is lm; first > # argument to lm is formula, the second is the data source, and the third > # picks out the first three elements. The formula has the syntax first > # response variable, then a tilde, then the explanatory variables. The > # -1 removes the intercept. > # Here's code to do the intercept-set-to-zero fit: > rr<-as.data.frame( + scan("UnionPacificDec3.csv",what=list(a="",b="",c="",d="",e="", + months=0,miles=0), sep=",")) > rr$monthssq<-rr$months^2 > cat("Model with 0 intercept, linear in months, for first three ponts\n") Model with 0 intercept, linear in months, for first three ponts > summary(lma<-lm(miles~-1+months, data = rr, subset = 1:3)) Call: lm(formula = miles ~ -1 + months, data = rr, subset = 1:3) Residuals: 1 2 3 -1.332e-14 -1.217e+00 6.421e-01 Coefficients: Estimate Std. Error t value Pr(>|t|) months 8.45906 0.07966 106.2 8.87e-05 *** --- Signif. codes: 0 ƒà±***ƒà² 0.001 ƒà±**ƒà² 0.01 ƒà±*ƒà² 0.05 ƒà±.ƒà² 0.1 ƒà± ƒà² 1 Residual standard error: 0.9728 on 2 degrees of freedom Multiple R-Squared: 0.9998, Adjusted R-squared: 0.9997 F-statistic: 1.128e+04 on 1 and 2 DF, p-value: 8.867e-05 > cat("Model with 0 intercept, quadratic in months, for next three portions\n") Model with 0 intercept, quadratic in months, for next three portions > summary(lmb<-lm(miles~months+monthssq, data = rr, subset = 3:7)) Call: lm(formula = miles ~ months + monthssq, data = rr, subset = 3:7) Residuals: 3 4 5 6 7 -6.8536 11.8011 -12.2062 7.5404 -0.2817 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -553.2594 110.4730 -5.008 0.0376 * months 79.4179 13.9553 5.691 0.0295 * monthssq -1.7627 0.4165 -4.232 0.0515 . --- Signif. codes: 0 ƒà±***ƒà² 0.001 ƒà±**ƒà² 0.01 ƒà±*ƒà² 0.05 ƒà±.ƒà² 0.1 ƒà± ƒà² 1 Residual standard error: 14 on 2 degrees of freedom Multiple R-Squared: 0.9904, Adjusted R-squared: 0.9807 F-statistic: 102.7 on 2 and 2 DF, p-value: 0.009644 > summary(lmc<-lm(miles~months+monthssq, data = rr, subset = 7:12)) Call: lm(formula = miles ~ months + monthssq, data = rr, subset = 7:12) Residuals: 7 8 9 10 11 12 -1.449 -3.829 6.328 5.137 -16.077 9.891 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -1475.8547 281.5272 -5.242 0.01351 * months 121.4414 20.0489 6.057 0.00903 ** monthssq -1.8057 0.3485 -5.182 0.01395 * --- Signif. codes: 0 ƒà±***ƒà² 0.001 ƒà±**ƒà² 0.01 ƒà±*ƒà² 0.05 ƒà±.ƒà² 0.1 ƒà± ƒà² 1 Residual standard error: 12.1 on 3 degrees of freedom Multiple R-Squared: 0.9901, Adjusted R-squared: 0.9835 F-statistic: 150.2 on 2 and 3 DF, p-value: 0.0009835 > summary(lmd<-lm(miles~months+monthssq, data = rr, subset = 12:17)) Call: lm(formula = miles ~ months + monthssq, data = rr, subset = 12:17) Residuals: 12 13 14 15 16 17 0.04727 0.46150 -2.72684 6.14177 -5.22139 1.29769 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -7479.3259 408.6379 -18.30 0.000356 *** months 373.5847 20.9013 17.87 0.000382 *** monthssq -4.0934 0.2658 -15.40 0.000595 *** --- Signif. codes: 0 ƒà±***ƒà² 0.001 ƒà±**ƒà² 0.01 ƒà±*ƒà² 0.05 ƒà±.ƒà² 0.1 ƒà± ƒà² 1 Residual standard error: 4.977 on 3 degrees of freedom Multiple R-Squared: 0.9995, Adjusted R-squared: 0.9992 F-statistic: 3318 on 2 and 3 DF, p-value: 9.604e-06 > # Here's my suggestion about seasonality terms using sines and cosines. > # -1 after cosine forces it to be zero at 0 months. > rr$smonths<-sin(rr$months/12) > rr$cmonths<-cos(rr$months/12)-1 > lm0<-lm(miles~-1+months+monthssq,data=rr) > lmout<-lm(miles~-1+months+monthssq+smonths+cmonths, data = rr) > cat("Model with 0 intercept, quadratic in months, with seasonal effects\n") Model with 0 intercept, quadratic in months, with seasonal effects > summary(lmout) Call: lm(formula = miles ~ -1 + months + monthssq + smonths + cmonths, data = rr) Residuals: Min 1Q Median 3Q Max -74.2769 -28.4740 0.2435 28.0400 58.1315 Coefficients: Estimate Std. Error t value Pr(>|t|) months -51.5789 29.1761 -1.768 0.1005 monthssq 1.8885 0.6819 2.770 0.0159 * smonths 594.9099 258.1877 2.304 0.0384 * cmonths -4.8753 81.1076 -0.060 0.9530 --- Signif. codes: 0 ƒà±***ƒà² 0.001 ƒà±**ƒà² 0.01 ƒà±*ƒà² 0.05 ƒà±.ƒà² 0.1 ƒà± ƒà² 1 Residual standard error: 41.28 on 13 degrees of freedom Multiple R-Squared: 0.9961, Adjusted R-squared: 0.9949 F-statistic: 832.4 on 4 and 13 DF, p-value: 1.613e-15 > pdf("railroad.pdf") > plot(rr$months,rr$miles,main="Data",xlab="Months",ylab="Miles") > lines(rr$months,fitted(lmout)) > detrend<-list(months=rr$months,miles=lm0$residuals) > plot(detrend$months,detrend$miles, + main="Residuals from quadratic fit, to show the potential for a trig. seasonal effect",xlab="Months",ylab="Miles", + sub="Note poor performance during the final winter") >