by
Tim Cohn
USGS National Center MS:107
Reston, VA 22092
703/648-5711
$$B = (X'X)^{-1}X'Y, \qquad S^2 = \frac{(Y - XB)'(Y - XB)}{n - k}$$
where Y is an n × 1 vector of responses, X is an n × k matrix of explanatory variables, B is a k × 1 vector of parameter estimates, and S² is an unbiased estimate of the residual variance (the residual mean square). OLS has several advantages: it is easy to apply, and the properties of its estimates are well understood [see Draper and Smith, 1981]. OLS procedures are implemented in MINITAB with the command REGRESS.
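The two OLS formulas can be checked directly. The following is a minimal Python sketch, not the MINITAB or AMLEREG code: it forms X'X and X'Y, solves the normal equations by Gaussian elimination (numerically preferable to inverting X'X explicitly, though production code would normally use a QR decomposition), and computes S² with the n - k divisor. The function names `ols` and `solve` and the four-point data set are hypothetical and chosen only for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def ols(X, Y):
    """Return (B, S2) with B = (X'X)^-1 X'Y and S2 = (Y-XB)'(Y-XB)/(n-k)."""
    n, k = len(X), len(X[0])
    XtX = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)] for i in range(k)]
    XtY = [sum(X[r][i] * Y[r] for r in range(n)) for i in range(k)]
    B = solve(XtX, XtY)  # solves the normal equations (X'X) B = X'Y
    resid = [Y[r] - sum(X[r][j] * B[j] for j in range(k)) for r in range(n)]
    S2 = sum(e * e for e in resid) / (n - k)  # unbiased residual variance
    return B, S2


# Hypothetical toy data (not TEST4.DAT); the column of ones supplies the constant.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
Y = [1.0, 2.0, 2.0, 4.0]
B, S2 = ols(X, Y)  # B ≈ [0.9, 0.9], S2 ≈ 0.35
```

Here n = 4 and k = 2, so the residual sum of squares (0.70) is divided by 2.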
However, environmental data are sometimes censored: some observations are reported only as "less than" an analytical detection limit. Statistical procedures for this situation have been addressed extensively in the statistics and economics literature. Perhaps the most widely accepted method is the Tobit estimator, named after the economist James Tobin. The Tobit estimator is simply a maximum likelihood estimator (MLE), and its properties are well understood [see Chapter 18, Judge et al., 1980]. However, Cohn [1988], among others, has observed that the Tobit estimator can be substantially biased in some cases. Although it can be proven that the bias cannot be eliminated entirely, one can easily derive an estimator that is unbiased to first order. This is called an Adjusted Maximum Likelihood Estimator, or AMLE. A simple FORTRAN program, called AMLEREG.F77, has been written that implements both the Tobit estimator (MLE) and the AMLE.
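To make "simply a maximum likelihood estimator" concrete, here is a minimal Python sketch of the Tobit log-likelihood for left-censored data, assuming normal errors with a common standard deviation sigma. An observed value contributes its normal density; a "less than" value contributes only the probability of falling below its detection limit. The names `tobit_loglik` and `norm_cdf`, and the rule that observation i is treated as censored when `y[i] < dl[i]`, are our assumptions, not taken from AMLEREG.F77. The Tobit estimates are the (B, sigma) that maximize this function; the sketch omits both the optimizer and the AMLE bias adjustment.

```python
import math

LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def tobit_loglik(B, sigma, X, y, dl):
    """Log-likelihood of a left-censored (Tobit) normal regression.

    Hypothetical data convention: observation i is treated as censored
    ("less than dl[i]") when y[i] < dl[i]; otherwise y[i] is observed.
    """
    total = 0.0
    for xi, yi, ci in zip(X, y, dl):
        mu = sum(b * x for b, x in zip(B, xi))  # x_i' B
        if yi < ci:
            # Censored: contributes log P(Y_i < c_i) = log Phi((c_i - mu) / sigma).
            # Very negative arguments can underflow to log(0) in this simple sketch.
            total += math.log(norm_cdf((ci - mu) / sigma))
        else:
            # Observed: contributes log[(1/sigma) * phi((y_i - mu) / sigma)]
            z = (yi - mu) / sigma
            total += -0.5 * z * z - LOG_SQRT_2PI - math.log(sigma)
    return total
```

For example, with B = [0.0], sigma = 1, a single constant regressor, and a detection limit of 0, a censored value contributes log Φ(0) = log 0.5, while an observed value of 1.0 contributes the ordinary normal log-density at z = 1.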
Example 1: The Minitab Result
MTB > READ 'TEST4.DAT' C1-C6
100 ROWS READ
| ROW | C1 | C2 | C3 | C4 | C5 | C6 |
| 1 | -0.08635 | -0.09670 | 0.17524 | 0.24622 | 1.50243 | 0 |
| 2 | 0.42794 | -0.14250 | -0.84265 | 0.76608 | -1.60559 | 0 |
| 3 | -0.87022 | -0.48002 | 1.87135 | -0.91082 | -0.54038 | 0 |
| 4 | 0.02228 | 0.02247 | -0.72753 | -1.08101 | -2.64420 | 0 |
MTB > REGRESS C5 4 C1-C4
The regression equation is
C5 = - 0.009 + 0.994 C1 + 0.914 C2 + 0.873 C3 + 1.06 C4
| Predictor | Coef | Stdev | t-ratio | p |
| Constant | -0.0085 | 0.1042 | -0.08 | 0.935 |
| C1 | 0.9937 | 0.1082 | 9.19 | 0.000 |
| C2 | 0.9142 | 0.1297 | 7.05 | 0.000 |
| C3 | 0.8729 | 0.1078 | 8.10 | 0.000 |
| C4 | 1.0611 | 0.1281 | 8.28 | 0.000 |
s = 1.034 R-sq = 74.7% R-sq(adj) = 73.6%
Analysis of Variance
| SOURCE | DF | SS | MS | F | p |
| Regression | 4 | 299.229 | 74.807 | 69.94 | 0.000 |
| Error | 95 | 101.610 | 1.070 | | |
| Total | 99 | 400.839 | | | |
Continue?
| SOURCE | DF | SEQ SS |
| C1 | 1 | 124.059 |
| C2 | 1 | 40.872 |
| C3 | 1 | 60.911 |
| C4 | 1 | 73.387 |
Unusual Observations
| Obs. | C1 | C5 | Fit | Stdev.Fit | Residual | St.Resid |
| 14 | 1.38 | 0.369 | 0.388 | 0.420 | -0.019 | -0.02X |
| 16 | 0.73 | 0.080 | 2.488 | 0.225 | -2.408 | -2.39R |
| 41 | 0.45 | 5.604 | 2.832 | 0.236 | 2.772 | 2.75R |
| 46 | 1.51 | 2.716 | 0.438 | 0.277 | 2.278 | 2.29R |
| 54 | 0.17 | 4.582 | 1.759 | 0.172 | 2.823 | 2.77R |
| 59 | -0.05 | 3.736 | 1.604 | 0.183 | 2.132 | 2.09R |
R denotes an obs. with a large st. resid.
X denotes an obs. whose X value gives it large influence.
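The numbers in this Minitab output are tied together by standard regression identities, so they can be cross-checked. The Python sketch below copies values from the tables above and recomputes F, s, R-sq, R-sq(adj), and the standardized residuals of the unusual observations. It uses the relation Stdev.Fit² = s²·h for leverage h. The flagging thresholds (R when |St.Resid| > 2, X when h > 3p/n) are our assumption about Minitab's rules; they are not stated in the output itself.

```python
import math

# Values copied from the Analysis of Variance and sequential SS tables
SS_REG, DF_REG = 299.229, 4
SS_ERR, DF_ERR = 101.610, 95
SEQ_SS = [124.059, 40.872, 60.911, 73.387]  # sequential SS for C1..C4

ms_reg = SS_REG / DF_REG                    # ≈ 74.807
ms_err = SS_ERR / DF_ERR                    # ≈ 1.070
ss_tot = SS_REG + SS_ERR                    # 400.839
df_tot = DF_REG + DF_ERR                    # 99
F = ms_reg / ms_err                         # ≈ 69.94
s = math.sqrt(ms_err)                       # ≈ 1.034
r2 = SS_REG / ss_tot                        # ≈ 74.7%
r2_adj = 1.0 - ms_err / (ss_tot / df_tot)   # ≈ 73.6%

P = DF_REG + 1   # parameters: constant plus C1..C4
N = df_tot + 1   # observations

# (Obs., Residual, Stdev.Fit) copied from the Unusual Observations table
ROWS = [(14, -0.019, 0.420), (16, -2.408, 0.225), (41, 2.772, 0.236),
        (46, 2.278, 0.277), (54, 2.823, 0.172), (59, 2.132, 0.183)]


def diagnose(residual, stdev_fit):
    """Return (standardized residual, leverage, flag) for one observation."""
    h = (stdev_fit / s) ** 2                  # leverage, since Stdev.Fit^2 = s^2 * h
    st = residual / (s * math.sqrt(1.0 - h))  # Var(e_i) = s^2 * (1 - h)
    flag = ""
    if abs(st) > 2.0:        # assumed large-residual rule
        flag += "R"
    if h > 3.0 * P / N:      # assumed high-leverage rule (3p/n = 0.15 here)
        flag += "X"
    return st, h, flag


results = {obs: diagnose(e, sf) for obs, e, sf in ROWS}
for obs, (st, h, flag) in results.items():
    print(f"{obs:3d}  st.resid={st:6.2f}  h={h:.3f}  {flag}")
```

The sequential sums of squares add up to the regression SS (124.059 + 40.872 + 60.911 + 73.387 = 299.229), as they should when the predictors are entered one at a time.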
Example 2: Tobit Results with Threshold Corresponding to 50% Censoring
RVARES:
R TOBIT
TOBIT REGRESSION ANALYSIS PROGRAM USING
EITHER MLE OR ADJUSTED MLE ESTIMATORS
**** VERSION 90.09 ****
TIM COHN, SEPTEMBER 1990
ENTER THE INPUT FILE NAME (OR ?)
TEST4.DAT
ENTER NO. VARS.(<20) IN FILE
ENTER NO. OF EXPLANATORY VARIABLES IN MODEL
(NOT COUNTING A CONSTANT TERM)
ENTER THE COLUMN NO. OF PREDICTOR 1
ENTER THE COLUMN NO. OF PREDICTOR 2
ENTER THE COLUMN NO. OF PREDICTOR 3
ENTER THE COLUMN NO. OF PREDICTOR 4
IS THERE A CONSTANT IN THE MODEL? (Y/N)
ENTER THE COLUMN NO. OF RESPONSE VAR.
ENTER THE COLUMN NO. OF DET. LIMIT VAR.
NO. OBS. READ IN: 100
NUMBER OF COLUMNS: 6
FILE NAME: TEST4.DAT
MAXIMUM LIKELIHOOD ESTIMATES (TOBIT)
The regression equation is
C05 = -1.707E-01 + 1.162E+00*C01 + 9.432E-01*C02 + 9.222E-01*C03 + 1.133E+00*C04
| predictor | Coef | Stdev | -2*L-ratio | Approx-p |
| Constant | -1.706510E-01 | 1.868682E-01 | 0.945 | 0.330940 |
| Column 1 | 1.162288E+00 | 1.848199E-01 | 43.203 | 0.000000 |
| Column 2 | 9.432051E-01 | 1.756538E-01 | 27.898 | 0.000000 |
| Column 3 | 9.222313E-01 | 1.560587E-01 | 32.678 | 0.000000 |
| Column 4 | 1.133125E+00 | 1.828878E-01 | 36.061 | 0.000000 |
S = 1.111476E+00
LIKELIHOOD = 4.875195E-20
APPROX. DF: 36.4
ENTER 1 FOR AMLE ESTIMATES
ADJUSTED MAXIMUM LIKELIHOOD ESTIMATES
N.B. THESE ARE, AT PRESENT, EXPERIMENTAL
The regression equation is
C05 = -1.646E-01 + 1.157E+00*C01 + 9.392E-01*C02 + 9.227E-01*C03 + 1.131E+00*C04
| predictor | Coef | Stdev | -2*L-ratio | Approx-p |
| Constant | -1.646218E-01 | 1.939814E-01 | 0.945 | 0.330940 |
| Column 1 | 1.156585E+00 | 1.918551E-01 | 43.203 | 0.000000 |
| Column 2 | 9.391652E-01 | 1.823401E-01 | 27.898 | 0.000000 |
| Column 3 | 9.226745E-01 | 1.619991E-01 | 32.678 | 0.000000 |
| Column 4 | 1.130771E+00 | 1.898494E-01 | 36.061 | 0.000000 |
S = 1.153785E+00
LIKELIHOOD = 4.875195E-20
APPROX. DF: 36.4
**** STOP
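The Approx-p column in the TOBIT output appears to be the upper-tail probability of a chi-square distribution with one degree of freedom, evaluated at the -2*L-ratio statistic for each predictor. This is our reading of the output rather than something the program states, but the numbers are consistent with it: 0.945 gives p ≈ 0.331. A minimal Python sketch using only the standard library (the function name `chi2_1df_pvalue` is hypothetical):

```python
import math


def chi2_1df_pvalue(stat):
    """Upper-tail probability P(X > stat) for X ~ chi-square with 1 df.

    If Z ~ N(0, 1) then Z**2 ~ chi-square(1), so
    P(Z**2 > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    """
    return math.erfc(math.sqrt(stat / 2.0))


# -2*L-ratio values from the TOBIT tables (Constant, then Columns 1-4)
for stat in (0.945, 43.203, 27.898, 32.678, 36.061):
    print(f"{stat:8.3f}  {chi2_1df_pvalue(stat):.6f}")
```

The small difference from the printed 0.330940 comes from the statistic having been rounded to three decimals. The other four p-values are far below 0.000001, which is why they print as 0.000000.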
Driver program and main subroutines
dhumsl subroutine
imslfake subroutines
tacit subroutines
Test Data Set
This page last modified on 01 February 2001
Please email comments or suggestions to Tim Cohn at: software@timcohn.com