Abstracts of Papers by John Kolassa
This thesis addresses a variety of problems involved in series approximations
for distribution functions, including modifying formal Edgeworth and
saddlepoint expansions to apply to the case of lattice random variables.
I demonstrate that the classical Edgeworth series provides
an asymptotic expansion for the cumulative distribution function of a
lattice random variable if the cumulants in this series are adjusted
using Sheppard's corrections, and the series is evaluated at midpoints
of the intervals between lattice points. This result is motivated
using a smoothing argument, and proven rigorously.
These results for the Edgeworth series are extended to the multivariate case.
Again, I demonstrate that the classical Edgeworth series can be
modified to provide an asymptotic expansion for the cumulative distribution
function of a lattice random variable. In this case, the multivariate
cumulants must be adjusted by the multivariate version of Sheppard's
corrections, and the series must be evaluated at the midpoints of
hypercubes whose vertices are lattice points.
These results also extend the saddlepoint approximation in the
univariate case. The thesis demonstrates that regular saddlepoint
methods for distribution functions can be used in the case of lattice
variables if the cumulant generating function involved in this
approximation is modified by adding the generating function for
Sheppard's corrections.
Approximations for non-rectangular sets, in particular, those arising in the
likelihood ratio tests, are discussed. Connections with Bartlett's corrections
are also explored. Numerical results indicate that using this
correction together with a continuity correction improves the accuracy of the
chi-square approximation to the distribution of the log likelihood ratio
statistic.
This thesis also discusses a modification to the saddlepoint approximation
that requires only the Legendre transform of an approximating cumulant
generating function, for use when the exact Legendre transform is difficult
to compute.
An expansion is derived which, while not behaving as well as the exact
saddlepoint approximation, performs better than the Edgeworth approximation.
These methods are applied to the problem of calculating moments of the
response in a logistic random effects model.
This paper investigates the use of Edgeworth expansions for
approximating the distribution function of the normalized sum
of n independent and identically distributed lattice-valued
random variables. We prove that the continuity-corrected
Edgeworth series, using Sheppard-adjusted cumulants, is
accurate to the same order in n as the usual Edgeworth
approximation for continuous random variables. Finally,
as a partial justification for the Sheppard adjustments,
it is shown that if a continuous random variable Y is rounded
to a discrete part D and a truncation error U, such that Y=D+U,
then under suitable limiting conditions the truncation error
is approximately uniformly distributed and independent of Y, but
not independent of D.
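As a rough illustration of the continuity-corrected, Sheppard-adjusted series described above, the sketch below approximates a binomial distribution function. The binomial example, the function names, and the specific adjustments for a unit lattice (second cumulant reduced by 1/12, fourth cumulant increased by 1/120) are assumptions of this sketch and are not taken from the paper.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def binom_cdf_exact(k, n, p):
    # Direct summation of the binomial probabilities P(X = j), j = 0..k.
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

def binom_cdf_edgeworth(k, n, p):
    """Edgeworth approximation to P(X <= k) for X ~ Binomial(n, p).

    Assumed recipe: evaluate at the midpoint k + 1/2 and adjust the
    cumulants by Sheppard's corrections for a lattice with unit spacing.
    """
    q = 1.0 - p
    # Cumulants of a sum of n independent Bernoulli(p) variables.
    k1 = n * p
    k2 = n * p * q
    k3 = n * p * q * (q - p)
    k4 = n * p * q * (1.0 - 6.0 * p * q)
    # Sheppard adjustments (assumed form for unit lattice spacing).
    k2 -= 1.0 / 12.0
    k4 += 1.0 / 120.0
    sd = math.sqrt(k2)
    x = (k + 0.5 - k1) / sd            # continuity-corrected, standardized
    rho3 = k3 / sd**3
    rho4 = k4 / sd**4
    he2 = x**2 - 1.0                   # Hermite polynomials He2, He3, He5
    he3 = x**3 - 3.0 * x
    he5 = x**5 - 10.0 * x**3 + 15.0 * x
    correction = rho3 * he2 / 6.0 + rho4 * he3 / 24.0 + rho3**2 * he5 / 72.0
    return norm_cdf(x) - norm_pdf(x) * correction
```

Comparing `binom_cdf_edgeworth(5, 20, 0.3)` with `binom_cdf_exact(5, 20, 0.3)` gives a feel for the accuracy at a small sample size.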
Saddlepoint approximations have long been used to approximate
densities and distribution functions of random variables with
a known cumulant generating function defined on an open interval about
the origin. These approximations have very desirable asymptotic properties
when approximating densities and tail probabilities for sums of random
variables, and also often performs remarkably well for small sample
sizes, including samples of one.
Calculating the saddlepoint approximation requires calculating
the Legendre transform of the cumulant generating function.
In some cases this cumulant generating function may be unavailable; in other
cases the Legendre transform is difficult to calculate analytically.
This paper discusses modifications to the saddlepoint approximation necessary
when the cumulant generating function is replaced by a similar but more
tractable function whose Legendre transform can be given explicitly.
Calculations for the logistic distribution are presented to illustrate the
case of a known but intractable cumulant generating function, and an
example involving an overdispersed binomial model is presented to
illustrate the case of an unavailable cumulant generating function.
An application to a random effects logistic linear model is discussed.
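For orientation, the basic (unmodified) saddlepoint density approximation can be sketched as follows. The example, a sum of n unit exponential variables whose exact density is a gamma density, is an assumption chosen because the saddlepoint equation has a closed-form solution; it is not one of the paper's examples, and the function names are hypothetical.

```python
import math

def saddlepoint_density(x, K, K2, t_hat):
    """Saddlepoint density exp(K(t) - t*x) / sqrt(2*pi*K''(t)) at t = t_hat(x).

    t_hat(x) solves K'(t) = x, so t*x - K(t) is the Legendre transform of K.
    """
    t = t_hat(x)
    return math.exp(K(t) - t * x) / math.sqrt(2.0 * math.pi * K2(t))

n = 5  # number of exponential summands (illustrative choice)

def K(t):
    # Cumulant generating function of a sum of n unit exponentials, t < 1.
    return -n * math.log(1.0 - t)

def K2(t):
    # Second derivative of K.
    return n / (1.0 - t) ** 2

def t_hat(x):
    # Closed-form solution of K'(t) = n / (1 - t) = x.
    return 1.0 - n / x

def gamma_density(x):
    # Exact density of the sum: gamma with shape n and unit scale.
    return x ** (n - 1) * math.exp(-x) / math.gamma(n)

# Ratio of approximate to exact density at a few points.
ratios = [saddlepoint_density(x, K, K2, t_hat) / gamma_density(x)
          for x in (1.0, 5.0, 12.0)]
```

In this example the ratio does not depend on x, since the approximation differs from the gamma density only by replacing the gamma function with Stirling's formula.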
Phase equilibrium experiments using the same reactants and
run at different temperatures and pressures give rise to linear constraints
on thermodynamic variables, such as entropies, enthalpies, and volume
changes of reaction.
Linear programming has been applied to this problem to determine maxima and
minima for these parameters separately.
Since these temperatures and pressures are measured or set with uncertainty,
their nominal values as reported by the experimenter need not define the
same feasible region as that defined by their true values.
Clearly, then, the ranges resulting from the linear programming procedure
will not represent the reasonable ranges of parameter values.
This paper explores the problem of inference on these parameters when a
normal error distribution is postulated for temperature and pressure.
Current methodologies and their relation to the field of statistical inference
are discussed.
Two algorithms for generating bounds on the entropies and enthalpies are
presented.
The first uses a normal approximation to the location of the feasible-region
vertices and allows for variation in which constraints determine each vertex.
It constructs confidence intervals for the extrema of random linear programs,
giving bounds that hold for one parameter at a time.
The second generates bounds holding simultaneously for all parameters.
A comparison of these confidence intervals with those generated by
conventional methods demonstrates that estimates of thermodynamic variables
have far less variability than commonly believed.
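The linear programming step mentioned above, finding separate minima and maxima of each parameter over a region cut out by linear constraints, can be sketched for two parameters by enumerating vertices. This is a toy stand-in that assumes a bounded feasible region and treats the constraint coefficients as exact; it is not the paper's algorithm, and the function name and constraint format are hypothetical.

```python
from itertools import combinations

def parameter_ranges(constraints, tol=1e-9):
    """Range of x and of y over the region {a*x + b*y <= c for each (a, b, c)}.

    Assumes the feasible region is a bounded polygon, so each extremum is
    attained at a vertex where two constraint boundaries intersect.
    """
    vertices = []
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < tol:
            continue  # parallel boundaries have no unique intersection
        # Cramer's rule for the intersection of the two boundary lines.
        x = (c1 * b2 - c2 * b1) / det
        y = (a1 * c2 - a2 * c1) / det
        # Keep the point only if it satisfies every constraint.
        if all(a * x + b * y <= c + tol for a, b, c in constraints):
            vertices.append((x, y))
    if not vertices:
        raise ValueError("no feasible vertex found")
    xs = [v[0] for v in vertices]
    ys = [v[1] for v in vertices]
    return (min(xs), max(xs)), (min(ys), max(ys))

# Invented constraints for illustration: x >= 0, y >= 0, x + y <= 4, x <= 3.
toy = [(-1, 0, 0), (0, -1, 0), (1, 1, 4), (1, 0, 3)]
x_range, y_range = parameter_ranges(toy)
```

Perturbing the coefficients in `toy` and rerunning shows how the reported ranges move when the constraints themselves are uncertain, which is the difficulty the paper addresses.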
This paper analyzes data from a study conducted by the United States
Office of Naval Research on the effects of pulsed magnetic fields on chick
embryos. The experiment involved incubation of eggs under carefully
controlled conditions in six different laboratories. The original analysis
included inappropriate statistical methodology for analyzing the experimental
results. Since the conclusions from this study rest so heavily on the results
of statistical analyses, choosing the proper methodology is imperative. The
major aim of this paper then is to introduce more appropriate analytic tools
and illustrate their use in the present context. Qualitatively our results
agree with those of the original analysis; our findings about interactions
between effects, however, make interpretation of these effects more subtle.
We apply linear logistic modeling to counts of damaged embryos, using as
covariates factors corresponding to exposure, laboratory, incubator, run, and
measurements of background radiation. This facilitates estimation of the size
of the effects. The effects of laboratory, incubator, and run are explored
both as fixed and random effects. We find statistically significant exposure
and laboratory effects, in accordance with the original study. However, we
also find that the inter-laboratory variation in exposure effect is at least
as large as the exposure effect itself. The presence of such effects
fundamentally alters the interpretation of the fitted model, as is
graphically presented.
Paper had no abstract.
Book had no abstract.
OBJECTIVE: To determine the effects of carbamazepine versus placebo on
ratings of behavior in agitated nursing home patients with dementia.
DESIGN: Nonrandomized, placebo-controlled, crossover trial conducted in 25
patients in two nursing homes. INTERVENTION: Carbamazepine and placebo were
administered during two 5-week periods separated by a 2-week washout. The
carbamazepine dose was determined for each patient by a nonblinded physician
who did not participate in ratings (modal dose 300 mg/day). MEASUREMENTS:
The primary outcome measures were Brief Psychiatric Rating Scale scores and
Clinical Global Impression of Change, rated by blind observers. Secondary
measures of behavior, adversity, cognition, and functional status were also
included. MAIN RESULTS: Median total Brief Psychiatric Rating Scale score
decreased 7 points on carbamazepine versus 3 on placebo (P = 0.03). Sixteen
subjects were rated as improved globally on carbamazepine versus four on
placebo (P = 0.001). Secondary measures of behavior showed similar changes at
significant or suggestive (P < 0.10) levels. One subject developed
carbamazepine-induced tics, and one died with pneumonia. There was minimal
other adversity. CONCLUSION: This preliminary study suggests that
carbamazepine in low doses can reduce agitated behaviors in some patients,
with limited adversity resulting. Further research is required to confirm and
extend this finding before it can be considered routine clinical practice.
This article presents the Gibbs-Skovgaard algorithm for approximate
frequentist inference. The method makes use of the double saddlepoint
approximation of Skovgaard to the conditional cumulative distribution
function of a sufficient statistic given the remaining sufficient statistics.
This approximation is then used in the Gibbs sampler to generate a Markov
chain. The equilibrium distribution of this chain approximates the joint
distribution of the sufficient statistics associated with the parameters of
interest conditional on the observed values of the sufficient statistics
associated with the nuisance parameters.
This Gibbs-Skovgaard algorithm is applied to the cases of logistic and Poisson
regression.
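The general pattern, a Gibbs sampler that updates each coordinate by inverting a univariate conditional distribution function, can be sketched as below. As a loudly labeled assumption, the exact conditional distribution function of a standard bivariate normal stands in for the double saddlepoint approximation, which is not implemented here; all names are hypothetical.

```python
import math
import random

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def invert_cdf(cdf, u, lo=-10.0, hi=10.0, iters=60):
    """Solve cdf(x) = u by bisection; cdf must be increasing on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gibbs(cond_cdf, n_draws, rng, start=(0.0, 0.0)):
    """Two-coordinate Gibbs sampler driven by conditional CDFs.

    cond_cdf(i, value, other) should return P(X_i <= value | other coordinate).
    """
    x, y = start
    draws = []
    for _ in range(n_draws):
        # Update each coordinate in turn by inverse-CDF sampling.
        x = invert_cdf(lambda v: cond_cdf(0, v, y), rng.random())
        y = invert_cdf(lambda v: cond_cdf(1, v, x), rng.random())
        draws.append((x, y))
    return draws

RHO = 0.6  # correlation of the stand-in target distribution

def cond_cdf(i, value, other):
    # Stand-in: exact conditional CDF of a standard bivariate normal.
    # Both coordinates have the same form, so the index i is unused.
    return norm_cdf((value - RHO * other) / math.sqrt(1.0 - RHO ** 2))

draws = gibbs(cond_cdf, 4000, random.Random(0))
```

Replacing `cond_cdf` with an approximate conditional distribution function leaves the sampler unchanged, which is the appeal of the construction; whether the resulting chain has the desired equilibrium distribution then depends on the quality of that approximation.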
This paper compares the accuracy of approximations to test size and power
based on the score and the Fisher information, with
calculations using Edgeworth and Cornish-Fisher approximations, and
demonstrates that in some cases one must exercise much care in using
the simpler approximations.
Hettmansperger (1984) quotes a
result showing that the distribution function of the Wilcoxon signed rank
statistic is approximated by the usual Edgeworth series using the first four
cumulants, to o(1/n).
In light of standard Edgeworth series results for random variables confined
to a lattice, this result is counterintuitive. One expects correction terms
to be necessary because of the lattice nature of the Wilcoxon statistic.
This paper explains this apparent paradox, provides an alternative proof
relying on basic Edgeworth series results, and provides a sharper result.
Interesting features in this problem highlighting limitations of expansions
for random variables on a lattice are discussed.
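A small numerical check of the kind of approximation discussed above can be sketched as follows: the exact null distribution of the signed rank statistic is computed by dynamic programming and compared with an Edgeworth series using the first four cumulants (the third is zero by symmetry). The half-unit continuity shift and the function names are assumptions of this sketch, not details taken from the paper.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def signed_rank_cdf_exact(n):
    """Return a list c with c[t] = P(T <= t) under the null hypothesis.

    T is the sum of the ranks 1..n that receive a positive sign, each
    included independently with probability 1/2.
    """
    max_t = n * (n + 1) // 2
    counts = [0] * (max_t + 1)
    counts[0] = 1
    for j in range(1, n + 1):
        # Descending t ensures each rank j is used at most once.
        for t in range(max_t, j - 1, -1):
            counts[t] += counts[t - j]
    total = 2 ** n
    cdf, running = [], 0
    for c in counts:
        running += c
        cdf.append(running / total)
    return cdf

def signed_rank_cdf_edgeworth(t, n):
    """Edgeworth approximation to P(T <= t) using cumulants up to order four."""
    k1 = n * (n + 1) / 4.0
    k2 = sum(j ** 2 for j in range(1, n + 1)) / 4.0
    k4 = -sum(j ** 4 for j in range(1, n + 1)) / 8.0
    x = (t + 0.5 - k1) / math.sqrt(k2)   # half-unit shift is an assumption
    he3 = x ** 3 - 3.0 * x
    return norm_cdf(x) - norm_pdf(x) * (k4 / k2 ** 2) * he3 / 24.0
```

For example, `signed_rank_cdf_exact(10)[10]` can be compared with `signed_rank_cdf_edgeworth(10, 10)`.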
Kolassa and Tanner (1994) present the Gibbs-Skovgaard algorithm for
approximate conditional inference.
This algorithm makes use of the double saddlepoint approximation to the
conditional distribution function of a sufficient statistic given the
remaining sufficient statistics.
This approximation is used with the Gibbs sampler to generate a Markov
chain. The equilibrium distribution of this chain approximates the joint
distribution of the sufficient statistics associated with the parameters of
interest conditional on the observed values of the sufficient statistics
associated with the nuisance parameters.
This paper discusses issues relating to the existence of an equilibrium
distribution and the quality of the resulting approximation to the
desired distribution. The algorithm is applied to
inference for nonlinear regression parameters.
Abstract not available.
This paper presents methods for calculating Monte Carlo estimates of sizes
of conditional tests in hierarchical models for multiway contingency
tables. It extends the results of Kolassa and Tanner (1994),
and presents an alternate parameterization for higher-way tables to
facilitate sampling.
This paper derives higher order terms in the double-saddlepoint expansion
of Skovgaard (1987) for
a unidimensional conditional cumulative distribution function.
Expansions for continuous and lattice random variables are derived.
Results are applied to the sufficient statistic in logistic regression.
This article presents an algorithm for approximate frequentist
inference. The method makes use of the double saddlepoint approximation
of Skovgaard (1987) to the conditional cumulative distribution function of
a sufficient statistic given the remaining sufficient statistics. This
approximation is then used iteratively in conjunction with Monte Carlo
methods to generate
a sample from a distribution that approximates the joint distribution of
the sufficient statistics associated with the parameters of interest
conditional on the observed values of the sufficient statistics associated
with the nuisance parameters. This algorithm is an alternate approach to
that presented by Kolassa and Tanner (1994), in which the Gibbs
sampler was used in conjunction with these univariate conditional
distribution function approximations.
An example involving logistic regression is presented.
This paper has no abstract.
OBJECTIVE: To evaluate the incidence and impact of rhinovirus and coronavirus
infections in older persons attending daycare. DESIGN: Prospective
descriptive study. SETTING: Three senior daycare centers in Rochester, New
York. PATIENTS: Frail older persons and staff members of the daycare centers
who developed signs or symptoms of an acute respiratory illness.
MEASUREMENTS: Demographic, medical, and physical findings were recorded on
subjects at baseline and during respiratory illness. Nasopharyngeal specimens
for viral culture as well as acute and convalescent sera for coronavirus 229E
enzyme immunoassay (EIA) were obtained for all illnesses. RESULTS: During the
44 months of study, 352 older persons experienced 522 illnesses. Thirty-five
(7%) of 522 cultures were positive for rhinovirus and 37 (8%) of 451
serologies were positive for coronavirus 229E infection. The clinical
syndromes associated with rhinovirus and coronavirus infection were similar
and characterized by nasal congestion, cough, and constitutional symptoms.
No patient died or was hospitalized, but approximately 50% had evidence of
lower respiratory tract involvement. The average illness lasted 14 days.
During the same period, 113 staff developed 338 respiratory illnesses. Eight
percent were identified as coronavirus and 9% as rhinovirus. Cough, sputum
production, and constitutional symptoms were significantly more common among
older persons. CONCLUSIONS: Rhinovirus and coronavirus 229E are common causes
of moderately debilitating acute respiratory illnesses among older persons
attending daycare.
This paper discusses recovery of information regarding logistic regression
parameters in cases when maximum likelihood estimates of some parameters are infinite.
An algorithm for detecting such cases and characterizing the divergence of
the parameter estimates is presented.
A method for fitting the remaining parameters is also presented.
All of these methods rely only on sufficient statistics rather
than less aggregated quantities, as required for inference according to the
method of Kolassa and Tanner (1994).
These results are applied to approximate conditional inference via
saddlepoint methods. Specifically, the double saddlepoint method of
Skovgaard (1987) is adapted to the case when the solution to the
saddlepoint equations exists as a point at infinity.
Abstract not available.
OBJECTIVE: This study assessed the interaction of genetic risk and
rearing-family risk as a subsyndromal test measure of schizophrenic thought
disorder in adoptees. METHOD: A group of 58 adoptees with schizophrenic
biological mothers was compared with 96 comparison adoptees at ordinary
genetic risk; putative adoptee vulnerability was assessed blindly and reliably
by using the Rorschach Index of Primitive Thought. Environmental risk was
measured by using frequency of communication deviance as a continuous
variable, scored independently from Rorschach assessments of the adoptive
parents. RESULTS: High genetic risk in itself was not associated with greater
vulnerability to schizophrenic thought disorder in the adoptees, as indicated
by the Index of Primitive Thought. Also, greater communication deviance in the
adoptive parents was not associated with greater thought disorder in the
comparison adoptees. However, there was a highly significant gene-environment
interaction. Among the offspring of the adoptive parents with high levels of
communication deviance, a higher proportion of high-risk than comparison
adoptees showed evidence of thought disorder. In contrast, among the
offspring of adoptive parents with low communication deviance, a lower
proportion of high-risk than comparison adoptees showed evidence of thought
disorder. The distribution of communication deviance scores did not differ
significantly between the adoptive parents of high-risk offspring and the
adoptive parents of comparison offspring. CONCLUSIONS: The findings are
consistent with genetic control of sensitivity to the environment. There is
no evidence that high genetic risk of schizophrenia among offspring is
associated with high levels of communication problems in rearing parents.
This book has no abstract.
This paper presents results showing that the error involved in using the
double saddlepoint distribution function approximations of
Skovgaard (1987) is uniformly bounded. Particular attention
is paid to distributions of sufficient statistics arising from generalized
linear models. This work is intended in part to validate the use of
Markov chain Monte Carlo by Kolassa and Tanner (1994)
with these conditional distribution function approximations.
This paper presents algorithms for computing confidence intervals and
regions for elements of a parameter vector when the signs of linear
combinations of unknown parameters are observed, but the coefficients
contain experimental error.
These methods were proposed in the geochemical literature by
Kolassa (1992) as a method specific to petrology.
Experimental data is used to give linear constraints,
involving quantities measured with error,
on unknown free energies and entropies of a chemical reaction.
Confidence intervals are given for these parameters, and these are
compared to more naive approaches.
This article presents an algorithm for approximate frequentist
conditional inference on two or more parameters for any
regression model in the GLIM family.
We thereby extend highly accurate inference beyond the cases
of logistic regression and contingency tables implemented in commercially
available software.
The method makes use of the double saddlepoint approximations of
Skovgaard (1987) and Jensen (1992) to the conditional cumulative
distribution function of a sufficient statistic given the remaining
sufficient statistics.
This approximation is then used in conjunction with noniterative
Monte Carlo methods to generate
a sample from a distribution that approximates the joint distribution of
the sufficient statistics associated with the parameters of interest
conditional on the observed values of the sufficient statistics associated
with the nuisance parameters.
This algorithm is an alternate approach to
that presented by Kolassa and Tanner (1994), in which
a Markov chain is generated whose equilibrium distribution under certain
regularity conditions approximates the joint distribution of interest.
In Kolassa and Tanner (1994) the
Gibbs sampler was used in conjunction with these univariate conditional
distribution function approximations.
The method of this paper does not require the construction and simulation
of a Markov chain, thus avoiding
the need to develop regularity conditions under which the algorithm
converges and the need for the data analyst to check convergence of
the particular chain.
Examples involving logistic and truncated Poisson regression are presented.