Abstracts of Papers by John Kolassa

Topics in Series Approximations to Distribution Functions

This thesis addresses a variety of problems involved in series approximations for distribution functions, including modifying formal Edgeworth and saddlepoint expansions to apply to the case of lattice random variables. I demonstrate that the classical Edgeworth series provides an asymptotic expansion for the cumulative distribution function of a lattice random variable if the cumulants in this series are adjusted using Sheppard's corrections, and the series is evaluated at midpoints of the intervals between lattice points. This result is motivated using a smoothing argument, and proven rigorously.

These results for the Edgeworth series are extended to the multivariate case. Again, I demonstrate that the classical Edgeworth series can be modified to provide an asymptotic expansion for the cumulative distribution function of a lattice random variable. In this case, the multivariate cumulants must be adjusted by the multivariate version of Sheppard's corrections, and the series must be evaluated at the midpoints of hypercubes whose vertices are lattice points.

These results also extend the saddlepoint approximation in the univariate case. The thesis demonstrates that regular saddlepoint methods for distribution functions can be used in the case of lattice variables if the cumulant generating function involved in this approximation is modified by adding the generating function for Sheppard's corrections.

Approximations for non-rectangular sets, in particular those arising in likelihood ratio tests, are discussed. Connections with Bartlett's corrections are also explored. Numerical results indicate that using this correction together with a continuity correction improves the accuracy of the chi-square approximation to the distribution of the log likelihood ratio statistic.

This thesis also discusses a modification to the saddlepoint approximation that requires only the Legendre transform of an approximating cumulant generating function, for use when the exact Legendre transform is difficult to compute. An expansion is derived which, while not behaving as well as the exact saddlepoint approximation, performs better than the Edgeworth approximation. These methods are applied to the problem of calculating moments of the response in a logistic random effects model.

Edgeworth Series for Lattice Distributions

This paper investigates the use of Edgeworth expansions for approximating the distribution function of the normalized sum of n independent and identically distributed lattice-valued random variables. We prove that the continuity--corrected Edgeworth series, using Sheppard--adjusted cumulants, is accurate to the same order in n as the usual Edgeworth approximation for continuous random variables. Finally, as a partial justification for the Sheppard adjustments, it is shown that if a continuous random variable Y is rounded to a discrete part D and a truncation error U, such that Y=D+U, then under suitable limiting conditions the truncation error is approximately uniformly distributed and independent of Y, but not independent of D.
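As a rough numerical illustration of the continuity-corrected, Sheppard-adjusted series described above, the following sketch compares it with the exact binomial distribution function. It is one reading of the abstract, not the paper's own code: the binomial example, the function names, and the choice to stop at the terms involving the third and fourth cumulants are all assumptions made here. The adjustment subtracts 1/12 from the second cumulant and adds 1/120 to the fourth (lattice spacing 1), and the series is evaluated at the midpoint d + 1/2.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def binom_cdf(d, n, p):
    # Exact P(S <= d) for S ~ Binomial(n, p).
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(d + 1))

def normal_cc_cdf(d, n, p):
    # Plain normal approximation with continuity correction only.
    return norm_cdf((d + 0.5 - n * p) / math.sqrt(n * p * (1 - p)))

def edgeworth_lattice_cdf(d, n, p):
    # Edgeworth series evaluated at the midpoint d + 1/2, using
    # Sheppard-adjusted cumulants of the binomial sum (illustrative sketch).
    q = 1.0 - p
    k1 = n * p
    k2 = n * p * q - 1.0 / 12.0                         # Sheppard-adjusted variance
    k3 = n * p * q * (1.0 - 2.0 * p)                    # odd cumulants are unchanged
    k4 = n * p * q * (1.0 - 6.0 * p * q) + 1.0 / 120.0  # Sheppard-adjusted 4th cumulant
    x = (d + 0.5 - k1) / math.sqrt(k2)
    r3 = k3 / k2**1.5
    r4 = k4 / k2**2
    he2 = x * x - 1.0
    he3 = x**3 - 3.0 * x
    he5 = x**5 - 10.0 * x**3 + 15.0 * x
    return norm_cdf(x) - norm_pdf(x) * (r3 * he2 / 6.0 + r4 * he3 / 24.0 + r3 * r3 * he5 / 72.0)

n, p = 20, 0.3  # hypothetical example values
err_normal = max(abs(normal_cc_cdf(d, n, p) - binom_cdf(d, n, p)) for d in range(n + 1))
err_edgeworth = max(abs(edgeworth_lattice_cdf(d, n, p) - binom_cdf(d, n, p)) for d in range(n + 1))
print(f"max error, normal with continuity correction: {err_normal:.5f}")
print(f"max error, adjusted Edgeworth series:         {err_edgeworth:.5f}")
```

Only the Python standard library is used, so the comparison can be rerun with other values of n and p to see how the two errors change with sample size.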

Saddlepoint Approximations in the case of Intractable Cumulant Generating Functions

Saddlepoint Approximations have long been used to approximate densities and distribution functions of random variables with known cumulant generating function defined on an open interval about the origin. This approximation has very desirable asymptotic properties when approximating densities and tail probabilities for sums of random variables, and also often performs remarkably well for small sample sizes, including samples of one.

Calculating the saddlepoint approximation requires calculating the Legendre transform of the cumulant generating function. In some cases this cumulant generating function may be unavailable; in other cases the Legendre transform is difficult to calculate analytically. This paper discusses modifications to the saddlepoint approximation necessary when the cumulant generating function is replaced by a similar but more tractable function whose Legendre transform can be given explicitly. Calculations for the logistic distribution are presented to illustrate the case of a known but intractable cumulant generating function, and an example involving an overdispersed binomial model is presented to illustrate the case of an unavailable cumulant generating function. An application to a random effects logistic linear model is discussed.
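For orientation, the standard saddlepoint density approximation for a single observation with cumulant generating function K is exp(K(s) - s x) / sqrt(2 pi K''(s)), evaluated at the root s of K'(s) = x. The sketch below is a minimal illustration of that standard formula for a tractable case, a unit exponential variable, and is not the modified approximation developed in the paper. The function names, the bisection solver, and the example distribution are assumptions chosen here.

```python
import math

def saddlepoint_density(x, K, K1, K2, s_max):
    """Saddlepoint density approximation for a sample of one.

    K, K1, K2 are the cumulant generating function and its first two
    derivatives, assumed finite for s < s_max.
    """
    # Solve K'(s) = x by bisection; K' is increasing because K is convex.
    hi = s_max - 1e-12
    lo = -1.0
    while K1(lo) > x:
        lo *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if K1(mid) < x:
            lo = mid
        else:
            hi = mid
    s_hat = 0.5 * (lo + hi)
    # K(s_hat) - s_hat * x is minus the Legendre transform of K at x.
    return math.exp(K(s_hat) - s_hat * x) / math.sqrt(2.0 * math.pi * K2(s_hat))

# Unit exponential: K(s) = -log(1 - s), defined for s < 1.
K = lambda s: -math.log(1.0 - s)
K1 = lambda s: 1.0 / (1.0 - s)
K2 = lambda s: 1.0 / (1.0 - s) ** 2

ratios = [saddlepoint_density(x, K, K1, K2, 1.0) / math.exp(-x) for x in (0.5, 1.0, 3.0)]
print(ratios)
```

In this example the ratio of the approximation to the exact density exp(-x) is the same constant, e / sqrt(2 pi), at every x, so renormalizing the approximation would recover the exact density.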

Confidence Intervals for Thermodynamic Constants

Phase equilibrium experiments using the same reactants and run at different temperatures and pressures give rise to linear constraints on thermodynamic variables, such as entropies, enthalpies, and volume changes of reaction. Linear programming has been applied to this problem to determine maxima and minima for these parameters separately.

Since these temperatures and pressures are measured or set with uncertainty, their nominal values as reported by the experimenter need not define the same feasible region as that defined by their true values. Clearly, then, the ranges resulting from the linear programming procedure will not represent the reasonable ranges of parameter values. This paper explores the problem of inference on these parameters when a normal error distribution is postulated for temperature and pressure. Current methodologies and their relation to the field of statistical inference are discussed. Two algorithms for generating bounds on the entropies and enthalpies are presented. The first uses a normal approximation to the location of feasible region vertices, and makes allowances for variations in the constraints contributing to this vertex to construct confidence intervals for the extrema in random linear programs, to generate bounds holding for one parameter at a time. The second generates bounds holding simultaneously for all parameters. A comparison of these confidence intervals with those generated by conventional methods demonstrates that estimates of thermodynamic variables have far less variability than commonly believed.
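The linear programming step that this paper takes as its starting point can be sketched for two parameters as follows. Everything specific in the sketch is hypothetical: the four constraints are invented for illustration (they are not data from the paper), each row (a, b, c) encodes a*dH + b*dS <= c, and brute-force vertex enumeration of the feasible polygon is used in place of a linear programming solver, which gives the same extrema when the region is bounded. The constraints are treated as exact, so the experimental error that the paper addresses is ignored.

```python
from itertools import combinations

# Hypothetical constraints a*dH + b*dS <= c, such as might arise from
# observing the direction of a reaction at two temperatures (1100 and 1300)
# on either side of the equilibrium curve. Values are invented.
constraints = [
    (1.0, -1100.0, -900.0),
    (-1.0, 1100.0, 1100.0),
    (1.0, -1300.0, -2900.0),
    (-1.0, 1300.0, 3100.0),
]

def feasible_vertices(cons, tol=1e-6):
    """Return the vertices of the polygon {(h, s): a*h + b*s <= c for all rows}."""
    verts = []
    for (a1, b1, c1), (a2, b2, c2) in combinations(cons, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < 1e-12:
            continue  # parallel boundary lines never intersect
        # Cramer's rule for the intersection of the two boundary lines.
        h = (c1 * b2 - c2 * b1) / det
        s = (a1 * c2 - a2 * c1) / det
        if all(a * h + b * s <= c + tol for a, b, c in cons):
            verts.append((h, s))
    return verts

verts = feasible_vertices(constraints)
dH_range = (min(h for h, _ in verts), max(h for h, _ in verts))
dS_range = (min(s for _, s in verts), max(s for _, s in verts))
print("dH range:", dH_range)
print("dS range:", dS_range)
```

The extrema of a linear function over a bounded polygon are attained at vertices, which is why taking the minimum and maximum over the enumerated vertices reproduces the separate linear programming ranges for each parameter.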

Statistical Review of the Henhouse experiments: The Effects of a Pulsed Magnetic Field on Chick Embryos

This paper analyzes data from a study conducted by the United States Office of Naval Research on the effects of pulsed magnetic fields on chick embryos. The experiment involved incubation of eggs under carefully controlled conditions in six different laboratories. The original analysis included inappropriate statistical methodology for analyzing the experimental results. Since the conclusions from this study rest so heavily on the results of statistical analyses, choosing the proper methodology is imperative. The major aim of this paper, then, is to introduce more appropriate analytic tools and illustrate their use in the present context. Qualitatively our results agree with those of the original analysis; our findings about interactions between effects, however, make interpretation of these effects more subtle. We apply linear logistic modeling to counts of damaged embryos, using as covariates factors corresponding to exposure, laboratory, incubator, run, and measurements of background radiation. This facilitates estimation of the size of the effects. The effects of laboratory, incubator, and run are explored both as fixed and random effects. We find statistically significant exposure and laboratory effects, in accordance with the original study. However, we also find that the inter-laboratory variation in exposure effect is at least as large as the exposure effect itself. The presence of such effects fundamentally alters the interpretation of the fitted model, as is graphically presented.

Discussion on the Meeting on the Gibbs Sampler and other Markov Chain Monte Carlo Methods

Paper had no abstract.

Series Approximation Methods in Statistics

Book had no abstract.

Carbamazepine Treatment of Agitation in Nursing Home Patients with Dementia: A Preliminary Study

OBJECTIVE: To determine the effects of carbamazepine versus placebo on ratings of behavior in agitated nursing home patients with dementia. DESIGN: Nonrandomized, placebo-controlled, crossover trial conducted in 25 patients in two nursing homes. INTERVENTION: Carbamazepine and placebo were administered during two 5-week periods separated by a 2-week washout. The carbamazepine dose was determined for each patient by a nonblinded physician who did not participate in ratings (modal dose 300 mg/day). MEASUREMENTS: The primary outcome measures were Brief Psychiatric Rating Scale scores and Clinical Global Impression of Change, rated by blind observers. Secondary measures of behavior, adversity, cognition, and functional status were also included. MAIN RESULTS: Median total Brief Psychiatric Rating Scale score decreased 7 points on carbamazepine versus 3 on placebo (P = 0.03). Sixteen subjects were rated as improved globally on carbamazepine versus four on placebo (P = 0.001). Secondary measures of behavior showed similar changes at significant or suggestive (P < 0.10) levels. One subject developed carbamazepine-induced tics, and one died with a pneumonia. There was minimal other adversity. CONCLUSION: This preliminary study suggests that carbamazepine in low doses can reduce agitated behaviors in some patients, with limited adversity resulting. Further research is required to confirm and extend this finding before it can be considered routine clinical practice.

Approximate Conditional Inference in Exponential Families Via the Gibbs Sampler

This article presents the Gibbs--Skovgaard algorithm for approximate frequentist inference. The method makes use of the double saddlepoint approximation of Skovgaard to the conditional cumulative distribution function of a sufficient statistic given the remaining sufficient statistics. This approximation is then used in the Gibbs sampler to generate a Markov chain. The equilibrium distribution of this chain approximates the joint distribution of the sufficient statistics associated with the parameters of interest conditional on the observed values of the sufficient statistics associated with the nuisance parameters. This Gibbs--Skovgaard algorithm is applied to the cases of logistic and Poisson regression.

A Comparison of Size and Power Calculations for the Wilcoxon Statistic for Ordered Categorical Data

This paper compares the accuracy of approximations to test size and power based on the score and the Fisher information, with calculations using Edgeworth and Cornish--Fisher approximations, and demonstrates that in some cases one must exercise much care in using the simpler approximations.

Edgeworth Approximations for Rank Sum Test Statistics

Hettmansperger (1984) quotes a result showing that the distribution function of the Wilcoxon signed rank statistic is approximated by the usual Edgeworth series using the first four cumulants, to o(1/n). In light of standard Edgeworth series results for random variables confined to a lattice, this result is counterintuitive. One expects correction terms to be necessary because of the lattice nature of the Wilcoxon statistic. This paper explains this apparent paradox, provides an alternative proof relying on basic Edgeworth series results, and provides a sharper result. Interesting features in this problem highlighting limitations of expansions for random variables on a lattice are discussed.

Small Sample Conditional Inference in Biostatistics

Kolassa and Tanner (1994) present the Gibbs--Skovgaard algorithm for approximate conditional inference. This algorithm makes use of the double saddlepoint approximation to the conditional distribution function of a sufficient statistic given the remaining sufficient statistics. This approximation is used with the Gibbs Sampler to generate a Markov chain. The equilibrium distribution of this chain approximates the joint distribution of the sufficient statistics associated with the parameters of interest conditional on the observed values of the sufficient statistics associated with the nuisance parameters. In this paper issues relating to existence of an equilibrium distribution and the quality of the resulting approximation to the desired distribution are discussed. This algorithm is applied to inference for non--linear regression parameters.

Acute Respiratory Tract Infection in Daycare Centers for the Elderly

Abstract not available.

Monte Carlo Sampling in Multiway Contingency Tables

This paper presents methods for calculating Monte Carlo estimates of sizes of conditional tests in hierarchical models for multiway contingency tables. It extends the results of Kolassa and Tanner (1994), and presents an alternate parameterization for higher--way tables to facilitate sampling.

Higher--Order Approximations to Conditional Distribution Functions

This paper derives higher order terms in the double-saddlepoint expansion of Skovgaard (1987) for a unidimensional conditional cumulative distribution function. Expansions for continuous and lattice random variables are derived. Results are applied to the sufficient statistic in logistic regression.

Approximate Monte Carlo Conditional Inference in Exponential Families

This article presents an algorithm for approximate frequentist inference. The method makes use of the double saddlepoint approximation of Skovgaard (1987) to the conditional CDF of a sufficient statistic given the remaining sufficient statistics. This approximation is then used iteratively in conjunction with MC methods to generate a sample from a distribution that approximates the joint distribution of the sufficient statistics associated with the parameters of interest conditional on the observed values of the sufficient statistics associated with the nuisance parameters. This algorithm is an alternate approach to that presented by Kolassa and Tanner (1994), in which the Gibbs sampler was used in conjunction with these univariate conditional distribution function approximations. An example involving logistic regression is presented.

This paper has no abstract.

The "common cold" in frail older persons: impact of rhinovirus and coronavirus in a senior daycare center.

OBJECTIVE: To evaluate the incidence and impact of rhinovirus and coronavirus infections in older persons attending daycare. DESIGN: Prospective descriptive study. SETTING: Three senior daycare centers in Rochester, New York. PATIENTS: Frail older persons and staff members of the daycare centers who developed signs or symptoms of an acute respiratory illness. MEASUREMENTS: Demographic, medical, and physical findings were recorded on subjects at baseline and during respiratory illness. Nasopharyngeal specimens for viral culture as well as acute and convalescent sera for coronavirus 229E enzyme immunoassay (EIA) were obtained for all illnesses. RESULTS: During the 44 months of study, 352 older persons experienced 522 illnesses. Thirty-five (7%) of 522 cultures were positive for rhinovirus and 37 (8%) of 451 serologies were positive for coronavirus 229E infection. The clinical syndromes associated with rhinovirus and coronavirus infection were similar and characterized by nasal congestion, cough, and constitutional symptoms. No patient died or was hospitalized, but approximately 50% had evidence of lower respiratory tract involvement. The average illness lasted 14 days. During the same period, 113 staff developed 338 respiratory illnesses. Eight percent were identified as coronavirus and 9% as rhinovirus. Cough, sputum production, and constitutional symptoms were significantly more common among older persons. CONCLUSIONS: Rhinovirus and coronavirus 229E are common causes of moderately debilitating acute respiratory illnesses among older persons attending daycare.

Infinite Parameter Estimates in Logistic Regression, with Application to Approximate Conditional Inference

This paper discusses recovery of information regarding logistic regression parameters in cases when maximum likelihood estimates of some parameters are infinite. An algorithm for detecting such cases and characterizing the divergence of the parameter estimates is presented. A method for fitting the remaining parameters is also presented. All of these methods rely only on sufficient statistics rather than less aggregated quantities, as required for inference according to the method of Kolassa and Tanner (1994). These results are applied to approximate conditional inference via saddlepoint methods. Specifically, the double saddlepoint method of Skovgaard (1987) is adapted to the case when the solution to the saddlepoint equations exists as a point at infinity.
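The simplest instance of an infinite maximum likelihood estimate arises in logistic regression with an intercept and one covariate, when the two outcome groups do not overlap along the covariate. The sketch below checks only that special case directly from the raw data. It is not the paper's algorithm, which works from sufficient statistics and handles general designs; the function name and the example data are hypothetical.

```python
def has_infinite_mle(x, y):
    """Check whether a one-covariate logistic regression (with intercept)
    has an infinite maximum likelihood estimate.

    x: covariate values; y: 0/1 outcomes. Illustrative special case only.
    """
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    if not x0 or not x1:
        return True   # only one outcome observed: the intercept estimate diverges
    if min(x) == max(x):
        return False  # constant covariate: slope not identifiable, intercept finite
    # Complete or quasi-complete separation: some threshold splits the outcomes.
    return max(x0) <= min(x1) or max(x1) <= min(x0)

print(has_infinite_mle([1, 2, 3, 4], [0, 0, 1, 1]))  # separated outcomes
print(has_infinite_mle([1, 2, 3, 4], [0, 1, 0, 1]))  # overlapping outcomes
```

With more than one covariate the corresponding check requires searching over linear combinations of the covariates, which is usually posed as a linear program.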

Medical Practice with Nursing Home Residents: Results from the National Physician Activities Census

Abstract not available.

Gene-environment interaction in vulnerability to schizophrenia: findings from the Finnish Adoptive Family Study of Schizophrenia.

OBJECTIVE: This study assessed the interaction of genetic risk and rearing-family risk as a subsyndromal test measure of schizophrenic thought disorder in adoptees. METHOD: A group of 58 adoptees with schizophrenic biological mothers was compared with 96 comparison adoptees at ordinary genetic risk; putative adoptee vulnerability was assessed blindly and reliably by using the Rorschach Index of Primitive Thought. Environmental risk was measured by using frequency of communication deviance as a continuous variable, scored independently from Rorschach assessments of the adoptive parents. RESULTS: High genetic risk in itself was not associated with greater vulnerability to schizophrenic thought disorder in the adoptees, as indicated by the Index of Primitive Thought. Also, greater communication deviance in the adoptive parents was not associated with greater thought disorder in the comparison adoptees. However, there was a highly significant gene-environment interaction. Among the offspring of the adoptive parents with high levels of communication deviance, a higher proportion of high-risk than comparison adoptees showed evidence of thought disorder. In contrast, among the offspring of adoptive parents with low communication deviance, a lower proportion of high-risk than comparison adoptees showed evidence of thought disorder. The distribution of communication deviance scores did not differ significantly between the adoptive parents of high-risk offspring and the adoptive parents of comparison offspring. CONCLUSIONS: The findings are consistent with genetic control of sensitivity to the environment. There is no evidence that high genetic risk of schizophrenia among offspring is associated with high levels of communication problems in rearing parents.

Series Approximation Methods in Statistics, 2nd Edn.

This book has no abstract.

Uniformity of Double Saddlepoint Conditional Probability Approximations

This paper presents results showing that the error involved in using the double saddlepoint distribution function approximations of Skovgaard (1987) is uniformly bounded. Particular attention is paid to distributions of sufficient statistics arising from generalized linear models. This work is intended in part to validate the use of Markov chain Monte Carlo by Kolassa and Tanner (1994) with these conditional distribution function approximations.

Confidence Intervals for Parameters Lying in a Random Polygon

This paper presents algorithms for computing confidence intervals and regions for elements of a parameter vector when the signs of linear combinations of unknown parameters are observed, but the coefficients contain experimental error. These methods were proposed in the geochemical literature by Kolassa (1992) as a method specific to petrology. Experimental data is used to give linear constraints, involving quantities measured with error, on unknown free energies and entropies of a chemical reaction. Confidence intervals are given for these parameters, and these are compared to more naive approaches.

Approximate Monte Carlo Conditional Inference in Exponential Families

This article presents an algorithm for approximate frequentist conditional inference on two or more parameters for any regression model in the GLIM family. We thereby extend highly accurate inference beyond the cases of logistic regression and contingency tables implemented in commercially available software. The method makes use of the double saddlepoint approximations of Skovgaard (1987) and Jensen (1992) to the conditional cumulative distribution function of a sufficient statistic given the remaining sufficient statistics. This approximation is then used in conjunction with noniterative Monte Carlo methods to generate a sample from a distribution that approximates the joint distribution of the sufficient statistics associated with the parameters of interest conditional on the observed values of the sufficient statistics associated with the nuisance parameters. This algorithm is an alternate approach to that presented by Kolassa and Tanner (1994), in which a Markov chain is generated whose equilibrium distribution under certain regularity conditions approximates the joint distribution of interest. In Kolassa and Tanner (1994) the Gibbs sampler was used in conjunction with these univariate conditional distribution function approximations. The method of this paper does not require the construction and simulation of a Markov chain, thus avoiding the need to develop regularity conditions under which the algorithm converges and the need for the data analyst to check convergence of the particular chain. Examples involving logistic and truncated Poisson regression are presented.
