Maximizing the usefulness of data obtained with planned missing value patterns: An application of maximum likelihood procedures

John W. Graham, Scott Hofer, David P. MacKinnon

Research output: Contribution to journal › Article

205 Citations (Scopus)

Abstract

Researchers often face a dilemma: Should they collect little data and emphasize quality, or much data at the expense of quality? The utility of the 3-form design coupled with maximum likelihood methods for estimation of missing values was evaluated. In 3-form design surveys, four sets of items, X, A, B, and C are administered: Each third of the subjects receives X and one combination of two other item sets - AB, BC, or AC. Variances and covariances were estimated with pairwise deletion, mean replacement, single imputation, multiple imputation, raw data maximum likelihood, multiple-group covariance structure modeling, and Expectation-Maximization (EM) algorithm estimation. The simulation demonstrated that maximum likelihood estimation and multiple imputation methods produce the most efficient and least biased estimates of variances and covariances for normally distributed and slightly skewed data when data are missing completely at random (MCAR). Pairwise deletion provided equally unbiased estimates but was less efficient than ML procedures. Further simulation results demonstrated that non-maximum likelihood methods break down when data are not missing completely at random. Application of these methods with empirical drug use data resulted in similar covariance matrices for pairwise and EM estimation, however, ML estimation produced better and more efficient regression estimates. Maximum likelihood estimation or multiple imputation procedures, which are now becoming more readily available, are always recommended. In order to maximize the efficiency of the ML parameter estimates, it is recommended that scale items be split across forms rather than being left intact within forms.
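The 3-form layout described in the abstract, together with pairwise deletion (one of the estimators the study compared), can be illustrated with a minimal Python sketch. This is not the authors' simulation: the function names, the item-set sizes, the sample size, and the population covariance matrix are all illustrative assumptions. Random form assignment is used so that the planned missingness is MCAR by construction.

```python
import numpy as np


def three_form_mask(n_subjects, set_sizes, rng):
    """Return (mask, form_ids) for a 3-form design; mask is True where observed.

    set_sizes maps the item-set names X, A, B, C to their item counts.
    Every subject receives X; each form omits exactly one of A, B, C.
    """
    forms = {1: ("X", "A", "B"), 2: ("X", "B", "C"), 3: ("X", "A", "C")}
    # Column ranges for each item set, laid out in the order X, A, B, C.
    cols, start = {}, 0
    for name in ("X", "A", "B", "C"):
        cols[name] = slice(start, start + set_sizes[name])
        start += set_sizes[name]
    # Shuffle an equal-thirds assignment so form membership is random (MCAR).
    form_ids = rng.permutation(np.arange(n_subjects) % 3 + 1)
    mask = np.zeros((n_subjects, start), dtype=bool)
    for form, sets in forms.items():
        rows = form_ids == form
        for name in sets:
            mask[rows, cols[name]] = True
    return mask, form_ids


def pairwise_cov(data):
    """Pairwise-deletion covariance: each entry uses rows where both items exist."""
    p = data.shape[1]
    cov = np.full((p, p), np.nan)
    for i in range(p):
        for j in range(i, p):
            both = ~np.isnan(data[:, i]) & ~np.isnan(data[:, j])
            n_both = both.sum()
            if n_both > 1:
                xi, xj = data[both, i], data[both, j]
                value = np.sum((xi - xi.mean()) * (xj - xj.mean())) / (n_both - 1)
                cov[i, j] = cov[j, i] = value
    return cov


# Illustrative setup (assumed values, not taken from the paper).
rng = np.random.default_rng(0)
sizes = {"X": 2, "A": 2, "B": 2, "C": 2}
n = 3000
mask, form_ids = three_form_mask(n, sizes, rng)
true_cov = 0.5 * np.eye(8) + 0.5  # unit variances, covariances of 0.5
complete = rng.multivariate_normal(np.zeros(8), true_cov, size=n)
observed = np.where(mask, complete, np.nan)  # blank out the omitted item sets
est = pairwise_cov(observed)
```

In this layout every pair of items is jointly observed on at least one form, so the pairwise matrix has no empty cells, but covariances between items from two different sets among A, B, and C rest on only a third of the sample. That reduced overlap is one way to see why the abstract reports pairwise deletion as unbiased under MCAR yet less efficient than the ML procedures.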

Original language: English (US)
Pages (from-to): 197-218
Number of pages: 22
Journal: Multivariate Behavioral Research
Volume: 31
Issue number: 2
State: Published - 1996
Externally published: Yes

Fingerprint

Missing Values
Maximum Likelihood
Multiple Imputation
Values
Missing Completely at Random
Pairwise
Maximum Likelihood Estimation
Deletion
Estimate
Survey Design
Regression Estimate
simulation
Expectation Maximization
Likelihood Methods
Dilemma
Imputation
Maximum Likelihood Method
Covariance Structure
Expectation-maximization Algorithm
Usefulness

ASJC Scopus subject areas

  • Mathematics (miscellaneous)
  • Statistics and Probability
  • Psychology(all)
  • Experimental and Cognitive Psychology
  • Social Sciences (miscellaneous)

Cite this

Maximizing the usefulness of data obtained with planned missing value patterns: An application of maximum likelihood procedures. / Graham, John W.; Hofer, Scott; MacKinnon, David P.

In: Multivariate Behavioral Research, Vol. 31, No. 2, 1996, p. 197-218.

@article{ab583406dd1c4b2db1709622afdd6d50,
title = "Maximizing the usefulness of data obtained with planned missing value patterns: An application of maximum likelihood procedures",
abstract = "Researchers often face a dilemma: Should they collect little data and emphasize quality, or much data at the expense of quality? The utility of the 3-form design coupled with maximum likelihood methods for estimation of missing values was evaluated. In 3-form design surveys, four sets of items, X, A, B, and C are administered: Each third of the subjects receives X and one combination of two other item sets - AB, BC, or AC. Variances and covariances were estimated with pairwise deletion, mean replacement, single imputation, multiple imputation, raw data maximum likelihood, multiple-group covariance structure modeling, and Expectation-Maximization (EM) algorithm estimation. The simulation demonstrated that maximum likelihood estimation and multiple imputation methods produce the most efficient and least biased estimates of variances and covariances for normally distributed and slightly skewed data when data are missing completely at random (MCAR). Pairwise deletion provided equally unbiased estimates but was less efficient than ML procedures. Further simulation results demonstrated that non-maximum likelihood methods break down when data are not missing completely at random. Application of these methods with empirical drug use data resulted in similar covariance matrices for pairwise and EM estimation, however, ML estimation produced better and more efficient regression estimates. Maximum likelihood estimation or multiple imputation procedures, which are now becoming more readily available, are always recommended. In order to maximize the efficiency of the ML parameter estimates, it is recommended that scale items be split across forms rather than being left intact within forms.",
author = "Graham, {John W.} and Hofer, Scott and MacKinnon, {David P.}",
year = "1996",
language = "English (US)",
volume = "31",
pages = "197--218",
journal = "Multivariate Behavioral Research",
issn = "0027-3171",
publisher = "Psychology Press Ltd",
number = "2",

}

TY - JOUR

T1 - Maximizing the usefulness of data obtained with planned missing value patterns

T2 - An application of maximum likelihood procedures

AU - Graham, John W.

AU - Hofer, Scott

AU - MacKinnon, David P.

PY - 1996

Y1 - 1996

AB - Researchers often face a dilemma: Should they collect little data and emphasize quality, or much data at the expense of quality? The utility of the 3-form design coupled with maximum likelihood methods for estimation of missing values was evaluated. In 3-form design surveys, four sets of items, X, A, B, and C are administered: Each third of the subjects receives X and one combination of two other item sets - AB, BC, or AC. Variances and covariances were estimated with pairwise deletion, mean replacement, single imputation, multiple imputation, raw data maximum likelihood, multiple-group covariance structure modeling, and Expectation-Maximization (EM) algorithm estimation. The simulation demonstrated that maximum likelihood estimation and multiple imputation methods produce the most efficient and least biased estimates of variances and covariances for normally distributed and slightly skewed data when data are missing completely at random (MCAR). Pairwise deletion provided equally unbiased estimates but was less efficient than ML procedures. Further simulation results demonstrated that non-maximum likelihood methods break down when data are not missing completely at random. Application of these methods with empirical drug use data resulted in similar covariance matrices for pairwise and EM estimation, however, ML estimation produced better and more efficient regression estimates. Maximum likelihood estimation or multiple imputation procedures, which are now becoming more readily available, are always recommended. In order to maximize the efficiency of the ML parameter estimates, it is recommended that scale items be split across forms rather than being left intact within forms.

UR - http://www.scopus.com/inward/record.url?scp=0030527014&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0030527014&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:0030527014

VL - 31

SP - 197

EP - 218

JO - Multivariate Behavioral Research

JF - Multivariate Behavioral Research

SN - 0027-3171

IS - 2

ER -