Analysis of Incomplete Climate Data: Estimation of Mean Values and Covariance Matrices and Imputation of Missing Values
Source: Journal of Climate, 2001, volume 14, issue 5, page 853
Author: Schneider, Tapio
DOI: 10.1175/1520-0442(2001)014<0853:AOICDE>2.0.CO;2
Publisher: American Meteorological Society
Abstract: Estimating the mean and the covariance matrix of an incomplete dataset and filling in missing values with imputed values is generally a nonlinear problem, which must be solved iteratively. The expectation maximization (EM) algorithm for Gaussian data, an iterative method both for the estimation of mean values and covariance matrices from incomplete datasets and for the imputation of missing values, is taken as the point of departure for the development of a regularized EM algorithm. In contrast to the conventional EM algorithm, the regularized EM algorithm is applicable to sets of climate data, in which the number of variables typically exceeds the sample size. The regularized EM algorithm is based on iterated analyses of linear regressions of variables with missing values on variables with available values, with regression coefficients estimated by ridge regression, a regularized regression method in which a continuous regularization parameter controls the filtering of the noise in the data. The regularization parameter is determined by generalized cross-validation, such as to minimize, approximately, the expected mean-squared error of the imputed values. The regularized EM algorithm can estimate, and exploit for the imputation of missing values, both synchronic and diachronic covariance matrices, which may contain information on spatial covariability, stationary temporal covariability, or cyclostationary temporal covariability. A test of the regularized EM algorithm with simulated surface temperature data demonstrates that the algorithm is applicable to typical sets of climate data and that it leads to more accurate estimates of the missing values than a conventional noniterative imputation technique.
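The iteration described in the abstract can be illustrated with a minimal sketch; it is not the paper's implementation. It assumes NumPy, a data matrix with `NaN` marking missing values, at least one available value in every row and column, and a fixed ridge parameter `h` in place of the generalized cross-validation the paper uses to choose it. The function name `regularized_em_impute` and all of its parameters are hypothetical.

```python
import numpy as np

def regularized_em_impute(X, h=0.1, n_iter=50, tol=1e-6):
    """Fill NaNs in X (n records x p variables); return (X_filled, mean, cov).

    Assumption: h is a fixed ridge parameter, not chosen by GCV.
    """
    X = np.array(X, dtype=float)
    missing = np.isnan(X)
    n, p = X.shape

    # Initial guess: replace each gap with the mean of the available values.
    col_means = np.nanmean(X, axis=0)
    X[missing] = np.take(col_means, np.nonzero(missing)[1])

    C_res_sum = np.zeros((p, p))  # accumulated residual covariance of imputed values
    for _ in range(n_iter):
        # Estimate mean and covariance from the completed data, adding the
        # residual covariance of the values imputed in the previous sweep.
        mu = X.mean(axis=0)
        Xc = X - mu
        C = (Xc.T @ Xc + C_res_sum) / n

        X_new = X.copy()
        C_res_sum = np.zeros((p, p))
        for i in range(n):
            m = missing[i]
            if not m.any():
                continue
            a = ~m
            C_aa = C[np.ix_(a, a)]
            C_am = C[np.ix_(a, m)]
            C_mm = C[np.ix_(m, m)]
            # Ridge regression of missing on available variables; the tiny
            # constant only guards against zero-variance variables.
            D = np.diag(np.diag(C_aa)) + 1e-12 * np.eye(C_aa.shape[0])
            B = np.linalg.solve(C_aa + h**2 * D, C_am)
            X_new[i, m] = mu[m] + (X[i, a] - mu[a]) @ B
            C_res_sum[np.ix_(m, m)] += C_mm - C_am.T @ B

        # Stop when the imputed values no longer change appreciably.
        step = np.linalg.norm(X_new[missing] - X[missing])
        scale = max(np.linalg.norm(X_new[missing]), 1e-12)
        X = X_new
        if step / scale < tol:
            break
    return X, mu, C
```

Calling `regularized_em_impute(X_obs)` on an array with `NaN` gaps returns the completed array together with the mean and covariance estimates from the last estimation step. Fixing `h` keeps the sketch short; choosing it per record by generalized cross-validation, as the abstract describes, would adapt the amount of noise filtering to the data.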
| contributor author | Schneider, Tapio |
| date accessioned | 2017-06-09T15:56:05Z |
| date available | 2017-06-09T15:56:05Z |
| date copyright | 2001/03/01 |
| date issued | 2001 |
| identifier issn | 0894-8755 |
| identifier other | ams-5697.pdf |
| identifier uri | http://onlinelibrary.yabesh.ir/handle/yetl/4197255 |
| publisher | American Meteorological Society |
| title | Analysis of Incomplete Climate Data: Estimation of Mean Values and Covariance Matrices and Imputation of Missing Values |
| type | Journal Paper |
| journal volume | 14 |
| journal issue | 5 |
| journal title | Journal of Climate |
| identifier doi | 10.1175/1520-0442(2001)014<0853:AOICDE>2.0.CO;2 |
| journal firstpage | 853 |
| journal lastpage | 871 |
| tree | Journal of Climate:;2001:;volume( 014 ):;issue: 005 |
| contenttype | Fulltext |