Robust Multivariate Outlier Detection Methods for Environmental Data

Source: Journal of Environmental Engineering, 2010, Volume 136, Issue 11

DOI: 10.1061/(ASCE)EE.1943-7870.0000271

Publisher: American Society of Civil Engineers
Abstract: Outliers are an inevitable concern whenever one analyzes a large data set, and they need to be identified and dealt with. Today’s water quality data are often collected on different scales, encompass several sites, monitor several correlated parameters, involve a multitude of individuals from several agencies, and span several years. As such, the ability to identify outliers, which may affect the results of the analysis, is crucial. This note presents several statistical techniques that have been developed to deal with this problem, with particular emphasis on robust multivariate methods. These techniques are capable of isolating outliers while overcoming the effects of masking that can hinder the effectiveness of common outlier detection techniques such as Mahalanobis distances (MD). This note uses a comprehensive national metadata set on lake water quality as a case study to analyze the effectiveness of three robust outlier detection techniques, namely, the minimum covariance determinant (MCD), the minimum volume ellipsoid (MVE), and M-estimators. The note compares the results generated by these three techniques to assess how severe each method is in labeling observations as outliers. The results demonstrate the limitations of using MD to analyze multidimensional water quality data. The analysis also highlights the differences between the three robust multivariate methods: the MVE method was found to be the most severe in outlier detection, while the MCD was the most lenient. Of the three robust multivariate outlier detection methods analyzed, the M-estimator proved to be the most flexible because it allowed for downweighting rather than censoring many borderline outlier observations.
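The masking effect mentioned in the abstract can be illustrated with a small sketch. This is not the authors' code or data: the twelve bivariate points, the function names, and the exhaustive subset search are all assumptions made for illustration. The MCD idea used here is the standard one (find the h observations whose covariance matrix has the smallest determinant), but the sketch omits the consistency correction and reweighting step that production implementations usually apply, and an exhaustive search is only feasible for tiny samples.

```python
import math
from itertools import combinations


def mean_cov(pts):
    """Sample mean and covariance (n-1 denominator) of 2-D points."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in pts) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in pts) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def det(cov):
    """Determinant of a symmetric 2x2 covariance matrix."""
    sxx, sxy, syy = cov
    return sxx * syy - sxy ** 2


def md_squared(p, mean, cov):
    """Squared Mahalanobis distance via the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det(cov)


def mcd_exhaustive(pts, h):
    """Raw MCD estimate: mean/covariance of the h-subset with minimum determinant.

    Exhaustive search, so only suitable for very small samples (illustration only).
    """
    best = min(combinations(pts, h), key=lambda s: det(mean_cov(s)[1]))
    return mean_cov(best)


def flag(pts, mean, cov, cutoff):
    """Indices of points whose squared distance exceeds the cutoff."""
    return [i for i, p in enumerate(pts) if md_squared(p, mean, cov) > cutoff]


# Hypothetical toy data: a 3x3 grid of "clean" points (indices 0-8)
# plus three clustered outliers (indices 9-11).
clean = [(x, y) for x in (-1, 0, 1) for y in (-1, 0, 1)]
outliers = [(6, 6), (6, 7), (7, 6)]
data = clean + outliers

# 97.5% chi-square quantile; for 2 degrees of freedom it has the closed form -2*ln(1 - q).
cutoff = -2 * math.log(1 - 0.975)

n, p = len(data), 2
h = (n + p + 1) // 2  # a common default subset size for MCD

classical_flagged = flag(data, *mean_cov(data), cutoff)
robust_flagged = flag(data, *mcd_exhaustive(data, h), cutoff)

print("classical MD flags:", classical_flagged)
print("MCD-based distances flag:", robust_flagged)
```

On this toy sample the clustered outliers inflate the classical covariance enough that no point exceeds the cutoff, whereas the distances based on the MCD subset single out the three planted points. The 97.5% chi-square cutoff is a conventional choice assumed here, not one taken from the paper.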
| contributor author | Ibrahim Alameddine | |
| contributor author | Melissa A. Kenney | |
| contributor author | Russell J. Gosnell | |
| contributor author | Kenneth H. Reckhow | |
| date accessioned | 2017-05-08T21:41:44Z | |
| date available | 2017-05-08T21:41:44Z | |
| date copyright | November 2010 | |
| date issued | 2010 | |
| identifier other | %28asce%29ee%2E1943-7870%2E0000279.pdf | |
| identifier uri | http://yetl.yabesh.ir/yetl/handle/yetl/59682 | |
| publisher | American Society of Civil Engineers | |
| title | Robust Multivariate Outlier Detection Methods for Environmental Data | |
| type | Journal Paper | |
| journal volume | 136 | |
| journal issue | 11 | |
| journal title | Journal of Environmental Engineering | |
| identifier doi | 10.1061/(ASCE)EE.1943-7870.0000271 | |
| tree | Journal of Environmental Engineering; 2010; Volume 136; Issue 11 | |
| contenttype | Fulltext |