On the Application of Cluster Analysis to Growing Season Precipitation Data in North America East of the Rockies

Gong, Xiaofeng; Richman, Michael B.

Source: Journal of Climate, 1995, volume 8, issue 4, page 897

DOI: 10.1175/1520-0442(1995)008<0897:OTAOCA>2.0.CO;2

Publisher: American Meteorological Society

Abstract: Cluster analysis (CA) has been applied to geophysical research for over two decades, although its popularity has increased dramatically over the past few years. To date, systematic methodological reviews have not appeared in geophysical literature. In this paper, after a review of a large number of applications of cluster analysis, an intercomparison of various cluster techniques was carried out on a well-studied dataset (7-day precipitation data from 1949 to 1987 in central and eastern North America). The cluster methods tested were single linkage, complete linkage, average linkage between groups, average linkage within a new group, Ward's method, k-means, the nucleated agglomerative method, and the rotated principal component analysis. Three different dissimilarity measures (Euclidean distance, inverse correlation, and theta angle) and three initial partition methods were also tested on the hierarchical and nonhierarchical methods, respectively. Twenty-two of the 23 cluster algorithms yielded natural grouping solutions. Monte Carlo simulations were undertaken to examine the reliability of the cluster solutions. This was done by bootstrap resampling from the full dataset with four different sample sizes, then testing significance by the t test and the minimum significant difference test. Results showed that nonhierarchical methods outperformed hierarchical methods. The rotated principal component methods were found to be the most accurate methods, the nucleated agglomerative method was found to be superior to all other hard cluster methods, and Ward's method performed best among the hierarchical methods. Single linkage always yielded "chaining" solutions and, therefore, had poor matches to the input data. Of the three distance measures tested, Euclidean distance appeared to generate slightly more accurate solutions compared with the inverse correlation. The theta angle was quite variable in its accuracy.
Tests of the initial partition method revealed a sensitivity of k-means CA to the selection of the seed points. The spatial patterns of cluster analysis applied to the full dataset were found to differ for various CA methods, thereby creating some questions on how to interpret the resulting spatial regionalizations. Several methods were shown to incorrectly place geographically separated portions of the domain into a single cluster. The authors termed this type of result "aggregation error." It was found to be most problematic at small sample sizes and more severe for specific distance measures. The choice of clustering technique and dissimilarity measure/initial partition may indeed significantly affect the results of cluster analysis. Cluster analysis accuracy was also found to be linearly to logarithmically related to the sample size. This relationship was statistically significant. Several methods, such as Ward's, k-means, and the nucleated agglomerative method, were found to reach a higher level of accuracy at a lower sample size compared with other CA methods tested. The level of accuracy reached by the rotated principal component clustering compared with the other methods tested suggests that application of a hard and nonoverlapping clustering methodology to fuzzy and overlapping geophysical data results in a substantial degradation in the regionalizations presented.
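
Illustrative note (not part of the published abstract): the three dissimilarity measures and the bootstrap resampling step named above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give formulas, so "inverse correlation" is assumed here to mean 1 - r and "theta angle" is assumed to mean the angle between two series treated as vectors. All function names and the toy station data are hypothetical.

```python
import math
import random


def euclidean(x, y):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def inverse_correlation(x, y):
    """ASSUMED form: 1 - r, so perfectly correlated series score 0."""
    return 1.0 - pearson(x, y)


def theta_angle(x, y):
    """ASSUMED form: angle in radians between the series viewed as vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    cos = max(-1.0, min(1.0, dot / norm))  # guard against rounding drift
    return math.acos(cos)


def bootstrap_sample(series_by_station, size, rng):
    """Resample time steps with replacement, keeping stations aligned."""
    n = len(next(iter(series_by_station.values())))
    idx = [rng.randrange(n) for _ in range(size)]
    return {name: [s[i] for i in idx] for name, s in series_by_station.items()}


# Hypothetical toy data: three stations, five 7-day precipitation totals each.
stations = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [2.0, 4.0, 6.0, 8.0, 10.0],
    "C": [5.0, 3.0, 4.0, 1.0, 2.0],
}
sample = bootstrap_sample(stations, size=4, rng=random.Random(0))
```

In a comparison like the one described, each measure would be evaluated for every pair of stations on each bootstrap sample before clustering; the clustering step itself is omitted here.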

URI

http://yetl.yabesh.ir/yetl1/handle/yetl/4182090

Collections

Journal of Climate

contributor author	Gong, Xiaofeng
contributor author	Richman, Michael B.
date accessioned	2017-06-09T15:25:28Z
date available	2017-06-09T15:25:28Z
date copyright	1995/04/01
date issued	1995
identifier issn	0894-8755
identifier other	ams-4332.pdf
identifier uri	http://onlinelibrary.yabesh.ir/handle/yetl/4182090
publisher	American Meteorological Society
title	On the Application of Cluster Analysis to Growing Season Precipitation Data in North America East of the Rockies
type	Journal Paper
journal volume	8
journal issue	4
journal title	Journal of Climate
identifier doi	10.1175/1520-0442(1995)008<0897:OTAOCA>2.0.CO;2
journal firstpage	897
journal lastpage	931
tree	Journal of Climate:;1995:;volume( 008 ):;issue: 004
contenttype	Fulltext

YaBeSH Engineering and Technology Library
