Artificial Skill due to Predictor Screening

DelSole, Timothy; Shukla, Jagadish

Source: Journal of Climate, 2009, volume 22, issue 2, page 331

DOI: 10.1175/2008JCLI2414.1

Publisher: American Meteorological Society

Abstract: This paper shows that if predictors are selected preferentially because of their strong correlation with a prediction variable, then standard methods for validating prediction models derived from these predictors will be biased. This bias is demonstrated by screening random numbers and showing that regression models derived from these random numbers have apparent skill, in a cross-validation sense, even though the predictors cannot possibly have the slightest predictive usefulness. This result seemingly implies that random numbers can give useful predictions, since the sample being predicted is separate from the sample used to estimate the regression model. The resolution of this paradox is that, prior to cross validation, all of the data had been used to evaluate correlations for selecting predictors. This situation differs from real-time forecasts in that the future sample is not available for screening. These results clarify the fallacy in assuming that if a model performs well in cross-validation mode, then it will perform well in real-time forecasts. This bias appears to afflict several forecast schemes that have been proposed in the literature, including operational forecasts of Indian monsoon rainfall and number of Atlantic hurricanes. The cross-validated skill of these models probably would not be distinguishable from that of a no-skill model if prior screening were taken into account.
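The screening experiment described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code: the sample size, the number of candidate predictors, the number of retained predictors, the use of leave-one-out cross validation, and all function names are assumptions chosen for illustration. It compares the cross-validated correlation skill obtained when random predictors are screened once on all of the data with the skill obtained when screening is repeated inside each fold, so that the withheld sample never influences predictor selection.

```python
import numpy as np


def top_k_by_correlation(X, y, k):
    """Indices of the k columns of X most correlated (in absolute value) with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(-np.abs(r))[:k]


def fit_predict(X_train, y_train, X_test):
    """Ordinary least squares with an intercept; returns predictions for X_test."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ beta


def loo_skill(X, y, k, screen_inside_fold):
    """Leave-one-out correlation skill of a k-predictor regression.

    screen_inside_fold=False: predictors are chosen once using ALL samples
    (the biased procedure discussed in the abstract).
    screen_inside_fold=True: predictors are re-chosen in every fold using
    only the training samples of that fold.
    """
    n = len(y)
    preds = np.empty(n)
    global_idx = top_k_by_correlation(X, y, k)
    for i in range(n):
        train = np.arange(n) != i
        if screen_inside_fold:
            idx = top_k_by_correlation(X[train], y[train], k)
        else:
            idx = global_idx
        preds[i] = fit_predict(X[train][:, idx], y[train], X[i:i + 1, idx])[0]
    return np.corrcoef(preds, y)[0, 1]


def run_experiment(n_trials=20, n=40, p=500, k=3, seed=0):
    """Average skill over several trials with purely random predictors and target."""
    rng = np.random.default_rng(seed)
    screened, nested = [], []
    for _ in range(n_trials):
        X = rng.standard_normal((n, p))  # candidate predictors: pure noise
        y = rng.standard_normal(n)       # predictand: independent noise
        screened.append(loo_skill(X, y, k, screen_inside_fold=False))
        nested.append(loo_skill(X, y, k, screen_inside_fold=True))
    return float(np.mean(screened)), float(np.mean(nested))


if __name__ == "__main__":
    s, m = run_experiment()
    print(f"screening on all data:   mean LOO correlation = {s:.2f}")
    print(f"screening in each fold:  mean LOO correlation = {m:.2f}")
```

Because both the predictors and the target are independent noise, any positive skill reported by the first procedure is artificial. Repeating the screening inside each fold mimics a real-time forecast, in which the future sample is not available when predictors are selected.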


URI

http://yetl.yabesh.ir/yetl1/handle/yetl/4208604

Collections

Journal of Climate

contributor author	DelSole, Timothy
contributor author	Shukla, Jagadish
date accessioned	2017-06-09T16:24:02Z
date available	2017-06-09T16:24:02Z
date copyright	2009/01/01
date issued	2009
identifier issn	0894-8755
identifier other	ams-67185.pdf
identifier uri	http://onlinelibrary.yabesh.ir/handle/yetl/4208604
publisher	American Meteorological Society
title	Artificial Skill due to Predictor Screening
type	Journal Paper
journal volume	22
journal issue	2
journal title	Journal of Climate
identifier doi	10.1175/2008JCLI2414.1
journal firstpage	331
journal lastpage	345
tree	Journal of Climate; 2009; volume 22; issue 2
contenttype	Fulltext