contributor author | Duong, Thai | |
contributor author | Nguyen | |
contributor author | Nguyen, Thinh | |
date accessioned | 2017-05-09T01:27:03Z | |
date available | 2017-05-09T01:27:03Z | |
date issued | 2016 | |
identifier issn | 0022-0434 | |
identifier other | ds_138_06_061009.pdf | |
identifier uri | http://yetl.yabesh.ir/yetl/handle/yetl/160690 | |
description abstract | Markov decision process (MDP) is a well-known framework for devising the optimal decision-making strategies under uncertainty. Typically, the decision maker assumes a stationary environment which is characterized by a time-invariant transition probability matrix. However, in many real-world scenarios, this assumption is not justified, thus the optimal strategy might not provide the expected performance. In this paper, we study the performance of the classic value iteration algorithm for solving an MDP problem under nonstationary environments. Specifically, the nonstationary environment is modeled as a sequence of time-variant transition probability matrices governed by an adiabatic evolution inspired from quantum mechanics. We characterize the performance of the value iteration algorithm subject to the rate of change of the underlying environment. The performance is measured in terms of the convergence rate to the optimal average reward. We show two examples of queuing systems that make use of our analysis framework. | |
publisher | The American Society of Mechanical Engineers (ASME) | |
title | Adiabatic Markov Decision Process: Convergence of Value Iteration Algorithm | |
type | Journal Paper | |
journal volume | 138 | |
journal issue | 6 | |
journal title | Journal of Dynamic Systems, Measurement, and Control | |
identifier doi | 10.1115/1.4032875 | |
journal firstpage | 061009 | |
journal lastpage | 061009 | |
identifier eissn | 1528-9028 | |
tree | Journal of Dynamic Systems, Measurement, and Control; 2016; volume 138; issue 6 | |
contenttype | Fulltext | |
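
The abstract describes running value iteration while the transition probability matrix changes slowly over time and measuring convergence to the optimal average reward. As a rough illustration only, the sketch below applies one relative value iteration update per time step to a sequence of transition kernels. It is not the paper's algorithm or model: the linear interpolation between two kernels is an assumed stand-in for a slowly varying environment, and the names `interpolate_kernels`, `relative_value_iteration`, `P_start`, `P_end`, and the toy reward table `R` are all hypothetical.

```python
import numpy as np


def interpolate_kernels(P_start, P_end, n_steps):
    """Yield n_steps kernels moving linearly from P_start to P_end.

    Assumption for illustration: a convex combination of two stochastic
    kernels stands in for a slowly changing environment.
    """
    for t in range(n_steps):
        w = t / max(n_steps - 1, 1)
        yield (1.0 - w) * P_start + w * P_end


def relative_value_iteration(P_seq, R, ref_state=0):
    """Apply one average-reward Bellman update per kernel in P_seq.

    P_seq: iterable of arrays with shape (actions, states, states).
    R:     array with shape (states, actions) of one-step rewards.
    Returns the latest gain estimate, the relative value vector, and
    the greedy policy from the final update.
    """
    n_states = R.shape[0]
    h = np.zeros(n_states)
    gain = 0.0
    policy = np.zeros(n_states, dtype=int)
    for P in P_seq:
        # Q[s, a] = R[s, a] + sum over s' of P[a, s, s'] * h[s']
        Q = R + np.einsum("ast,t->sa", P, h)
        v = Q.max(axis=1)
        policy = Q.argmax(axis=1)
        gain = v[ref_state]   # normalize against a reference state
        h = v - gain
    return gain, h, policy


# Toy problem (hypothetical): 2 states, 2 actions.
R = np.array([[1.0, 0.0],
              [0.0, 3.0]])
# Every (action, state) pair shares the same next-state distribution.
P_start = np.full((2, 2, 2), 0.5)
P_end = np.tile(np.array([0.9, 0.1]), (2, 2, 1))

gain, h, policy = relative_value_iteration(
    interpolate_kernels(P_start, P_end, n_steps=50), R
)
print(gain, h, policy)
```

Passing the same kernel for `P_start` and `P_end` recovers the stationary case, which gives a baseline against which the drifting case can be compared.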