Adiabatic Markov Decision Process: Convergence of Value Iteration Algorithm

Duong, Thai; Nguyen; Nguyen, Thinh

Source: Journal of Dynamic Systems, Measurement, and Control, 2016, volume 138, issue 6, page 61009

DOI: 10.1115/1.4032875

Publisher: The American Society of Mechanical Engineers (ASME)

Abstract: Markov decision process (MDP) is a well-known framework for devising the optimal decision-making strategies under uncertainty. Typically, the decision maker assumes a stationary environment which is characterized by a time-invariant transition probability matrix. However, in many real-world scenarios, this assumption is not justified, thus the optimal strategy might not provide the expected performance. In this paper, we study the performance of the classic value iteration algorithm for solving an MDP problem under nonstationary environments. Specifically, the nonstationary environment is modeled as a sequence of time-variant transition probability matrices governed by an adiabatic evolution inspired from quantum mechanics. We characterize the performance of the value iteration algorithm subject to the rate of change of the underlying environment. The performance is measured in terms of the convergence rate to the optimal average reward. We show two examples of queuing systems that make use of our analysis framework.
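
Illustrative sketch (not taken from the paper): the setting described in the abstract can be mimicked with a toy two-state, two-action MDP whose transition model drifts while value iteration runs. The code below assumes a linear interpolation between a start model and an end model over T steps, and uses relative value iteration for the average-reward criterion. The interpolation schedule, the function names, and all numbers are hypothetical choices made for illustration; the paper's exact model and analysis may differ.

```python
# Toy sketch: relative value iteration on a slowly drifting MDP.
# Layout (an assumption of this sketch): P[a][s][j] is the probability of
# moving from state s to state j under action a; R[a][s] is the reward.

def interpolate(P0, P1, s):
    """Convex combination (1 - s) * P0 + s * P1 of two transition models."""
    return [[[(1 - s) * p0 + s * p1 for p0, p1 in zip(row0, row1)]
             for row0, row1 in zip(act0, act1)]
            for act0, act1 in zip(P0, P1)]


def bellman_backup(P, R, h):
    """One backup: for each state, max over actions of r + sum_j P * h."""
    n_states = len(h)
    return [max(R[a][s] + sum(P[a][s][j] * h[j] for j in range(n_states))
                for a in range(len(P)))
            for s in range(n_states)]


def adiabatic_value_iteration(P0, P1, R, T):
    """Run T value iteration steps while the model drifts from P0 to P1.

    Returns the average-reward estimate from the last step and the
    relative values, normalized so that state 0 has value 0.
    """
    h = [0.0] * len(R[0])
    gain = 0.0
    for t in range(1, T + 1):
        P = interpolate(P0, P1, t / T)   # environment at step t
        backed_up = bellman_backup(P, R, h)
        gain = backed_up[0]              # state 0 is the reference state
        h = [v - gain for v in backed_up]
    return gain, h


# Hypothetical numbers: reward 1 in state 0, reward 0 in state 1.
REWARD = [[1.0, 0.0], [1.0, 0.0]]
P_START = [[[0.3, 0.7], [0.3, 0.7]],     # action 0
           [[0.5, 0.5], [0.5, 0.5]]]     # action 1
P_END = [[[0.5, 0.5], [0.2, 0.8]],       # action 0
         [[0.9, 0.1], [0.6, 0.4]]]       # action 1

if __name__ == "__main__":
    for T in (5, 50, 1000):
        gain, _ = adiabatic_value_iteration(P_START, P_END, REWARD, T)
        print(T, gain)
```

In this toy model, always choosing action 1 is optimal for the end model and yields a long-run average reward of 6/7. Running the sketch with a larger T, that is, a slower drift, leaves the final estimate closer to that value, which is the kind of dependence on the rate of change that the abstract refers to.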

URI

http://yetl.yabesh.ir/yetl1/handle/yetl/160690

Collections

Journal of Dynamic Systems, Measurement, and Control

contributor author	Duong, Thai
contributor author	Nguyen
contributor author	Nguyen, Thinh
date accessioned	2017-05-09T01:27:03Z
date available	2017-05-09T01:27:03Z
date issued	2016
identifier issn	0022-0434
identifier other	ds_138_06_061009.pdf
identifier uri	http://yetl.yabesh.ir/yetl/handle/yetl/160690
publisher	The American Society of Mechanical Engineers (ASME)
title	Adiabatic Markov Decision Process: Convergence of Value Iteration Algorithm
type	Journal Paper
journal volume	138
journal issue	6
journal title	Journal of Dynamic Systems, Measurement, and Control
identifier doi	10.1115/1.4032875
journal firstpage	61009
journal lastpage	61009
identifier eissn	1528-9028
contenttype	Fulltext
