Hideaki Takeda's Publication
- Y. Yamashiro, A. Ueno and H. Takeda: Delayed
Reward-Based Genetic Algorithm for Partially Observable Markov Decision
Problems, Systems and Computers in Japan, Vol. 35, No. 2, pp. 66 --
78 (2004), (Translated from IEICE Trans. Vol. J84-D-I, No.12, December 2001,
pp.1635-1647).
(Paper)
Reinforcement learning methods typically assume that the environment has the
Markov property. In practice, however, the agent cannot always observe the
environment completely, and distinct states may then be perceived as the same
state. In this research, the authors develop a Delayed Reward-based Genetic
Algorithm for POMDP (DRGA) to solve partially observable Markov decision
problems (POMDPs) that suffer from such perceptual aliasing. DRGA decomposes
the POMDP into several subtasks and solves it by decomposing the agent into
several subagents. Each subagent acquires policies adapted to the environment,
and these policies are evolved by a genetic algorithm driven by the delayed
rewards received from the environment. The agent adapts to the environment by
combining the effective policies that survive selection. The authors
demonstrate the validity of the method on maze search problems with limited
perception.
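The core idea can be illustrated in miniature. The sketch below is not the authors' DRGA (it omits the decomposition into subtasks and subagents); it is a minimal, hypothetical example of the underlying mechanism: evolving reactive policies by genetic algorithm using only a delayed reward, in a corridor world where all interior positions are perceptually aliased. All names (`observe`, `run_episode`, `evolve`) and parameter choices are illustrative assumptions, not from the paper.

```python
import random

random.seed(0)

CORRIDOR_LEN = 8          # positions 0..7; the goal is position 7
MAX_STEPS = 20
OBS = 3                   # 0: at left wall, 1: interior (aliased), 2: at goal wall
ACTIONS = (-1, +1)        # move left / move right

def observe(pos):
    # Perceptual aliasing: every interior position yields the same observation.
    if pos == 0:
        return 0
    if pos == CORRIDOR_LEN - 1:
        return 2
    return 1

def run_episode(policy):
    # Delayed reward: nothing is received until the goal is reached
    # (or the step budget runs out); earlier arrival scores higher.
    pos = 0
    for step in range(MAX_STEPS):
        move = ACTIONS[policy[observe(pos)]]
        pos = max(0, min(CORRIDOR_LEN - 1, pos + move))
        if pos == CORRIDOR_LEN - 1:
            return 1.0 - step / (2 * MAX_STEPS)
    return 0.0

def evolve(pop_size=20, generations=15, mutation_rate=0.2):
    # Each genome maps an observation index to an action index (a reactive policy).
    pop = [[random.randrange(len(ACTIONS)) for _ in range(OBS)]
           for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=run_episode, reverse=True)
        parents = scored[:pop_size // 2]      # truncation selection on delayed reward
        children = []
        for p in parents:
            child = p[:]
            if random.random() < mutation_rate:
                child[random.randrange(OBS)] = random.randrange(len(ACTIONS))
            children.append(child)
        pop = parents + children
    return max(pop, key=run_episode)

best = evolve()
print("goal reached:", run_episode(best) > 0)
```

Because a purely reactive policy suffices in this toy corridor, plain selection on the delayed episode reward finds one; the paper's contribution addresses the harder case where aliasing defeats a single reactive policy and the agent must be split into cooperating subagents.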
Hideaki Takeda (National Institute of Informatics)