Hideaki Takeda's Publication
- Y. Yamashiro, A. Ueno and H. Takeda: Delayed
Reward-Based Genetic Algorithm for Partially Observable Markov Decision
Problems, Systems and Computers in Japan, Vol. 35, No. 2, pp. 66 --
78 (2004), (Translated from IEICE Trans. Vol. J84-D-I, No.12, December 2001,
pp.1635-1647).
(Paper)
Reinforcement learning methods typically assume that the environment has the
Markov property. In practice, however, the agent cannot always observe the
environment completely, and distinct states may then be perceived as the same
state. In this research, the authors develop a Delayed Reward-based Genetic
Algorithm for POMDP (DRGA) to solve partially observable Markov decision
problems (POMDPs) that suffer from such perceptual aliasing. DRGA decomposes
the POMDP into several subtasks and solves it by decomposing the agent into
several subagents. Each subagent acquires policies adapted to the environment,
and these policies are evolved by a genetic algorithm driven by the delayed
rewards received from the environment. The agent adapts to the environment by
combining the effective policies that survive selection. The authors
demonstrate the validity of the method on maze search problems with limited
perception.
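The core idea can be illustrated in miniature. The sketch below is not the authors' DRGA (it omits the decomposition into subtasks and subagents); it is a minimal, hypothetical example of the underlying mechanism: evolving reactive policies by genetic algorithm using only a delayed reward, in a corridor world where all interior positions are perceptually aliased. All names (`observe`, `run_episode`, `evolve`) and parameter choices are illustrative assumptions, not from the paper.

```python
import random

random.seed(0)

CORRIDOR_LEN = 8          # positions 0..7; the goal is position 7
MAX_STEPS = 20
OBS = 3                   # 0: at left wall, 1: interior (aliased), 2: at goal wall
ACTIONS = (-1, +1)        # move left / move right

def observe(pos):
    # Perceptual aliasing: every interior position yields the same observation.
    if pos == 0:
        return 0
    if pos == CORRIDOR_LEN - 1:
        return 2
    return 1

def run_episode(policy):
    # Delayed reward: nothing is received until the goal is reached
    # (or the step budget runs out); earlier arrival scores higher.
    pos = 0
    for step in range(MAX_STEPS):
        move = ACTIONS[policy[observe(pos)]]
        pos = max(0, min(CORRIDOR_LEN - 1, pos + move))
        if pos == CORRIDOR_LEN - 1:
            return 1.0 - step / (2 * MAX_STEPS)
    return 0.0

def evolve(pop_size=20, generations=15, mutation_rate=0.2):
    # Each genome maps an observation index to an action index (a reactive policy).
    pop = [[random.randrange(len(ACTIONS)) for _ in range(OBS)]
           for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=run_episode, reverse=True)
        parents = scored[:pop_size // 2]      # truncation selection on delayed reward
        children = []
        for p in parents:
            child = p[:]
            if random.random() < mutation_rate:
                child[random.randrange(OBS)] = random.randrange(len(ACTIONS))
            children.append(child)
        pop = parents + children
    return max(pop, key=run_episode)

best = evolve()
print("goal reached:", run_episode(best) > 0)
```

Because a purely reactive policy suffices in this toy corridor, plain selection on the delayed episode reward finds one; the paper's contribution addresses the harder case where aliasing defeats a single reactive policy and the agent must be split into cooperating subagents.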
Hideaki Takeda (National Institute of Informatics)