Hideaki Takeda's Publication
- R. Ichise, H. Takeda and S. Honiden:
Integrating Multiple Internet Directories by Instance-based Learning, in
Proceedings of the Eighteenth International Joint Conference on
Artificial Intelligence (IJCAI-03), pp. 22–28 (2003).
Finding desired information on the Internet is becoming
increasingly difficult. Internet directories such as Yahoo!, which organize
web pages into hierarchical categories, provide one solution to this problem;
however, such directories are of limited use because some bias is applied
both in the collection and categorization of pages. We propose a method for
integrating multiple Internet directories by instance-based learning. Our
method provides the mapping of categories in order to transfer documents from
one directory to another, instead of simply merging two directories into one.
We present herein an effective algorithm for determining similar categories
between two directories via a statistical method called the κ-statistic. In
order to evaluate the proposed method, we conducted experiments using two
actual Internet directories, Yahoo! and Google. The results show that the
proposed method achieves extensive improvements relative to both the Naive
Bayes and Enhanced Naive Bayes approaches, without any text analysis on
documents.
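The category-matching idea can be illustrated with a minimal sketch. It assumes the statistic in question is Cohen's kappa computed over documents (URLs) that appear in both directories, and that each source category is mapped to the target category with the highest agreement. The function names `kappa` and `best_matches`, the flat category-to-URL-set representation, and the toy data are hypothetical; the paper's actual algorithm, including how it handles the category hierarchy, is not described in this abstract.

```python
def kappa(cat_a, cat_b, shared):
    """Cohen's kappa for agreement between membership in cat_a and cat_b,
    measured over the documents that both directories contain (shared)."""
    n = len(shared)
    if n == 0:
        return 0.0
    n11 = len(cat_a & cat_b & shared)      # in both categories
    n10 = len((cat_a - cat_b) & shared)    # only in cat_a
    n01 = len((cat_b - cat_a) & shared)    # only in cat_b
    n00 = n - n11 - n10 - n01              # in neither
    p_obs = (n11 + n00) / n
    p_exp = ((n11 + n10) * (n11 + n01) + (n01 + n00) * (n10 + n00)) / (n * n)
    if p_exp == 1.0:
        return 0.0                         # degenerate table, no information
    return (p_obs - p_exp) / (1.0 - p_exp)


def best_matches(source, target):
    """Map each source category to the target category with the highest kappa.
    source and target are dicts: category name -> set of document URLs."""
    all_source = set().union(*source.values())
    all_target = set().union(*target.values())
    shared = all_source & all_target       # instances present in both directories
    result = {}
    for s_name, s_docs in source.items():
        scored = [(kappa(s_docs, t_docs, shared), t_name)
                  for t_name, t_docs in target.items()]
        result[s_name] = max(scored)[1]
    return result


# Toy data (hypothetical): URLs are stand-in strings.
source_dir = {"Sports": {"u1", "u2", "u3"}, "Media": {"u4", "u5", "u6"}}
target_dir = {"Athletics": {"u1", "u2", "u3"}, "News": {"u4", "u5"}, "Misc": {"u6"}}
mapping = best_matches(source_dir, target_dir)
```

Because the score depends only on which documents the two categories share, no text analysis of the documents is needed, which matches the claim in the abstract.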
Hideaki Takeda (National Institute of Informatics)