
Algorithm

The algorithm is basically a breadth-first search. The difference is that IICA evaluates the gathered pages and decides which anchor to access next. The algorithm proceeds as follows.

step1

Receive a set of keywords, a starting URL address, the scope of the reasoning context, and the number of pages to be gathered from the user.

step2

Match the keywords with terms in the ontology and list the terms relevant to the keywords within the scope.

step3

If the specified URL address exists in the close-list, retrieve the page from the archive. Otherwise, retrieve the page over HTTP.

step4

If the number of gathered pages exceeds the limit specified in step1, exit the procedure. Otherwise, go to step5.

step5

Parse the gathered page to extract the URL addresses and labels in anchors and titles. If an address already exists in the open-list or the close-list, discard it. Otherwise, add it to the open-list.

step6

If the terms listed at step2 are included in the labels, score the labels using the ontology. Otherwise, remove the labels and their addresses from the open-list. Then sort the open-list.

step7

If there is no anchor in the page, pick a URL address from the open-list. Then go to step3.
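
The steps above can be sketched in Python as follows. This is only an illustration under several assumptions, not the IICA implementation: the ontology is modelled as a hypothetical dictionary that maps each keyword to weighted related terms, the scope of the reasoning context is ignored, a label's score is simply the sum of the weights of the terms it contains, and the procedure stops as soon as the requested number of pages has been gathered. The names gather, relevant_terms, score_label, AnchorParser, and the fetch parameter are all invented for this sketch. The open-list is kept as a priority queue, so the best-scored anchor is always accessed next.

```python
import heapq
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorParser(HTMLParser):
    """Collects (href, label) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None


def relevant_terms(keywords, ontology):
    """step2: expand keywords into weighted terms (assumed ontology format)."""
    terms = {}
    for kw in keywords:
        for term, weight in ontology.get(kw.lower(), {}).items():
            terms[term] = max(weight, terms.get(term, 0.0))
    return terms


def score_label(label, terms):
    """step6: sum the weights of the terms that occur in the label."""
    text = label.lower()
    return sum(w for t, w in terms.items() if t in text)


def gather(keywords, start_url, ontology, limit, fetch):
    """Return the URLs of the gathered pages in the order they were accessed."""
    terms = relevant_terms(keywords, ontology)
    open_list = []        # heap of (-score, insertion order, url)
    seen = {start_url}    # URLs already in the open-list or the close-list
    archive = {}          # close-list: url -> page text
    url, order = start_url, 0
    while True:
        if url not in archive:                     # step3
            archive[url] = fetch(url)
        if len(archive) >= limit:                  # step4
            break
        parser = AnchorParser()                    # step5
        parser.feed(archive[url])
        for href, label in parser.anchors:
            target = urljoin(url, href)
            score = score_label(label, terms)      # step6
            if target in seen or score == 0:
                continue                           # duplicate or irrelevant
            seen.add(target)
            order += 1
            heapq.heappush(open_list, (-score, order, target))
        if not open_list:                          # nothing left to access
            break
        url = heapq.heappop(open_list)[2]          # step7
    return list(archive)
```

Here fetch is any function that maps a URL to the text of its page, for example a small wrapper around urllib.request.urlopen. Passing it in as a parameter keeps the sketch usable with pages held in memory.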

Figure 4 shows an example of gathering pages on the WWW using the ontology-based method.

mitiak-i@aist-mandara-net
Tue Jul 30 14:26:54 JST 1996