9 Concluding Remarks



 

In the introduction chapter, we defined our main goals to be: 1) to find a way to efficiently gather and sort information about Web pages into suitable contexts, and 2) to come up with a good user interface for flexible searching inside a context, using our classified and indexed data. We will now see whether we have succeeded.
 
 

9.1 The Problems Solved

In Chapter 2.4 we identified and listed seven problems that both information providers and information seekers face on the Web today. We set out to solve as many of them as possible, with emphasis on the first one. As the preceding chapters have shown, the system we suggest provides at least partial solutions to all of the main problems we encounter when indexing and searching the Web today. This implies that we have successfully addressed the goal of the thesis.
 
 

9.2 Summary: The Novelties

When the EDDIC system is fully implemented, it will not render the other Internet search tools obsolete. They will still have their uses, but EDDIC introduces new possibilities for everyone.

Our index is in theory capable of holding, at all times, relatively up-to-date information about every page on the World Wide Web. For reasons of storage efficiency, the information held in our index cannot be the actual, raw Web pages themselves, as in most search engines. Instead we store metadata, that is, information describing the location, contents and properties of the Web pages, much as Web directories do. In this way we combine the advantages of both search engines and directories, and add a few advantages on top of that as well. This is how some main aspects of search tools are improved by our system:

Use of the EDDIC tool requires that the users describe the context and subject of the information they want. This is a more natural way to search, and it contrasts with the situation today, where people have to provide the search tool with words which they believe can be found on the pages they are interested in.

The core idea in our indexing system is that instead of describing each page by a large number of keywords connected to each other, we classify the page and its contents by connecting it to the context that best describes it, and store only a few of the most important keywords in the actual Web page entry. The context code itself is equipped with several sets of keywords from a number of different ontologies, and these keywords can be used to guide the user to interesting documents.
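To make the idea concrete, here is a minimal sketch in Python of the two kinds of entries involved. All names and fields are our own illustrative assumptions made for this example, not the final EDDIC design:

    from dataclasses import dataclass, field

    # Illustrative sketch only: the field names and structure are
    # assumptions made for this example, not a finished design.

    @dataclass
    class ContextCode:
        code: str                        # e.g. a Dewey-like notation
        # Keyword sets from different ontologies, all describing the
        # same context; used to guide users to interesting documents.
        ontology_keywords: dict = field(default_factory=dict)

    @dataclass
    class PageEntry:
        url: str
        context: ContextCode             # the single best-fitting context
        keywords: list = field(default_factory=list)  # only a few words

    # A context is described once, then shared by every page entry that
    # is classified under it:
    sailing = ContextCode(
        code="797.124",
        ontology_keywords={
            "dewey":  ["sailing", "boating"],
            "layman": ["sail", "yacht", "boat trips"],
        },
    )

    page = PageEntry(
        url="http://example.org/knots.html",
        context=sailing,
        keywords=["bowline", "knots"],
    )

Note how the bulk of the descriptive keywords lives in the shared context code rather than in each page entry, which is what keeps the index compact.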
 
 

9.3 The Next Steps To Take

This report has presented a framework for a new kind of search tool. Several areas must be explored further before an implementation can take place. First and most important, we have to perform a statistical study of the hyperlink structure of the Web, to determine how reliable URLs, in combination with traditional text analysis methods, are as context indicators. This is necessary to show that it is possible to create agents capable of assisting human classification personnel in the enormous task of classifying millions of Web pages. The matter can be settled quickly, provided that suitable data is gathered or found and made available for analysis.
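To indicate what kind of analysis we have in mind, the following minimal sketch in Python estimates how often a hyperlink connects two pages that share a context code, given a sample of pages already classified by humans. The data format and function name are assumptions made purely for illustration:

    # Sketch of the proposed statistical study. We assume a sample of
    # pages to which human classifiers have already assigned context
    # codes; the dictionary format below is an assumption.

    def link_context_agreement(pages):
        """Return the fraction of hyperlinks whose source and target
        pages share a context code. A high value would support using
        links as context indicators for the classification agents."""
        context_of = {p["url"]: p["context"] for p in pages}
        same = total = 0
        for p in pages:
            for target in p["links"]:
                if target in context_of:   # only links within the sample
                    total += 1
                    if context_of[target] == p["context"]:
                        same += 1
        return same / total if total else 0.0

    sample = [
        {"url": "http://a.example/", "context": "797.124",
         "links": ["http://b.example/"]},
        {"url": "http://b.example/", "context": "797.124",
         "links": ["http://a.example/", "http://c.example/"]},
    ]
    print(link_context_agreement(sample))   # prints 1.0 for this toy sample

A similar measurement could be made for URL similarity instead of links, by comparing the context codes of pages whose URLs share a common leading part.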

If research shows that hypertext links and/or similarities in URLs can in many cases be used as an indication of the type of a page and of what kind of material it contains, the next step is to identify the most important document properties for searching among and filtering Web pages. Suggestions for what these properties may be are listed earlier in this report, but they are not necessarily the final selection of properties to include in the index. When the properties are identified, a set of codes covering all conceivable Web page material must be constructed. An extension of the Dewey Decimal Classification, covering modern technology- and electronic commerce-related subjects, must probably also be created. At this stage, classification and database experts must be consulted.
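As a toy illustration of what an indexed property set might look like, here is a brief sketch in Python. The properties shown are examples in the spirit of the suggestions made earlier in the report, not the final selection, and the extended classification code is entirely hypothetical:

    # Illustrative only: choosing which properties to index is exactly
    # the decision this step must make. These fields are assumptions.

    page_properties = {
        "url":          "http://example.org/shop.html",
        "context_code": "381.1076",   # hypothetical e-commerce extension
                                      # of a Dewey-like notation
        "language":     "en",
        "media_types":  ["text", "images"],
    }

    def matches(props, **wanted):
        """Keep a page only if every wanted property matches exactly."""
        return all(props.get(k) == v for k, v in wanted.items())

    print(matches(page_properties, language="en"))   # prints True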

In parallel with the creation of the classification codes, the work of implementing the agent system software can be carried out, and contracts with the necessary partners may be written.

A strategy for the introduction of the new EDDIC tool must also be chosen. Lately, most search tool Web sites have not concentrated on finding new and better ways to search the Web, but have instead added features like stock tickers, translation of Web pages, links to on-line bookstores, horoscopes, electronic versions of yellow pages, thesauruses and dictionaries, news services and weather forecasts. In short, the focus has shifted from improving the search service towards providing entertainment and newspaper- or magazine-like material. Teaming up with a major search site may help the search tool industry start concentrating on how to improve Internet search again. A future goal should be, when technology permits it, to combine the classification system suggested in this report with an index where pages are indexed with their full textual contents; this would result in very flexible and powerful search possibilities. If we choose not to cooperate with any existing search site, we must decide what profile our own search page shall have.

It is very important that the work begins as soon as possible, as the Web is certainly not waiting for us to catch up. The world needs a better way to organize Web searching. There has not been much development among the search engines and directories lately, except that the indexes have grown somewhat and that a number of strategic partnerships and cross-promotion deals have been agreed upon. The user interface for searching the Web has remained essentially unchanged since search tools were introduced. Something needs to be done, and this report has suggested what to do.
 
 

9.4 The Conclusion

Through our report we have shown that by combining theory from library science with agent technology and novel classification techniques, it is possible to create a search tool capable of offering new possibilities for searching the Web. Compared to the search tools currently available, our proposition is that the two main achievements of implementing the system we have described will be:
 
  1. We can build larger high-quality indexes, thanks to agent support of the manual classification work.
  2. We can offer more flexible and powerful navigation and search mechanisms, thanks to the way we classify and sort Web pages by subject and certain other properties.

While previous search tools have often disappointed and confused many of their users, with this system we can offer our guests a large and elegant search structure, with detailed maps and personal, guided tours. As the Web keeps growing, our main conclusion is that the challenge of classifying and indexing it must be met by an implementation of the system we suggest, starting with the next necessary steps described above, as soon as possible.

