Sponsor: Technology Committee
Moderator: Amy Lucker, Librarian, Museum of Fine Arts, Boston
Speakers:
Lee Sorensen, Bibliographer for Art and Reference, Librarian, Lilly
Library, Duke University
Eileen Quam, Foundations Project Coordinator and Metadata Specialist,
State of Minnesota
Paula Hardin, Visual Resources Library, Louisiana State University
Recorder: Marilyn Czerniejewski, Associate Librarian, The Toledo Museum of Art Library, Toledo, Ohio
Amy Lucker introduced the speakers and opened the session with
a brief presentation of basic vocabulary that is useful when exploring
search engines for your site, including: Web Search Engine, Hierarchical
Index, Robot and Spider, Meta-Search Engine, Multi-media Search Engines,
Query types, Relevance ranking, CGI, Client vs. Server, AI (Artificial
Intelligence), Conceptual Search Engines, Vortal, and Visual Result
Displays.
Lee Sorensen described a search engine as an algorithm that gathers information and places it into a holding file for display. He explained how search engines work: a spider goes out and finds the information, and CGI translates it for display. A search begins with a search form on the client machine. The engine then scans a data file for the search term and places the matches into a result file on the client. Engines harvest periodically, and placing the harvested information into an index file speeds retrieval.
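The point about an index file can be illustrated with a minimal sketch in Python. It is not taken from the presentation: the page names and text are invented, and the functions build_index and search are hypothetical. It assumes the simplest kind of index, a table from each word to the pages that contain it, so that a query looks words up instead of rescanning every page.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of page names that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the pages that contain every word of the query, sorted by name."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


# Hypothetical harvested pages (invented for illustration).
pages = {
    "hours.html": "Library hours and directions",
    "collections.html": "Art library collections and reference services",
    "staff.html": "Art librarians and reference staff",
}
index = build_index(pages)
print(search(index, "art reference"))  # ['collections.html', 'staff.html']
```

The index is built once per harvest; each search afterwards touches only the entries for the query words.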
Sorensen distinguished between the workings of a local and an external search engine. A local engine indexes files by their location in a directory rather than by following links, so it is not affected by broken links, as an external search can be. It does not harvest dynamic data such as CGI results.
Sorensen said that designing pages with a generic title would result in a higher retrieval ranking. Meta descriptors and keywords should also be created using terms that searchers would be likely to use. Repetition is useful as well, e.g., include both "art libraries" and "art librarians."
He briefly discussed cases when web sites do not need a search engine. A search engine is needed when:
Eileen Quam's presentation covered an actual, extensive implementation of the Ultraseek search engine and the use of metadata on "Bridges," Minnesota's gateway site to environmental information. Metadata was defined as data about data, or the story that tells the story. Dublin Core metadata is one application of metadata. (Others are TEI, SGML, GIS, GILS, XML.) Dublin Core uses 15 basic elements. The two that Quam believes need the most professionalism to create are subject or keyword, and description. In many search engines, the description replaces what the engine would otherwise retrieve, so what you write is what gets used. Dublin Core was chosen for Bridges because it is in world-wide use and is up for an ISO standard. Quam prefers DC data to be embedded in the page so that it travels with the page.
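As an illustration of metadata embedded in the page, the following Python sketch renders a few Dublin Core elements as HTML meta tags. It is not from the Bridges project: the record values are invented, the function dublin_core_meta is hypothetical, and the "DC." name prefix is assumed from the common convention for encoding Dublin Core in HTML.

```python
from html import escape


def dublin_core_meta(record):
    """Render a dict of Dublin Core elements as HTML meta tags, one per line."""
    lines = []
    for element, value in record.items():
        lines.append(
            '<meta name="DC.%s" content="%s">'
            % (escape(element, quote=True), escape(value, quote=True))
        )
    return "\n".join(lines)


# Hypothetical record (values invented for illustration).
record = {
    "title": "Lake Water Quality Reports",
    "subject": "water quality; lakes; monitoring",
    "description": "Annual summaries of lake water quality monitoring.",
}
print(dublin_core_meta(record))
```

The output is meant to be pasted into the head of the page, which is what lets the description and subject terms travel with the page.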
Quam established standards to be used in the project. She recommends using ISO standards wherever possible. These standards were decided upon: metadata - Dublin Core; controlled vocabulary - LIV-MN; name authority - AACR2/LC; geographic areas - AACR2/LC; dates - ISO 8601. There were others for language, punctuation, capitalization, etc.
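For the date standard, a short Python sketch shows what normalizing to ISO 8601 (YYYY-MM-DD) might look like. The input formats and the helper name to_iso8601 are assumptions made for illustration; the report does not say how the project converted dates.

```python
from datetime import date, datetime


def to_iso8601(text, fmt):
    """Parse a date written in the given strptime format and return YYYY-MM-DD."""
    return datetime.strptime(text, fmt).date().isoformat()


# The same day written two ways normalizes to one ISO 8601 string.
print(to_iso8601("03/18/2000", "%m/%d/%Y"))  # 2000-03-18
print(to_iso8601("18.03.2000", "%d.%m.%Y"))  # 2000-03-18

# ISO 8601 strings can be read back without specifying a format.
print(date.fromisoformat("2000-03-18").year)  # 2000
```

Because the year comes first, ISO 8601 dates also sort correctly as plain text.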
XML was described as the successor to HTML. It provides content control of HTML, using a style sheet to separate out instructions for the look of a page. So far, browsers cannot read the style sheet. It allows identifying the type of information by its field/tag type. XML does not have standardization as yet. It may be 3 to 5 years before there is consistent naming.
Quam described the North Star search Content Classification Engine.
It encompasses arts and humanities as well as many other topics. It uses
a Yahoo-style topic hierarchy, with only the top level visible at first,
requiring a click to see deeper levels. It allows search by keyword and
topic, as well as searching within a topic. At first the thesaurus and
the content classification engine were conceived as two separate things.
They were later connected, by making sure that the CCE categories used
terms from the thesaurus. Quam concluded that metadata is
worth the trouble, and that a CCE is effective, offering multiple ways
of searching.
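The link between the thesaurus and the topic hierarchy described above can be sketched in Python. Everything here is hypothetical: the thesaurus terms, the topic tree, and the function names are invented for illustration and are not taken from North Star. The sketch only shows the idea of checking that every category name is a thesaurus term, and of showing the top level first.

```python
# Hypothetical thesaurus terms and topic tree (invented for illustration).
THESAURUS = {
    "Arts and humanities",
    "Museums",
    "Libraries",
    "Environment",
    "Water quality",
}

TOPICS = {
    "Arts and humanities": {"Museums": {}, "Libraries": {}},
    "Environment": {"Water quality": {}},
}


def category_names(tree):
    """Yield every category name in the tree, at any depth."""
    for name, children in tree.items():
        yield name
        yield from category_names(children)


def terms_missing_from_thesaurus(tree, thesaurus):
    """Return, sorted, the category names that are not thesaurus terms."""
    return sorted(set(category_names(tree)) - thesaurus)


def top_level(tree):
    """Return the categories a user sees before clicking deeper."""
    return sorted(tree)


print(top_level(TOPICS))                                # ['Arts and humanities', 'Environment']
print(terms_missing_from_thesaurus(TOPICS, THESAURUS))  # []
```

An empty result from the check means the hierarchy and the thesaurus share one vocabulary, so a keyword search and a topic browse lead to the same terms.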
Paula Hardin presented a discussion and images of innovations in user search interfaces that provide context for the display of results. Visual displays offer serendipitous options and intuitive information transfer not offered by hyperlinks. She selected key points from an excellent analysis of user interfaces by Marti Hearst, found in the book Modern Information Retrieval and available on the web. These included a description of options for user interfaces that offer more than the traditional icons. They are: panning and zooming (which displays sibling-type data relationships); magic lenses (which act like filters or the Boolean AND); brushing and linking (e.g., all publishing dates of the same year in the same color); focus + context (design options such as size or color to portray context); animation (clicking on a term repositions it); spotlighting (best matches cluster by showing the same color, etc.); and Venn diagrams. Taking these innovations to the limit, perhaps the eventual result would be an OPAC showing an image of the user in front of shelves of books. Clicking on a particular book would bring up a table of contents page. Clicking on an entry there would show the actual content.
After the formal text presentation, Paula Hardin showed images of what the new interfaces looked like. Featured web sites included Plumbdesign's Visual Thesaurus, the Smithsonian Without Walls project called "Revealing Things," and the hyperbolic tree interface of Inxight.
Respectfully submitted,
Marilyn Czerniejewski/Paula Hardin