Session 7
The Construct of Indexing: Thought Processes in Subject Analysis
March 20, 2000
Art Libraries Society of North America 28th Annual Conference, Pittsburgh, PA

Moderator: Ann Whiteside, Visual Resources Librarian, Harvard Design School, Harvard University

Sponsors:
Cataloging Section
Visual Resources Division

Ann Whiteside and Maryly Snow proposed this session in response to a discussion at last year’s conference concerning the thought processes of image catalogers when it comes to subject indexing. As all of the papers from this session are going to be published in full in a future issue of Art Documentation, only brief summaries of their contents will be presented here.

Ann Whiteside introduced the session by calling attention to the dichotomy that exists in our profession’s approach to subject indexing: some of us do and some of us don’t. For every collection where subject indexing is considered an integral part of the cataloging process, there is another collection where the self-indexing, subject-based arrangement of the material serves the clientele very well. The digital age is changing this, and now we all face the subjective process of image subject indexing. Whiteside called attention to some basic questions that face all image indexers:

The papers in this session focused on different projects in different environments and what the future may hold for us all.

Linda McRae, Associate Visual Resources Librarian, College of Fine Arts, University of South Florida, a member of the VRA Data Standards Committee, presented an analysis of Subject Indexing of Art and Architecture Images in the VISION Project. After briefly describing the VISION Project, its goals, the number of participants, and the number of records created, McRae focused on the specific question of how subject terms were applied by the VISION participants. Unlike other categories, “subject” carried no single recommended subject thesaurus for participants to use, but rather 9 suggested online authorities plus an option for local or other lists. The VISION template also allowed 4 repetitions for each field, and indexers were instructed to use one repetition per concept, which could be a single word, a phrase, or a string. Each value was to be tagged with a code indicating which thesaurus or vocabulary had been used as an authority. As the VRA CORE Categories version 2.0 were used for this project, there were two “subject” categories, one for the work and one for the image of the work. This seeming redundancy proved to be useful, particularly for architectural images.
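
The field structure described above can be sketched as a small data model. This is only an illustration in Python, not the VISION template itself: the class names, the authority codes, and the sample terms are hypothetical. Only the limit of four repetitions per field, the one-concept-per-repetition rule, the per-value authority tag, and the separate subject categories for work and image come from McRae's description.

```python
from dataclasses import dataclass, field

MAX_REPETITIONS = 4  # the VISION template allowed 4 repetitions per field


@dataclass
class SubjectValue:
    term: str       # one concept: a single word, a phrase, or a string
    authority: str  # code for the vocabulary used (codes here are hypothetical)


@dataclass
class SubjectField:
    values: list = field(default_factory=list)

    def add(self, term, authority):
        """Add one repetition, refusing a fifth value."""
        if len(self.values) >= MAX_REPETITIONS:
            raise ValueError("no more than 4 repetitions per field")
        self.values.append(SubjectValue(term, authority))


@dataclass
class Record:
    # Two subject categories: one for the work, one for the image of the work.
    work_subject: SubjectField = field(default_factory=SubjectField)
    image_subject: SubjectField = field(default_factory=SubjectField)


# Hypothetical usage: the terms and codes are invented for illustration.
record = Record()
record.work_subject.add("memorials", "AAT")
record.image_subject.add("aerial views", "LOCAL")
```

Keeping the authority code on each value, rather than on the field, is what lets a single field mix terms from several vocabularies.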

McRae stated that her first intention was to analyze only the responses in the subject fields, but that this led inevitably to consideration of the VISION records as a whole and particularly to the relationships between Work Type and Subject and Visual Document Type and Subject. McRae found many inconsistencies and contradictions. The values included in Work Type for the same image, such as Maya Lin’s Vietnam War Memorial, differed according to the focus of the institution, with architectural collections using such terms as “architecture,” “monument,” and “memorial” while more general or art-focused collections used values such as “site-specific sculpture” or “outdoor sculpture”. Also problematic was the difficulty of determining which of the multiple Work Type values went with which of the multiple Subject values.

McRae then focused her investigation on the relation of Work Type to Subject and on differences between two sets of records, those for Architecture (Single Built Works) and Paintings. Data values were placed into classes: those for Architecture came from the AAT, but those for Paintings proved to be more difficult, with AAT, Iconclass, and local term lists merging to create the classes. McRae found that it was difficult to measure the depth of subject indexing in the two defined sets. There was great depth in Type for single built works, but greater depth in Subject for paintings. Also, for built works, subject terms were often narrower, more specific forms of the terms used for Type. McRae then presented an analysis of the data from her two sets, with many interesting observations, among them that subject terms for paintings included far fewer repeated terms from Work Type, and tended to be content-based rather than form-based.

McRae emphasized the differences seen between the application of terms for the two sets, stressing that while the AAT seems to offer an adequate number of appropriate terms for Work Type, it is inadequate for terms pertaining to the subject matter of works of art; and yet no other single thesaurus is satisfactory for that either. This need could be addressed by publication of well-developed local lists, greater familiarity with the online versions of subject authorities that are traditional but unfamiliar to our constituency, such as LCSH, TGM, and ICONCLASS, and additional training through workshops at our national conferences. In closing, McRae reiterated the value of the VISION project, our first attempt at national image cataloging, and the wealth of information and insight still to be gained from its close consideration.

Mary Elings, Pictorial Archivist at the Bancroft Library, University of California, Berkeley, presented "Pictorial Archives and EAD: Indexing Collections for Online Access."  Elings began by describing basic features of archival collections, and presented archival picture collections as a specialized sub-set with its own set of unique problems.  As most archival collections work with “collection level” records as the basic organizational element, cataloging and indexing at the “item level”, i.e. the individual photograph, has been less common.

EAD, Encoded Archival Description, is rapidly becoming a standard within the archival community as more and more institutions attempt to make their collections accessible through online finding aids. EAD, as Elings described it, is a Standard Generalized Markup Language (SGML) Document Type Definition (DTD), in essence a template that wraps computer tags around both descriptive and administrative information. EAD can be used to encode arrangement, provenance, and biographical information pertaining to the collection, in addition to a hierarchical description of the contents. It facilitates varying levels of description, with subject indexing possible at any level. Finding aids are “inventories, registers, indexes, or guides to collections…” EAD finding aids can be simple or highly complex in the level of detailed description provided. In terms of subject indexing, Elings suggested that a compromise approach is often necessary. Elings stressed that finding aids themselves have always been “indispensable tools” for providing access to primary source materials in archives. The adoption of EAD as a standard for the archival community is now making it possible to provide “access to archival finding aids in searchable electronic form”.
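
The idea that subject indexing can sit at any level of an EAD description can be illustrated with a short Python sketch. It builds a deliberately incomplete, EAD-like fragment (not a valid finding aid) using the standard-library xml.etree.ElementTree module. The element names archdesc, did, unittitle, controlaccess, subject, dsc, and c01 are taken from EAD, but every title and subject term below is a placeholder, and the helper function subjects_by_level is hypothetical.

```python
import xml.etree.ElementTree as ET

# Collection-level description carrying one subject heading (placeholder text).
archdesc = ET.Element("archdesc", level="collection")
did = ET.SubElement(archdesc, "did")
ET.SubElement(did, "unittitle").text = "Example photograph collection"
access = ET.SubElement(archdesc, "controlaccess")
ET.SubElement(access, "subject", source="lcsh").text = "Example subject heading"

# One item-level component inside the description of subordinate components.
dsc = ET.SubElement(archdesc, "dsc")
item = ET.SubElement(dsc, "c01", level="item")
item_did = ET.SubElement(item, "did")
ET.SubElement(item_did, "unittitle").text = "Example photograph"
item_access = ET.SubElement(item, "controlaccess")
ET.SubElement(item_access, "subject", source="local").text = "Example item-level term"


def subjects_by_level(root):
    """Map each description level to the subject terms attached directly to it."""
    found = {}
    for element in root.iter():
        level = element.get("level")
        if level is None:
            continue
        terms = [s.text for s in element.findall("controlaccess/subject")]
        found.setdefault(level, []).extend(terms)
    return found


print(subjects_by_level(archdesc))
print(ET.tostring(archdesc, encoding="unicode"))
```

A finding aid with no controlaccess elements at all would simply return empty lists here, which corresponds to the least intensive end of the range of indexing choices.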

Several specific examples of EAD projects were detailed. The Bancroft Library Collection, http://www.lib.berkeley.edu/BANC/, is one of the most heavily used special collections libraries in the United States. The Pictorial collections number over 2.5 million images. These are accessible through MARC records linked to EAD finding aids. Subject indexing is provided primarily at the collection level through the MARC records, using LCSH subject headings, but they are “beginning to test approaches to providing subject indexing within our finding aids themselves.” EAD case studies detailed in the presentation were drawn from the Online Archive of California, where users can decide at what level they wish to search the collections. These included the San Francisco Photographs of Carleton Watkins, c.1872-c.1879, which has no controlled access or subject tags; the San Francisco News-Call Bulletin Newspaper photograph archive, over 300,000 negatives, in which a select list of high-interest subject terms, taken primarily from Library of Congress Subject Headings, was used to apply limited subject indexing; and the Robert B. Honeyman, Jr. Collection of early Californian and Western American Pictorial Material, where detailed item-level cataloging and subject indexing were carried out, making this collection fully searchable from detailed access points.

In summarizing what principles could be drawn from the Online Archive of California experience, Elings made the following observations:

Elings also reiterated that there is no agreed-upon standard for EAD subject indexing, and that finding aids can be very idiosyncratic, even within the same institution.  A cooperative, measured approach is needed, one which probably involves diverse solutions for diverse collections and a fair distribution of limited resources.

Carolyn Beebe, Head of the Digital Library Initiative at North Carolina State University, has been involved in helping the Slide Library at NC State go online and become a digital collection.  This collection serves the teaching needs of several departments, including Architecture, Art and Design, Landscape Architecture, Industrial Design, and Graphic Design.  Only two traditional Art History survey courses are required, and most faculty have their own collections.  The Slide Library collection of about 60,000 therefore serves students more than faculty, and needs to be accessible through many non-traditional approaches.  In conducting a needs assessment for the Slide Library, Beebe found that the structure of the filing system in use, one developed by the AIA for architecture collections, makes it difficult to retrieve materials thematically or typologically, or to correspond to “trends”.

In addition, the administration wants the collection to become a campus-wide resource, particularly in its digital, online form, which would support the very disparate curriculum offered, including textile technology and veterinary medicine.  Beebe sees her role as facilitating the metamorphosis of these discipline-specific collections into a useful online resource for the entire University.

The first problem Beebe addressed was the limited functionality of the existing classification scheme.  The “Related Arts” section, which encompassed virtually everything except Architecture, had grown to the point that it was very difficult for the Art Historians to use.  A new classification scheme, loosely based on the Fogg system, with additions from the AIA scheme, was adopted.  In place of the original
 CHRONOLOGY
  GEOGRAPHY
   BUILDING TYPE
    ARCHITECT
     LOCATION
      NAME OF BUILDING
       VIEWS

organization, the new scheme grouped the collection according to
 MEDIA
  REGION OR CULTURE
   PERIOD
    PLACE OR SITE NAME
     AGENTS
      TYPE OF WORK
       TITLE OF STRUCTURE/WORK
        DATE
         VIEWS.
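
The practical effect of reordering the hierarchy can be sketched as a sort key: whichever facet comes first determines the broadest grouping in the drawers. The Python below is a minimal illustration, assuming each slide is described by a dictionary keyed on facet names. The facet order follows the two lists above; the key names, the helper filing_key, and the sample slides are hypothetical.

```python
# Facet order of the original scheme, outermost grouping first.
ORIGINAL_SCHEME = [
    "chronology", "geography", "building_type", "architect",
    "location", "name_of_building", "views",
]

# Facet order of the new scheme, outermost grouping first.
NEW_SCHEME = [
    "media", "region_or_culture", "period", "place_or_site_name",
    "agents", "type_of_work", "title", "date", "views",
]


def filing_key(record, scheme):
    """Return a tuple that sorts records in the order the scheme files them."""
    # Missing facets become empty strings so that incomplete records still sort.
    return tuple(record.get(facet, "") for facet in scheme)


# Invented sample records, for illustration only.
slides = [
    {"media": "Painting", "region_or_culture": "Italy", "title": "B"},
    {"media": "Architecture", "region_or_culture": "Italy", "title": "A"},
    {"media": "Painting", "region_or_culture": "France", "title": "C"},
]

filed = sorted(slides, key=lambda s: filing_key(s, NEW_SCHEME))
print([s["title"] for s in filed])
```

Because MEDIA is now the first facet, non-architectural material is no longer filed together under a single catch-all section.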

A second aspect of Beebe’s needs assessment for the NC State collection addressed the problem of content-based image retrieval. This is an area where much potentially useful and relevant research is being done on theoretical levels. Beebe presented a survey of several significant trends in this research, including the 1982 Morelli 4x4 pixel matching system, content-based visual matching systems such as QBIC and Virage, GIS systems, MBR (Minimum Bounding Rectangle) relationships, and other image analysis techniques. Beebe called attention to the problems of applying such theoretical research to very large image collections, such as the WWW, and to the need for indexing methods that effectively link image and text. An additional difficulty lies in the discrepancy between what researchers and developers are comfortable with and what patrons actually know and use. In most cases patrons simply do not have the terminology at hand that we assume they have. Beebe closed by emphasizing that we are in a great position, as a profession, to have an effect on this emerging technology.
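
As a rough illustration of what a bounding-rectangle relationship involves, the Python sketch below computes the smallest axis-aligned rectangle enclosing a set of points and names the spatial relationship between two such rectangles. It is a generic sketch, not a description of any system Beebe surveyed: the report gives no implementation details, and the relationship names, the function names, and the sample coordinates are all assumptions. Coordinates follow the usual image convention, with y increasing downward.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float


def bounding_rectangle(points):
    """Smallest axis-aligned rectangle enclosing all (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return Rect(min(xs), min(ys), max(xs), max(ys))


def relationship(a, b):
    """Name how rectangle a relates to rectangle b (labels are illustrative)."""
    if a.right < b.left or b.right < a.left or a.bottom < b.top or b.bottom < a.top:
        return "disjoint"
    if a.left <= b.left and a.top <= b.top and a.right >= b.right and a.bottom >= b.bottom:
        return "contains"  # identical rectangles also land here
    if b.left <= a.left and b.top <= a.top and b.right >= a.right and b.bottom >= a.bottom:
        return "within"
    return "overlaps"  # rectangles that merely touch are counted as overlapping


# Invented regions: a facade, a door inside it, and a tree beside it.
facade = bounding_rectangle([(0, 0), (10, 0), (10, 8), (0, 8)])
door = bounding_rectangle([(4, 4), (6, 8)])
tree = Rect(20, 0, 25, 8)

print(relationship(facade, door))
print(relationship(facade, tree))
```

Relationships of this kind are one way to connect a verbal query to image regions, which bears on the need Beebe identified for indexing methods that link image and text.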

Maryly Snow, Architecture Slide Librarian at the University of California, Berkeley, responded to the papers presented.  She was particularly intrigued with the research trends presented in Beebe’s paper.  She pointed out that a 1984 panel dealt with one of the systems Beebe had mentioned, CUBIC, and wondered how far visual image recognition systems have come in the intervening years.  She also called attention to Beebe’s comments on the GIS relationship vocabulary, and called on us to become more familiar as a profession with research and developments in other areas that could potentially be of great benefit to us.

Snow responded to Elings’ paper on EAD by stating that many of us have mistakenly felt that EAD was something good for the archival community but not particularly applicable to us. While it may not be applicable to the intensive image-level cataloging we do for the majority of our collections, it would be of great use in dealing with some of our legacy collections, or specialized collections of primary source material. In addition, the observations Elings made regarding controlled lists, consistency, collaboration, and cooperation in the sharing of resources are applicable to our collections and practices as well.

Snow asked a number of insightful questions regarding Linda McRae’s analysis of the VISION records, including why “Architecture” showed as Work Type in so many records, while its equivalent “Art” did not. She pointed out that in fact, “Buildings” equate to “Paintings”. She also queried whether most of the participants used only individual terms, who used indexing strings, and how prevalent the use of narrower terms as subject terms was. She also pointed out that a large part of this session actually happened in the halls, with lively discussions among participants as to whether subject indexing for architecture-oriented collections is fundamentally different from that for art-oriented collections. Buildings, she pointed out, have attributes and deal with concepts, but they do not really have “meaning”. The terms for parts, materials, components, processes, etc. that constitute the majority of subject indexing for architecture collections do not seem like subject terms to art catalogers, who are inclined to think of iconography as constituting subject indexing. In conclusion, Snow stated that, as interesting as the various approaches and topics presented were, they did not address her favorite subject: the thought processes involved in subject indexing and how our physical slide classification systems can help us in that process.

Questions from the floor followed. Liz Okeefe expanded the discussion of genre or object terms versus subject terms and the need for their “equal recognition” within our community. Martha Mahard agreed with this, and stressed that it is “providing additional access” that is important, whether that be subject-oriented or more specific terminology. Ann Whiteside suggested that calling it “subject indexing” is confusing, and that something like “Indexing Terms” would be more widely applicable. In response to a question concerning the Barage model and how it relates to something like AltaVista searches, Beebe responded that she thinks search engine implementation is less than useful, in that it assumes an understanding of image composition on the part of the searcher, it requires verbal articulation of visual syntax, and there is no correlation allowed between verbal terms and visual syntax and features for refining searches. The Morelli type of analysis is most useful in a large group of similar images, or perhaps in identifying a large group of similar images from an even larger heterogeneous group.

Joe Miller, of HW Wilson Co., questioned the correlation between number of terms assigned and depth of indexing, postulating that quality or accuracy would be better measures. Would having the same vocabulary on the site guarantee accuracy unless you had specificity and instruction as to whether the broader or narrower terms were always to be used? This hinges on the problem of the difference between object attributes and subject attributes. Snow responded that one of the problems with the VISION project was that there were no guidelines regarding specificity, and a lively exchange of comments on this topic, from both panelists and the floor, followed.

In response to one of the points raised by Snow, regarding the use of indexing strings in the VISION records, McRae said that the use of indexing strings is not common practice in the Visual Resources community. Although she said she had not specifically counted the occurrences of strings in VISION, she thought there were not more than a half-dozen examples, other than the strings which she would typify as “classification strings”, e.g. Painting,Italy,16thc. Other than that, almost all the terms used were single terms or very simple phrases.

In response to a final question concerning the work of Morelli, Beebe responded that Morelli actually referred back to an 18th-19th century tool for the classification of paintings. The computer system developed in 1982 is still the basis of much image recognition research. She also suggested looking at Yahoo’s Image Surfer implementation, which uses very broad categories.

Time ran out with many other questions still pending.

Respectfully submitted,
Eileen Fry
Indiana University