Digital Gazetteer Information Exchange (DGIE)

Proposal to NSF, Program 98-121 (Digital Government), Directorate for Computer & Information Science and Engineering

Workshop Proposal

Requested amount: $50,001. Duration: one year; Requested starting date: 6/1/99

Principal Investigator: Linda L. Hill, UCSB. Co-PI: Michael F. Goodchild, UCSB


Summary

The development of interchangeable sets of geographic name data (gazetteers) could result in a major improvement in seamless access to and use of a wide variety of information resources through indirect geospatial referencing. It is proposed to convene a workshop on this topic as a follow-on to the Distributed Geolibraries workshop held June 15-16, 1998, hosted by the National Research Council. The workshop will (1) develop an understanding of the potential of indirect spatial referencing of information resources through geographic names and (2) identify the research and policy issues associated with the development of digital gazetteer information exchange. A report summarizing the workshop and a proposal for research and development funding will be the products. The report will present the current status of gazetteer building and availability, potential user applications of digital gazetteers, and research issues, and will reflect the range of discussions during the workshop. The workshop will involve principal producers and users of gazetteer data in the U.S. and other countries; participants will also include experts from library, data center, and private sector groups with related interests.

 

Background

Many types of information refer to specific places on the Earth's surface. They include reports about the environmental status of regions, photographs of landscapes, images of the Earth from space, census and economic statistics, guidebooks to major cities, municipal plans, and even sounds and pieces of music. All of these (and many more) are examples of information that is georeferenced because it has some form of geographic footprint (adapted from Mapping Science Committee, 1999 forthcoming). A prevalent form of georeferencing is through place or feature names, frequently found in bibliographic publications, indexes, and catalogs. Those who seek information relevant to their activities often need to search by reference to a particular location, which they are likely to describe by a geographic name. An example query is "Find all information relevant to fish and wildlife studies about the Cottonwood Creek study area." The user, in this case, would like to find relevant items that are labeled with or contain the phrase "Cottonwood Creek study area" (e.g., reports and papers) and also those that are about the area but don't specifically mention the place name (e.g., aerial photos and remote sensing images). This form of indirect geospatial referencing is supported through the use of gazetteers.

A gazetteer is a list of geographic names, together with their geographic locations and other descriptive information. A geographic name is a proper name for a geographic place or feature, such as Santa Barbara County, Mount Washington, and St. Francis Hospital. Imprecise areas such as Southern California can be included. Names such as Abbeville 30x60 Minute Topographic Quadrangle, Grand Fort Tejon Earthquake Epicenter, and Habitat of the Red-legged Frog are also legitimate gazetteer entries because they name identifiable geographic locations.

The essential elements of a gazetteer entry are (1) name, (2) footprint, and (3) type or category. With these three key attributes, a gazetteer supports several functions of an information retrieval system:

  1. It answers the "Where is" question (for example, "Where is Santa Barbara?") by showing the location on a map.
  2. It translates between geographic names and locations so that a user of the information system can find collection objects through matching the footprint of a geographic name to the footprints of the collection objects. For example, "What aerial photographs cover parts of Santa Barbara County?"
  3. It allows a user to locate particular types of geographic features in a designated area. For example, the user can draw a box around an area on a map and find the schools, hospitals, lakes, or volcanoes in the area.
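The three functions above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the ADL implementation: it assumes every footprint is a simple latitude/longitude bounding box, and all class and method names (`BBox`, `GazetteerEntry`, `Gazetteer`, `where_is`, `find_objects`, `features_in`), catalog identifiers, the "Hypothetical Hospital" entry, and all coordinates are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Hypothetical footprint: a lat/lon box in decimal degrees."""
    west: float
    south: float
    east: float
    north: float

    def intersects(self, other: "BBox") -> bool:
        # Boxes overlap unless one lies entirely to one side of the other.
        # Boxes crossing the 180th meridian are ignored for simplicity.
        return not (self.east < other.west or other.east < self.west
                    or self.north < other.south or other.north < self.south)

    def contains(self, other: "BBox") -> bool:
        return (self.west <= other.west and other.east <= self.east
                and self.south <= other.south and other.north <= self.north)


@dataclass(frozen=True)
class GazetteerEntry:
    # The three essential elements: name, footprint, and type.
    name: str
    footprint: BBox
    feature_type: str


class Gazetteer:
    def __init__(self, entries):
        self.entries = list(entries)

    def where_is(self, name):
        """Function 1: name -> footprints (several places may share a name)."""
        return [e.footprint for e in self.entries
                if e.name.lower() == name.lower()]

    def find_objects(self, name, catalog):
        """Function 2: match a named place's footprint to catalog footprints."""
        places = self.where_is(name)
        return [obj_id for obj_id, fp in catalog.items()
                if any(fp.intersects(place) for place in places)]

    def features_in(self, area, feature_type):
        """Function 3: features of one type lying inside a user-drawn box."""
        return [e.name for e in self.entries
                if e.feature_type == feature_type and area.contains(e.footprint)]


# Illustrative data only; coordinates are rough or invented.
gaz = Gazetteer([
    GazetteerEntry("Santa Barbara County", BBox(-120.7, 34.3, -119.4, 35.1), "counties"),
    GazetteerEntry("Hypothetical Hospital", BBox(-119.69, 34.43, -119.68, 34.44), "hospitals"),
])
catalog = {
    "photo-001": BBox(-119.8, 34.4, -119.6, 34.5),  # overlaps the county box
    "photo-002": BBox(-118.5, 33.9, -118.2, 34.1),  # lies well to the east
}

print(gaz.where_is("santa barbara county"))
print(gaz.find_objects("Santa Barbara County", catalog))  # ['photo-001']
print(gaz.features_in(BBox(-120.0, 34.0, -119.0, 35.0), "hospitals"))  # ['Hypothetical Hospital']
```

Intersection (rather than containment) is used for function 2 because a collection object such as an aerial photograph is relevant if it covers any part of the named place; containment is used for function 3 because the user asks for features located within the drawn area.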

There is remarkable diversity in approaches to the description of geographic places and no standardization beyond authoritative sources for the geographic names themselves. Among the gazetteer products that need to be integrated are the products of the U.S. Board on Geographic Names, the name authority files of the Library of Congress, the geographic name sets created by indexing and abstracting services, the gazetteer products of other nations and international bodies, and the various sources of spatially-defined geographic names such as digital mapping, environmental research, and commercial gazetteer products. The goal of Digital Gazetteer Information Exchange (DGIE) is to enable the interchangeable use of all of these data while documenting their original source, authority, and accuracy so that they can be used and evaluated appropriately.

 

Potential uses of spatially-defined gazetteers are being illustrated by digital library projects (e.g., those at Berkeley <http://elib.cs.berkeley.edu/> and Santa Barbara <http://legacy.alexandria.ucsb.edu>), government activities such as NASA's EOSDIS, and within the biodiversity community. There is already an awareness of the vast potential of extending spatial referencing to library catalogs and online bibliographic files. One indexing and abstracting service, AGI's GeoRef (American Geological Institute, 1998), has already implemented spatially-defined geographic names linked to the indexing of its database and to the place names in the GeoRef Thesaurus (American Geological Institute, 1997). A result of the National Research Council (NRC) Distributed Geolibraries workshop, June 15-16, 1998 (Project BESR-U-97-04-A) was the identification of gazetteers as a key component of geolibraries (Mapping Science Committee, 1999 forthcoming).

 

The Alexandria Digital Library (ADL) <http://legacy.alexandria.ucsb.edu> has been engaged in major gazetteer development since the beginning of the DLI-1 funding period. A paper describing this development was recently published electronically in D-Lib (Hill, Frew, & Zheng, 1999). The ADL Implementation Team has combined the two major U.S. federal government gazetteers into one gazetteer containing nearly 6 million entries. The ADL Implementation Team has developed a Gazetteer Content Standard <http://legacy.alexandria.ucsb.edu/gazetteer/> and a Feature Type Thesaurus <http://legacy.alexandria.ucsb.edu/gazetteer/FeatureTypes/FTT_metadata.htm> and is reloading the U.S. federal gazetteer data to this new model. It has also added gazetteer data pertaining to earthquakes, volcanoes, topographic map quadrangles, and political areas. Where possible, it has added spatial footprints that show the extent of the feature rather than just a representative point location. This gazetteer is one of two major collections in the Alexandria Digital Library and is accessed along with the ADL Catalog to answer both the "where is" and the "what's there" questions cited above.

 

A meeting was held at the U.S. Geological Survey on December 11, 1998 to discuss gazetteer information exchange among federal government agencies. Linda Hill, PI on this proposal, organized the meeting. A report of this meeting can be found at <legacy.alexandria.ucsb.edu/~lhill/dgie/DGIE_report.htm>. There were 22 attendees from 8 government agencies (U.S. Geological Survey, National Imagery and Mapping Agency, National Aeronautics and Space Administration, National Oceanic and Atmospheric Administration, Census Bureau, National Park Service, Library of Congress, & Smithsonian), one professional association (American Geological Institute), and one university (University of California, Santa Barbara). The group agreed to call itself the Digital Gazetteer Information Exchange (DGIE) planning group and to discuss proposals to advance the development of shareable gazetteer data in support of government information services. There have been two subsequent meetings with a smaller number of participants (some participating remotely by phone) to discuss details of how to proceed. The minutes of the last meeting on January 28, 1999 are at http://legacy.alexandria.ucsb.edu/~lhill/dgie/DGIE_minutes_012899.htm.

The Alexandria Digital Library Project was supported by NSF, DARPA, and NASA under NSF IR94-11330.

 

Technical Challenges

The gazetteer building experience of the Alexandria Digital Library (University of California, Santa Barbara) demonstrated both the value of online gazetteers in digital libraries and their current limitations as spatial identification and retrieval tools. Specifically, the ADL team identified the following criteria for integrating gazetteers into digital libraries:

  1. Content Standard: There is a need for a standard conceptual schema for gazetteer information, so that this information may be more easily created and shared. There are many sources of spatially referenced geographic names, but they are mostly for specific purposes only and not designed to be interoperable or shareable. The Content Standard must provide for the documentation of provenance, accuracy of geospatial coordinates, variant naming, and historical changes and include descriptive information, links to related features, associated feature data, and the like.
  2. Feature Types: There is a need for shared feature type schemes to categorize individual features for shared gazetteers. These schemes need to be hierarchical (e.g., "lakes" IsTypeOf "hydrographic features"), rich in term variants (i.e., synonymous terms for the same categories), and extensible to accommodate greater depth in terminology where needed. To be practical, schemes designed to facilitate sharing need to incorporate variant forms of terminology from established feature type schemes so that they can provide mappings between the various schemes. A basic research issue for type schemes (and for thesauri in general) is how to translate between them, and in particular how to map the many application-specific type schemes onto schemes designed to be generally applicable.
  3. Temporal aspects: Geographic names, their footprints, their relationships to other places, and their associated descriptive elements all change through time. Gazetteers must therefore incorporate temporal ranges for this data. Since these date ranges are often inexact or estimated, this uncertainty must also be represented.
  4. Fuzzy footprints: Since the extent of a geographic feature is often approximate or ill-defined (e.g., Southern California), there is a need for rules and methods of elicitation by which these fuzzy boundaries and locations are derived and presented to users.
  5. Quality aspects: Several aspects of gazetteer data quality need to be addressed. One is how to indicate the accuracy of latitude and longitude data. Another is the need to ensure that the reported coordinates agree with the other elements of the description. In general, data quality checks should be built in wherever possible for all data elements.
  6. Spatial extents: Many currently available gazetteers contain point locations only, often derived as by-products of map production. Points do not represent the extent of the geographic locations and are therefore only minimally useful. Bounding boxes, while sufficient for many search purposes, often misrepresent the feature by including too much territory (for example, the bounding box for California also includes Nevada). In general, there is a need to represent the spatial extents of more gazetteer entries, both with bounding boxes and with detailed boundaries. Establishing the standards that will enable the sharing of gazetteer data will help harvest spatial extent data from many sources, but ultimately, methods for deriving locations and extents automatically from digital mapping products and other sources will be needed.
  7. Frequent changes: Geographic names, their footprints, and the data and relationships associated with them are constantly changing, leading to maintenance and documentation issues.
  8. Merging information from different sources: Information about a particular gazetteer entry can be obtained from multiple sources. Footprints, description, name forms, geographic data, etc. from various sources are best combined into one record rather than separate records, with attribution for the source of each piece of information.
  9. Computational issues: The computational issues of gazetteers are largely those of a georeferenced digital library: efficient indexing and searching of large, distributed datasets; an effective combination of spatial and textual search processes; and the presentation and evaluation of spatially-referenced collection objects (i.e., gazetteer entries). The other issues listed here are in addition to the fundamental data processing considerations.
  10. Multilingual and text encoding issues: Digital gazetteers and associated toponymic exchange standards must accommodate an extended Roman alphabet and non-Roman writing systems to reflect accurate place name spellings. Selecting and implementing eight-bit (ISO 8859) and 16-bit (ISO 10646/Unicode) character encodings raises issues of cross-platform compatibility, information storage and processing, and the availability of the fonts required for text display.
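Two of the criteria above, hierarchical feature types with variant terms (item 2) and merging information from different sources with attribution (item 8), can be made concrete with a short sketch. It is a minimal illustration under stated assumptions, not the ADL Feature Type Thesaurus or Content Standard: the thesaurus fragment, the function names (`preferred`, `is_type_of`, `expand`, `merge_records`), the source labels, and the "Cottonwood Creek" coordinates are all hypothetical, and the hierarchy is assumed to be acyclic.

```python
from __future__ import annotations

# Hypothetical thesaurus fragment: preferred term -> broader term ("IsTypeOf").
BROADER = {
    "lakes": "hydrographic features",
    "streams": "hydrographic features",
    "intermittent streams": "streams",
}
# Variant terms (e.g. from other type schemes) -> preferred term.
VARIANTS = {"lake": "lakes", "creeks": "streams", "brooks": "streams"}


def preferred(term: str) -> str:
    """Normalize case and map a variant term to its preferred term."""
    t = term.strip().lower()
    return VARIANTS.get(t, t)


def is_type_of(term: str, broader: str) -> bool:
    """True if `term` equals `broader` or lies below it (assumes no cycles)."""
    current, target = preferred(term), preferred(broader)
    while current is not None:
        if current == target:
            return True
        current = BROADER.get(current)  # None once the top term is passed
    return False


def expand(term: str) -> set[str]:
    """Query expansion: the term plus every narrower preferred term."""
    target = preferred(term)
    terms = set(BROADER) | set(BROADER.values()) | {target}
    return {t for t in terms if is_type_of(t, target)}


def merge_records(records):
    """Combine per-source records describing one feature into one record.

    `records` is a list of (source, fields) pairs. The result maps each field
    to {value: [sources asserting it]}, so every piece of information keeps
    its attribution and conflicting values stay visible.
    """
    merged: dict[str, dict[object, list[str]]] = {}
    for source, fields in records:
        for field, value in fields.items():
            if field == "type":
                value = preferred(value)  # reconcile type terms before comparing
            merged.setdefault(field, {}).setdefault(value, []).append(source)
    return merged


record = merge_records([
    ("source-A", {"name": "Cottonwood Creek", "type": "creeks",
                  "footprint": (-120.1, 34.6)}),              # point only
    ("source-B", {"name": "Cottonwood Creek", "type": "streams",
                  "footprint": (-120.2, 34.5, -120.0, 34.7)}),  # bounding box
])

print(expand("hydrographic features"))
print(record["type"])  # {'streams': ['source-A', 'source-B']}
```

Keeping every asserted value with its list of sources, instead of choosing one value per field, defers the quality judgment (item 5) to the user or to a later reconciliation step; the point and box footprints above remain side by side rather than one silently replacing the other.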

 

Policy Challenges

  1. How to formalize standard representational formats for gazetteer data for information exchange. Should a formal standards process be used and if so what is the best mechanism?
  2. What initiatives are needed to develop or compile interchangeable gazetteer sets with footprints that represent the extent of the feature (as opposed to a point only)?
  3. What standards are needed to support gazetteer services over the web that will translate between geographic names and footprints?
  4. How should copyright, intellectual property, and warranty issues be addressed?

 

Action

A Digital Gazetteer Information Exchange (DGIE) Workshop will be held in the Fall of 1999 (between October 1 and November 18) in Washington D.C., with an expected attendance of 50-60 participants. The Smithsonian has agreed to host the workshop. It will be coordinated by a Steering Committee that will plan the workshop, determine the list of invitees, steer the workshop activities, write the workshop report, and prepare the proposal for follow-on research and development of DGIE. The following have agreed to serve on the Steering Committee (the PI and co-PI and names marked by an asterisk are confirmed):

 

Selection of participants

Workshop participants will be selected through a combination of invitation and open call. The following represents our analysis of the interests needed for effective discussion. We anticipate participation from government agencies, academia, and the private sector; representation from overseas (limited to no more than 25% of the total participants); and representation of diverse cultural and under-represented groups, to include:

 

Expected Impacts and Benefits of Workshop

  1. Clarification of the concept of spatially-defined gazetteers and their relationship to information services (in particular, digital government information dissemination).
  2. Facilitation of greater collaboration and synergy between various participants in this area - an interest area not previously identified as such.
  3. Increased awareness of the potential of spatially-defined gazetteers among a much wider audience than the current limited gazetteer community, including digital libraries and digital Earth information systems.
  4. Conceptual overview of Digital Gazetteer Information Systems/Services, identifying the requirements for software development, standards and protocols, data, and basic research.
  5. A major proposal for research under the Digital Government Initiative program that will address substantive needs of agencies, partner with one or more agencies, increase access to Federal data, and provide basic research results in the areas of intelligent information integration and the cost-effective acquisition, integration, viewing, and use of large spatially referenced place and feature name data sets.
  6. Stimulus to researchers from computer and information science to work on problems associated with gazetteers.

 

Reports

The summary report resulting from the effort proposed shall be prepared and published electronically. Reports will be made available to the public without restrictions. All reports will be reviewed to help assure the highest research and technical standards. Reviewers will be asked to indicate whether: (1) the report is clear and concise, (2) its arguments and conclusions appear to rest on adequate data properly represented, (3) uncertainties and divergent viewpoints are recognized, (4) policy matters are handled appropriately, (5) the report reveals or suggests bias, and (6) the report seems to be complete, fair, and responsive to its charge.

A proposal for research and development funding will be developed by the Steering Committee based on the outcomes of the workshop.

References

American Geological Institute. (1997). GeoRef Thesaurus. Alexandria, VA: American Geological Institute.

Hill, L. L., Frew, J., & Zheng, Q. (1999). Geographic names: The implementation of a gazetteer in a georeferenced digital library. D-Lib Magazine, 5(1), January 1999. http://www.dlib.org/dlib/.

Mapping Science Committee. (1999 forthcoming). Distributed Geolibraries: Spatial Information Resources: Report of a Workshop. Washington, DC: National Academy Press.