AUTOMATED GENERALIZATION OF GEOGRAPHIC INFORMATION

John W. N. van Smaalen, PhD Candidate


Dept. of Surveying and Remote Sensing,  Wageningen Agricultural University

Office: Hesselink van Suchtelenweg 6

Snail mail: P.O. Box 339, 6700 AH Wageningen, The Netherlands

Email:  [email protected]

GENERALIZATION IN A GIS CONTEXT

Geographic information consists of data with a spatial aspect, mostly related to the earth's surface. A characteristic of this information is that it normally consists of a thematic and a geometric or spatial component. The information is usually gathered, structured, and stored with a certain purpose in mind.
Optimizing data for a certain use often means that it is poorly accessible, or even useless, for another purpose. One of the properties related to the use is the desired level of detail. A dataset can be meant to provide as much detail to its user as possible. For another user this amount of data can be overwhelming, making it impossible to get an overview. A solution to this problem lies in generalizing the data. As Beard [Beard '91] states:

'A typical objective of generalization is to capture the essential character of some phenomena and remove unnecessary spatial and attribute detail.'

But which information is essential and which is unnecessary? Just like the structuring of the original data, this depends strongly on the purpose. Consequently, the generalization strategy to be followed also depends on the use of the generalized dataset.

When looking at the various generalization processes distinguished by different authors, we find that most tend to divide generalization into two distinct parts.

'Geographical information abstraction mainly concerns managing geographical meaning in databases, and map generalization mainly concerns structuring map presentations. For these reasons, it is convenient and useful to separate geographical information abstraction and map generalization.' [Nyerges '91]

Nyerges' distinction roughly agrees with the distinction between conceptual generalization (= information abstraction) and cartographic generalization (= map generalization) that will be used in the research presented here (Figure 1). Thus the process of generalization can be divided into:

Conceptual generalization: What must be expressed to fulfil the purpose of the generalization.
Cartographic generalization: What can be expressed, taking into account graphical limitations (pixel size, ability to see).

Cartographic generalization is scale-dependent; conceptual generalization is not.

Figure 1: Conceptual vs. Cartographic generalization

In cartography, generalization is mostly done by hand, interactively. Although organizations involved in the production of geographic and cartographic material are highly interested in automating the generalization process, there is no complete solution yet. So far, the emphasis has been on the graphic representation rather than the underlying semantic structure.

THE RESEARCH TOPIC

The topic of this research project is the conceptual aspect of geographical generalization, i.e. data abstraction.
Objectives for conceptual generalization or information abstraction are:
- to create a new (temporary) database to perform analyses at a higher aggregation level than the original data,
- to create a new database to use as a basis for cartographic or map generalization in order to produce soft- or hard-copy maps.

The research will concentrate primarily on the thematic and topological issues. Both topology and thematics contain semantic information, so the semantics of the data will play a central role. The objective is to develop a framework for conceptual generalization as well as a working prototype. The proposed method will not try to imitate the generalization actions of human professionals. In the first place, it would be very difficult, if not impossible, to include all considerations taken into account by human cartographers. Besides that, although human interpretations have proven more than usable, they are not always very consistent. This research avoids describing generalization in terms of scale (e.g. "generalizing from scale 1:1,000 to 1:25,000"). Instead, thematic issues and aggregation hierarchies will completely direct the generalization process.

Table 1. Terminology used (refers to the representation of reality in a model, not reality itself).

	Term                   Example
	'object class'         employees
	'object'               employee Peter Peterson
	'attribute'            salary of employees
	'attribute value'      $4000 salary of Peter Peterson
	'relationship type'    employees work at departments
	'relationship'         Peter Peterson works at Sales

The strategy of the research is based on the definition of a powerful data structure to describe the basic information (see [Richardson '93]). Elementary terrain features and their mutual relationships are described using geometric and thematic components. The data will be stored in vector format, structured according to the Formal Data Structure [Molenaar '89], an object-based model (for the terminology used see Table 1). The value of the object-oriented approach is now widely recognised for many applications. This also holds for generalization:

'An object-oriented approach is a good framework for implementing a prototype generalization system. It allows to correctly identify objects, their attributes and their behaviors.' [Kilpeläinen '92], [Armstrong & Bennett '90]

A characteristic of the object model is that not only attributes, but also functions can be assigned to the object class(es). In generalization this means that generalization functions, applicable to the geographical feature classes (rivers, roads, fields etc.), can be stored with the object class description.
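
As an illustration only, the idea of storing a generalization function with the object class description could be sketched as follows in Python. The class names, the `area_m2` attribute, the `generalize` method and the area threshold are all assumptions made for this sketch; they are not taken from the Formal Data Structure or from the prototype.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """A terrain object with a name and one thematic attribute (hypothetical)."""
    name: str
    area_m2: float

    def generalize(self, min_area_m2: float):
        """Default behavior stored with the base class: keep the object."""
        return self


@dataclass
class Building(Feature):
    def generalize(self, min_area_m2: float):
        """Hypothetical class-specific rule: drop buildings below the threshold."""
        return self if self.area_m2 >= min_area_m2 else None


@dataclass
class Road(Feature):
    pass  # inherits the default behavior: roads are always kept


def generalize_all(features, min_area_m2):
    """Call each object's own generalization function and drop removed objects."""
    results = (f.generalize(min_area_m2) for f in features)
    return [f for f in results if f is not None]


features = [Building("shed", 12.0), Building("barn", 300.0), Road("lane", 50.0)]
kept = generalize_all(features, min_area_m2=100.0)
print([f.name for f in kept])  # ['barn', 'lane']
```

Because the function lives with the class, the calling code does not need to know which rule applies to which kind of feature.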

Conceptual generalization based on a combination of classification hierarchies (top-down), topological/spatial analysis (bottom-up) and rules seems to be the most promising approach towards a more powerful system. Furthermore, it is important to be clear about whether the main aim is:

- spatial abstraction, which can, but does not necessarily have to, lead to thematic abstraction. It can even lead to a more complex thematic description if all information of the original units is represented in an aggregate unit;
- thematic abstraction, which will mostly, but again not necessarily, lead to spatial abstraction;
- temporal abstraction, which will not be discussed further as it is regarded as less relevant to the (topographic) data used in the case study.

If the aim is a combination of thematic and spatial abstraction, care must be taken that the thematic abstraction does not become a subordinate issue. Special attention should therefore be given to the behavior of attributes when generalizing.

The system described is primarily being developed to generalize detailed cadastral and topographic data but is designed to provide a more generic approach. It consists of two clearly distinguishable parts (Figure 2):

Figure 2. Functional description of the system.

PHASE 1

Phase 1 consists of initially setting up the system. This is done by what is here called an 'expert user', a user who is familiar with both generalization/classification theory and the data to be generalized. The datasets used are typical examples of the type of dataset for which the system should be initialized, for example detailed topographic datasets.

Figure 3. Example of a semantic network including generalization rules.

The system will provide a graphic interface to define, visualize and change class relationships. Figure 3 gives an example of such an interface. Using this interface the 'expert user' builds a semantic network containing the relationships between the object classes. Examples of these relationships are:

'is-a' relationships, referring to class generalization (Figure 4); the combination of several classes into a more general superclass.
'part-of' relationships, referring to aggregation (Figure 4); the grouping of multiple individuals into a new (complex) object [Frank & Egenhofer '88].
other, more loosely defined relationships; often referred to as associations.

Figure 4. Operations used in conceptual generalization.

Generalization rules also refer to one or more object classes and can therefore be included in the same structure.
Figure 3 also shows that not only object classes with a spatial representation (building, parcel) but also other object classes (resident, address) are included because these can also play a role in the generalization process. An example is the aggregation of residential areas based on the income of the residents.
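
A minimal sketch of such a semantic network, written in Python for illustration, is given here. The classes building, parcel, resident and address follow Figure 3 as described in the text; the classes `house`, `barn` and `construction`, the relation label `lives-at` and the `SemanticNetwork` class itself are assumptions of this sketch.

```python
from collections import defaultdict


class SemanticNetwork:
    """Directed graph of object classes with labelled relationship types."""

    def __init__(self):
        # edges[relation][source] -> set of target class names
        self.edges = defaultdict(lambda: defaultdict(set))

    def add(self, source, relation, target):
        self.edges[relation][source].add(target)

    def ancestors(self, cls, relation="is-a"):
        """All classes reachable from cls by following one relation type."""
        found, stack = set(), [cls]
        while stack:
            for parent in self.edges[relation][stack.pop()]:
                if parent not in found:
                    found.add(parent)
                    stack.append(parent)
        return found


net = SemanticNetwork()
net.add("house", "is-a", "building")         # class generalization
net.add("barn", "is-a", "building")
net.add("building", "is-a", "construction")
net.add("building", "part-of", "parcel")     # aggregation
net.add("resident", "lives-at", "address")   # looser association, non-spatial classes

print(sorted(net.ancestors("house")))                # ['building', 'construction']
print(sorted(net.ancestors("building", "part-of")))  # ['parcel']
```

Keeping the relation type as a label on each edge lets 'is-a', 'part-of' and association links share one structure while still being queried separately.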

In order to define the semantic network, the 'expert user' is assisted by procedures that automatically derive as much information as possible from the (type of) dataset used. The attribute structure (common attributes) of the objects will be used to derive initial classification hierarchies. The attribute values will be used to provide measures of similarity for different object classes. If possible, the spatial attributes (geometry, topology) will also be used for these purposes.
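
One possible way to propose an initial hierarchy from the attribute structure is to compare the attribute names of each pair of classes. The sketch below uses the Jaccard index for this, which is a choice made here for illustration and not a measure prescribed by the research; the class names and attribute sets are invented.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of attribute names two classes have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


# Hypothetical attribute structures for three object classes.
classes = {
    "house":  {"id", "geometry", "height", "residents"},
    "barn":   {"id", "geometry", "height", "livestock"},
    "parcel": {"id", "geometry", "owner"},
}

# Rank class pairs by similarity; the top pair is a candidate for a common superclass.
pairs = sorted(
    combinations(sorted(classes), 2),
    key=lambda p: jaccard(classes[p[0]], classes[p[1]]),
    reverse=True,
)
best = pairs[0]
shared = classes[best[0]] & classes[best[1]]  # attributes the superclass would carry
print(best, sorted(shared))  # ('barn', 'house') ['geometry', 'height', 'id']
```

The suggestion would still be presented to the 'expert user' for confirmation, since shared attribute names alone do not guarantee a meaningful superclass.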

Implementing classification and other relationships as a semantic network ensures flexibility. Apart from the semantic network, which describes the relationships between the object classes, a network will be constructed containing the (elementary) generalization rules and operations and their mutual dependencies. This rule network is linked to the semantic network through the object classes to which the rules apply (Figure 3). The order in which the elementary generalization operations should be executed is considered essential and will be included in the rule network.
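
To illustrate how execution order might be derived from a rule network, the sketch below stores each operation with its prerequisites and lets a topological sort produce a valid order. The operation names, their dependencies and the `applies_to` mapping are hypothetical; `graphlib.TopologicalSorter` is part of the Python standard library (3.9 and later).

```python
from graphlib import TopologicalSorter

# Hypothetical rule network: each operation maps to the operations
# that must be executed before it.
rule_network = {
    "reclassify": set(),
    "aggregate": {"reclassify"},
    "dissolve_boundaries": {"aggregate"},
    "eliminate_small": {"dissolve_boundaries"},
}

# Link to the semantic network: the object classes each rule applies to.
applies_to = {
    "reclassify": {"building", "parcel"},
    "aggregate": {"parcel"},
    "dissolve_boundaries": {"parcel"},
    "eliminate_small": {"building"},
}

# A topological sort yields an order in which every prerequisite comes first.
order = list(TopologicalSorter(rule_network).static_order())


def rules_for(object_class):
    """Rules applying to one object class, in a valid execution order."""
    return [rule for rule in order if object_class in applies_to[rule]]


print(order)
print(rules_for("building"))  # ['reclassify', 'eliminate_small']
```

A cyclic dependency between rules would raise `graphlib.CycleError`, which gives the 'expert user' an early warning that the rule network is inconsistent.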

In summary: phase 1 deals with the definition of generic relationships at class level, unlike phase 2, which deals with specific relationships between the actual objects. In phase 2 the individual objects are known and located; the relationships will therefore generally have a more spatial character.

PHASE 2

In phase 2 the actual generalization process is carried out. A particular (spatial) situation leads to the application of generalization rules defined in phase 1. Here it is not the class that is addressed but the individual objects belonging to that class. Spatially related (e.g. neighboring) objects can be aggregated based on classification hierarchies (attribute structure) or class extension (attribute values).
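
The aggregation of neighboring objects on the basis of a classification hierarchy can be sketched as follows. This is a simplified illustration, not the prototype's algorithm: adjacency is assumed to be given as a list of object pairs (in practice it would come from the topology), the classes and superclasses are invented, and a union-find structure is used to merge neighbors whose classes share a superclass.

```python
# Hypothetical input: object id -> class, class -> superclass, adjacent object pairs.
object_class = {1: "house", 2: "barn", 3: "field", 4: "meadow", 5: "house"}
superclass = {"house": "built-up", "barn": "built-up",
              "field": "agricultural", "meadow": "agricultural"}
neighbors = [(1, 2), (2, 3), (3, 4), (4, 5)]


def aggregate(object_class, superclass, neighbors):
    """Group adjacent objects whose classes share a superclass (union-find)."""
    parent = {obj: obj for obj in object_class}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in neighbors:
        if superclass[object_class[a]] == superclass[object_class[b]]:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for obj in object_class:
        groups.setdefault(find(obj), []).append(obj)
    return sorted(
        (superclass[object_class[members[0]]], members)
        for members in groups.values()
    )


result = aggregate(object_class, superclass, neighbors)
print(result)
# [('agricultural', [3, 4]), ('built-up', [1, 2]), ('built-up', [5])]
```

Object 5 belongs to the same superclass as objects 1 and 2 but is not adjacent to them, so it remains a separate aggregate: both the thematic and the spatial condition have to hold.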

The dataset to be generalized is stratified in order to select the generalization rules applicable to each area. An inference mechanism (rule-based reasoning) will be used for the execution of the generalization rules.

User interaction will be limited to the absolute minimum in the second phase. After the required rule parameters have been adjusted, the generalization process is executed without interruption. Frequently used sets of parameters could be stored as predefined 'profiles', like the user-defined settings of a word processor.
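
Such profiles could, for instance, be stored as named sets of parameter overrides on top of default values. The parameter names, values and profile names in this sketch are purely hypothetical.

```python
# Hypothetical default parameters for the generalization rules.
DEFAULTS = {"min_building_area_m2": 50.0, "max_gap_m": 5.0}

# Hypothetical stored profiles: each overrides only the parameters it changes.
PROFILES = {
    "overview": {"min_building_area_m2": 400.0, "max_gap_m": 25.0},
    "detailed": {"max_gap_m": 2.0},
}


def load_profile(name):
    """Return the full parameter set for a named profile."""
    return {**DEFAULTS, **PROFILES[name]}


params = load_profile("detailed")
print(params)  # {'min_building_area_m2': 50.0, 'max_gap_m': 2.0}
```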

ACKNOWLEDGMENTS

The research project described in this paper is currently taking place at Wageningen Agricultural University, funded by the Dutch 'Kadaster' and 'Meetkundige Dienst Rijkswaterstaat'.

REFERENCES

[Armstrong & Bennett '90] A knowledge based object-oriented approach to cartographic generalization. GIS/LIS.
[Beard '91] Constraints on rule formation, Map Generalization: Making rules for knowledge representation.
[Frank & Egenhofer '88] Object-Oriented Database Technology for GIS: Seminar Workbook. San Antonio, Texas.
[Kilpeläinen '92] Multiple representations and knowledge-based generalization of topographic data. ISPRS, Washington.
[Molenaar '89] Single Valued Vector Maps; a concept in Geo Information Systems. Geo-Informationssysteme 2(1).
[Nyerges '91] Representing geographical meaning, Map Generalization: Making rules for knowledge representation.
[Richardson '93] Automated Spatial and Thematic Generalization Using a Context Transformation Model. Ph.D. Dissertation, Wageningen Agricultural University, R&B Publications, Ottawa, Canada.