Essentials of Geographic Information Systems, Chapter 5: Geospatial Data Management


Chapter 5: Geospatial Data Management

Every user of data has experienced the challenge of obtaining, organizing, storing, sharing, and visualizing their data. The variety of formats and data structures, as well as the disparate quality, of data can result in a dizzying accumulation of useful and useless pieces of spatially explicit information that must be poked, prodded, and wrangled into a single, unified dataset. This chapter addresses the basic concerns related to data acquisition and management of the various formats and qualities of data currently available for use in modern geographic information system (GIS) projects.

Geographic Data Acquisition

LEARNING OBJECTIVE
1. The objective of this section is to introduce different data types, measurement scales, and data capture methods.

Acquiring geographic data is an important factor in any geographic information system (GIS) effort. It has been estimated that data acquisition typically consumes 60 to 80 percent of the time and money spent on any given project. Therefore, care must be taken to ensure that GIS projects remain mindful of their stated goals so the collection of spatial data proceeds in as efficient and effective a manner as possible. This chapter outlines the many forms and sources of data available for use in a GIS.

Data Types

The type of data that we employ to help us understand a given entity is determined by (1) what we are examining, (2) what we want to know about that entity, and (3) our ability to measure that entity at a desired scale. The most common types of data available for use in a GIS are alphanumeric strings, numbers, Boolean values, dates, and binaries.

An alphanumeric string, or text, data type is any simple combination of letters and numbers that may or may not form coherent words. The number data type can be subcategorized as either floating-point or integer. A floating-point number is any data value that contains decimal digits, while an integer is any data value that does not contain decimal digits. Integers can be short or long depending on the number of significant digits in that number. Integers are also based on the concept of the bit in a computer. As you may recall, a bit is the most basic unit of information in a computer and stores values in one of two states: 1 or 0. Therefore, an 8-bit attribute would consist of eight 1s or 0s in any combination (e.g., 11100111). Short integers are 16-bit values and therefore can be used to characterize numbers ranging either from -32,768 to 32,767 or from 0 to 65,535, depending on whether the number is signed or unsigned (i.e., whether it carries a + or - sign). Long integers, alternatively, are 32-bit values and therefore can characterize numbers ranging either from -2,147,483,648 to 2,147,483,647 or from 0 to 4,294,967,295.

A single precision floating-point value occupies 32 bits, like the long integer. However, this data type divides those bits into a 1-bit sign, an 8-bit exponent, and a 23-bit fraction, which yields approximately 7 significant decimal digits. A double precision floating-point value occupies 64 bits, the space of two 32-bit values, divided into a 1-bit sign, an 11-bit exponent, and a 52-bit fraction, which yields approximately 15 to 16 significant decimal digits.

Figure: Double Precision Floating-Point (64-bit Value) as Stored in a Computer. Sign (1 bit, bit 63), exponent (11 bits, bits 62 to 52), fraction (52 bits, bits 51 to 0).

Boolean, date, and binary values are less complex. Boolean values are simply those values that are deemed true or false based on the application of a Boolean operator such as AND, OR, and NOT. The date data type is presumably self-explanatory, while the binary data type represents attributes whose values are either 1 or 0.

Measurement Scale

In addition to categorizing data by type, a measurement scale acts to group data according to level of complexity (Stevens 1946). For the purposes of GIS analyses, measurement scales can be grouped into two general categories: nominal and ordinal data represent categorical data, while interval and ratio data represent numeric data.

The simplest data measurement scale is the nominal, or named, scale. The nominal scale makes statements about what to call data points but does not allow for scalar comparisons between one object and another. For example, the attribution of nominal information to a set of points that represent cities will describe whether the given locale is Los Angeles or New York City. However, no further statements, such as those about population or voting history, can be made about those locales. Other examples of nominal data include last name, eye color, ethnicity, and gender.

Ordinal data places attribute information into ranks and therefore yields more precisely scaled information than nominal data. Ordinal data describes the position in which data occur, such as first, second, third, and so forth. These scales may also take on names, such as very unsatisfied, unsatisfied, satisfied, and very satisfied. Although this measurement scale indicates the ranking of each data point relative to other data points, the ordinal scale does not explicitly denote the exact quantitative difference between these rankings. For example, if an ordinal attribute represents which runner came in first, second, or third place, it does not state by how much time the winning runner beat the second-place runner. Therefore, one cannot undertake arithmetic operations with ordinal data. Only sequence is explicit.

A measurement scale that does allow precise quantitative statements to be made about attributes is interval data. Interval data are measured along a scale in which each position is equidistant from the next. Elevation and temperature readings are common representations of interval data. For example, it can be determined through this scale that 30°C is 5 degrees warmer than 25°C. A notable property of the interval scale is that zero is not a meaningful value, in the sense that zero does not represent nothingness, or the absence of a value. Indeed, 0°C does not indicate that no temperature exists. Similarly, an elevation of 0 feet does not indicate a lack of elevation; rather, it indicates mean sea level.

Ratio data are similar to the interval measurement scale; however, they are based around a meaningful zero value. Population density is an example of ratio data, whereby a population density of 0 indicates that no people live in the area of interest. Similarly, the Kelvin temperature scale is a ratio scale, as 0 K does imply that no heat (temperature) is measurable within the given attribute.
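A practical consequence of the four scales is which summaries are meaningful. The sketch below is a hypothetical Python helper built on the permissible-statistics idea associated with Stevens (1946): the mode for nominal data, the median for ordinal data, and the mean for interval and ratio data, while statements such as "twice as much" require a ratio scale. The Celsius and Kelvin lines at the end illustrate why: a ratio computed on an interval scale changes when the zero point moves.

```python
from enum import IntEnum
from statistics import mean, median_low, mode


class Scale(IntEnum):
    """Measurement scales, ordered so each one permits everything the previous one does."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


def central_tendency(values, scale: Scale):
    """Return the most informative measure of central tendency the scale permits."""
    if scale == Scale.NOMINAL:
        return mode(values)        # categories can only be counted
    if scale == Scale.ORDINAL:
        return median_low(values)  # ranks can be ordered; median_low avoids averaging two ranks
    return mean(values)            # equal intervals make arithmetic meaningful


def ratio_is_meaningful(scale: Scale) -> bool:
    """Statements like 'twice as much' need a true zero, i.e., a ratio scale."""
    return scale == Scale.RATIO


# Celsius is an interval scale: 20 / 10 = 2, yet 20 degrees C is not "twice as hot" as 10.
celsius = (10.0, 20.0)
kelvin = tuple(c + 273.15 for c in celsius)  # Kelvin is a ratio scale with a true zero
print(celsius[1] / celsius[0])               # 2.0
print(round(kelvin[1] / kelvin[0], 3))       # 1.035
```

The ordering of the enum mirrors the text: each scale supports the operations of the ones before it, so only the ratio scale admits multiplicative comparisons.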

In addition to being categorical or numeric, data values also can be considered to be discrete or continuous. Discrete data are those that maintain a finite number of possible values, while continuous data can be represented by an infinite number of values. For example, the number of mature trees on a small property will necessarily be a whole number between one and one hundred (for argument's sake). However, the height of those trees represents a continuous data value, as there are an infinite number of potential values (e.g., one tree may be 20 feet tall and another only a fraction of a foot taller or shorter, measured to ever finer precision).

Primary Data Capture

Now that we have a sense of the different data types and measurement scales available for use in a GIS, we must direct our thoughts to how these data can be acquired. Primary data capture is a direct data acquisition methodology that is usually associated with some type of field effort. In the case of vector data, directly captured data commonly come from a global positioning system (GPS) or other types of surveying equipment such as a total station (Figure "GPS Unit (left) and Total Station (right)"). Total stations are specialized, primary data capture instruments that combine a theodolite (or transit), which measures horizontal and vertical angles, with a tool to measure the slope distance from the unit to an observed point. Use of a total station allows crews to quickly and accurately derive the topography for a particular landscape.

In the case of GPS, handheld units access positional data from satellites and log the information for subsequent retrieval. A network of navigation satellites is situated around the globe and provides precise coordinate information for any point on the earth's surface (Figure "Earth Imaging Satellite Capturing Primary Data"). Maintaining a line of sight to four or more of these satellites provides the user with reasonably accurate location information. These locations can be collected as individual points or can be linked together to form lines or polygons, depending on user preference. Attribute data such as feature type, telephone pole number, and river name can be entered by the user at the same time. These location and attribute data can then be uploaded to the GIS for visualization. Depending on the make and model of the unit, this upload often requires some type of intermediate file conversion via software provided by the manufacturer. However, some free online resources can also convert GPS data from one format to another.

In addition to the typical GPS unit shown in Figure "GPS Unit (left) and Total Station (right)", GPS is increasingly incorporated into other new technologies. For example, smartphones now embed GPS capabilities as a standard technological component. These units maintain accuracy comparable to that of similarly priced GPS units and are largely responsible for a renaissance in portable data capture and in sharing those data with the masses. The ubiquity of this technology has led to a proliferation of data acquisition alternatives, including data collection methods whereby users contribute freely to building spatial databases. This rapidly expanding methodology is utilized in such applications as Google Earth and Bing Maps.

Raster data obtained via direct capture come more commonly from remotely sensed sources (Figure "Earth Imaging Satellite Capturing Primary Data"). Remotely sensed data offer the advantage of removing the need for physical access to the area being imaged. In addition, huge tracts of land can be characterized with little to no additional time and labor by the researcher.
On the other hand, remotely sensed data require validation to ensure that the sensor is not only operating correctly but also properly calibrated to collect the desired information. Satellites and aerial cameras provide the most ubiquitous sources of raster data (see Chapter "Data Models for GIS", Section "Satellite Imagery").
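The text leaves the exchange format for GPS uploads unspecified. As one illustration, assume GeoJSON (RFC 7946), a widely supported open interchange format. The sketch below, written with Python's standard `json` module, turns a few hypothetical GPS fixes into a point, a line, and a polygon, each carrying user-entered attributes; the coordinates, attribute names, and values are invented for illustration.

```python
import json

# Hypothetical GPS fixes as (longitude, latitude) in decimal degrees (WGS 84),
# listed counterclockwise, the ring orientation RFC 7946 recommends for polygons.
fixes = [(-118.396, 34.021), (-118.391, 34.020), (-118.394, 34.023)]


def feature(geometry_type, coordinates, **attributes):
    """Wrap a geometry and its user-entered attributes as a GeoJSON Feature."""
    return {"type": "Feature",
            "geometry": {"type": geometry_type, "coordinates": coordinates},
            "properties": attributes}


def point(fix, **attributes):
    """A single fix collected as a point."""
    return feature("Point", list(fix), **attributes)


def line(fixes_in_order, **attributes):
    """Fixes linked in collection order to form a line."""
    return feature("LineString", [list(f) for f in fixes_in_order], **attributes)


def polygon(fixes_in_order, **attributes):
    """Fixes linked into a closed ring to form a polygon."""
    ring = [list(f) for f in fixes_in_order]
    if ring[0] != ring[-1]:
        ring.append(list(ring[0]))  # GeoJSON rings must end where they start
    return feature("Polygon", [ring], **attributes)


# Attribute names and values are hypothetical examples of user-entered data.
collection = {"type": "FeatureCollection", "features": [
    point(fixes[0], pole_number="TP-017"),
    line(fixes, river_name="Example Creek"),
    polygon(fixes, feature_type="parcel"),
]}
print(json.dumps(collection, indent=2))
```

Because the output is plain JSON text, most GIS packages can read it directly, which sidesteps the manufacturer-specific conversion step in this hypothetical workflow.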

Secondary Data Capture

Secondary data capture is an indirect methodology that utilizes the vast amount of existing data available in both digital and nondigital formats. Prior to initiating any GIS effort, it is always wise to mine online resources for existing GIS data that may fulfill your mapping needs without the potentially time-intensive step of creating the data from scratch. Such digital GIS data are available from a variety of sources, including international agencies (e.g., United Nations, World Bank), federal governments (e.g., NASA, US Census), state governments (e.g., MARIS), local governments (e.g., SANDAG), university websites (e.g., UCLA, Duke, Stanford, University of Chicago, Indiana Spatial Data Portal), and commercial websites. These secondary data are available in a wide assortment of file types and sizes and can be used in most GIS software packages. Often these data are free, but many sites will charge a fee for access to the proprietary information they have developed.

Although these data sources are all cases where the information has been converted to digital format and properly projected for use in a GIS, there is also a great deal of spatial information that can be gleaned from existing, nondigital sources. Paper maps, for example, may contain current or historic information on a locale that cannot be found in digital format. In this case, the process of digitization can be used to create digital files from the original paper copy. Three primary methods exist for digitizing spatial information: two are manual, and one is automated.

Tablet digitizing is a manual data capture method whereby a user enters coordinate information into a computer through the use of a digitizing tablet and a digitizing puck. To begin, a paper map is secured to a digitizing tablet. The backlight allows all features on the map to be easily observed, which reduces eyestrain. The coordinates of the point, line, or polygon features on the paper map are then entered into a digital file as the user employs a puck, which is similar to a mouse with a crosshair, to click their way around the vertices of each desired feature. The resulting digital file will need to be properly georeferenced following completion of the digitization task to ensure that this information will properly align with existing data.

Heads-up digitizing, the second manual data capture method, is also referred to as on-screen digitizing. Heads-up digitizing can be used on either paper maps or existing digital files.
In the case of a paper map, the map must first be scanned into the computer at a resolution high enough to allow all pertinent features to be resolved. Second, the image must be registered so the map will conform to an existing coordinate system. To do this, the user can enter control points on the screen and transform the scanned image into real-world coordinates. Finally, the user simply zooms to specific areas on the map and traces the points, lines, or polygons, similar to the tablet digitization example.
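Registration with control points can be sketched as fitting a transformation from scanned-image (column, row) positions to real-world (x, y) coordinates. The text does not say which transformation is used; the sketch below assumes a six-parameter affine transformation fitted by least squares, which is one common choice (GIS packages also offer higher-order polynomial and rubber-sheeting options). It uses only the Python standard library, and the control-point coordinates are hypothetical.

```python
def solve3(m, v):
    """Solve the 3x3 system m @ x = v by Gaussian elimination with partial pivoting."""
    a = [list(row) + [rhs] for row, rhs in zip(m, v)]
    for k in range(3):
        pivot = max(range(k, 3), key=lambda i: abs(a[i][k]))
        a[k], a[pivot] = a[pivot], a[k]
        if abs(a[k][k]) < 1e-12:
            raise ValueError("control points are collinear or duplicated")
        for i in range(k + 1, 3):
            factor = a[i][k] / a[k][k]
            for j in range(k, 4):
                a[i][j] -= factor * a[k][j]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):  # back substitution
        x[i] = (a[i][3] - sum(a[i][j] * x[j] for j in range(i + 1, 3))) / a[i][i]
    return tuple(x)


def fit_affine(control_points):
    """Least-squares fit of x = a*col + b*row + c and y = d*col + e*row + f.

    control_points: ((col, row), (x, y)) pairs; at least three, not all collinear.
    Returns the six coefficients (a, b, c, d, e, f).
    """
    ata = [[0.0] * 3 for _ in range(3)]  # normal-equation matrix, shared by x and y
    atx = [0.0] * 3
    aty = [0.0] * 3
    for (col, row), (x, y) in control_points:
        r = (col, row, 1.0)
        for i in range(3):
            atx[i] += r[i] * x
            aty[i] += r[i] * y
            for j in range(3):
                ata[i][j] += r[i] * r[j]
    return solve3(ata, atx) + solve3(ata, aty)


def to_world(coeffs, col, row):
    """Apply the fitted transformation to one image position."""
    a, b, c, d, e, f = coeffs
    return a * col + b * row + c, d * col + e * row + f


# Hypothetical control points: image corners matched to projected coordinates (meters).
gcps = [((0, 0), (500000.0, 4000000.0)), ((100, 0), (500200.0, 4000000.0)),
        ((0, 100), (500000.0, 3999800.0)), ((100, 100), (500200.0, 3999800.0))]
coeffs = fit_affine(gcps)
print([round(c, 6) for c in coeffs])  # approximately 2, 0, 500000, 0, -2, 4000000
print(to_world(coeffs, 50, 50))       # approximately (500100.0, 3999900.0)
```

With more than three control points the fit is overdetermined, so least squares spreads small digitizing errors across all points instead of forcing the transformation through any single one.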

Heads-up digitizing is particularly simple when existing GIS files, satellite images, or aerial photographs are used as a baseline. For example, if a user plans to digitize the boundary of a lake as seen from a satellite image, the steps of scanning and registering can be skipped, and projection information from the originating image can simply be copied over to the digitized file.

The third, automated method of secondary data capture requires the user to scan a paper map and vectorize the information therein. This method typically requires a software package that can convert a raster scan to vector lines, and it requires a very high-resolution, clean scan. If the image is not clean, all the imperfections on the map will likely be converted to false features in the digital version, and if a clean scan is not available, it is often faster to use a manual digitization methodology. Given a clean scan, however, this method is much quicker than the manual methods described above and may be the best option if multiple maps must be digitized or if time is a limiting factor. Often, a semiautomatic approach is employed whereby a map is scanned and vectorized, followed by a heads-up digitizing session to edit and repair any errors that occurred during automation.

The final secondary data capture method worth noting is the use of information from reports and documents. Via this method, one enters information from reports and documents into the attribute table of an existing, digital GIS file that contains all the pertinent points, lines, and polygons. For example, new information pertaining to census tracts may become available following a study. The GIS user simply needs to download the existing GIS file of census tracts and begin entering the study's information directly into the attribute table. If the data tables are available digitally, the join and relate functions in a GIS (Section Joins and Relates) are often extremely helpful, as they automate much of the data entry effort.
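When the report data already sit in a digital table, the join mentioned above amounts to matching rows on a shared key. The minimal Python sketch below assumes a hypothetical `tract_id` field in both tables and behaves like a left join: tracts without study data receive empty (`None`) values rather than being dropped. All IDs and values are invented.

```python
# Hypothetical attribute table of an existing census-tract layer.
tracts = [
    {"tract_id": "T001", "area_km2": 1.9},
    {"tract_id": "T002", "area_km2": 2.4},
    {"tract_id": "T003", "area_km2": 1.1},
]

# Hypothetical figures transcribed from a study report, keyed by the same tract ID.
study = {
    "T001": {"median_rent": 1850, "survey_year": 2011},
    "T003": {"median_rent": 2010, "survey_year": 2011},
}


def join_attributes(features, table, key):
    """Append the fields of `table` to each feature whose `key` matches (a left join)."""
    fields = sorted({name for row in table.values() for name in row})
    joined = []
    for feature in features:
        extra = table.get(feature[key], dict.fromkeys(fields))  # unmatched rows get None
        joined.append({**feature, **extra})
    return joined


for row in join_attributes(tracts, study, "tract_id"):
    print(row)
```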
KEY TAKEAWAYS
The most common types of data available for use in a GIS are alphanumeric strings, numbers, Boolean values, dates, and binaries.
Nominal and ordinal data represent categorical data, while interval and ratio data represent numeric data.
Geospatial data can be captured from either primary or secondary sources.

EXERCISES
1. The following data are derived from which measurement scale?
a. My happiness score on a 10-point scale
b. My weight: 192
c. The city I live in: Culver City
d. My current body temperature
e. The number of cheeseburgers I can eat before passing out: 12
f. My license plate number
2. Describe at least two different methods for adding the information from a topographic map to your GIS.

Stevens. 1946. On the Theory of Scales of Measurement. Science 103 (2684).

Database Management

LEARNING OBJECTIVE
1. The objective of this section is to understand the basic properties of a relational database management system.

A database is a structured collection of data. A database management system (DBMS) is a software package that allows for the creation, storage, maintenance, manipulation, and retrieval of large datasets that are distributed over one or more files. A DBMS and its associated functions are usually accessed through commercial software packages such as Microsoft Access, Oracle, and FileMaker Pro. Database management normally refers to the management of tabular data in row and column format and is frequently used for personal, business, government, and scientific endeavors. Spatial database management systems, alternatively, include the functionality of a DBMS but also contain geographic information about each data point, such as identity, location, shape, and orientation. Integrating this geographic information with the tabular attribute data of a classical DBMS provides users with powerful tools to visualize and answer the spatially explicit questions that arise in an increasingly technological society.

Several types of database models exist, such as the flat, hierarchical, network, and relational models (Worboys 1995; Jackson 1999). A flat database is essentially a spreadsheet whereby all data are stored in a single, large table (Figure "Flat File Database"). A hierarchical database is also a fairly simple model that organizes data into a one-to-many association across levels (Figure "Hierarchical Database"). Common examples of this model include phylogenetic trees for the classification of plants and animals and familial genealogical trees showing parent-child relationships. Network databases are similar to hierarchical databases; however, they also support many-to-many relationships (Figure "Network Database"). This expanded capability allows greater search flexibility within the database and reduces potential redundancy of information.
However, both the hierarchical and network models can become incredibly complex depending on the size of the databases and the number of interactions between the data points. Modern geographic information system (GIS) software typically employs a fourth model, referred to as a relational database (Codd 1970).

Figure: (a) Flat File Database, with columns such as Name, Group, and Occupation; (b) Hierarchical Database

Figure: (c) Network Database

Relational Database Management Systems

A relational database management system (RDBMS) is a collection of tables that are connected in such a way that data can be accessed without reorganization of the tables. The tables are created such that each column represents a particular attribute (e.g., soil type, PIN number, last name, acreage) and each row contains a unique instance of data for that columnar attribute (e.g., Delhi Sands Soils, 5555, Smith). In the relational model, each table (not surprisingly called a relation) is linked to each other table via predetermined keys (Date 1995). The primary key represents the attribute (column) whose value uniquely identifies a particular record (row) in the relation (table). The primary key may not contain missing values, as multiple missing values would represent nonunique entities that violate the basic rule of the primary key. The primary key corresponds to an identical attribute in a secondary table (and possibly a third, fourth, fifth, etc.) called a foreign key. As a result, all the information in the first table is directly related to the information in the second table via the primary and foreign keys. With these links in place, tables within the database can be kept very simple, resulting in minimal computation time and file complexity. This process can be repeated over many tables as long as each contains a foreign key that corresponds to another table's primary key.
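The primary key and foreign key arrangement can be tried out with Python's built-in `sqlite3` module. The table and column names below (`parcel`, `soil_sample`, `pin`) and the second owner and soil values are hypothetical, loosely echoing the PIN and soil-type examples above. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` is issued on the connection.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign-key checks off by default
con.executescript("""
CREATE TABLE parcel (
    pin   INTEGER PRIMARY KEY,            -- primary key: uniquely identifies each parcel
    owner TEXT NOT NULL
);
CREATE TABLE soil_sample (
    sample_id INTEGER PRIMARY KEY,
    pin       INTEGER NOT NULL REFERENCES parcel (pin),  -- foreign key into parcel
    soil_type TEXT
);
""")
con.executemany("INSERT INTO parcel (pin, owner) VALUES (?, ?)",
                [(5555, "Smith"), (5556, "Jones")])
con.executemany("INSERT INTO soil_sample (pin, soil_type) VALUES (?, ?)",
                [(5555, "Delhi Sands"), (5556, "Hypothetical Loam")])

# The tables stay separate; the keys relate them only when a query asks for it.
rows = con.execute("""
    SELECT p.owner, s.soil_type
    FROM parcel AS p JOIN soil_sample AS s ON s.pin = p.pin
    ORDER BY p.pin
""").fetchall()
print(rows)  # [('Smith', 'Delhi Sands'), ('Jones', 'Hypothetical Loam')]


def orphan_rejected() -> bool:
    """Try to add a sample for a parcel that does not exist; True if the key check refuses it."""
    try:
        con.execute("INSERT INTO soil_sample (pin, soil_type) VALUES (?, ?)", (9999, "unknown"))
    except sqlite3.IntegrityError:
        return True
    return False


print(orphan_rejected())  # True
```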

The relational model has two primary advantages over the other database models described earlier. First, each table can now be separately prepared, maintained, and edited. This is particularly useful when one considers the potentially huge size of many of today's databases. Second, the tables may be maintained separately until the need for a particular query or analysis calls for the tables to be related. This creates a large degree of efficiency for processing information within a given database.

It may become apparent to the reader that there is great potential for redundancy in this model, as each table must contain an attribute that corresponds to an attribute in every other related table. Therefore, redundancy must actively be monitored and managed in an RDBMS. To accomplish this, a set of rules called normal forms has been developed (Codd 1970). There are three basic normal forms. The first normal form (Figure "First Normal Form Violation (above) and Fix (below)") refers to five conditions that must be met (Date 1995). They are as follows:
1. There is no sequence to the ordering of the rows.
2. There is no sequence to the ordering of the columns.
3. Each row is unique.
4. Every cell contains one and only one value.
5. All values in a column pertain to the same subject.
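Condition 4 (one value per cell) is the one most often violated in practice. The hypothetical Python sketch below splits a comma-separated cell into separate rows and then drops exact duplicates so that each row stays unique (condition 3). The field names are illustrative only, and the IDs merely echo those in the figure.

```python
# Hypothetical table violating the first normal form: one cell holds two departments.
violating = [
    {"employee_id": 100654, "departments": "Sales, Marketing"},
    {"employee_id": 100375, "departments": "Accounting"},
    {"employee_id": 100375, "departments": "Accounting"},  # duplicate row
]


def to_first_normal_form(rows, multi_field, new_field, sep=","):
    """Give every cell exactly one value and keep every row unique."""
    fixed, seen = [], set()
    for row in rows:
        base = {k: v for k, v in row.items() if k != multi_field}
        for value in row[multi_field].split(sep):
            new_row = {**base, new_field: value.strip()}
            signature = tuple(sorted(new_row.items()))  # hashable form for the uniqueness check
            if signature not in seen:
                seen.add(signature)
                fixed.append(new_row)
    return fixed


for row in to_first_normal_form(violating, "departments", "department"):
    print(row)
```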

Figure: First Normal Form Violation (above) and Fix (below), showing an Employee Table and an Employee Department Table keyed on employee IDs (e.g., 100654, 100375, 100164)

The second normal form states that any column that is not part of the primary key must depend on the whole primary key, not on just part of it. This reduces redundancy by moving attributes that depend on only part of a composite key into tables of their own, so this step often involves the creation of new tables to maintain normalization.
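The second normal form can be checked by asking whether a non-key column is already determined by only part of a composite key. The hypothetical Python sketch below tests such a dependency in sample rows and moves the offending column into its own table. A dependency that happens to hold in a sample is only evidence, not proof; in practice it comes from knowing what the data mean. All names and values are invented.

```python
# Hypothetical table keyed on (employee_id, project_id). employee_name depends on
# employee_id alone, which is only part of the key: a second normal form violation.
assignments = [
    {"employee_id": 100654, "project_id": "P1", "employee_name": "Lee", "hours": 12},
    {"employee_id": 100654, "project_id": "P2", "employee_name": "Lee", "hours": 5},
    {"employee_id": 100375, "project_id": "P1", "employee_name": "Diaz", "hours": 8},
]


def depends_on(rows, determinant, dependent):
    """True if each combination of `determinant` values maps to a single `dependent` value."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in determinant)
        if seen.setdefault(key, row[dependent]) != row[dependent]:
            return False
    return True


def split_partial_dependency(rows, part_key, moved):
    """Move the `moved` columns, which depend only on `part_key`, into a new table."""
    new_table = {}
    for row in rows:
        k = tuple(row[c] for c in part_key)
        new_table[k] = {c: row[c] for c in (*part_key, *moved)}
    remaining = [{c: v for c, v in row.items() if c not in moved} for row in rows]
    return remaining, list(new_table.values())


print(depends_on(assignments, ["employee_id"], "employee_name"))  # True: partial dependency
print(depends_on(assignments, ["employee_id"], "hours"))          # False: needs the whole key
remaining, employees = split_partial_dependency(assignments, ["employee_id"], ["employee_name"])
print(employees)
```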

Figure: Second Normal Form Violation (above) and Fix (below), showing Employee Department Tables keyed on employee ID

The third normal form states that every non-key attribute must depend directly on the primary key, while the primary key remains independent of all non-key attributes; in other words, no non-key attribute may depend on another non-key attribute. This form was wittily summed up by Kent (1983), who quipped that all non-key attributes must provide a fact about the key, the whole key, and nothing but the key. Echoing this quote is the rejoinder "so help me Codd" (personal communication, 1989).

Figure: Third Normal Form Violation, in which department names (Sales, Accounting, Marketing) depend on a non-key attribute rather than on the primary key
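A third normal form violation arises when a non-key column depends on another non-key column instead of on the primary key, as when a department's name is stored on every employee row. The hypothetical Python sketch below moves such columns into a table keyed by the intermediate column. The ID values and department names echo those in the figure, but their pairing here is assumed, and the fourth employee is invented so that a department repeats.

```python
# Employee table keyed on employee_id. department_name depends on department_id,
# a non-key column, so it is repeated for every employee in the same department.
employees = [
    {"employee_id": 100654, "department_id": 100152, "department_name": "Sales"},
    {"employee_id": 100375, "department_id": 100987, "department_name": "Accounting"},
    {"employee_id": 100164, "department_id": 100026, "department_name": "Marketing"},
    {"employee_id": 100999, "department_id": 100152, "department_name": "Sales"},  # hypothetical
]


def remove_transitive_dependency(rows, via, moved):
    """Split `moved` columns, which depend on the non-key column `via`, into their own table."""
    lookup = {}
    for row in rows:
        entry = {via: row[via], **{c: row[c] for c in moved}}
        if lookup.setdefault(row[via], entry) != entry:
            raise ValueError(f"{via} does not determine {moved}; the split would lose data")
    core = [{c: v for c, v in row.items() if c not in moved} for row in rows]
    return core, list(lookup.values())


employee_table, department_table = remove_transitive_dependency(
    employees, "department_id", ["department_name"])
print(employee_table)    # employee_id and department_id only; department_id is now a foreign key
print(department_table)  # one row per department, keyed by department_id
```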

Figure: Third Normal Form Fix, with a separate Department Table (Sales 100152, Accounting 100987, Marketing 100026) linked to the Employee Table by key

Joins and Relates

An additional advantage of an RDBMS is that it allows attribute data in separate tables to be linked in a post hoc fashion. The two operations commonly used to accomplish this are the join and the relate. The join operation appends the fields of one table to a second table through the use of an attribute or field that is common to both tables. This is commonly used to combine attribute information from one or more data tables (e.g., information taken from reports or documents) with a spatially explicit GIS feature layer. A second type of join combines feature information based on spatial location and association rather than on common attributes. In some GIS software packages, three types of spatial joins are available. Users may (1) match each feature to the closest feature, (2) match each feature to the feature that it is part of, or (3) match each feature to the feature that it intersects.

Alternatively, the relate operation temporarily associates two map layers or tables while keeping them physically separate. Relates are bidirectional, so data can be accessed from either of the tables by selecting records in the other table. The relate operation also allows for the association of three or more tables, if necessary. Sometimes it can be unclear which operation one should use. As a general rule, joins are most suitable for instances involving one-to-one or many-to-one relationships. Joins are also advantageous because the data from the two tables are readily observable in the single output table. Relates, on the other hand, are suitable for all table relationships (one-to-one, one-to-many, many-to-one, and many-to-many); however, they can slow down computer access time if the tables are particularly large or spread out over remote locations.

KEY TAKEAWAYS
Database management systems can be flat, hierarchical, network, or relational.
Relational database management systems (RDBMSs) utilize primary keys and foreign keys to link data tables.
The RDBMS model reduces data redundancy by employing three basic normal forms.

EXERCISE
1. Identify the three violations of normal forms in the following table.

Instructor     Class                              Class Number   Enrollment
Lennon         Advanced Calculus                  10073          34
               Introductory Physical Education    10045          23
Harrison       Auto Repair and Feminism           10045          54
Starr, Best    Quantum Physics                    10023          39

Codd. 1970. A Relational Model of Data for Large Shared Data Banks. Communications of the Association for Computing Machinery 13 (6).
Date. 1995. An Introduction to Database Systems. Reading, MA: Addison-Wesley.
Jackson. 1999. Thirty Years (and More) of Databases. Information and Software Technology 41.
Kent. 1983. A Simple Guide to Five Normal Forms in Relational Database Theory. Communications of the Association for Computing Machinery 26 (2).
Worboys. 1995. GIS: A Computing Perspective. London: Taylor & Francis.

File Formats

LEARNING OBJECTIVE
1. The objective of this section is to provide an overview of a sample of the most common types of vector, raster, and hybrid file formats.

Geospatial data are stored in many different file formats. Each geographic information system (GIS) software package, and each version of these software packages, supports different formats. This is true for both vector and raster data. Although several of the more common formats are summarized here, many other formats exist for use in various GIS programs.

Vector File Formats

The most common vector file format is the shapefile. Shapefiles, developed by Esri in the early 1990s for use with the dBASE III database management software package, are simple, nontopological files developed to store the geometric location and attribute information of geographic features. Shapefiles are incapable of storing null values, as well as topological or network features. Field names within the attribute table are limited to ten characters, and each shapefile can represent only point, line, or polygon feature sets. Supported data types are limited to floating point, integer, date, and text. Shapefiles are supported by almost all commercial and open-source GIS software. Despite being called a shapefile, this format is actually a compilation of many different files. Table "Shapefile File Types" lists and describes the different file formats associated with the shapefile. Among those listed, only the .shp, .shx, and .dbf files are mandatory to create a functioning shapefile, while all others are optional. As a general rule, the names for each file should conform to the MS-DOS 8.3 filename convention when using older versions of GIS software packages. According to this convention, the filename prefix can contain up to eight characters, and the filename extension contains three characters. More recent GIS software packages have relaxed this requirement and will accept longer filenames.

Table: Shapefile File Types

Extension        Description
.shp*            Feature geometry
.shx*            Index format for the feature geometry
.dbf*            Feature attribute information in dBASE IV format
.prj             Projection information
.sbn and .sbx    Spatial index of the features
.fbn and .fbx    Spatial index of the features for read-only shapefiles
.ain and .aih    Attribute information for active fields in the table
.ixs             Geocoding index for read/write shapefiles
.mxs             Geocoding index for read/write shapefiles (ODB format)
.atx             Attribute index used in ArcGIS 8 and later
.shp.xml         Metadata in XML format
.cpg             Code page specifications for identifying character encoding
* Indicates mandatory files

The earliest vector format file for use in GIS software packages, which is still in use today, is the ArcInfo coverage. This file format supports multiple feature types (points, lines, polygons, etc.) while also storing the topological information associated with those features. Attribute data are stored as multiple files in a separate directory labeled "info". Because of the computing environment in which they were created, coverages maintain strict naming conventions. File names cannot be longer than thirteen characters, cannot contain spaces, cannot start with a number, and must be completely in lowercase. Coverages cannot be edited in newer versions of Esri's GIS software.

The US Census Bureau maintains a type of vector file referred to as TIGER/Line files (Topologically Integrated Geographic Encoding and Referencing system). Although these files do not contain actual census information, they map features such as census tracts, roads, railroads, buildings, rivers, and other features that support and improve the Bureau's ability to collect census information. TIGER/Line files, released in 1990, are topologically explicit and are linked to the Census Bureau's Master Address File (MAF), therefore enabling the geocoding of street addresses. These files are free to the public and can be freely downloaded from private vendors that support the format.

AutoCAD DXF (Drawing Interchange Format or Drawing Exchange Format) is a proprietary vector file format developed by Autodesk to allow interchange between CAD (computer-aided design) software and other mapping software packages. DXF files were originally released in 1982 with the purpose of providing an exact representation of AutoCAD's native DWG format. Although DXF is still commonly used, newer versions of AutoCAD have incorporated more complex data types (e.g., regions, dynamic blocks) that are not supported in the DXF format. Therefore, it may be presumed that the DXF format will become less popular in geospatial analysis over time.

Finally, the US Geological Survey (USGS) maintains a vector file format that details physical and cultural features across the United States. These topologically explicit DLGs (Digital Line Graphs) come in large-, intermediate-, and small-scale varieties, depending on whether they are derived from 1:24,000, 1:100,000, or 1:2,000,000 topographic quadrangle maps. The features available in the different DLG types depend on the scale of the DLG but generally include data such as administrative and political boundaries, hydrography, transportation systems, hypsography, and land cover.

Vector data can also be structured to represent surface elevation information. A TIN (Triangulated Irregular Network) is a vector data structure that uses contiguous, nonoverlapping triangles to represent geographic surfaces (Figure "Triangulated Irregular Network (TIN)"). Whereas the raster depiction of a surface represents elevation as an average value over the spatial extent of the individual pixel (see Section Raster File Formats), the TIN data structure models each vertex of the triangle as an exact elevation value at a specific point on the earth. The arcs between vertices are an approximation of the elevation between two vertices.
These arcs are then joined into triangles from which information on elevation, slope, aspect, and surface area can be derived across the entire extent of the model's space. Note that the term irregular in the name of the data model refers to the fact that the vertices are typically laid out in a scattered fashion.
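How slope, aspect, surface area, and intermediate elevations follow from a single triangle can be sketched with plain vector arithmetic. The Python below is an illustration under stated assumptions, not a particular GIS package's method: each facet is treated as a flat plane through its three (x, y, z) vertices, with x as easting, y as northing, and z in the same units, and elevation inside the facet is interpolated linearly with barycentric weights. The sample facet is hypothetical.

```python
import math


def facet_properties(p1, p2, p3):
    """Slope (degrees), aspect (degrees clockwise from north), and surface area of one TIN facet.

    Each vertex is (x, y, z): easting, northing, and elevation in the same linear units.
    """
    ux, uy, uz = (p2[i] - p1[i] for i in range(3))
    vx, vy, vz = (p3[i] - p1[i] for i in range(3))
    # Normal to the plane through the three vertices: the cross product u x v.
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    if nz == 0:
        raise ValueError("degenerate or vertical facet")
    if nz < 0:  # point the normal upward so its horizontal part faces downslope
        nx, ny, nz = -nx, -ny, -nz
    slope = math.degrees(math.atan2(math.hypot(nx, ny), nz))
    aspect = math.degrees(math.atan2(nx, ny)) % 360 if (nx or ny) else None  # None on flat facets
    area = 0.5 * math.sqrt(nx * nx + ny * ny + nz * nz)  # true 3D area, not map area
    return slope, aspect, area


def interpolate(p1, p2, p3, x, y):
    """Linearly interpolate elevation at (x, y) from the facet's vertices (barycentric weights)."""
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = p1, p2, p3
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    return w1 * z1 + w2 * z2 + (1 - w1 - w2) * z3


# Hypothetical facet that drops 1 unit of elevation for each unit east.
a, b, c = (0.0, 0.0, 10.0), (1.0, 0.0, 9.0), (0.0, 1.0, 10.0)
print(facet_properties(a, b, c))         # slope 45 degrees, aspect 90 (east), area about 0.707
print(interpolate(a, b, c, 0.25, 0.25))  # 9.75
```

Summing these per-facet values over all triangles is how surface-wide measures such as total surface area can be derived in this sketch.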

Figure: Triangulated Irregular Network (TIN)

The use of TINs confers certain advantages over raster-based elevation models (see Section Raster File Formats). First, linear topographic features are represented very accurately relative to their raster counterparts. Second, a comparatively small number of data points is needed to represent a surface, so file sizes are typically much smaller. This is particularly true as vertices can be clustered in areas where relief is complex and can be sparse in areas where relief is simple. Third, specific elevation data can be incorporated into the data model in a post hoc fashion via the placement of additional vertices if the original TIN is deemed insufficient or inadequate. Finally, certain spatial statistics can be calculated that cannot be obtained when using a raster-based elevation model, such as floodplain delineation and storage capacity curves for reservoirs.

Raster File Formats

A multitude of raster format types are available for use in GIS. The selection of raster formats has dramatically increased with the widespread availability of imagery from digital cameras, video recorders, satellites, and so forth. Raster imagery is typically 8-bit (256 colors) or 24-bit (16 million colors). Due to ongoing technological advancements, raster image sizes have been getting larger and larger. To deal with this potential constraint, two types of compression are commonly used: lossless and lossy. Lossless compression reduces file size without decreasing image quality. Lossy compression attempts to exploit limitations of the human eye by removing information from the image that cannot be sensed. As you may guess, lossy compression results in smaller file sizes than lossless compression.

Among the most common raster formats used on the web are the JPEG, TIFF, and PNG formats, all of which are openly documented and can be used with most GIS software packages. The JPEG (Joint Photographic Experts Group) and TIFF (Tagged Image File Format) raster formats are most frequently used by digital cameras to store values for each of the red, green, and blue color channels (and sometimes 16-bit values, in the case of TIFF images). JPEGs support lossy compression, while TIFFs can be either lossy or lossless. TIFF images can be saved in either RGB or CMYK color. PNGs (Portable Network Graphics) are images that use lossless compression. PNG files are designed for efficient viewing in web browsers such as Internet Explorer and Safari. Native JPEG, TIFF, and PNG files do not have geographic information associated with them and therefore cannot be used directly in mapping efforts. In order to employ these files in a GIS, a world file must be created.
A world file is a separate, plain-text data file that specifies the locations and transformations that allow the image to be projected into a standard coordinate system (e.g., Universal Transverse Mercator or State Plane). The name of the world file is based on the name of the raster file, while its extension is typically formed by adding a "w" to the raster's extension, usually by keeping the first and last letters of that extension and appending "w". The world file extension for a JPEG is .jgw, for a TIFF it is .tfw, and for a PNG it is .pgw.
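A world file holds six numbers, one per line, that define an affine transformation from pixel positions to map coordinates: the pixel width in map units (A), two rotation terms (D and B), the pixel height (E, usually negative because row numbers increase downward), and the map x and y coordinates of the center of the upper-left pixel (C and F). The following Python sketch parses that layout and converts a pixel position to map coordinates. The function names and sample coordinates are illustrative assumptions, and the file-naming helper encodes only the common three-letter convention described above.

```python
from pathlib import Path


def parse_world_file(text):
    """Return the six world file parameters (A, D, B, E, C, F) in file order."""
    values = [float(line) for line in text.splitlines() if line.strip()]
    if len(values) != 6:
        raise ValueError(f"expected 6 numeric lines, found {len(values)}")
    return tuple(values)


def pixel_to_map(params, col, row):
    """Map coordinates of the center of pixel (col, row), zero-based from the upper-left pixel."""
    a, d, b, e, c, f = params
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y


def world_file_path(raster_path):
    """Guess the world file name for a raster using the common naming convention."""
    path = Path(raster_path)
    ext = path.suffix.lower()
    if not ext:
        raise ValueError("raster path has no extension")
    if len(ext) == 4:  # ".jpg" -> ".jgw", ".tif" -> ".tfw", ".png" -> ".pgw"
        return path.with_suffix("." + ext[1] + ext[3] + "w")
    return path.with_suffix(ext + "w")  # assumption: append "w" to other extensions
```

With a hypothetical world file containing 2.0, 0.0, 0.0, -2.0, 500001.0, and 4200001.0 (2-meter pixels, no rotation), pixel (10, 5) maps to (500021.0, 4199991.0).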

An example of a raster file format with explicit georeferencing information is the proprietary MrSID (Multiresolution Seamless Image Database) format. This lossless compression format was developed by LizardTech, Inc., for use with large aerial photographs or satellite images, whereby portions of a compressed image can be viewed quickly without having to decompress the entire file. The MrSID format is frequently used for visualizing orthophotos. Like MrSID, the proprietary ECW (Enhanced Compression Wavelet) format also includes georeferencing information within the file structure. This compression format was developed by Earth Resource Mapping and supports up to 255 layers of image information. Because images with so many layers can reach enormous file sizes, ECWs represent an excellent option for performing rapid analysis on large images while using a relatively small amount of the computer's RAM (Random Access Memory), thus accelerating computation speed. Like DLGs, DRGs (Digital Raster Graphics) are scanned versions of USGS topographic maps that include all of the collar material from the original print versions. DRGs are georeferenced to the surface of the earth, fit the Universal Transverse Mercator (UTM) projection, and are most likely referenced to one of the North American Datums (NAD27 or NAD83). These graphics are scanned at a minimum of 250 dpi (dots per inch) and therefore have a spatial resolution of approximately 2.4 meters. DRGs contain up to thirteen colors and therefore may look slightly different from the originals. Like the TIN vector format, some raster file formats are developed explicitly for modeling elevation. These include the USGS DEM, SDTS, and DTED formats. The USGS DEM (US Geological Survey Digital Elevation Model) format is popular because of its widespread availability, the simplicity of the model, and the extensive software support for the format.
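The spatial resolution of a scanned paper map such as a DRG follows directly from its scale and scanning density: one pixel covers 1/dpi inches on the map, and each map inch covers one scale denominator's worth of inches on the ground. The short sketch below assumes a 1:24,000 source quadrangle, which the text does not state explicitly for DRGs; the function name is hypothetical.

```python
METERS_PER_INCH = 0.0254  # exact by definition


def ground_resolution_m(scale_denominator, dpi):
    """Ground distance in meters covered by one pixel of a paper map scanned at `dpi`."""
    map_inches_per_pixel = 1.0 / dpi
    ground_inches_per_pixel = map_inches_per_pixel * scale_denominator
    return ground_inches_per_pixel * METERS_PER_INCH
```

At 250 dpi and 1:24,000, `ground_resolution_m(24_000, 250)` gives about 2.44 meters per pixel; doubling the scanning density halves that figure.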
Each pixel value in a DEM denotes a spot elevation on the ground, usually in feet or meters. Care must be taken when using DEMs because of the enormous volume of data that accompanies these files as the spatial extent covered by the image increases. DEMs are referred to as digital terrain models (DTMs) when they represent a simple, bare-earth model and as digital surface models (DSMs) when they include the heights of landscape features such as buildings and trees (Figure "Digital Surface Model (left) and Digital Terrain Model (right)"). USGS DEMs can be classified into one of four levels of quality (labeled 1 to 4) depending on their source data and resolution. This source data can be topographic quadrangles at several different scales. The DEM format is a single file of ASCII text composed of three data blocks: A, B, and C. The A block contains header information such as data origin, type, and measurement systems. The B block contains contiguous elevation data, with each value described as a six-character integer. The C block contains trailer information such as the root mean square (RMS) error of the scene. The DEM format has since been succeeded by SDTS (Spatial Data Transfer Standard). SDTS was developed as a distribution format for transferring data from one computer to another with zero data loss. The DTED (Digital Terrain Elevation Data) format is another elevation raster file format. It was developed for military purposes such as line-of-sight analysis, visualization, and mission planning. The DTED format maintains three levels of data over five different latitudinal zones: Level 0 data has a resolution of approximately 900 meters, Level 1 data approximately 90 meters, and Level 2 data approximately 30 meters.

Hybrid File Formats

A geodatabase is a recently developed, proprietary format that supports both vector and raster feature datasets (e.g., points, lines, annotation, JPEG, TIFF) within a single geodatabase. This format maintains topological relationships, and its on-disk storage depends on the type of geodatabase. The geodatabase was developed to be a comprehensive model for representing and modeling geospatial information. There are three different types of geodatabases. The personal geodatabase was developed for single-user editing, whereby two editors cannot work on the same geodatabase at a given time. The personal geodatabase employs the Microsoft Access file format and has a size limit of 2 gigabytes per geodatabase, although it has been noted that performance begins to degrade as file size approaches 250 megabytes. The personal geodatabase is currently being phased out by Esri and is therefore not recommended for new data creation. The file geodatabase similarly allows only single-user editing, but this restriction applies only to individual feature datasets within a geodatabase. The file geodatabase incorporates new tools such as domains (rules applied to attributes), subtypes (groups of objects within a feature class or table), and split and merge policies (rules that control the attribute values produced by split and merge operations). This format stores information as binary files with a size limit of 1 terabyte and has been noted to perform and scale much more efficiently than the personal geodatabase (requiring approximately one-third of the feature geometry storage required by shapefiles and personal geodatabases). File geodatabases are not tied to any relational database management system and can be employed on both Windows and UNIX platforms. Finally, file geodatabases can be compressed to read-only formats that further reduce file size without reducing performance. The third type is the ArcSDE geodatabase, which allows multiple editors to work simultaneously on feature datasets within a single geodatabase (i.e., multiuser editing). Like the file geodatabase, this format can be employed on both Windows and UNIX platforms. File size is limited to a few gigabytes, and its proprietary nature requires an ArcInfo or ArcEditor license for use. The ArcSDE geodatabase is implemented on the SQL Server Express software package, which is a free database platform developed by Microsoft. In addition to the geodatabase, Adobe Systems Incorporated's Geospatial PDF (Portable Document Format) is a format that allows for the representation of geometric entities such as points, lines, and polygons. Geospatial PDFs can be used to find and mark coordinate pairs, measure distances, reproject files, and georegister raster images. This format is particularly useful because PDF is widely accepted as the preferred standard for printable web documents. Although functionally similar, the Geospatial PDF should not be confused with the GeoPDF format developed by TerraGo Technologies; rather, the GeoPDF is a branded version of the Geospatial PDF. Finally, Google Earth supports a new hybrid file format referred to as KML (Keyhole Markup Language). KML files associate points, lines, images, models, and so forth with a longitude and latitude value, as well as other view information such as tilt, heading, and altitude. KMZ files, which are zipped versions of KML files, are also commonly encountered.

KEY TAKEAWAYS

Common vector file formats used in GIS applications include DXFs, DLGs, and TINs, among others.
Common raster file formats used in GIS applications include JPEGs, TIFFs, PNGs, MrSIDs, ECWs, DRGs, DEMs, and DTED files.
Common hybrid file formats used in GIS applications include geodatabases (personal, file, and ArcSDE), Geospatial PDFs, and KML files.

EXERCISES

1. If you were a city planner tasked with creating a GIS database for mapping features throughout the city, would you prefer using a geodatabase or a shapefile?

What are the advantages and disadvantages of using either of these formats?

2. Search the web and create a list of websites that contain working files for each of the raster and vector formats discussed in this section.

Data Quality

LEARNING OBJECTIVE

The objective of this section is to ascertain the different types of error inherent in geospatial datasets.

Not all data are created equally. Data quality refers to the ability of a given dataset to satisfy the objective for which it was created. With the voluminous amounts of data being created and served to the cartographic community, care must be taken by individual geographic information system (GIS) users to ensure that the data employed for their project are suitable for the task at hand. Two primary attributes characterize data quality. Accuracy describes how close a measurement is to its actual value and is often expressed as a probability (e.g., 80 percent of all points are within a stated distance of their true locations). Precision refers to the variance of a value when repeated measurements are taken. A watch may be correct to a fraction of a second (precise) but may be 30 minutes slow (not accurate). As you can see in Figure "Accuracy and Precision", the blue darts are both precise and accurate, while the red darts are precise but inaccurate.
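The watch example can be expressed numerically. In this minimal sketch (the function name and readings are hypothetical), accuracy is summarized as the bias, the mean difference between repeated readings and the true value, and precision as the sample standard deviation of those readings; other summaries, such as the mean absolute error, would be equally valid choices.

```python
import statistics


def accuracy_and_precision(readings, true_value):
    """Summarize repeated readings of one quantity as (bias, spread).

    bias:   mean reading minus the true value (accuracy; 0 means unbiased)
    spread: sample standard deviation of the readings (precision; small means precise)
    """
    if len(readings) < 2:
        raise ValueError("need at least two repeated readings")
    bias = statistics.fmean(readings) - true_value
    spread = statistics.stdev(readings)
    return bias, spread
```

Three readings of 41,400.1, 41,400.0, and 41,399.9 seconds against a true time of 43,200 seconds give a bias of about -1,800 seconds (30 minutes slow, so not accurate) and a spread of about 0.1 second (precise).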

Several types of error can arise when accuracy and precision requirements are not met during data capture and creation. Positional accuracy is the probability of a feature being within a given distance of either its true location on earth (absolute positional accuracy) or its location in relation to other mapped features (relative positional accuracy). For example, a particular mapping effort may result in 95 percent of trees being mapped to within a given number of feet of their true location (absolute) or 95 percent of trees being mapped to within a given number of feet of their location as observed on a digital orthophoto quarter quadrangle (relative). Speaking about absolute positional error does raise the question, however: what exactly is the true location of an object?

As discussed in Chapter "Map Anatomy", differing conceptions of the earth's shape have led to a plethora of projections and datums, each attempting to clarify positional errors for particular locations on the earth. To begin addressing this unanswerable question, the US National Map Accuracy Standard (NMAS) suggests that, to meet horizontal accuracy requirements, a paper map is expected to have no more than 10 percent of measurable points fall outside the accuracy values shown in Figure "Relation between Positional Error and Scale". Similarly, for vertical accuracy, no more than 10 percent of the elevations tested on a contour map shall be in error by more than one-half the contour interval. Any map that does not meet these horizontal and vertical accuracy standards will be deemed unacceptable for publication.

Figure "Relation between Positional Error and Scale": horizontal accuracy examples by map scale, for engineering-scale maps and under the National Map Accuracy Standard.
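The 10 percent rule lends itself to a simple compliance check. This sketch assumes the tolerances of the 1947 US National Map Accuracy Standards, which the text above does not spell out: 1/30 inch at publication scale for maps larger than 1:20,000, 1/50 inch for maps at 1:20,000 or smaller, and one-half the contour interval for elevations. The function names are hypothetical, and errors are assumed to be measured against a source of higher accuracy.

```python
METERS_PER_INCH = 0.0254


def nmas_horizontal_tolerance_m(scale_denominator):
    """Horizontal NMAS tolerance, converted to ground meters, for a publication scale 1:scale_denominator."""
    # 1/30 inch on the map for scales larger than 1:20,000; 1/50 inch for 1:20,000 or smaller.
    map_inches = 1 / 30 if scale_denominator < 20_000 else 1 / 50
    return map_inches * scale_denominator * METERS_PER_INCH


def passes_90_percent_rule(errors, tolerance):
    """True if no more than 10 percent of the tested errors exceed the tolerance."""
    if not errors:
        raise ValueError("no test points supplied")
    exceeding = sum(1 for err in errors if abs(err) > tolerance)
    return exceeding * 10 <= len(errors)  # integer form of exceeding / n <= 0.10


def passes_nmas_vertical(elevation_errors, contour_interval):
    """Vertical check: the tolerance is one-half the contour interval."""
    return passes_90_percent_rule(elevation_errors, contour_interval / 2)
```

At 1:24,000, the horizontal tolerance works out to about 12.2 meters (40 feet) on the ground, so a test with 20 checkpoints can tolerate at most 2 points beyond that distance.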

Positional errors arise from multiple sources. The process of digitizing paper maps commonly introduces such inaccuracies. Errors can arise while registering the map on the digitizing board. A paper map can shrink, stretch, or tear over time, changing the dimensions of the scene. Input errors created from hastily digitized points are common. Finally, converting between coordinate systems and transforming between datums may also introduce errors to the dataset. The root mean square (RMS) error is frequently used to evaluate the degree of inaccuracy in a digitized map. This statistic measures the deviation between the actual (true) and estimated (digitized) locations of the control points. Figure "Potential Digitization Error" illustrates the inaccuracies of lines representing soil types that result from input control point location errors. By calculating the RMS error for a digitized dataset, one could determine the accuracy of the digitized map and thus its suitability for inclusion in a given study. Positional errors can also arise when features to be mapped are inherently vague. Take the example of a wetland (Figure "Defining a Wetland Boundary"). What defines a wetland boundary?

Wetlands are determined by a combination of hydrologic, vegetative, and edaphic factors. Although the US Army Corps of Engineers is currently responsible for delineating wetland boundaries throughout the country, this task is not as simple as it may seem. In particular, regional differences in the characteristics of a wetland make delineating these features particularly troublesome. For example, the definition of a wetland boundary for the riverine wetlands in the eastern United States, where water is abundant, is often useless when delineating similar types of wetlands in the desert southwest United States. Indeed, the complexity and confusion associated with the conception of what a wetland is may result in the feature being delineated inconsistently in the field, which subsequently leads to positional accuracy errors in the GIS database.

In addition to positional accuracy, attribute accuracy is a common source of error in a GIS. Attribute errors can occur when an incorrect value is recorded within an attribute field or when a field is missing a value. Misspelled words and other typographical errors are common as well. Similarly, a common inaccuracy occurs when developers enter "0" in an attribute field when the value is actually "null." This is common in count data, where "0" would represent zero, while a null would represent a locale where no data collection effort was undertaken. In the case of categorical values, inaccuracies occasionally occur when attributes are mislabeled. For example, a map may list a polygon as "agricultural" when it actually belongs to a different land-use category. This is particularly likely if the dataset is out of date, which leads us to our next source of error. Temporal accuracy addresses the age or timeliness of a dataset. No dataset is ever completely current; in the time it takes to create a dataset, it has already begun to become outdated. Regardless, there are several dates to be aware of while using a dataset, and these dates should be found within the metadata. The publication date tells you when the dataset was created or released. The collection date records when the data were collected. If the dataset contains any future prediction, there should also be a forecast period or date. To address temporal accuracy, many datasets undergo a regular data update regimen. For example, the California Department of Fish and Game updates its sensitive species databases on a near monthly basis as new observations are continually being made. It is important to ensure that, as an end user, you are always using the most current data for your GIS application. The fourth type of accuracy in a GIS is logical consistency. Logical consistency requires that the data be topologically correct. For example, does a stream segment in a line layer fall within the boundaries of the corresponding polygon layer?

Do roadways connect at nodes?

Do all the connections in a network point in the correct direction?

Regarding the last question, the author was recently using an unnamed smartphone application to navigate a busy city roadway and was twice told to turn the wrong way down one-way streets. So beware: errors in logical consistency may lead to traffic violations, or worse!
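One logical-consistency test can be automated by counting how many line segments meet at each endpoint: an endpoint touched by only one segment is a dangle, which may be a legitimate dead end or an undershoot where a road failed to connect at a node. This is a simplified sketch under stated assumptions: segments are pairs of (x, y) endpoints, nodes are matched by rounding coordinates rather than by a true snapping tolerance, and the function name is hypothetical.

```python
from collections import Counter


def find_dangles(segments, ndigits=6):
    """Return endpoints touched by exactly one segment (candidate dangles), sorted.

    segments: iterable of ((x1, y1), (x2, y2)) pairs.
    Endpoints are matched after rounding to `ndigits` decimals, a crude stand-in
    for a snapping tolerance: points that straddle a rounding boundary will not match.
    """
    degree = Counter()
    for start, end in segments:
        for x, y in (start, end):
            degree[(round(x, ndigits), round(y, ndigits))] += 1
    return sorted(node for node, count in degree.items() if count == 1)
```

In a sample network of a closed triangle of roads plus a segment from (20, 0) to (10.2, 0), the extra segment stops 0.2 units short of the node at (10, 0), so both of its endpoints, (10.2, 0) and (20, 0), are flagged for review.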

The fifth type of accuracy is data completeness. Comprehensive inclusion of all features within the GIS database is required to ensure accurate mapping results. Simply put, all the data must be present for a dataset to be accurate. Are all of the counties in the state represented?

Are all of the stream segments included in the river network?

Is every convenience store listed in the database?

Are only certain types of convenience stores listed within the database?
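When an authoritative list of the features that should exist is available (for example, the official list of counties in a state), completeness questions like these reduce to set comparisons. The minimal sketch below assumes that features in both the reference list and the dataset carry comparable identifiers; the function name and IDs are hypothetical.

```python
def completeness_report(expected_ids, dataset_ids):
    """Compare dataset feature IDs against an authoritative reference list.

    Returns (missing, unexpected, ratio), where ratio is the share of expected
    features that are present (duplicates in the dataset are ignored).
    """
    expected = set(expected_ids)
    present = set(dataset_ids)
    missing = sorted(expected - present)
    unexpected = sorted(present - expected)
    ratio = len(expected & present) / len(expected) if expected else 1.0
    return missing, unexpected, ratio
```

With reference IDs C01 to C04 and a dataset containing C01, C03, C04 (twice), and X99, the report lists C02 as missing, X99 as unexpected, and a completeness ratio of 0.75.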

Indeed, incomplete data will inevitably lead to incomplete or insufficient analysis.

KEY TAKEAWAYS

All data contain error.
Accuracy represents how close a measurement is to its actual value, while precision refers to the variance of a value when repeated measurements are taken.
The five types of error in a dataset are related to positional accuracy, attribute accuracy, temporal accuracy, logical consistency, and data completeness.

EXERCISES

1. What are the five types of error associated with geographic information?

Provide an example of each type of error.
2. Per the description of the positional accuracy of wetland boundaries, discuss a map feature whose boundaries are inherently vague and difficult to map.
