Zip Code Caveat: Bias Due to Spatiotemporal Mismatches Between Zip Codes and US Census–Defined Geographic Areas—The Public Health Disparities Geocoding Project

2002 
Use of zip codes in US public health research is on the rise. As of February 2002, 230 articles were indexed by zip code in PubMed,1 all published since 1989. Fifty-two of these articles (23%) used census-derived zip code socioeconomic data (e.g., median household income) to investigate the effects of socioeconomic position on specified health outcomes (article citations are available on request from the authors). To date, discussions regarding the use of zip code socioeconomic data for US public health research have focused chiefly on whether zip codes' larger population size (average: 30 000) and potentially greater socioeconomic heterogeneity would attenuate estimates of socioeconomic gradients in health, compared with estimates obtained via census tract (average population: 4000) or block group (average population: 1000) socioeconomic data.2–7
Unacknowledged in the public health literature, however, is the fact that zip codes differ from census tracts and block groups in other important ways, including spatiotemporal definition and stability. Unlike census tracts, defined by the US Bureau of the Census as “small, relatively permanent statistical subdivision[s] of a county . . . designed to be relatively homogeneous with respect to population characteristics, economic status, and living conditions,”8(ppG-10–G-11) zip codes are “administrative units established by the United States Postal Service . . . for the most efficient delivery of mail, and therefore generally do not respect political or census statistical area boundaries.”9(pA-13) Zip codes range in size from a single building or company with a high volume of mail to large areas that cut across states, and “carrier routes for one zip code may intertwine with those of one or more zip codes” such that “this area is more conceptual than geographic.”10
To “overcome the difficulties in precisely defining the land area covered by each zip code,”11 the US Census Bureau created a new statistical entity built from census blocks: the 5-digit zip code tabulation area (ZCTA), first used in the 2000 census.12 Of note, a ZCTA and a zip code sharing the same 5-digit code do not necessarily cover the same area (Table 1),13 so zip codes obtained via self-report or from addresses in medical records cannot be assumed to correspond to census-defined ZCTAs.
TABLE 1. Technical Definitions of and Distinctions Between Zip Codes and Zip Code Tabulation Areas (ZCTAs)
Even before introduction of the ZCTAs, there were 2 types of spatiotemporal discontinuity that could conceivably affect health studies linking zip codes to census-derived data: (1) changes in zip code delivery routes, and hence in the population covered by the affected zip code, and (2) discontinuation and addition of zip codes in nondecennial years.14–16 Between 1997 and 2001 alone, the US Post Office added approximately 390 new zip codes nationwide and discontinued 120 (oral communication, Meg Ausman, US Post Office Data Center, February 5, 2002). One implication of these changes is that persons could be correctly geocoded to a zip code that did not exist in the preceding decennial census.
Findings from the Public Health Disparities Geocoding Project17 illustrate the potential problems for health research posed by spatiotemporal zip code–census mismatches, even those dating from before the creation of ZCTAs.
This project was designed to assess which area-based socioeconomic measures at which levels of geography (census tract, block group, and zip code) are most appropriate for monitoring socioeconomic inequalities in health. Health data from 2 states (Massachusetts and Rhode Island) and the 1990 census were used. Records were geocoded in 1999 by a firm whose accuracy we ascertained to be high (96%),18 and the firm, following standard practice, returned the most recent geocodes available. Cancer incidence rates were one of the health outcomes addressed.
We found that in Massachusetts (474 zip codes listed in the 1990 census), 17 376 (10.4%) of the 166 730 cancer cases occurring during 1987 to 1993 were geocoded to 193 zip codes not included in the 1990 census; 15 774 (90.8%) of these 17 376 cases were in one of 30 zip codes changed or established after the 1990 census.19–21 By contrast, in Rhode Island (70 zip codes listed in the 1990 census), only 148 (0.7%) of the 19 766 geocoded cancer incidence records were matched to zip codes not included in the 1990 census. In the case of colon cancer incidence in Massachusetts, moreover, the impact of excluding persons linked to zip codes not included in the 1990 census was substantial. Zip code–level analyses yielded socioeconomic gradients contrary to those observed via data at the tract and block group levels and contrary to those reported in the literature (Tables 2–4).22
TABLE 2. Incident Colon Cancer Counts by Geographic Level: Massachusetts, 1987–1993
TABLE 3. Colon Cancer Incidence Rates, Stratified by Area-Based Socioeconomic Measures, Among Persons in Areas With the Least and Most Resources, Along With Age-Adjusted Comparisons (Incidence Rate Ratio and Relative Index of Inequality): Massachusetts, ...
TABLE 4. Area-Based Socioeconomic Measures and Cutpoints Used in Data Analysis
Given the growing interest in linking geographic and health data,23,24 we urge researchers, when using geocoded records, to pay careful attention to the potential for spatiotemporal mismatches between census-derived and zip code data as well as to changes in zip code boundaries and years in which boundaries were established. Public health projects and programs that use zip code data should likewise be alert to potential new issues stemming from the replacement of zip codes with ZCTAs in the 2000 census.
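One way to act on this caution is to check, before any linkage to census socioeconomic data, how many geocoded records carry a zip code that is absent from the census file being used. The Python sketch below is illustrative only and is not the procedure used in the Public Health Disparities Geocoding Project; the function name `flag_unmatched_zips` and the placeholder zip codes are hypothetical.

```python
from collections import Counter


def flag_unmatched_zips(record_zips, census_zips):
    """Count records whose zip code is absent from the census zip code list.

    Returns a Counter of unmatched zip codes (zip -> number of records)
    and the share of all records that are unmatched.
    """
    records = list(record_zips)
    census = set(census_zips)
    unmatched = Counter(z for z in records if z not in census)
    share = sum(unmatched.values()) / len(records) if records else 0.0
    return unmatched, share


# Hypothetical placeholder values, not data from the study.
census_zip_list = ["00001", "00002", "00003"]
geocoded_record_zips = ["00001", "00001", "00002", "00009", "00009", "00010"]

unmatched, share = flag_unmatched_zips(geocoded_record_zips, census_zip_list)
print(dict(unmatched))
print(f"{share:.1%} of records fall in zip codes missing from the census list")
```

A large unmatched share, such as the 10.4% of Massachusetts cases reported above, would suggest examining when the affected zip codes were established or changed before relying on zip code–level estimates.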