Commuting Zones and Labor Market Areas - Documentation

Note: An update to the Preliminary 2020 Commuting Zones data product is underway for release in fall 2025. Anticipated refinements to the methodology mean that these delineations are expected to change with the release of the final version.

Commuting Zones (CZs) are geographic units of analysis that are intended to reflect the local economy where people live and work. Commuting Zones allow users to aggregate county-level data to geographic units that more closely reflect functional labor markets. The information on this page documents the creation and use of CZs in several sections:

Background
Scope/Coverage
Methodology
Strengths and Limitations
Data Source
Recommended Citation
References

Background

Commuting Zones (CZs) and Labor Market Areas (LMAs) were developed by the USDA, Economic Research Service (ERS) in the 1980s to facilitate research on how labor markets influence the socioeconomic well-being of workers residing in those markets (Tolbert & Sizer Killian, 1987; Tolbert & Sizer, 1996). A local economy and its labor market are not bounded by the nearest county line but by interrelationships between buyers and sellers of labor. Therefore, geographic areas that captured these cross-county interrelationships were needed to understand the variation in socioeconomic conditions across nonmetropolitan America. Existing regional delineation systems, such as the U.S. Department of Commerce, Bureau of the Census’s (Census Bureau) metropolitan statistical areas and the U.S. Department of Commerce, Bureau of Economic Analysis’s (BEA) Economic Areas, did not adequately capture these interrelationships in rural counties.

The 1980 and 1990 delineations include two geographic levels: commuting zones (CZs) and labor market areas (LMAs). The smaller geographic areas, CZs, are clusters of counties based on intercounty commuting patterns. The larger geographic areas, LMAs, are neighboring commuting zones grouped using commuting patterns to contain at least 100,000 residents. This minimum population level was necessary to acquire specially tabulated public-use microdata samples from the 1980 and 1990 Decennial Censuses that identify the labor markets in which individuals work.

The process used to create the 1980 and 1990 CZs and LMAs included two steps. Step 1 created preliminary delineations using hierarchical cluster analysis based on commuting flows between counties. Step 2 utilized expert opinion to revise the preliminary delineations by placing counties into commuting zones that experts deemed more reasonable. This process resulted in 765 CZs and 382 LMAs in the 1980 delineation, while the 1990 delineation contains 741 CZs and 394 LMAs. The 2000 CZ methodology used only step 1—the hierarchical cluster analysis—of the previous methodology. Furthermore, LMAs were discontinued due to limited usage. The 2000 delineation contains 709 CZs.

ERS did not publish CZs for 2010, but Pennsylvania State University (Penn State) researchers Christopher S. Fowler, Danielle C. Rhubart, and Leif Jensen evaluated the ERS Commuting Zone methodology and created new CZs for 2010, as well as revised CZs for 1990 and 2000, using a consistent methodology (Fowler et al., 2016). This revised methodology only uses hierarchical cluster analysis to identify CZs and includes changes to how the data are processed and to specifications used in the cluster analysis. Fowler later created 2020 Commuting Zones using the same methodology (Fowler, 2024). These commuting zone delineations (as well as source code, research, and documentation) are available on Fowler's website Labor-sheds for Regional Analysis.

The Preliminary 2020 ERS Commuting Zones are based on the consistent methodology developed by Fowler, Rhubart, and Jensen, but they are not an exact match: the Penn State Commuting Zones incorporate a minor change to the underlying data (see Reproduction Script) that the ERS Preliminary 2020 Commuting Zones do not.

Scope/Coverage

The Preliminary 2020 Commuting Zones (CZs) data product places the 3,222 counties and census-designated county equivalents in the United States and Puerto Rico into 592 CZs. The Island Areas of American Samoa, Guam, the Northern Mariana Islands, and the U.S. Virgin Islands are not included because county-to-county commuting data are not available for these areas.

The 1980, 1990, and 2000 CZ and LMA delineations are available for counties in the 50 U.S. States and Washington, DC. For the 1980 and 1990 delineations, a CZ code is provided for each county indicating its commuting zone. Importantly, there is not a separate LMA code in these releases; each county’s LMA is indicated by the first three digits of the CZ code.
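Because the LMA is encoded in the CZ code, it can be read off mechanically. The sketch below uses a hypothetical helper, `lma_from_cz`, and assumes the CZ codes have been loaded as text so that any leading zeros are preserved. The example codes are made up and are not taken from the published files.

```python
def lma_from_cz(cz_code: str) -> str:
    """Return the LMA identifier implied by a 1980 or 1990 CZ code.

    Hypothetical helper: assumes the CZ code is a string of digits (read as
    text so leading zeros survive) whose first three digits identify the LMA.
    """
    if not cz_code.isdigit() or len(cz_code) < 3:
        raise ValueError(f"unexpected CZ code: {cz_code!r}")
    return cz_code[:3]


# Made-up codes: two CZs that share their first three digits share an LMA.
print(lma_from_cz("12301"), lma_from_cz("12302"))  # 123 123
```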

A delineation of 2010 CZs is not currently available from the USDA, ERS. Fowler, Rhubart, and Jensen from Penn State published a 2010 CZ delineation, as well as revised CZs for 1990 and 2000, that all follow a consistent methodology. Fowler later created a 2020 CZ delineation using the same methodology. These delineations (as well as source code, research, and documentation) are available on the Labor-sheds for Regional Analysis page.

Methodology

The Preliminary 2020 Commuting Zones (CZs) data product is based on Fowler et al.'s (2016) revised version of ERS's original methodology. The resulting delineation is nearly identical to the 2020 Commuting Zones created by Fowler (2024).

The 2020 Commuting Zones delineation uses commuting flows data (see the Data Source section) to represent the labor market connections among counties. For every pair of counties and census-designated county equivalents, the commuting flows data estimate how many workers reside in one county (County A) and commute to the other (County B) for work. CZs are created by grouping counties together based on the strength of the commuter flows between them.

A proportional flow metric is used to represent the strength of the commuter flows between counties. The metric is defined as the sum of the number of people commuting from County A to County B and the number of people commuting from County B to County A, divided by the smaller of County A’s and County B’s work forces. Defined mathematically:

$$PF_{ab} = \frac{C_{ab} + C_{ba}}{\min(W_a, W_b)}$$

where PF_ab is the proportional flow between County A and County B, C_ab represents the number of commuters residing in County A and working in County B, C_ba represents the number of commuters residing in County B and working in County A, W_a represents the total work force residing in County A, and W_b represents the total work force residing in County B.
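The proportional flow for every county pair can be computed at once from a matrix of commuting flows. This is a minimal sketch assuming NumPy, a square matrix `C` whose entry `C[a, b]` counts commuters living in county a and working in county b, and a vector `W` of resident work forces. The function name and all figures are hypothetical, and the diagonal (a county paired with itself) is simply zeroed here because it is not used.

```python
import numpy as np


def proportional_flow(C, W):
    """Proportional flow between every pair of counties.

    C[a, b] is the number of commuters residing in county a and working in
    county b; W[a] is the total work force residing in county a.
    Returns PF with PF[a, b] = (C[a, b] + C[b, a]) / min(W[a], W[b]).
    The diagonal is not meaningful and is set to 0 in this sketch.
    """
    C = np.asarray(C, dtype=float)
    W = np.asarray(W, dtype=float)
    two_way = C + C.T                   # commuters in both directions
    smaller = np.minimum.outer(W, W)    # smaller of the two work forces
    PF = two_way / smaller
    np.fill_diagonal(PF, 0.0)
    return PF


# Three hypothetical counties; all figures are made up.
W = np.array([1_000, 50_000, 2_000])
C = np.array([
    [0, 300, 20],
    [50, 0, 100],
    [10, 40, 0],
])
PF = proportional_flow(C, W)
print(PF[0, 1])  # 0.35
```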

The proportional flow metric is calculated for every pair of counties in the United States and Puerto Rico. County pairs with a large number of people commuting between them relative to the size of the smaller county are considered to have a stronger commuting connection. Conversely, county pairs with fewer people commuting between them relative to the size of the smaller county are considered to have a weaker commuting connection. Using the smaller of the two work forces as the denominator prioritizes grouping counties with small work forces with larger counties: a flow that is minor relative to a large county's work force can still be a substantial share of a small county's work force.
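A worked example with made-up numbers shows why the smaller work force is used as the denominator. Suppose County A has a work force of 1,000, County B has a work force of 50,000, 300 people commute from A to B, and 50 commute from B to A. Writing PF_ab for the proportional flow and d_ab = 1 - PF_ab for the corresponding distance:

```latex
PF_{ab} = \frac{300 + 50}{\min(1000,\ 50000)} = \frac{350}{1000} = 0.35,
\qquad d_{ab} = 1 - 0.35 = 0.65

\text{If the larger work force were used instead:}\quad
\frac{350}{50000} = 0.007, \qquad d_{ab} = 1 - 0.007 = 0.993
```

With the 0.977 cutoff used in this delineation, the first calculation marks a strong connection, while the second would fall above the cutoff and miss the small county's heavy dependence on its larger neighbor.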

The proportional flow between each pair of counties is used to calculate the distance metric (this does not refer to the geographic distance between counties, e.g., miles) in an agglomerative hierarchical cluster analysis. Agglomerative hierarchical cluster analysis is an iterative process that starts with individual counties and gradually groups them into successively larger clusters, using the distance metric between units to decide which units to combine. In our analysis, the distance metric is calculated as 1 minus the proportional flow. Its maximum value is 1, which indicates no commuting connection between two counties, and a value of 0.001 represents the strongest possible commuting connection. See Fowler (2024) for more details on preparing the data for hierarchical cluster analysis.
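Converting proportional flows into the distances used by the cluster analysis is a one-line transformation, plus reshaping into the condensed (upper-triangle) form that SciPy's clustering routines expect. This is a sketch under stated assumptions: the function name is hypothetical, and clipping distances to the range 0 to 1 is this sketch's own simplification, not the published data preparation described in Fowler (2024).

```python
import numpy as np
from scipy.spatial.distance import squareform


def flow_distance(PF):
    """Turn a symmetric proportional-flow matrix into a condensed distance vector.

    distance = 1 - proportional flow. Clipping to [0, 1] is an assumption of
    this sketch, not the published data preparation (see Fowler, 2024).
    """
    D = np.clip(1.0 - np.asarray(PF, dtype=float), 0.0, 1.0)
    np.fill_diagonal(D, 0.0)              # each county is at distance 0 from itself
    return squareform(D, checks=True)     # upper triangle as a 1-D vector


# Hypothetical proportional flows for three counties.
PF = np.array([
    [0.00, 0.35, 0.03],
    [0.35, 0.00, 0.07],
    [0.03, 0.07, 0.00],
])
print(flow_distance(PF))  # pairs (0,1), (0,2), (1,2): about 0.65, 0.97, 0.93
```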

The cluster analysis starts by grouping the two counties in the United States and Puerto Rico with the smallest distance (strongest proportional commuting flow) between them to create the first commuting zone. Once two counties have been grouped into a commuting zone, they act as a single unit for the rest of the clustering process. Next, the distances between this commuting zone and each of the remaining 3,220 counties, and among those counties themselves, are evaluated, and the two units with the smallest average distance are combined.

When counties are grouped into commuting zones, the average linkage method is used to calculate the distance between grouped units. This method calculates the average distance of each county in one grouped unit to each county in the other grouped unit. For example, the distance between County C and CZ 1 (composed of County A and County B) is defined as the average of the distance between County C and County A and the distance between County C and County B. If this average distance is less than 0.977, County C is considered to have a strong enough connection to be added to CZ 1, and the two are combined at the step where their distance is the smallest among all remaining pairs of units.
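The merge-the-closest-pair loop with average linkage and a stopping distance can be written out directly. This is a brute-force illustration, not the ERS or Penn State code: the function names and the four-county distance matrix are hypothetical, and the loop is far too slow for thousands of counties.

```python
import numpy as np


def average_distance(D, group_a, group_b):
    """Average linkage: mean distance over every county pair across two groups."""
    return float(np.mean([D[i, j] for i in group_a for j in group_b]))


def cluster_counties(D, threshold=0.977):
    """Naive agglomerative clustering with average linkage.

    D is a full, symmetric county-by-county distance matrix (1 - proportional
    flow). The two closest groups are merged repeatedly, and merging stops
    once the closest remaining pair is not below `threshold`.
    """
    D = np.asarray(D, dtype=float)
    groups = [[i] for i in range(D.shape[0])]       # every county starts alone
    while len(groups) > 1:
        best_d, best_x, best_y = None, None, None
        for x in range(len(groups)):
            for y in range(x + 1, len(groups)):
                d = average_distance(D, groups[x], groups[y])
                if best_d is None or d < best_d:
                    best_d, best_x, best_y = d, x, y
        if best_d >= threshold:                     # link too weak: stop
            break
        groups[best_x] = groups[best_x] + groups[best_y]   # merged unit
        del groups[best_y]
    return groups


# Hypothetical distances for four counties (near 0 = strong, 1 = no commuting).
D = np.array([
    [0.000, 0.100, 0.990, 0.995],
    [0.100, 0.000, 0.980, 0.990],
    [0.990, 0.980, 0.000, 0.500],
    [0.995, 0.990, 0.500, 0.000],
])
print(cluster_counties(D))  # [[0, 1], [2, 3]]
```

In the County C example, if County C's distance were 0.96 to County A and 0.99 to County B (hypothetical values), `average_distance` would return 0.975, just under the 0.977 cutoff.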

This process can continue grouping units—counties and/or commuting zones—together until one large cluster is created. However, we want to identify functional labor markets with strong commuting ties so that researchers can use labor markets to analyze economic and social characteristics throughout the Nation. Therefore, we end the clustering process when the connection between two units is too weak for the units to be considered part of the same labor market (i.e., the distance among units is too large for any more combinations to occur).

Following the work of Fowler et al. (2016), we set 0.977 as the distance at which two units are no longer considered to be part of the same labor market. This distance was determined by Fowler et al. (2016) to be the point at which their hierarchical cluster analysis on 1990 counties was the most similar to USDA, ERS’s published 1990 delineation of CZs.
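The same average-linkage clustering with a 0.977 stopping distance can be run with SciPy's hierarchical clustering routines. This is a minimal sketch, not the ERS or Penn State production code; the function name and the four-county distance matrix are hypothetical, and the input is assumed to already be 1 minus the proportional flow.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def commuting_zone_labels(D, threshold=0.977):
    """Label counties by average-linkage clusters cut at a distance threshold.

    D is a full, symmetric distance matrix (1 - proportional flow) with zeros
    on the diagonal. Counties that share a label share a zone.
    """
    condensed = squareform(np.asarray(D, dtype=float), checks=True)
    Z = linkage(condensed, method="average")   # "average" = average linkage (UPGMA)
    return fcluster(Z, t=threshold, criterion="distance")


# Hypothetical distances for four counties.
D = np.array([
    [0.000, 0.100, 0.990, 0.995],
    [0.100, 0.000, 0.980, 0.990],
    [0.990, 0.980, 0.000, 0.500],
    [0.995, 0.990, 0.500, 0.000],
])
labels = commuting_zone_labels(D)
print(labels)  # counties 0 and 1 share one label; counties 2 and 3 share another
```

One design detail: `fcluster` with `criterion="distance"` keeps observations together when their cophenetic distance is no greater than `t`, so a merge at exactly 0.977 would be kept, whereas the text describes the cutoff as strictly "less than". With real-valued commuting data, an exact tie is unlikely to matter.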

The Preliminary 2020 Commuting Zones are the raw output of this hierarchical cluster analysis. This methodology, and that adopted by Fowler et al. (2016) and Fowler (2024), does not include a second step that was used in the vintage 1980 and 1990 CZs. In this second step, an ERS researcher examined the results of the hierarchical cluster analysis and reassigned any counties that they perceived were placed poorly by the hierarchical cluster analysis. While this use of expert judgement is generally considered to have improved ERS’s previous CZ delineations, there is no documented methodology for how these decisions were made. Therefore, this second step of the original methodology is not replicable. The Final 2020 Commuting Zones will incorporate additional refinements that improve the delineation of labor markets among counties to form the CZs. The methods will be transparently documented so that everything is completely replicable.

Strengths and Limitations

One of the greatest strengths of the Preliminary 2020 Commuting Zones (CZs) is that they are designed to identify connections between small, rural counties and more populous, urban counties. The CZs are intended to identify the counties to which rural residents are most likely to commute to work, shop, recreate, or consume other goods and services. They create functional units of observation rather than units defined by arbitrary political boundaries, such as counties. Additionally, CZs can be used with a variety of Federal and other data sources by aggregating widely available county-level data to the CZ unit of analysis.
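Aggregating county-level data to CZs amounts to joining a county-to-CZ crosswalk onto the county data and summing counts within each CZ. The sketch below assumes pandas; the column names (`county_id`, `cz`), the function name `aggregate_to_cz`, and all values are hypothetical.

```python
import pandas as pd


def aggregate_to_cz(county_data, crosswalk, count_cols):
    """Sum county-level counts within each CZ.

    county_data and crosswalk both carry a hypothetical 'county_id' column;
    the crosswalk maps each county to a hypothetical 'cz' label.
    """
    merged = county_data.merge(crosswalk, on="county_id", how="left",
                               validate="one_to_one")
    if merged["cz"].isna().any():
        raise ValueError("some counties have no CZ assignment")
    return merged.groupby("cz")[count_cols].sum()


# Made-up example: four counties in two CZs.
crosswalk = pd.DataFrame({"county_id": ["c1", "c2", "c3", "c4"],
                          "cz": [1, 1, 2, 2]})
county_data = pd.DataFrame({"county_id": ["c1", "c2", "c3", "c4"],
                            "labor_force": [1000, 4000, 500, 1500],
                            "unemployed": [50, 120, 40, 60]})

cz = aggregate_to_cz(county_data, crosswalk, ["labor_force", "unemployed"])
# Recompute rates from the summed counts instead of averaging county rates.
cz["unemployment_rate"] = cz["unemployed"] / cz["labor_force"]
print(cz)
```

Rates and other ratios should be recomputed from the summed counts: here CZ 1's unemployment rate is 170 / 5,000 = 0.034, whereas a simple average of its two county rates (0.05 and 0.03) would give 0.04.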

Another advantage of CZs is that they include all the territory within the United States and Puerto Rico. By contrast, the U.S. Office of Management and Budget’s (OMB) widely used Core Based Statistical Areas (metropolitan and micropolitan statistical areas) only include counties with strong commuting connections to urban cores with populations of 10,000 or more people. All other counties are considered part of one large noncore area. As a result, a significant number of U.S. counties are excluded from the OMB’s labor markets, whereas every county is placed in a labor market in the CZs data product.

While CZs are intended to represent real-world labor markets, previous research on commuting zones does not specifically define the term “labor market”. This lack of a clear definition makes it difficult to evaluate whether CZs successfully represent the concept. Furthermore, the composition of labor markets varies across the United States. Residents in the more densely populated eastern United States may live close enough to several relatively large cities, which act as employment centers, to take jobs in any of them. Residents in the more sparsely populated western United States are more likely to live long distances from any large employment center. The CZs data product does not identify counties that may belong to multiple labor markets or counties that are only weakly connected to a labor market.

Another limitation of the Preliminary 2020 CZs is that, while they generally look reasonable, careful examination reveals that the hierarchical cluster analysis placed several counties into CZs with which they are unlikely to have any significant real-world commuting connection. The most obvious examples of these poorly defined units are noncontiguous CZs in which two portions of the same unit are separated by multiple other CZs, hundreds of miles, many hours of drive time, or, in one instance, the international border with Canada. Key examples of these poorly defined CZs in the preliminary 2020 data include:

CZ 17: Places Lincoln County, MT in a commuting zone with three Alaskan Census Areas and Boroughs.
CZ 59: Places San Diego County, CA in a commuting zone with three counties near Lake Tahoe, an estimated 9-hour drive.
CZ 375: Places Kaufman County, TX and Van Zandt County, TX (two counties on the southeastern edge of the Dallas metropolitan area) in a commuting zone associated with Midland, TX, an estimated 6-hour drive to the west that bypasses Dallas.

While it is conceivable that noncontiguous counties could be part of the same labor market, these examples show that the hierarchical cluster analysis assigns some counties to CZs with which they likely have very weak connections and which do not reflect real-world labor markets. These weak connections occur in both contiguous and noncontiguous CZs.
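Noncontiguous CZs like these can be flagged automatically by splitting each CZ's counties into groups that are connected through shared borders. This is a sketch under stated assumptions: the function name is hypothetical, and it assumes an adjacency mapping of bordering counties is available (for example, one built from a county adjacency file); the example geography is made up.

```python
from collections import deque


def contiguous_parts(members, adjacency):
    """Split one CZ's counties into geographically connected pieces.

    members: county IDs assigned to the CZ.
    adjacency: dict mapping each county ID to the set of county IDs that share
        a border with it (hypothetical input).
    A result with more than one piece means the CZ is noncontiguous.
    """
    remaining = set(members)
    parts = []
    while remaining:
        start = remaining.pop()
        part, queue = {start}, deque([start])
        while queue:                                   # breadth-first search
            county = queue.popleft()
            for neighbor in adjacency.get(county, set()):
                if neighbor in remaining:              # same CZ, not yet visited
                    remaining.remove(neighbor)
                    part.add(neighbor)
                    queue.append(neighbor)
        parts.append(part)
    return parts


# Made-up geography: a-b-c-d form a chain of neighboring counties.
adjacency = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(len(contiguous_parts({"a", "b", "d"}, adjacency)))  # 2: {a, b} and {d}
print(len(contiguous_parts({"a", "b", "c"}, adjacency)))  # 1
```

As the text notes, a noncontiguous result is a signal for review rather than proof of a poor assignment, and weak connections can also occur within contiguous CZs.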

There are several limitations associated with the original CZs methodology that were kept for consistency over time. For example, some methodological decisions had no documented justification other than that the decisions seemed to be reasonable to the researchers at the time. The most significant of these decisions was selecting 0.98 as the maximum average distance between units, at which point the hierarchical cluster analysis stopped grouping counties and commuting zones together. Choosing a higher threshold would have resulted in fewer, larger Commuting Zones and choosing a lower threshold would have resulted in more, smaller Commuting Zones.
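The effect of the threshold can be seen by cutting one average-linkage tree at several distances and counting the resulting zones. This toy example assumes SciPy; the function name and the four-county distance matrix are hypothetical, and the counts say nothing about the real 2020 delineation.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def zone_counts(D, thresholds):
    """Number of clusters obtained when the same average-linkage tree is cut
    at each distance threshold."""
    Z = linkage(squareform(np.asarray(D, dtype=float), checks=True),
                method="average")
    return {t: len(np.unique(fcluster(Z, t=t, criterion="distance")))
            for t in thresholds}


# Hypothetical distances for four counties.
D = np.array([
    [0.000, 0.100, 0.990, 0.995],
    [0.100, 0.000, 0.980, 0.990],
    [0.990, 0.980, 0.000, 0.500],
    [0.995, 0.990, 0.500, 0.000],
])
print(zone_counts(D, [0.05, 0.2, 0.977, 0.99]))
# {0.05: 4, 0.2: 3, 0.977: 2, 0.99: 1}
```

Raising the threshold can only merge more units, so the number of zones never increases as the threshold rises.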

The selection of a seemingly arbitrary distance threshold may stem from the fact that the specific labor market concept that CZs were intended to capture was never explicitly defined (see Fowler and Jensen (2020) for more information on defining labor markets). Had the concept been explicitly defined using measurable characteristics, it may have been possible to select a distance threshold—based on specific metrics—that optimized the CZs’ ability to represent the labor market concept. Instead, the distance threshold used in the 2010 and 2020 Commuting Zones (0.977) was selected to optimize the similarity between Fowler, Rhubart, and Jensen's replication of the 1990 CZs and the original CZs published by ERS (Fowler et al., 2016).

Another methodological limitation of the 1980 and 1990 CZ delineations is that they are not replicable. Their creation included a step in which an ERS researcher evaluated the results of the hierarchical cluster analysis and used expert judgement to reassign counties to CZs that the researcher deemed more appropriate. Given the limitations of the hierarchical cluster analysis described above, this step likely improved the final CZ data product by moving counties that were very weakly connected to the commuting zones assigned by the cluster analysis to other commuting zones with which the counties had stronger connections. While this step likely improved the CZs delineation, these decisions were made solely on the basis of expert knowledge, without documenting rules of thumb or guidelines followed to make these decisions. A replicable process is needed to meet current statistical standards and be transparent.

The Preliminary 2020 CZs delineation is purely the result of the hierarchical cluster analysis, making no post-clustering changes. A documented and replicable post-cluster analysis evaluation of the CZs would likely improve the final results by providing a quality-control step with the same objective as the expert judgement employed in earlier versions. Post-clustering evaluation of CZs and associated changes will be added to the final version of the 2020 CZs to mitigate this limitation as much as possible.

Finally, there is also a limitation associated with the data used to create the 2010 and 2020 CZ delineations. The delineations use county-to-county commuting flows data from the Census Bureau's American Community Survey (ACS) 5-year estimates. The ACS is based on a relatively small sample of the U.S. population, particularly in counties with small populations. This means that ACS commuting flow estimates may not closely reflect real-world commuting flows in some counties due to sampling error. While data reliability is a concern, Fowler and Cromartie (2023) found that sampling error had little effect on the classification of the vast majority of counties and census tracts in other classification schemes.
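One way to gauge how much sampling error might affect an individual flow is to convert its margin of error into a standard error and a coefficient of variation. The sketch below assumes the flow estimates come with margins of error at the 90 percent confidence level used for published ACS estimates. The function name and example numbers are hypothetical, and the sketch does not reproduce the analysis in Fowler and Cromartie (2023).

```python
def reliability(estimate, moe90):
    """Standard error and coefficient of variation for an ACS estimate.

    Assumes the margin of error is at the 90 percent confidence level,
    so SE = MOE / 1.645 and CV = SE / estimate.
    """
    if estimate <= 0:
        raise ValueError("the CV is undefined for a non-positive estimate")
    se = moe90 / 1.645
    return se, se / estimate


# Made-up flow: 100 commuters with a margin of error of +/- 82.
se, cv = reliability(100, 82)
print(f"SE = {se:.1f}, CV = {cv:.2f}")  # SE = 49.8, CV = 0.50
```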

Data Source

U.S. Department of Commerce, Bureau of the Census, 2016–2020 5-Year American Community Survey commuting flows. [Accessed June 2, 2025.]

Recommended Citation

U.S. Department of Agriculture, Economic Research Service. Preliminary 2020 Commuting Zones, August 2025.

References

Fowler, C.S. (2024). New Commuting Zone delineation for the U.S. based on 2020 data. Scientific Data, 11, 975. https://doi.org/10.1038/s41597-024-03829-5

Fowler, C.S. & Cromartie, J. (2023). The role of data sample uncertainty in delineations of core based statistical areas and Rural Urban Commuting Areas. Spatial Demography, 11(2), 6. https://doi.org/10.1007/s40980-023-00118-4

Fowler, C.S. & Jensen, L. (2020). Bridging the gap between geographic concept and the data we have: The case of labor markets in the USA. Environment and Planning A: Economy and Space, 52(7), 1395–1414. https://doi.org/10.1177/0308518X20906154

Fowler, C.S., Rhubart, D.C., & Jensen, L. (2016). Reassessing and revising commuting zones for 2010: History, assessment, and updates for U.S. ‘labor-sheds’ 1990–2010. Population Research and Policy Review, 35(2), 263–286. https://doi.org/10.1007/s11113-016-9386-0

Tolbert, C.M. & Sizer Killian, M. (1987). Labor market areas for the United States (Report No. AGES870721). U.S. Department of Agriculture, Economic Research Service. https://doi.org/10.22004/ag.econ.277959

Tolbert, C.M. & Sizer, M. (1996). U.S. Commuting Zones and Labor Market Areas: A 1990 update (Staff Paper No. AGES-9614). U.S. Department of Agriculture, Economic Research Service. https://doi.org/10.22004/ag.econ.278812