__________________________________________________

A CENTURY OF SPRAWL IN THE UNITED STATES:
GEOGRAPHIC TABULAR DATA

Chris Barrington-Leigh and Adam Millard-Ball
__________________________________________________


Table of Contents
_________________

1 FILE FORMAT
2 GEOGRAPHIC UNITS AND TIME SERIES
3 FIELDS/COLUMNS
4 TIPS AND CAVEATS
5 CITATION
6 CONTACT


This file describes the geographic tables published, along with network
graphs described separately, as part of the 2015 PNAS article
DOI: 10.1073/pnas.1504033112. The full citation is below.

The dataset released with this article is archived independently as
[DOI:10.5061/dryad.3k502] ([http://dx.doi.org/10.5061/dryad.3k502]).


1 FILE FORMAT
=============

Three formats are provided:
- .pandas files are for use with the Python pandas data analysis library
  ([http://pandas.pydata.org]). They can be loaded with the
  read_pickle() function from within pandas.
- .tsv files are tab-separated and can be used with any spreadsheet or
  statistical analysis package. Line endings are the simple "\n" POSIX
  type.
- .shp files are for use with Geographical Information System software,
  e.g. QGIS or ArcGIS. The geographic boundary files are from the US
  Census Bureau
  ([https://www.census.gov/geo/maps-data/data/tiger-line.html]), 2014
  vintage.

The first two are compressed into one ZIP file. The .shp files are
compressed into another ZIP file.


2 GEOGRAPHIC UNITS AND TIME SERIES
==================================

The following files provide the cross-sectional estimates (i.e., the
stock of nodes):
- all .shp files
- Barrington-Leigh-Millard-Ball-PNAS2015-Century-of-sprawl-latest-stock-county.pandas
- Barrington-Leigh-Millard-Ball-PNAS2015-Century-of-sprawl-latest-stock-county.tsv

The remaining .pandas and .tsv files provide the full time series (in
long format) for different geographic aggregations, as follows:
- US counties
- US metropolitan regions (CSAs and CBSAs).
  CSAs are used where they are defined; CBSAs are used for metropolitan
  regions outside a CSA.
- US states
- The national level (US)

The FIPS code lookups are available here (we use the 2013 delineations):
[http://www.census.gov/geo/reference/ansi.html]


3 FIELDS/COLUMNS
================

Please refer to the Supporting Information for details of sources of the
underlying data and the process for constructing the dataset.

The data consist of three measures of street-network sprawl, plus a node
count:
- degree (mean nodal degree)
- deadend (fraction of dead-end nodes)
- fourway (fraction of nodes with degree 4 or more)
- N (number of nodes)

Standard errors of the means are also provided (prefixed se_). They
assume unbiased measurement and sampling, and are simply the sample
standard deviation divided by the square root of the number of nodes.

Suffixes denote the series:
- _parcel denotes the parcel-based series (the suffix is abbreviated to
  Pcl in the .shp files)
- _TIGERcensus denotes the census-based series (abbreviated to Cen)
- _TIGER denotes the TIGER/Line series (abbreviated to TGR)

Notes:
- The parcel-based series has partial coverage of the US. See the
  Supporting Information.
- The year-by-year parcel-based and census-based series describe the
  characteristics of newly constructed nodes in a given year.
- The TIGER/Line series describes the characteristics of the stock of
  nodes in a given year.
- The allyears files and the shape files describe the characteristics of
  the stock of nodes in 2012 (parcel-based) or 2013 (TIGER/Line),
  regardless of the series.
- The latest stock estimates differ between the parcel-based and
  TIGER/Line series for reasons discussed in Section 1.1 of the SI. In
  brief, the parcel-based series excludes some service or other access
  roads without associated buildings. Moreover, available parcel data
  cover only a subset (in some cases none) of each state or metropolitan
  area.
- For more information about the differences between the series, refer
  to the Supporting Information.
- Note that the data are limited to urbanized areas, defined as block
  groups where the majority of blocks were classified as urban in the
  2010 Census.


4 TIPS AND CAVEATS
==================

- See the Supporting Information for strong caveats on the use of TIGER
  and TIGERcensus data.
- Ignore low-count cells (see columns N_*).
- Do not use years before 1920 from any dataset. They are provided for
  completeness, but county parcel records are less reliable prior to
  1920. We recommend collapsing these years to a "pre-1920" value.
- Use rolling means to smooth years before 1930, and smooth over longer
  periods for smaller units of analysis.


5 CITATION
==========

Barrington-Leigh, Christopher and Millard-Ball, Adam (2015), "A Century
of Sprawl in the United States." Proceedings of the National Academy of
Sciences, DOI: 10.1073/pnas.1504033112


6 CONTACT
=========

For further questions, please contact:
- Chris Barrington-Leigh, McGill University:
  Chris.Barrington-Leigh@McGill.ca
- Adam Millard-Ball, University of California, Santa Cruz:
  adammb@ucsc.edu
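
As an illustration of the tips in Section 4, the following is a minimal
pandas sketch, not the authors' code. It masks low-count cells, collapses
years before 1920 into one value, and smooths the remaining years with a
rolling mean. Several things here are assumptions: the column names
(degree_parcel and N_parcel are built from the naming scheme in Section 3,
and "year" is a guess), the min_nodes threshold, the window length, and the
use of a node-weighted mean for the pre-1920 value. The demo frame is
synthetic; with the real data, the frame would instead come from
pandas.read_pickle() or pandas.read_csv(..., sep="\t").

```python
import numpy as np
import pandas as pd


def tidy_series(df, value="degree_parcel", count="N_parcel",
                year="year", min_nodes=30, window=5):
    """Apply the Section 4 tips to one geographic unit's long-format rows.

    All column names and both thresholds are illustrative assumptions.
    """
    out = df[[year, value, count]].copy()

    # Ignore low-count cells: blank out values backed by few nodes.
    out.loc[out[count] < min_nodes, value] = np.nan

    # Collapse years before 1920 into a single "pre-1920" value, here a
    # node-weighted mean of the retained yearly means (an assumption).
    early = out[(out[year] < 1920) & out[value].notna()]
    if len(early) and early[count].sum() > 0:
        pre1920 = float(np.average(early[value], weights=early[count]))
    else:
        pre1920 = float("nan")

    # Smooth 1920 onward with a centred rolling mean; masked cells are
    # skipped. A longer window may suit smaller units of analysis.
    late = out[out[year] >= 1920].sort_values(year).reset_index(drop=True)
    late[value + "_smooth"] = (
        late[value].rolling(window, center=True, min_periods=1).mean()
    )
    return pre1920, late


# Synthetic stand-in for one county's rows; these are not real data.
demo = pd.DataFrame({
    "year": [1915, 1918, 1920, 1921, 1922, 1923],
    "degree_parcel": [3.0, 3.4, 3.2, 3.1, 2.0, 2.9],
    "N_parcel": [100, 300, 250, 10, 400, 500],
})
pre1920, smoothed = tidy_series(demo, window=3)
print(pre1920)
print(smoothed)
```

In the demo, the 1921 row has only 10 nodes, so its value is masked and the
smoothed value for that year is drawn from its neighbours alone.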