rlsweeney/public_cs_texas

Replication Package for "Relinquishing Riches" by Thomas R. Covert and Richard L. Sweeney

Overview

The code in this replication package takes as inputs a mixture of publicly available data and commercial data, and outputs the figures, tables, and LaTeX input files used in the paper. All code is written in the R programming language. The replicator can run all of the paper's code by executing the make command from the root level of the replication package. The makefile will also generate a PDF file of the paper, using LaTeX and the files generated by the aforementioned R programs. The replicator should expect the code to run for about 12 hours on a modern laptop computer. If the replicator does not have access to the commercial data used in the paper (see below), the make command will recognize this and execute the subset of analyses that are possible using only the publicly available data.

Data Availability and Provenance Statements

This paper uses several publicly accessible data sources, exact copies of which are included in the replication package, and one commercially accessible data source, which is not included. In the details provided below, we describe how we obtained each of these sources, and, where possible, provide internet addresses for current versions of these sources. In many cases, the publicly available data is occasionally updated, and, to our knowledge, the associated data providers do not provide permanent links to previous versions. As a result, the replication package includes copies of the exact versions used in the paper.

Statement about Rights

  • I certify that the author(s) of the manuscript have legitimate access to and permission to use the data used in this manuscript.
  • I certify that the author(s) of the manuscript have documented permission to redistribute/publish the data contained within this replication package.

Summary of Availability

  • Some data cannot be made publicly available.

Details on each Data Source

The subset of the data used in this paper which can be made publicly available is deposited as a "data replication package" in a Harvard Dataverse repository, accessible at https://doi.org/10.7910/DVN/9WJ3JK. As described below in the "Instructions to Replicators" section, this paper's code replication package will automatically download this data replication package for replicators who do not download it themselves first.

Texas General Land Office (GLO)

The paper makes use of three kinds of GLO data. The first type, "raw" GLO data, is data that we either downloaded in shapefile or tabular form from the GLO website or which we received via public records request. Our code uses this data in its unmodified form. The second type, "manually entered" GLO data, is data which we created ourselves from PDF documents available on the GLO website, or from non-digitized responses to public record requests. The third type, "modified" GLO data, is data which represents manual corrections we have made to the first two types of data.

Raw data includes:

  • Active and Inactive lease shape files, available in their current form from the GLO GIS website. This replication package includes the versions of these files that we downloaded in June, 2020, in raw_data/leases/June2020/.
  • The Lease Land File, which we received from GLO via a public records request. This file associates leases with their underlying parcels, and is included in the replication package as raw_data/leases/LeaseLandFile.csv.
  • The Mineral Lease Summary File, which we received from GLO via a public records request. This file provides additional non-spatial lease information that is not included in the Active and Inactive lease shape files, and is included in the replication package as raw_data/leases/tblMineralLeaseSummaryAll_2020.csv.
  • The Mineral Lease Assignment File, which we received from GLO via a public records request. This file provides information on when leases are assigned from one party to another, as well as identifying information about the parties involved. It is included in the replication package as raw_data/Assignments/glo_assignments.csv.
  • The Royalty Revenue Files, which we received from GLO via a public records request. These files document the product-specific (oil vs. gas) monthly royalty revenues earned by each lease. They are included in the replication package as the set of excel files matching raw_data/payments/Royalty_Payments_*.xlsx.
  • The Bonus and Delay Rental Payment File, which we received from GLO via a public records request. This file records bonus payments, delay rental payments, and other miscellaneous non-royalty payments earned by each lease. It is included in the replication package as raw_data/payments/2051506_PIR-20-0915_Rentals_Cleared_Payments.xlsx.
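Before starting a long run, a replicator may want to confirm that the raw GLO inputs listed above are present. The following is a minimal sketch and is not part of the replication package. It assumes that the root of the data replication package contains the raw_data folder at its top level, which may not match the actual layout; the function name check_raw_glo is hypothetical.

```sh
#!/bin/sh
# Sketch: report which raw GLO inputs named in this README are missing.
# Assumption: the argument is the root of the data replication package and
# holds raw_data/ at its top level (the real layout may differ).

check_raw_glo() {
    data_dir=$1
    missing=0
    for rel in \
        "raw_data/leases/June2020" \
        "raw_data/leases/LeaseLandFile.csv" \
        "raw_data/leases/tblMineralLeaseSummaryAll_2020.csv" \
        "raw_data/Assignments/glo_assignments.csv" \
        "raw_data/payments/2051506_PIR-20-0915_Rentals_Cleared_Payments.xlsx"
    do
        if [ ! -e "$data_dir/$rel" ]; then
            echo "missing: $rel"
            missing=$((missing + 1))
        fi
    done
    # The royalty revenue files are a set matching a glob, so expand the
    # glob and test whether its first match exists.
    set -- "$data_dir"/raw_data/payments/Royalty_Payments_*.xlsx
    if [ ! -e "$1" ]; then
        echo "missing: raw_data/payments/Royalty_Payments_*.xlsx"
        missing=$((missing + 1))
    fi
    echo "$missing missing"
    # Succeed only when nothing is missing.
    [ "$missing" -eq 0 ]
}
```

For example, `check_raw_glo /Users/tcovert/texas_data` lists any absent inputs and returns a non-zero status if at least one is missing.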

Manually entered data includes:

  • Auction Bid Notices, which are published by GLO in PDF format on their website. These are public notices documenting which parcels will be available in each auction, including auctions on parcels that ultimately do not transact. The current public version of this file covers auctions conducted since 2005. We additionally did public records searches for earlier auctions, back to the year 2000, on GLO's Land Grant Search. We converted the PDF into digital tabular format using OCR software and some manual checking/editing. The digital tabular version of this is included in the replication package as raw_data/notices/old_notices.xlsx, raw_data/notices/newest_notices.xlsx and intermediate_data/glo_notices_final.csv.
  • Auction Bid Results, which are published by GLO in PDF format on their website. These are public records of which bids were received for each parcel at each auction, including the name of the bidder and the bid amount. The current public version of this file covers auctions conducted since 2005. We additionally did public records searches for earlier auctions, back to the year 2000, on GLO's Land Grant Search. We converted the PDF into digital tabular format using OCR software and some manual checking. The digital tabular version of this is included in the replication package as the set of excel files matching raw_data/bids/*.xlsx.
  • RAL Lease Coversheets, which are included as a single page associated with each RAL lease's PDF file stored at the Land Grant Search. For each RAL lease, the coversheet shows the initial negotiated offer as well as GLO's "recommended" revised offer, and any associated comparable leases justifying this recommendation. We had a team of research assistants use the Land Grant Search to find each RAL lease's PDF document and then manually enter the coversheet information into excel. We have included this excel file in the replication package as raw_data/coversheets/Final_Term_Sheet.xlsx and raw_data/coversheets/Highlighted_Terms.xlsx.
  • Orderly Development Agreement (ODA) Flags, which indicate sets of leases that have joint reporting of royalty payment revenue. Firms that sign an ODA with GLO report all associated royalty revenue on a single lease, so we use ODA information to allocate lease output to leases within an ODA. We found these by manually searching for ODA agreement numbers, starting at 1, and stopping at 6 (as of June, 2020, there were only 6), on the GLO Map Server. We have included a CSV file of the leases associated with each ODA in the replication package as raw_data/leases/odas.csv.
  • Additional Scraped Parcel Information, which we recorded from the Land Grant Search. Some leases do not appear in the Lease Land File, which relates leases to their underlying parcels, so we used web scraping software to search for each lease on the Land Grant Search and record any available parcel information. We have included an R dataset containing these scraping results in the replication package as raw_data/leases/lease_parcel_scraping/scraped_parcels.Rda.
  • Lease Addenda Information, which are included as several pages at the end of each RAL lease's PDF file stored at the Land Grant Search. For each RAL lease, the addenda information shows how the lessee and lessor have agreed to changes from the standard lease document. We had a team of research assistants use the Land Grant Search to find each RAL lease's PDF document and then manually enter and standardize the addenda information into a spreadsheet. We have included a CSV file of this data in the replication package as raw_data/addenda.csv.
  • Manually Improved Firm Names, which we created from the population of lessee, assignee, and bidder names. We manually reviewed these names, did internet research to verify that similar sounding names corresponded to the same entity, and created a single identifier for sets of names representing the same firm. We have included an excel file of this data in the replication package as intermediate_data/manually_improved_names.xlsx.

Modified data includes:

  • A variety of manual edits to lease contract information that we made after reviewing lease PDFs, including edits to bonus payments, royalty rates, undivided interest status, effective dates, and expiration dates. We have included this information in tabular form in the replication package as: intermediate_data/leases/missing_bonus.csv, intermediate_data/leases/missing_bonus2.csv, intermediate_data/leases/missing_bonus3.xlsx, intermediate_data/leases/missing_bonus4.xlsx, intermediate_data/leases/lease_check.csv, intermediate_data/leases/missing_undivided.csv, intermediate_data/leases/royalty_fixes.xlsx, intermediate_data/leases/effective_date_fixes.xlsx, and intermediate_data/leases/term_fixes.xlsx.
  • A variety of manual edits to lease assignment information that we made after reviewing the lease assignment data and lease documents available at the Land Grant Search. We have included this information in tabular form in the replication package as intermediate_data/assignments/partial_assignments.csv, intermediate_data/assignments/manual_pass.xlsx, intermediate_data/assignments/glo_assignments_fix.csv.

US Energy Information Administration (EIA)

We use EIA's oil and gas price information: the monthly West Texas Intermediate Crude Oil spot price series and the monthly Henry Hub natural gas spot price series, both published on the EIA website. We have included the versions of these files that we downloaded in September, 2019, in the replication package, as raw_data/prices/RWTCm.xls and raw_data/prices/RNGWHHDm.xls.

We also use EIA's shale play shapefiles. The shale play boundaries come from the EIA website, and the replication package includes the versions of these files that we downloaded in May, 2019, as the set of files matching shape_files/TightOil_ShaleGas_IndividualPlays_Lower48_EIA/*Boundary*. Finally, we use EIA's shale play thickness information for the Permian Shale and the Eagle Ford Shale. We have included the versions of these files that we downloaded in May, 2019, in the replication package as the set of files matching shape_files/TightOil_ShaleGas_IndividualPlays_Lower48_EIA/*Isopach*.

Multi-Resolution Land Characteristics Consortium (MRLC)

We downloaded land cover data from the MRLC in November, 2017. Our analyses use the 2006 epoch of the National Land Cover Database (NLCD). This raster data covers the entire continental US and is therefore contained in an extremely large file that can be hard to work with on a laptop computer. To aid replicators, we have included a version that we clipped to the state of Texas (using a separate desktop computer running ArcGIS) in the replication package, as the files in the directory shape_files/Land_Cover/landcover.

Texas Department of Transportation (TXDOT)

We downloaded the Texas public road network (highways, county roads, city streets, toll roads, and local streets) in shape file format from the Texas Department of Transportation. The download link that we used is no longer active, but TXDOT publishes a current version of this data. We have included the version of this file that we downloaded in August, 2017, in the replication package as the set of files in the folder shape_files/txdot-roads_tx/.

US Geological Survey (USGS)

We downloaded shape files for rivers, streams, and water bodies from the US Geological Survey National Hydrography Dataset (NHD). Though it is possible to download the current version of the NHD from USGS, the version we downloaded in September, 2018, is no longer available, so we include it as part of the replication package as the set of files in the folder shape_files/usgs-rivers_tx/.

US Census (Census)

We downloaded from the US Census a shapefile describing the boundaries of all US counties, which we use to identify counties for leases and parcels and to assemble a shape for the State of Texas. The download link that we used is no longer active, but the Census publishes a current version of this data. We have included the version of this file that we downloaded in January, 2017, in the replication package as the set of files in the folder shape_files/us_county/.

Commercially Accessible Data

The commercially accessible data in this paper is the Texas Permanent School Fund Land Grid owned by P2 Energy Solutions. This data represents the location, shape, and land type of all original PSF parcels, in shape file format. We acquired an academic license to use this data in February, 2018, after approximately six months of negotiations, and are not able to redistribute it. Interested researchers can also access this data by contacting P2 Energy Solutions and negotiating a similar academic license. We would be happy to assist with any reasonable replication attempts for two years following publication.

Computational requirements

Software Requirements

  • make, which is installed by default on UNIX-like systems (MacOS, Linux, etc). Windows users can install make from a variety of sources. We recommend installing Chocolatey first, and then using it to install make with the terminal command choco install make.
  • A LaTeX distribution.
  • R 4.2.0, with the following packages and their versions
    • boot (1.3-28)
    • broom (1.0.2)
    • exactextractr (0.8.1)
    • fixest (0.11.0)
    • Formula (1.2-4)
    • furrr (0.3.1)
    • fuzzyjoin (0.1.6)
    • grf (2.2.1)
    • grid (4.2.2)
    • gstat (2.1-0)
    • kableExtra (1.3.4)
    • knitr (1.41)
    • lmtest (0.9-40)
    • lubridate (1.9.0)
    • lwgeom (0.2-10)
    • raster (3.6-13)
    • readxl (1.4.1)
    • rgdal (1.6-3)
    • rgeos (0.6-1)
    • RISCA (1.0.3)
    • sandwich (3.0-2)
    • sf (1.0-9)
    • tidyverse (1.3.2)

Note: the program code/package_installation.R, which is executed as part of the makefile process, checks for the presence of these packages and installs the newest version of any that are not currently available.
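A quick way to see whether the command-line tools listed above are reachable from a terminal is sketched below. It is an illustration only, not part of the replication package. The helper names have_tool and report_tools are hypothetical, and pdflatex is an assumption standing in for "a LaTeX distribution": the makefile may call a different LaTeX engine.

```sh
#!/bin/sh
# Sketch: check that the tools this README relies on are on PATH.
# Assumption: "pdflatex" stands in for "a LaTeX distribution"; the
# makefile may use a different engine.

have_tool() {
    # command -v is the POSIX way to test whether a command can be found.
    command -v "$1" >/dev/null 2>&1
}

report_tools() {
    for tool in "$@"; do
        if have_tool "$tool"; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
        fi
    done
}

report_tools make Rscript pdflatex
```

Once Rscript is found, an individual package version can be compared against the list above with a call such as `Rscript -e 'packageVersion("sf")'`.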

Memory and Runtime Requirements

Summary

Approximate time needed to reproduce the analyses on a 2018 vintage laptop computer:

  • 8-24 hours (the Overview reports a run time of about 12 hours)

Details

The code was last run on a 4-core Intel-based laptop, with 16 gigabytes of RAM, running MacOS version 11.6.7.

Description of programs/code

  • Programs in code/Data_Cleaning ingest and clean all of the raw data described above, saving their output in the data directories generated_data and generated_shape_files.
  • Programs in code/Analysis process data from the generated_data folder and generate tables in the output/tables folder, figures in the output/figures folder, and LaTeX fragments in the output/estimates folder.
  • Programs in code/functions define commonly used functions in the analysis and data cleaning programs.
  • The program code/paths.R defines relative paths used by data cleaning and analysis programs. It depends on the presence of a data.txt file in the root level of the replication archive (for details, see below in the "Instructions to Replicators" section).
  • The program code/texas_constants.R defines fixed values of various parameters used in data cleaning and analysis programs.

Instructions to Replicators

This guide assumes you have already downloaded the code replication package and have expanded that archive into a known location on your machine, e.g., /Users/tcovert/texas_code or C:\rsweeney\texas_code. There is a separate data replication package, available as a Harvard Dataverse repository, at https://doi.org/10.7910/DVN/9WJ3JK. If you have not already downloaded it, the code replication process can download it for you (see step 8 below). However, if you wish to download it separately, make note of where the data replication package is saved, and update data.txt accordingly (see step 2 below).

  1. Create a separate folder on your computer for the data replication package, e.g., /Users/tcovert/texas_data or C:\rsweeney\texas_data.
  2. Save a data.txt file in the root level of your copy of the paper's code folder. It should contain the full path to the folder where you want the data replication package to be saved (e.g., /Users/tcovert/texas_data or C:\rsweeney\texas_data) or, if you downloaded the package separately, the full path to where it is saved.
  3. If you haven't already installed R, install it.
  4. If you haven't already installed LaTeX, install it.
  5. If make isn't already installed, install it (see above for instructions to Windows users).
  6. Navigate your terminal to the root level of the code repository.
  7. Install the relevant R packages by typing make install. This step is optional if all of the required packages are already installed, and required otherwise. It will not overwrite the versions of any required packages that are already installed on your computer.
  8. If you have not yet downloaded the data repository, type make getdata. This will download the data replication package and save it to the folder you have defined in data.txt. If you have already downloaded the data replication package (and saved it in the location specified in data.txt) you can skip this step.
  9. To run the code replication and build a fresh pdf of the paper, type make.
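The steps above can be sketched as a short shell script for UNIX-like systems. This is an illustration under stated assumptions, not part of the replication package: the function names setup_data_txt and run_replication are hypothetical, and the sketch assumes data.txt should hold just the full path on a single line. Only the make targets named in the steps above (install, getdata, and the default target) are used.

```sh
#!/bin/sh
# Sketch of steps 1, 2 and 6-9 for UNIX-like systems.
# Assumption: data.txt holds just the full data path on one line.

# Steps 1-2: create the data folder and record its path in data.txt
# at the root of the code folder.
setup_data_txt() {
    code_dir=$1
    data_dir=$2
    [ -d "$code_dir" ] || { echo "no such code folder: $code_dir" >&2; return 1; }
    mkdir -p "$data_dir" || return 1
    printf '%s\n' "$data_dir" > "$code_dir/data.txt"
}

# Steps 6-9: run the make targets from the root of the code folder.
# The body runs in a subshell so the caller's working directory is kept.
run_replication() (
    cd "$1" || exit 1
    make install || exit 1   # step 7: install any missing R packages
    make getdata || exit 1   # step 8: skip if the data is already in place
    make                     # step 9: run the analyses and build the PDF
)
```

A replicator would then call, for example, `setup_data_txt /Users/tcovert/texas_code /Users/tcovert/texas_data` followed by `run_replication /Users/tcovert/texas_code`, substituting their own full paths.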

If the replicator does not have access to the commercially available data, make will execute the subset of analyses that are possible using only publicly available data.

Note: many of the programs make use of computationally intensive Double/Debiased Machine Learning (DML) estimation techniques from Chernozhukov et al. (2018), which account for the vast majority of the computational time reported above. Replicators who are willing to sacrifice some accuracy in order to obtain faster results should change the value of the variable dml_n in code/texas_constants.R to an odd number smaller than its default value, which is 101. This variable refers to the number of cross-fitting steps that the DML estimators average over, so setting it to something smaller (e.g., 11) would reduce the time spent in DML computation by roughly a factor of 9 (101/11 ≈ 9.2).
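The trade-off in the note above can be estimated with a little arithmetic. The sketch below is illustrative and not part of the replication package. It assumes that DML time scales linearly with dml_n, which is what the note implies, and the function name dml_speedup is hypothetical. It only computes the ratio of the default value to a candidate value; the variable itself still has to be edited in code/texas_constants.R.

```sh
#!/bin/sh
# Sketch: estimate the DML speedup from lowering dml_n.
# Assumption: DML time scales linearly with dml_n.
DML_N_DEFAULT=101   # default value stated in this README

dml_speedup() {
    new_n=$1
    # The README asks for an odd number smaller than the default.
    if [ "$new_n" -le 0 ] || [ "$new_n" -ge "$DML_N_DEFAULT" ] \
        || [ $((new_n % 2)) -eq 0 ]; then
        echo "dml_n must be a positive odd number below $DML_N_DEFAULT" >&2
        return 1
    fi
    # Shell arithmetic is integer-only, so use awk for the ratio.
    awk -v d="$DML_N_DEFAULT" -v n="$new_n" 'BEGIN { printf "%.1f\n", d / n }'
}

dml_speedup 11   # prints 9.2
```

Under the same linearity assumption, any other odd candidate value can be passed in the same way to see the expected reduction in DML time.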

List of tables and programs

The provided code reproduces:

  • All numbers provided in text in the paper
  • All tables and figures in the paper
  • Selected tables and figures in the paper, as explained and justified below.

| Figure/Table # | Program | Line Number | Output file | Note |
| --- | --- | --- | --- | --- |
| Figure 1 | code/Analysis/lease_stats.R | 82 | output/figures/cohorts.png | |
| Table 1 | code/Analysis/lease_stats.R | 223 | output/tables/summary_stats_by_type.tex | |
| Table 2 | code/Analysis/parcel_stats.R | 169 | output/tables/summary_stats_parcel.tex | Requires commercial data |
| Figure 2 | writeups/cs_texas.tex | 241 | none | |
| Table 3 | code/Analysis/parcel_stats.R | 50 | output/tables/parcel_balance.tex | Requires commercial data |
| Figure 3 | code/Analysis/leases_maps.R | 169 | output/figures/sample_glo_leases.png | |
| Table 4 | code/Analysis/regressions_lease_contracts.R | 243 | output/tables/logbonus_regressions.tex | |
| Table 5 | code/Analysis/regressions_lease_contracts.R | 244 | output/tables/royalty_term_regressions.tex | |
| Table 6, panel a | code/Analysis/regressions_outputs.R | 295 | output/tables/stacked_output_levels.tex | |
| Table 6, panel b | code/Analysis/regressions_outputs.R | 296 | output/tables/stacked_output_poisson.tex | |
| Figure 4 | code/Analysis/parcel_monthplots.R | 97 | output/figures/active_plot.png | Requires commercial data |
| Table 7 | code/Analysis/regressions_parcels.R | 181 | output/tables/parcel_regressions.tex | Requires commercial data |
| Table 8 | code/Analysis/allocative_diffs.R | 110 | output/tables/allocative.tex | |
| Table 9 | code/Analysis/regressions_firms.R | 61 | output/tables/firms_regressions.tex | |
| Table 10 | code/Analysis/auction_analysis.R | 198 | output/tables/TopPairAuctionShares.tex | |
| Table 11 | code/Analysis/auction_analysis.R | 775 | output/tables/auction_number_bids.tex | |
| Table 12 | code/Analysis/auction_analysis.R | 362 | output/tables/auction_bonus_regressions.tex | |

The Online Appendix contains additional tables and figures which map to code in this replication package as follows:

| Figure/Table # | Program | Line Number | Output File | Note |
| --- | --- | --- | --- | --- |
| Figure A.1 | code/analysis/leases_maps.R | 91 | output/figures/glo_leases_in_texas.png | |
| Figure A.2 | code/analysis/parcel_hazard_analysis.R | 395 | output/figures/ipwkm10.png | Requires commercial data |
| Figure A.3 | code/analysis/parcel_hazard_analysis.R | 429 | output/figures/ipwkm20.png | Requires commercial data |
| Table A.1 | code/analysis/regressions_lease_contracts.R | 240 | output/tables/bonus_regressions.tex | |
| Table A.2 | code/analysis/regressions_extracontrols.R | 186 | output/tables/lease_regressions_extra_bonus.tex | |
| Table A.3 | code/analysis/regressions_extracontrols.R | 183 | lease_regressions_extra_output.tex | |
| Table A.4 | code/analysis/regressions_drilled.R | 131 | output/tables/drilled_regressions.tex | |
| Table A.5 | code/analysis/regressions_drilled.R | 132 | output/tables/logdboe_drilled_regressions.tex | |
| Table A.6 | code/analysis/parcel_hazard_analysis.R | 146 | output/tables/spell_stats.tex | Requires commercial data |
| Table A.7 | code/analysis/parcel_hazard_analysis.R | 360 | output/tables/logrank_stats.tex | Requires commercial data |
| Table A.8 | code/analysis/regressions_lessors.R | 106 | output/tables/logbonus_regressions_lessor_heterogeneity.tex | |
| Table A.9 | code/analysis/size_het.R | 86 | output/tables/logbonus_size_heterogeneity.tex | |
| Table A.10 | code/analysis/regressions_parcels.R | 314 | output/tables/lease_parcel_comparisons_linear.tex | Requires commercial data |
| Table A.11 | code/analysis/regressions_parcels.R | 317 | output/tables/lease_parcel_comparisons_poisson.tex | Requires commercial data |
| Table A.12 | code/analysis/leases_stats.R | 340 | output/tables/summary_data_construction.tex | |
| Table A.13 | code/analysis/auction_appendix.tex | 965 | output/tables/auction_appendix.tex | |
| Table A.14 | code/analysis/regressions_bonus_raladdenda.R | 181 | output/tables/stacked_regressions_addenda.tex | |

References

Covert, Thomas; Sweeney, Richard, 2022, “Replication Data for: ‘Relinquishing Riches: Auctions vs Informal Negotiations in Texas Oil and Gas Leasing’,” https://doi.org/10.7910/DVN/9WJ3JK, Harvard Dataverse, V1.

Texas General Land Office, “Past Bid Sale Results,” April 2017.

Texas General Land Office, “Active Oil & Gas Leases,” June 2020.

Texas General Land Office, “Inactive Oil & Gas Leases,” June 2020.

Texas General Land Office, “Auction Bid Notices,” June 2020.

U.S. Energy Information Administration, “Eagle Ford play boundaries, structure and isopachs,” May 2019.

U.S. Energy Information Administration, “Henry Hub Natural Gas Spot Price,” September 2019.

U.S. Energy Information Administration, “Low permeability oil and gas play boundaries in Lower 48 States,” May 2019.

U.S. Energy Information Administration, “Permian Basin: Wolfcamp formation elevation and isopachs,” May 2019.

U.S. Energy Information Administration, “West Texas Intermediate Crude Oil Spot Price,” September 2019.

Multi-Resolution Land Characteristics Consortium, “National Land Cover Database,” November 2017.

Texas Department of Transportation, “TxDOT Roadways,” August 2017.

U.S. Geological Survey, “National Hydrography Dataset,” June 2021.

U.S. Census, “TIGER/Line Shapefiles,” January 2017.

P2 Energy Solutions, “Texas Permanent School Fund Land Grid,” February 2018.

V. Chernozhukov, D. Chetverikov, M. Demirer, E. Duflo, C. Hansen, W. Newey, and J. Robins, “Double/debiased machine learning for treatment and structural parameters,” The Econometrics Journal, February 2018.

About

Replication files for Covert and Sweeney (2022)
