Skip to content

hktosun/ShinSolon2011JPubE

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

28 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Replication of Shin & Solon (2011, JPubE)

This repository includes the R code that replicates Figure 2-5 of "Trends in men's earnings volatility: What does the Panel Study of Income Dynamics show?" by Shin & Solon (2011, Journal of Public Economics).

Description of the Study

The paper studies the change in the earnings volatility of men in the U.S. between 1969 and 2006. To this end, they employ the PSID data for the relevant years. We document the summary statistics of age and wage income distributions, and plot our version of Figure 2 to Figure 5 of the paper.

In Figure 2, we plot these differences. Figure 3 shows the historical movement of the log difference in earnings by plotting 10th, 25th, 50th, 75th and 90th percentile over the entire period. In Figure 4, we compare two methods of calculating volatility of the earnings. Finally, in Figure 5, we plot the 90th percentile-10th percentile difference of the earnings with the latter method in which we use the levels of income rather than the logs.

Data

PSID Variables

We provide the list of variables that we use for each year here. The variable names here relates to the variable names as follows:

  • 1968 Interview#: id_68,
  • Age of HH: age,
  • Sex of HH: gender,
  • Wage & Sal of HH: wage,
  • Accuracy of HH Head Wage: acc_wage,
  • Total Labor Income of HH: inc,
  • Accuracy of HH Head Labor Income: acc_inc,
  • Farm Income of HH: farm,
  • Accuracy of Farm Income:acc_farm,
  • (Labor Part of) Business Income of HH: bus,
  • Accuracy of Business Income: acc_bus,
  • Tot. Income Exl Farm and Business HH: excfarmbus

The list of variables is a collaboration with Lejvi Dalani, Joao Fonsenca Rodrigues, Xiaohan Zhang and Yan Zhao from the Economics Department at University of Minnesota, and myself.

Preparing the Datasets

We analyze the two-year differences in earnings throughout the period. So, we prepare a distinct dataset for each pair of years. By doing this, we got 31 datasets: (calendar years) 1969-1971, 1970-1972, 1971-1973, ..., 1994-1996, 1996-1998, ..., 2004-2006. Note that we add 0 to the end of each variable name (except for the 1968 ID) if it's for the initial year, and 1 if it's for the final year. For example, in the data set 1969-1971, we name the initial year's wage as wage0 and the final year's wage as wage1.

Cleaning the Data

We restrict the sample to include only the heads of the households who have the following features:

  • gender: male
  • age: in [25,59] in both years
  • age difference between two years: in {1,2,3}
  • belong to: SRC cross-section sample
  • wage & total labor income & farm income & business income: not imputed

For Figure 2, 3 and 5, and for the first two plots in Figure 4, we keep the observations only if:

  • wage > 1
  • total labor income > 1

Then, for all calculations, we drop the top and bottom 1% of the observations:

  • wage >= 1st percentile of wages
  • wage <= 99th percentile of wages

Summary Statistics

The summary statistics of ages and wage & salaries in the dataset is as follows:

Statistic Age Wage
Mean 38.20 29666.51
Median 37.00 24000.00
Std. Dev. 8.98 22958.38
Min 25.00 950.00
Max 58.00 250000.00
Skewness 0.36 2.22
Kurtosis -0.96 8.30
IQR 15.00 24000.00

Calculations

Figure 2

For Figure 2, we calculate the numbers for five series:

  • Wages & Salaries: Take the log of the wage and salary income (wage) for both years, take the difference, regress on a quadratic in age, save the residuals, save the standard deviation of the residuals.
  • Wages & Salaries with ages 30-54: Restrict the sample to the people at age in [30,54], and do the same as above.
  • Total Labor Income (Dynan et al.): Take the log of the total labor income (inc) for both years, take the difference, regress on a quadratic in age, save the residuals, save the standard deviation of the residuals.
  • Wages & Salaries without controlling for age: Take the log of the wage and salary income (wage) for both years, take the difference, save the standard deviation of the difference.
  • Total Labor Income excluding farm & business income: Take the log of the Total Labor Income excluding farm & business income (excfarmbus) for both years, take the difference, regress on a quadratic in age, save the residuals, save the standard deviation of the residuals.

Figure 2 is as follows:

Figure 3

For Figure 3, we take the log of the wage and salary income (wage) for both years, take the difference, regress on a quadratic in age, save the residuals. And then, we calculate the following values:

  • p90: 90th percentile of the residuals
  • p75: 75th percentile of the residuals
  • median: 50th percentile of the residuals
  • p25: 25th percentile of the residuals
  • p10: 10th percentile of the residuals

Figure 3 is as follows:

Figure 4

For Figure 4, we calculate the following three series:

  • Change in log earnings: Take the log of the wage and salary income (wage) for both years, take the difference, regress on a quadratic in age, save the residuals, save the standard deviation of the residuals, save the difference between the 90th and the 10th percentile of the residuals.
  • Relative change in real earnings (zeros and outliers excluded): Here the authors introduce a new method to estimate the earnings change.

"We use the CPI-U-RS to put earnings into real terms. Again we account for the mean effects of year, age, and cohort by estimating a separate regression each year of change in real earnings on a quadratic in age, and then proceed to study the residualized real earnings change. We rescale the residualized real earnings change between years t-2 and t into relative terms by dividing it by the simple average of the sample means of real earnings in the two years." (Shin & Solon, 2011 JPubE)

  • Relative change in real earnings (zeros and outliers included): The same calculation described above with the dataset that includes zero earnings.

Figure 4 is as follows:

Figure 5

For Figure 5, we do the same calculations as we did for Relative change in real earnings (zeros and outliers excluded) series in Figure 4. Then, we calculate the following five properties of that series:

  • p90: 90th percentile
  • p75: 75th percentile
  • median: 50th percentile
  • p25: 25th percentile
  • p10: 10th percentile

Figure 5 is as follows:

About

R Code for Solon & Shin (2011, JPubE)

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published