A Look At Non-working Prime-age Men Using R: 1994-2016

On December 26, 2016, economist George Borjas posted an article on his blog titled "Where Did 2.5 Million Native Working Men Go?". It includes a graph showing a substantial rise in the number of prime-age native men who do not work at all during an entire calendar year. It also includes the simple Stata code that was used to generate the graph from CPS (Current Population Survey) data. Stata is a statistical package widely used in academia. The following describes how to replicate the data and graph in R, another widely used statistical language, which has the advantage of being freely available under the GNU General Public License.

Following are instructions for extracting the required data from IPUMS, the Integrated Public Use Microdata Series:

  1. Go to https://cps.ipums.org/cps/
  2. Click on Login on the upper menu bar
  3. Log in to CPS if you have an account (else click on "Create an Account")
  4. Click on Home on menu bar
  5. Click on "Browse and Select Data" in upper left
  6. Click on the "Select Samples" button
  7. On the ASEC tab, check all of the boxes from 1994 through 2016
  8. Click on "Submit sample selections" (Data Cart now shows 23 samples)
  9. Under "Select Variables", select Person, then "Core Demographic"
  10. Check AGE and SEX
  11. Select Person, then "Education"
  12. Check EDUC
  13. Select Person, then "Ethnicity/Nativity"
  14. Check CITIZEN
  15. Select Person, then "Work"
  16. Check EMPSTAT, WKSWORK1, and UHRSWORKLY
  17. Select Household, then "Geographic"
  18. Check STATEFIP and METFIPS (Data Cart now shows 9 variables)
  19. Click "VIEW CART" at the bottom of the Data Cart (you should see the following)
Variable   Variable Label                              Type   16  15  14  13  12  11  10  09  08  07  06  05  04  03  02  01  00  99  98  97  96  95  94
---------  ------------------------------------------  ----  --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
YEAR       Survey year [preselected]                     H     X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X
SERIAL     Household serial number [preselected]         H     X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X
HWTSUPP    Household weight, Supplement [preselected]    H     X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X
CPSID      CPSID, household record [preselected]         H     .   .   .   .   .   .   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X
ASECFLAG   Flag for ASEC [preselected]                   H     X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X
HFLAG      Flag for the 3/8 file 2014 [preselected]      H     .   .   X   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .
MONTH      Month [preselected]                           H     X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X
PERNUM     Person number in sample unit [preselected]    P     X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X
CPSIDP     CPSID, person record [preselected]            P     .   .   .   .   .   .   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X
WTSUPP     Supplement Weight [preselected]               P     X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X
STATEFIP   State (FIPS code)                             H     X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X
METFIPS    Metropolitan area FIPS code                   H     X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X
AGE        Age                                           P     X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X
SEX        Sex                                           P     X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X
CITIZEN    Citizenship status                            P     X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X
EDUC       Educational attainment recode                 P     X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X
EMPSTAT    Employment status                             P     X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X
WKSWORK1   Weeks worked last year                        P     X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X
UHRSWORKLY Usual hours worked per week (last yr)         P     X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X
  20. Click "Create data extract"
  21. Check that there are 23 samples, 19 variables, Data Format is set to CSV, and Structure is set to Rectangular
  22. Under "Describe your extract", type something like "Data for non-workers"
  23. Click "Submit extract"
  24. When you receive an email with the subject "IPUMS-CPS data extract is ready", open it and go to the download link that it provides
  25. On the line with the correct date and description, click on the CSV link to download the compressed .gz file containing the data
  26. Copy the .gz file to the working directory where you plan to run the R program and unzip it there (I used 7-zip to unzip it)
  27. Change the name of the extracted .csv file to cps16.csv
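
As an alternative to unzipping by hand, the extract can be read into R straight from the .gz file. The following is a minimal sketch, not code from nowork1ed.R: the helper name read_cps is hypothetical, and it relies only on base R's gzfile(), which reads both compressed and uncompressed files.

```r
# Hypothetical helper (not part of nowork1ed.R): read an IPUMS-CPS CSV extract,
# whether it is still gzip-compressed (.csv.gz) or already unzipped (.csv).
read_cps <- function(path) {
  con <- gzfile(path, open = "rt")  # gzfile() also reads uncompressed files
  on.exit(close(con))               # close the connection when the function exits
  read.csv(con)
}

# Load the extract only if it is present in the working directory.
if (file.exists("cps16.csv")) {
  cc <- read_cps("cps16.csv")
  print(paste(nrow(cc), " Initial number of records"))
}
```

With the file from the steps above in place, the printed count should correspond to the "Initial number of records" line in the program's output.
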
The file cps16.csv can be used by Excel and other software, not just by R. It can be generated as explained above or can be created by downloading the file at http://econdataus.com/cps_00013.csv.gz, unzipping it, and renaming the extracted .csv file to cps16.csv. With that file, it is possible to replicate the graph by running the R program at http://econdataus.com/nowork1ed.R. Following is the start of the resulting output if this program is placed in the same directory as cps16.csv and run:

> source("nowork1ed.R")
[1] "***** USING current cc dataframe. If error, enter 'rm(cc)' and rerun. *****"
[1] "4268777  Initial number of records"
[1] "4268777  1994 to present"
[1] "4268777  the United States"
[1] "1798834  AGE 25 to 54"
[1] "861800  MEN"
[1] "861800  SUPPLEMENT WEIGHT >= 0"
   YEAR IMMIGRANT NATIVE
1  1994     11.07   8.20
2  1995     10.37   7.92
3  1996      9.71   8.16
4  1997      9.37   7.71
5  1998      8.14   7.48
6  1999      7.84   7.82
7  2000      7.47   7.64
8  2001      8.11   7.74
9  2002      8.00   8.58
10 2003      9.10   9.23
11 2004      9.30  10.36
12 2005      8.33  10.40
13 2006      7.73  10.08
14 2007      7.69   9.89
15 2008      8.14  10.35
16 2009      8.03  11.26
17 2010     10.68  13.11
18 2011     11.27  14.40
19 2012     10.50  14.69
20 2013      9.76  14.04
21 2014      8.53  14.30
22 2015      9.76  13.72
23 2016      9.33  12.99
Press enter to continue, escape to exit
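
The filters and percentages in the output above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in nowork1ed.R: the function name pct_not_working is hypothetical, and the variable codes (SEX 1 for male, CITIZEN 4 or 5 for immigrants, WKSWORK1 equal to 0 for no weeks worked) are my reading of the IPUMS-CPS codebook and should be checked against it.

```r
# Sketch only. Assumed IPUMS-CPS codes (verify against the codebook):
#   SEX 1 = male; CITIZEN 4 = naturalized citizen, 5 = not a citizen;
#   WKSWORK1 0 = did not work any weeks last year.
pct_not_working <- function(cc) {
  # Keep prime-age men with a non-negative supplement weight.
  men <- cc[cc$AGE >= 25 & cc$AGE <= 54 & cc$SEX == 1 & cc$WTSUPP >= 0, ]
  group  <- ifelse(men$CITIZEN %in% c(4, 5), "IMMIGRANT", "NATIVE")
  nowork <- as.numeric(men$WKSWORK1 == 0)
  cells  <- list(men$YEAR, group)
  # Weighted share of non-workers within each year-by-nativity cell.
  num <- tapply(men$WTSUPP * nowork, cells, sum)
  den <- tapply(men$WTSUPP, cells, sum)
  pct <- round(100 * num / den, 2)
  data.frame(YEAR      = as.integer(rownames(pct)),
             IMMIGRANT = as.vector(pct[, "IMMIGRANT"]),
             NATIVE    = as.vector(pct[, "NATIVE"]))
}

# Tiny made-up example (not CPS data) to show the shape of the result.
toy <- data.frame(YEAR     = c(2015, 2015, 2015, 2015),
                  AGE      = c(30, 40, 35, 50),
                  SEX      = c(1, 1, 1, 1),
                  WTSUPP   = c(100, 300, 200, 200),
                  CITIZEN  = c(1, 1, 5, 4),
                  WKSWORK1 = c(0, 52, 0, 40))
print(pct_not_working(toy))
```

The key design point is that the percentages are weighted by WTSUPP rather than being simple record counts, which is why the record totals in the log cannot be turned into the table directly.
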
Below is George Borjas' graph from his post, followed by the first graph generated by the program. As can be seen, the program does appear to replicate the numbers in Borjas' graph exactly. Also, the graphs show an interesting pattern. For native men, the percentage not working for the entire year was about 8 percent from 1994 to 2001. Then, following the collapse of the tech bubble, the percentage increased to about 10 percent in 2004 to 2007. Then, following the financial crisis, the percentage skyrocketed to nearly 15 percent in 2012 but has since fallen back to about 13 percent.

The percentages for immigrant men have been somewhat more steady. They fell from about 11 percent in 1994 to about 8 percent in 1998 and continued there through 2002. They did rise briefly to about 9 percent in 2003 and 2004 but fell back to about 8 percent from 2005 through 2009. Following the financial crisis, they rose to over 11 percent in 2011 but have fallen back to about 9 percent since then.

[Graph: George Borjas' graph from https://gborjas.org/2016/12/26/where-did-native-working-men-go/]
[Graph: Non-working Prime-age Men in the United States: 1994-2016]
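
For readers who only want to redraw the graph, the table printed earlier is enough. The sketch below re-enters those published percentages and plots them with base R graphics; the colors and axis labels are my own choices, not necessarily those used by nowork1ed.R.

```r
# Percentages copied from the program output shown earlier in this article.
nowork <- data.frame(
  YEAR = 1994:2016,
  IMMIGRANT = c(11.07, 10.37, 9.71, 9.37, 8.14, 7.84, 7.47, 8.11, 8.00, 9.10,
                9.30, 8.33, 7.73, 7.69, 8.14, 8.03, 10.68, 11.27, 10.50, 9.76,
                8.53, 9.76, 9.33),
  NATIVE = c(8.20, 7.92, 8.16, 7.71, 7.48, 7.82, 7.64, 7.74, 8.58, 9.23,
             10.36, 10.40, 10.08, 9.89, 10.35, 11.26, 13.11, 14.40, 14.69, 14.04,
             14.30, 13.72, 12.99)
)

# One line per group; colors and labels are illustrative choices.
matplot(nowork$YEAR, as.matrix(nowork[, c("NATIVE", "IMMIGRANT")]),
        type = "l", lty = 1, lwd = 2, col = c("blue", "red"),
        xlab = "Year", ylab = "Percent not working during the year",
        main = "Non-working Prime-age Men in the United States: 1994-2016")
legend("topleft", legend = c("Native", "Immigrant"),
       col = c("blue", "red"), lty = 1, lwd = 2)
```
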

The rest of the program divides this group of men aged 25-54 into six groups according to their education, specifically the highest degree that they received. The six groups are as follows:

  1. With no high school diploma or equivalent
  2. With a high school diploma or equivalent
  3. With an associate's degree
  4. With a bachelor's degree
  5. With a master's degree or professional school degree
  6. With a doctorate degree
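
The six groups above might be derived from the EDUC variable along the following lines. This is a sketch under assumptions: the function name degree_group is hypothetical, and the cut points (73 for a high school diploma, 91 and 92 for associate's degrees, 111 for a bachelor's, 123 for a master's, 124 for a professional degree, 125 for a doctorate) are my reading of the IPUMS-CPS EDUC codes, so they should be checked against the codebook and against nowork1ed.R itself.

```r
# Sketch only: map IPUMS-CPS EDUC codes to the six highest-degree groups.
# Assumed cut points (verify against the IPUMS codebook):
#   < 73 no diploma, 73-90 high school (includes some college, no degree),
#   91-110 associate's, 111-122 bachelor's, 123-124 master's or professional,
#   125 doctorate.
degree_group <- function(educ) {
  labels <- c("None", "High School", "Associate", "Bachelor", "Master", "Doctorate")
  # findInterval() counts how many cut points are <= each EDUC code.
  idx <- findInterval(educ, c(73, 91, 111, 123, 125)) + 1
  factor(labels[idx], levels = labels)
}

print(table(degree_group(c(2, 60, 73, 81, 92, 111, 123, 124, 125))))
```

Note that under this mapping, men with some college but no degree fall into the high school group, since the grouping is by highest degree received.
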
Pressing enter will output the numbers and graph for each of these groups. Hence, six successive enters will output the following six graphs:

[Graph: Non-working men in the United States (aged 25-54 with no high school diploma): 1994-2016]
[Graph: Non-working men in the United States (aged 25-54 with a high school diploma): 1994-2016]
[Graph: Non-working men in the United States (aged 25-54 with an associate's degree): 1994-2016]
[Graph: Non-working men in the United States (aged 25-54 with a bachelor's degree): 1994-2016]
[Graph: Non-working men in the United States (aged 25-54 with a master's degree): 1994-2016]
[Graph: Non-working men in the United States (aged 25-54 with a doctorate degree): 1994-2016]

As can be seen from the first graph, the percentage of non-working men is much higher for natives than for immigrants among those without a high school diploma. The second graph shows that the percentages have likewise been somewhat larger for natives since about 2004 among those with just a high school diploma. For associate's degrees, the percentages are similar, and for bachelor's, master's, and doctorate degrees, the percentages for natives are actually a bit lower than for immigrants. Also, most of the native subgroups seem to follow a pattern very similar to that of the entire native group, only at different levels. There is a pre-tech-crash level that goes from 1994 to 2001, a post-tech-crash level that goes from about 2004 to 2007 or 2008, a post-financial crisis peak that is reached in 2012 or 2013, and then a current recovery low reached in 2016. This can be better seen in the first of the following two graphs, which shows the percentages for all of the native subgroups in one graph:

[Graph: Non-working native men in the United States (aged 25-54, by highest degree earned): 1994-2016]
[Graph: Non-working immigrant men in the United States (aged 25-54, by highest degree earned): 1994-2016]

As can be seen, the percentages are by far the highest for native men without a high school diploma and generally get lower and lower with each higher level of degree. Interestingly, this is much less the case for immigrant men, shown in the last graph above. Some of this may be due to the greater volatility of the percentages, likely due to the smaller sample sizes for immigrants. Still, their percentages range only up to about 14 percent whereas the native percentages range up to about 40 percent. Again, this difference is mainly due to high percentages for native men without a high school diploma and, since about 2004, native men with a high school diploma.

A Look At Non-working Prime-age Men Who Are Not Disabled or Retired

Someone suggested to me that many of the prime-age men who are not working may be on disability. To check this, I looked at the variable EMPSTAT whose codes are given at this link. As can be seen, this variable distinguishes three reasons for a person not being in the labor force: 1) unable to work, 2) retired, and 3) other. The R program at http://econdataus.com/nowork1ed.R was modified to filter out men who were unable to work or retired, creating the program at http://econdataus.com/nowork1edaw.R. These categories are in the CPS data only from 1995 on, so the year 1994 was also dropped from the data. Following is the start of the resulting output if this new program is placed in the same directory as cps16.csv and run:

> source("nowork1edaw.R")
[1] "READING cps16.csv"
[1] "4268777  Initial number of records"
[1] "4117834  1995 to present"
[1] "4117834  the United States"
[1] "1734411  AGE 25 to 54"
[1] "830818  MEN"
[1] "830818  SUPPLEMENT WEIGHT >= 0"
[1] "824713  NOT RETIRED"
[1] "790226  ABLE TO WORK"
   YEAR IMMIGRANT NATIVE
1  1995      7.90   3.94
2  1996      7.37   4.09
3  1997      7.53   3.43
4  1998      6.57   3.28
5  1999      5.93   3.20
6  2000      5.41   3.21
7  2001      6.11   2.99
8  2002      6.08   3.74
9  2003      6.83   4.41
10 2004      7.25   4.77
11 2005      6.34   4.84
12 2006      5.79   4.66
13 2007      5.72   4.64
14 2008      6.05   4.98
15 2009      6.17   5.57
16 2010      8.62   7.75
17 2011      9.30   8.72
18 2012      8.70   8.70
19 2013      8.02   8.28
20 2014      6.77   7.96
21 2015      7.84   7.42
22 2016      7.30   7.02
Press enter to continue, escape to exit
The output above suggests that there are many more prime-age men who are unable to work than are retired. It shows that there are 34,487 (824,713 - 790,226) records for prime-age men who are unable to work but only 6,105 (830,818 - 824,713) records for prime-age men who are retired.
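
The two extra filters and the record counts quoted above can be sketched as follows. The function name drop_retired_unable is hypothetical, and the EMPSTAT codes used (36 for retired, 32 for unable to work) are my reading of the IPUMS-CPS codebook rather than something confirmed from nowork1edaw.R.

```r
# Sketch only. Assumed IPUMS-CPS EMPSTAT codes (verify against the codebook):
#   32 = not in labor force, unable to work; 36 = not in labor force, retired.
drop_retired_unable <- function(men) {
  men <- men[men$YEAR >= 1995, ]       # these categories start in 1995
  men <- men[men$EMPSTAT != 36, ]
  print(paste(nrow(men), " NOT RETIRED"))
  men <- men[men$EMPSTAT != 32, ]
  print(paste(nrow(men), " ABLE TO WORK"))
  men
}

# Record counts implied by the log above.
retired <- 830818 - 824713   # dropped by the NOT RETIRED filter
unable  <- 824713 - 790226   # dropped by the ABLE TO WORK filter
print(c(retired = retired, unable = unable))
```

These are unweighted record counts, so they indicate relative sample sizes rather than population totals.
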

Below is the first graph, which replicates George Borjas' graph, followed by the first graph generated by this modified program. As can be seen, omitting men who are unable to work or are retired greatly lowers the percentage of non-working prime-age men. In addition, it changes the relationship between non-working native and immigrant men. When such groups are included, the percentage of non-working native men has been greater than that of non-working immigrants since 2004. When they are excluded, however, the percentages of non-working native men are lower than those of non-working immigrants through 2009 and have been fairly close to the same since then.

[Graph: Non-working Prime-age Men in the United States: 1994-2016 (original program)]
[Graph: Non-working Prime-age Men in the United States, excluding men who are unable to work or retired (modified program)]

The following graphs show the percentages for all of the native subgroups using the old and new programs.

[Graph: Non-working native men in the United States (aged 25-54, by highest degree earned), original program]
[Graph: Non-working native men in the United States (aged 25-54, not disabled or retired, by highest degree earned), modified program]

The tables that correspond to these two graphs are below. As can be seen, the percentages of non-working native men are lower when disabled and retired men are excluded. However, the percentages still follow similar patterns. There is a pre-tech-crash level that goes from 1994 (1995 in the modified program) to 2001, a post-tech-crash level that goes from about 2004 to 2007 or 2008, a post-financial crisis peak that is reached in 2012 or 2013, and then a current recovery low reached in 2016. Also, the percentages are by far the highest for native men without a high school diploma and generally get lower and lower with each higher level of degree.

[1] "Non-working men in the United States"                          |  [1] "Non-working native men in the United States"
[1] "(Native men aged 25-54, by highest degree earned)"             |  [1] "(aged 25-54, not disabled or retired, by highest degree earned)"
[1] ""                                                              |  [1] ""
   year     None High School Associate Bachelor   Master Doctorate  |     year      None High School Associate Bachelor   Master Doctorate
1  1994 23.93476    8.158448  3.896971 3.640087 2.369354 1.9878545  |
2  1995 25.79986    7.659562  4.388986 3.348897 1.890308 1.1966235  |  1  1995 11.048307    3.969391  2.591929 2.442985 1.436833 1.0644856
3  1996 24.09733    8.246367  4.332988 3.522857 2.458127 1.1240355  |  2  1996 10.431690    4.329267  2.700989 2.311037 1.558790 0.7813831
4  1997 22.17640    7.834706  4.719395 3.003711 2.475704 3.3732589  |  3  1997  9.500403    3.396144  2.232627 1.985542 1.500015 2.1844244
5  1998 22.50329    7.733667  5.044659 2.598489 1.794520 1.7700976  |  4  1998  9.326256    3.307981  2.557800 1.714335 1.358966 1.4345468
6  1999 23.42595    7.751093  5.971421 3.677120 2.320554 5.5305467  |  5  1999  8.912069    3.072446  2.651000 2.165487 1.379175 3.2346097
7  2000 24.64538    8.107736  4.301428 2.826800 2.856068 1.5169964  |  6  2000  9.598931    3.447551  1.519486 1.653959 2.091263 0.6489769
8  2001 23.33722    8.302969  4.219798 3.444472 2.983891 3.4924304  |  7  2001  7.640450    3.103152  1.774484 2.114655 1.845905 2.0178369
9  2002 26.10195    9.170032  4.961840 3.600853 2.761848 3.8744776  |  8  2002 11.031562    3.827119  2.713626 2.213669 1.753767 2.0599418
10 2003 24.90467   10.135294  6.224749 3.974275 3.665574 1.8380122  |  9  2003  9.788979    5.053351  2.779903 2.561775 2.420605 0.8798000
11 2004 27.98283   11.385504  6.673425 4.913583 4.202391 5.6700937  |  10 2004 10.492163    5.520809  3.034578 2.895233 2.500184 2.0146012
12 2005 26.80580   11.764377  6.361591 4.756703 4.028958 1.7104130  |  11 2005 10.508144    5.561023  3.005369 3.118164 2.270327 1.3519057
13 2006 27.53624   11.223668  6.524083 4.435543 2.895372 2.4916975  |  12 2006 12.228161    5.237080  3.550063 2.469026 1.586251 0.8196643
14 2007 27.75558   11.214378  6.097780 3.827268 2.960263 5.0618161  |  13 2007 11.827666    5.319175  3.099831 2.381781 1.892904 4.0588827
15 2008 29.30280   11.571590  6.461713 4.605218 3.157504 2.9787060  |  14 2008 13.041878    5.603615  3.355218 2.878247 2.146078 0.9251455
16 2009 31.72768   12.788053  7.043775 4.379115 3.644760 3.6361968  |  15 2009 14.408642    6.527108  3.514557 2.986322 1.926392 1.7155335
17 2010 34.31836   15.274047  8.440003 5.912770 4.336778 0.7922547  |  16 2010 18.588905    9.290675  5.170669 4.289186 2.838599 0.4186806
18 2011 36.63066   17.091594 10.024771 6.340361 5.108762 5.2275024  |  17 2011 21.305855   10.570562  5.892957 4.527796 3.757421 3.6489758
19 2012 37.02843   17.216450 10.482695 6.649542 6.350413 4.3658314  |  18 2012 19.545043   10.556135  5.979951 4.920452 4.254250 2.8845053
20 2013 39.52832   16.409264  9.349281 6.062037 5.551721 4.9366017  |  19 2013 24.655405    9.570819  5.281138 4.509749 3.731786 3.1907374
21 2014 38.55372   16.939782 10.255942 6.869011 4.722501 4.5033449  |  20 2014 21.790390    9.314117  5.831258 4.948481 2.925943 2.6825799
22 2015 36.66128   16.758037  8.971155 6.500578 4.278258 4.4359567  |  21 2015 18.560155    9.043120  5.460408 4.398864 2.909817 3.2347204
23 2016 33.64799   16.273820  9.122731 5.846723 4.537243 3.2254183  |  22 2016 15.781907    8.780200  5.316584 4.116720 3.387643 1.9143537
Press enter to continue, escape to exit                                Press enter to continue, escape to exit
The following graphs show the percentages for all of the immigrant subgroups using the old and new programs.

[Graph: Non-working immigrant men in the United States (aged 25-54, by highest degree earned), original program]
[Graph: Non-working immigrant men in the United States (aged 25-54, not disabled or retired, by highest degree earned), modified program]

The tables that correspond to these two graphs are below. As can be seen, any changes following the tech-crash or financial crisis are less discernible for immigrants. The overall levels seem slightly lower when disabled and retired men are excluded, topping out in 2011 at just above 10 rather than 12 percent. Likewise, the percentages tend to be lower for immigrant men with each higher level of degree but this is likewise less discernible.

[1] "Non-working men in the United States"                            |  [1] "Non-working immigrant men in the United States"
[1] "(Immigrant men aged 25-54, by highest degree earned)"            |  [1] "(aged 25-54, not disabled or retired, by highest degree earned)"
[1] ""                                                                |  [1] ""
   year      None High School Associate  Bachelor   Master Doctorate  |     year     None High School Associate Bachelor   Master Doctorate
1  1994 14.033034   11.493449  8.201668  7.587995 6.557907  7.584898  |
2  1995 10.820140   12.052305  8.789175  9.027794 7.523659  1.013484  |  1  1995 7.330412    9.267499  7.693659 7.927455 6.733559  1.013484
3  1996 11.932034    9.526647  9.359483  8.814626 4.581276  6.144976  |  2  1996 8.059229    7.321660  9.287818 7.743603 4.249432  4.223218
4  1997 10.956381    9.813705  6.174171  6.984775 9.539714  3.012604  |  3  1997 7.925817    8.307901  6.209094 5.661433 8.527706  3.012604
5  1998  8.714946    7.996812  9.291066  6.974422 9.747802  1.910817  |  4  1998 6.206645    6.445191  6.650490 6.852248 9.166293  1.910817
6  1999  8.531789    8.363127  3.048060  8.867331 4.361156  7.036046  |  5  1999 5.547831    6.545960  3.087448 7.322615 4.161811  5.581273
7  2000  9.169114    7.697235  6.262186  5.433323 5.624510  5.677590  |  6  2000 6.053492    4.959868  6.262186 4.894527 5.683816  4.834512
8  2001  9.758264    7.620222  9.136511  6.750867 7.315921  3.721288  |  7  2001 7.064123    5.393381  7.485772 6.117435 5.603652  2.814083
9  2002  8.617679    8.694351 10.315303  5.793964 6.351902  6.252799  |  8  2002 5.663988    6.987875  7.710801 4.840935 5.455150  6.313112
10 2003  9.840201    9.392804 10.470152  8.537524 7.234353  3.209059  |  9  2003 6.565908    6.976276  6.715102 7.666644 6.711454  3.209059
11 2004 10.604418    8.756340 10.372942  9.079274 7.497957  4.564734  |  10 2004 7.900639    6.390199  8.296535 8.037116 6.839079  3.820686
12 2005  8.064483    9.326450  5.301334  8.456405 7.884339  2.868461  |  11 2005 5.687213    6.949563  3.397037 7.274183 6.914273  2.868461
13 2006  8.590182    7.471582  4.605281  7.873194 7.227543  6.732683  |  12 2006 6.074488    5.273809  2.378777 6.913610 6.270162  6.732683
14 2007  7.757602    8.479059  9.103436  7.529590 4.676437  5.026372  |  13 2007 5.310396    6.026410  7.685323 6.271134 4.276588  5.026372
15 2008  8.312823    9.350206  8.213938  6.774803 6.344314  4.767132  |  14 2008 5.408784    6.726654  7.565692 5.693464 6.053311  3.874791
16 2009  8.276154    9.178335  8.483715  6.876599 5.042763  6.877191  |  15 2009 5.564011    7.375030  7.233666 5.889929 3.897021  5.550368
17 2010 12.175131   11.198450 10.943474  9.779326 6.959284  3.792619  |  16 2010 9.293035    9.037258  9.945069 8.138775 6.512681  2.562861
18 2011 12.420359   12.193465 10.033817 10.500795 7.358617  4.638000  |  17 2011 9.504431   10.255605  7.508032 9.686682 6.434427  3.376363
19 2012 10.866810   11.412889 10.593446 10.068147 7.693326  6.525689  |  18 2012 8.566408    9.258888  8.750570 9.214269 6.767762  6.442759
20 2013 10.560259   10.835212  9.226796  9.736871 6.290508  2.255934  |  19 2013 7.618328    8.959526  8.066934 9.174862 5.754231  2.255934
21 2014  9.348807   10.006865  6.818387  6.860362 6.360493  2.969372  |  20 2014 6.665344    8.048224  6.094223 5.654849 6.006511  3.014964
22 2015  9.956958   10.806339  7.066466  9.217391 9.067074  5.380317  |  21 2015 6.727441    8.639447  5.722436 8.697853 8.430892  5.380317
23 2016  8.188258   11.252606 11.346355  8.282496 6.654622  9.846073  |  22 2016 5.388443    8.808517  8.803258 7.224750 5.967010  9.367113
Press enter to continue, escape to exit                                  Press enter to continue, escape to exit
In summary, the 2.5 million number in George Borjas' question of "Where Did 2.5 Million Native Working Men Go?" is based on a rise of 5 percentage points, from 8 percent to 13 percent, in a population of about 50 million prime-age native men. Excluding those men who are unable to work (presumably on disability) or retired reduces the rise to 3 percentage points, from 4 to 7 percent. Three percent of 50 million is 1.5 million. Hence, it would seem that about a million of the 2.5 million prime-age native men became disabled or retired (chiefly the former) and the other 1.5 million includes the unemployed and those who disappeared from the labor force with the reason of "other". I would presume that this latter group is chiefly men who were unable to find work but dropped off the unemployment rolls, having used up all of their allowed unemployment benefits.
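
The arithmetic in this summary can be checked in a few lines of R. The inputs are the rounded figures quoted above (about 50 million prime-age native men and the approximate start and end percentages), so the results are rough estimates rather than exact counts.

```r
# Rounded inputs taken from the summary paragraph above.
native_men <- 50e6                 # approximate number of prime-age native men
rise_all   <- (13 - 8) / 100       # all non-workers: 8 percent to 13 percent
rise_excl  <- (7 - 4) / 100        # excluding unable to work or retired: 4 to 7 percent

total_rise      <- native_men * rise_all      # about 2.5 million
other_rise      <- native_men * rise_excl     # about 1.5 million
disabled_retire <- total_rise - other_rise    # about 1.0 million

print(c(total = total_rise, other = other_rise, disabled_or_retired = disabled_retire))
```
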