Analysis of "STEM Workers, H-1B Visas, and Productivity in US Cities"

Part 1

15. Replication of Data in PSS_Data.dta

As mentioned in the previous sections, all of the prior analysis is based on data from the file PSS_Data.dta. This is included in the supplemental data that can be downloaded from the Journal of Labor Economics website. PSS_Data.dta contains 176 variables as shown in the following listing:

> pss_data <- read.dta("PSS_Data.dta")
> names(pss_data)
  [1] "year"                      "metarea"                   "nat_coll_wkwage"           "nat_coll_emp"              "nat_nocoll_wkwage"
  [6] "nat_nocoll_emp"            "nat_hs_wkwage"             "nat_hs_emp"                "nat_nohs_wkwage"           "nat_nohs_emp"
 [11] "nat_stemO4_wkwage"         "nat_stemO4_emp"            "nat_nonstemO4_wkwage"      "nat_nonstemO4_emp"         "nat_coll_stemO4_wkwage"
 [16] "nat_coll_stemO4_emp"       "nat_coll_nonstemO4_wkwage" "nat_coll_nonstemO4_emp"    "nat_stemO8_wkwage"         "nat_stemO8_emp"
 [21] "nat_nonstemO8_wkwage"      "nat_nonstemO8_emp"         "nat_coll_stemO8_wkwage"    "nat_coll_stemO8_emp"       "nat_coll_nonstemO8_wkwage"
 [26] "nat_coll_nonstemO8_emp"    "nat_stemM4_wkwage"         "nat_stemM4_emp"            "nat_nonstemM4_wkwage"      "nat_nonstemM4_emp"
 [31] "nat_coll_stemM4_wkwage"    "nat_coll_stemM4_emp"       "nat_coll_nonstemM4_wkwage" "nat_coll_nonstemM4_emp"    "nat_stemM8_wkwage"
 [36] "nat_stemM8_emp"            "nat_nonstemM8_wkwage"      "nat_nonstemM8_emp"         "nat_coll_stemM8_wkwage"    "nat_coll_stemM8_emp"
 [41] "nat_coll_nonstemM8_wkwage" "nat_coll_nonstemM8_emp"    "nat_sector1"               "nat_sector2"               "nat_sector3"
 [46] "nat_sector4"               "nat_sector5"               "nat_sector6"               "nat_sector7"               "nat_sector8"
 [51] "nat_sector9"               "nat_sector10"              "nat_sector11"              "nat_sector12"              "nat_sector13"
 [56] "nat_coll_sector1"          "nat_coll_sector2"          "nat_coll_sector3"          "nat_coll_sector4"          "nat_coll_sector5"
 [61] "nat_coll_sector6"          "nat_coll_sector7"          "nat_coll_sector8"          "nat_coll_sector9"          "nat_coll_sector10"
 [66] "nat_coll_sector11"         "nat_coll_sector12"         "nat_coll_sector13"         "nat_nocoll_avgrentpr"      "nat_nocoll_medrentpr"
 [71] "nat_coll_avgrentpr"        "nat_coll_medrentpr"        "imm_stemO4"                "imm_nonstemO4"             "imm_coll_stemO4"
 [76] "imm_coll_nonstemO4"        "imm_stemO8"                "imm_nonstemO8"             "imm_coll_stemO8"           "imm_coll_nonstemO8"
 [81] "imm_stemM4"                "imm_nonstemM4"             "imm_coll_stemM4"           "imm_coll_nonstemM4"        "imm_stemM8"
 [86] "imm_nonstemM8"             "imm_coll_stemM8"           "imm_coll_nonstemM8"        "imm_coll"                  "imm_nocoll"
 [91] "imm_hs"                    "imm_nohs"                  "tot_stemO4"                "tot_stemO8"                "tot_stemM4"
 [96] "tot_stemM8"                "imm"                       "nat"                       "imm_noi"                   "imm_stemO4_noi"
[101] "imm_nonstemO4_noi"         "imm_coll_stemO4_noi"       "imm_coll_nonstemO4_noi"    "imm_stemO8_noi"            "imm_nonstemO8_noi"
[106] "imm_coll_stemO8_noi"       "imm_coll_nonstemO8_noi"    "imm_stemM4_noi"            "imm_nonstemM4_noi"         "imm_coll_stemM4_noi"
[111] "imm_coll_nonstemM4_noi"    "imm_stemM8_noi"            "imm_nonstemM8_noi"         "imm_coll_stemM8_noi"       "imm_coll_nonstemM8_noi"
[116] "labforce"                  "labforce_noi"              "popwt"                     "indian"                    "mexican"
[121] "pred_coll_wkwage"          "pred_nocoll_wkwage"        "pred_wkwage"               "pred_coll_emp"             "pred_nocoll_emp"
[126] "pred_emp"                  "imm_stemO4_H1B_hat80"      "imm_coll_stemO4_H1B_hat80" "imm_stemO4_H1B_hat70"      "imm_stemO8_H1B_hat80"
[131] "imm_coll_stemO8_H1B_hat80" "imm_stemO8_H1B_hat70"      "imm_stemM4_H1B_hat80"      "imm_coll_stemM4_H1B_hat80" "imm_stemM4_H1B_hat70"
[136] "imm_stemM8_H1B_hat80"      "imm_coll_stemM8_H1B_hat80" "imm_stemM8_H1B_hat70"      "imm_stemO4_H1BL1_hat80"    "imm_stemO4_noi_H1B_hat80"
[141] "imm_stemO4_false1_hat80"   "imm_stemO4_false2_hat80"   "imm_stemO8_H1BL1_hat80"    "imm_stemO8_noi_H1B_hat80"  "imm_stemO8_false1_hat80"
[146] "imm_stemO8_false2_hat80"   "imm_stemM4_H1BL1_hat80"    "imm_stemM4_noi_H1B_hat80"  "imm_stemM4_false1_hat80"   "imm_stemM4_false2_hat80"
[151] "imm_stemM8_H1BL1_hat80"    "imm_stemM8_noi_H1B_hat80"  "imm_stemM8_false1_hat80"   "imm_stemM8_false2_hat80"   "imm_hat80"
[156] "mexican_hat80"             "indian_hat80"              "nat_coll_hat80"            "nat_nocoll_hat80"          "imm_coll_hat80"
[161] "imm_nocoll_hat80"          "imm_coll_noi_hat80"        "imm_nocoll_noi_hat80"      "labforce_hat80"            "labforce_noi_hat80"
[166] "imm_stemO4_H1Bagg_hat80"   "imm_stemO8_H1Bagg_hat80"   "imm_stemM4_H1Bagg_hat80"   "imm_stemM8_H1Bagg_hat80"   "imm_manualO8_false3_hat80"
[171] "imm_manualO4_false3_hat80" "statefip"                  "obs1980"                   "obs1970"                   "panel1980"
[176] "panel1970"
Among these are the six key dependent variables and the one key independent variable used in the analysis. They are listed here with their column positions:
  [3] "nat_coll_wkwage"
  [4] "nat_coll_emp"
  [5] "nat_nocoll_wkwage"
  [6] "nat_nocoll_emp"
 [11] "nat_stemO4_wkwage"
 [12] "nat_stemO4_emp"
 [73] "imm_stemO4"
Page S230 of the study describes the source for these seven variables as follows:

Our data on the occupations, employment, wages, age, and education of individuals come from the Ruggles et al. (2010) Integrated Public Use Microdata Series (IPUMS) 5% census files for 1980, 1990, and 2000; the 1% ACS sample for 2005; and the 2008–10 3% merged ACS sample for 2010. We use data only on 219 MSAs consistently identified from 1980 through 2010.

To replicate these seven variables from the original IPUMS data, extracts were selected and downloaded from the IPUMS USA website for the following samples:

Sample         Density
-------------  -------
1980 5% state   5.0%
1990 5%         5.0%
2000 5%         5.0%
2005 ACS        1.0%
2010 ACS 3yr    3.0%
For each sample, the following variables were extracted:
Type^ Variable              Label
----  --------              -----
H     YEAR                  Census year
H     DATANUM               Data set number
H     SERIAL                Household serial number
H     HHWT                  Household weight
H     STATEFIP              State (FIPS code)
H     METAREA (general)     Metropolitan area [general version]
H     METAREAD (detailed)   Metropolitan area [detailed version]
H     GQ                    Group quarters status
P     PERNUM                Person number in sample unit
P     PERWT                 Person weight
P     AGE                   Age
P     BPL (general)         Birthplace [general version]
P     BPLD (detailed)       Birthplace [detailed version]
P     CITIZEN               Citizenship status
P     EDUC (general)        Educational attainment [general version]
P     EDUCD (detailed)      Educational attainment [detailed version]
P     EMPSTAT (general)     Employment status [general version]
P     EMPSTATD (detailed)   Employment status [detailed version]
P     OCC                   Occupation
P     OCC1990               Occupation, 1990 basis
P     IND                   Industry
P     IND1990               Industry, 1990 basis
P     CLASSWKR (general)    Class of worker [general version]
P     CLASSWKRD (detailed)  Class of worker [detailed version]
P     WKSWORK1              Weeks worked last year*
P     WKSWORK2              Weeks worked last year, intervalled
P     INCWAGE               Wage and salary income

* WKSWORK1 not available from 2008 on
^ H=Household, P=Person
Based on the study and its appendix, the R program jole1data_rep.R below uses this data to replicate the seven variables in PSS_Data.dta and create the file PSS_Data_rep.csv. The program generates the following output:
> source("jole1data_rep.R")
[1] "START READ OF jole_80.dta"
[1] "PROCESS jole_80.dta"
[1] "11343120  Initial"
[1] "6940893  Age 18-65"
[1] "5331242  Worked 1 or more weeks"
[1] "5185976  Non-institutional"
[1] "############################################################"
[1] "1980  INCLUDE INCWAGE==0, EXCLUDE INCWAGE==999999"
[1] "############################################################"
[1] "5185976  Valid incwage"
[1] "############################################################"
[1] "1980  CONVERT TO TIME_CONSISTENT OCCUPATIONS"
[1] "############################################################"
[1] "5135244  Occupation not Military, Unemployed or Unknown"
[1] "5130925  Occupation not 167, 192, 304, or 308"
[1] "############################################################"
[1] "1980  IND1990 > 0 & IND1990 <= 244"
[1] "############################################################"
[1] "5130925  IND1990 - not N/A, worked since 1984 and responded"
[1] "5111349  Birthplace not Abroad, At sea, Other or Missing"
[1] "3469260  Metareas (219)"
[1] "Change 1980 to 2010 dollars"
[1] "Change 1980 to 2010 dollars"
[1] "3020  Size of aa"
[1] "START READ OF jole_90.dta"
[1] "PROCESS jole_90.dta"
[1] "12501046  Initial"
[1] "7707006  Age 18-65"
[1] "6218598  Worked 1 or more weeks"
[1] "6086603  Non-institutional"
[1] "############################################################"
[1] "1990  INCLUDE INCWAGE==0, EXCLUDE INCWAGE==999999"
[1] "############################################################"
[1] "6086603  Valid incwage"
[1] "############################################################"
[1] "1990  CONVERT TO TIME_CONSISTENT OCCUPATIONS"
[1] "############################################################"
[1] "6035019  Occupation not Military, Unemployed or Unknown"
[1] "6029261  Occupation not 167, 192, 304, or 308"
[1] "############################################################"
[1] "1990  IND1990 > 0 & IND1990 <= 244"
[1] "############################################################"
[1] "6029261  IND1990 - not N/A, worked since 1984 and responded"
[1] "6009872  Birthplace not Abroad, At sea, Other or Missing"
[1] "3859417  Metareas (219)"
[1] "Change 1990 to 2010 dollars"
[1] "Change 1990 to 2010 dollars"
[1] "6193  Size of aa"
[1] "START READ OF jole_00.dta"
[1] "PROCESS jole_00.dta"
[1] "14081466  Initial"
[1] "8681911  Age 18-65"
[1] "6979381  Worked 1 or more weeks"
[1] "6805782  Non-institutional"
[1] "############################################################"
[1] "2000  INCLUDE INCWAGE==0, EXCLUDE INCWAGE==999999"
[1] "############################################################"
[1] "6805782  Valid incwage"
[1] "############################################################"
[1] "2000  CONVERT TO TIME_CONSISTENT OCCUPATIONS"
[1] "############################################################"
[1] "6769389  Occupation not Military, Unemployed or Unknown"
[1] "6764427  Occupation not 167, 192, 304, or 308"
[1] "############################################################"
[1] "2000  IND1990 > 0 & IND1990 <= 244"
[1] "############################################################"
[1] "6764427  IND1990 - not N/A, worked since 1984 and responded"
[1] "6764427  Birthplace not Abroad, At sea, Other or Missing"
[1] "4578682  Metareas (219)"
[1] "Change 2000 to 2010 dollars"
[1] "Change 2000 to 2010 dollars"
[1] "9493  Size of aa"
[1] "START READ OF jole_05.dta"
[1] "PROCESS jole_05.dta"
[1] "2878380  Initial"
[1] "1778997  Age 18-65"
[1] "1439462  Worked 1 or more weeks"
[1] "1439462  Non-institutional"
[1] "############################################################"
[1] "2005  INCLUDE INCWAGE==0, EXCLUDE INCWAGE==999999"
[1] "############################################################"
[1] "1439462  Valid incwage"
[1] "############################################################"
[1] "2005  CONVERT TO TIME_CONSISTENT OCCUPATIONS"
[1] "############################################################"
[1] "1431690  Occupation not Military, Unemployed or Unknown"
[1] "1430597  Occupation not 167, 192, 304, or 308"
[1] "############################################################"
[1] "2005  IND1990 > 0 & IND1990 <= 244"
[1] "############################################################"
[1] "1430597  IND1990 - not N/A, worked since 1984 and responded"
[1] "1430428  Birthplace not Abroad, At sea, Other or Missing"
[1] "994735  Metareas (219)"
[1] "Change 2005 to 2010 dollars"
[1] "Change 2005 to 2010 dollars"
[1] "12505  Size of aa"
[1] "START READ OF jole_10.dta"
[1] "PROCESS jole_10.dta"
[1] "Create mm$WKSWORK1"
[1] "9093077  Initial"
[1] "5672423  Age 18-65"
[1] "4438018  Worked 1 or more weeks"
[1] "4349383  Non-institutional"
[1] "############################################################"
[1] "2010  INCLUDE INCWAGE==0, EXCLUDE INCWAGE==999999"
[1] "############################################################"
[1] "4349383  Valid incwage"
[1] "############################################################"
[1] "2010  CONVERT TO TIME_CONSISTENT OCCUPATIONS"
[1] "############################################################"
[1] "4323596  Occupation not Military, Unemployed or Unknown"
[1] "4320654  Occupation not 167, 192, 304, or 308"
[1] "############################################################"
[1] "2010  IND1990 > 0 & IND1990 <= 244"
[1] "############################################################"
[1] "4320654  IND1990 - not N/A, worked since 1984 and responded"
[1] "4320163  Birthplace not Abroad, At sea, Other or Missing"
[1] "3022724  Metareas (219)"
[1] "15699  Size of aa"
[1] "ALL IPUMS FILES READ"
[1] "START OF AGGREGATE AND MERGE INTO FINAL FILES"
[1] "READ AND UPDATE PSS_data0.csv"
[1] "WRITE FINAL FILES PSS_Data_rep.txt and PSS_Data_rep.csv"
>
The output summarizes the filtering applied to the data. For example, the third line shows that the 1980 data initially contains 11,343,120 records, and the fourth line shows that this is reduced to 6,940,893 after keeping only those aged 18 to 65.
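
The filtering cascade reported in the log can be sketched in R as follows. This is a minimal illustration on invented toy records, not the code of jole1data_rep.R. The helper names report and apply_filters are hypothetical, the use of GQ code 3 for institutional group quarters is an assumption about the IPUMS coding, and only the first few filters from the log are shown.

```r
# Toy person records using IPUMS variable names; all values are invented.
persons <- data.frame(
  AGE      = c(17, 25, 40, 66, 30, 50, 22),
  WKSWORK1 = c(10,  0, 52, 30, 26, 52, 40),
  GQ       = c( 1,  1,  1,  1,  3,  1,  1),
  INCWAGE  = c(500, 0, 42000, 9000, 1200, 999999, 0)
)

# Print the remaining row count with a label, mimicking the log format,
# and return the data frame unchanged so that calls can be chained.
report <- function(df, label) {
  print(paste0(nrow(df), "  ", label))
  df
}

# Apply the filters in the order in which the log reports them.
# Assumption: GQ == 3 marks institutional group quarters.
apply_filters <- function(df) {
  df <- report(df, "Initial")
  df <- report(df[df$AGE >= 18 & df$AGE <= 65, ], "Age 18-65")
  df <- report(df[df$WKSWORK1 >= 1, ], "Worked 1 or more weeks")
  df <- report(df[df$GQ != 3, ], "Non-institutional")
  # INCWAGE == 0 is kept; only the N/A code 999999 is excluded.
  df <- report(df[df$INCWAGE != 999999, ], "Valid incwage")
  df
}

kept <- apply_filters(persons)
```

Each call to report prints one line of the form seen in the log, so the successive counts show how many records each filter removes.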

16. Difference in Filtering Between Wages and Employment

The process of replication revealed one interesting item. In order to replicate the data, it was necessary to apply one extra filter to the wages alone. For those variables (the three ending with _wkwage), workers who earned an income of zero had to be filtered out. However, for the employment variables (imm_stemO4 and the three ending with _emp), all workers with valid incomes, including those with an income of zero, are included. Hence, these two sets of variables describe slightly different populations.
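
The effect of the two populations can be illustrated with a small R sketch. The records are invented, and the definitions used here (employment as the sum of PERWT, weekly wage as the PERWT-weighted mean of INCWAGE divided by WKSWORK1) are assumptions made for illustration rather than the exact formulas of the study or of jole1data_rep.R.

```r
# Toy worker records; all values are invented for illustration.
workers <- data.frame(
  PERWT    = c(100, 100, 50, 50),
  INCWAGE  = c(52000, 26000, 0, 0),
  WKSWORK1 = c(52, 52, 20, 10)
)

# Employment: weighted count of all workers with a valid income,
# including those whose income is zero.
employment <- sum(workers$PERWT)

# Weekly wage: drop zero incomes first, then take the weighted mean
# of income per week worked (assumed definition).
paid <- workers[workers$INCWAGE > 0, ]
weekly_wage <- weighted.mean(paid$INCWAGE / paid$WKSWORK1, paid$PERWT)

# For contrast: the same wage measure with zero incomes left in.
weekly_wage_all <- weighted.mean(workers$INCWAGE / workers$WKSWORK1,
                                 workers$PERWT)

print(c(employment = employment,
        weekly_wage = weekly_wage,
        weekly_wage_all = weekly_wage_all))
```

In this toy case the zero-income workers make up a third of weighted employment, and leaving them in lowers the average weekly wage from 750 to 500.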

In any event, jole1data_rep.R successfully replicates all but two instances of the seven variables within a tolerance of one percent. The following output, generated by the R program jole1comp90_1.R below, summarizes the differences and lists these two instances:

[1] "COMPARING PSS_Data.dta to PSS_Data_rep.csv at tolerance 1"
[1] ""
[1] "TOTAL  DIFF  1990  2000  2005  2010    NA  VARIABLE"
[1] "-----  ----  ----  ----  ----  ----  ----  --------"
[1] "  876     0     0     0     0     0     0  nat_stemO4_wkwage"
[1] "  876     0     0     0     0     0     0  nat_coll_wkwage"
[1] "  876     0     0     0     0     0     0  nat_nocoll_wkwage"
[1] "  876     0     0     0     0     0     0  nat_stemO4_emp"
[1] "  876     0     0     0     0     0     0  nat_coll_emp"
[1] "  876     0     0     0     0     0     0  nat_nocoll_emp"
[1] "  876    41     0     1     0     1    39  imm_stemO4"
[1] ""
[1] "COMPARING PSS_Data.dta to PSS_Data_rep.csv at tolerance 1"
[1] ""
[1] "  N  VARIABLE             % DIFF      PSS     CALC  YEAR  METAREA"
[1] "---  -----------------  --------  -------  -------  ----  -------------"
[1] "  1  imm_stemO4             9.63      509      558  2010  Bremerton, WA"
[1] "  2  imm_stemO4           -10.88      239      213  2000  Canton, OH"
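As can be seen, every instance of the seven variables replicated within a tolerance of one percent except for two, both of which differed by about ten percent. The DIFF count of 41 for imm_stemO4 consists of these two instances plus the 39 counted in the NA column.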
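
The % DIFF column is consistent with the difference between the calculated and PSS values expressed as a percentage of the PSS value. The following R sketch reproduces the figures for three rows taken from the listings in this section. The function names pct_diff and outside_tolerance are hypothetical, and treating missing values as differences and using a strict inequality at the tolerance boundary are assumptions about how jole1comp90_1.R works.

```r
# Percent difference of the replicated value relative to the PSS value.
pct_diff <- function(pss, calc) 100 * (calc - pss) / pss

# TRUE where the replicated value is missing or differs by more than
# tol_pct percent (assumed rule).
outside_tolerance <- function(pss, calc, tol_pct) {
  d <- pct_diff(pss, calc)
  is.na(d) | abs(d) > tol_pct
}

# imm_stemO4 for Bremerton 2010, Canton 2000 and Atlanta 1990.
pss  <- c(509, 239, 5404)
calc <- c(558, 213, 5388)

print(round(pct_diff(pss, calc), 2))       # 9.63 -10.88 -0.30
print(outside_tolerance(pss, calc, 1))     # TRUE TRUE FALSE
print(outside_tolerance(pss, calc, 0.1))   # TRUE TRUE TRUE
```

The Atlanta value falls inside the one percent tolerance but outside a tolerance of 0.1 percent.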

Two modified versions of jole1data_rep.R were created to calculate the results if the same filtering were applied to all seven variables. The programs jole1data_all.R and jole1data_pos.R are identical to jole1data_rep.R except that the former filters all seven variables for all valid incomes, including zero, and the latter filters them for just positive valid incomes. The output of those programs is saved in PSS_Data_all.csv and PSS_Data_pos.csv, respectively. The following are two other tables that are generated by jole1comp90_1.R:

[1] "COMPARING PSS_Data.dta to PSS_Data_all.csv at tolerance 1"
[1] ""
[1] "TOTAL  DIFF  1990  2000  2005  2010    NA  VARIABLE"
[1] "-----  ----  ----  ----  ----  ----  ----  --------"
[1] "  876   705   184   184   155   182     0  nat_stemO4_wkwage"
[1] "  876   876   219   219   219   219     0  nat_coll_wkwage"
[1] "  876   876   219   219   219   219     0  nat_nocoll_wkwage"
[1] "  876     0     0     0     0     0     0  nat_stemO4_emp"
[1] "  876     0     0     0     0     0     0  nat_coll_emp"
[1] "  876     0     0     0     0     0     0  nat_nocoll_emp"
[1] "  876    41     0     1     0     1    39  imm_stemO4"
[1] ""
[1] "COMPARING PSS_Data.dta to PSS_Data_pos.csv at tolerance 1"
[1] ""
[1] "TOTAL  DIFF  1990  2000  2005  2010    NA  VARIABLE"
[1] "-----  ----  ----  ----  ----  ----  ----  --------"
[1] "  876     0     0     0     0     0     0  nat_stemO4_wkwage"
[1] "  876     0     0     0     0     0     0  nat_coll_wkwage"
[1] "  876     0     0     0     0     0     0  nat_nocoll_wkwage"
[1] "  876   707   185   184   155   183     0  nat_stemO4_emp"
[1] "  876   876   219   219   219   219     0  nat_coll_emp"
[1] "  876   876   219   219   219   219     0  nat_nocoll_emp"
[1] "  876   293    68    68    47    71    39  imm_stemO4"
As the titles suggest, these tables compare the data in PSS_Data_all.csv and PSS_Data_pos.csv to the original data in PSS_Data.dta. As can be seen, filtering for all incomes causes significant changes to most of the wage data, and filtering for just positive incomes causes significant changes to most of the employment data.

The R program jole1comp90_0z1.R below is identical to jole1comp90_1.R except that it uses a tolerance of 0.1 percent instead of 1 percent. Following is the output:

> source("jole1comp90_0z1.R")
[1] ""
[1] "COMPARING PSS_Data.dta to PSS_Data_rep.csv at tolerance 0.1"
[1] ""
[1] "TOTAL  DIFF  1990  2000  2005  2010    NA  VARIABLE"
[1] "-----  ----  ----  ----  ----  ----  ----  --------"
[1] "  876     5     3     0     0     2     0  nat_stemO4_wkwage"
[1] "  876     7     3     0     4     0     0  nat_coll_wkwage"
[1] "  876     6     1     0     3     2     0  nat_nocoll_wkwage"
[1] "  876    13     8     1     0     4     0  nat_stemO4_emp"
[1] "  876    23    15     0     6     2     0  nat_coll_emp"
[1] "  876    34    20     1     6     7     0  nat_nocoll_emp"
[1] "  876    49     3     2     0     5    39  imm_stemO4"
[1] ""
[1] "COMPARING PSS_Data.dta to PSS_Data_pos.csv at tolerance 0.1"
[1] ""
[1] "TOTAL  DIFF  1990  2000  2005  2010    NA  VARIABLE"
[1] "-----  ----  ----  ----  ----  ----  ----  --------"
[1] "  876     5     3     0     0     2     0  nat_stemO4_wkwage"
[1] "  876     7     3     0     4     0     0  nat_coll_wkwage"
[1] "  876     6     1     0     3     2     0  nat_nocoll_wkwage"
[1] "  876   791   207   208   169   207     0  nat_stemO4_emp"
[1] "  876   876   219   219   219   219     0  nat_coll_emp"
[1] "  876   876   219   219   219   219     0  nat_nocoll_emp"
[1] "  876   358    78    95    53    93    39  imm_stemO4"
[1] ""
[1] "COMPARING PSS_Data.dta to PSS_Data_all.csv at tolerance 0.1"
[1] ""
[1] "TOTAL  DIFF  1990  2000  2005  2010    NA  VARIABLE"
[1] "-----  ----  ----  ----  ----  ----  ----  --------"
[1] "  876   791   207   208   169   207     0  nat_stemO4_wkwage"
[1] "  876   876   219   219   219   219     0  nat_coll_wkwage"
[1] "  876   876   219   219   219   219     0  nat_nocoll_wkwage"
[1] "  876    13     8     1     0     4     0  nat_stemO4_emp"
[1] "  876    23    15     0     6     2     0  nat_coll_emp"
[1] "  876    34    20     1     6     7     0  nat_nocoll_emp"
[1] "  876    49     3     2     0     5    39  imm_stemO4"
[1] ""
[1] "COMPARING PSS_Data.dta to PSS_Data_rep.csv at tolerance 0.1"
[1] ""
[1] "  N  VARIABLE             % DIFF      PSS     CALC  YEAR  METAREA"
[1] "---  -----------------  --------  -------  -------  ----  -------------"
[1] "  1  nat_stemO4_wkwage      0.32     1583     1588  2010  Bremerton, WA"
[1] "  2  nat_stemO4_wkwage      0.12     1169     1171  1990  Honolulu, HI"
[1] "  3  nat_stemO4_wkwage      0.59     1236     1243  1990  New Haven-Meriden, CT"
[1] "  4  nat_stemO4_wkwage      0.34     1043     1046  1990  Savannah, GA"
[1] "  5  nat_stemO4_wkwage     -0.12     1328     1327  2010  Stockton, CA"
[1] "  1  nat_coll_wkwage       -0.37     1140     1135  1990  Benton Harbor, MI"
[1] "  2  nat_coll_wkwage        0.20     1480     1483  2005  Charleston-N.Charleston,SC"
[1] "  3  nat_coll_wkwage        0.11     1254     1256  2005  Honolulu, HI"
[1] "  4  nat_coll_wkwage       -0.16     1478     1475  2005  Melbourne-Titusville-Cocoa-Palm Bay, FL"
[1] "  5  nat_coll_wkwage        0.23     1457     1460  2005  South Bend-Mishawaka, IN"
[1] "  6  nat_coll_wkwage        0.11     2012     2015  1990  Stamford, CT"
[1] "  7  nat_coll_wkwage        0.19     1160     1162  1990  Vineland-Milville-Bridgetown, NJ"
[1] "  1  nat_nocoll_wkwage      0.15      908      909  2010  Anchorage, AK"
[1] "  2  nat_nocoll_wkwage      0.12      578      578  1990  Anniston, AL"
[1] "  3  nat_nocoll_wkwage      0.22      753      755  2005  Bremerton, WA"
[1] "  4  nat_nocoll_wkwage      0.25      836      838  2005  Honolulu, HI"
[1] "  5  nat_nocoll_wkwage      0.12      726      727  2010  Honolulu, HI"
[1] "  6  nat_nocoll_wkwage      0.15      668      669  2005  Salt Lake City-Ogden, UT"
[1] "  1  nat_stemO4_emp        -0.70     6997     6948  2010  Bremerton, WA"
[1] "  2  nat_stemO4_emp         0.35     7422     7448  2000  Canton, OH"
[1] "  3  nat_stemO4_emp        -0.14    12441    12423  1990  Colorado Springs, CO"
[1] "  4  nat_stemO4_emp        -0.26    14899    14861  1990  Fort Lauderdale-Hollywood-Pompano Beach, FL"
[1] "  5  nat_stemO4_emp        -0.15    13113    13093  1990  Honolulu, HI"
[1] "  6  nat_stemO4_emp        -0.19    18968    18932  2010  Melbourne-Titusville-Cocoa-Palm Bay, FL"
[1] "  7  nat_stemO4_emp        -0.24    12286    12257  1990  Miami-Hialeah, FL"
[1] "  8  nat_stemO4_emp        -0.73     4242     4211  1990  New Haven-Meriden, CT"
[1] "  9  nat_stemO4_emp        -0.10    23140    23116  1990  Orlando, FL"
[1] " 10  nat_stemO4_emp        -0.14    37084    37031  2010  Riverside-San Bernardino,CA"
[1] " 11  nat_stemO4_emp        -0.75     2678     2658  1990  Savannah, GA"
[1] " 12  nat_stemO4_emp        -0.34     6680     6657  2010  Stockton, CA"
[1] " 13  nat_stemO4_emp        -0.17     6641     6630  1990  York, PA"
[1] "  1  nat_coll_emp          -0.13    34162    34117  1990  Augusta-Aiken, GA-SC"
[1] "  2  nat_coll_emp          -0.28    13501    13463  1990  Benton Harbor, MI"
[1] "  3  nat_coll_emp          -0.11    27731    27700  1990  Bridgeport, CT"
[1] "  4  nat_coll_emp          -0.16    40926    40861  2005  Canton, OH"
[1] "  5  nat_coll_emp          -0.24    76420    76233  2005  Charleston-N.Charleston,SC"
[1] "  6  nat_coll_emp          -0.10    34868    34832  1990  El Paso, TX"
[1] "  7  nat_coll_emp          -0.15    44288    44222  2010  Eugene-Springfield, OR"
[1] "  8  nat_coll_emp          -0.11    16812    16794  1990  Fayetteville, NC"
[1] "  9  nat_coll_emp          -0.20   111912   111688  2005  Honolulu, HI"
[1] " 10  nat_coll_emp          -0.16   120107   119909  2010  Honolulu, HI"
[1] " 11  nat_coll_emp          -0.13    17118    17095  1990  McAllen-Edinburg-Pharr-Mission, TX"
[1] " 12  nat_coll_emp          -0.15    57438    57353  2005  Melbourne-Titusville-Cocoa-Palm Bay, FL"
[1] " 13  nat_coll_emp          -0.14   114717   114558  1990  Miami-Hialeah, FL"
[1] " 14  nat_coll_emp          -0.19    26578    26528  1990  New Haven-Meriden, CT"
[1] " 15  nat_coll_emp          -0.11    45234    45183  1990  Providence-Fall River-Pawtucket, MA/RI"
[1] " 16  nat_coll_emp          -0.11    51671    51612  2005  Reno, NV"
[1] " 17  nat_coll_emp          -0.17    10839    10821  1990  Roanoke, VA"
[1] " 18  nat_coll_emp          -0.11    27691    27660  1990  Salinas-Sea Side-Monterey, CA"
[1] " 19  nat_coll_emp          -0.12   213055   212803  1990  San Jose, CA"
[1] " 20  nat_coll_emp          -0.32    29457    29362  2005  South Bend-Mishawaka, IN"
[1] " 21  nat_coll_emp          -0.19    20554    20515  1990  Stamford, CT"
[1] " 22  nat_coll_emp          -0.25     7880     7860  1990  Vineland-Milville-Bridgetown, NJ"
[1] " 23  nat_coll_emp          -0.18    15334    15306  1990  Waco, TX"
[1] "  1  nat_nocoll_emp        -0.25    91369    91139  2010  Anchorage, AK"
[1] "  2  nat_nocoll_emp        -0.23    43693    43594  1990  Anniston, AL"
[1] "  3  nat_nocoll_emp        -0.11   147732   147571  1990  Augusta-Aiken, GA-SC"
[1] "  4  nat_nocoll_emp        -0.36    80960    80669  2005  Bremerton, WA"
[1] "  5  nat_nocoll_emp        -0.14    83261    83141  1990  Bridgeport, CT"
[1] "  6  nat_nocoll_emp        -0.21    58049    57928  1990  Brownsville-Harlingen-San Benito, TX"
[1] "  7  nat_nocoll_emp        -0.15   137333   137123  1990  El Paso, TX"
[1] "  8  nat_nocoll_emp        -0.12    85947    85848  1990  Fayetteville, NC"
[1] "  9  nat_nocoll_emp        -0.11   178941   178752  1990  Hartford-Bristol-Middleton- New Britain, CT"
[1] " 10  nat_nocoll_emp        -0.42   222260   221334  2005  Honolulu, HI"
[1] " 11  nat_nocoll_emp        -0.52   223262   222091  2010  Honolulu, HI"
[1] " 12  nat_nocoll_emp        -0.12   111057   110927  2010  Kileen-Temple, TX"
[1] " 13  nat_nocoll_emp        -0.12    59397    59324  1990  Longview-Marshall, TX"
[1] " 14  nat_nocoll_emp        -0.11  2640898  2638028  1990  Los Angeles-Long Beach, CA"
[1] " 15  nat_nocoll_emp        -0.13    53446    53375  1990  Medford, OR"
[1] " 16  nat_nocoll_emp        -0.19   323762   323141  1990  Miami-Hialeah, FL"
[1] " 17  nat_nocoll_emp        -0.13    73512    73413  1990  New Haven-Meriden, CT"
[1] " 18  nat_nocoll_emp        -0.13  3919102  3913923  1990  New York-Northeastern NJ"
[1] " 19  nat_nocoll_emp        -0.16   153151   152905  1990  Providence-Fall River-Pawtucket, MA/RI"
[1] " 20  nat_nocoll_emp        -0.11    41673    41628  1990  Pueblo, CO"
[1] " 21  nat_nocoll_emp        -0.10   125549   125418  2010  Reno, NV"
[1] " 22  nat_nocoll_emp         0.12    90026    90132  2000  Salem, OR"
[1] " 23  nat_nocoll_emp        -0.44    52505    52274  2005  Salinas-Sea Side-Monterey, CA"
[1] " 24  nat_nocoll_emp        -0.40   489617   487649  2005  Salt Lake City-Ogden, UT"
[1] " 25  nat_nocoll_emp        -0.22   475853   474787  2010  Salt Lake City-Ogden, UT"
[1] " 26  nat_nocoll_emp        -0.14   634807   633934  2005  San Diego, CA"
[1] " 27  nat_nocoll_emp        -0.12   403965   403498  1990  San Jose, CA"
[1] " 28  nat_nocoll_emp        -0.12   667722   666952  2005  Seattle-Everett, WA"
[1] " 29  nat_nocoll_emp        -0.13   154940   154742  2010  Spokane, WA"
[1] " 30  nat_nocoll_emp        -0.12    43588    43534  1990  State College, PA"
[1] " 31  nat_nocoll_emp        -0.13   148615   148416  1990  Stockton, CA"
[1] " 32  nat_nocoll_emp        -0.18   252484   252027  2010  Tacoma, WA"
[1] " 33  nat_nocoll_emp        -0.17    36755    36694  1990  Terre Haute, IN"
[1] " 34  nat_nocoll_emp        -0.12   253797   253496  1990  West Palm Beach-Boca Raton-Delray Beach, FL"
[1] "  1  imm_stemO4            -0.30     5404     5388  1990  Atlanta, GA"
[1] "  2  imm_stemO4             9.63      509      558  2010  Bremerton, WA"
[1] "  3  imm_stemO4           -10.88      239      213  2000  Canton, OH"
[1] "  4  imm_stemO4             0.25    26367    26434  2010  Detroit, MI"
[1] "  5  imm_stemO4             0.29     3443     3453  2010  Honolulu, HI"
[1] "  6  imm_stemO4            -0.78     3322     3296  1990  Portland, OR-WA"
[1] "  7  imm_stemO4            -0.28     9018     8993  2000  Portland, OR-WA"
[1] "  8  imm_stemO4            -0.49     5972     5943  1990  Riverside-San Bernardino,CA"
[1] "  9  imm_stemO4             0.43    12410    12463  2010  Riverside-San Bernardino,CA"
[1] " 10  imm_stemO4             0.88     2628     2651  2010  Stockton, CA"
>
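As can be seen, more instances fall outside the tolerance of 0.1 percent (137, the sum of the DIFF column), but this is still a relatively small number. Of these, 39 are the instances of imm_stemO4 counted in the NA column, and the remaining 98 are listed above.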
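
The count can be checked against the summary table for PSS_Data_rep.csv at tolerance 0.1 with a few lines of R. The sketch assumes that the DIFF column includes the instances counted under NA, which the imm_stemO4 row suggests (3 + 2 + 0 + 5 + 39 = 49). The variable names in the code are chosen for illustration.

```r
# DIFF and NA columns copied from the summary table at tolerance 0.1.
diff_counts <- c(nat_stemO4_wkwage = 5, nat_coll_wkwage = 7,
                 nat_nocoll_wkwage = 6, nat_stemO4_emp = 13,
                 nat_coll_emp = 23, nat_nocoll_emp = 34,
                 imm_stemO4 = 49)
na_counts <- c(0, 0, 0, 0, 0, 0, 39)

total_flagged <- sum(diff_counts)                 # all instances in DIFF
listed        <- total_flagged - sum(na_counts)   # instances with a % DIFF
total_cells   <- length(diff_counts) * 876        # 7 variables x 876 rows

print(c(flagged = total_flagged, listed = listed, cells = total_cells))
print(round(100 * total_flagged / total_cells, 2))  # percent flagged
```

Under that assumption, the flagged instances amount to a little over two percent of the 6,132 instances compared.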

The R programs jole1data_pss.R, jole1data_rep.R, jole1data_all.R, and jole1data_pos.R generate the data for PSS_Data_pss.csv, PSS_Data_rep.csv, PSS_Data_all.csv, and PSS_Data_pos.csv, respectively. As previously mentioned, these files hold the original PSS data, the replicated PSS data, the replicated PSS data filtered for all valid incomes, and the replicated PSS data filtered for only positive incomes, respectively. The first table output by each of the R programs tab5_pss.R, tab5_rep.R, tab5_all.R, and tab5_pos.R shows the main regressions for these four cases. Following are those four tables:

[1] "OUTPUT FROM tab4_pss.R"
[1] "======================"
[1] "1990-2010 USING STUDY'S FORMULA "
[1] ""
[1] " N  INTERCEPT    SLOPE    STUDY    % DIFF     S.E.   T-STAT    P-VAL  DESCRIPTION"
[1] "--  ---------  --------  -------  -------  -------  -------  -------  -----------------------------------"
[1] " 1)   -0.3322    6.6191   6.6500    -0.46    7.073    0.936    0.350  Weekly Wage, Native STEM"
[1] " 2)   -0.0764    8.0414   8.0300     0.14    3.804    2.114    0.035  Weekly Wage, Native College-Educated"
[1] " 3)    0.0594    3.6174   3.7800    -4.30    1.972    1.835    0.067  Weekly Wage, Native Non-College-Educated"
[1] " 4)    0.0272    0.5349   0.5300     0.92    0.296    1.806    0.072  Employment, Native STEM"
[1] " 5)    0.0311    2.4942   2.4800     0.57    1.452    1.718    0.087  Employment, Native College-Educated"
[1] " 6)   -0.2557   -5.1701  -5.1700     0.00    3.178   -1.627    0.104  Employment, Native Non-College-Educated"
[1] ""
[1] "OUTPUT FROM tab4_rep.R"
[1] "======================"
[1] "1990-2010 USING STUDY'S FORMULA "
[1] ""
[1] " N  INTERCEPT    SLOPE    STUDY    % DIFF     S.E.   T-STAT    P-VAL  DESCRIPTION"
[1] "--  ---------  --------  -------  -------  -------  -------  -------  -----------------------------------"
[1] " 1)   -0.3542    5.6406   6.6500   -15.18    7.090    0.796    0.427  Weekly Wage, Native STEM"
[1] " 2)   -0.0416    8.5377   8.0300     6.32    3.825    2.232    0.026  Weekly Wage, Native College-Educated"
[1] " 3)    0.0393    3.4182   3.7800    -9.57    1.996    1.712    0.088  Weekly Wage, Native Non-College-Educated"
[1] " 4)    0.0257    0.4231   0.5300   -20.18    0.308    1.375    0.170  Employment, Native STEM"
[1] " 5)    0.0251    2.2430   2.4800    -9.56    1.521    1.475    0.141  Employment, Native College-Educated"
[1] " 6)   -0.2720   -7.3076  -5.1700    41.35    3.427   -2.133    0.034  Employment, Native Non-College-Educated"
[1] ""
[1] "OUTPUT FROM tab4_all.R"
[1] "======================"
[1] "1990-2010 USING STUDY'S FORMULA "
[1] ""
[1] " N  INTERCEPT    SLOPE    STUDY    % DIFF     S.E.   T-STAT    P-VAL  DESCRIPTION"
[1] "--  ---------  --------  -------  -------  -------  -------  -------  -----------------------------------"
[1] " 1)   -0.3344    5.1055   6.6500   -23.23    7.155    0.714    0.476  Weekly Wage, Native STEM"
[1] " 2)   -0.0303    7.6353   8.0300    -4.91    3.853    1.982    0.048  Weekly Wage, Native College-Educated"
[1] " 3)    0.0607    3.0270   3.7800   -19.92    2.047    1.479    0.140  Weekly Wage, Native Non-College-Educated"
[1] " 4)    0.0257    0.4231   0.5300   -20.18    0.308    1.375    0.170  Employment, Native STEM"
[1] " 5)    0.0251    2.2430   2.4800    -9.56    1.521    1.475    0.141  Employment, Native College-Educated"
[1] " 6)   -0.2720   -7.3076  -5.1700    41.35    3.427   -2.133    0.034  Employment, Native Non-College-Educated"
[1] ""
[1] "OUTPUT FROM tab4_pos.R"
[1] "======================"
[1] "1990-2010 USING STUDY'S FORMULA "
[1] ""
[1] " N  INTERCEPT    SLOPE    STUDY    % DIFF     S.E.   T-STAT    P-VAL  DESCRIPTION"
[1] "--  ---------  --------  -------  -------  -------  -------  -------  -----------------------------------"
[1] " 1)   -0.3552    5.6635   6.6500   -14.83    7.118    0.796    0.427  Weekly Wage, Native STEM"
[1] " 2)   -0.0432    8.5724   8.0300     6.75    3.842    2.231    0.026  Weekly Wage, Native College-Educated"
[1] " 3)    0.0397    3.4365   3.7800    -9.09    2.006    1.713    0.088  Weekly Wage, Native Non-College-Educated"
[1] " 4)    0.0258    0.4032   0.5300   -23.93    0.303    1.331    0.184  Employment, Native STEM"
[1] " 5)    0.0260    1.7466   2.4800   -29.57    1.423    1.228    0.220  Employment, Native College-Educated"
[1] " 6)   -0.2431   -7.1418  -5.1700    38.14    3.264   -2.188    0.029  Employment, Native Non-College-Educated"
The following table compares the slopes and p-values from the prior four tables:
[1] "      SLOPES                                           P-VALUES"
[1] "      ----------------------------------------------  ----------------------------------"
[1] " N    STUDY     OrgPSS    RepPSS    AllInc    PosInc   OrgPSS   RepPSS   AllInc   PosInc    DESCRIPTION"
[1] "--   -------  --------  --------  --------  --------  -------  -------  -------  -------    -----------------------------------"
[1] " 1)   6.6500    6.6191    5.6406    5.1055    5.6635    0.350    0.427    0.476    0.427    Weekly Wage, Native STEM"
[1] " 2)   8.0300*** 8.0414    8.5377    7.6353    8.5724    0.035**  0.026**  0.048**  0.026**  Weekly Wage, Native College-Educated"
[1] " 3)   3.7800**  3.6174    3.4182    3.0270    3.4365    0.067*   0.088*   0.140    0.088*   Weekly Wage, Native Non-College-Educated"
[1] " 4)   0.5300    0.5349    0.4231    0.4231    0.4032    0.072*   0.170    0.170    0.184    Employment, Native STEM"
[1] " 5)   2.4800    2.4942    2.2430    2.2430    1.7466    0.087*   0.141    0.141    0.220    Employment, Native College-Educated"
[1] " 6)  -5.1700   -5.1701   -7.3076   -7.3076   -7.1418    0.104    0.034**  0.034**  0.029**  Employment, Native Non-College-Educated"

  * Significant at the 10% (0.10) level.
 ** Significant at the  5% (0.05) level.
*** Significant at the  1% (0.01) level.
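The star notation in this legend can be sketched in R as follows. This is only a minimal illustration: the function name sig_stars is hypothetical, and it assumes strict "less than" comparisons at each level, which the legend does not spell out. The p-values are those of the OrgPSS column in the table above.

```r
# Hypothetical helper: map p-values to the star notation in the legend.
# Assumes strict "<" comparisons at the 0.01, 0.05 and 0.10 levels.
sig_stars <- function(p) {
  ifelse(p < 0.01, "***",
         ifelse(p < 0.05, "**",
                ifelse(p < 0.10, "*", "")))
}

# P-values copied from the OrgPSS column of the comparison table
p_orgpss <- c(0.350, 0.035, 0.067, 0.072, 0.087, 0.104)
print(paste0(format(p_orgpss, nsmall = 3), sig_stars(p_orgpss)))
```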
Despite the close replication of the data, there are some differences between OrgPSS (using the original PSS data) and RepPSS (using the replicated PSS data). The significant 8.03 slope for the weekly wage of native college-educated workers climbs to 8.54, and the less significant 3.78 slope for the weekly wage of native non-college-educated workers drops to 3.42. There are similar small differences for the other, less significant variables.

The AllInc (filtering all incomes) and PosInc (filtering just positive incomes) variants do not have a major effect on the key regressions. The significant 8.03 slope for the weekly wage of native college-educated workers varies between 7.6 and 8.6. The alternate filtering has a bit more effect on the less significant 3.78 slope for the weekly wage of native non-college-educated workers, which drops to between 3.0 and 3.4.

There were generally minor changes, if any, in the p-values obtained using the four data files. One notable difference is that the last variable (employment of native non-college-educated workers) shows a negative correlation that is significant at the 5% level in all but the study's data. Still, the p-value was close to significant in the study's data (0.104).
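The % DIFF and T-STAT columns in the regression tables above appear to be the relative difference between the replicated and published slopes, and the slope divided by its standard error. The sketch below checks that reading against the printed rows of the tab4_rep.R output. The formulas are inferred from the printed numbers rather than taken from the source code, and agreement is only to within rounding of the displayed values.

```r
# Values copied from the "OUTPUT FROM tab4_rep.R" table
slope <- c(5.6406, 8.5377, 3.4182, 0.4231, 2.2430, -7.3076)
study <- c(6.65, 8.03, 3.78, 0.53, 2.48, -5.17)
se    <- c(7.090, 3.825, 1.996, 0.308, 1.521, 3.427)

# Assumed formulas (inferred from the printed output, not from the programs)
pct_diff <- 100 * (slope - study) / study
t_stat   <- slope / se

print(data.frame(slope, study,
                 pct_diff = round(pct_diff, 2),
                 t_stat   = round(t_stat, 3)))
```

Small discrepancies in the last digit are expected because the inputs here are the rounded values shown in the table, not the full-precision estimates.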

17. Summary of Replication of Data in PSS_Data.dta

The main finding of interest in the replication of the data to this point is the seemingly inconsistent filtering applied to wages versus employment. As mentioned in the prior section, the wage variables were calculated for the population of workers who earned a positive, non-zero income. The employment variables, however, were calculated for the population of workers who earned any valid income, including a zero income. The number of workers recorded as earning a zero income is not insignificant: ignoring them made it impossible to replicate the data within any reasonable tolerance.
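The two filters described above can be illustrated with a small sketch. The data frame, its values, and the column names (metarea, incwage, weeks, perwt) are made up for illustration and are not taken from the replication programs. The sketch only shows how a positive-income filter and an any-valid-income filter give different employment counts from the same records.

```r
# Toy person-level records; values and column names are illustrative only
workers <- data.frame(
  metarea = c("A", "A", "A", "B", "B", "B"),
  incwage = c(52000, 0, 31000, 0, 48000, NA),  # NA = no valid income
  weeks   = c(52, 0, 50, 0, 48, NA),
  perwt   = c(100, 120, 90, 110, 95, 80)       # person weight
)

valid   <- !is.na(workers$incwage)
any_inc <- workers[valid & workers$incwage >= 0, ]  # zero income included
pos_inc <- workers[valid & workers$incwage > 0, ]   # positive income only

# Employment over workers with any valid income (zero included)
emp_any <- tapply(any_inc$perwt, any_inc$metarea, sum)
# Employment over positive-income workers only, for comparison
emp_pos <- tapply(pos_inc$perwt, pos_inc$metarea, sum)

# Weighted mean weekly wage over positive-income workers
wkwage <- sapply(split(pos_inc, pos_inc$metarea), function(d) {
  weighted.mean(d$incwage / d$weeks, d$perwt)
})

print(cbind(emp_any, emp_pos, wkwage))
```

In this toy example the zero-income records change the employment counts but cannot affect the wage averages, since weekly wages are only defined for workers with positive income.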

In any event, this helps underline the importance of replicating a study's result all the way back to the source data. Studies typically extract and aggregate data into one or more data files which are then used to perform the analysis. Often, the programs to do the analysis are provided but the programs to do the extraction and aggregation are not. At most, they are described briefly in an appendix, as was the case with this study. Replicating the data from the original source accomplishes several things. First, it verifies that the extraction, aggregation, and any cleaning were done correctly. Second, it reveals the precise methods that were used and allows alternate methods to be tried. Expanding on this last point, it also allows alternate data or an expanded range of data to be analyzed using the same or modified methods.

One item of note is that this replication is not complete. The following variables have not yet been replicated and the prior analysis currently uses the values provided by the study:

delta_imm_stemO4_H1B_hat80 - instrument for H-1B imputed growth of foreign STEM
bartik_coll_wage           - instrument to predict the wage growth of college-educated workers based on each city’s industrial composition in 1980
bartik_coll_emp            - instrument to predict the employment growth of college-educated workers based on each city’s industrial composition in 1980
bartik_nocoll_wage         - instrument to predict the wage growth of non-college-educated workers based on each city’s industrial composition in 1980
bartik_nocoll_emp          - instrument to predict the employment growth of non-college-educated workers based on each city’s industrial composition in 1980
Following are the regression results using all of the variables:
[1] "1990-2010 USING STUDY'S FORMULA "
[1] ""
[1] " N  INTERCEPT    SLOPE    STUDY    % DIFF     S.E.   T-STAT    P-VAL  DESCRIPTION"
[1] "--  ---------  --------  -------  -------  -------  -------  -------  -----------------------------------"
[1] " 1)   -0.3322    6.6191   6.6500    -0.46    7.073    0.936    0.350  Weekly Wage, Native STEM"
[1] " 2)   -0.0764    8.0414   8.0300     0.14    3.804    2.114    0.035  Weekly Wage, Native College-Educated"
[1] " 3)    0.0594    3.6174   3.7800    -4.30    1.972    1.835    0.067  Weekly Wage, Native Non-College-Educated"
[1] " 4)    0.0272    0.5349   0.5300     0.92    0.296    1.806    0.072  Employment, Native STEM"
[1] " 5)    0.0311    2.4942   2.4800     0.57    1.452    1.718    0.087  Employment, Native College-Educated"
[1] " 6)   -0.2557   -5.1701  -5.1700     0.00    3.178   -1.627    0.104  Employment, Native Non-College-Educated"
Next, following are the results when the Bartik variables are removed:
[1] "1990-2010 USING STUDY'S FORMULA MINUS BARTIK VARIABLES "
[1] ""
[1] " N  INTERCEPT    SLOPE    STUDY    % DIFF     S.E.   T-STAT    P-VAL  DESCRIPTION"
[1] "--  ---------  --------  -------  -------  -------  -------  -------  -----------------------------------"
[1] " 1)   -0.0874    5.6811   6.6500   -14.57    6.245    0.910    0.363  Weekly Wage, Native STEM"
[1] " 2)    0.0871    8.4148   8.0300     4.79    3.370    2.497    0.013  Weekly Wage, Native College-Educated"
[1] " 3)    0.1455    3.5455   3.7800    -6.20    1.979    1.791    0.074  Weekly Wage, Native Non-College-Educated"
[1] " 4)    0.0056    0.7897   0.5300    49.00    0.262    3.011    0.003  Employment, Native STEM"
[1] " 5)    0.0566    3.6409   2.4800    46.81    1.283    2.838    0.005  Employment, Native College-Educated"
[1] " 6)    0.1092   -5.4231  -5.1700     4.89    3.230   -1.679    0.094  Employment, Native Non-College-Educated"
[1] ""
Finally, following are the results when delta_imm_stemO4_H1B_hat80, the instrument for H-1B imputed growth of foreign STEM, is also removed:
[1] "1990-2010 USING STUDY'S FORMULA ON REPLICATED DATA "
[1] ""
[1] " N  INTERCEPT    SLOPE    STUDY    % DIFF     S.E.   T-STAT    P-VAL  DESCRIPTION"
[1] "--  ---------  --------  -------  -------  -------  -------  -------  -----------------------------------"
[1] " 1)   -0.0830    3.3526   6.6500   -49.58    3.083    1.088    0.277  Weekly Wage, Native STEM"
[1] " 2)    0.0946    4.4874   8.0300   -44.12    1.654    2.713    0.007  Weekly Wage, Native College-Educated"
[1] " 3)    0.1504    0.9759   3.7800   -74.18    0.970    1.006    0.315  Weekly Wage, Native Non-College-Educated"
[1] " 4)    0.0055    0.8456   0.5300    59.54    0.130    6.527    0.000  Employment, Native STEM"
[1] " 5)    0.0545    4.7156   2.4800    90.14    0.632    7.466    0.000  Employment, Native College-Educated"
[1] " 6)    0.0958    1.5815  -5.1700  -130.59    1.560    1.014    0.311  Employment, Native Non-College-Educated"
[1] ""
The following table summarizes the information from the prior three tables:
[1] "     INTERCEPTS                     SLOPES                        P-VALUES"
[1] "    -----------------------------  ----------------------------  -------------------------"
[1] "                           STUDY                         STUDY                      STUDY "
[1] "                 STUDY    -H1B IG              STUDY    -H1B IG            STUDY   -H1B IG"
[1] " N     STUDY    -BARTIK   -BARTIK    STUDY    -BARTIK   -BARTIK   STUDY   -BARTIK  -BARTIK  DESCRIPTION"
[1] "--  --------- --------- ---------  --------  --------  --------  -------  -------  -------  -----------------------------------"
[1] " 1)   -0.3322   -0.0874   -0.0830    6.6191    5.6811    3.3526    0.350    0.363    0.277  Weekly Wage, Native STEM"
[1] " 2)   -0.0764    0.0871    0.0946    8.0414    8.4148    4.4874    0.035    0.013    0.007  Weekly Wage, Native College-Educated"
[1] " 3)    0.0594    0.1455    0.1504    3.6174    3.5455    0.9759    0.067    0.074    0.315  Weekly Wage, Native Non-College-Educated"
[1] " 4)    0.0272    0.0056    0.0055    0.5349    0.7897    0.8456    0.072    0.003    0.000  Employment, Native STEM"
[1] " 5)    0.0311    0.0566    0.0545    2.4942    3.6409    4.7156    0.087    0.005    0.000  Employment, Native College-Educated"
[1] " 6)   -0.2557    0.1092    0.0958   -5.1701   -5.4231    1.5815    0.104    0.094    0.311  Employment, Native Non-College-Educated"
As can be seen, removal of the Bartik variables causes substantial changes in the intercepts but relatively small changes in the slopes. The additional removal of the H-1B imputed growth variable, on the other hand, causes small changes in the intercepts but relatively large changes in the slopes. Hence, an expanded analysis of this study would do well to replicate these variables as well.
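The mechanics of dropping control variables from a regression and comparing intercepts and slopes can be sketched as follows. This is not the study's specification: it uses ordinary least squares via lm() on simulated data, and every variable name (x_h1b, bartik_wage, bartik_emp, y) is a placeholder. It only illustrates how update() refits a model with terms removed.

```r
set.seed(42)  # simulated data only; not the study's data
n <- 200

sim <- data.frame(
  x_h1b       = rnorm(n),
  bartik_wage = rnorm(n, mean = 1),
  bartik_emp  = rnorm(n, mean = 1)
)
sim$y <- 0.1 + 0.5 * sim$x_h1b + 0.3 * sim$bartik_wage +
  0.2 * sim$bartik_emp + rnorm(n)

# Full model, then the same model with the placeholder Bartik controls removed
full    <- lm(y ~ x_h1b + bartik_wage + bartik_emp, data = sim)
reduced <- update(full, . ~ . - bartik_wage - bartik_emp)

# Compare intercept and slope of interest side by side
keep <- c("(Intercept)", "x_h1b")
print(cbind(full = coef(full)[keep], reduced = coef(reduced)[keep]))
```

In this simulated setup the dropped controls are generated independently of x_h1b and have nonzero means, so omitting them mainly moves the intercept. Whether the same mechanism explains the pattern in the study's data is not something this sketch can show.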

18. UPDATED FINDING: Problem with Study's Values for Labor Force

The above analysis replicated the six key dependent variables and one key independent variable used in the study. However, an updated version of the replication program also replicated the labor force variable and found a serious problem. Following is the result of comparing the labor force as given in the study with the labor force extracted from IPUMS data:

[1] "COMPARING PSS_Data.dta to IP_Metro_rep8010.csv at tolerance 1"
[1] ""
[1] "TOTAL  DIFF  1990  2000  2005  2010    NA  VARIABLE"
[1] "-----  ----  ----  ----  ----  ----  ----  --------"
[1] "  876     0     0     0     0     0     0  nat_stemO4_wkwage"
[1] "  876     0     0     0     0     0     0  nat_coll_wkwage"
[1] "  876     0     0     0     0     0     0  nat_nocoll_wkwage"
[1] "  876     0     0     0     0     0     0  nat_stemO4_emp"
[1] "  876     0     0     0     0     0     0  nat_coll_emp"
[1] "  876     0     0     0     0     0     0  nat_nocoll_emp"
[1] "  876    41     0     1     0     1    39  imm_stemO4"
[1] "  876   876   219   219   219   219     0  labforce"
As can be seen, every labor force value differs by more than 1 percent. On inspecting the data, it appears that the study is using total population in place of the labor force. As a test, the new replication program was modified to likewise use total population for the labor force. Following is the same comparison as above with this modified data:
[1] "COMPARING PSS_Data.dta to IP_Metro_rep8010_tp.csv at tolerance 1"
[1] ""
[1] "TOTAL  DIFF  1990  2000  2005  2010    NA  VARIABLE"
[1] "-----  ----  ----  ----  ----  ----  ----  --------"
[1] "  876     0     0     0     0     0     0  nat_stemO4_wkwage"
[1] "  876     0     0     0     0     0     0  nat_coll_wkwage"
[1] "  876     0     0     0     0     0     0  nat_nocoll_wkwage"
[1] "  876     0     0     0     0     0     0  nat_stemO4_emp"
[1] "  876     0     0     0     0     0     0  nat_coll_emp"
[1] "  876     0     0     0     0     0     0  nat_nocoll_emp"
[1] "  876    41     0     1     0     1    39  imm_stemO4"
[1] "  876     0     0     0     0     0     0  labforce"
Following is the same comparison with a smaller tolerance of 0.1 percent:
[1] ""
[1] "COMPARING PSS_Data.dta to IP_Metro_rep8010_tp.csv at tolerance 0.1"
[1] ""
[1] "TOTAL  DIFF  1990  2000  2005  2010    NA  VARIABLE"
[1] "-----  ----  ----  ----  ----  ----  ----  --------"
[1] "  876   446     6   219   219     2     0  nat_stemO4_wkwage"
[1] "  876   443     6   219   218     0     0  nat_coll_wkwage"
[1] "  876   466    28   219   217     2     0  nat_nocoll_wkwage"
[1] "  876    13     8     1     0     4     0  nat_stemO4_emp"
[1] "  876    23    15     0     6     2     0  nat_coll_emp"
[1] "  876    34    20     1     6     7     0  nat_nocoll_emp"
[1] "  876    49     3     2     0     5    39  imm_stemO4"
[1] "  876    11    11     0     0     0     0  labforce"
Following is a list of the 11 values that differed by more than 0.1 percent:
[1] "  1  labforce              -0.19    51694    51595  1990  Anniston, AL"
[1] "  2  labforce              -0.11   186649   186443  1990  Augusta-Aiken, GA-SC"
[1] "  3  labforce              -0.12   127440   127289  1990  Bridgeport, CT"
[1] "  4  labforce              -0.13    94883    94762  1990  Brownsville-Harlingen-San Benito, TX"
[1] "  5  labforce              -0.12   238580   238289  1990  El Paso, TX"
[1] "  6  labforce              -0.13   107499   107364  1990  Fayetteville, NC"
[1] "  7  labforce              -0.12    69402    69321  1990  Medford, OR"
[1] "  8  labforce              -0.14   108616   108467  1990  New Haven-Meriden, CT"
[1] "  9  labforce              -0.13   236111   235814  1990  Providence-Fall River-Pawtucket, MA/RI"
[1] " 10  labforce              -0.11   209787   209562  1990  Stockton, CA"
[1] " 11  labforce              -0.13    47124    47063  1990  Terre Haute, IN"
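The tolerance check behind these comparisons can be sketched as follows, using the first four rows of the list above. The listing does not say which of the two printed values comes from the study and which from the replication, so they are simply called first and second here. The percent-difference formula and the helper name count_diff are assumptions inferred from the printed numbers.

```r
# Values copied from the first four rows of the list above
cmp <- data.frame(
  metarea = c("Anniston, AL", "Augusta-Aiken, GA-SC", "Bridgeport, CT",
              "Brownsville-Harlingen-San Benito, TX"),
  year    = 1990,
  first   = c(51694, 186649, 127440, 94883),
  second  = c(51595, 186443, 127289, 94762)
)

# Assumed formula: percent difference relative to the first value
cmp$pct <- 100 * (cmp$second - cmp$first) / cmp$first

# Hypothetical helper: count values differing by more than tol percent
count_diff <- function(pct, tol) sum(abs(pct) > tol, na.rm = TRUE)

print(round(cmp$pct, 2))
print(c("tol 1"   = count_diff(cmp$pct, 1),
        "tol 0.1" = count_diff(cmp$pct, 0.1)))
```

All four rows pass at a tolerance of 1 percent and fail at 0.1 percent, which is consistent with labforce showing 0 differences in the first table and 11 in the second.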
A review of the study's data file PSS_Data.dta verifies that this variable is named labforce in the file, which strongly suggests that it is meant to be the labor force. In the study, the notes for Table 3 state "[t]he dependent variable is the growth in foreign STEM as a percentage of the labor force". Hence, this seems like a serious problem with the data. The effects of this will be looked at more closely in the next section.

Part 3

Source Code for R Programs and Data Files Used in this Replication

  1. Source code for jole1data.R (replication program called by following three programs)
  2. Source code for jole1data_rep.R (replicates data in PSS_Data.dta using the study's filtering)
  3. Source code for jole1data_all.R (replicates data in PSS_Data.dta for workers with any valid income, including zero)
  4. Source code for jole1data_pos.R (replicates data in PSS_Data.dta for workers with just a positive, non-zero income)
  5. Source code for jole1comp90.R (called by next two programs)
  6. Source code for jole1comp90_1.R (compares PSS_Data.dta to replicated data files at tolerance 1)
  7. Source code for jole1comp90_0z1.R (compares PSS_Data.dta to replicated data files at tolerance 0.1)
  8. Source code for tab5.R (called by next four programs)
  9. Source code for tab5_pss.R (analyzes data from PSS_data_pss.csv, csv version of PSS_Data.dta)
  10. Source code for tab5_rep.R (analyzes data from PSS_data_rep.csv)
  11. Source code for tab5_all.R (analyzes data from PSS_data_all.csv)
  12. Source code for tab5_pos.R (analyzes data from PSS_data_pos.csv)
