15. Replication of Data in PSS_Data.dta
As mentioned in the previous sections, all of the prior analysis is based on data from the file PSS_Data.dta, which is included in the supplemental data that can be downloaded from the Journal of Labor Economics website. PSS_Data.dta contains 176 variables, as shown in the following listing:
> pss_data <- read.dta("PSS_Data.dta")
> names(pss_data)
  [1] "year" "metarea" "nat_coll_wkwage" "nat_coll_emp" "nat_nocoll_wkwage"
  [6] "nat_nocoll_emp" "nat_hs_wkwage" "nat_hs_emp" "nat_nohs_wkwage" "nat_nohs_emp"
 [11] "nat_stemO4_wkwage" "nat_stemO4_emp" "nat_nonstemO4_wkwage" "nat_nonstemO4_emp" "nat_coll_stemO4_wkwage"
 [16] "nat_coll_stemO4_emp" "nat_coll_nonstemO4_wkwage" "nat_coll_nonstemO4_emp" "nat_stemO8_wkwage" "nat_stemO8_emp"
 [21] "nat_nonstemO8_wkwage" "nat_nonstemO8_emp" "nat_coll_stemO8_wkwage" "nat_coll_stemO8_emp" "nat_coll_nonstemO8_wkwage"
 [26] "nat_coll_nonstemO8_emp" "nat_stemM4_wkwage" "nat_stemM4_emp" "nat_nonstemM4_wkwage" "nat_nonstemM4_emp"
 [31] "nat_coll_stemM4_wkwage" "nat_coll_stemM4_emp" "nat_coll_nonstemM4_wkwage" "nat_coll_nonstemM4_emp" "nat_stemM8_wkwage"
 [36] "nat_stemM8_emp" "nat_nonstemM8_wkwage" "nat_nonstemM8_emp" "nat_coll_stemM8_wkwage" "nat_coll_stemM8_emp"
 [41] "nat_coll_nonstemM8_wkwage" "nat_coll_nonstemM8_emp" "nat_sector1" "nat_sector2" "nat_sector3"
 [46] "nat_sector4" "nat_sector5" "nat_sector6" "nat_sector7" "nat_sector8"
 [51] "nat_sector9" "nat_sector10" "nat_sector11" "nat_sector12" "nat_sector13"
 [56] "nat_coll_sector1" "nat_coll_sector2" "nat_coll_sector3" "nat_coll_sector4" "nat_coll_sector5"
 [61] "nat_coll_sector6" "nat_coll_sector7" "nat_coll_sector8" "nat_coll_sector9" "nat_coll_sector10"
 [66] "nat_coll_sector11" "nat_coll_sector12" "nat_coll_sector13" "nat_nocoll_avgrentpr" "nat_nocoll_medrentpr"
 [71] "nat_coll_avgrentpr" "nat_coll_medrentpr" "imm_stemO4" "imm_nonstemO4" "imm_coll_stemO4"
 [76] "imm_coll_nonstemO4" "imm_stemO8" "imm_nonstemO8" "imm_coll_stemO8" "imm_coll_nonstemO8"
 [81] "imm_stemM4" "imm_nonstemM4" "imm_coll_stemM4" "imm_coll_nonstemM4" "imm_stemM8"
 [86] "imm_nonstemM8" "imm_coll_stemM8" "imm_coll_nonstemM8" "imm_coll" "imm_nocoll"
 [91] "imm_hs" "imm_nohs" "tot_stemO4" "tot_stemO8" "tot_stemM4"
 [96] "tot_stemM8" "imm" "nat" "imm_noi" "imm_stemO4_noi"
[101] "imm_nonstemO4_noi" "imm_coll_stemO4_noi" "imm_coll_nonstemO4_noi" "imm_stemO8_noi" "imm_nonstemO8_noi"
[106] "imm_coll_stemO8_noi" "imm_coll_nonstemO8_noi" "imm_stemM4_noi" "imm_nonstemM4_noi" "imm_coll_stemM4_noi"
[111] "imm_coll_nonstemM4_noi" "imm_stemM8_noi" "imm_nonstemM8_noi" "imm_coll_stemM8_noi" "imm_coll_nonstemM8_noi"
[116] "labforce" "labforce_noi" "popwt" "indian" "mexican"
[121] "pred_coll_wkwage" "pred_nocoll_wkwage" "pred_wkwage" "pred_coll_emp" "pred_nocoll_emp"
[126] "pred_emp" "imm_stemO4_H1B_hat80" "imm_coll_stemO4_H1B_hat80" "imm_stemO4_H1B_hat70" "imm_stemO8_H1B_hat80"
[131] "imm_coll_stemO8_H1B_hat80" "imm_stemO8_H1B_hat70" "imm_stemM4_H1B_hat80" "imm_coll_stemM4_H1B_hat80" "imm_stemM4_H1B_hat70"
[136] "imm_stemM8_H1B_hat80" "imm_coll_stemM8_H1B_hat80" "imm_stemM8_H1B_hat70" "imm_stemO4_H1BL1_hat80" "imm_stemO4_noi_H1B_hat80"
[141] "imm_stemO4_false1_hat80" "imm_stemO4_false2_hat80" "imm_stemO8_H1BL1_hat80" "imm_stemO8_noi_H1B_hat80" "imm_stemO8_false1_hat80"
[146] "imm_stemO8_false2_hat80" "imm_stemM4_H1BL1_hat80" "imm_stemM4_noi_H1B_hat80" "imm_stemM4_false1_hat80" "imm_stemM4_false2_hat80"
[151] "imm_stemM8_H1BL1_hat80" "imm_stemM8_noi_H1B_hat80" "imm_stemM8_false1_hat80" "imm_stemM8_false2_hat80" "imm_hat80"
[156] "mexican_hat80" "indian_hat80" "nat_coll_hat80" "nat_nocoll_hat80" "imm_coll_hat80"
[161] "imm_nocoll_hat80" "imm_coll_noi_hat80" "imm_nocoll_noi_hat80" "labforce_hat80" "labforce_noi_hat80"
[166] "imm_stemO4_H1Bagg_hat80" "imm_stemO8_H1Bagg_hat80" "imm_stemM4_H1Bagg_hat80" "imm_stemM8_H1Bagg_hat80" "imm_manualO8_false3_hat80"
[171] "imm_manualO4_false3_hat80" "statefip" "obs1980" "obs1970" "panel1980"
[176] "panel1970"

These variables contain the six key dependent variables and one key independent variable used in the analysis. They are as follows:
 [3] "nat_coll_wkwage"
 [4] "nat_coll_emp"
 [5] "nat_nocoll_wkwage"
 [6] "nat_nocoll_emp"
[11] "nat_stemO4_wkwage"
[12] "nat_stemO4_emp"
[73] "imm_stemO4"

Page S230 of the study describes the source for these seven variables as follows:
Our data on the occupations, employment, wages, age, and education of individuals come from the Ruggles et al. (2010) Integrated Public Use Microdata Series (IPUMS) 5% census files for 1980, 1990, and 2000; the 1% ACS sample for 2005; and the 2008–10 3% merged ACS sample for 2010. We use data only on 219 MSAs consistently identified from 1980 through 2010.
To replicate these seven variables from the original IPUMS data, extracts were downloaded from the IPUMS USA website for the following samples:
Sample          Density
-------------   -------
1980 5% state   5.0%
1990 5%         5.0%
2000 5%         5.0%
2005 ACS        1.0%
2010 ACS 3yr    3.0%

For each sample, the following variables were extracted:
Type^  Variable               Label
-----  --------               -----
H      YEAR                   Census year
H      DATANUM                Data set number
H      SERIAL                 Household serial number
H      HHWT                   Household weight
H      STATEFIP               State (FIPS code)
H      METAREA (general)      Metropolitan area [general version]
H      METAREAD (detailed)    Metropolitan area [detailed version]
H      GQ                     Group quarters status
P      PERNUM                 Person number in sample unit
P      PERWT                  Person weight
P      AGE                    Age
P      BPL (general)          Birthplace [general version]
P      BPLD (detailed)        Birthplace [detailed version]
P      CITIZEN                Citizenship status
P      EDUC (general)         Educational attainment [general version]
P      EDUCD (detailed)       Educational attainment [detailed version]
P      EMPSTAT (general)      Employment status [general version]
P      EMPSTATD (detailed)    Employment status [detailed version]
P      OCC                    Occupation
P      OCC1990                Occupation, 1990 basis
P      IND                    Industry
P      IND1990                Industry, 1990 basis
P      CLASSWKR (general)     Class of worker [general version]
P      CLASSWKRD (detailed)   Class of worker [detailed version]
P      WKSWORK1               Weeks worked last year*
P      WKSWORK2               Weeks worked last year, intervalled
P      INCWAGE                Wage and salary income

* WKSWORK1 not available from 2008 on
^ H=Household, P=Person

Based on the study and its appendix, the R program jole1data_rep.R below uses this data to replicate the seven variables in PSS_Data.dta and create the file PSS_Data_rep.csv. The program generates the following output:
> source("jole1data_rep.R")
[1] "START READ OF jole_80.dta"
[1] "PROCESS jole_80.dta"
[1] "11343120 Initial"
[1] "6940893 Age 18-65"
[1] "5331242 Worked 1 or more weeks"
[1] "5185976 Non-institutional"
[1] "############################################################"
[1] "1980 INCLUDE INCWAGE==0, EXCLUDE INCWAGE==999999"
[1] "############################################################"
[1] "5185976 Valid incwage"
[1] "############################################################"
[1] "1980 CONVERT TO TIME_CONSISTENT OCCUPATIONS"
[1] "############################################################"
[1] "5135244 Occupation not Military, Unemployed or Unknown"
[1] "5130925 Occupation not 167, 192, 304, or 308"
[1] "############################################################"
[1] "1980 IND1990 > 0 & IND1990 <= 244"
[1] "############################################################"
[1] "5130925 IND1990 - not N/A, worked since 1984 and responded"
[1] "5111349 Birthplace not Abroad, At sea, Other or Missing"
[1] "3469260 Metareas (219)"
[1] "Change 1980 to 2010 dollars"
[1] "Change 1980 to 2010 dollars"
[1] "3020 Size of aa"
[1] "START READ OF jole_90.dta"
[1] "PROCESS jole_90.dta"
[1] "12501046 Initial"
[1] "7707006 Age 18-65"
[1] "6218598 Worked 1 or more weeks"
[1] "6086603 Non-institutional"
[1] "############################################################"
[1] "1990 INCLUDE INCWAGE==0, EXCLUDE INCWAGE==999999"
[1] "############################################################"
[1] "6086603 Valid incwage"
[1] "############################################################"
[1] "1990 CONVERT TO TIME_CONSISTENT OCCUPATIONS"
[1] "############################################################"
[1] "6035019 Occupation not Military, Unemployed or Unknown"
[1] "6029261 Occupation not 167, 192, 304, or 308"
[1] "############################################################"
[1] "1990 IND1990 > 0 & IND1990 <= 244"
[1] "############################################################"
[1] "6029261 IND1990 - not N/A, worked since 1984 and responded"
[1] "6009872 Birthplace not Abroad, At sea, Other or Missing"
[1] "3859417 Metareas (219)"
[1] "Change 1990 to 2010 dollars"
[1] "Change 1990 to 2010 dollars"
[1] "6193 Size of aa"
[1] "START READ OF jole_00.dta"
[1] "PROCESS jole_00.dta"
[1] "14081466 Initial"
[1] "8681911 Age 18-65"
[1] "6979381 Worked 1 or more weeks"
[1] "6805782 Non-institutional"
[1] "############################################################"
[1] "2000 INCLUDE INCWAGE==0, EXCLUDE INCWAGE==999999"
[1] "############################################################"
[1] "6805782 Valid incwage"
[1] "############################################################"
[1] "2000 CONVERT TO TIME_CONSISTENT OCCUPATIONS"
[1] "############################################################"
[1] "6769389 Occupation not Military, Unemployed or Unknown"
[1] "6764427 Occupation not 167, 192, 304, or 308"
[1] "############################################################"
[1] "2000 IND1990 > 0 & IND1990 <= 244"
[1] "############################################################"
[1] "6764427 IND1990 - not N/A, worked since 1984 and responded"
[1] "6764427 Birthplace not Abroad, At sea, Other or Missing"
[1] "4578682 Metareas (219)"
[1] "Change 2000 to 2010 dollars"
[1] "Change 2000 to 2010 dollars"
[1] "9493 Size of aa"
[1] "START READ OF jole_05.dta"
[1] "PROCESS jole_05.dta"
[1] "2878380 Initial"
[1] "1778997 Age 18-65"
[1] "1439462 Worked 1 or more weeks"
[1] "1439462 Non-institutional"
[1] "############################################################"
[1] "2005 INCLUDE INCWAGE==0, EXCLUDE INCWAGE==999999"
[1] "############################################################"
[1] "1439462 Valid incwage"
[1] "############################################################"
[1] "2005 CONVERT TO TIME_CONSISTENT OCCUPATIONS"
[1] "############################################################"
[1] "1431690 Occupation not Military, Unemployed or Unknown"
[1] "1430597 Occupation not 167, 192, 304, or 308"
[1] "############################################################"
[1] "2005 IND1990 > 0 & IND1990 <= 244"
[1] "############################################################"
[1] "1430597 IND1990 - not N/A, worked since 1984 and responded"
[1] "1430428 Birthplace not Abroad, At sea, Other or Missing"
[1] "994735 Metareas (219)"
[1] "Change 2005 to 2010 dollars"
[1] "Change 2005 to 2010 dollars"
[1] "12505 Size of aa"
[1] "START READ OF jole_10.dta"
[1] "PROCESS jole_10.dta"
[1] "Create mm$WKSWORK1"
[1] "9093077 Initial"
[1] "5672423 Age 18-65"
[1] "4438018 Worked 1 or more weeks"
[1] "4349383 Non-institutional"
[1] "############################################################"
[1] "2010 INCLUDE INCWAGE==0, EXCLUDE INCWAGE==999999"
[1] "############################################################"
[1] "4349383 Valid incwage"
[1] "############################################################"
[1] "2010 CONVERT TO TIME_CONSISTENT OCCUPATIONS"
[1] "############################################################"
[1] "4323596 Occupation not Military, Unemployed or Unknown"
[1] "4320654 Occupation not 167, 192, 304, or 308"
[1] "############################################################"
[1] "2010 IND1990 > 0 & IND1990 <= 244"
[1] "############################################################"
[1] "4320654 IND1990 - not N/A, worked since 1984 and responded"
[1] "4320163 Birthplace not Abroad, At sea, Other or Missing"
[1] "3022724 Metareas (219)"
[1] "15699 Size of aa"
[1] "ALL IPUMS FILES READ"
[1] "START OF AGGREGATE AND MERGE INTO FINAL FILES"
[1] "READ AND UPDATE PSS_data0.csv"
[1] "WRITE FINAL FILES PSS_Data_rep.txt and PSS_Data_rep.csv"
>

The output summarizes the filtering being done on the data. For example, the third line shows that 11,343,120 is the initial population for the 1980 data and the fourth line shows that this is reduced to 6,940,893 after filtering to just those between 18 and 65 in age.
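The first few filtering steps reported in that log can be sketched as follows. This is only an illustration and is not taken from jole1data_rep.R: the helper name filter_sample is hypothetical, the toy records are made up, and the use of GQ code 3 for institutional group quarters is an assumption about the IPUMS coding.

```r
# Hypothetical helper (not from jole1data_rep.R): apply the first sample
# restrictions in sequence and print the remaining row count after each one.
filter_sample <- function(mm) {
  log_n <- function(df, label) print(paste(nrow(df), label))
  log_n(mm, "Initial")
  mm <- mm[mm$AGE >= 18 & mm$AGE <= 65, ]
  log_n(mm, "Age 18-65")
  mm <- mm[mm$WKSWORK1 >= 1, ]
  log_n(mm, "Worked 1 or more weeks")
  mm <- mm[mm$GQ != 3, ]              # assumption: GQ 3 = institutional group quarters
  log_n(mm, "Non-institutional")
  mm <- mm[mm$INCWAGE != 999999, ]    # keep INCWAGE == 0, drop the N/A code
  log_n(mm, "Valid incwage")
  mm
}

# Toy records (made-up values) to exercise each restriction once.
toy <- data.frame(
  AGE      = c(17, 30, 40, 50, 66, 25),
  WKSWORK1 = c(10,  0, 20, 52, 30, 40),
  GQ       = c( 1,  1,  3,  1,  1,  1),
  INCWAGE  = c(1000, 2000, 3000, 0, 5000, 999999)
)
kept <- filter_sample(toy)
```

Printing the count after every step, as the log does, makes it easy to see which restriction removes the most records in each census year.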
16. Difference in Filtering Between Wages and Employment
The process of replication turned up one interesting item. To replicate the data, it was necessary to apply one extra filter to the wage variables only. For those variables (the three ending with _wkwage), workers who earned an income of zero had to be filtered out. However, for the employment variables (imm_stemO4 and the three ending with _emp), all workers with valid incomes, including those with an income of zero, are included. Hence, these two sets of variables describe slightly different populations.
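The difference between the two populations can be illustrated with a small sketch. The records are made up, and the construction of the weekly wage as INCWAGE divided by WKSWORK1, averaged with PERWT weights, is an assumption made for illustration rather than the study's documented formula.

```r
# Toy person-level records for one metro area and year (made-up values).
workers <- data.frame(
  PERWT    = c(100, 120, 80, 90),
  INCWAGE  = c(52000, 0, 26000, 0),
  WKSWORK1 = c(52, 40, 26, 10)
)

# Employment: weighted count over all valid incomes, zero included.
emp <- sum(workers$PERWT)

# Weekly wage: weighted mean over workers with positive income only
# (assumed construction: INCWAGE / WKSWORK1, weighted by PERWT).
pos    <- workers[workers$INCWAGE > 0, ]
wkwage <- sum(pos$PERWT * pos$INCWAGE / pos$WKSWORK1) / sum(pos$PERWT)

# For contrast: the same mean if zero-income workers were kept.
wkwage_all <- sum(workers$PERWT * workers$INCWAGE / workers$WKSWORK1) / sum(workers$PERWT)

print(c(emp = emp, wkwage = wkwage, wkwage_all = wkwage_all))
```

In this toy case more than half of the weighted workers report zero income, so including them cuts the average weekly wage by more than half, while excluding them would cut the employment count by more than half.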
In any event, jole1data_rep.R successfully replicates all but two instances of the seven variables within a tolerance of one percent. The following table, generated by the R program jole1comp90_1.R below, summarizes the differences and lists these two instances:
[1] "COMPARING PSS_Data.dta to PSS_Data_rep.csv at tolerance 1"
[1] ""
[1] "TOTAL DIFF 1990 2000 2005 2010   NA VARIABLE"
[1] "----- ---- ---- ---- ---- ---- ---- --------"
[1] "  876    0    0    0    0    0    0 nat_stemO4_wkwage"
[1] "  876    0    0    0    0    0    0 nat_coll_wkwage"
[1] "  876    0    0    0    0    0    0 nat_nocoll_wkwage"
[1] "  876    0    0    0    0    0    0 nat_stemO4_emp"
[1] "  876    0    0    0    0    0    0 nat_coll_emp"
[1] "  876    0    0    0    0    0    0 nat_nocoll_emp"
[1] "  876   41    0    1    0    1   39 imm_stemO4"
[1] ""
[1] "COMPARING PSS_Data.dta to PSS_Data_rep.csv at tolerance 1"
[1] ""
[1] "  N VARIABLE            % DIFF     PSS    CALC YEAR METAREA"
[1] "--- ----------------- -------- ------- ------- ---- -------------"
[1] "  1 imm_stemO4            9.63     509     558 2010 Bremerton, WA"
[1] "  2 imm_stemO4          -10.88     239     213 2000 Canton, OH"

As can be seen, all instances of the seven variables replicated within a tolerance of one percent except for two. Those both differed by about ten percent.
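The % DIFF column is consistent with the replicated value (CALC) minus the study's value (PSS), expressed as a percentage of the study's value. A minimal sketch of that check follows; the function names pct_diff and outside_tolerance are hypothetical and are not taken from jole1comp90_1.R.

```r
# Percent difference of the replicated value relative to the study's value.
pct_diff <- function(pss, calc) 100 * (calc - pss) / pss

# TRUE where the absolute percent difference exceeds the tolerance (in percent).
outside_tolerance <- function(pss, calc, tol) {
  d <- pct_diff(pss, calc)
  !is.na(d) & abs(d) > tol
}

# Three imm_stemO4 rows from the comparison output:
# Bremerton 2010, Canton 2000, Atlanta 1990.
pss  <- c(509, 239, 5404)
calc <- c(558, 213, 5388)
print(round(pct_diff(pss, calc), 2))
print(outside_tolerance(pss, calc, tol = 1))
print(outside_tolerance(pss, calc, tol = 0.1))
```

The first two rows fall outside the one percent tolerance, while the third differs by about 0.3 percent and is flagged only at the tighter 0.1 percent tolerance.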
Two modified versions of jole1data_rep.R were created to calculate the results if the same filtering were applied to all seven variables. The programs jole1data_all.R and jole1data_pos.R are identical to jole1data_rep.R except that the former filters all seven variables for all valid incomes, including zero, and the latter filters them for just positive valid incomes. The output of those programs is saved in PSS_Data_all.csv and PSS_Data_pos.csv, respectively. The following are two other tables that are generated by jole1comp90_1.R:
[1] "COMPARING PSS_Data.dta to PSS_Data_all.csv at tolerance 1"
[1] ""
[1] "TOTAL DIFF 1990 2000 2005 2010   NA VARIABLE"
[1] "----- ---- ---- ---- ---- ---- ---- --------"
[1] "  876  705  184  184  155  182    0 nat_stemO4_wkwage"
[1] "  876  876  219  219  219  219    0 nat_coll_wkwage"
[1] "  876  876  219  219  219  219    0 nat_nocoll_wkwage"
[1] "  876    0    0    0    0    0    0 nat_stemO4_emp"
[1] "  876    0    0    0    0    0    0 nat_coll_emp"
[1] "  876    0    0    0    0    0    0 nat_nocoll_emp"
[1] "  876   41    0    1    0    1   39 imm_stemO4"
[1] ""
[1] "COMPARING PSS_Data.dta to PSS_Data_pos.csv at tolerance 1"
[1] ""
[1] "TOTAL DIFF 1990 2000 2005 2010   NA VARIABLE"
[1] "----- ---- ---- ---- ---- ---- ---- --------"
[1] "  876    0    0    0    0    0    0 nat_stemO4_wkwage"
[1] "  876    0    0    0    0    0    0 nat_coll_wkwage"
[1] "  876    0    0    0    0    0    0 nat_nocoll_wkwage"
[1] "  876  707  185  184  155  183    0 nat_stemO4_emp"
[1] "  876  876  219  219  219  219    0 nat_coll_emp"
[1] "  876  876  219  219  219  219    0 nat_nocoll_emp"
[1] "  876  293   68   68   47   71   39 imm_stemO4"

As the titles suggest, these tables compare the data in PSS_Data_all.csv and PSS_Data_pos.csv to the original data in PSS_Data.dta. As can be seen, filtering for all incomes changes most of the wage data by more than one percent, and filtering for just positive incomes does the same to most of the employment data.
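A per-year summary like the ones above could be produced along the following lines. This is a sketch with made-up values, not the code of jole1comp90_1.R. It assumes that a value that cannot be compared because it is missing is counted in the DIFF total and reported in the NA column rather than under its year, which is consistent with the row sums in the tables.

```r
# Toy comparison data for one variable (made-up values).
cmp <- data.frame(
  year = c(1990, 1990, 2000, 2005, 2010, 2010),
  pss  = c(100, 200, 300, 400, 500, NA),
  calc = c(100.5, 230, 300, 410, 500, 520)
)
tol <- 1  # tolerance in percent

d    <- 100 * (cmp$calc - cmp$pss) / cmp$pss
flag <- is.na(d) | abs(d) > tol            # missing comparisons count as differences

# Group missing comparisons under "NA" and the rest under their year.
grp    <- ifelse(is.na(d), "NA", as.character(cmp$year))
counts <- tapply(flag, grp, sum)

print(c(TOTAL = nrow(cmp), DIFF = sum(flag)))
print(counts)
```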
The R program jole1comp90_0z1.R below is identical to jole1comp90_1.R except that it uses a tolerance of 0.1 percent instead of 1 percent. Following is the output:
> source("jole1comp90_0z1.R")
[1] ""
[1] "COMPARING PSS_Data.dta to PSS_Data_rep.csv at tolerance 0.1"
[1] ""
[1] "TOTAL DIFF 1990 2000 2005 2010   NA VARIABLE"
[1] "----- ---- ---- ---- ---- ---- ---- --------"
[1] "  876    5    3    0    0    2    0 nat_stemO4_wkwage"
[1] "  876    7    3    0    4    0    0 nat_coll_wkwage"
[1] "  876    6    1    0    3    2    0 nat_nocoll_wkwage"
[1] "  876   13    8    1    0    4    0 nat_stemO4_emp"
[1] "  876   23   15    0    6    2    0 nat_coll_emp"
[1] "  876   34   20    1    6    7    0 nat_nocoll_emp"
[1] "  876   49    3    2    0    5   39 imm_stemO4"
[1] ""
[1] "COMPARING PSS_Data.dta to PSS_Data_pos.csv at tolerance 0.1"
[1] ""
[1] "TOTAL DIFF 1990 2000 2005 2010   NA VARIABLE"
[1] "----- ---- ---- ---- ---- ---- ---- --------"
[1] "  876    5    3    0    0    2    0 nat_stemO4_wkwage"
[1] "  876    7    3    0    4    0    0 nat_coll_wkwage"
[1] "  876    6    1    0    3    2    0 nat_nocoll_wkwage"
[1] "  876  791  207  208  169  207    0 nat_stemO4_emp"
[1] "  876  876  219  219  219  219    0 nat_coll_emp"
[1] "  876  876  219  219  219  219    0 nat_nocoll_emp"
[1] "  876  358   78   95   53   93   39 imm_stemO4"
[1] ""
[1] "COMPARING PSS_Data.dta to PSS_Data_all.csv at tolerance 0.1"
[1] ""
[1] "TOTAL DIFF 1990 2000 2005 2010   NA VARIABLE"
[1] "----- ---- ---- ---- ---- ---- ---- --------"
[1] "  876  791  207  208  169  207    0 nat_stemO4_wkwage"
[1] "  876  876  219  219  219  219    0 nat_coll_wkwage"
[1] "  876  876  219  219  219  219    0 nat_nocoll_wkwage"
[1] "  876   13    8    1    0    4    0 nat_stemO4_emp"
[1] "  876   23   15    0    6    2    0 nat_coll_emp"
[1] "  876   34   20    1    6    7    0 nat_nocoll_emp"
[1] "  876   49    3    2    0    5   39 imm_stemO4"
[1] ""
[1] "COMPARING PSS_Data.dta to PSS_Data_rep.csv at tolerance 0.1"
[1] ""
[1] "  N VARIABLE            % DIFF     PSS    CALC YEAR METAREA"
[1] "--- ----------------- -------- ------- ------- ---- -------------"
[1] "  1 nat_stemO4_wkwage     0.32    1583    1588 2010 Bremerton, WA"
[1] "  2 nat_stemO4_wkwage     0.12    1169    1171 1990 Honolulu, HI"
[1] "  3 nat_stemO4_wkwage     0.59    1236    1243 1990 New Haven-Meriden, CT"
[1] "  4 nat_stemO4_wkwage     0.34    1043    1046 1990 Savannah, GA"
[1] "  5 nat_stemO4_wkwage    -0.12    1328    1327 2010 Stockton, CA"
[1] "  1 nat_coll_wkwage      -0.37    1140    1135 1990 Benton Harbor, MI"
[1] "  2 nat_coll_wkwage       0.20    1480    1483 2005 Charleston-N.Charleston,SC"
[1] "  3 nat_coll_wkwage       0.11    1254    1256 2005 Honolulu, HI"
[1] "  4 nat_coll_wkwage      -0.16    1478    1475 2005 Melbourne-Titusville-Cocoa-Palm Bay, FL"
[1] "  5 nat_coll_wkwage       0.23    1457    1460 2005 South Bend-Mishawaka, IN"
[1] "  6 nat_coll_wkwage       0.11    2012    2015 1990 Stamford, CT"
[1] "  7 nat_coll_wkwage       0.19    1160    1162 1990 Vineland-Milville-Bridgetown, NJ"
[1] "  1 nat_nocoll_wkwage     0.15     908     909 2010 Anchorage, AK"
[1] "  2 nat_nocoll_wkwage     0.12     578     578 1990 Anniston, AL"
[1] "  3 nat_nocoll_wkwage     0.22     753     755 2005 Bremerton, WA"
[1] "  4 nat_nocoll_wkwage     0.25     836     838 2005 Honolulu, HI"
[1] "  5 nat_nocoll_wkwage     0.12     726     727 2010 Honolulu, HI"
[1] "  6 nat_nocoll_wkwage     0.15     668     669 2005 Salt Lake City-Ogden, UT"
[1] "  1 nat_stemO4_emp       -0.70    6997    6948 2010 Bremerton, WA"
[1] "  2 nat_stemO4_emp        0.35    7422    7448 2000 Canton, OH"
[1] "  3 nat_stemO4_emp       -0.14   12441   12423 1990 Colorado Springs, CO"
[1] "  4 nat_stemO4_emp       -0.26   14899   14861 1990 Fort Lauderdale-Hollywood-Pompano Beach, FL"
[1] "  5 nat_stemO4_emp       -0.15   13113   13093 1990 Honolulu, HI"
[1] "  6 nat_stemO4_emp       -0.19   18968   18932 2010 Melbourne-Titusville-Cocoa-Palm Bay, FL"
[1] "  7 nat_stemO4_emp       -0.24   12286   12257 1990 Miami-Hialeah, FL"
[1] "  8 nat_stemO4_emp       -0.73    4242    4211 1990 New Haven-Meriden, CT"
[1] "  9 nat_stemO4_emp       -0.10   23140   23116 1990 Orlando, FL"
[1] " 10 nat_stemO4_emp       -0.14   37084   37031 2010 Riverside-San Bernardino,CA"
[1] " 11 nat_stemO4_emp       -0.75    2678    2658 1990 Savannah, GA"
[1] " 12 nat_stemO4_emp       -0.34    6680    6657 2010 Stockton, CA"
[1] " 13 nat_stemO4_emp       -0.17    6641    6630 1990 York, PA"
[1] "  1 nat_coll_emp         -0.13   34162   34117 1990 Augusta-Aiken, GA-SC"
[1] "  2 nat_coll_emp         -0.28   13501   13463 1990 Benton Harbor, MI"
[1] "  3 nat_coll_emp         -0.11   27731   27700 1990 Bridgeport, CT"
[1] "  4 nat_coll_emp         -0.16   40926   40861 2005 Canton, OH"
[1] "  5 nat_coll_emp         -0.24   76420   76233 2005 Charleston-N.Charleston,SC"
[1] "  6 nat_coll_emp         -0.10   34868   34832 1990 El Paso, TX"
[1] "  7 nat_coll_emp         -0.15   44288   44222 2010 Eugene-Springfield, OR"
[1] "  8 nat_coll_emp         -0.11   16812   16794 1990 Fayetteville, NC"
[1] "  9 nat_coll_emp         -0.20  111912  111688 2005 Honolulu, HI"
[1] " 10 nat_coll_emp         -0.16  120107  119909 2010 Honolulu, HI"
[1] " 11 nat_coll_emp         -0.13   17118   17095 1990 McAllen-Edinburg-Pharr-Mission, TX"
[1] " 12 nat_coll_emp         -0.15   57438   57353 2005 Melbourne-Titusville-Cocoa-Palm Bay, FL"
[1] " 13 nat_coll_emp         -0.14  114717  114558 1990 Miami-Hialeah, FL"
[1] " 14 nat_coll_emp         -0.19   26578   26528 1990 New Haven-Meriden, CT"
[1] " 15 nat_coll_emp         -0.11   45234   45183 1990 Providence-Fall River-Pawtucket, MA/RI"
[1] " 16 nat_coll_emp         -0.11   51671   51612 2005 Reno, NV"
[1] " 17 nat_coll_emp         -0.17   10839   10821 1990 Roanoke, VA"
[1] " 18 nat_coll_emp         -0.11   27691   27660 1990 Salinas-Sea Side-Monterey, CA"
[1] " 19 nat_coll_emp         -0.12  213055  212803 1990 San Jose, CA"
[1] " 20 nat_coll_emp         -0.32   29457   29362 2005 South Bend-Mishawaka, IN"
[1] " 21 nat_coll_emp         -0.19   20554   20515 1990 Stamford, CT"
[1] " 22 nat_coll_emp         -0.25    7880    7860 1990 Vineland-Milville-Bridgetown, NJ"
[1] " 23 nat_coll_emp         -0.18   15334   15306 1990 Waco, TX"
[1] "  1 nat_nocoll_emp       -0.25   91369   91139 2010 Anchorage, AK"
[1] "  2 nat_nocoll_emp       -0.23   43693   43594 1990 Anniston, AL"
[1] "  3 nat_nocoll_emp       -0.11  147732  147571 1990 Augusta-Aiken, GA-SC"
[1] "  4 nat_nocoll_emp       -0.36   80960   80669 2005 Bremerton, WA"
[1] "  5 nat_nocoll_emp       -0.14   83261   83141 1990 Bridgeport, CT"
[1] "  6 nat_nocoll_emp       -0.21   58049   57928 1990 Brownsville-Harlingen-San Benito, TX"
[1] "  7 nat_nocoll_emp       -0.15  137333  137123 1990 El Paso, TX"
[1] "  8 nat_nocoll_emp       -0.12   85947   85848 1990 Fayetteville, NC"
[1] "  9 nat_nocoll_emp       -0.11  178941  178752 1990 Hartford-Bristol-Middleton- New Britain, CT"
[1] " 10 nat_nocoll_emp       -0.42  222260  221334 2005 Honolulu, HI"
[1] " 11 nat_nocoll_emp       -0.52  223262  222091 2010 Honolulu, HI"
[1] " 12 nat_nocoll_emp       -0.12  111057  110927 2010 Kileen-Temple, TX"
[1] " 13 nat_nocoll_emp       -0.12   59397   59324 1990 Longview-Marshall, TX"
[1] " 14 nat_nocoll_emp       -0.11 2640898 2638028 1990 Los Angeles-Long Beach, CA"
[1] " 15 nat_nocoll_emp       -0.13   53446   53375 1990 Medford, OR"
[1] " 16 nat_nocoll_emp       -0.19  323762  323141 1990 Miami-Hialeah, FL"
[1] " 17 nat_nocoll_emp       -0.13   73512   73413 1990 New Haven-Meriden, CT"
[1] " 18 nat_nocoll_emp       -0.13 3919102 3913923 1990 New York-Northeastern NJ"
[1] " 19 nat_nocoll_emp       -0.16  153151  152905 1990 Providence-Fall River-Pawtucket, MA/RI"
[1] " 20 nat_nocoll_emp       -0.11   41673   41628 1990 Pueblo, CO"
[1] " 21 nat_nocoll_emp       -0.10  125549  125418 2010 Reno, NV"
[1] " 22 nat_nocoll_emp        0.12   90026   90132 2000 Salem, OR"
[1] " 23 nat_nocoll_emp       -0.44   52505   52274 2005 Salinas-Sea Side-Monterey, CA"
[1] " 24 nat_nocoll_emp       -0.40  489617  487649 2005 Salt Lake City-Ogden, UT"
[1] " 25 nat_nocoll_emp       -0.22  475853  474787 2010 Salt Lake City-Ogden, UT"
[1] " 26 nat_nocoll_emp       -0.14  634807  633934 2005 San Diego, CA"
[1] " 27 nat_nocoll_emp       -0.12  403965  403498 1990 San Jose, CA"
[1] " 28 nat_nocoll_emp       -0.12  667722  666952 2005 Seattle-Everett, WA"
[1] " 29 nat_nocoll_emp       -0.13  154940  154742 2010 Spokane, WA"
[1] " 30 nat_nocoll_emp       -0.12   43588   43534 1990 State College, PA"
[1] " 31 nat_nocoll_emp       -0.13  148615  148416 1990 Stockton, CA"
[1] " 32 nat_nocoll_emp       -0.18  252484  252027 2010 Tacoma, WA"
[1] " 33 nat_nocoll_emp       -0.17   36755   36694 1990 Terre Haute, IN"
[1] " 34 nat_nocoll_emp       -0.12  253797  253496 1990 West Palm Beach-Boca Raton-Delray Beach, FL"
[1] "  1 imm_stemO4           -0.30    5404    5388 1990 Atlanta, GA"
[1] "  2 imm_stemO4            9.63     509     558 2010 Bremerton, WA"
[1] "  3 imm_stemO4          -10.88     239     213 2000 Canton, OH"
[1] "  4 imm_stemO4            0.25   26367   26434 2010 Detroit, MI"
[1] "  5 imm_stemO4            0.29    3443    3453 2010 Honolulu, HI"
[1] "  6 imm_stemO4           -0.78    3322    3296 1990 Portland, OR-WA"
[1] "  7 imm_stemO4           -0.28    9018    8993 2000 Portland, OR-WA"
[1] "  8 imm_stemO4           -0.49    5972    5943 1990 Riverside-San Bernardino,CA"
[1] "  9 imm_stemO4            0.43   12410   12463 2010 Riverside-San Bernardino,CA"
[1] " 10 imm_stemO4            0.88    2628    2651 2010 Stockton, CA"
>

As can be seen, there are more (137) instances outside the tolerance of 0.1 percent but still a relatively small number.
The R programs jole1data_pss.R, jole1data_rep.R, jole1data_all.R, and jole1data_pos.R generate the data for PSS_Data_pss.csv, PSS_Data_rep.csv, PSS_Data_all.csv, and PSS_Data_pos.csv, respectively. As previously mentioned, these files hold the original PSS data, the replicated PSS data, the replicated PSS data filtered for all valid incomes, and the replicated PSS data filtered for positive incomes only, respectively. The first table output by each of the R programs tab5_pss.R, tab5_rep.R, tab5_all.R, and tab5_pos.R shows the main regressions for these four cases. Following are those four tables:
[1] "OUTPUT FROM tab4_pss.R
[1] "======================"
[1] "1990-2010 USING STUDY'S FORMULA "
[1] ""
[1] " N INTERCEPT    SLOPE   STUDY  % DIFF    S.E.  T-STAT   P-VAL DESCRIPTION"
[1] "-- --------- -------- ------- ------- ------- ------- ------- -----------------------------------"
[1] " 1)  -0.3322   6.6191  6.6500   -0.46   7.073   0.936   0.350 Weekly Wage, Native STEM"
[1] " 2)  -0.0764   8.0414  8.0300    0.14   3.804   2.114   0.035 Weekly Wage, Native College-Educated"
[1] " 3)   0.0594   3.6174  3.7800   -4.30   1.972   1.835   0.067 Weekly Wage, Native Non-College-Educated"
[1] " 4)   0.0272   0.5349  0.5300    0.92   0.296   1.806   0.072 Employment, Native STEM"
[1] " 5)   0.0311   2.4942  2.4800    0.57   1.452   1.718   0.087 Employment, Native College-Educated"
[1] " 6)  -0.2557  -5.1701 -5.1700    0.00   3.178  -1.627   0.104 Employment, Native Non-College-Educated"
[1] ""
[1] "OUTPUT FROM tab4_rep.R
[1] "======================"
[1] "1990-2010 USING STUDY'S FORMULA "
[1] ""
[1] " N INTERCEPT    SLOPE   STUDY  % DIFF    S.E.  T-STAT   P-VAL DESCRIPTION"
[1] "-- --------- -------- ------- ------- ------- ------- ------- -----------------------------------"
[1] " 1)  -0.3542   5.6406  6.6500  -15.18   7.090   0.796   0.427 Weekly Wage, Native STEM"
[1] " 2)  -0.0416   8.5377  8.0300    6.32   3.825   2.232   0.026 Weekly Wage, Native College-Educated"
[1] " 3)   0.0393   3.4182  3.7800   -9.57   1.996   1.712   0.088 Weekly Wage, Native Non-College-Educated"
[1] " 4)   0.0257   0.4231  0.5300  -20.18   0.308   1.375   0.170 Employment, Native STEM"
[1] " 5)   0.0251   2.2430  2.4800   -9.56   1.521   1.475   0.141 Employment, Native College-Educated"
[1] " 6)  -0.2720  -7.3076 -5.1700   41.35   3.427  -2.133   0.034 Employment, Native Non-College-Educated"
[1] ""
[1] "OUTPUT FROM tab4_all.R
[1] "======================"
[1] "1990-2010 USING STUDY'S FORMULA "
[1] ""
[1] " N INTERCEPT    SLOPE   STUDY  % DIFF    S.E.  T-STAT   P-VAL DESCRIPTION"
[1] "-- --------- -------- ------- ------- ------- ------- ------- -----------------------------------"
[1] " 1)  -0.3344   5.1055  6.6500  -23.23   7.155   0.714   0.476 Weekly Wage, Native STEM"
[1] " 2)  -0.0303   7.6353  8.0300   -4.91   3.853   1.982   0.048 Weekly Wage, Native College-Educated"
[1] " 3)   0.0607   3.0270  3.7800  -19.92   2.047   1.479   0.140 Weekly Wage, Native Non-College-Educated"
[1] " 4)   0.0257   0.4231  0.5300  -20.18   0.308   1.375   0.170 Employment, Native STEM"
[1] " 5)   0.0251   2.2430  2.4800   -9.56   1.521   1.475   0.141 Employment, Native College-Educated"
[1] " 6)  -0.2720  -7.3076 -5.1700   41.35   3.427  -2.133   0.034 Employment, Native Non-College-Educated"
[1] ""
[1] "OUTPUT FROM tab4_pos.R
[1] "======================"
[1] "1990-2010 USING STUDY'S FORMULA "
[1] ""
[1] " N INTERCEPT    SLOPE   STUDY  % DIFF    S.E.  T-STAT   P-VAL DESCRIPTION"
[1] "-- --------- -------- ------- ------- ------- ------- ------- -----------------------------------"
[1] " 1)  -0.3552   5.6635  6.6500  -14.83   7.118   0.796   0.427 Weekly Wage, Native STEM"
[1] " 2)  -0.0432   8.5724  8.0300    6.75   3.842   2.231   0.026 Weekly Wage, Native College-Educated"
[1] " 3)   0.0397   3.4365  3.7800   -9.09   2.006   1.713   0.088 Weekly Wage, Native Non-College-Educated"
[1] " 4)   0.0258   0.4032  0.5300  -23.93   0.303   1.331   0.184 Employment, Native STEM"
[1] " 5)   0.0260   1.7466  2.4800  -29.57   1.423   1.228   0.220 Employment, Native College-Educated"
[1] " 6)  -0.2431  -7.1418 -5.1700   38.14   3.264  -2.188   0.029 Employment, Native Non-College-Educated"

The following table compares the slopes and p-values from the prior four tables:
[1] "                        SLOPES                                    P-VALUES
[1] "     ----------------------------------------------     ----------------------------------"
[1] " N    STUDY      OrgPSS   RepPSS   AllInc   PosInc  OrgPSS     RepPSS     AllInc     PosInc    DESCRIPTION"
[1] "--  -------    -------- -------- -------- -------- -------    -------    -------    -------    -----------------------------------"
[1] " 1)  6.6500      6.6191   5.6406   5.1055   5.6635   0.350      0.427      0.476      0.427    Weekly Wage, Native STEM"
[1] " 2)  8.0300***   8.0414   8.5377   7.6353   8.5724   0.035**    0.026**    0.048**    0.026**  Weekly Wage, Native College-Educated"
[1] " 3)  3.7800**    3.6174   3.4182   3.0270   3.4365   0.067*     0.088*     0.140      0.088*   Weekly Wage, Native Non-College-Educated"
[1] " 4)  0.5300      0.5349   0.4231   0.4231   0.4032   0.072*     0.170      0.170      0.184    Employment, Native STEM"
[1] " 5)  2.4800      2.4942   2.2430   2.2430   1.7466   0.087*     0.141      0.141      0.220    Employment, Native College-Educated"
[1] " 6) -5.1700     -5.1701  -7.3076  -7.3076  -7.1418   0.104      0.034**    0.034**    0.029**  Employment, Native Non-College-Educated"

  * Significant at the 10% (0.10) level.
 ** Significant at the 5% (0.05) level.
*** Significant at the 1% (0.01) level.

Despite the close replication of the data, there is some difference between OrgPSS (using the original PSS data) and RepPSS (using the replicated PSS data). The significant 8.03 slope for the weekly wage of native college-educated climbs to 8.54 and the less significant 3.78 slope for the weekly wage of native non-college-educated drops to 3.42. There are similar small differences for the other less-significant variables.
The AllInc (all valid incomes) and PosInc (positive incomes only) filters do not have a major effect on the key regressions. The significant 8.03 slope for the weekly wage of native college-educated workers varies between 7.6 and 8.6. The filters have somewhat more effect on the less significant 3.78 slope for the weekly wage of native non-college-educated workers, which drops to between 3.0 and 3.4 under the alternate filtering.
There were generally minor changes, if any, in the p-values obtained using the four data files. One notable difference is that the last variable (employment of native non-college-educated workers) shows a significant negative correlation in every case except the study's data. Still, the p-value was nearly significant in the study's data.
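The significance markers in the comparison table follow directly from the p-values. A minimal sketch of the mapping, applied to the RepPSS p-values, follows; the function name sig_stars is hypothetical, and treating each threshold as a strict inequality is an assumption.

```r
# Map p-values to the markers used in the legend:
# * for the 10% level, ** for the 5% level, *** for the 1% level.
sig_stars <- function(p) {
  ifelse(p < 0.01, "***",
         ifelse(p < 0.05, "**",
                ifelse(p < 0.10, "*", "")))
}

# RepPSS p-values for the six regressions, in table order.
p_rep <- c(0.427, 0.026, 0.088, 0.170, 0.141, 0.034)
print(paste0(format(p_rep, nsmall = 3), sig_stars(p_rep)))
```

Under this mapping the second and sixth regressions are marked at the 5% level and the third at the 10% level, which matches the RepPSS column of the table.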
17. Summary of Replication of Data in PSS_Data.dta
The main finding of interest in the replication to this point is the seeming inconsistency between the filtering applied to wages and the filtering applied to employment. As mentioned in the prior section, the wage variables were calculated for the population of workers who earned a positive income. The employment variables, however, were calculated for the population of workers who earned any valid income, including zero. The number of workers recorded as earning zero income is not negligible: excluding them from the employment variables, or including them in the wage variables, made it impossible to replicate the data within any reasonable tolerance.
In any event, this helps underline the importance of replicating a study's results all the way back to the source data. Studies typically extract and aggregate data into one or more data files, which are then used to perform the analysis. Often, the programs that do the analysis are provided but the programs that do the extraction and aggregation are not. At most, they are described in an appendix, as was the case with this study. Replicating the data from the original source accomplishes several things. First, it verifies that the extraction, aggregation, and any cleaning were done correctly. Second, it reveals the precise methods that were used and allows alternate methods to be tried. Expanding on this last point, it also allows alternate data or an expanded range of data to be analyzed using the same or modified methods.
One item of note is that this replication is not complete. The following variables have not yet been replicated and the prior analysis currently uses the values provided by the study:
delta_imm_stemO4_H1B_hat80 - instrument for H-1B imputed growth of foreign STEM
bartik_coll_wage - instrument to predict the wage growth of college-educated workers based on each city’s industrial composition in 1980
bartik_coll_emp - instrument to predict the employment growth of college-educated workers based on each city’s industrial composition in 1980
bartik_nocoll_wage - instrument to predict the wage growth of non-college-educated workers based on each city’s industrial composition in 1980
bartik_nocoll_emp - instrument to predict the employment growth of non-college-educated workers based on each city’s industrial composition in 1980
Following are the regression results using all of the variables:
[1] "1990-2010 USING STUDY'S FORMULA "
[1] ""
[1] "  N  INTERCEPT     SLOPE    STUDY   % DIFF     S.E.   T-STAT    P-VAL  DESCRIPTION"
[1] " --  ---------  --------  -------  -------  -------  -------  -------  -----------------------------------"
[1] " 1)    -0.3322    6.6191   6.6500    -0.46    7.073    0.936    0.350  Weekly Wage, Native STEM"
[1] " 2)    -0.0764    8.0414   8.0300     0.14    3.804    2.114    0.035  Weekly Wage, Native College-Educated"
[1] " 3)     0.0594    3.6174   3.7800    -4.30    1.972    1.835    0.067  Weekly Wage, Native Non-College-Educated"
[1] " 4)     0.0272    0.5349   0.5300     0.92    0.296    1.806    0.072  Employment, Native STEM"
[1] " 5)     0.0311    2.4942   2.4800     0.57    1.452    1.718    0.087  Employment, Native College-Educated"
[1] " 6)    -0.2557   -5.1701  -5.1700     0.00    3.178   -1.627    0.104  Employment, Native Non-College-Educated"
Next, following are the results when the Bartik variables are removed:
[1] "1990-2010 USING STUDY'S FORMULA MINUS BARTIK VARIABLES "
[1] ""
[1] "  N  INTERCEPT     SLOPE    STUDY   % DIFF     S.E.   T-STAT    P-VAL  DESCRIPTION"
[1] " --  ---------  --------  -------  -------  -------  -------  -------  -----------------------------------"
[1] " 1)    -0.0874    5.6811   6.6500   -14.57    6.245    0.910    0.363  Weekly Wage, Native STEM"
[1] " 2)     0.0871    8.4148   8.0300     4.79    3.370    2.497    0.013  Weekly Wage, Native College-Educated"
[1] " 3)     0.1455    3.5455   3.7800    -6.20    1.979    1.791    0.074  Weekly Wage, Native Non-College-Educated"
[1] " 4)     0.0056    0.7897   0.5300    49.00    0.262    3.011    0.003  Employment, Native STEM"
[1] " 5)     0.0566    3.6409   2.4800    46.81    1.283    2.838    0.005  Employment, Native College-Educated"
[1] " 6)     0.1092   -5.4231  -5.1700     4.89    3.230   -1.679    0.094  Employment, Native Non-College-Educated"
Finally, following are the results when delta_imm_stemO4_H1B_hat80, the instrument for H-1B imputed growth of foreign STEM, is also removed:
[1] "1990-2010 USING STUDY'S FORMULA ON REPLICATED DATA "
[1] ""
[1] "  N  INTERCEPT     SLOPE    STUDY   % DIFF     S.E.   T-STAT    P-VAL  DESCRIPTION"
[1] " --  ---------  --------  -------  -------  -------  -------  -------  -----------------------------------"
[1] " 1)    -0.0830    3.3526   6.6500   -49.58    3.083    1.088    0.277  Weekly Wage, Native STEM"
[1] " 2)     0.0946    4.4874   8.0300   -44.12    1.654    2.713    0.007  Weekly Wage, Native College-Educated"
[1] " 3)     0.1504    0.9759   3.7800   -74.18    0.970    1.006    0.315  Weekly Wage, Native Non-College-Educated"
[1] " 4)     0.0055    0.8456   0.5300    59.54    0.130    6.527    0.000  Employment, Native STEM"
[1] " 5)     0.0545    4.7156   2.4800    90.14    0.632    7.466    0.000  Employment, Native College-Educated"
[1] " 6)     0.0958    1.5815  -5.1700  -130.59    1.560    1.014    0.311  Employment, Native Non-College-Educated"
The following table summarizes the information from the prior three tables:
[1] "               INTERCEPTS                        SLOPES                     P-VALUES"
[1] "     -------------------------------  ----------------------------  -------------------------"
[1] "                               STUDY                         STUDY                      STUDY"
[1] "                    STUDY    -H1B IG               STUDY   -H1B IG             STUDY  -H1B IG"
[1] "  N      STUDY    -BARTIK    -BARTIK     STUDY   -BARTIK   -BARTIK    STUDY  -BARTIK  -BARTIK  DESCRIPTION"
[1] " --  ---------  ---------  ---------  --------  --------  --------  -------  -------  -------  -----------------------------------"
[1] " 1)    -0.3322    -0.0874    -0.0830    6.6191    5.6811    3.3526    0.350    0.363    0.277  Weekly Wage, Native STEM"
[1] " 2)    -0.0764     0.0871     0.0946    8.0414    8.4148    4.4874    0.035    0.013    0.007  Weekly Wage, Native College-Educated"
[1] " 3)     0.0594     0.1455     0.1504    3.6174    3.5455    0.9759    0.067    0.074    0.315  Weekly Wage, Native Non-College-Educated"
[1] " 4)     0.0272     0.0056     0.0055    0.5349    0.7897    0.8456    0.072    0.003    0.000  Employment, Native STEM"
[1] " 5)     0.0311     0.0566     0.0545    2.4942    3.6409    4.7156    0.087    0.005    0.000  Employment, Native College-Educated"
[1] " 6)    -0.2557     0.1092     0.0958   -5.1701   -5.4231    1.5815    0.104    0.094    0.311  Employment, Native Non-College-Educated"
As can be seen, removing the Bartik variables causes large changes in the intercepts but relatively small changes in the slopes. Additionally removing the H-1B imputed growth variable, on the other hand, causes small changes in the intercepts but relatively large changes in the slopes. Hence, an expanded analysis of this study would do well to replicate these variables as well.
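As a possible starting point for replicating the Bartik variables, following is a minimal sketch of the generic shift-share calculation that such instruments are commonly built from: each city's 1980 sector shares are multiplied by the national growth of each sector and summed. This is an assumption about the general approach, not the study's confirmed formula. The matrix emp80, the vector nat_growth, and all of the numbers are made up for illustration:

```r
# Hypothetical 1980 employment by city (rows) and sector (columns).
emp80 <- matrix(c(60, 40,
                  20, 80),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("cityA", "cityB"),
                                c("sector1", "sector2")))

# Hypothetical national growth rate of each sector over the period.
nat_growth <- c(sector1 = 0.10, sector2 = 0.30)

# Each city's 1980 sector shares (every row sums to 1).
share80 <- emp80 / rowSums(emp80)

# Predicted growth: share-weighted sum of national sector growth.
bartik <- as.vector(share80 %*% nat_growth)
names(bartik) <- rownames(emp80)

print(bartik)
```

Because the prediction depends only on the 1980 composition and on national trends, it is intended to be unrelated to later city-specific shocks. Whether the study computes national growth with or without the city itself is not stated here and would need to be checked against its appendix.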
18. UPDATED FINDING: Problem with Study's Values for Labor Force
The above analysis replicated the six key dependent variables and one key independent variable used in the study. However, an updated version of the replication program also replicated labor force and found a serious problem. Following is the result of comparing the labor force as given in the study versus the labor force extracted from IPUMS data:
[1] "COMPARING PSS_Data.dta to IP_Metro_rep8010.csv at tolerance 1"
[1] ""
[1] "TOTAL  DIFF  1990  2000  2005  2010    NA  VARIABLE"
[1] "-----  ----  ----  ----  ----  ----  ----  --------"
[1] "  876     0     0     0     0     0     0  nat_stemO4_wkwage"
[1] "  876     0     0     0     0     0     0  nat_coll_wkwage"
[1] "  876     0     0     0     0     0     0  nat_nocoll_wkwage"
[1] "  876     0     0     0     0     0     0  nat_stemO4_emp"
[1] "  876     0     0     0     0     0     0  nat_coll_emp"
[1] "  876     0     0     0     0     0     0  nat_nocoll_emp"
[1] "  876    41     0     1     0     1    39  imm_stemO4"
[1] "  876   876   219   219   219   219     0  labforce"
As can be seen, every one of the 876 labforce values differs by more than 1 percent. Inspection of the data suggests that the study is using total population for labor force. As a test, the new replication program was modified to likewise use total population for the labor force. Following is the same comparison as above with this modified data:
[1] "COMPARING PSS_Data.dta to IP_Metro_rep8010_tp.csv at tolerance 1"
[1] ""
[1] "TOTAL  DIFF  1990  2000  2005  2010    NA  VARIABLE"
[1] "-----  ----  ----  ----  ----  ----  ----  --------"
[1] "  876     0     0     0     0     0     0  nat_stemO4_wkwage"
[1] "  876     0     0     0     0     0     0  nat_coll_wkwage"
[1] "  876     0     0     0     0     0     0  nat_nocoll_wkwage"
[1] "  876     0     0     0     0     0     0  nat_stemO4_emp"
[1] "  876     0     0     0     0     0     0  nat_coll_emp"
[1] "  876     0     0     0     0     0     0  nat_nocoll_emp"
[1] "  876    41     0     1     0     1    39  imm_stemO4"
[1] "  876     0     0     0     0     0     0  labforce"
Following is the same comparison with a smaller tolerance of 0.1 percent:
[1] "COMPARING PSS_Data.dta to IP_Metro_rep8010_tp.csv at tolerance 0.1"
[1] ""
[1] "TOTAL  DIFF  1990  2000  2005  2010    NA  VARIABLE"
[1] "-----  ----  ----  ----  ----  ----  ----  --------"
[1] "  876   446     6   219   219     2     0  nat_stemO4_wkwage"
[1] "  876   443     6   219   218     0     0  nat_coll_wkwage"
[1] "  876   466    28   219   217     2     0  nat_nocoll_wkwage"
[1] "  876    13     8     1     0     4     0  nat_stemO4_emp"
[1] "  876    23    15     0     6     2     0  nat_coll_emp"
[1] "  876    34    20     1     6     7     0  nat_nocoll_emp"
[1] "  876    49     3     2     0     5    39  imm_stemO4"
[1] "  876    11    11     0     0     0     0  labforce"
Following is a list of the 11 labforce values that differed by more than 0.1 percent:
[1] "  1  labforce  -0.19   51694   51595  1990  Anniston, AL"
[1] "  2  labforce  -0.11  186649  186443  1990  Augusta-Aiken, GA-SC"
[1] "  3  labforce  -0.12  127440  127289  1990  Bridgeport, CT"
[1] "  4  labforce  -0.13   94883   94762  1990  Brownsville-Harlingen-San Benito, TX"
[1] "  5  labforce  -0.12  238580  238289  1990  El Paso, TX"
[1] "  6  labforce  -0.13  107499  107364  1990  Fayetteville, NC"
[1] "  7  labforce  -0.12   69402   69321  1990  Medford, OR"
[1] "  8  labforce  -0.14  108616  108467  1990  New Haven-Meriden, CT"
[1] "  9  labforce  -0.13  236111  235814  1990  Providence-Fall River-Pawtucket, MA/RI"
[1] " 10  labforce  -0.11  209787  209562  1990  Stockton, CA"
[1] " 11  labforce  -0.13   47124   47063  1990  Terre Haute, IN"
A review of the study's data file PSS_Data.dta confirms that this variable is named labforce in the file, strongly suggesting that it is meant to be the labor force. In the study, the notes for Table 3 state "[t]he dependent variable is the growth in foreign STEM as a percentage of the labor force". Hence, the apparent use of total population for this variable seems like a serious problem with the data. Its effects will be looked at more closely in the next section.
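Following is a minimal R sketch of the kind of tolerance comparison reported in the tables of this section. It is not the replication program itself: the function count_diffs and its arguments are hypothetical, the two data frames are assumed to be row-aligned, and NA values are counted in the DIFF total because the DIFF column in the tables equals the sum of the year columns plus the NA column. The first two rows reuse the Anniston and Augusta-Aiken pairs listed above, assuming the first number in each pair is the study's value; the third row is made up:

```r
# Count values whose percent difference from the study exceeds a tolerance.
# `study` and `repl` are assumed to be row-aligned and to share a `year`
# column; all names here are hypothetical.
count_diffs <- function(study, repl, var, tol_pct) {
  pct   <- 100 * (repl[[var]] - study[[var]]) / study[[var]]
  is_na <- is.na(pct)
  over  <- !is_na & abs(pct) > tol_pct
  list(total   = length(pct),
       diff    = sum(over) + sum(is_na),   # NA counted as a difference
       by_year = tapply(over, study$year, sum),
       na      = sum(is_na))
}

study <- data.frame(year = c(1990, 1990, 2000),
                    labforce = c(51694, 186649, 100000))
repl  <- data.frame(year = c(1990, 1990, 2000),
                    labforce = c(51595, 186443, 100000))

r1  <- count_diffs(study, repl, "labforce", 1)    # tolerance of 1 percent
r01 <- count_diffs(study, repl, "labforce", 0.1)  # tolerance of 0.1 percent

print(r1$diff)
print(r01$by_year)
```

With these three rows, nothing exceeds the 1 percent tolerance while both 1990 values exceed the 0.1 percent tolerance, mirroring the pattern seen for labforce in the tables above.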
Source Code for R Programs and Data Files Used in this Replication