Are Skilled Foreign Workers a Boon to Pay? - Analysis
To reproduce the plots and tables in this analysis, you first need to go to IPUMS USA and click on "Login" to request an account, or sign in if you already have one. You can then go back to the home page and click on "Browse and Select Data". To handle the large amount of data that is extracted, these programs load IPUMS data for just one sample at a time. The variables selected for each sample are the following:
Type      Subtype     Variable  Description
--------- ----------- --------- -----------
Household Geographic  STATEFIP  State (FIPS code)
Household Geographic  COUNTY    County
Household Geographic  METAREA   Metropolitan area
Household Geographic  PUMA      Public Use Microdata Area
Person    Demographic AGE       Age
Person    Race, ...   BPL       Birthplace
Person    Race, ...   CITIZEN   Citizenship status
Person    Race, ...   YRIMMIG   Year of immigration
Person    Education   EDUC      Educational attainment
Person    Work        EMPSTAT   Employment status
Person    Work        OCC1990   Occupation, 1990 basis
Person    Work        CLASSWKR  Class of worker
Person    Work        WKSWORK1  Weeks worked last year
Person    Work        WKSWORK2  Weeks worked last year, intervalled
Person    Income      INCWAGE   Wage and salary income

After selecting the variables, click on "Select Samples" and choose just one of the following samples:
Sample        Filename
------------- -----------
1980 5% state ip15_80.dta
1990 5%       ip15_90.dta
2000 5%       ip15_00.dta
2005 ACS      ip15_05.dta
2010 ACS 3yr  ip15_10.dta

The filename is the name of the file into which each sample should be extracted. For the 1980 sample PUMA will be missing, and for the 2010 sample WKSWORK1 will be missing, because these variables are not defined for those samples. In addition, the following variables will be automatically selected for all of the samples:
Type      Subtype     Variable  Description
--------- ----------- --------- -----------
Household Technical   YEAR      Census year
Household Technical   DATANUM   Data set number
Household Technical   SERIAL    Household serial number
Household Technical   HHWT      Household weight
Household Geographic  METAREAD  Metropolitan area [detailed version]
Household Group Qrtrs GQ        Group quarters status
Person    Technical   PERNUM    Person number in sample unit
Person    Technical   PERWT     Person weight
Person    Race, ...   BPLD      Birthplace [detailed version]
Person    Education   EDUCD     Educational attainment [detailed version]
Person    Work        EMPSTATD  Employment status [detailed version]
Person    Work        CLASSWKRD Class of worker [detailed version]

Once the above five files have been downloaded and placed in the local directory, the first R program stem1data.R can be run via the source("stem1data.R") command. Following is the output:
> source("stem1data.R")
[1] "START READ OF ip15_80.dta"
[1] "PROCESS ip15_80.dta"
[1] "11343120 Initial"
[1] "6940893 Age 18-65"
[1] "5331242 Worked 1 or more weeks"
[1] "5316942 Non-institutional"
[1] "5234588 Occupation not Military, Unemployed, or Unknown"
[1] "3548371 Metareas (219)"
[1] "Change 1980 to 2010 dollars"
[1] "3006 Size of aa"
[1] "START READ OF ip15_90.dta"
[1] "PROCESS ip15_90.dta"
[1] "12501046 Initial"
[1] "7707006 Age 18-65"
[1] "6218598 Worked 1 or more weeks"
[1] "6196808 Non-institutional"
[1] "6116316 Occupation not Military, Unemployed, or Unknown"
[1] "3930743 Metareas (219)"
[1] "Change 1990 to 2010 dollars"
[1] "6147 Size of aa"
[1] "START READ OF ip15_00.dta"
[1] "PROCESS ip15_00.dta"
[1] "14081466 Initial"
[1] "8681911 Age 18-65"
[1] "6979381 Worked 1 or more weeks"
[1] "6933702 Non-institutional"
[1] "6877412 Occupation not Military, Unemployed, or Unknown"
[1] "4653729 Metareas (219)"
[1] "Change 2000 to 2010 dollars"
[1] "9459 Size of aa"
[1] "START READ OF ip15_05.dta"
[1] "PROCESS ip15_05.dta"
[1] "2878380 Initial"
[1] "1778997 Age 18-65"
[1] "1439462 Worked 1 or more weeks"
[1] "1439462 Non-institutional"
[1] "1431690 Occupation not Military, Unemployed, or Unknown"
[1] "995671 Metareas (219)"
[1] "Change 2005 to 2010 dollars"
[1] "12471 Size of aa"
[1] "START READ OF ip15_10.dta"
[1] "PROCESS ip15_10.dta"
[1] "Create mm$WKSWORK1"
[1] "9093077 Initial"
[1] "5672423 Age 18-65"
[1] "4438018 Worked 1 or more weeks"
[1] "4410068 Non-institutional"
[1] "4376811 Occupation not Military, Unemployed, or Unknown"
[1] "3059466 Metareas (219)"
[1] "15675 Size of aa"
[1] "ALL IPUMS FILES READ"
[1] "START OF AGGREGATE AND MERGE INTO FINAL FILES"
[1] "CREATE FINAL FILES ipums80.txt and ipums80.csv"
>

For each of the five sample files, the R program will read the file and apply filters so that only individuals with the following characteristics will be included:
- Between ages 18 and 65 (inclusive).
- Worked 1 or more weeks in the past year.
- Are non-institutionalized.
- Have an occupation that is not listed as military, unemployed, or unknown.
- Live in one of the 219 metropolitan areas that are consistently identified across all five samples.
This filtering is designed to match the study being replicated. Page 1 of the study's online appendix states:
We focus our analysis on 219 Metropolitan Statistical Areas (MSAs) that are consistently identified from 1980-2010, excluding individuals who do not live in identified MSAs. Our dependent variables of interest include employment, wage, and rent outcomes.
Our “employment sample” calculates various employment variables by counting the number of workers for different demographic groups. This sample is restricted to noninstitutionalized individuals between ages 18 and 65 (inclusive) who report positive weeks worked over the previous year. We exclude individuals in military occupations, unidentified occupations, and occupations that cannot be consistently identified over time.
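The restrictions above are applied one after another, with the number of surviving records reported after each step. The following Python sketch illustrates that cascade. It is not the R code in stem1data.R. The IPUMS codes it uses are assumptions to check against the IPUMS USA codebook: GQ value 3 for institutional group quarters, and OCC1990 values 905, 991 and 999 for military, unemployed and unknown. The toy records and metro codes are hypothetical.

```python
# Assumed IPUMS codes (verify against the IPUMS USA codebook before use).
GQ_INSTITUTIONAL = 3                # group quarters: institutions
EXCLUDED_OCC1990 = {905, 991, 999}  # military, unemployed, unknown


def filter_cascade(records, metros):
    """Apply the sample restrictions in order and record the count after each step.

    records: list of dicts keyed by IPUMS variable name (AGE, WKSWORK1, GQ, OCC1990, METAREA)
    metros:  set of metropolitan-area codes to keep
    """
    steps = [
        ("Age 18-65", lambda r: 18 <= r["AGE"] <= 65),
        ("Worked 1 or more weeks", lambda r: r["WKSWORK1"] >= 1),
        ("Non-institutional", lambda r: r["GQ"] != GQ_INSTITUTIONAL),
        ("Occupation not Military, Unemployed, or Unknown",
         lambda r: r["OCC1990"] not in EXCLUDED_OCC1990),
        (f"Metareas ({len(metros)})", lambda r: r["METAREA"] in metros),
    ]
    counts = [("Initial", len(records))]
    for label, keep in steps:
        records = [r for r in records if keep(r)]
        counts.append((label, len(records)))
    return records, counts


# Hypothetical toy records; each one after the first fails exactly one restriction.
toy = [
    {"AGE": 30, "WKSWORK1": 52, "GQ": 1, "OCC1990": 22, "METAREA": 520},
    {"AGE": 70, "WKSWORK1": 52, "GQ": 1, "OCC1990": 22, "METAREA": 520},
    {"AGE": 40, "WKSWORK1": 0, "GQ": 1, "OCC1990": 22, "METAREA": 520},
    {"AGE": 25, "WKSWORK1": 10, "GQ": 3, "OCC1990": 22, "METAREA": 520},
    {"AGE": 45, "WKSWORK1": 52, "GQ": 1, "OCC1990": 905, "METAREA": 520},
    {"AGE": 50, "WKSWORK1": 40, "GQ": 1, "OCC1990": 22, "METAREA": 0},
]
kept, counts = filter_cascade(toy, metros={520})
for label, n in counts:
    print(n, label)
```

Each step only removes records, so the final set does not depend on the order of the filters. Only the intermediate counts do.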
As previously mentioned, the set of 219 metropolitan areas was one item that could be exactly reproduced. The study's online appendix lists the samples from 1980-2010 as the "1980, 1990, and 2000 Census, the 2005 American Community Survey (ACS), and the 2008-2010 3-Year ACS". This list shows 304 metropolitan areas for which there are IPUMS records in one or more of these five samples. However, those colored red either are not in a metropolitan area (the first row) or contain NA (no data) for one or more of the samples. When these 85 rows are removed, exactly 219 rows "that are consistently identified" remain. The program reads the file mets219.txt to retrieve this list for filtering.
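Selecting the consistently identified metropolitan areas amounts to two steps. First, drop the row for persons not in a metropolitan area. Second, drop every row that has NA in any of the five samples. The minimal Python sketch below assumes a hypothetical table that maps each metro code to its five per-sample record counts. In that table, None stands for NA, and code 0 is assumed to mean "not in a metropolitan area". The actual program skips this derivation and reads the precomputed list from mets219.txt.

```python
NOT_IN_METRO = 0  # assumed METAREA code for "not in an identifiable metropolitan area"


def consistent_metros(counts_by_metro, n_samples=5):
    """Return the metro codes that have data in every sample.

    counts_by_metro maps a metro code to a list of per-sample record counts
    (1980, 1990, 2000, 2005, 2010), with None standing in for NA.
    """
    return sorted(
        code
        for code, counts in counts_by_metro.items()
        if code != NOT_IN_METRO
        and len(counts) == n_samples
        and all(c is not None for c in counts)
    )


# Hypothetical toy table: one non-metro row, one row with NA, two consistent metros.
table = {
    0: [900, 950, 990, 200, 600],
    520: [10, 12, 13, 4, 9],
    1120: [8, None, 11, 3, 7],
    1600: [20, 22, 25, 6, 18],
}
print(consistent_metros(table))  # [520, 1600]
```

Applied to the 304-row list described above, dropping the 85 flagged rows leaves 304 - 85 = 219.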
In the output, the leading numbers give the number of IPUMS person records remaining at each step of the filtering. For example, following is the output for the 1980 sample:
[1] "START READ OF ip15_80.dta"
[1] "PROCESS ip15_80.dta"
[1] "11343120 Initial"
[1] "6940893 Age 18-65"
[1] "5331242 Worked 1 or more weeks"
[1] "5316942 Non-institutional"
[1] "5234588 Occupation not Military, Unemployed, or Unknown"
[1] "3548371 Metareas (219)"
[1] "Change 1980 to 2010 dollars"
[1] "3006 Size of aa"

A START message is printed when reading of the file begins, and a PROCESS message is printed once the file has been fully read and processing begins. The next lines indicate how many person records remain in the 1980 sample at each step:

- 11,343,120 initially.
- 6,940,893 when including just those between ages 18 and 65.
- 5,331,242 when also including just those who worked 1 or more weeks.
- 5,316,942 when excluding those who are institutionalized.
- 5,234,588 when excluding occupations listed as Military, Unemployed, or Unknown.
- 3,548,371 when excluding those outside the 219 metropolitan areas.

This is a 5% sample, so the initial 11,343,120 records actually represent nearly 227 million individuals.
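The step-by-step drops can be recomputed directly from the 1980 counts above. This Python snippet uses only the numbers printed by stem1data.R. The factor of 20 is simply the reciprocal of the 5% sampling rate, so the population figure is a rough approximation. Summing PERWT would give the weighted estimate.

```python
# Record counts printed by stem1data.R for the 1980 sample.
steps_1980 = [
    ("Initial", 11343120),
    ("Age 18-65", 6940893),
    ("Worked 1 or more weeks", 5331242),
    ("Non-institutional", 5316942),
    ("Occupation not Military, Unemployed, or Unknown", 5234588),
    ("Metareas (219)", 3548371),
]


def removed_per_step(steps):
    """Return (label, records removed) for every step after the initial count."""
    return [(label, prev - n) for (_, prev), (label, n) in zip(steps, steps[1:])]


for label, removed in removed_per_step(steps_1980):
    print(f"{label:<48} removed {removed:>10,}")

SCALE = 20  # 1 / 0.05 for a 5% sample
print(f"Initial records x {SCALE} = {steps_1980[0][1] * SCALE:,}")
```

The two largest drops come from the age restriction (4,402,227 records) and the 219-metro restriction (1,686,217 records).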
Next is a line indicating that INCWAGE, which is reported in the current dollars of each sample, is converted to 2010 dollars. This is done as described on page 2 of the appendix, which states:
Annual wage and salary incomes are converted to constant 2010 dollars using the BLS Inflation Calculator (http://www.bls.gov/data/inflation_calculator.htm).
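The conversion multiplies each nominal amount by the ratio of the 2010 price level to the price level of the sample year. The Python sketch below assumes CPI-U annual averages for the price levels. The factors produced by the BLS Inflation Calculator, and those actually used in stem1data.R, may differ slightly.

```python
# CPI-U annual averages (U.S. city average, all items, 1982-84 = 100).
# Assumed for illustration; the BLS Inflation Calculator and stem1data.R may use
# slightly different factors.
CPI_U_ANNUAL = {1980: 82.4, 1990: 130.7, 2000: 172.2, 2005: 195.3, 2010: 218.056}


def to_2010_dollars(amount, year, cpi=CPI_U_ANNUAL):
    """Scale a nominal amount from `year` dollars into 2010 dollars."""
    return amount * cpi[2010] / cpi[year]


for year in sorted(CPI_U_ANNUAL):
    print(year, round(to_2010_dollars(10_000, year), 2))
```

For 2010 the ratio is 1, which is consistent with the absence of a "Change ... dollars" line for the 2010 sample in the output.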
The populations (PERWT), wage income (INCWAGE), and weeks worked (WKSWORK1) are summed for each metropolitan area and for each of 16 groups. These 16 groups are all possible combinations of four true/false attributes: college graduate, STEM worker, immigrant, and employed. The final number, 3006, is the total number of records (metropolitan-area groups) tabulated so far across all metropolitan areas. The 3006 records for 1980 represent about 13.7 groups per metropolitan area (3006 / 219). The number is below 16 because some groups have no members in some metropolitan areas.
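The tabulation is a group-by-and-sum over five keys: the metropolitan area plus the four true/false attributes. The stdlib Python sketch below makes two assumptions. First, each record already carries hypothetical boolean fields named college, stem, immigrant and employed. How stem1data.R derives these from EDUC, OCC1990, BPL/CITIZEN and EMPSTAT is not described here. Second, the sketch sums the raw values. Whether the R program weights INCWAGE or WKSWORK1 by PERWT is not stated.

```python
from collections import defaultdict
from itertools import product

# The four true/false attributes give 2**4 = 16 possible groups per metro area.
ALL_GROUPS = list(product((False, True), repeat=4))

SUM_VARS = ("PERWT", "INCWAGE", "WKSWORK1")


def aggregate(records):
    """Sum PERWT, INCWAGE and WKSWORK1 by (METAREA, college, stem, immigrant, employed).

    Only combinations that actually occur get a key, so a metro area can end up
    with fewer than 16 groups.
    """
    totals = defaultdict(lambda: dict.fromkeys(SUM_VARS, 0))
    for r in records:
        key = (r["METAREA"], r["college"], r["stem"], r["immigrant"], r["employed"])
        for var in SUM_VARS:
            totals[key][var] += r[var]
    return dict(totals)


# Hypothetical toy records with precomputed boolean attributes.
records = [
    {"METAREA": 1, "college": True, "stem": True, "immigrant": True, "employed": True,
     "PERWT": 20, "INCWAGE": 50000, "WKSWORK1": 52},
    {"METAREA": 1, "college": True, "stem": True, "immigrant": True, "employed": True,
     "PERWT": 25, "INCWAGE": 60000, "WKSWORK1": 50},
    {"METAREA": 1, "college": False, "stem": False, "immigrant": False, "employed": True,
     "PERWT": 18, "INCWAGE": 30000, "WKSWORK1": 40},
    {"METAREA": 2, "college": False, "stem": True, "immigrant": False, "employed": False,
     "PERWT": 22, "INCWAGE": 10000, "WKSWORK1": 20},
]
groups = aggregate(records)
metros = {key[0] for key in groups}
print(len(ALL_GROUPS), len(groups), len(groups) / len(metros))
```

The toy data yields 3 groups across 2 metro areas, or 1.5 groups per area. For the 1980 sample the same ratio is 3006 / 219, about 13.7.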
The above processing is repeated for each of the five samples, so the total number of records grows to 15675. The result is then written to the space-delimited file ipums80.txt and the comma-delimited file ipums80.csv. The data are saved because stem1data.R is very resource-intensive and can take several minutes to run, whereas the resulting data set is much smaller and the programs that process it run much faster. Hence, as long as none of the code in stem1data.R needs to change, the saved data can be processed quickly by various programs without rerunning it.
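The pattern of running the expensive step once and having later programs read its saved output can be sketched as follows in Python. The function name load_or_build and the build callback are hypothetical. The sketch assumes build returns a non-empty list of dictionaries that all share the same keys.

```python
import csv
import os


def load_or_build(path, build):
    """Read cached rows from `path`, running the slow `build()` only if the file is missing.

    build must return a non-empty list of dicts that all share the same keys.
    Values come back as strings, as with any CSV round trip.
    """
    if not os.path.exists(path):
        rows = build()  # expensive step, e.g. reading and aggregating all five samples
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)
    with open(path, newline="") as f:
        return list(csv.DictReader(f))
```

Deleting the cached file forces a rebuild. This mirrors rerunning stem1data.R after its code changes.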
Next, the program stem1plot.R can be run to create the two plots in the analysis. It produces the following output:
> source("stem1plot.R")
[1] " CORREL"
[1] " COEF INTERCEPT SLOPE Y VARIABLE ~ X VARIABLE [, WEIGHTS]"
[1] "------- --------- ------- -----------------------------------"
[1] "Native College Wages vs. Foreign Stem Workers, 1990-2010"
[1] " 0.3791 12.2571 5.8456 native_coll_wkwage_change ~ immig_stem_change"
[1] " 0.3791 17.1149 3.1644 native_coll_wkwage_change ~ immig_stem_change, weights=immig_stem"
[1] "Native STEM Workers vs. Foreign STEM Workers, 1990-2010"
[1] "-0.0959 0.9323 -0.1116 native_stem_change ~ immig_stem_change"
[1] "-0.0959 0.8803 -0.2765 native_stem_change ~ immig_stem_change, weights=immig_stem"
Press enter to continue, escape to exit
>

The code for stem1plot.R is fairly short and self-explanatory. It sources the programs stem1labels1.R and stem1labels2.R, which specify the selected labels for each of the two plots.
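Each line of the table reports a correlation plus the intercept and slope of a linear fit, with and without weights=immig_stem. The Python sketch below shows the closed-form least-squares fit for a single predictor, alongside a Pearson correlation. That fit minimizes the same weighted sum of squared residuals that R's lm(y ~ x, weights = w) minimizes. The CORREL values are identical in the weighted and unweighted rows, which suggests they are unweighted correlations. That is an inference from the output, not something the text states. The inputs, such as immig_stem_change per metropolitan area, are computed in stem1plot.R and are not reproduced here. The toy values are hypothetical.

```python
import math


def linear_fit(x, y, w=None):
    """Intercept and slope minimizing sum(w * (y - a - b*x)**2); w=None means equal weights."""
    if w is None:
        w = [1.0] * len(x)
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return my - slope * mx, slope


def pearson(x, y):
    """Unweighted Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy values for one predictor, one outcome and one set of weights.
x, y, w = [0.0, 1.0, 2.0], [0.0, 1.0, 0.0], [1.0, 1.0, 2.0]
print(pearson(x, y), linear_fit(x, y), linear_fit(x, y, w))
```

Changing only the weights moves the intercept and slope, as the paired rows for each plot show, while an unweighted correlation stays the same.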