Are Skilled Foreign Workers a Boon to Pay? - Description of R Programs


To reproduce the plots and tables in this analysis, first go to IPUMS USA and click on "Login" to sign in, or to request an account if you do not already have one. Then return to the home page and click on "Browse and Select Data". To handle the large amount of data that is extracted, these programs load IPUMS data for just one sample at a time. The variables to select for each sample are the following:

Type       Subtype      Variable   Description
---------  -----------  ---------  -----------
Household  Geographic   STATEFIP   State (FIPS code)
Household  Geographic   COUNTY     County
Household  Geographic   METAREA    Metropolitan area
Household  Geographic   PUMA       Public Use Microdata Area
Person     Demographic  AGE        Age
Person     Race, ...    BPL        Birthplace
Person     Race, ...    CITIZEN    Citizenship status
Person     Race, ...    YRIMMIG    Year of immigration
Person     Education    EDUC       Educational attainment
Person     Work         EMPSTAT    Employment status
Person     Work         OCC1990    Occupation, 1990 basis
Person     Work         CLASSWKR   Class of worker
Person     Work         WKSWORK1   Weeks worked last year
Person     Work         WKSWORK2   Weeks worked last year, intervalled
Person     Income       INCWAGE    Wage and salary income
After selecting the variables, click on "Select Samples" and choose just one of the following samples:
Sample         Filename
-------------  -----------
1980 5% state  ip15_80.dta
1990 5%        ip15_90.dta
2000 5%        ip15_00.dta
2005 ACS       ip15_05.dta
2010 ACS 3yr   ip15_10.dta
The filename is the name of the file into which each sample should be extracted. PUMA will be missing for the 1980 sample and WKSWORK1 will be missing for the 2010 sample because they are not defined for those samples. However, the following variables will be automatically selected for all of the samples:
Type       Subtype      Variable   Description
---------  -----------  ---------  -----------
Household  Technical    YEAR       Census year
Household  Technical    DATANUM    Data set number
Household  Technical    SERIAL     Household serial number
Household  Technical    HHWT       Household weight
Household  Geographic   METAREAD   Metropolitan area [detailed version]
Household  Group Qrtrs  GQ         Group quarters status
Person     Technical    PERNUM     Person number in sample unit
Person     Technical    PERWT      Person weight
Person     Race, ...    BPLD       Birthplace [detailed version]
Person     Education    EDUCD      Educational attainment [detailed version]
Person     Work         EMPSTATD   Employment status [detailed version]
Person     Work         CLASSWKRD  Class of worker [detailed version]
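Because WKSWORK1 is not defined for the 2010 sample, it has to be derived from the intervalled WKSWORK2 before the weeks-worked filter can be applied. The method used by stem1data.R is not described here. The following is only a minimal sketch of one plausible approach, assuming the interval bounds from the IPUMS codebook (1 = 1-13 weeks, 2 = 14-26, 3 = 27-39, 4 = 40-47, 5 = 48-49, 6 = 50-52) and assuming that interval midpoints are an acceptable approximation. The helper name approx_wkswork1 is hypothetical.

```r
# Assumed WKSWORK2 interval midpoints (codes 1-6); the midpoint choice is an assumption.
wkswork2_midpoints <- c(`1` = 7, `2` = 20, `3` = 33, `4` = 43.5, `5` = 48.5, `6` = 51)

# Hypothetical helper: approximate weeks worked from the intervalled code.
approx_wkswork1 <- function(wkswork2) {
  weeks <- unname(wkswork2_midpoints[as.character(wkswork2)])
  weeks[is.na(weeks)] <- 0   # code 0 (N/A) or any unrecognized code becomes 0 weeks
  weeks
}

# Toy data standing in for the 2010 extract.
mm <- data.frame(WKSWORK2 = c(0, 1, 4, 6))
mm$WKSWORK1 <- approx_wkswork1(mm$WKSWORK2)
print(mm)
```

Any mapping that sends code 0 to zero weeks and codes 1-6 to positive values gives the same result for the "Worked 1 or more weeks" filter. The midpoint choice only matters when weeks worked are summed later.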
Once the above five files have been downloaded and placed in the local directory, the first R program stem1data.R can be run via the source("stem1data.R") command. Following is the output:
> source("stem1data.R")
[1] "START READ OF ip15_80.dta"
[1] "PROCESS ip15_80.dta"
[1] "11343120  Initial"
[1] "6940893  Age 18-65"
[1] "5331242  Worked 1 or more weeks"
[1] "5316942  Non-institutional"
[1] "5234588  Occupation not Military, Unemployed, or Unknown"
[1] "3548371  Metareas (219)"
[1] "Change 1980 to 2010 dollars"
[1] "3006  Size of aa"
[1] "START READ OF ip15_90.dta"
[1] "PROCESS ip15_90.dta"
[1] "12501046  Initial"
[1] "7707006  Age 18-65"
[1] "6218598  Worked 1 or more weeks"
[1] "6196808  Non-institutional"
[1] "6116316  Occupation not Military, Unemployed, or Unknown"
[1] "3930743  Metareas (219)"
[1] "Change 1990 to 2010 dollars"
[1] "6147  Size of aa"
[1] "START READ OF ip15_00.dta"
[1] "PROCESS ip15_00.dta"
[1] "14081466  Initial"
[1] "8681911  Age 18-65"
[1] "6979381  Worked 1 or more weeks"
[1] "6933702  Non-institutional"
[1] "6877412  Occupation not Military, Unemployed, or Unknown"
[1] "4653729  Metareas (219)"
[1] "Change 2000 to 2010 dollars"
[1] "9459  Size of aa"
[1] "START READ OF ip15_05.dta"
[1] "PROCESS ip15_05.dta"
[1] "2878380  Initial"
[1] "1778997  Age 18-65"
[1] "1439462  Worked 1 or more weeks"
[1] "1439462  Non-institutional"
[1] "1431690  Occupation not Military, Unemployed, or Unknown"
[1] "995671  Metareas (219)"
[1] "Change 2005 to 2010 dollars"
[1] "12471  Size of aa"
[1] "START READ OF ip15_10.dta"
[1] "PROCESS ip15_10.dta"
[1] "Create mm$WKSWORK1"
[1] "9093077  Initial"
[1] "5672423  Age 18-65"
[1] "4438018  Worked 1 or more weeks"
[1] "4410068  Non-institutional"
[1] "4376811  Occupation not Military, Unemployed, or Unknown"
[1] "3059466  Metareas (219)"
[1] "15675  Size of aa"
[1] "ALL IPUMS FILES READ"
[1] "START OF AGGREGATE AND MERGE INTO FINAL FILES"
[1] "CREATE FINAL FILES ipums80.txt and ipums80.csv"
>
For each of the five sample files, the R program will read the file and apply filters so that only individuals with the following characteristics will be included:
  1. Between ages 18 and 65 (inclusive).
  2. Worked 1 or more weeks in the past year.
  3. Are non-institutionalized.
  4. Occupations are not listed as military, unemployed, or unknown.
  5. Live in one of the 219 metropolitan areas that are consistent across all five samples.
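The five filters above can be sketched on a toy data frame as follows. This is an illustration rather than the code in stem1data.R. It assumes the IPUMS codes GQ = 3 for institutional group quarters and OCC1990 = 905, 991, and 999 for military, unemployed, and unknown occupations. The toy rows, the METAREA codes, and the helper report are made up for the example.

```r
# Toy rows standing in for one IPUMS extract; all values are invented.
mm <- data.frame(
  AGE      = c( 17,  30,  45,  66,  50,  28,  39),
  WKSWORK1 = c( 20,  52,   0,  40,  48,  52,  50),
  GQ       = c(  1,   1,   1,   1,   3,   1,   1),
  OCC1990  = c( 17,  64,  95,  22,  55, 905, 178),
  METAREA  = c(112, 112, 448, 448, 112, 448,   0)
)
mets219 <- c(112, 448)   # stand-in for the list read from mets219.txt

# Hypothetical helper: print the remaining record count, then pass the data on.
report <- function(d, label) {
  print(paste0(nrow(d), "  ", label))
  d
}

mm <- report(mm, "Initial")
mm <- report(mm[mm$AGE >= 18 & mm$AGE <= 65, ], "Age 18-65")
mm <- report(mm[mm$WKSWORK1 >= 1, ], "Worked 1 or more weeks")
mm <- report(mm[mm$GQ != 3, ], "Non-institutional")              # assumed code
mm <- report(mm[!(mm$OCC1990 %in% c(905, 991, 999)), ],          # assumed codes
             "Occupation not Military, Unemployed, or Unknown")
mm <- report(mm[mm$METAREA %in% mets219, ], "Metareas (219)")
```

Applying the filters one at a time, rather than in a single combined condition, is what makes it possible to print a running count after each step.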
This filtering is designed to match the study being replicated. Page 1 of the study's online appendix states:

We focus our analysis on 219 Metropolitan Statistical Areas (MSAs) that are consistently identified from 1980- 2010, excluding individuals who do not live in identified MSAs. Our dependent variables of interest include employment, wage, and rent outcomes.

Our “employment sample” calculates various employment variables by counting the number of workers for different demographic groups. This sample is restricted to noninstitutionalized individuals between ages 18 and 65 (inclusive) who report positive weeks worked over the previous year. We exclude individuals in military occupations, unidentified occupations, and occupations that cannot be consistently identified over time.

As previously mentioned, the set of 219 metropolitan areas was one item that could be exactly duplicated. The study's online appendix lists the samples from 1980-2010 as the "1980, 1990, and 2000 Census, the 2005 American Community Survey (ACS), and the 2008-2010 3-Year ACS". This list shows 304 metropolitan areas for which there were IPUMS records in one or more of these five samples. However, the rows colored red are either not in a metropolitan area (the first row) or contain NA (no data) for one or more of the samples. When these 85 rows are removed, exactly 219 rows "that are consistently identified" remain. The program reads the file mets219.txt to retrieve this list for filtering.
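The selection rule described above (drop the non-metropolitan row and any row with NA in at least one sample) can be sketched as below. The table of counts, its column names, and the use of METAREA code 0 for "not in a metropolitan area" are assumptions made for the illustration.

```r
# Hypothetical record counts per METAREA code and sample; all values are invented.
counts <- data.frame(
  METAREA = c(   0,   4,  6,  8, 12),
  n1980   = c(5000, 120, NA, 75, 60),
  n1990   = c(5200, 130, 40, 80, 66),
  n2000   = c(5400, 140, 45, 90, NA),
  n2005   = c(1100,  30, 10, 20, 15),
  n2010   = c(3300,  90, 30, 60, 44)
)

in_metro  <- counts$METAREA != 0             # assumed code for "not in a metro area"
all_years <- complete.cases(counts[, -1])    # no NA in any of the five samples
consistent <- counts$METAREA[in_metro & all_years]
print(consistent)

# The arithmetic from the text: 304 listed areas minus 85 removed rows.
print(304 - 85)
```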

In the output, the leading numbers give the number of sample records remaining at each point in the filtering. For example, following is the output for the 1980 sample:

[1] "START READ OF ip15_80.dta"
[1] "PROCESS ip15_80.dta"
[1] "11343120  Initial"
[1] "6940893  Age 18-65"
[1] "5331242  Worked 1 or more weeks"
[1] "5316942  Non-institutional"
[1] "5234588  Occupation not Military, Unemployed, or Unknown"
[1] "3548371  Metareas (219)"
[1] "Change 1980 to 2010 dollars"
[1] "3006  Size of aa"
A START message is printed when reading of the file begins, and a PROCESS message is printed once the file has been fully read and processing starts. The next lines indicate that the 1980 IPUMS sample initially contained 11,343,120 records, 6,940,893 after keeping only those between ages 18 and 65, 5,331,242 after also keeping only those who worked 1 or more weeks, 5,316,942 after excluding those who are institutionalized, 5,234,588 after excluding occupations listed as Military, Unemployed, or Unknown, and 3,548,371 after excluding those outside the 219 metropolitan areas. Because this is a 5% sample, the initial 11,343,120 records represent nearly 227 million individuals.
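The arithmetic behind these figures can be checked with a few lines of R. The counts are copied from the 1980 output above. Dividing by 0.05 is only a rough scaling that treats every record as representing 20 people, whereas the PERWT variable carries the actual person weights.

```r
# Record counts from the 1980 log, in filtering order.
counts <- c(
  "Initial"                = 11343120,
  "Age 18-65"              = 6940893,
  "Worked 1 or more weeks" = 5331242,
  "Non-institutional"      = 5316942,
  "Occupation filter"      = 5234588,
  "Metareas (219)"         = 3548371
)

# Share of the initial records surviving each step, in percent.
pct_remaining <- round(100 * counts / counts[["Initial"]], 1)
print(pct_remaining)

# Rough population implied by a 5% sample.
implied_population <- counts[["Initial"]] / 0.05
print(format(implied_population, big.mark = ","))
```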

Following that is a line indicating that INCWAGE, which is reported in current dollars for the sample year, is converted to 2010 dollars. This is done as described on page 2 of the appendix, which states:

Annual wage and salary incomes are converted to constant 2010 dollars using the BLS Inflation Calculator (http://www.bls.gov/data/inflation_calculator.htm).
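A conversion of this kind amounts to multiplying each income by the ratio of the 2010 price index to the sample-year price index. The sketch below assumes CPI-U annual averages as the index. These values should be checked against BLS before use, and the factors actually used in stem1data.R may differ slightly. The function name to_2010_dollars is hypothetical.

```r
# Assumed CPI-U annual averages (1982-84 = 100); verify against BLS before relying on them.
cpi <- c(`1980` = 82.4, `1990` = 130.7, `2000` = 172.2, `2005` = 195.3, `2010` = 218.056)

# Hypothetical helper: convert current-dollar income from `year` to 2010 dollars.
to_2010_dollars <- function(incwage, year) {
  incwage * cpi[["2010"]] / cpi[[as.character(year)]]
}

# Toy incomes; a 2010 income is left unchanged because the ratio is 1.
print(round(to_2010_dollars(c(10000, 25000), 1980)))
print(to_2010_dollars(40000, 2010))
```

This is consistent with the log above, which prints a "Change ... to 2010 dollars" line for every sample except the 2010 one.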

The populations (PERWT), wage income (INCWAGE), and weeks worked (WKSWORK1) are summed for each metropolitan area and for each of 16 groups. These 16 groups are every possible combination of the four true-false attributes of college graduate, STEM worker, immigrant, and employed. The final number, 3006, is the total number of records (groups) that have been tabulated so far across all metropolitan areas. The 3006 records for 1980 represent about 13.7 groups per metropolitan area. This is less than 16 because some groups have no members in some metropolitan areas.
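The grouping step can be sketched with aggregate() on a toy data frame. The column names college, stem, immig, and employed are placeholders for the four true-false attributes, and the sketch sums the raw values. Whether stem1data.R weights INCWAGE or WKSWORK1 by PERWT before summing is not stated here, so that is left as an open assumption.

```r
# Toy person records; all values are invented.
mm <- data.frame(
  METAREA  = c(112, 112, 112, 448, 448),
  college  = c(TRUE, TRUE, FALSE, TRUE, FALSE),
  stem     = c(TRUE, TRUE, FALSE, FALSE, FALSE),
  immig    = c(FALSE, FALSE, TRUE, FALSE, FALSE),
  employed = c(TRUE, TRUE, TRUE, TRUE, FALSE),
  PERWT    = c(20, 22, 19, 21, 20),
  INCWAGE  = c(60000, 55000, 30000, 48000, 5000),
  WKSWORK1 = c(52, 50, 40, 52, 10)
)

# One output row per metropolitan area and attribute combination that actually occurs.
aa <- aggregate(cbind(PERWT, INCWAGE, WKSWORK1) ~
                  METAREA + college + stem + immig + employed,
                data = mm, FUN = sum)
print(paste0(nrow(aa), "  Size of aa"))

# Average number of non-empty groups per metropolitan area in the 1980 run.
print(round(3006 / 219, 1))
```

Only combinations that occur in the data produce a row, which is why the real output has fewer than 16 x 219 = 3504 records for 1980.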

The above processing is repeated for each of the five samples, so that the total number of records grows to 15675. The result is then written out to the space-delimited file ipums80.txt and the comma-delimited file ipums80.csv. These files are saved because stem1data.R is very resource-intensive and can take a number of minutes to run, whereas the resulting data set is much smaller and the programs that process it run much faster. Hence, as long as none of the code in stem1data.R needs to change, the resulting data can be quickly processed by various programs.
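The save-once, reuse-many pattern described above can be sketched as follows. The data frame contents are invented, and the files are written to a temporary directory here so that the example does not overwrite real output.

```r
# Toy aggregate table standing in for the 15675-record result.
aa <- data.frame(YEAR = c(1980, 1990), METAREA = c(112, 112), PERWT = c(42, 57))

out_dir  <- tempdir()
txt_file <- file.path(out_dir, "ipums80.txt")
csv_file <- file.path(out_dir, "ipums80.csv")

# write.table uses a single space as its default separator.
write.table(aa, txt_file, row.names = FALSE, quote = FALSE)
write.csv(aa, csv_file, row.names = FALSE)

# A later program can reload the small file instead of rerunning the slow step.
bb <- read.csv(csv_file)
cc <- read.table(txt_file, header = TRUE)
print(bb)
```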

Next, the program stem1plot.R can be run to create the two plots in the analysis and the following output:

> source("stem1plot.R")
[1] " CORREL"
[1] "   COEF  INTERCEPT    SLOPE  Y VARIABLE ~ X VARIABLE [, WEIGHTS]"
[1] "-------  ---------  -------  -----------------------------------"
[1] "Native College Wages vs. Foreign Stem Workers, 1990-2010"
[1] " 0.3791   12.2571    5.8456  native_coll_wkwage_change ~ immig_stem_change"
[1] " 0.3791   17.1149    3.1644  native_coll_wkwage_change ~ immig_stem_change, weights=immig_stem"
[1] "Native STEM Workers vs. Foreign STEM Workers, 1990-2010"
[1] "-0.0959    0.9323   -0.1116  native_stem_change ~ immig_stem_change"
[1] "-0.0959    0.8803   -0.2765  native_stem_change ~ immig_stem_change, weights=immig_stem"
Press enter to continue, escape to exit
>
The code for stem1plot.R is fairly short and self-explanatory. It sources the programs stem1labels1.R and stem1labels2.R to specify the selected labels for each of the two plots.
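The three columns in the output above (correlation coefficient, intercept, slope) are the kind of values that cor() and lm() return, with the second line of each pair using regression weights. The following sketch shows that pattern on invented data. The variable names are taken from the output, but whether stem1plot.R computes the values exactly this way is an assumption.

```r
# Invented data for five metropolitan areas.
d <- data.frame(
  immig_stem_change         = c(0.5, 1.0, 1.5, 2.0, 3.0),
  native_coll_wkwage_change = c(14, 19, 20, 25, 29),
  immig_stem                = c(1000, 5000, 2000, 8000, 3000)
)

# Unweighted correlation coefficient.
r <- cor(d$native_coll_wkwage_change, d$immig_stem_change)

# Ordinary and weighted least-squares fits of the same formula.
fit   <- lm(native_coll_wkwage_change ~ immig_stem_change, data = d)
fit_w <- lm(native_coll_wkwage_change ~ immig_stem_change, data = d,
            weights = immig_stem)

print(sprintf("%7.4f  %9.4f  %7.4f  unweighted", r, coef(fit)[1], coef(fit)[2]))
print(sprintf("%7.4f  %9.4f  %7.4f  weights=immig_stem", r, coef(fit_w)[1], coef(fit_w)[2]))
```

Printing the same unweighted r on both lines mirrors the output above, where the correlation coefficient is identical within each pair while the intercept and slope change with the weights.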
Source Code for R Programs Used in this Analysis