The Executive Summary of a 2017 study from the Economic Policy Institute begins as follows:
Many Americans are aware of the often-cited estimate that approximately 11 million unauthorized immigrants reside in the United States. However, the U.S. government does not have an adequate, reliable estimate for the total number of temporary foreign workers who are authorized to be employed in the U.S. labor market in the main nonimmigrant visa classifications that authorize employment.
It turns out that Census Bureau data from the annual American Community Survey (ACS) can be used to come up with a reasonable estimate. That data does not contain visa status, but one can estimate the number of nonimmigrant visa workers by looking at the number of workers who are not citizens. After all, any authorized nonimmigrant worker should hold a visa, and anyone working on such a visa would be a non-citizen. This should be true whether the visa is an H-1B, L-1, F-1 (for a student engaged in Optional Practical Training), or some other visa.
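Before getting to the full state-by-state code, here is a minimal sketch of that idea, assuming an IPUMS ACS extract (described in the next paragraph) has been saved as acs2016.csv. The national citizenship breakdown for employed software developers is just a weighted tally of the CITIZEN variable:

import pandas as pd

# Minimal sketch, assuming the IPUMS ACS extract described below is saved as acs2016.csv
usa = pd.read_csv("acs2016.csv")
# Employed workers (EMPSTAT == 1) with OCC 1020 (software developers)
devs = usa[(usa['EMPSTAT'] == 1) & (usa['OCC'] == 1020)]
# Sum the person weights (PERWT) within each CITIZEN category:
# 0 = born in the U.S., 1 = born abroad of American parents,
# 2 = naturalized citizen, 3 = not a citizen
weights = devs.groupby('CITIZEN')['PERWT'].sum()
print((100 * weights / weights.sum()).round(1))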
The code in this post uses data from the 2016 American Community Survey (ACS). The data can be created by going to the IPUMS USA website, logging in (creating an account, if necessary), and creating an extract with the variables STATEFIP, COUNTY, MET2013, PUMA, CITIZEN, EMPSTAT, and OCC. The variables YEAR, DATANUM, SERIAL, HHWT, GQ, PERNUM, and PERWT are automatically preselected. Also include EDUC, since the second code block below uses the detailed education code EDUCD. For samples, select ACS for 2016. For data format, select .csv. For structure, select rectangular. For more information, see IPUMS Documentation: User's Guide. You should receive an email when your extract is ready. You can then download the file, rename it to acs2016.csv, place it in the same directory as the following Python code, and run the code.
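Before running the analysis, it is worth confirming that the downloaded extract contains the columns the code below expects. A quick check along these lines (assuming the file has been renamed to acs2016.csv) is:

import pandas as pd

# Sanity check of the extract (assumes it was renamed to acs2016.csv)
usa = pd.read_csv("acs2016.csv")
needed = ['STATEFIP', 'CITIZEN', 'EMPSTAT', 'OCC', 'PERWT', 'EDUCD']
missing = [col for col in needed if col not in usa.columns]
print("rows:", len(usa))
print("missing columns:", missing if missing else "none")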
The following code reads acs2016.csv and lists the percentage of workers with OCC code 1020 (Software developers, applications and systems software) who are US-born citizens, naturalized citizens, and non-citizens. It does this for all states, displaying them in descending order of the percentage who are non-citizens. Following is the code, followed by the output:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
# ACS Occupation Codes at https://usa.ipums.org/usa/volii/occ_acs.shtml
# (described at https://www.census.gov/content/dam/Census/library/publications/2016/acs/acs-35.pdf)
# 110 = Computer and information systems managers
# 1010 = Computer programmers
# 1020 = Software developers, applications and systems software
# print(pd.get_option('display.width'))
pd.set_option('display.width', 120)
def getCitizenStatusByState(min_count, occs, title):
    # State FIPS codes and postal abbreviations from the Census Bureau
    fipref = "https://www2.census.gov/geo/docs/reference/state.txt"
    #header=STATE|STUSAB|STATE_NAME|STATENS
    #mm = pd.read_csv(fipref, skiprows=1, sep='|', names=['STATE','STUSAB','STATE_NAME','STATENS'])
    mm = pd.read_csv(fipref, skiprows=1, sep='|', names=['Statefip','State','State_Name','Statens'])
    #print(mm)
    usa = pd.read_csv("acs2016.csv")
    # Collapse the requested occupation codes into occs[0] so they are analyzed as one
    # group (a positive entry is a single OCC code; a negative entry -N marks the end
    # of a range of codes)
    if len(occs) > 1:
        occ_start = occs[0]
        for i in range(1, len(occs)):
            if occs[i] >= 0:
                occ_end = occs[i]
                usa.loc[usa['OCC'] == occ_end, 'OCC'] = occs[0]
            else:
                occ_end = -occs[i]
                usa.loc[(usa['OCC'] >= occ_start) & (usa['OCC'] <= occ_end), 'OCC'] = occs[0]
                occ_start = occ_end + 1
    #print("usa[{0}] = {1}".format(usa.shape[0], sum(usa['PERWT'])))
    print("usa[%d] = %d\n" % (usa.shape[0], sum(usa['PERWT'])))
    # Sum person weights by state, citizenship, employment status, and occupation
    gg = usa.groupby(['STATEFIP','CITIZEN','EMPSTAT','OCC'])['PERWT'].sum()
    uu = gg.unstack('CITIZEN')
    # CITIZEN: 0 = N/A (born in the U.S.), 1 = born abroad of American parents,
    # 2 = naturalized citizen, 3 = not a citizen
    uu.columns = ['na','baa','nat','nac']
    uu = uu.fillna(0)
    uu['count'] = uu['na'] + uu['baa'] + uu['nat'] + uu['nac']
    uu['nac_p'] = 100 * uu['nac'] / uu['count']
    uu['nat_p'] = 100 * uu['nat'] / uu['count']
    uu['usa_p'] = 100 * (uu['na'] + uu['baa']) / uu['count']
    uu = uu.reset_index(level=['STATEFIP','EMPSTAT','OCC'])
    # Keep employed workers (EMPSTAT == 1) in the target occupation group
    pp = uu[(uu['OCC'] == occs[0]) & (uu['EMPSTAT'] == 1) & (uu['count'] > min_count)]
    pp = pp.sort_values(by=['nac_p'], ascending=False)
    #pp = pp[pp['COUNTY'] > 0]
    pp = pp.merge(mm, left_on=['STATEFIP'], right_on=['Statefip'], how='left')
    qqState = pp['State']
    qq = pd.DataFrame(qqState, columns=['State'])
    qq['count'] = pp['count'].astype('int')
    qq['non-cit%'] = pp['nac_p'].round(1)
    qq['natural%'] = pp['nat_p'].round(1)
    qq['us-born%'] = pp['usa_p'].round(1)
    qq.index += 1
    print(title)
    print(qq)
    qq.to_csv("state_comp", sep=';')
getCitizenStatusByState(1, [1020], "Software Developers - US-born, Naturalized, and Non-citizen (percent)\n")
As can be seen, many of the states containing major tech hubs appear to have higher percentages of non-citizen software developers. According to a recent Forbes article, the largest tech hubs by jobs are San Jose CA, San Francisco CA, Austin TX, Seattle WA, Boston MA, Washington DC, Raleigh NC, and Baltimore MD. The states containing all of these hubs except Washington DC and Baltimore MD rank between 3 and 15 on the list, with non-citizen workers between 34.9 and 22.4 percent. Interestingly, Baltimore's state (MD) and Washington DC rank 30 and 40 on the list, with non-citizen workers at just 12.8 and 6.3 percent. Hence, the hubs near our nation's capital appear to have far fewer non-citizen workers. Also interesting is that 7 states each have over 1000 software developers without a single non-citizen worker: Montana, Alaska, Mississippi, Maine, West Virginia, Idaho, and Wyoming. Of those, all but Idaho have no naturalized citizens either! Of course, the ACS is a sample, so this really means that no non-citizen software developers appeared in those states' samples.
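For a quick side-by-side look at the tech-hub states mentioned above, the table saved by the code (the semicolon-delimited file state_comp) can be re-read and filtered. This is just a sketch; it assumes state_comp was written by the Software Developers run above:

import pandas as pd

# Sketch: re-read the table saved by getCitizenStatusByState and compare selected states.
# Assumes state_comp was just written by the Software Developers (OCC 1020) run above.
comp = pd.read_csv("state_comp", sep=';', index_col=0)
# States containing the tech hubs cited above, plus MD and DC for comparison
hubs = ['CA', 'TX', 'WA', 'MA', 'NC', 'MD', 'DC']
print(comp[comp['State'].isin(hubs)])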
Following is the same data for workers with OCC code 110 and OCC codes from 1000 to 1299. As can be seen from the ACS Occupation Codes, these include Computer and information systems managers and all workers in Computer and Mathematical Occupations.
getCitizenStatusByState(1, [110,1000,-1299], "Computer and Mathematical Occupations - US-born, Naturalized, and Non-citizen (percent)\n")
Once again, most of the aforementioned tech hubs have higher percentages of non-citizen workers (in Computer and Mathematical Occupations). The states containing all of the hubs except Washington DC and Baltimore MD rank between 2 and 11 on the list, with non-citizen workers between 21.3 and 11.6 percent. As before, Baltimore's state (MD) and Washington DC are much lower, at 24 and 35 on the list, with non-citizen workers at just 7.6 and 5.7 percent. Now there is just one state, Wyoming, with no non-citizen or naturalized-citizen workers in the sample. Still, the large difference in the distribution of non-citizen workers among the states can be seen in these occupations.
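The following code refines this breakdown by education level. It is a variant of the code above that splits each citizenship group in two using the detailed EDUCD education code from the extract: the columns suffixed with 0 cover workers below a given EDUCD threshold, and those suffixed with 1 cover workers at or above it. In the IPUMS EDUCD coding, the thresholds used in the calls at the end correspond to a master's degree (114) and to a professional degree beyond a bachelor's degree (115).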
import pandas as pd
# ACS Occupation Codes at https://usa.ipums.org/usa/volii/occ_acs.shtml
# (described at https://www.census.gov/content/dam/Census/library/publications/2016/acs/acs-35.pdf)
# 110 = Computer and information systems managers
# 1010 = Computer programmers
# 1020 = Software developers, applications and systems software
# print(pd.get_option('display.width'))
pd.set_option('display.width', 120)
def getCitizenEducByState(min_count, educ_hi, occs, title):
    # State FIPS codes and postal abbreviations from the Census Bureau
    fipref = "https://www2.census.gov/geo/docs/reference/state.txt"
    #header=STATE|STUSAB|STATE_NAME|STATENS
    #mm = pd.read_csv(fipref, skiprows=1, sep='|', names=['STATE','STUSAB','STATE_NAME','STATENS'])
    mm = pd.read_csv(fipref, skiprows=1, sep='|', names=['Statefip','State','State_Name','Statens'])
    #print(mm)
    usa = pd.read_csv("acs2016.csv")
    # Collapse the requested occupation codes into occs[0] so they are analyzed as one
    # group (a positive entry is a single OCC code; a negative entry -N marks the end
    # of a range of codes)
    if len(occs) > 1:
        occ_start = occs[0]
        for i in range(1, len(occs)):
            if occs[i] >= 0:
                occ_end = occs[i]
                usa.loc[usa['OCC'] == occ_end, 'OCC'] = occs[0]
            else:
                occ_end = -occs[i]
                usa.loc[(usa['OCC'] >= occ_start) & (usa['OCC'] <= occ_end), 'OCC'] = occs[0]
                occ_start = occ_end + 1
    # Recode detailed education (EDUCD) to 0 = below the educ_hi threshold, 1 = at or above it
    usa.loc[usa['EDUCD'] < educ_hi, 'EDUCD'] = 0
    usa.loc[usa['EDUCD'] >= educ_hi, 'EDUCD'] = 1
    # Combine citizenship (0-3) and the education flag (0-1) into a single 0-7 code
    usa['CIT_EDUC'] = usa['CITIZEN'] * 2 + usa['EDUCD']
    #print("usa[{0}] = {1}".format(usa.shape[0], sum(usa['PERWT'])))
    print("usa[%d] = %d\n" % (usa.shape[0], sum(usa['PERWT'])))
    # Sum person weights by state, citizenship/education, employment status, and occupation
    gg = usa.groupby(['STATEFIP','CIT_EDUC','EMPSTAT','OCC'])['PERWT'].sum()
    uu = gg.unstack('CIT_EDUC')
    # Suffix 0 = below the education threshold, 1 = at or above it
    # (na = born in the U.S., baa = born abroad of American parents,
    #  nat = naturalized citizen, nac = not a citizen)
    uu.columns = ['na0','na1','baa0','baa1','nat0','nat1','nac0','nac1']
    uu = uu.fillna(0)
    uu['count'] = uu['na0'] + uu['baa0'] + uu['nat0'] + uu['nac0'] + uu['na1'] + uu['baa1'] + uu['nat1'] + uu['nac1']
    uu['nac_p0'] = 100 * uu['nac0'] / uu['count']
    uu['nac_p1'] = 100 * uu['nac1'] / uu['count']
    uu['nat_p0'] = 100 * uu['nat0'] / uu['count']
    uu['nat_p1'] = 100 * uu['nat1'] / uu['count']
    uu['usa_p0'] = 100 * (uu['na0'] + uu['baa0']) / uu['count']
    uu['usa_p1'] = 100 * (uu['na1'] + uu['baa1']) / uu['count']
    uu = uu.reset_index(level=['STATEFIP','EMPSTAT','OCC'])
    # Keep employed workers (EMPSTAT == 1) in the target occupation group
    pp = uu[(uu['OCC'] == occs[0]) & (uu['EMPSTAT'] == 1) & (uu['count'] > min_count)]
    pp = pp.sort_values(by=['nac_p0'], ascending=False)
    #pp = pp[pp['COUNTY'] > 0]
    pp = pp.merge(mm, left_on=['STATEFIP'], right_on=['Statefip'], how='left')
    qqState = pp['State']
    qq = pd.DataFrame(qqState, columns=['State'])
    qq['count'] = pp['count'].astype('int')
    qq['non-cit0%'] = pp['nac_p0'].round(1)
    qq['non-cit1%'] = pp['nac_p1'].round(1)
    qq['natural0%'] = pp['nat_p0'].round(1)
    qq['natural1%'] = pp['nat_p1'].round(1)
    qq['us-born0%'] = pp['usa_p0'].round(1)
    qq['us-born1%'] = pp['usa_p1'].round(1)
    qq.index += 1
    print(title)
    print(qq)
    qq.to_csv("state_comp", sep=';')
# EDUCD threshold 114: split at a master's degree or higher
getCitizenEducByState(1, 114, [1020], "Software Developers - US-born, Naturalized, and Non-citizen (percent)\n")
getCitizenEducByState(1, 114, [110,1000,-1299], "Computer and Mathematical Occupations - US-born, Naturalized, and Non-citizen (percent)\n")
# EDUCD threshold 115: split at a professional degree beyond a bachelor's or higher
getCitizenEducByState(1, 115, [1020], "Software Developers - US-born, Naturalized, and Non-citizen (percent)\n")
getCitizenEducByState(1, 115, [110,1000,-1299], "Computer and Mathematical Occupations - US-born, Naturalized, and Non-citizen (percent)\n")
Note: The Jupyter Notebook from which this post is generated can be found at http://econdataus.com/nonimmigrant_workers.ipynb.