The Executive Summary of a 2017 study from the Economic Policy Institute begins as follows:
Many Americans are aware of the often-cited estimate that approximately 11 million unauthorized immigrants reside in the United States. However, the U.S. government does not have an adequate, reliable estimate for the total number of temporary foreign workers who are authorized to be employed in the U.S. labor market in the main nonimmigrant visa classifications that authorize employment.
It turns out that Census data from the annual American Community Survey (ACS) can be used to come up with a reasonable estimate. That data does not contain visa status but one can estimate the number of nonimmigrant visa workers by looking at the number of workers who are not citizens. After all, it would seem that any authorized nonimmigrant worker should have a visa and that any person working with such a visa would be a non-citizen. This should be true whether the visa is an H-1B, L-1, F-1 (for a student engaged in Optional Practical Training) or some other visa. That other visa could be an immigrant visa instead of a nonimmigrant visa. Still, the data at this link suggest that many more nonimmigrant visas are issued each year.
The following Python code looks at data from the 2016 American Community Survey (ACS). The data can be created by going to the IPUMS USA website, logging in (creating an account, if necessary), and creating an extract with the variables STATEFIP, COUNTY, MET2013, PUMA, CITIZEN, EMPSTAT, and OCC. The variables YEAR, DATANUM, SERIAL, HHWT, GQ, PERNUM, and PERWT are automatically preselected. For samples, select ACS for 2016 and for data format, select .csv. For structure, select rectangular. For more information, see IPUMS Documentation: User's Guide. You should receive an email when your extract is ready. You can then download, rename it to acs2016.csv, and place it in the same directory as the following Python code. The following function, getCitizenEducByState, reads acs2016.csv and outputs the specified information.
import pandas as pd
# ACS Occupation Codes at https://usa.ipums.org/usa/volii/occ_acs.shtml
# (described at https://www.census.gov/content/dam/Census/library/publications/2016/acs/acs-35.pdf)
# 110 = Computer and information systems managers
# 1010 = Computer programmers
# 1020 = Software developers, applications and systems software
# print(pd.get_option('display.width'))
pd.set_option('display.width', 120)
def getCitizenEducByState(min_count, isort, perc, educ, occs, cnames, title, hdrs):
fipref = "https://www2.census.gov/geo/docs/reference/state.txt"
#header=STATE|STUSAB|STATE_NAME|STATENS
mm = pd.read_csv(fipref, skiprows=1, sep='|', names=['Statefip','State','State_Name','Statens'])
#print(mm)
usa = pd.read_csv("acs2016.csv")
if len(occs) > 1:
occ_start = occs[0]
for i in range(1,len(occs)):
if occs[i] >= 0:
occ_end = occs[i]
usa.loc[usa['OCC'] == occ_end,'OCC'] = occs[0]
else:
occ_end = -occs[i]
usa.loc[(usa['OCC'] >= occ_start) & (usa['OCC'] <= occ_end),'OCC'] = occs[0]
occ_start = occ_end + 1
usa.loc[usa['CITIZEN'] == 0,'CITIZEN'] = 1 # change NA to 'Born abroad to American parents'
elen = len(educ)
usa.loc[usa['EDUCD'] < educ[0],'EDUCD'] = 0
usa.loc[usa['EDUCD'] >= educ[elen-1],'EDUCD'] = elen
if elen > 1:
for i in range(1,elen):
usa.loc[(usa['EDUCD'] >= educ[i-1]) & (usa['EDUCD'] < educ[i]),'EDUCD'] = i
usa['CIT_EDUC'] = (3-usa['CITIZEN']) * (elen+1) + usa['EDUCD']
#print("usa[{0}] = {1}".format(usa.shape[0], sum(usa['PERWT'])))
usa = usa[(usa['OCC'] == occs[0]) & (usa['EMPSTAT'] == 1)] # filter OCC and EMPSTAT
print("usa[%d] = %d\n" % (usa.shape[0], sum(usa['PERWT']))) # counts after filtering
gg = usa.groupby(['STATEFIP','CIT_EDUC','EMPSTAT','OCC'])['PERWT'].sum()
uu = gg.unstack('CIT_EDUC')
nce =len(uu.columns)
uu.columns = cnames
uu = uu.fillna(0) # check
uu['count'] = 0
for i in range(0, nce):
uu['count'] += uu.iloc[:,i]
if perc == 1:
for i in range(0, nce):
uu.iloc[:,i] = 100 * uu.iloc[:,i] / uu['count']
uu = uu.reset_index(level=['STATEFIP','EMPSTAT','OCC'])
#pp = uu[(uu['OCC'] == occs[0]) & (uu['EMPSTAT'] == 1) & (uu['count'] > min_count)]
pp = uu[(uu['count'] > min_count)]
pp = pp.sort_values(by=[cnames[isort]], ascending=False)
pp = pp.merge(mm, left_on=['STATEFIP'],right_on=['Statefip'],how='left')
qqState = pp['State']
qq=pd.DataFrame(qqState, columns=['State'])
qq['count']=pp['count'].astype('int')
qq['count']=qq.apply(lambda x: "{:,.0f}".format(x['count']), axis=1)
if perc == 1:
#title = pd.concat(title,' (percent)')
#title = "".join((title,'(percent)'))
title = title + ' (percent)' + '\n'
for i in range(0, nce):
qq[cnames[i]]=pp[cnames[i]].round(1)
else:
title = title + ' (counts)' + '\n'
for i in range(0, nce):
qq[cnames[i]]=pp.apply(lambda x: "{:,.0f}".format(x[cnames[i]]), axis=1)
qq.index += 1
qq.columns = pd.MultiIndex.from_tuples(hdrs, names=['Status','','Degree',''])
print(title)
print(qq)
qq.to_csv("state_comp", sep=';')
The following table describes all of the parameters that are passed to getCitizenEducByState:
Parameter | Usage |
---|---|
min_count | Minimum count to include in output |
isort | Column by which to sort data (0-based) |
perc | 0=output counts, 1=output percents |
educ | IPUMS EDUC detailed codes to define education levels |
occs | ACS OCC codes to define occupations |
cnames | simple column names of citizen/education groups |
title | title of table |
hdrs | multi-row headers of citizen/education groups |
The following code specifies the education levels to be used and sets up a multi-row header that corresponds to those levels. It also specifies other parameters shown in the table above, including occupation codes of 1 through 99999 to indicate all occupations. It then calls getCitizenEducByState to generate the following output.
tt = [('','','','State'),('','','','count'),
('Non-citizen','','Bachelor','& below'),('Non-citizen','','Master','Degree'),('Non-citizen','','above','Master'),
('Naturalized','','Bachelor','& below'),('Naturalized','','Master','Degree'),('Naturalized','','above','Master'),
('U.S. Born' ,'','Bachelor','& below'),('U.S. Born' ,'','Master','Degree'),('U.S. Born' ,'','above','Master')
]
per = 0
educs = [114,115]
cnames = ['non-cit_<','non-cit_ms','non-cit_>','natur_<','natur_ms','natur_>','us-born_<','us-born_ms','us-born_>']
getCitizenEducByState(1, 0, per, educs, [1,-99999], cnames, "All Occupations - Workers by Citizen Status and Education Level",tt)
As can be seen above, California has the most non-citizen workers with a Bachelor's degree or less with over 2.8 million. The percentage of the total count for all of these numbers can be seen at this link. As can be seen there, the 2.8 million workers represent 15.2 percent of all workers. Many of these workers are likely on un low-skilled worker visas such as H-2A and H-2B since the numbers include all occupations.
The following code calls getCitizenEducByState and lists the number of workes with OCC code 1020 (Software developers, applications and systems software) by citizen status and education level. In this case, nearly all of the non-citizen workers are likely on skilled visas like an H-1B, L-1, F-1 (for a student engaged in Optional Practical Training).
getCitizenEducByState(1, 0, per, educs, [1020], cnames, "Software Developers - Workers by Citizen Status and Education Level",tt)
The following sections look at other major occupation groups.
getCitizenEducByState(1, 0, per, educs, [1000,-1299], cnames, "Computer and Mathematical Occupations - Non-citizen, Naturalized, and US-born",tt)
getCitizenEducByState(1, 0, per, educs, [3000,-3599], cnames, "Healthcare Practitioners and Technical Occupations - Non-citizen, Naturalized, and US-born",tt)
getCitizenEducByState(1, 0, per, educs, [3600,-3699], cnames, "Healthcare Support Occupations - Non-citizen, Naturalized, and US-born",tt)
Note: As previously mentioned, the percentage of the total count for all of these numbers can be seen at this link.