A prior Jupyter notebook checked the validity of a March 29, 2018 study from the Pew Research Center by looking at Labor Condition Application (LCA) data from 2017. That analysis concluded:
As can be seen, Santa Clara County had over twice as many requests as the number two worksite county, New York County. Santa Clara County is the location of Silicon Valley. Hence, Silicon Valley did have the most requests for H-1B workers by worksite county despite the San Jose-Sunnyvale-Santa Clara, CA area being ranked tenth in Pew's table of employer metropolitan areas. In fact, the H-1B visa approvals per 100 workers shown in the table appear to be largely meaningless. This is especially the case for the 32 H-1B visa approvals per 100 workers listed for College Station, Texas. This is because the Pew study is comparing the H-1B workers who are working in the Worksite Cities (where they are actually working) to the population of workers in the Employer City (where the company headquarters is located). As shown above, these are often very different cities, especially in the case of [IT consulting firms](https://en.wikipedia.org/wiki/List_of_IT_consulting_firms).
Unlike the Pew study data, the LCA data does have the worksite location. Of course, many workers who were requested were not approved for an H-1B visa. For this reason, the LCA data cannot be used to calculate the ratio of H-1B workers to the number of workers in the cities where they work. However, it turns out that Census data from the annual American Community Survey (ACS) can be used to come up with a reasonable estimate. That data does not contain H-1B status, but one can estimate the number of H-1B workers by looking at the number of workers who are not citizens, especially those who work in occupations that attract the most H-1B workers. As shown in Table 8B in this USCIS document, about 69 percent of all H-1B beneficiaries in 2016 were in computer-related occupations. Of course, the non-citizen count may include some workers who are here on other visas, such as L-1 workers and F-1 students engaged in Optional Practical Training (OPT).
The following Python code looks at data from the 2016 American Community Survey (ACS). The data can be created by going to the IPUMS USA website, logging in (creating an account, if necessary), and creating an extract with the variables STATEFIP, COUNTY, MET2013, PUMA, CITIZEN, EMPSTAT, and OCC. The variables YEAR, DATANUM, SERIAL, HHWT, GQ, PERNUM, and PERWT are automatically preselected. For samples, select ACS for 2016. For data format, select .csv. For structure, select rectangular. For more information, see IPUMS Documentation: User's Guide. You should receive an email when your extract is ready. You can then download the extract, rename it to acs2016.csv, place it in the same directory as the following Python code, and run the code.
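Before running the analysis, it may be worth confirming that the downloaded extract actually contains the variables listed above. The following is a minimal sketch and not part of the original notebook: the helper names `missing_columns` and `check_extract` are hypothetical, and it assumes the first row of the CSV holds the IPUMS variable names exactly as spelled above.

```python
import csv

# Variables the analysis code reads from the extract (named in the text above).
REQUIRED_COLUMNS = ["STATEFIP", "COUNTY", "MET2013", "CITIZEN", "EMPSTAT", "OCC", "PERWT"]


def missing_columns(header, required=REQUIRED_COLUMNS):
    """Return the required variable names that are absent from a CSV header row."""
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


def check_extract(path="acs2016.csv"):
    """Read only the header row of the extract and list any missing variables."""
    with open(path, newline="") as f:
        header = next(csv.reader(f))
    return missing_columns(header)
```

Calling `check_extract()` in the directory that holds acs2016.csv returns an empty list when every variable is present. Only the header row is read, so the check stays fast even for a large extract.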
The following code reads acs2016.csv and lists the percentage of workers with OCC code 1020 (Software developers, applications and systems software) who are US-born citizens, naturalized citizens, and non-citizens. It does this for all metropolitan areas with more than 5,000 such workers and displays them in descending order of the percentage who are non-citizens. Following is the code, followed by the output:
import pandas as pd
# ACS Occupation Codes at https://usa.ipums.org/usa/volii/occ_acs.shtml
# (described at https://www.census.gov/content/dam/Census/library/publications/2016/acs/acs-35.pdf)
# 110 = Computer and information systems managers
# 1010 = Computer programmers
# 1020 = Software developers, applications and systems software
# print(pd.get_option('display.width'))
pd.set_option('display.width', 120)
def getCitizenStatusByMetro(min_count, occs, title):
    # Build a lookup from CBSA (metro) code to metro name.
    metref = "https://www2.census.gov/programs-surveys/metro-micro/geographies/reference-files/2017/delineation-files/list2.xls"
    xx = pd.read_excel(metref, skiprows=2)
    mm = xx.groupby(['CBSA Code'])['CBSA Title'].min()
    mm = mm.reset_index(level=['CBSA Code'])
    mm.columns = ['CBSA Code','Metro']
    mm['Metro Code'] = pd.to_numeric(mm['CBSA Code'],errors='coerce')
    usa = pd.read_csv("acs2016.csv")
    # Fold every occupation named in occs into the first code. A negative
    # entry -N closes a range of codes running from occ_start through N.
    if len(occs) > 1:
        occ_start = occs[0]
        for i in range(1,len(occs)):
            if occs[i] >= 0:
                occ_end = occs[i]
                usa.loc[usa['OCC'] == occ_end,'OCC'] = occs[0]
            else:
                occ_end = -occs[i]
                usa.loc[(usa['OCC'] >= occ_start) & (usa['OCC'] <= occ_end),'OCC'] = occs[0]
            occ_start = occ_end + 1
    #print("usa[{0}] = {1}".format(usa.shape[0], sum(usa['PERWT'])))
    print("usa[%d] = %d\n" % (usa.shape[0], sum(usa['PERWT'])))
    # Sum person weights (PERWT) by metro, citizenship, employment status, and occupation.
    gg = usa.groupby(['MET2013','CITIZEN','EMPSTAT','OCC'])['PERWT'].sum()
    uu = gg.unstack('CITIZEN')
    # CITIZEN columns: N/A (US-born), born abroad of American parents, naturalized, not a citizen
    uu.columns =['na','baa','nat','nac']
    uu = uu.fillna(0)
    uu['count'] = uu['na'] + uu['baa'] + uu['nat'] + uu['nac']
    uu['usa_p'] = 100 * (uu['na'] + uu['baa']) / uu['count']
    uu['nat_p'] = 100 * uu['nat'] / uu['count']
    uu['nac_p'] = 100 * uu['nac'] / uu['count']
    uu = uu.reset_index(level=['MET2013','EMPSTAT','OCC'])
    # Keep employed workers (EMPSTAT 1) in the chosen occupation for metros above min_count.
    pp = uu[(uu['OCC'] == occs[0]) & (uu['EMPSTAT'] == 1) & (uu['count'] > min_count)]
    pp = pp.sort_values(by=['nac_p'], ascending=False)
    pp = pp.merge(mm, left_on='MET2013',right_on='Metro Code',how='left')
    qq=pd.DataFrame(pp['Metro'], columns=['Metro'])
    qq['count']=pp['count'].astype('int')
    qq['us-born%']=pp['usa_p'].round(1)
    qq['natural%']=pp['nat_p'].round(1)
    qq['non-cit%']=pp['nac_p'].round(1)
    print(title)
    print(qq)
    qq.to_csv("metro_comp", sep=';')
getCitizenStatusByMetro(5000, [1020], "Software Developers - US-born, Naturalized, and Non-citizen (percent)\n")
As can be seen, the San Jose-Sunnyvale-Santa Clara, CA metro area had the highest percentage of non-citizens working in software development (OCC 1020) at over 49 percent. The nearby San Francisco-Oakland-Hayward, CA metro area came in fourth with a non-citizen percentage of 36.8 percent, and the Sacramento--Roseville--Arden-Arcade, CA metro area came in fifth with 36.8 percent.
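The percentages above are weighted: each survey record counts for PERWT people rather than for one. The following sketch illustrates that calculation on a tiny made-up table. The numbers are invented for illustration and are not ACS data, and only the column names come from the extract. It assumes the CITIZEN coding implied by the column labels in the code above, with 3 meaning not a citizen.

```python
import pandas as pd

# Made-up records: each row is one respondent and PERWT is the number of
# people that respondent represents. CITIZEN 0 = N/A (US-born),
# 2 = naturalized, 3 = not a citizen.
toy = pd.DataFrame({
    "MET2013": [1, 1, 1, 1, 2, 2],
    "CITIZEN": [0, 2, 3, 3, 0, 3],
    "PERWT":   [100, 50, 30, 20, 120, 80],
})

# Sum the person weights per metro and citizenship status, then pivot the
# statuses into columns; a combination that never occurs becomes 0.
weights = toy.groupby(["MET2013", "CITIZEN"])["PERWT"].sum().unstack("CITIZEN").fillna(0)

# Divide each row by its total to get the share of each status, in percent.
shares = weights.div(weights.sum(axis=1), axis=0) * 100
noncit_pct = shares[3].round(1)
print(noncit_pct)
```

In the toy data, metro 1 has a total weight of 200 with 50 attributed to non-citizens, so its non-citizen share is 25.0 percent. Metro 2 has 80 out of 200, or 40.0 percent.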
There is one minor problem with looking at metropolitan areas. They don't cover a number of areas that may have a large number of H-1B workers. For example, they don't cover the Silicon Valley cities of Cupertino (home of Apple), Mountain View (home of Google), and Menlo Park (home of Facebook). The following Python code does pretty much the same as the prior code except that it looks at all counties instead of metropolitan areas with 5000 or more software developers. Following is the code, followed by the resulting output:
import pandas as pd
# ACS Occupation Codes at https://usa.ipums.org/usa/volii/occ_acs.shtml
# (described at https://www.census.gov/content/dam/Census/library/publications/2016/acs/acs-35.pdf)
# 110 = Computer and information systems managers
# 1010 = Computer programmers
# 1020 = Software developers, applications and systems software
# print(pd.get_option('display.width'))
pd.set_option('display.width', 120)
def getCitizenStatusByCounty(min_count, occs, title):
    # Build a lookup from state and county codes to county names.
    fipref = "https://www2.census.gov/geo/docs/reference/codes/files/national_county.txt"
    mm = pd.read_csv(fipref, skiprows=0, names=['State','StateCode','CountyCode','County','H1'])
    # Scale the reference county codes by 10 so they line up with the COUNTY values in the extract.
    mm['CountyCode'] *= 10
    usa = pd.read_csv("acs2016.csv")
    # Fold every occupation named in occs into the first code. A negative
    # entry -N closes a range of codes running from occ_start through N.
    if len(occs) > 1:
        occ_start = occs[0]
        for i in range(1,len(occs)):
            if occs[i] >= 0:
                occ_end = occs[i]
                usa.loc[usa['OCC'] == occ_end,'OCC'] = occs[0]
            else:
                occ_end = -occs[i]
                usa.loc[(usa['OCC'] >= occ_start) & (usa['OCC'] <= occ_end),'OCC'] = occs[0]
            occ_start = occ_end + 1
    #print("usa[{0}] = {1}".format(usa.shape[0], sum(usa['PERWT'])))
    print("usa[%d] = %d\n" % (usa.shape[0], sum(usa['PERWT'])))
    # Sum person weights (PERWT) by county, citizenship, employment status, and occupation.
    gg = usa.groupby(['STATEFIP','COUNTY','CITIZEN','EMPSTAT','OCC'])['PERWT'].sum()
    uu = gg.unstack('CITIZEN')
    # CITIZEN columns: N/A (US-born), born abroad of American parents, naturalized, not a citizen
    uu.columns =['na','baa','nat','nac']
    uu = uu.fillna(0)
    uu['count'] = uu['na'] + uu['baa'] + uu['nat'] + uu['nac']
    uu['usa_p'] = 100 * (uu['na'] + uu['baa']) / uu['count']
    uu['nat_p'] = 100 * uu['nat'] / uu['count']
    uu['nac_p'] = 100 * uu['nac'] / uu['count']
    uu = uu.reset_index(level=['STATEFIP','COUNTY','EMPSTAT','OCC'])
    # Keep employed workers (EMPSTAT 1) in the chosen occupation for counties above min_count.
    pp = uu[(uu['OCC'] == occs[0]) & (uu['EMPSTAT'] == 1) & (uu['count'] > min_count)]
    pp = pp.sort_values(by=['nac_p'], ascending=False)
    # Drop rows with COUNTY code 0, which cannot be matched to a named county.
    pp = pp[pp['COUNTY'] > 0]
    pp = pp.merge(mm, left_on=['STATEFIP','COUNTY'],right_on=['StateCode','CountyCode'],how='left')
    qqCounty = pp['County'].str.replace(' County','') + ", " + pp['State']
    qq=pd.DataFrame(qqCounty, columns=['County'])
    qq['count']=pp['count'].astype('int')
    qq['us-born%']=pp['usa_p'].round(1)
    qq['natural%']=pp['nat_p'].round(1)
    qq['non-cit%']=pp['nac_p'].round(1)
    print(title)
    print(qq)
    qq.to_csv("county_comp", sep=';')
getCitizenStatusByCounty(5000, [1020], "Software Developers - US-born, Naturalized, and Non-citizen (percent)\n")
As can be seen, Santa Clara County has the same 49.2 percent of non-citizens working in software development as the San Jose-Sunnyvale-Santa Clara, CA metro area. Surprisingly, there are a couple of counties in New Jersey that have a much smaller number of software developers but a higher percentage of non-citizens among them. This could merit further investigation, especially the high 70.7 percent of non-citizen software developers in Hudson County. In any case, San Mateo and Alameda counties, both of which border Santa Clara County, are fifth and sixth with percentages of 46.1 and 42.0, respectively. San Francisco has a lower percentage of non-citizen software developers at 25.9 percent. In total, the four counties in or close to Silicon Valley (Santa Clara, Alameda, San Mateo, and San Francisco) have 152,991 software developers, of whom 67,056 (43.8 percent of the total) are non-citizens. That appears to be the area with the most H-1B visas for skilled workers, at least for software developers.
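The combined share for the four counties follows directly from the two totals quoted above. This is only an arithmetic check on those two figures, not a recomputation from the ACS data, and the variable names are made up for the example.

```python
# Totals for Santa Clara, Alameda, San Mateo, and San Francisco counties,
# taken from the paragraph above.
total_developers = 152_991
non_citizen_developers = 67_056

# Non-citizen share of all software developers in the four counties, in percent.
combined_share = round(100 * non_citizen_developers / total_developers, 1)
print(combined_share)  # 43.8
```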
The above Python functions can be called with any specified occupations. Following are the results for metro areas when looking at all computer and mathematical occupations according to the ACS Occupation Codes. It also includes the occupation "Computer and information systems managers" (Occupation Code 110) since those jobs are closely related. The minimum count is increased to 20,000 to limit the output to the 47 metro areas with the largest counts.
getCitizenStatusByMetro(20000, [110,1000,-1299], "Computer and Mathematical Occupations - US-born, Naturalized, and Non-citizen (percent)\n")
As can be seen, the San Jose-Sunnyvale-Santa Clara, CA metro area still has the highest percentage of non-citizen workers, with the percentage dropping from 49.2 for software developers to 41.1 for all computer and mathematical occupations, including managers. If managers are excluded, the percentage drops only to 42.7. In any case, the San Francisco-Oakland-Hayward, CA metro area is now second, with its percentage of non-citizen workers dropping from 36.8 to 26.5.
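The occupation list `[110,1000,-1299]` passed to the function uses a compact encoding: a positive entry names a single code, and a negative entry closes a range that starts just after the previous entry. The following sketch shows which codes such a list selects. The helper `occ_ranges` is hypothetical and not part of the notebook; it mirrors the loop in the functions above, assuming the `occ_start = occ_end + 1` update runs on every pass through the loop.

```python
def occ_ranges(occs):
    """Return the inclusive (low, high) code ranges that a list such as
    [110, 1000, -1299] folds into its first code."""
    ranges = [(occs[0], occs[0])]      # the first code always stands for itself
    start = occs[0]
    for code in occs[1:]:
        if code >= 0:
            end = code
            ranges.append((end, end))      # one additional code
        else:
            end = -code
            ranges.append((start, end))    # every code from start through end
        start = end + 1                    # the next range begins after this entry
    return ranges


print(occ_ranges([110, 1000, -1299]))
```

For `[110, 1000, -1299]` this yields code 110, code 1000, and the range 1001 through 1299, so the call above groups managers (110) with every OCC code from 1000 through 1299. A single-element list such as `[1020]` selects only that code.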
Following are the results for counties when looking at all computer and mathematical occupations. The minimum count is again increased to 20,000 to limit the output to the 52 counties with the largest counts.
getCitizenStatusByCounty(20000, [110,1000,-1299], "Computer and Mathematical Occupations - US-born, Naturalized, and Non-citizen (percent)\n")
As can be seen, Hudson and Middlesex counties are still at the top, with their percentages of non-citizen workers dropping to 52.7 (from 70.7) and 45.0 (from 59.3), respectively. The next three are Santa Clara, San Mateo, and Alameda counties, all in or bordering Silicon Valley. The percentage of non-citizen workers in Santa Clara County drops from 49.2 for software developers to 41.1 for all computer and mathematical occupations, just like the San Jose-Sunnyvale-Santa Clara, CA metro area. The percentages of non-citizen workers in San Mateo and Alameda counties have dropped to 32.1 (from 46.1) and 31.9 (from 42.0), respectively. San Francisco County has dropped from 25.9 to 19.9 percent. Hence, Silicon Valley does appear to be the area with the most H-1B visas for skilled workers, even for the expanded category of all computer and mathematical occupations. As mentioned above, Table 8B in this USCIS document shows that about 69 percent of all H-1B beneficiaries in 2016 were in computer-related occupations. Hence, these results are very much contrary to the Pew study's contention that East Coast and Texas metros had the most H-1B visas for skilled workers from 2010 to 2016. It appears that the contrary results of the Pew study come from incorrectly assuming that H-1B workers work in the city listed as the Employer City. This assumption appears to be especially incorrect for [IT consulting firms](https://en.wikipedia.org/wiki/List_of_IT_consulting_firms).
Note: The Jupyter Notebook from which this post is generated can be found at http://econdataus.com/pew_h1b_2.ipynb.