## The Use of Nonimmigrant Workers in the U.S. Computer Industry by State

The Executive Summary of a [2017 study from the Economic Policy Institute](https://www.epi.org/publication/temporary-foreign-workers-by-the-numbers-new-estimates-by-visa-classification/) begins as follows:

> Many Americans are aware of the often-cited estimate that approximately 11 million unauthorized immigrants reside in the United States. However, the U.S. government does not have an adequate, reliable estimate for the total number of temporary foreign workers who are authorized to be employed in the U.S. labor market in the main nonimmigrant visa classifications that authorize employment.

It turns out that Census data from the annual American Community Survey (ACS) can be used to come up with a reasonable estimate. The ACS does not record visa status, but the number of nonimmigrant visa workers can be estimated from the number of workers who are not citizens. After all, it would seem that any authorized nonimmigrant worker should have a visa and that any person working on such a visa would be a non-citizen. This should be true whether the visa is an H-1B, an L-1, an F-1 (for a student engaged in Optional Practical Training), or some other visa.
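As a minimal sketch of that idea, the following computes a weighted non-citizen share from a toy table. The rows and weights are invented for illustration; only the column names CITIZEN and PERWT and the CITIZEN codes follow the IPUMS extract used in the rest of this post, and `noncitizen_share` is a hypothetical helper name.

```python
import pandas as pd

# Toy stand-in for an IPUMS extract; rows and weights are invented.
# CITIZEN codes: 0 = N/A (born in the U.S.), 1 = born abroad of American parents,
#                2 = naturalized citizen, 3 = not a citizen.
toy = pd.DataFrame({
    "CITIZEN": [0, 0, 1, 2, 3, 3],
    "PERWT":   [100, 120, 30, 50, 60, 40],
})

def noncitizen_share(df):
    """Weighted percentage of persons who are not citizens (CITIZEN == 3)."""
    total = df["PERWT"].sum()
    noncit = df.loc[df["CITIZEN"] == 3, "PERWT"].sum()
    return 100 * noncit / total

share = noncitizen_share(toy)
print(round(share, 1))  # 25.0
```

Each respondent is counted PERWT times, so the share describes the estimated population rather than the raw sample.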

The following Python code looks at data from the 2016 American Community Survey (ACS). To create the data file:

1. Go to the IPUMS USA website and log in (creating an account, if necessary).
2. Create an extract with the variables STATEFIP, COUNTY, MET2013, PUMA, CITIZEN, EMPSTAT, and OCC. The variables YEAR, DATANUM, SERIAL, HHWT, GQ, PERNUM, and PERWT are automatically preselected. The education breakdowns later in this post also read EDUCD, the detailed version of EDUC, so add EDUC to the extract as well.
3. For samples, select ACS for 2016. For data format, select .csv. For structure, select rectangular.
4. Wait for the email saying that your extract is ready, then download the file, rename it to acs2016.csv, and place it in the same directory as the Python code below.

For more information, see IPUMS Documentation: User's Guide.
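Before running the analysis, it may help to confirm that the extract contains the columns the code depends on. The sketch below is one way to do that; `load_extract` and `REQUIRED` are hypothetical names, and the inline CSV stands in for acs2016.csv so that the example runs on its own.

```python
import io
import pandas as pd

# Columns read by the first analysis in this post.
REQUIRED = ["STATEFIP", "CITIZEN", "EMPSTAT", "OCC", "PERWT"]

def load_extract(source, columns=REQUIRED):
    """Read only the needed columns and fail early if any are missing."""
    df = pd.read_csv(source, usecols=lambda name: name in columns)
    missing = sorted(set(columns) - set(df.columns))
    if missing:
        raise ValueError("extract is missing columns: " + ", ".join(missing))
    return df

# Invented two-row sample; with the real file, pass "acs2016.csv" instead.
toy_csv = io.StringIO(
    "YEAR,STATEFIP,PERWT,CITIZEN,EMPSTAT,OCC\n"
    "2016,34,100,3,1,1020\n"
    "2016,6,80,0,1,1020\n"
)
usa = load_extract(toy_csv)
print(usa.shape)  # (2, 5)
```

Restricting the read to the needed columns also reduces memory use on the full file, which has over three million rows.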

The following code reads acs2016.csv and lists the percentage of workers with OCC code 1020 (Software developers, applications and systems software) who are US-born citizens, naturalized citizens, and non-citizens. It does this for all states, displaying them in descending order of the percentage who are non-citizens. Following is the code, followed by the output:


In [40]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
# ACS Occupation Codes at https://usa.ipums.org/usa/volii/occ_acs.shtml
# (described at https://www.census.gov/content/dam/Census/library/publications/2016/acs/acs-35.pdf)
#  110 = Computer and information systems managers
# 1010 = Computer programmers
# 1020 = Software developers, applications and systems software 
# print(pd.get_option('display.width'))
pd.set_option('display.width', 120)

def getCitizenStatusByState(min_count, occs, title):
    fipref = "https://www2.census.gov/geo/docs/reference/state.txt"
    #header=STATE|STUSAB|STATE_NAME|STATENS
    #mm = pd.read_csv(fipref, skiprows=1, sep='|', names=['STATE','STUSAB','STATE_NAME','STATENS'])
    mm = pd.read_csv(fipref, skiprows=1, sep='|', names=['Statefip','State','State_Name','Statens'])
    #print(mm)

    usa = pd.read_csv("acs2016.csv")
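    # occs lists the occupation codes to combine; every match is relabeled as occs[0].
    # A positive entry adds that single code. A negative entry -N adds the range
    # from the code after the previous entry up to N, e.g. [110,1000,-1299].
    if len(occs) > 1:
        occ_start = occs[0]
        for i in range(1,len(occs)):
            if occs[i] >= 0:
                occ_end = occs[i]
                usa.loc[usa['OCC'] == occ_end,'OCC'] = occs[0]
            else:
                occ_end = -occs[i]
                usa.loc[(usa['OCC'] >= occ_start) & (usa['OCC'] <= occ_end),'OCC'] = occs[0]
            occ_start = occ_end + 1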
    #print("usa[{0}] = {1}".format(usa.shape[0], sum(usa['PERWT'])))
    print("usa[%d] = %d\n" % (usa.shape[0], sum(usa['PERWT'])))

    gg = usa.groupby(['STATEFIP','CITIZEN','EMPSTAT','OCC'])['PERWT'].sum()
    uu = gg.unstack('CITIZEN')
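    # CITIZEN codes 0-3: N/A (US-born), born abroad of American parents, naturalized, not a citizen
    # (assumes all four codes occur in the data)
    uu.columns =['na','baa','nat','nac']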
    uu = uu.fillna(0)
    uu['count'] = uu['na'] + uu['baa'] + uu['nat'] + uu['nac']
    uu['nac_p'] = 100 * uu['nac'] / uu['count']
    uu['nat_p'] = 100 * uu['nat'] / uu['count']
    uu['usa_p'] = 100 * (uu['na'] + uu['baa']) / uu['count']
    uu = uu.reset_index(level=['STATEFIP','EMPSTAT','OCC'])
    pp = uu[(uu['OCC'] == occs[0]) & (uu['EMPSTAT'] == 1) & (uu['count'] > min_count)]
    pp = pp.sort_values(by=['nac_p'], ascending=False)
    #pp = pp[pp['COUNTY'] > 0]
    pp = pp.merge(mm, left_on=['STATEFIP'],right_on=['Statefip'],how='left')
    qqState = pp['State']
    qq=pd.DataFrame(qqState, columns=['State'])
    qq['count']=pp['count'].astype('int')
    qq['non-cit%']=pp['nac_p'].round(1)
    qq['natural%']=pp['nat_p'].round(1)
    qq['us-born%']=pp['usa_p'].round(1)
    qq.index += 1
    print(title)
    print(qq)
    qq.to_csv("state_comp", sep=';')

getCitizenStatusByState(1, [1020], "Software Developers - US-born, Naturalized, and Non-citizen (percent)\n")


usa[3156487] = 323127515

Software Developers - US-born, Naturalized, and Non-citizen (percent)

   State   count  non-cit%  natural%  us-born%
1     NJ   49210      40.2      23.7      36.1
2     DE    3211      38.3      20.3      41.4
3     WA   74135      34.9      11.4      53.7
4     CA  269117      34.1      23.0      42.9
5     WI   18925      31.0       2.8      66.2
6     CT   13065      28.1      19.6      52.3
7     NC   34435      27.1      11.7      61.2
8     RI    3723      26.5       3.4      70.1
9     IL   50267      25.7      14.9      59.4
10    KS   12186      24.4       8.9      66.8
11    GA   34985      24.2      14.2      61.6
12    MI   23829      24.1      10.0      66.0
13    ND    2054      23.4       0.0      76.6
14    MA   60387      22.9      16.6      60.5
15    TX   97008      22.4      14.9      62.7
16    PA   38971      22.4       9.5      68.1
17    TN   12993      21.2       4.7      74.0
18    AR    4333      18.9       4.1      77.0
19    IA  

As can be seen, many of the states containing major tech hubs appear to have higher percentages of non-citizen software developers. According to a recent [Forbes article](https://www.forbes.com/sites/karstenstrauss/2017/07/26/americas-biggest-tech-hubs-by-the-jobs/#5e33635b2f15), the largest tech hubs by jobs are San Jose CA, San Francisco CA, Austin TX, Seattle WA, Boston MA, Washington DC, Raleigh NC, and Baltimore MD. The states containing all of the hubs except Washington DC and Baltimore MD rank between 3 and 15 on the list, with non-citizen shares between 34.9 and 22.4 percent. Interestingly, Baltimore's state (MD) and Washington DC rank 30 and 40 on the list, with non-citizen shares of just 12.8 and 6.3 percent. Hence, the hubs near our nation's capital appear to have far fewer non-citizen workers.

Also interesting is that 7 states each have over 1000 software developers without a single non-citizen worker! Those states are Montana, Alaska, Mississippi, Maine, West Virginia, Idaho, and Wyoming. Of those, only Idaho has even a naturalized citizen! Of course, the ACS is a sample, so this really means that no non-citizen workers were found in the sample.
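The sampling caveat can be made concrete by counting unweighted respondents next to the weighted totals. The sketch below uses invented rows (STATEFIP 56 is Wyoming, 34 is New Jersey) and a hypothetical helper, `cell_sizes`; it is meant only to show how a weighted count above 1000 can rest on a handful of survey responses.

```python
import pandas as pd

# Invented rows: three respondents in Wyoming (56), four in New Jersey (34).
toy = pd.DataFrame({
    "STATEFIP": [56, 56, 56, 34, 34, 34, 34],
    "CITIZEN":  [0, 0, 0, 0, 2, 3, 3],
    "PERWT":    [400, 350, 300, 90, 110, 100, 95],
})

def cell_sizes(df):
    """Per state: respondent count, weighted count, and non-citizen respondent count."""
    grouped = df.groupby("STATEFIP")
    out = pd.DataFrame({
        "respondents": grouped.size(),
        "weighted": grouped["PERWT"].sum(),
        "noncit_respondents": df[df["CITIZEN"] == 3].groupby("STATEFIP").size(),
    })
    # States with no non-citizen respondents come back as NaN; treat them as 0.
    out["noncit_respondents"] = out["noncit_respondents"].fillna(0).astype(int)
    return out

sizes = cell_sizes(toy)
print(sizes)
```

In this toy data, state 56 has a weighted count of 1050 built from only three respondents, so a zero in the non-citizen column says little about the true share.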

Following is the same data for workers with OCC code 110 or any code from 1000 to 1299. As can be seen from the [ACS Occupation Codes](https://usa.ipums.org/usa/volii/occ_acs.shtml), these include Computer and information systems managers and all workers in Computer and Mathematical Occupations.
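The occupation list passed to the function encodes this selection compactly: a positive entry names a single code, and a negative entry -N closes a range that starts just after the previous entry. The sketch below restates that rule as a boolean mask instead of relabeling OCC in place; `occ_mask` is a hypothetical helper, not part of the original notebook, and the OCC values are invented.

```python
import pandas as pd

def occ_mask(occ, occs):
    """Boolean mask of the occupations selected by an occs list such as [110, 1000, -1299]."""
    mask = occ == occs[0]
    start = occs[0]
    for code in occs[1:]:
        if code >= 0:
            end = code                      # single code
            mask |= occ == end
        else:
            end = -code                     # range from start through end
            mask |= (occ >= start) & (occ <= end)
        start = end + 1
    return mask

occ = pd.Series([110, 136, 1000, 1020, 1299, 1300, 2100])
mask = occ_mask(occ, [110, 1000, -1299])
print(mask.tolist())  # [True, False, True, True, True, False, False]
```

A mask leaves the OCC column untouched, which makes it easier to run several selections against the same loaded data.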


In [41]:
getCitizenStatusByState(1, [110,1000,-1299], "Computer and Mathematical Occupations - US-born, Naturalized, and Non-citizen (percent)\n")


usa[3156487] = 323127515

Computer and Mathematical Occupations - US-born, Naturalized, and Non-citizen (percent)

   State   count  non-cit%  natural%  us-born%
1     NJ  208422      24.9      22.3      52.9
2     WA  175878      21.3      10.7      67.9
3     CA  724997      20.7      20.7      58.6
4     DE   16000      19.9      11.3      68.8
5     CT   70449      16.9      10.3      72.8
6     RI   17629      15.9       5.3      78.8
7     MA  167006      15.5      12.5      72.0
8     IL  221425      13.9      14.0      72.1
9     TX  407614      13.3      12.7      74.0
10    GA  165287      12.6      11.0      76.4
11    NC  161050      11.6       9.5      78.9
12    AR   22253      11.2       4.9      83.9
13    FL  237195      10.5      14.8      74.7
14    AZ   97859      10.4       7.5      82.1
15    MN  121767      10.1      11.1      78.8
16    WI   94069       9.6       4.4      86.0
17    VA  243303       9.4      15.1      75.5
18    NY  273539       9.3      17.8   

Once again, most of the aforementioned tech hubs have higher percentages of non-citizen workers (in Computer and Mathematical Occupations). The states containing all of the hubs except Washington DC and Baltimore MD rank between 2 and 11 on the list, with non-citizen shares between 21.3 and 11.6 percent. As before, Baltimore's state (MD) and Washington DC are much lower, at 24 and 35 on the list, with non-citizen shares of just 7.6 and 5.7 percent. Now there is just one state, Wyoming, that has no non-citizens or naturalized citizens in the sample. Still, the large difference in the distribution of non-citizen workers between the states can be seen in these occupations.

### Splitting U.S.-born, Naturalized, and Non-citizens into Those Without (0) and With (1) an Advanced Degree
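The split relies on folding citizenship and a 0/1 education flag into a single code, CIT_EDUC = CITIZEN * 2 + flag, which yields eight groups. The sketch below shows that encoding on invented rows. Two details are assumptions rather than the notebook's method: it keeps the flag in a separate column named ADV instead of overwriting EDUCD, and it uses `reindex` so that all eight groups exist even when one is absent from the data.

```python
import pandas as pd

EDUC_HI = 114  # EDUCD threshold used in this section (114 = master's degree)

# Invented rows for illustration.
toy = pd.DataFrame({
    "CITIZEN": [0, 0, 2, 3, 3],
    "EDUCD":   [101, 114, 116, 101, 115],
    "PERWT":   [100, 40, 30, 50, 60],
})

# 1 if the person holds a degree at or above the threshold, else 0.
toy["ADV"] = (toy["EDUCD"] >= EDUC_HI).astype(int)
# Codes 0-7: citizenship group (0-3) times two, plus the degree flag.
toy["CIT_EDUC"] = toy["CITIZEN"] * 2 + toy["ADV"]

labels = ["na0", "na1", "baa0", "baa1", "nat0", "nat1", "nac0", "nac1"]
totals = (toy.groupby("CIT_EDUC")["PERWT"].sum()
             .reindex(range(8), fill_value=0))  # keep empty groups as 0
totals.index = labels
print(totals.to_dict())
```

Assigning eight column names directly, as the code in this section does, works only when every one of the eight codes occurs somewhere in the data; the `reindex` step removes that dependency.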


In [42]:
import pandas as pd
# ACS Occupation Codes at https://usa.ipums.org/usa/volii/occ_acs.shtml
# (described at https://www.census.gov/content/dam/Census/library/publications/2016/acs/acs-35.pdf)
#  110 = Computer and information systems managers
# 1010 = Computer programmers
# 1020 = Software developers, applications and systems software 
# print(pd.get_option('display.width'))
pd.set_option('display.width', 120)

def getCitizenEducByState(min_count, educ_hi, occs, title):
    fipref = "https://www2.census.gov/geo/docs/reference/state.txt"
    #header=STATE|STUSAB|STATE_NAME|STATENS
    #mm = pd.read_csv(fipref, skiprows=1, sep='|', names=['STATE','STUSAB','STATE_NAME','STATENS'])
    mm = pd.read_csv(fipref, skiprows=1, sep='|', names=['Statefip','State','State_Name','Statens'])
    #print(mm)

    usa = pd.read_csv("acs2016.csv")
    if len(occs) > 1:
        occ_start = occs[0]
        for i in range(1,len(occs)):
            if occs[i] >= 0:
                occ_end = occs[i]
                usa.loc[usa['OCC'] == occ_end,'OCC'] = occs[0]
            else:
                occ_end = -occs[i]
                usa.loc[(usa['OCC'] >= occ_start) & (usa['OCC'] <= occ_end),'OCC'] = occs[0]
            occ_start = occ_end + 1
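    # Collapse EDUCD to a 0/1 flag: 1 if at or above educ_hi, else 0
    usa.loc[usa['EDUCD'] <  educ_hi,'EDUCD'] = 0
    usa.loc[usa['EDUCD'] >= educ_hi,'EDUCD'] = 1
    # Combine citizenship (0-3) and the flag into one code from 0 to 7
    usa['CIT_EDUC'] = usa['CITIZEN'] * 2 + usa['EDUCD']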
    #print("usa[{0}] = {1}".format(usa.shape[0], sum(usa['PERWT'])))
    print("usa[%d] = %d\n" % (usa.shape[0], sum(usa['PERWT'])))

    gg = usa.groupby(['STATEFIP','CIT_EDUC','EMPSTAT','OCC'])['PERWT'].sum()
    uu = gg.unstack('CIT_EDUC')
    uu.columns =['na0','na1','baa0','baa1','nat0','nat1','nac0','nac1']
    uu = uu.fillna(0)
    uu['count'] = uu['na0'] + uu['baa0'] + uu['nat0'] + uu['nac0'] + uu['na1'] + uu['baa1'] + uu['nat1'] + uu['nac1']
    uu['nac_p0'] = 100 * uu['nac0'] / uu['count']
    uu['nac_p1'] = 100 * uu['nac1'] / uu['count']
    uu['nat_p0'] = 100 * uu['nat0'] / uu['count']
    uu['nat_p1'] = 100 * uu['nat1'] / uu['count']
    uu['usa_p0'] = 100 * (uu['na0'] + uu['baa0']) / uu['count']
    uu['usa_p1'] = 100 * (uu['na1'] + uu['baa1']) / uu['count']
    uu = uu.reset_index(level=['STATEFIP','EMPSTAT','OCC'])
    pp = uu[(uu['OCC'] == occs[0]) & (uu['EMPSTAT'] == 1) & (uu['count'] > min_count)]
    pp = pp.sort_values(by=['nac_p0'], ascending=False)
    #pp = pp[pp['COUNTY'] > 0]
    pp = pp.merge(mm, left_on=['STATEFIP'],right_on=['Statefip'],how='left')
    qqState = pp['State']
    qq=pd.DataFrame(qqState, columns=['State'])
    qq['count']=pp['count'].astype('int')
    qq['non-cit0%']=pp['nac_p0'].round(1)
    qq['non-cit1%']=pp['nac_p1'].round(1)
    qq['natural0%']=pp['nat_p0'].round(1)
    qq['natural1%']=pp['nat_p1'].round(1)
    qq['us-born0%']=pp['usa_p0'].round(1)
    qq['us-born1%']=pp['usa_p1'].round(1)
    qq.index += 1
    print(title)
    print(qq)
    qq.to_csv("state_comp", sep=';')

getCitizenEducByState(1, 114, [1020], "Software Developers - US-born, Naturalized, and Non-citizen (percent)\n")


usa[3156487] = 323127515

Software Developers - US-born, Naturalized, and Non-citizen (percent)

   State   count  non-cit0%  non-cit1%  natural0%  natural1%  us-born0%  us-born1%
1     NJ   49210       19.5       20.7       10.3       13.4       27.0        9.1
2     CT   13065       16.7       11.3        8.5       11.1       40.2       12.2
3     NC   34435       16.5       10.6        5.0        6.7       52.3        8.9
4     DE    3211       16.0       22.3       10.7        9.6       26.6       14.8
5     AR    4333       15.3        3.6        4.1        0.0       68.1        8.9
6     WA   74135       14.7       20.2        5.1        6.3       45.3        8.3
7     CA  269117       14.4       19.7       12.1       10.9       33.5        9.4
8     NE    6111       13.3        2.9        4.5        3.1       60.0       16.1
9     MI   23829       12.6       11.4        4.2        5.8       52.2       13.8
10    IL   50267       12.6       13.1        5.7        9.2       44.9  

In [43]:
getCitizenEducByState(1, 114, [110,1000,-1299], "Computer and Mathematical Occupations - US-born, Naturalized, and Non-citizen (percent)\n")


usa[3156487] = 323127515

Computer and Mathematical Occupations - US-born, Naturalized, and Non-citizen (percent)

   State   count  non-cit0%  non-cit1%  natural0%  natural1%  us-born0%  us-born1%
1     NJ  208422       12.2       12.7       12.0       10.2       43.1        9.7
2     CA  724997        9.7       11.1       13.1        7.6       48.2       10.4
3     WA  175878        9.6       11.7        6.4        4.3       57.4       10.5
4     CT   70449        9.6        7.3        5.2        5.1       60.6       12.2
5     DE   16000        8.4       11.5        8.8        2.5       56.5       12.3
6     AR   22253        7.8        3.4        4.6        0.3       75.9        7.9
7     IL  221425        7.2        6.7        7.7        6.3       58.6       13.6
8     TX  407614        6.9        6.4        7.2        5.4       64.4        9.7
9     FL  237195        6.7        3.9       10.3        4.5       64.6       10.2
10    NC  161050        6.4        5.2        5.0      

### Splitting U.S.-born, Naturalized, and Non-citizens into Those Without (0) and With (1) a Post-Master's Degree


In [44]:
getCitizenEducByState(1, 115, [1020], "Software Developers - US-born, Naturalized, and Non-citizen (percent)\n")


usa[3156487] = 323127515

Software Developers - US-born, Naturalized, and Non-citizen (percent)

   State   count  non-cit0%  non-cit1%  natural0%  natural1%  us-born0%  us-born1%
1     NJ   49210       39.0        1.2       23.0        0.7       34.5        1.6
2     DE    3211       38.3        0.0       18.9        1.4       35.9        5.5
3     WA   74135       32.9        2.1        9.5        1.9       52.6        1.0
4     CA  269117       31.3        2.8       21.2        1.8       41.2        1.7
5     WI   18925       29.8        1.1        2.8        0.0       64.9        1.3
6     CT   13065       28.1        0.0       19.1        0.5       51.9        0.4
7     NC   34435       26.7        0.4       10.9        0.9       60.8        0.4
8     RI    3723       26.5        0.0        3.4        0.0       70.1        0.0
9     IL   50267       25.2        0.5       14.3        0.6       57.4        2.1
10    KS   12186       23.9        0.5        8.9        0.0       64.7  

In [45]:
getCitizenEducByState(1, 115, [110,1000,-1299], "Computer and Mathematical Occupations - US-born, Naturalized, and Non-citizen (percent)\n")


usa[3156487] = 323127515

Computer and Mathematical Occupations - US-born, Naturalized, and Non-citizen (percent)

   State   count  non-cit0%  non-cit1%  natural0%  natural1%  us-born0%  us-born1%
1     NJ  208422       24.2        0.7       21.0        1.3       51.6        1.3
2     WA  175878       20.1        1.2        9.4        1.4       66.1        1.8
3     CA  724997       19.1        1.6       19.5        1.2       56.9        1.7
4     DE   16000       19.1        0.8       11.0        0.3       67.4        1.3
5     CT   70449       16.7        0.1        9.6        0.7       71.7        1.1
6     RI   17629       15.4        0.5        4.7        0.6       77.8        1.0
7     MA  167006       14.0        1.4       11.3        1.2       69.5        2.5
8     IL  221425       13.4        0.5       13.3        0.7       70.7        1.5
9     TX  407614       12.9        0.4       11.9        0.8       72.9        1.1
10    GA  165287       12.2        0.4       10.7      

Note: The Jupyter Notebook from which this post is generated can be found at [http://econdataus.com/nonimmigrant_workers.ipynb](http://econdataus.com/nonimmigrant_workers.ipynb).
