The following Python code and output show the percentages that correspond to the counts shown at this link. See that page for the corresponding counts and explanation of the data. The code below is identical to the code there except that the parameter per is set to 1, indicating that percents are to be output.
import pandas as pd
# ACS Occupation Codes at https://usa.ipums.org/usa/volii/occ_acs.shtml
# (described at https://www.census.gov/content/dam/Census/library/publications/2016/acs/acs-35.pdf)
# 110 = Computer and information systems managers
# 1010 = Computer programmers
# 1020 = Software developers, applications and systems software
# print(pd.get_option('display.width'))
pd.set_option('display.width', 120)
def getCitizenEducByState(min_count, isort, perc, educ, occs, cnames, title, hdrs):
fipref = "https://www2.census.gov/geo/docs/reference/state.txt"
#header=STATE|STUSAB|STATE_NAME|STATENS
mm = pd.read_csv(fipref, skiprows=1, sep='|', names=['Statefip','State','State_Name','Statens'])
#print(mm)
usa = pd.read_csv("acs2016.csv")
if len(occs) > 1:
occ_start = occs[0]
for i in range(1,len(occs)):
if occs[i] >= 0:
occ_end = occs[i]
usa.loc[usa['OCC'] == occ_end,'OCC'] = occs[0]
else:
occ_end = -occs[i]
usa.loc[(usa['OCC'] >= occ_start) & (usa['OCC'] <= occ_end),'OCC'] = occs[0]
occ_start = occ_end + 1
usa.loc[usa['CITIZEN'] == 0,'CITIZEN'] = 1 # change NA to 'Born abroad to American parents'
elen = len(educ)
usa.loc[usa['EDUCD'] < educ[0],'EDUCD'] = 0
usa.loc[usa['EDUCD'] >= educ[elen-1],'EDUCD'] = elen
if elen > 1:
for i in range(1,elen):
usa.loc[(usa['EDUCD'] >= educ[i-1]) & (usa['EDUCD'] < educ[i]),'EDUCD'] = i
usa['CIT_EDUC'] = (3-usa['CITIZEN']) * (elen+1) + usa['EDUCD']
#print("usa[{0}] = {1}".format(usa.shape[0], sum(usa['PERWT'])))
usa = usa[(usa['OCC'] == occs[0]) & (usa['EMPSTAT'] == 1)] # filter OCC and EMPSTAT
print("usa[%d] = %d\n" % (usa.shape[0], sum(usa['PERWT']))) # counts after filtering
gg = usa.groupby(['STATEFIP','CIT_EDUC','EMPSTAT','OCC'])['PERWT'].sum()
uu = gg.unstack('CIT_EDUC')
nce =len(uu.columns)
uu.columns = cnames
uu = uu.fillna(0) # check
uu['count'] = 0
for i in range(0, nce):
uu['count'] += uu.iloc[:,i]
if perc == 1:
for i in range(0, nce):
uu.iloc[:,i] = 100 * uu.iloc[:,i] / uu['count']
uu = uu.reset_index(level=['STATEFIP','EMPSTAT','OCC'])
#pp = uu[(uu['OCC'] == occs[0]) & (uu['EMPSTAT'] == 1) & (uu['count'] > min_count)]
pp = uu[(uu['count'] > min_count)]
pp = pp.sort_values(by=[cnames[isort]], ascending=False)
pp = pp.merge(mm, left_on=['STATEFIP'],right_on=['Statefip'],how='left')
qqState = pp['State']
qq=pd.DataFrame(qqState, columns=['State'])
qq['count']=pp['count'].astype('int')
qq['count']=qq.apply(lambda x: "{:,.0f}".format(x['count']), axis=1)
if perc == 1:
#title = pd.concat(title,' (percent)')
#title = "".join((title,'(percent)'))
title = title + ' (percent)' + '\n'
for i in range(0, nce):
qq[cnames[i]]=pp[cnames[i]].round(1)
else:
title = title + ' (counts)' + '\n'
for i in range(0, nce):
qq[cnames[i]]=pp.apply(lambda x: "{:,.0f}".format(x[cnames[i]]), axis=1)
qq.index += 1
qq.columns = pd.MultiIndex.from_tuples(hdrs, names=['Status','','Degree',''])
print(title)
print(qq)
qq.to_csv("state_comp", sep=';')
The following table describes all of the parameters that are passed to getCitizenEducByState:
Parameter | Usage |
---|---|
min_count | Minimum count to include in output |
isort | Column by which to sort data (0-based) |
perc | 0=output counts, 1=output percents |
educ | IPUMS EDUC detailed codes to define education levels |
occs | ACS OCC codes to define occupations |
cnames | simple column names of citizen/education groups |
title | title of table |
hdrs | multi-row headers of citizen/education groups |
The following code specifies the education levels to be used and sets up a multi-row header that corresponds to those levels. It also specifies other parameters shown in the table above, including occupation codes of 1 through 99999 to indicate all occupations. It then calls getCitizenEducByState to generate the following output.
tt = [('','','','State'),('','','','count'),
('Non-citizen','','Bachelor','& below'),('Non-citizen','','Master','Degree'),('Non-citizen','','above','Master'),
('Naturalized','','Bachelor','& below'),('Naturalized','','Master','Degree'),('Naturalized','','above','Master'),
('U.S. Born' ,'','Bachelor','& below'),('U.S. Born' ,'','Master','Degree'),('U.S. Born' ,'','above','Master')
]
per = 1
educs = [114,115]
cnames = ['non-cit_<','non-cit_ms','non-cit_>','natur_<','natur_ms','natur_>','us-born_<','us-born_ms','us-born_>']
getCitizenEducByState(1, 0, per, educs, [1,-99999], cnames, "All Occupations - Workers by Citizen Status and Education Level",tt)
The following code calls getCitizenEducByState and lists the number of workes with OCC code 1020 (Software developers, applications and systems software) by citizen status and education level.
getCitizenEducByState(1, 0, per, educs, [1020], cnames, "Software Developers - Workers by Citizen Status and Education Level",tt)
The following sections look at other major occupation groups.
getCitizenEducByState(1, 0, per, educs, [1000,-1299], cnames, "Computer and Mathematical Occupations - Non-citizen, Naturalized, and US-born",tt)
getCitizenEducByState(1, 0, per, educs, [3000,-3599], cnames, "Healthcare Practitioners and Technical Occupations - Non-citizen, Naturalized, and US-born",tt)
getCitizenEducByState(1, 0, per, educs, [3600,-3699], cnames, "Healthcare Support Occupations - Non-citizen, Naturalized, and US-born",tt)
Note: As previously mentioned, the counts corresponding to the above percents can be seen at this link.