The Wikipedia page for Labor Condition Application states the following:
The Labor Condition Application (LCA) is an application filed by prospective employers on behalf of workers applying for work authorization for the non-immigrant statuses H-1B, H-1B1 (a variant of H-1B for people from Singapore and Chile) and E-3 (a variant of H-1B for workers from Australia). The application is submitted to and needs to be approved by the United States Department of Labor Employment and Training Administration (DOLETA)'s Office of Foreign Labor Certification (OFLC). The form used to submit the application is ETA Form 9035.
Links to disclosure data from these applications can be found by going to the U.S. Department of Labor website and clicking on the Disclosure Data tab. These include links to disclosure data for fiscal years 2015 through 2018 at H-1B_FY2015.xlsx, H-1B_FY2016.xlsx, H-1B_FY2017.xlsx, and H-1B_FY2018.xlsx. The actual names of the files at these links are H-1B_Disclosure_Data_FY15_Q4.xlsx, H-1B_Disclosure_Data_FY16.xlsx, H-1B_Disclosure_Data_FY17.xlsx, and H-1B_FY2018.xlsx. These files need to be downloaded to the same directory as this Python notebook in order for them to be read by the following code. It can take several minutes to load the XLSX files the first time so the code writes the needed subsets of the data out to CSV files which load quickly in subsequent runs.
import pandas as pd
import os.path
import time
pd.set_option('display.width', 120)
#pd.set_option('max_rows', 200)
def load_lca(year):
fields = ['CASE_STATUS','VISA_CLASS','EMPLOYER_NAME','EMPLOYER_CITY','EMPLOYER_STATE',
'EMPLOYER_POSTAL_CODE','SOC_NAME','TOTAL_WORKERS','FULL_TIME_POSITION','PW_WAGE_LEVEL',
'WORKSITE_CITY','WORKSITE_COUNTY','WORKSITE_STATE','WORKSITE_POSTAL_CODE']
if year == 2015:
xlsx_file = 'H-1B_Disclosure_Data_FY15_Q4.xlsx'
elif year == 2016:
xlsx_file = 'H-1B_Disclosure_Data_FY16.xlsx'
fields.remove('PW_WAGE_LEVEL') # avoid warning for missing label
elif year == 2017:
xlsx_file = 'H-1B_Disclosure_Data_FY17.xlsx'
elif year == 2018:
xlsx_file = 'H-1B_FY2018.xlsx'
else:
print("ERROR: Invalid year " + str(year))
return None
yr = year % 100
csv_file = 'H-1B_FY'+str(yr)+'.csv'
if (os.path.isfile(csv_file)):
start = time.time()
yy = pd.read_csv(csv_file)
end = time.time()
print('Seconds to load: ' + str(end - start))
else:
print("Loading " + xlsx_file)
start = time.time()
zz = pd.read_excel(xlsx_file)
end = time.time()
print('Seconds to load: ' + str(end - start))
if year == 2015:
zz['TOTAL_WORKERS'] = zz['TOTAL WORKERS']
#zz.rename(columns={'TOTAL WORKERS':'TOTAL_WORKERS'})
yy = zz.loc[:, fields]
if year == 2016:
yy['PW_WAGE_LEVEL'] = None # avoid warning for missing label
yy.to_csv(csv_file)
return yy
dd15 = load_lca(2015)
dd16 = load_lca(2016)
dd17 = load_lca(2017)
dd18 = load_lca(2018)
As can be seen in the output above, it takes the code several minutes to read each of the XLSX file. The code therefore writes the needed subset of data out to CSV files and attempts to read these first in future executions. Surprisingly, it takes only a couple of seconds to read each of the CSV files.
The following two python functions can be used to filter and group the loaded LCA data. Specifically, function group_lca will filter and group the data loaded from one year's LCA file and function group_lca1518 will call group_lca to filter and group the data for each of years 2015 through 2018 and join the data into a single dataframe. The calls to load_lca following the functions load the data for the four years and the call to group_lca1518 groups the data by CASE_STATUS. The None in the last two arguments specifies that there is no filtering for this call.
def group_lca(gg, label, group, filter, value, inflag=False, numeric=False):
if filter != None:
for i in range(len(filter)):
if inflag:
gg = gg[gg[filter[i]].str.contains(value[i]) == True]
else:
gg = gg[gg[filter[i]] == value[i]]
gg = gg.groupby(group).agg({'TOTAL_WORKERS':'sum'})
gg = gg.sort_values(by='TOTAL_WORKERS',ascending=False)
gg = gg.rename(columns={'TOTAL_WORKERS':label})
if numeric == False:
gg[label] = gg.apply(lambda x: "{:,.0f}".format(x[label]), axis=1)
return gg
def group_lca1518(nn, group, filter, value, inflag=False, numeric=False):
gg = group_lca(dd18, '2018', group, filter, value, inflag, numeric)
gg17 = group_lca(dd17, '2017', group, filter, value, inflag, numeric)
gg = gg17.join(gg, how='right')
gg16 = group_lca(dd16, '2016', group, filter, value, inflag, numeric)
gg = gg16.join(gg, how='right')
gg15 = group_lca(dd15, '2015', group, filter, value, inflag, numeric)
gg = gg15.join(gg, how='right')
gg = gg[0:nn]
return(gg)
dd15 = load_lca(2015)
dd16 = load_lca(2016)
dd17 = load_lca(2017)
dd18 = load_lca(2018)
ggstat = group_lca1518(10, ['CASE_STATUS'], None, None)
print(ggstat)
As can be seen, it only took a second or two to load each of the CSV files. The call to group_lca1518 then shows that request for over a million workers were made in each of 2015 through 2017. A bit less than a million workers have been requested so far in 2018. According to the page where the 2018 data was obtained, that file covers determinations issued between October 1, 2017 through March 31, 2018. Hence, there will an additional 6 months of determinations for 2018. In any case, the output shows that the great majority of these requests have the status of CERTIFIED.
The "prevailing wage" is the minimum salary that an H-1B worker can be paid and is determined by guidance set forth in this document. It lists four skill levels, described briefly as follows:
Level I (entry) wage rates are assigned to job offers for beginning level employees who have only a basic understanding of the occupation.
Level II (qualified) wage rates are assigned to job offers for qualified employees who have attained, either through education or experience, a good understanding of the occupation.
Level III (experienced) wage rates are assigned to job offers for experienced employees who have a sound understanding of the occupation and have attained, either through education or experience, special skills or knowledge.
Level IV (fully competent) wage rates are assigned to job offers for competent employees who have sufficient experience in the occupation to plan and conduct work requiring judgment and the independent evaluation, selection, modification, and application of standard procedures and techniques.
A Computerworld article describes how these are used to determing the prevailing wage as follows:
Congress set four wage level tiers for visa workers. Level 1 is intended for entry-level and is at the 17th percentile. Level 3 is the occupational mean, or the 50th percentile. Level 4 is at the 66th percentile.
From the pattern, I assume that Level 2 is at the 33rd percentile. Hence, Level 3 should set the prevailing wage at the median wage with Levels 1 and 2 setting it to less and Level 4 setting it to more. The following call to group_lca1518 shows the number of workers requested in 2015 through 2018, by wage level. The last two arguments of ['CASE_STATUS'] and ['CERTIFIED'] cause the code to only count certified requests.
gglvl = group_lca1518(10, ['PW_WAGE_LEVEL'], ['CASE_STATUS'], ['CERTIFIED'])
print(gglvl)
As can be seen, 2015 and 2017 had the most workers requested at Level I, slightly above the number requested at Level II. Well behind those were Level III and Level IV. For 2016, the LCA disclosure data did not contain the wage level. For 2018, the current order of the requests are Level II, Level I, Level III, and Level IV. This of course could change since there will be an additional six months of determinations. However, it may also be due to an increased scrutiny of wage levels described in an article titled United States: Increased Scrutiny Over H-1B Visas: Beware Of Level 1 Wages!. It states:
The recent USCIS memo concludes a Level 1 (entry level) designation for a position, covered under the Computer Programmer occupation classification, would not qualify as a specialty occupation position. As a result, this combination (i.e., Level 1 and the Computer Programmer SOC code) would probably be one to stay away from if trying to show position is a specialty occupation. This is consistent with Senator Grassley's bill and messages to establish more of a merit based system focusing on the best and the brightest over those that are more entry level (i.e., those only deserving of a level 1 wage).
The following code again looks at the wage levels but also includes the setting of FULL_TIME_POSITION in the grouping.
gglvlft = group_lca1518(10, ['PW_WAGE_LEVEL','FULL_TIME_POSITION'], ['CASE_STATUS'], ['CERTIFIED'])
print(gglvlft)
As can be seen, the great majority of the requests are for full-time positions and the ordering of wage levels remains pretty much the same for full-time and part-time positions.
The following call to group_lca1518 lists the 20 employers who requested the most H-1B workers in 2018.
ggemp = group_lca1518(20, ['EMPLOYER_NAME'], ['CASE_STATUS'], ['CERTIFIED'])
print(ggemp)
As can be seen, the employers 'AMAZON.COM SERVICES, INC.' and 'COGNIZANT TECHNOLOGY SOLUTIONS US CORP' have NaN for one or years prior to 2018. The latter employer appears to simply be another name for 'COGNIZANT TECHNOLOGY SOLUTIONS U.S. CORPORATION', the fourth employer in the list. The list does not contain another name for 'AMAZON.COM SERVICES, INC.' but those can be searched for via the following command:
ggamaz = group_lca1518(20, ['EMPLOYER_NAME'], ['EMPLOYER_NAME','CASE_STATUS'], ['AMAZON','CERTIFIED$'], True)
print(ggamaz)
Most of these appear to be alternate names for AMAZON. This makes clear the fact that there are no apparent rules for spell-checking or limiting the names for a company. Hence, the list of the top 20 requestors of H-1B visas should be taken to be an estimate, at least until further cleaning of the data can be done. In any event, another striking thing about the the top 20 requestors is the huge increase in the number of H-1B workers requested by 'ERNST & YOUNG U.S. LLP'. The following code searches for other spellings:
ggey = group_lca1518(20, ['EMPLOYER_NAME'], ['EMPLOYER_NAME','CASE_STATUS'], ['ERNST ','CERTIFIED$'], True)
print(ggey)
ggey = group_lca(dd15, '2015', ['EMPLOYER_NAME'], ['EMPLOYER_NAME','CASE_STATUS'], ['ERNST ','CERTIFIED$'], True)
print(ggey)
ggey = group_lca(dd16, '2016', ['EMPLOYER_NAME'], ['EMPLOYER_NAME','CASE_STATUS'], ['ERNST ','CERTIFIED$'], True)
print(ggey)
ggey = group_lca(dd17, '2017', ['EMPLOYER_NAME'], ['EMPLOYER_NAME','CASE_STATUS'], ['ERNST ','CERTIFIED$'], True)
print(ggey)
As can be seen, there's many alternate names but none requesting a large number of workers. The following plot shows the change in H-1B workers requested for the top 6 requestors in 2018. This ignores any of the alternate names above or for any of the other top 6 requestors:
import pandas as pd
import matplotlib.pyplot as plt
gg6 = group_lca1518(6, ['EMPLOYER_NAME'], ['CASE_STATUS'], ['CERTIFIED'], False, True)
gg6T = gg6.T
gg6T.columns = ['ERNST & YOUNG','DELOITTE','COGNIZANT','HCL AMERICA','APPLE','QUALCOMM']
print(gg6T)
fig, ax = plt.subplots(1, 1, figsize=(12, 8))
ax.plot(gg6T)
ax.set_title('CERTIFIED H-1B WORKERS REQUESTED FOR TOP 6 REQUESTORS IN 2018')
ax.set_xlabel('Year')
ax.set_ylabel('Certified H-1B Workers Requested')
ax.grid(zorder=0)
ax.legend(gg6T.columns)
The plot shows the huge increase in H-1B workers requested by 'ERNST & YOUNG U.S. LLP' in 2018. It does show a decrease in workers requested by 'DELOITTE CONSULTING LLP', another of the "Big Four accounting firms". However, the decrease in 2018 may be due in part to the fact that the data only includes the first half of fiscal year 2018. In any event, the reason for the large increase in requests for Ernst & Young is unclear. In order to get a more detailed view of the increase, the following code shows the values of SOC_NAME for the requested workers. SOC_NAME is the Occupational Name of the job as classified by the Standard Occupational Classification (SOC) system.
ggsoc = group_lca1518(20, ['SOC_NAME'],['CASE_STATUS','EMPLOYER_NAME'],['CERTIFIED','ERNST & YOUNG U.S. LLP'])
print(ggsoc)
As can be seen, the additional H-1B workers requested are chiefly for accounting, computer, and financial occupations. It would seem possible that the increase may have something to do with the 2017 Tax Cuts and Jobs Act. Any connection may become more clear with the information about requests during the rest of 2018, the number of requests that are actually approved, and any reported changes in the current work force of Ernst and Young. In any event, the large increase in requests would seem to merit additional study.