The following Python code does some preliminary exploration of campaign finance data from the Federal Election Committee (FEC) website. It requires that the following three files be downloaded and unzipped and placed in the specified directories (relative to the notebook)
Candidate master for 2017-2018 - unzip and place cn.txt in directory fec/cm18/
Committee master for 2017-2018 - unzip and place cm.txt in directory fec/cn18/
Contributions by individuals for 2017-2018 - unzip and place itcont.txt in directory fec/indiv18/
The individual contribution table is very large. It will be loaded into object ii and only reloaded if ii doesn't exist. The following code will create ii if it doesn't exist and output the first row to show all available fields:
import pandas as pd
pd.set_option('display.width', 220)
pd.set_option('max_rows', 200)
year = 2018
try:
ii
except NameError:
yr = year % 100
filepath = "fec/indiv" + str(yr) + "/itcont.txt"
hdr = pd.read_csv("https://www.fec.gov/files/bulk-downloads/data_dictionaries/indiv_header_file.csv", skiprows=0)
print("READING "+filepath)
ii = pd.read_table(filepath, sep='|', header=None, names=hdr)
print(ii.shape)
hdr = pd.read_csv("https://www.fec.gov/data/advanced/files/bulk-downloads/data_dictionaries/cm_header_file.csv", skiprows=0)
cm = pd.read_table("fec/cm" + str(yr) + "/cm.txt", sep='|', header=None, names=hdr.columns)
hdr = pd.read_csv("https://www.fec.gov/data/advanced/files/bulk-downloads/data_dictionaries/cn_header_file.csv", skiprows=0)
cn = pd.read_table("fec/cn" + str(yr) + "/cn.txt", sep='|', header=None, names=hdr.columns)
ii=ii.merge(cm, left_on='CMTE_ID', right_on='CMTE_ID')
ii=ii.merge(cn, left_on='CAND_ID', right_on='CAND_ID', how='left')
ii['TRANSACTION_DT'] = ii['TRANSACTION_DT'].fillna(0).astype(int)
ii['TRANSACTION_DT'] = pd.to_datetime(ii['TRANSACTION_DT'],format='%m%d%Y',errors='coerce')
ii.rename(columns={'TRANSACTION_DT':'DATE', 'TRANSACTION_AMT':'AMT', 'CAND_PTY_AFFILIATION':'PARTY'}, inplace=True)
#ii['AMOUNT'] = ii['AMT'].apply(lambda x: '{:,}'.format(x))
print(ii.iloc[0,:])
The following code aggregates the individual contributions by the name, city, and state of the contributor to get an initial estimate of the total contributions made by each contributor. This is only an initial estimate because, if a single contributor should vary or misspell any of these items, those contributions will be grouped separately. For example, note that items 3, 10, 37, 40, and 43 all appear to be for the same contributor, Richard Uihlein of Lake Forest, IL. The names listed are UIHLEIN, RICHARD; UIHLEIN, RICHARD E.; UIHLEIN, RICHARD E.; UIHLEIN, RICHARD and UIHLEIN, RICHARD E. MR. Of these, only the first and fourth and the second and third appear to be the same. However, the first (item 3) and fourth (item 40) differ in that the first state is IL and the second state is IN. Then, the second (item 10) and third (item 37) differ because the latter item misspells the city as LAKE FORST. In any event, the final total following the table shows that over $1.9 billion has already been contributed to federal candidates, PACs, and other committees.
jj = ii.groupby(['NAME','CITY','STATE'])['AMT'].agg(['sum'])
jj = jj.reset_index(level=['NAME','CITY','STATE'])
jj.columns = ['NAME','CITY','STATE','AMT']
jj['AMOUNT'] = jj['AMT'].apply(lambda x: '{:,}'.format(x))
jj = jj.sort_values(by='AMT',ascending=False)
jj.index = range(1,len(jj)+1)
print(jj[['NAME','CITY','STATE','AMOUNT']].head(100))
print('\nTOTAL = ' + '{:,}'.format(sum(jj['AMT'])))
jj = ii[pd.notnull(ii['CAND_ID'])]
jj = jj.groupby(['CMTE_NM','CAND_NAME','PARTY'])['AMT'].agg(['sum'])
jj = jj.reset_index(level=['CMTE_NM','CAND_NAME','PARTY'])
jj.columns = ['CMTE_NM','CAND_NAME','PARTY','AMT']
jj['AMOUNT'] = jj['AMT'].apply(lambda x: '{:,}'.format(x))
jj = jj.sort_values(by='AMT',ascending=False)
jj.index = range(1,len(jj)+1)
print(jj[['CMTE_NM','CAND_NAME','PARTY','AMOUNT']].head(100))
print('\nTOTAL = ' + '{:,}'.format(sum(jj['AMT'])))
jj = ii[pd.isnull(ii['CAND_ID'])]
jj = jj.groupby(['CMTE_NM'])['AMT'].agg(['sum'])
jj = jj.reset_index(level=['CMTE_NM'])
jj.columns = ['CMTE_NM','AMT']
jj['AMOUNT'] = jj['AMT'].apply(lambda x: '{:,}'.format(x))
jj = jj.sort_values(by='AMT',ascending=False)
jj.index = range(1,len(jj)+1)
print(jj[['CMTE_NM','AMOUNT']].head(100))
print('\nTOTAL = ' + '{:,}'.format(sum(jj['AMT'])))
jj = ii.groupby(['CAND_NAME','PARTY'])['AMT'].agg(['sum'])
jj = jj.reset_index(level=['CAND_NAME','PARTY'])
jj.columns = ['CAND_NAME','PARTY','AMT']
jj['AMOUNT'] = jj['AMT'].apply(lambda x: '{:,}'.format(x))
jj = jj.sort_values(by='AMT',ascending=False)
jj.index = range(1,len(jj)+1)
print(jj[['CAND_NAME','PARTY','AMOUNT']].head(100))
print('\nTOTAL = ' + '{:,}'.format(sum(jj['AMT'])))
First, list all contributions to committees connected to specific candidates. Then, list all contributions to committees not connected to specific candidates.
def getIndividual2Candidate(name, city, state):
jj = ii[ii['NAME'].str.contains(name, na=False)]
if pd.notnull(city): jj = jj[jj['CITY'].str.contains(city, na=False)]
if pd.notnull(state): jj = jj[jj['STATE'].str.contains(state, na=False)]
jj = jj[pd.notnull(jj['CAND_ID'])]
jj['AMOUNT'] = jj['AMT'].apply(lambda x: '{:,}'.format(x))
jj = jj.sort_values(by='DATE',ascending=True)
jj.index = range(1,len(jj)+1)
kk = jj[['NAME','DATE','AMOUNT','CITY','STATE','CAND_NAME','PARTY']]
print(kk.head(100))
print('\nSUBTOTAL = ' + '{:,}'.format(sum(jj['AMT'])))
def getIndividual2Committee(name, city, state):
jj = ii[ii['NAME'].str.contains(name, na=False)]
if pd.notnull(city): jj = jj[jj['CITY'].str.contains(city, na=False)]
if pd.notnull(state): jj = jj[jj['STATE'].str.contains(state, na=False)]
jj = jj[pd.isnull(jj['CAND_ID'])]
jj['AMOUNT'] = jj['AMT'].apply(lambda x: '{:,}'.format(x))
jj = jj.sort_values(by='DATE',ascending=True)
jj.index = range(1,len(jj)+1)
kk = jj[['NAME','DATE','AMOUNT','CITY','STATE','CMTE_NM']]
print(kk.head(100))
print('\nSUBTOTAL = ' + '{:,}'.format(sum(jj['AMT'])))
def getIndividual2All(name, city, state):
getIndividual2Candidate(name, city, state)
getIndividual2Committee(name, city, state)
#getIndividual2All('UIHLEIN, RICHARD')
getIndividual2All('KOCH,', None, 'KS')
def getIndividual2Committee1(cmte_nm):
jj = ii[ii['CMTE_NM'].str.contains(cmte_nm, na=False)]
jj['AMOUNT'] = jj['AMT'].apply(lambda x: '{:,}'.format(x))
jj = jj.sort_values(by='DATE',ascending=True)
jj.index = range(1,len(jj)+1)
kk = jj[['NAME','DATE','AMOUNT','CITY','STATE','CMTE_NM']]
print(kk.head(100))
print('\nSUBTOTAL = ' + '{:,}'.format(sum(jj['AMT'])))
getIndividual2Committee1('CROSSROAD')
def getIndividual2Candidate1(cand_name):
jj = ii[ii['CAND_NAME'].str.contains(cand_name, na=False)]
jj['AMOUNT'] = jj['AMT'].apply(lambda x: '{:,}'.format(x))
jj = jj.sort_values(by='DATE',ascending=True)
jj.index = range(1,len(jj)+1)
kk = jj[['NAME','DATE','AMOUNT','CITY','STATE','CAND_NAME','PARTY']]
print(kk.head(100))
print('\nSUBTOTAL = ' + '{:,}'.format(sum(jj['AMT'])))
getIndividual2Candidate1('TRUMP')