Notebook (1 of 2) for Electology's 2016 Presidential Election 9x9 Pairwise Analysis
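The notebook below scores every pair of nominees respondent by respondent: a win counts 1, a loss or tie counts 0, and each side's share is its wins divided by the decided comparisons. A minimal sketch of that idea follows, in plain Python rather than pandas so it runs anywhere. The helper name `pairwise_win_share` and the toy ratings are hypothetical and are not part of the notebook or the GfK survey data.

```python
def pairwise_win_share(a_scores, b_scores):
    """Return (share_a, share_b) among respondents who prefer one side.

    Ties are ignored, mirroring the win=1 / loss=0 / tie=0 scoring in the notebook.
    Returns None when every respondent tied, because no share is defined then.
    """
    a_wins = sum(1 for a, b in zip(a_scores, b_scores) if a > b)
    b_wins = sum(1 for a, b in zip(a_scores, b_scores) if a < b)
    decided = a_wins + b_wins
    if decided == 0:
        return None
    return a_wins / decided, b_wins / decided


# Hypothetical toy ratings, one entry per respondent (assumption, not survey data).
toy = {
    "Clinton": [5, 3, 0, 4],
    "Sanders": [4, 5, 0, 2],
    "Trump": [0, 1, 0, 4],
}

for challenger in ("Sanders", "Trump"):
    shares = pairwise_win_share(toy["Clinton"], toy[challenger])
    print("Clinton vs", challenger, shares)
```

Dividing by the decided comparisons rather than by all respondents keeps the two shares summing to 1, which is what lets a single cell of the BxR plot show both percentages.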
{
"cells": [
{
"metadata": {},
"cell_type": "markdown",
"source": "# 2016 Presidential Matchup\nThis notebook, for my client, [Electology](https://electology.org), analyzes the 2016 presidential election through proprietary survey data collected by GfK.\n- For the 9x9 pairwise comparison, I created a (perhaps) new kind of chart?! I call it the BxR Plot. [Bxr](http://www.urbandictionary.com/define.php?term=bxr) is nerdy slang for a rumble! Loosely based, of course.\n- After the BxR plot code, I wanted to find factors that differentiated each nominee. I'm just beginning this one. It's kind of what a random forest feature-importances chart would be, minus the actual prediction model.\n- The other Electology notebook, containing Seaborn FacetGrid plots, is [here](https://gist.github.com/grahamanderson/cc51a81c2412d796acf3456c75ff5518)\n- MRP models coming next!"
},
{
"metadata": {
"trusted": true,
"collapsed": true
},
"cell_type": "code",
"source": "%config InlineBackend.figure_formats = {'png', 'retina'}\nimport matplotlib.pyplot as plt\nimport matplotlib.patches as patches\nfrom matplotlib.offsetbox import AnnotationBbox, OffsetImage\n# Public PNG reader (matplotlib._png is a private module that newer Matplotlib releases no longer ship)\nfrom matplotlib.image import imread as read_png\nimport matplotlib.gridspec as gridspec\n%matplotlib inline\n\nimport pandas as pd\nimport numpy as np\n\n# from IPython.core.display import display, HTML\n#from matplotlib.offsetbox import TextArea, DrawingArea, OffsetImage, AnnotationBbox, AnchoredOffsetbox",
"execution_count": 12,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### BxR Plot functions to run in advance"
},
{
"metadata": {
"trusted": true,
"collapsed": false
},
"cell_type": "code",
"source": "# Make a BxR graph, optionally filtering the rows of df_long first.\n# `filter` is a boolean mask aligned with df_long; None (the default) keeps every row,\n# and because it is resolved at call time this cell can run before df_long is loaded.\ndef factor_bxr(filter=None):\n    nominees_df = df_long.iloc[:, 171:180]\n    if filter is not None:\n        nominees_df = nominees_df[filter]\n    nominees_df = nominees_df.rename(columns=lambda x: x.replace(\"'\", '')\n                                     .replace(\",\", '').split(\" \")[1]\n                                     .strip().title())\n    nominees_df = nominees_df.fillna(0)\n\n    # Prepare data for honest assessment...and make the Pairwise BxR graph\n    data = pairwise_bxr_data(nominees_df)\n    pairwise_bxr_graph(data)\n\n\n# Build a text report of the Honest Assessment for each value of a column\ndef honest_assessment_by_factor(col, df):\n    brackets = df[col].value_counts()\n\n    mystr = col\n\n    for key, value in brackets.items():\n        temp_df = df[df[col] == key]\n        nominees_df = temp_df.iloc[:, 171:180]\n\n        # Clean up the nominee column names\n        nominees_df = nominees_df.rename(columns=lambda x: x.replace(\"'\", '')\n                                         .replace(\",\", '').split(\" \")[1].strip().title())\n        nominees_df = nominees_df.fillna(0)\n\n        # Do the pairwise scoring\n        data = pairwise_bxr_data(nominees_df)\n\n        # Make the rank DataFrame and format it for printing\n        rank_table = (make_rank_df(data)\n                      .loc[:, ['incumbent', 'win_loss']]\n                      .to_string(index=False))\n\n        # Append this factor's section to the report\n        mystr = mystr + \"\\n\\nFactor: {0}\\nData Count: {1}\\n{2}\".format(key, value, rank_table)\n        mystr = mystr + \"\\n--------------------------------------\\n\"\n    return mystr\n\n\n# Scoring logic: compare the incumbent's and challenger's scores within one row\ndef pairwise_scoring(row, incumbent, challenger):\n    df = row.copy()  # for safety\n    if df[incumbent] > df[challenger]:\n        df[incumbent] = 1\n        df[challenger] = 0\n    elif df[incumbent] < df[challenger]:\n        df[incumbent] = 0\n        df[challenger] = 1\n    else:\n        df[incumbent] = 0\n        df[challenger] = 0\n    return df\n\n\n# The formatted data that Matplotlib uses to make the BxR Plot\ndef pairwise_bxr_data(df):\n    honest_array = []\n\n    # Iterate over incumbents\n    for incumbent in range(0, len(df.columns)):\n        # Iterate over challengers\n        for challenger in range(0, len(df.columns)):\n            if incumbent != challenger:\n                # Make a dataframe out of the incumbent and challenger\n                match_df = df.iloc[:, [incumbent, challenger]]\n                inc_name, chal_name = match_df.columns\n\n                # Convert each row to 1 or 0 with the pairwise function: win=1, loss=0, tie=0\n                match_df = match_df.apply(\n                    lambda row: pairwise_scoring(row, inc_name, chal_name), axis=1)\n\n                # Number of decided rows across both candidates\n                denom = match_df.iloc[:, 0].sum() + match_df.iloc[:, 1].sum()\n\n                # Add to the array only if at least one row was decided (avoids 0/0)\n                if denom != 0:\n                    honest_array.append(\n                        [\n                            match_df.columns[0],\n                            match_df.iloc[:, 0].sum() / denom,\n                            match_df.columns[1],\n                            match_df.iloc[:, 1].sum() / denom\n                        ]\n                    )\n            else:\n                # Account for Sanders vs Sanders...\n                # which will be replaced graphically by a star graphic\n                match_df = df.iloc[:, incumbent].to_frame()  # make the series a dataframe so targeting is consistent\n                honest_array.append(\n                    [\n                        match_df.columns[0],\n                        .50,\n                        match_df.columns[0],\n                        .50\n                    ]\n                )\n\n    # Put the array into the final DataFrame\n    honest_df = pd.DataFrame(data=honest_array, columns=['incumbent', 'iscore', 'challenger', 'cscore'])\n\n    # Add a temporary rank column holding the number of wins (for sorting)\n    for i in honest_df.incumbent.unique():\n        wins = len(honest_df[(honest_df.incumbent == i)\n                             & (honest_df.iscore > honest_df.cscore)])\n        honest_df.loc[honest_df.incumbent == i, 'rank'] = wins\n\n    # Add a challenger ranking for easier sorting\n    for i in honest_df.challenger.unique():\n        c_wins = len(honest_df[(honest_df.challenger == i)\n                               & (honest_df.iscore < honest_df.cscore)])\n        honest_df.loc[honest_df.challenger == i, 'c_rank'] = c_wins\n\n    # Sort incumbents by incumbent rank and challenger rank\n    honest_df.sort_values(['rank', 'c_rank'], ascending=[False, False], inplace=True)\n    honest_df.reset_index(inplace=True)\n    return honest_df\n\n\n# Make the DataFrame for the Win-Loss and Picture columns\ndef make_rank_df(df):\n\n    rank_df = (\n        df.sort_values(by='rank', ascending=False)\n        .drop_duplicates(subset='incumbent')\n        .reset_index()\n        .loc[:, ['incumbent', 'rank']]\n    )\n\n    # Add a win-loss text column (8-1) and PNG column\n    rank_df['win_loss'] = rank_df['rank'].apply(lambda x: \"{0:.0f}-{1:.0f}\"\n                                                .format(x, rank_df['rank'].max() - x))\n    rank_df['image'] = rank_df['incumbent'].apply(lambda x: \"{0}.png\".format(x.lower()))\n    return rank_df",
"execution_count": 38,
"outputs": []
},
{
"metadata": {
"trusted": true,
"collapsed": true
},
"cell_type": "code",
"source": "# Function that makes the Pairwise BxR graph\ndef pairwise_bxr_graph(df):\n    side = 10  # side length of box\n    star = False\n    percent_size = 28\n    nominee_size = 20\n    win_col_size = 70\n    fontname = 'Univers'\n    incumbent_win_box_color = '#'\n    challenger_win_box_color = '#'\n    nrows = 10  # row 0 = header, rows 1-9 = one row per incumbent\n    ncols = 10  # 0 = picture column, 1-9 = pairwise matrix, 10 = win-loss column\n\n    # Grid row 0 and grid column 0 hold the header and the pictures,\n    # so grid indices are offset by 1 from the DataFrame rows.\n    plt.figure(figsize=(34, 32))\n    gs = gridspec.GridSpec(nrows=nrows, ncols=ncols+1, wspace=0, hspace=0,\n                           width_ratios=[1,1,1,1,1,1,1,1,1,1,1.3],\n                           # one ratio per grid row: header + 9 incumbents\n                           height_ratios=[.3,1,1,1,1,1,1,1,1,1]\n                           )\n\n    # Dictionary to store all the plots\n    axes = {}\n\n\n    ######################\n    # Win-Loss Columns and Picture DataFrame\n    rank_df = make_rank_df(df)\n\n\n    ######################\n    # Incumbent Row (9x1)\n\n    for i in range(0, 1):\n        for j in range(0, ncols+1):\n            axes[i] = plt.subplot(gs[i, j:j+1])\n            axes[i].axis(\"off\")\n\n            axes[i].xaxis.set_ticks_position('none')\n            axes[i].yaxis.set_ticks_position('none')\n            axes[i].xaxis.set_ticklabels([])\n            axes[i].yaxis.set_ticklabels([])\n            axes[i].set_xlim(0,9); axes[i].set_ylim(0,9)  # forces the axes to draw :\\\n\n            # Draw Blue Boxes for the Header Row\n            rectangle = plt.Rectangle((0, 0), side, side, fc='#06477D',\n                                      linewidth=1, color='#979797')\n            axes[i].add_patch(rectangle)\n\n\n            ##################\n            # Annotate: Add Incumbent Name, Win/Loss Text, or nothing (if 0)\n\n            if j == 0:\n                # Skip: no annotation on the picture column\n                pass\n\n            elif j < ncols:\n                axes[i].annotate(rank_df.loc[j-1, 'incumbent'].lower(),\n                                 xy=(4.5, 4.5), xycoords='data',\n                                 ha='center', va='center',\n                                 color='w', size=nominee_size, fontname=fontname,\n                                 fontweight='regular', family='sans-serif')\n                #print(rank_df.loc[j-1, 'incumbent'].lower())\n\n            # Add Win-Loss column header\n            else:\n                axes[i].annotate('Win/Loss',\n                                 xy=(4.5, 4.5), xycoords='data',\n                                 ha='center', va='center',\n                                 color='w', size=30, fontname=fontname,\n                                 fontweight='regular', family='sans-serif')\n\n\n\n    ######################\n    # Picture Column for Incumbents 1x9\n\n    for i in range(1, nrows):\n        axes[i] = plt.subplot(gs[i, 0])\n        axes[i].xaxis.set_ticks_position('none')\n        axes[i].yaxis.set_ticks_position('none')\n        axes[i].xaxis.set_ticklabels([])\n        axes[i].yaxis.set_ticklabels([])\n\n        # Draw border around the image\n        for spine in axes[i].spines.values():\n            spine.set_edgecolor('#F0EEEE')\n            spine.set_linewidth(1)\n\n        # Image of the Incumbent\n        image = plt.imread(\"./images/{0}\".format(rank_df.loc[i-1, 'image']))\n        thisimage = np.copy(image)\n        axes[i].imshow(thisimage, aspect='auto', extent=None)\n\n\n\n    ##################\n    # Pairwise Comparison Columns (9x9)\n\n    # the for loop skips picture column (at 0) and win-loss column (at 10)\n    for i in range(1, nrows):\n        for j in range(1, ncols):\n            axes[i] = plt.subplot(gs[i, j])\n\n            # Make a Cell\n            axes[i].set_xlim(0,9); axes[i].set_ylim(0,9)  # forces the axes to draw :\\\n            axes[i].axis(\"off\")\n            axes[i].xaxis.set_ticks_position('none')\n            axes[i].yaxis.set_ticks_position('none')\n            axes[i].xaxis.set_ticklabels([])\n            axes[i].yaxis.set_ticklabels([])\n\n\n            ##############\n            # What's the Current Dataframe Row? Offset the Grid Row and Col\n\n            # grid rows and columns start at 1 here, while the DataFrame rows start at 0\n            df_cur_row = (i-1) * (ncols-1) + j-1\n\n\n            # print(\"row: {0}, col: {1}, grid_cell: {2},\n            #     dataframe_row: {3}, nominee {4}, challenger {5}\".format(i,j,\n            #     grid_cell, df_cur_row, honest_df.ix[df_cur_row ,'incumbent'],\n            #     honest_df.ix[df_cur_row ,'challenger']))\n\n            ##############\n            # Incumbent Won\n            if df.loc[df_cur_row, 'iscore'] > df.loc[df_cur_row, 'cscore']:\n                #print(\"I won: {0}\".format(triangle_p))\n                star = False\n                rectangle = plt.Rectangle((0, 0), side, side,\n                                          fc='#58BB58', linewidth=1, color='#979797')\n                axes[i].add_patch(rectangle)\n\n                triangle_p = side - ((side**2 * df.loc[df_cur_row, 'cscore'])/side)*2\n                points = [[triangle_p, 0], [side-1, 0], [side-1, side-1]]\n                polygon = plt.Polygon(points, fc='w', linewidth=0, color='#979797')\n                axes[i].add_patch(polygon)\n\n                incumbent_text_color = '#FFFFFF'\n                challenger_text_color = '#404040'\n\n            ##############\n            # Challenger Won\n\n            elif df.loc[df_cur_row, 'iscore'] < df.loc[df_cur_row, 'cscore']:\n                star = False\n                rectangle = plt.Rectangle((0, 0), side, side,\n                                          fc='#76A176', linewidth=1, color='#979797')\n                axes[i].add_patch(rectangle)\n\n                triangle_p = side - ((side**2 * df.loc[df_cur_row, 'iscore'])/side)*2\n                points = [[0, side-1], [0, triangle_p], [side-1, side-1]]\n                polygon = plt.Polygon(points, fc='w', linewidth=0, color='#979797')\n                axes[i].add_patch(polygon)\n\n                incumbent_text_color = '#404040'\n                challenger_text_color = '#FFFFFF'\n\n            ##############\n            # Put Star and Empty Cell Here (Challenger and Incumbent are the same person)\n            else:\n                star = True\n                rectangle = plt.Rectangle((0, 0), side, side,\n                                          fc='#FFFFFF', linewidth=1, color='#979797')\n                axes[i].add_patch(rectangle)\n\n                # Photoshop is much better at scaling than Matplotlib :\\\n                arr_star = plt.imread('./images/small_star.png')\n                imagebox = OffsetImage(arr_star, zoom=1, origin='upper', interpolation='sinc')\n                xy = [4.5, 4.5]  # coordinates to position this image\n\n                ab = AnnotationBbox(imagebox, xy=xy, xybox=(-1,0),\n                                    xycoords='data', boxcoords=\"offset points\",\n                                    frameon=False)\n                axes[i].add_artist(ab)\n\n\n            ##############\n            # Box Outline (looks better visually--adds separation b/w cells)\n            box_outline = plt.Rectangle((0, 0), side-1, side-1,\n                                        fill=False, lw=1, color='#F0EEEE', zorder=1)\n            axes[i].add_patch(box_outline)\n\n\n            ##############\n            # Add Incumbent Challenger Text if this is not an Empty Cell with a Star Graphic\n\n            if not star:\n                # Incumbent Percentage\n                axes[i].annotate('{:0.0%}'.format(df.loc[df_cur_row, 'iscore']),\n                                 xy=(.5, 8.5), xycoords='data',\n                                 horizontalalignment='left', verticalalignment='top',\n                                 color=incumbent_text_color, size=percent_size,\n                                 fontname=fontname, fontweight='medium', family='sans-serif'\n                                 )\n                # Incumbent Name\n                axes[i].annotate(df.loc[df_cur_row, 'incumbent'],\n                                 xy=(.5, 7.0), xycoords='data',\n                                 horizontalalignment='left', verticalalignment='top',\n                                 color=incumbent_text_color, size=nominee_size,\n                                 fontname=fontname, fontweight='medium', family='sans-serif'\n                                 )\n\n                # Challenger Percentage\n                axes[i].annotate('{:0.0%}'.format(df.loc[df_cur_row, 'cscore']),\n                                 xy=(8.5, 1.5), xycoords='data',\n                                 horizontalalignment='right', verticalalignment='bottom',\n                                 color=challenger_text_color, size=percent_size,\n                                 fontname=fontname, fontweight='medium', family='sans-serif'\n                                 )\n                # Challenger Name\n                axes[i].annotate(df.loc[df_cur_row, 'challenger'],\n                                 xy=(8.5, .5), xycoords='data',\n                                 horizontalalignment='right', verticalalignment='bottom',\n                                 color=challenger_text_color, size=nominee_size,\n                                 fontname=fontname, fontweight='medium', family='sans-serif'\n                                 )\n\n    ##############\n    # Win-Loss Column\n\n    for i in range(1, nrows):\n        axes[i] = plt.subplot(gs[i, 10])\n        axes[i].set_xlim(0,9); axes[i].set_ylim(0,9)  # forces the axes to draw :\\\n        axes[i].axis(\"off\")\n        axes[i].xaxis.set_ticks_position('none')\n        axes[i].yaxis.set_ticks_position('none')\n        axes[i].xaxis.set_ticklabels([])\n        axes[i].yaxis.set_ticklabels([])\n\n        if i == 1:\n            fc = '#58BB58'; text_color = '#FFFFFF'\n        else:\n            fc = '#FFFFFF'; text_color = '#404040'\n\n        rectangle = plt.Rectangle((0, 0), side*2, side, fc=fc, linewidth=1, color='#979797')\n        axes[i].add_patch(rectangle)\n\n        # Add Win-Loss Text (remember: DataFrame rows start at 0, grid rows here start at 1)\n        axes[i].annotate(rank_df.loc[i-1, 'win_loss'],\n                         xy=(4.5, 4.5), xycoords='data',\n                         horizontalalignment='center', verticalalignment='center',\n                         color=text_color, size=62, fontname=fontname,\n                         fontweight='light', family='sans-serif'\n                         )\n\n        # Add Box Outline (looks better visually--adds separation b/w cells)\n        box_outline = plt.Rectangle((0, 0), side-1, side-1, fill=False,\n                                    lw=1, color='#F0EEEE', zorder=1)\n        axes[i].add_patch(box_outline)",
"execution_count": 34,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n### BxR Plot––Honest Assessment for Everyone (Long Set)"
},
{
"metadata": {
"trusted": true,
"collapsed": false
},
"cell_type": "code",
"source": "df_long = pd.read_csv('./csv/df_long_raw.csv', index_col=0)\n\n# Slice out the nominee columns by position and clean them up for Honest Assessment\nnominees_df = df_long.iloc[:, 171:180]\nnominees_df = nominees_df.rename(columns=lambda x: x.replace(\"'\", '')\n                                 .replace(\",\", '').split(\" \")[1]\n                                 .strip().title())\nnominees_df = nominees_df.fillna(0)\n\n# Prepare data for honest assessment...and make the Pairwise BxR graph\ndata = pairwise_bxr_data(nominees_df)\npairwise_bxr_graph(data)",
"execution_count": 36,
"outputs": [
{
"output_type": "display_data",
"data": {