Last active
March 18, 2017 21:54
Notebook (1 of 2) for Electology's 2016 Presidential Election 9x9 Pairwise Analysis
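The notebook's pairwise scoring depends on the `.ix` indexer, which later pandas releases removed. As a rough, non-authoritative sketch of the same idea with current pandas: for each incumbent/challenger pair, count the respondents who rated one nominee above the other, ignore ties, and report the incumbent's share of the decisive responses. The `ratings` table, its scores, and the helper names `pairwise_share` and `pairwise_matrix` are hypothetical and not part of the original notebook; returning 0.5 when no respondent is decisive is also an assumption.

```python
import pandas as pd


def pairwise_share(ratings: pd.DataFrame, incumbent: str, challenger: str) -> float:
    """Share of decisive respondents who rated `incumbent` above `challenger`."""
    inc_wins = (ratings[incumbent] > ratings[challenger]).sum()
    chal_wins = (ratings[incumbent] < ratings[challenger]).sum()
    decisive = inc_wins + chal_wins  # ties count for neither side
    if decisive == 0:
        return 0.5  # assumption: call it even when nobody expressed a preference
    return inc_wins / decisive


def pairwise_matrix(ratings: pd.DataFrame) -> pd.DataFrame:
    """Square table of incumbent (rows) vs challenger (columns) win shares."""
    names = list(ratings.columns)
    # The diagonal (a nominee against themselves) stays at 0.5
    matrix = pd.DataFrame(0.5, index=names, columns=names)
    for a in names:
        for b in names:
            if a != b:
                matrix.loc[a, b] = pairwise_share(ratings, a, b)
    return matrix


# Hypothetical ratings: one row per respondent, one column per nominee
ratings = pd.DataFrame({
    "Clinton": [5, 3, 1, 4],
    "Trump":   [1, 3, 5, 2],
    "Sanders": [4, 5, 2, 4],
})

matrix = pairwise_matrix(ratings)
wins = (matrix > 0.5).sum(axis=1)  # number of head-to-head matchups each nominee wins
print(matrix.round(2))
print(wins.sort_values(ascending=False))
```

Keeping the result as a square table, rather than the long incumbent/challenger list the notebook builds, makes the win count a single row-wise comparison. The long form is still the more convenient one for drawing the grid cell by cell.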
| { | |
| "cells": [ | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "# 2016 Presidential Matchup \nThis notebook, for my client, [Electology](https://electology.org), analyzes the 2016 presidential election through proprietary survey data collected by GfK.\n- For the 9x9 pairwise comparison, I created a (perhaps) new kind of chart?! I call it the BxR Plot. [Bxr](http://www.urbandictionary.com/define.php?term=bxr) is nerdy slang for a rumble! Loosely based, of course.\n- After the BxR plot code, I wanted to find factors that differentiated each nominee. I'm just beginning this one. It's kind of what a random forest feature-importances chart would be, minus the actual prediction model. \n- The other Electology notebook, containing Seaborn FacetGrid plots, is [here](https://gist.github.com/grahamanderson/cc51a81c2412d796acf3456c75ff5518)\n- MRP models coming next!" | |
| }, | |
| { | |
| "metadata": { | |
| "trusted": true, | |
| "collapsed": true | |
| }, | |
| "cell_type": "code", | |
| "source": "%config InlineBackend.figure_formats = {'png', 'retina'}\nimport matplotlib.pyplot as plt\nimport matplotlib.patches as patches\nfrom matplotlib.offsetbox import AnnotationBbox, OffsetImage\nfrom matplotlib._png import read_png\nimport matplotlib.gridspec as gridspec\n%matplotlib inline\n\nimport pandas as pd\nimport numpy as np\n\n# from IPython.core.display import display, HTML\n#from matplotlib.offsetbox import TextArea, DrawingArea, OffsetImage, AnnotationBbox, AnchoredOffsetbox", | |
| "execution_count": 12, | |
| "outputs": [] | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "### BxR Plot function to run in advance" | |
| }, | |
| { | |
| "metadata": { | |
| "trusted": true, | |
| "collapsed": false | |
| }, | |
| "cell_type": "code", | |
| "source": "# Make a BXR graph by filtering the dataframe\ndef factor_bxr(filter=df_long.index != -999): #-999 provides all the rows--no filtering\n nominees_df = df_long.ix[:,171:180][filter]\n (nominees_df.rename(columns=lambda x: x.replace(\"'\",'')\n .replace(\",\",'').split(\" \")[1]\n .strip().title(), inplace=True)) \n nominees_df.fillna(0, inplace=True)\n \n # Prepare data for honest assessment...and make the Pairwise BXR Graph\n data= pairwise_bxr_data(nominees_df)\n pairwise_bxr_graph(data)\n\n# Make a Text File of Honest Assessment by a Column Value_Counts\ndef honest_assessment_by_factor(col, df):\n brackets = df[col].value_counts()\n #counts = df[col].value_counts().values.tolist()\n \n mystr = col\n \n for key, value in brackets.items():\n temp_df = df[df[col]==key]\n nominees_df = temp_df.ix[:,171:180]\n\n # Clean up the nominee column\n (nominees_df.rename(columns=lambda x: x.replace(\"'\",'')\n .replace(\",\",'').split(\" \")[1].strip().title(), inplace=True)) \n nominees_df.fillna(0, inplace=True)\n\n # Do the Pairwise Scoring\n data= pairwise_bxr_data(nominees_df)\n \n # Make the Rank DF and format it for printing\n rank_table = (make_rank_df(data)\n .ix[:,['incumbent', 'win_loss']]\n .to_string(index=False))\n \n # Display\n mystr = mystr\n mystr = mystr + \"\\n\\nFactor: {0}\\nData Count: {1}\\n{2}\".format(key,value,rank_table )\n mystr = mystr +\"\\n--------------------------------------\\n\"\n return mystr\n \n \n# Scoring Logic: Comparison of Sums for the Incumbent and Challenger\ndef pairwise_scoring(row,incumbent,challenger):\n df = row.copy() # for safety\n if df[incumbent] > df[challenger]:\n df[incumbent] = 1\n df[challenger] = 0\n elif df[incumbent] < df[challenger]:\n df[incumbent] = 0\n df[challenger] = 1\n else:\n df[incumbent] = 0\n df[challenger] = 0 \n return df\n\n# The formatted data that Matplotlib uses to make the BxR Plot\ndef pairwise_bxr_data(df):\n honest_array = []\n\n # Iterate over Incumbants\n for incumbent in 
range(0, len(df.columns)):\n # Iterate over Challengers\n for challenger in range(0, len(df.columns)):\n if incumbent != challenger: \n # Make a dataframe out of the Incumbent and Challenger \n match_df = df.ix[:, np.r_[incumbent, challenger]] \n\n # With the pairwise function Convert the columns to 1 or 0: win=1, loss=0, tie=0\n match_df = match_df.apply(lambda row: pairwise_scoring(row,0,1 ), axis=1)\n\n # Sum of Total for both candidates\n denom = match_df.ix[:,0].sum() + match_df.ix[:,1].sum() \n\n # Add to the array if the result is actually a number\n if match_df.ix[:,0].sum() != 0:\n honest_array.append(\n [\n match_df.columns[0],\n match_df.ix[:,0].sum() / denom, \n match_df.columns[1],\n match_df.ix[:,1].sum() / denom\n ]\n )\n else:\n # Account for Sanders vs Sanders...\n # which will be replaced graphically by an star graphic\n match_df = df.ix[:, incumbent].to_frame() # make the series a dataframe so targeting is consistent\n honest_array.append(\n [\n match_df.columns[0],\n .50, \n match_df.columns[0],\n .50\n ]\n )\n\n # Put array into Final Dataframe \n honest_df = pd.DataFrame(data=honest_array, columns=['incumbent', 'iscore','challenger', 'cscore'])\n #honest_df\n #honest_df.ix[honest_df.incumbent=='Clinton',:]\n\n # Add a temporary Rank Column to add number of wins (for sorting)\n for i in honest_df.incumbent.unique(): \n honest_df.ix[honest_df.incumbent==i, 'rank'] = len(honest_df[(honest_df.incumbent==i) \n & (honest_df.iscore>honest_df.cscore)])\n \n # add a challenger ranking to for easier sorting\n for i in honest_df.challenger.unique(): \n honest_df.ix[honest_df.challenger==i, 'c_rank'] = len(honest_df[(honest_df.challenger==i) \n & (honest_df.iscore<honest_df.cscore)])\n\n\n # Sort Incumbents by Incumbent Rank and Challenger Rank \n honest_df.sort_values(['rank', 'c_rank'], ascending=[False, False], inplace=True)\n honest_df.reset_index(inplace=True)\n return honest_df\n\n\n\n# Make the Dataframe for the Win-Loss and Picture Columns\ndef 
make_rank_df(df):\n\n    # One row per incumbent, ordered by number of pairwise wins\n    rank_df = (\n        df.sort_values(by='rank', ascending=False)\n        .drop_duplicates(subset='incumbent')\n        .reset_index()\n        .loc[:, ['incumbent', 'rank']]  # .loc replaces the removed .ix indexer\n    )\n\n    # Add a win-loss text column (8-1) and PNG column\n    rank_df['win_loss'] = rank_df['rank'].apply(lambda x: \"{0:.0f}-{1:.0f}\"\n                                    .format(x, rank_df['rank'].max()-x))\n    rank_df['image'] = rank_df['incumbent'].apply(lambda x: \"{0}.png\".format(x.lower()))\n    return rank_df", | |
| "execution_count": 38, | |
| "outputs": [] | |
| }, | |
| { | |
| "metadata": { | |
| "trusted": true, | |
| "collapsed": true | |
| }, | |
| "cell_type": "code", | |
| "source": "# Function that makes the Pairwise BXR graph \ndef pairwise_bxr_graph(df):\n side = 10 # side length of box\n star = False\n percent_size =28\n nominee_size =20\n win_col_size = 70\n fontname = 'Univers'\n incumbent_win_box_color = '#'\n challenger_win_box_color = '#'\n nrows= 10 # 0-8 for 9 rows\n ncols = 10 # 0=Picture Column, 1-9 = Pairwise Matrix, 10 = Win-Loss Column\n\n # GridSpec is 1 index not 0!\n plt.figure(figsize=(34, 32))\n gs = gridspec.GridSpec(nrows=nrows, ncols=ncols+1, wspace=0, hspace=0,\n width_ratios= [1,1,1,1,1,1,1,1,1,1,1.3],\n height_ratios = [.3,1,1,1,1,1,1,1,1,1,1]\n ) \n\n # Dictionary to store all the plots\n axes = {}\n\n\n ######################\n # Win-Loss Columns and Picture Datafarame\n rank_df = make_rank_df(df)\n\n\n ######################\n # Incumbent Row (9x1)\n\n for i in range(0,1):\n for j in range (0, ncols+1):\n axes[i] = plt.subplot(gs[i,j:j+1])\n axes[i].axis(\"off\")\n\n axes[i].xaxis.set_ticks_position('none') \n axes[i].yaxis.set_ticks_position('none') \n axes[i].xaxis.set_ticklabels([])\n axes[i].yaxis.set_ticklabels([]) \n axes[i].set_xlim(0,9); axes[i].set_ylim(0,9) # forces the axes to draw :\\\n\n # Draw Blue Boxes for the Header Row\n rectangle = plt.Rectangle((0, 0), side, side, fc='#06477D', \n linewidth=1, color='#979797' )\n axes[i].add_patch(rectangle)\n\n\n ##################\n # Annotate: Add Incumbent Name, Win/Loss Text, or nothing (if 0)\n\n if j==0:\n # Skip--Don't Annotate as you are on the picture column\n pass\n\n elif j < ncols: \n axes[i].annotate( rank_df.ix[j-1, 'incumbent'].lower(), \n xy=(4.5, 4.5), xycoords='data',\n ha='center', va='center',\n color='w', size= nominee_size, fontname=fontname, \n fontweight='regular', family='sans-serif')\n #print(rank_df.ix[j-1, 'incumbent'].lower())\n\n # Add Win-Loss Columns\n else:\n axes[i].annotate( 'Win/Loss', \n xy=(4.5, 4.5), xycoords='data',\n ha='center', va='center',\n color='w', size= 30 , fontname=fontname, \n 
fontweight='regular', family='sans-serif')\n\n\n\n ######################\n # Picture Column for Incumbents 1x9\n\n for i in range(1,nrows):\n axes[i] = plt.subplot(gs[i,0])\n axes[i].xaxis.set_ticks_position('none') \n axes[i].yaxis.set_ticks_position('none') \n axes[i].xaxis.set_ticklabels([])\n axes[i].yaxis.set_ticklabels([]) \n\n # Draw border around the image\n for spine in axes[i].spines.values():\n spine.set_edgecolor('#F0EEEE')\n spine.set_linewidth(1)\n\n # Image of the Incumbent\n image = plt.imread(\"./images/{0}\".format(rank_df.ix[i-1,'image']))\n thisimage = np.copy(image)\n axes[i].imshow(thisimage, aspect='auto', extent=None)\n\n\n\n\n ##################\n # Pairwise Comparison Columns (9x9)\n\n\n # the for loop skips picture column (at 0) and win-loss column (at 10)\n for i in range(1,nrows):\n for j in range (1, ncols):\n axes[i] = plt.subplot(gs[i,j])\n\n\n # Make a Cell\n axes[i].set_xlim(0,9); axes[i].set_ylim(0,9) # forces the axes to draw :\\\n axes[i].axis(\"off\")\n axes[i].xaxis.set_ticks_position('none') \n axes[i].yaxis.set_ticks_position('none') \n axes[i].xaxis.set_ticklabels([])\n axes[i].yaxis.set_ticklabels([]) \n\n\n ##############\n # What's the Current Dataframe Row? 
Offset the Grid Row and Col\n\n # account for the 0 index of pandas and 1 index of Gridspec\n df_cur_row = (i-1) * (ncols-1) + j-1\n\n\n # print(\"row: {0}, col: {1}, grid_cell: {2}, \n # dataframe_row: {3}, nominee {4}, challenger {5}\".format(i,j, \n # grid_cell, df_cur_row, honest_df.ix[df_cur_row ,'incumbent'], \n # honest_df.ix[df_cur_row ,'challenger']))\n\n ##############\n # Incumbent Won\n if df.ix[df_cur_row ,'iscore'] > df.ix[df_cur_row, 'cscore']: \n #print(\"I won: {0}\".format(triangle_p))\n star=False\n rectangle = plt.Rectangle((0, 0), side, side, \n fc='#58BB58', linewidth=1, color='#979797' )\n axes[i].add_patch(rectangle)\n\n triangle_p = side - ((side**2 * df.ix[df_cur_row, 'cscore'] )/side)*2\n points = [[triangle_p, 0], [side-1, 0], [side-1, side-1]]\n polygon = plt.Polygon(points, fc='w', linewidth=0, color='#979797')\n axes[i].add_patch(polygon)\n\n incumbent_text_color = '#FFFFFF'\n challenger_text_color = '#404040'\n\n ##############\n # Challenger Won\n\n elif df.ix[df_cur_row ,'iscore'] < df.ix[df_cur_row, 'cscore']:\n star=False \n rectangle = plt.Rectangle((0, 0), side, side, \n fc='#76A176', linewidth=1, color='#979797' )\n axes[i].add_patch(rectangle)\n\n triangle_p = side - ((side**2 * df.ix[df_cur_row,'iscore'] )/side)*2\n points = [[0, side-1], [0, triangle_p], [side-1,side-1] ]\n polygon = plt.Polygon(points, fc='w', linewidth=0, color='#979797')\n axes[i].add_patch(polygon)\n\n incumbent_text_color = '#404040'\n challenger_text_color = '#FFFFFF'\n\n ##############\n # Put Star and Empty Cell Here (Challenger and Incumbent is same person)\n else:\n star=True\n rectangle = plt.Rectangle((0, 0), side, side, \n fc='#FFFFFF', linewidth=1, color='#979797' )\n axes[i].add_patch(rectangle)\n\n # Photoshop much better at scaling than MAtplotlib :\\\n arr_star = read_png('./images/small_star.png')\n imagebox = OffsetImage(arr_star, zoom=1, origin='upper', interpolation='sinc')\n xy = [4.5, 4.5] # coordinates to position this image\n\n ab 
= AnnotationBbox(imagebox, xy=xy, xybox=(-1,0), \n xycoords='data',boxcoords=\"offset points\",\n frameon=False) \n axes[i].add_artist(ab)\n\n\n ##############\n # Box Outline (looks better visually--adds separation b/w cells)\n box_outline = plt.Rectangle((0, 0), side-1, side-1, \n fill=False, lw=1, color='#F0EEEE', zorder=1 )\n axes[i].add_patch(box_outline)\n\n\n\n ##############\n # Add Incumbent Challenger Text if this is not an Empty Cell with a Star Graphic\n\n if star==False:\n # Incumbent Percentage\n axes[i].annotate('{:0.0%}'.format(df.ix[df_cur_row ,'iscore']), \n xy=(.5, 8.5), xycoords='data',\n horizontalalignment='left', verticalalignment='top',\n color=incumbent_text_color, size= percent_size, \n fontname=fontname, fontweight='medium', family='sans-serif'\n )\n # Incumbent Name \n axes[i].annotate(df.ix[df_cur_row ,'incumbent'], \n xy=(.5, 7.0), xycoords='data',\n horizontalalignment='left', verticalalignment='top',\n color=incumbent_text_color, size= nominee_size, \n fontname=fontname, fontweight='medium', family='sans-serif'\n )\n\n\n # Challenger Percentage\n axes[i].annotate('{:0.0%}'.format(df.ix[df_cur_row ,'cscore']), \n xy=(8.5, 1.5), xycoords='data',\n horizontalalignment='right', verticalalignment='bottom',\n color=challenger_text_color, size= percent_size, \n fontname=fontname, fontweight='medium', family='sans-serif'\n )\n # Challenger Name\n axes[i].annotate(df.ix[df_cur_row ,'challenger'],\n xy=(8.5, .5), xycoords='data',\n horizontalalignment='right', verticalalignment='bottom',\n color=challenger_text_color, size= nominee_size, \n fontname=fontname, fontweight='medium', family='sans-serif'\n )\n ##############\n # Win-Loss Column\n\n for i in range(1,nrows):\n axes[i] = plt.subplot(gs[i,10])\n axes[i].set_xlim(0,9); axes[i].set_ylim(0,9) # forces the axes to draw :\\\n axes[i].axis(\"off\")\n axes[i].xaxis.set_ticks_position('none') \n axes[i].yaxis.set_ticks_position('none') \n axes[i].xaxis.set_ticklabels([])\n 
axes[i].yaxis.set_ticklabels([]) \n\n        # Highlight the top-ranked nominee's win-loss cell in green\n        if i==1:\n            fc='#58BB58'; text_color= '#FFFFFF'\n        else:\n            fc = '#FFFFFF'; text_color = '#404040'\n\n        rectangle = plt.Rectangle((0, 0), side*2, side, fc=fc, linewidth=1, color='#979797' )\n        axes[i].add_patch(rectangle)\n\n        # Add Win-Loss Text (grid row 0 is the header, so grid row i maps to rank_df row i-1)\n        axes[i].annotate(rank_df.loc[i-1, 'win_loss'], \n            xy=(4.5, 4.5), xycoords='data',\n            horizontalalignment='center', verticalalignment='center',\n            color=text_color, size= 62, fontname=fontname, \n            fontweight='light', family='sans-serif'\n        )\n\n        # Add Box Outline (looks better visually--adds separation b/w cells)\n        box_outline = plt.Rectangle((0, 0), side-1, side-1, fill=False, \n            lw=1, color='#F0EEEE', zorder=1 ) \n        axes[i].add_patch(box_outline) ", | |
| "execution_count": 34, | |
| "outputs": [] | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "---\n### BxR Plot––Honest Assessment for Everyone (Long Set)" | |
| }, | |
| { | |
| "metadata": { | |
| "trusted": true, | |
| "collapsed": false | |
| }, | |
| "cell_type": "code", | |
| "source": "df_long = pd.read_csv('./csv/df_long_raw.csv', index_col=0)\n\n# Slice and clean up the df for Honest Assessment.\n# .iloc replaces the removed .ix indexer; .copy() avoids modifying a view of df_long.\nnominees_df = df_long.iloc[:, 171:180].copy()\n# Reduce each column header to its title-cased second word, used as the nominee label\n(nominees_df.rename(columns=lambda x: x.replace(\"'\",'')\n    .replace(\",\",'').split(\" \")[1]\n    .strip().title(), inplace=True)) \nnominees_df.fillna(0, inplace=True)\n\n# Prepare data for honest assessment...and make the Pairwise BXR Graph\ndata = pairwise_bxr_data(nominees_df)\npairwise_bxr_graph(data)", | |
| "execution_count": 36, | |
| "outputs": [ | |
| { | |
| "output_type": "display_data", | |
| "data": { |