Skip to content

Instantly share code, notes, and snippets.

@leighajarett
Last active March 22, 2024 16:56
Show Gist options
  • Select an option

  • Save leighajarett/ea74f73f2b6d2a514885bd02f12b5800 to your computer and use it in GitHub Desktop.

Select an option

Save leighajarett/ea74f73f2b6d2a514885bd02f12b5800 to your computer and use it in GitHub Desktop.
Examples for leveraging the BigQuery Public Geo Boundaries tables for geospatial analytics
Display the source blob
Display the rendered blob
Raw
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "leveraging_us_boundary_datasets_bq",
"provenance": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "CDNOnOXP4kYb"
},
"source": [
"# Leveraging the US Geo Boundaries Dataset in BQ"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "0VnBzJdE4VCK"
},
"source": [
"This Notebook runs through example queries for how to use BigQuery Public Datasets, specifically the US Geo Boundaries, to join against first party data for more meaningful geospatail analysis"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ACfJ_Y7J4vYQ"
},
"source": [
"First, we need to establish a connection to BigQuery\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "cKxwBhl9490M",
"outputId": "dfeca504-b3d3-4064-cb51-90c60ef68c23"
},
"source": [
"from google.colab import auth\n",
"auth.authenticate_user()\n",
"print('Authenticated')"
],
"execution_count": 1,
"outputs": [
{
"output_type": "stream",
"text": [
"Authenticated\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "xzdeqvmy5GPg"
},
"source": [
"Colab includes the google.colab.data_table package that can be used to display large pandas dataframes as an interactive data table. It can be enabled with:"
]
},
{
"cell_type": "code",
"metadata": {
"id": "bvf8LRK85LLr"
},
"source": [
"%load_ext google.colab.data_table"
],
"execution_count": 2,
"outputs": []
},
{
"source": [
"Now, we can create a testing dataset to save some functions later on in this notebooks"
],
"cell_type": "markdown",
"metadata": {}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
" %%bigquery \n",
" CREATE SCHEMA testing"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "fQwRuzRO5S-a"
},
"source": [
"The first query we run maps the latitude and longitude for each customer, from our sample retail data, to a metropolitan division"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 419
},
"id": "89tGRR9q4j_a",
"outputId": "160bd587-68de-497a-a389-cf93f8adaf75"
},
"source": [
"%%bigquery \n",
"\n",
"SELECT \n",
" cust.id as customer_id, \n",
" metro.name as metro_name \n",
"FROM `looker-private-demo.retail.customers` as cust\n",
",`bigquery-public-data.geo_us_boundaries.metropolitan_divisions` as metro\n",
"WHERE ST_WITHIN(ST_GEOGPOINT(cust.longitude, cust.latitude),metro.metdiv_geom)"
],
"execution_count": 3,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>customer_id</th>\n",
" <th>metro_name</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>53571</td>\n",
" <td>Frederick-Gaithersburg-Rockville, MD</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>4998</td>\n",
" <td>Frederick-Gaithersburg-Rockville, MD</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>75772</td>\n",
" <td>Frederick-Gaithersburg-Rockville, MD</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>52108</td>\n",
" <td>Frederick-Gaithersburg-Rockville, MD</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>84456</td>\n",
" <td>San Rafael, CA</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21975</th>\n",
" <td>55437</td>\n",
" <td>Montgomery County-Bucks County-Chester County, PA</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21976</th>\n",
" <td>40101</td>\n",
" <td>Montgomery County-Bucks County-Chester County, PA</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21977</th>\n",
" <td>75812</td>\n",
" <td>Montgomery County-Bucks County-Chester County, PA</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21978</th>\n",
" <td>1514</td>\n",
" <td>Montgomery County-Bucks County-Chester County, PA</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21979</th>\n",
" <td>68967</td>\n",
" <td>Montgomery County-Bucks County-Chester County, PA</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>21980 rows × 2 columns</p>\n",
"</div>"
],
"text/plain": [
" customer_id metro_name\n",
"0 53571 Frederick-Gaithersburg-Rockville, MD\n",
"1 4998 Frederick-Gaithersburg-Rockville, MD\n",
"2 75772 Frederick-Gaithersburg-Rockville, MD\n",
"3 52108 Frederick-Gaithersburg-Rockville, MD\n",
"4 84456 San Rafael, CA\n",
"... ... ...\n",
"21975 55437 Montgomery County-Bucks County-Chester County, PA\n",
"21976 40101 Montgomery County-Bucks County-Chester County, PA\n",
"21977 75812 Montgomery County-Bucks County-Chester County, PA\n",
"21978 1514 Montgomery County-Bucks County-Chester County, PA\n",
"21979 68967 Montgomery County-Bucks County-Chester County, PA\n",
"\n",
"[21980 rows x 2 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 3
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "MfwXzvGH5mTo"
},
"source": [
"Alternatively, we can map the customers to Designated Market Areas (DMAs) for targeted advertising use cases"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 419
},
"id": "caO11k815gZF",
"outputId": "1a76654e-5275-4d22-d5c6-02fb35adf73f"
},
"source": [
"%%bigquery \n",
"\n",
"SELECT \n",
" cust.id as customer_id, \n",
" dma.dma_name \n",
"FROM `looker-private-demo.retail.customers` as cust\n",
",`bigquery-public-data.geo_us_boundaries.designated_market_area` as dma\n",
"WHERE ST_WITHIN(ST_GEOGPOINT(cust.longitude, cust.latitude),dma.dma_geom)\n"
],
"execution_count": 4,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>customer_id</th>\n",
" <th>dma_name</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>76175</td>\n",
" <td>Buffalo, NY</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>41549</td>\n",
" <td>Evansville, IN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>62511</td>\n",
" <td>Evansville, IN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>21589</td>\n",
" <td>Huntsville-Decatur (Florence), AL</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>76805</td>\n",
" <td>Buffalo, NY</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>68413</th>\n",
" <td>52417</td>\n",
" <td>Harlingen-Weslaco-Brownsville-McAllen, TX</td>\n",
" </tr>\n",
" <tr>\n",
" <th>68414</th>\n",
" <td>20006</td>\n",
" <td>Harlingen-Weslaco-Brownsville-McAllen, TX</td>\n",
" </tr>\n",
" <tr>\n",
" <th>68415</th>\n",
" <td>81512</td>\n",
" <td>Harlingen-Weslaco-Brownsville-McAllen, TX</td>\n",
" </tr>\n",
" <tr>\n",
" <th>68416</th>\n",
" <td>32543</td>\n",
" <td>Harlingen-Weslaco-Brownsville-McAllen, TX</td>\n",
" </tr>\n",
" <tr>\n",
" <th>68417</th>\n",
" <td>23642</td>\n",
" <td>Harlingen-Weslaco-Brownsville-McAllen, TX</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>68418 rows × 2 columns</p>\n",
"</div>"
],
"text/plain": [
" customer_id dma_name\n",
"0 76175 Buffalo, NY\n",
"1 41549 Evansville, IN\n",
"2 62511 Evansville, IN\n",
"3 21589 Huntsville-Decatur (Florence), AL\n",
"4 76805 Buffalo, NY\n",
"... ... ...\n",
"68413 52417 Harlingen-Weslaco-Brownsville-McAllen, TX\n",
"68414 20006 Harlingen-Weslaco-Brownsville-McAllen, TX\n",
"68415 81512 Harlingen-Weslaco-Brownsville-McAllen, TX\n",
"68416 32543 Harlingen-Weslaco-Brownsville-McAllen, TX\n",
"68417 23642 Harlingen-Weslaco-Brownsville-McAllen, TX\n",
"\n",
"[68418 rows x 2 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 4
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "38TtJZch57Pk"
},
"source": [
"We can also map them to zipcode, for example - for filling in missing data so that aggregations show the correct numbers"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 623
},
"id": "fU9w5h4F6COS",
"outputId": "2ec92ea3-f2ec-4d7c-f6c7-52195506aea7"
},
"source": [
"%%bigquery \n",
"\n",
"SELECT \n",
" zip.zip_code, \n",
" count(distinct cust.id) as unique_customers\n",
"FROM `looker-private-demo.retail.customers` as cust\n",
",`bigquery-public-data.geo_us_boundaries.zip_codes` as zip\n",
"WHERE ST_WITHIN(ST_GEOGPOINT(cust.longitude, cust.latitude),zip.zip_code_geom)\n",
"GROUP BY 1"
],
"execution_count": 5,
"outputs": [
{
"output_type": "execute_result",
"data": {
@leighajarett
Copy link
Author

Adding notebook

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment