Analyzing Tesla and GameStop stock using yfinance and web scraping
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": "<p style=\"\">\n <a href=\"https://skills.network/?utm_medium=Exinfluencer&amp;utm_source=Exinfluencer&amp;utm_content=000026UJ&amp;utm_term=10006555&amp;utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkPY0220ENSkillsNetwork900-2022-01-01\" target=\"_blank\">\n <img src=\"https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png\" width=\"200\" alt=\"Skills Network Logo\">\n </a>\n</p>\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "<h1>Extracting and Visualizing Stock Data</h1>\n<h2>Description</h2>\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Extracting essential data from a dataset and displaying it is a necessary part of data science; it allows individuals to make informed decisions based on the data. In this assignment, you will extract some stock data and then display this data in a graph.\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "<h2>Table of Contents</h2>\n<div class=\"alert alert-block alert-info\" style=\"\">\n <ul>\n <li>Define a Function that Makes a Graph</li>\n <li>Question 1: Use yfinance to Extract Stock Data</li>\n <li>Question 2: Use Webscraping to Extract Tesla Revenue Data</li>\n <li>Question 3: Use yfinance to Extract Stock Data</li>\n <li>Question 4: Use Webscraping to Extract GME Revenue Data</li>\n <li>Question 5: Plot Tesla Stock Graph</li>\n <li>Question 6: Plot GameStop Stock Graph</li>\n </ul>\n<p>\n Estimated Time Needed: <strong>30 min</strong></p>\n</div>\n\n<hr>\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "***Note***: If you are working locally using Anaconda, please uncomment the following code and execute it.\nUse the versions appropriate for your Python version.\n"
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"id": "1bf71f97-58bd-4a9e-81e7-ce570ad376b3"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": "Collecting yfinance\n Downloading yfinance-0.2.52-py2.py3-none-any.whl.metadata (5.8 kB)\nCollecting pandas>=1.3.0 (from yfinance)\n Downloading pandas-2.2.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (89 kB)\nCollecting numpy>=1.16.5 (from yfinance)\n Downloading numpy-2.2.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (62 kB)\nRequirement already satisfied: requests>=2.31 in /opt/conda/lib/python3.12/site-packages (from yfinance) (2.32.3)\nCollecting multitasking>=0.0.7 (from yfinance)\n Downloading multitasking-0.0.11-py3-none-any.whl.metadata (5.5 kB)\nCollecting lxml>=4.9.1 (from yfinance)\n Downloading lxml-5.3.0-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (3.8 kB)\nRequirement already satisfied: platformdirs>=2.0.0 in /opt/conda/lib/python3.12/site-packages (from yfinance) (4.3.6)\nRequirement already satisfied: pytz>=2022.5 in /opt/conda/lib/python3.12/site-packages (from yfinance) (2024.2)\nRequirement already satisfied: frozendict>=2.3.4 in /opt/conda/lib/python3.12/site-packages (from yfinance) (2.4.6)\nCollecting peewee>=3.16.2 (from yfinance)\n Downloading peewee-3.17.8.tar.gz (948 kB)\n\u001b[2K \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m948.2/948.2 kB\u001b[0m \u001b[31m44.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n Installing build dependencies ... \u001b[?done\n\u001b[?25h Getting requirements to build wheel ... \u001b[?25ldone\n\u001b[?25h Preparing metadata (pyproject.toml) ... 
\u001b[?25ldone\n\u001b[?25hRequirement already satisfied: beautifulsoup4>=4.11.1 in /opt/conda/lib/python3.12/site-packages (from yfinance) (4.12.3)\nCollecting html5lib>=1.1 (from yfinance)\n Downloading html5lib-1.1-py2.py3-none-any.whl.metadata (16 kB)\nRequirement already satisfied: soupsieve>1.2 in /opt/conda/lib/python3.12/site-packages (from beautifulsoup4>=4.11.1->yfinance) (2.5)\nRequirement already satisfied: six>=1.9 in /opt/conda/lib/python3.12/site-packages (from html5lib>=1.1->yfinance) (1.17.0)\nRequirement already satisfied: webencodings in /opt/conda/lib/python3.12/site-packages (from html5lib>=1.1->yfinance) (0.5.1)\nRequirement already satisfied: python-dateutil>=2.8.2 in /opt/conda/lib/python3.12/site-packages (from pandas>=1.3.0->yfinance) (2.9.0.post0)\nCollecting tzdata>=2022.7 (from pandas>=1.3.0->yfinance)\n Downloading tzdata-2025.1-py2.py3-none-any.whl.metadata (1.4 kB)\nRequirement already satisfied: charset_normalizer<4,>=2 in /opt/conda/lib/python3.12/site-packages (from requests>=2.31->yfinance) (3.4.1)\nRequirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.12/site-packages (from requests>=2.31->yfinance) (3.10)\nRequirement already satisfied: urllib3<3,>=1.21.1 in /opt/conda/lib/python3.12/site-packages (from requests>=2.31->yfinance) (2.3.0)\nRequirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.12/site-packages (from requests>=2.31->yfinance) (2024.12.14)\nDownloading yfinance-0.2.52-py2.py3-none-any.whl (108 kB)\nDownloading html5lib-1.1-py2.py3-none-any.whl (112 kB)\nDownloading lxml-5.3.0-cp312-cp312-manylinux_2_28_x86_64.whl (4.9 MB)\n\u001b[2K \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m4.9/4.9 MB\u001b[0m \u001b[31m129.5 MB/s\u001b[0m eta 
\u001b[36m0:00:00\u001b[0m\n\u001b[?25hDownloading multitasking-0.0.11-py3-none-any.whl (8.5 kB)\nDownloading numpy-2.2.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.1 MB)\n\u001b[2K \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m16.1/16.1 MB\u001b[0m \u001b[31m124.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n\u001b[?25hDownloading pandas-2.2.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.7 MB)\n\u001b[2K \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m12.7/12.7 MB\u001b[0m \u001b[31m125.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n\u001b[?25hDownloading tzdata-2025.1-py2.py3-none-any.whl (346 kB)\nBuilding wheels for collected packages: peewee\n Building wheel for peewee (pyproject.toml) ... 
\u001b[?done\n\u001b[?25h Created wheel for peewee: filename=peewee-3.17.8-cp312-cp312-linux_x86_64.whl size=303769 sha256=716adbdf3e79bb0243f78d62c3b41d2f69ae81cf037b9987904c97377bf57755\n Stored in directory: /home/jupyterlab/.cache/pip/wheels/8f/65/34/456800445efeafb05164fe95285c70e81ba1d96bae30f43917\nSuccessfully built peewee\nInstalling collected packages: peewee, multitasking, tzdata, numpy, lxml, html5lib, pandas, yfinance\nSuccessfully installed html5lib-1.1 lxml-5.3.0 multitasking-0.0.11 numpy-2.2.2 pandas-2.2.3 peewee-3.17.8 tzdata-2025.1 yfinance-0.2.52\nCollecting bs4\n Downloading bs4-0.0.2-py2.py3-none-any.whl.metadata (411 bytes)\nRequirement already satisfied: beautifulsoup4 in /opt/conda/lib/python3.12/site-packages (from bs4) (4.12.3)\nRequirement already satisfied: soupsieve>1.2 in /opt/conda/lib/python3.12/site-packages (from beautifulsoup4->bs4) (2.5)\nDownloading bs4-0.0.2-py2.py3-none-any.whl (1.2 kB)\nInstalling collected packages: bs4\nSuccessfully installed bs4-0.0.2\nRequirement already satisfied: nbformat in /opt/conda/lib/python3.12/site-packages (5.10.4)\nRequirement already satisfied: fastjsonschema>=2.15 in /opt/conda/lib/python3.12/site-packages (from nbformat) (2.21.1)\nRequirement already satisfied: jsonschema>=2.6 in /opt/conda/lib/python3.12/site-packages (from nbformat) (4.23.0)\nRequirement already satisfied: jupyter-core!=5.0.*,>=4.12 in /opt/conda/lib/python3.12/site-packages (from nbformat) (5.7.2)\nRequirement already satisfied: traitlets>=5.1 in /opt/conda/lib/python3.12/site-packages (from nbformat) (5.14.3)\nRequirement already satisfied: attrs>=22.2.0 in /opt/conda/lib/python3.12/site-packages (from jsonschema>=2.6->nbformat) (25.1.0)\nRequirement already satisfied: jsonschema-specifications>=2023.03.6 in /opt/conda/lib/python3.12/site-packages (from jsonschema>=2.6->nbformat) (2024.10.1)\nRequirement already satisfied: referencing>=0.28.4 in /opt/conda/lib/python3.12/site-packages (from jsonschema>=2.6->nbformat) 
(0.36.2)\nRequirement already satisfied: rpds-py>=0.7.1 in /opt/conda/lib/python3.12/site-packages (from jsonschema>=2.6->nbformat) (0.22.3)\nRequirement already satisfied: platformdirs>=2.5 in /opt/conda/lib/python3.12/site-packages (from jupyter-core!=5.0.*,>=4.12->nbformat) (4.3.6)\nRequirement already satisfied: typing-extensions>=4.4.0 in /opt/conda/lib/python3.12/site-packages (from referencing>=0.28.4->jsonschema>=2.6->nbformat) (4.12.2)\n"
}
],
"source": "!pip install yfinance\n!pip install bs4\n!pip install nbformat"
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"id": "2814ec9f-8d4a-44a0-b292-bfaa353fd779"
},
"outputs": [],
"source": "import yfinance as yf\nimport pandas as pd\nimport requests\nfrom bs4 import BeautifulSoup\nimport plotly.graph_objects as go\nfrom plotly.subplots import make_subplots"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "In Python, you can ignore warnings using the warnings module. You can use the filterwarnings function to filter or ignore specific warning messages or categories.\n"
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"id": "7c4c608d-8817-41ae-a239-0c8999971e85"
},
"outputs": [],
"source": "import warnings\n# Suppress FutureWarning messages\nwarnings.filterwarnings(\"ignore\", category=FutureWarning)"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "## Define Graphing Function\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "In this section, we define the function `make_graph`. **You don't have to know how the function works, you should only care about the inputs. It takes a dataframe with stock data (dataframe must contain Date and Close columns), a dataframe with revenue data (dataframe must contain Date and Revenue columns), and the name of the stock.**\n"
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"id": "549b0dc1-cdda-44e3-931f-4937749aa6f2"
},
"outputs": [],
"source": "def make_graph(stock_data, revenue_data, stock):\n fig = make_subplots(rows=2, cols=1, shared_xaxes=True, subplot_titles=(\"Historical Share Price\", \"Historical Revenue\"), vertical_spacing = .3)\n stock_data_specific = stock_data[stock_data.Date <= '2021-06-14']\n revenue_data_specific = revenue_data[revenue_data.Date <= '2021-04-30']\n fig.add_trace(go.Scatter(x=pd.to_datetime(stock_data_specific.Date), y=stock_data_specific.Close.astype(\"float\"), name=\"Share Price\"), row=1, col=1)\n fig.add_trace(go.Scatter(x=pd.to_datetime(revenue_data_specific.Date), y=revenue_data_specific.Revenue.astype(\"float\"), name=\"Revenue\"), row=2, col=1)\n fig.update_xaxes(title_text=\"Date\", row=1, col=1)\n fig.update_xaxes(title_text=\"Date\", row=2, col=1)\n fig.update_yaxes(title_text=\"Price ($US)\", row=1, col=1)\n fig.update_yaxes(title_text=\"Revenue ($US Millions)\", row=2, col=1)\n fig.update_layout(showlegend=False,\n height=900,\n title=stock,\n xaxis_rangeslider_visible=True)\n fig.show()"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Use the make_graph function that we\u2019ve already defined. You\u2019ll need to invoke it in questions 5 and 6 to display the graphs and create the dashboard.\n> **Note: You don\u2019t need to redefine the function for plotting graphs anywhere else in this notebook; just use the existing function.**\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "## Question 1: Use yfinance to Extract Stock Data\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Using the `Ticker` function, enter the ticker symbol of the stock we want to extract data for to create a ticker object. The stock is Tesla and its ticker symbol is `TSLA`.\n"
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"id": "a5a83f58-fc26-4d8c-845a-d679a51e84e0"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": "yfinance.Ticker object <TSLA>\n"
}
],
"source": "tesla = yf.Ticker(\"TSLA\")\nprint(tesla)"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Using the ticker object and the function `history` extract stock information and save it in a dataframe named `tesla_data`. Set the `period` parameter to ` \"max\" ` so we get information for the maximum amount of time.\n"
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"id": "3a3fe378-3552-4f89-9cff-a42f3473d9e7"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": " Open High Low Close \\\nDate \n2010-06-29 00:00:00-04:00 1.266667 1.666667 1.169333 1.592667 \n2010-06-30 00:00:00-04:00 1.719333 2.028000 1.553333 1.588667 \n2010-07-01 00:00:00-04:00 1.666667 1.728000 1.351333 1.464000 \n2010-07-02 00:00:00-04:00 1.533333 1.540000 1.247333 1.280000 \n2010-07-06 00:00:00-04:00 1.333333 1.333333 1.055333 1.074000 \n... ... ... ... ... \n2025-01-27 00:00:00-05:00 394.799988 406.690002 389.000000 397.149994 \n2025-01-28 00:00:00-05:00 396.910004 400.589996 386.500000 398.089996 \n2025-01-29 00:00:00-05:00 395.209991 398.589996 384.480011 389.100006 \n2025-01-30 00:00:00-05:00 410.779999 412.500000 384.410004 400.279999 \n2025-01-31 00:00:00-05:00 401.529999 419.989990 401.339996 404.600006 \n\n Volume Dividends Stock Splits \nDate \n2010-06-29 00:00:00-04:00 281494500 0.0 0.0 \n2010-06-30 00:00:00-04:00 257806500 0.0 0.0 \n2010-07-01 00:00:00-04:00 123282000 0.0 0.0 \n2010-07-02 00:00:00-04:00 77097000 0.0 0.0 \n2010-07-06 00:00:00-04:00 103003500 0.0 0.0 \n... ... ... ... \n2025-01-27 00:00:00-05:00 58125500 0.0 0.0 \n2025-01-28 00:00:00-05:00 48910700 0.0 0.0 \n2025-01-29 00:00:00-05:00 68033600 0.0 0.0 \n2025-01-30 00:00:00-05:00 98092900 0.0 0.0 \n2025-01-31 00:00:00-05:00 83283600 0.0 0.0 \n\n[3672 rows x 7 columns]\n"
}
],
"source": "# Extract historical stock data\ntesla_data = tesla.history(period=\"max\")\nprint(tesla_data)"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "**Reset the index** using the `reset_index(inplace=True)` function on the tesla_data DataFrame and display the first five rows of the `tesla_data` dataframe using the `head` function. Take a screenshot of the results and code from the beginning of Question 1 to the results below.\n"
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"id": "597d4a06-3465-4ac1-8da6-bc116eb8fc41"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": " Date Open High Low Close \\\n0 2010-06-29 00:00:00-04:00 1.266667 1.666667 1.169333 1.592667 \n1 2010-06-30 00:00:00-04:00 1.719333 2.028000 1.553333 1.588667 \n2 2010-07-01 00:00:00-04:00 1.666667 1.728000 1.351333 1.464000 \n3 2010-07-02 00:00:00-04:00 1.533333 1.540000 1.247333 1.280000 \n4 2010-07-06 00:00:00-04:00 1.333333 1.333333 1.055333 1.074000 \n\n Volume Dividends Stock Splits \n0 281494500 0.0 0.0 \n1 257806500 0.0 0.0 \n2 123282000 0.0 0.0 \n3 77097000 0.0 0.0 \n4 103003500 0.0 0.0 \n"
}
],
"source": "# Reset index\ntesla_data.reset_index(inplace=True)\n\n# Display first five rows\nprint(tesla_data.head())\n"
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"id": "ea5fdf46-9b93-4b75-ba81-33630684987f"
},
"outputs": [
{
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Date</th>\n <th>Open</th>\n <th>High</th>\n <th>Low</th>\n <th>Close</th>\n <th>Volume</th>\n <th>Dividends</th>\n <th>Stock Splits</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>2010-06-29 00:00:00-04:00</td>\n <td>1.266667</td>\n <td>1.666667</td>\n <td>1.169333</td>\n <td>1.592667</td>\n <td>281494500</td>\n <td>0.0</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2010-06-30 00:00:00-04:00</td>\n <td>1.719333</td>\n <td>2.028000</td>\n <td>1.553333</td>\n <td>1.588667</td>\n <td>257806500</td>\n <td>0.0</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>2010-07-01 00:00:00-04:00</td>\n <td>1.666667</td>\n <td>1.728000</td>\n <td>1.351333</td>\n <td>1.464000</td>\n <td>123282000</td>\n <td>0.0</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>2010-07-02 00:00:00-04:00</td>\n <td>1.533333</td>\n <td>1.540000</td>\n <td>1.247333</td>\n <td>1.280000</td>\n <td>77097000</td>\n <td>0.0</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>4</th>\n <td>2010-07-06 00:00:00-04:00</td>\n <td>1.333333</td>\n <td>1.333333</td>\n <td>1.055333</td>\n <td>1.074000</td>\n <td>103003500</td>\n <td>0.0</td>\n <td>0.0</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " Date Open High Low Close \\\n0 2010-06-29 00:00:00-04:00 1.266667 1.666667 1.169333 1.592667 \n1 2010-06-30 00:00:00-04:00 1.719333 2.028000 1.553333 1.588667 \n2 2010-07-01 00:00:00-04:00 1.666667 1.728000 1.351333 1.464000 \n3 2010-07-02 00:00:00-04:00 1.533333 1.540000 1.247333 1.280000 \n4 2010-07-06 00:00:00-04:00 1.333333 1.333333 1.055333 1.074000 \n\n Volume Dividends Stock Splits \n0 281494500 0.0 0.0 \n1 257806500 0.0 0.0 \n2 123282000 0.0 0.0 \n3 77097000 0.0 0.0 \n4 103003500 0.0 0.0 "
},
"metadata": {},
"output_type": "display_data"
}
],
"source": "from IPython.display import display\ndisplay(tesla_data.head())\n"
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "f0e1c2ec-707a-4f81-a2ea-da39d0dba802"
},
"outputs": [],
"source": ""
},
{
"cell_type": "markdown",
"metadata": {},
"source": "## Question 2: Use Webscraping to Extract Tesla Revenue Data\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Use the `requests` library to download the webpage https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/revenue.htm. Save the text of the response as a variable named `html_data`.\n"
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"id": "ce025a42-bf33-4a4d-9d8d-3a550025a0c6"
},
"outputs": [],
"source": "# Import the required libraries\nimport pandas as pd\nimport requests\nfrom bs4 import BeautifulSoup\n\n# Step 1: Download the webpage using requests\nurl = \"https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/revenue.htm\"\nhtml_data = requests.get(url).text\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Parse the html data using `BeautifulSoup` with a parser such as `html5lib` or `html.parser`.\n"
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"id": "21f804bd-b1c5-4031-b10d-2d0dc3d50f54"
},
"outputs": [],
"source": "# Step 2: Parse the html data using BeautifulSoup\nsoup = BeautifulSoup(html_data, \"html.parser\")"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Using `BeautifulSoup` or the `read_html` function extract the table with `Tesla Revenue` and store it into a dataframe named `tesla_revenue`. The dataframe should have columns `Date` and `Revenue`.\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "<details><summary>Step-by-step instructions</summary>\n\n```\n\nHere are the step-by-step instructions:\n\n1. Create an Empty DataFrame\n2. Find the Relevant Table\n3. Check for the Tesla Quarterly Revenue Table\n4. Iterate Through Rows in the Table Body\n5. Extract Data from Columns\n6. Append Data to the DataFrame\n\n```\n</details>\n"
},
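The solution cell in this notebook extracts the table with `read_html`; the manual loop described in the steps above can also be sketched on its own. A minimal illustration, using a tiny hand-written table in place of the downloaded page (with the real page you would loop over `soup.find_all("tbody")[1]`):

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded revenue page (illustrative values only)
sample_html = """
<table>
  <tbody>
    <tr><td>2021-03-31</td><td>$10,389</td></tr>
    <tr><td>2020-12-31</td><td>$10,744</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")
rows = []
# Iterate through the rows in the table body, extracting both columns
for row in soup.find("tbody").find_all("tr"):
    cols = row.find_all("td")
    rows.append({"Date": cols[0].text.strip(), "Revenue": cols[1].text.strip()})

tesla_revenue = pd.DataFrame(rows)
# Clean the Revenue column the same way the notebook does
tesla_revenue["Revenue"] = tesla_revenue["Revenue"].str.replace(r",|\$", "", regex=True)
print(tesla_revenue)
```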
{
"cell_type": "markdown",
"metadata": {},
"source": "<details><summary>Click here if you need help locating the table</summary>\n\n```\n \nBelow is the code to isolate the table, you will now need to loop through the rows and columns like in the previous lab\n \nsoup.find_all(\"tbody\")[1]\n \nIf you want to use the read_html function the table is located at index 1\n\nWe are focusing on quarterly revenue in the lab.\n```\n\n</details>\n"
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"id": "15263934-8f52-4bb1-8e1a-b81a0e88b623"
},
"outputs": [],
"source": "# Step 3: Extract the Tesla Revenue Table\ntables = soup.find_all(\"table\")\ntesla_revenue = pd.read_html(str(tables[1]))[0] # The required table is at index 1\n\n# Step 4: Rename columns to 'Date' and 'Revenue'\ntesla_revenue.columns = [\"Date\", \"Revenue\"]"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Execute the following line to remove the comma and dollar sign from the `Revenue` column. \n"
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"id": "0eb97880-9d34-4a8c-a220-c034bb9b991a"
},
"outputs": [],
"source": "# tesla_revenue[\"Revenue\"] = tesla_revenue['Revenue'].str.replace(',|\\$',\"\")\n\n# Step 5: Clean the Revenue column\ntesla_revenue[\"Revenue\"] = tesla_revenue[\"Revenue\"].str.replace(\",|\\$\", \"\", regex=True)\n"
},
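The cleaning pattern used above can be checked on its own with made-up sample strings (these values are illustrative, not data from the page):

```python
import pandas as pd

# Hypothetical revenue strings in the raw scraped format
s = pd.Series(["$53,823", "$31,536", ""])
# Strip commas and dollar signs, leaving bare digits (or an empty string)
cleaned = s.str.replace(r",|\$", "", regex=True)
print(cleaned.tolist())  # -> ['53823', '31536', '']
```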
{
"cell_type": "markdown",
"metadata": {},
"source": "Execute the following lines to remove any null values or empty strings from the `Revenue` column.\n"
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"id": "c13e1d8a-4e7d-460f-959f-7a04ba578b0a"
},
"outputs": [],
"source": "# tesla_revenue.dropna(inplace=True)\n\n# tesla_revenue = tesla_revenue[tesla_revenue['Revenue'] != \"\"]\n# Step 6: Remove null or empty strings in the Revenue column\ntesla_revenue.dropna(inplace=True)\ntesla_revenue = tesla_revenue[tesla_revenue[\"Revenue\"] != \"\"]"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Display the last five rows of the `tesla_revenue` dataframe using the `tail` function. Take a screenshot of the results.\n"
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"id": "1e4aa44b-4dc1-4ee0-9a10-eb480fbba6c9"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": " Date Revenue\n48 2010-09-30 31\n49 2010-06-30 28\n50 2010-03-31 21\n52 2009-09-30 46\n53 2009-06-30 27\n"
}
],
"source": "# Display the last 5 rows\nprint(tesla_revenue.tail())\n"
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "4aa68120-a804-4f79-9af8-e38fc47ccfd6"
},
"outputs": [],
"source": ""
},
{
"cell_type": "markdown",
"metadata": {},
"source": "## Question 3: Use yfinance to Extract Stock Data\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Using the `Ticker` function, enter the ticker symbol of the stock we want to extract data for to create a ticker object. The stock is GameStop and its ticker symbol is `GME`.\n"
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"id": "e3effc81-9c5c-4f54-89e1-9f36f721df31"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": "yfinance.Ticker object <GME>\n"
}
],
"source": "# Question 3: Use yfinance to Extract GameStop Stock Data\ngme = yf.Ticker(\"GME\")\nprint(gme)"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Using the ticker object and the function `history` extract stock information and save it in a dataframe named `gme_data`. Set the `period` parameter to ` \"max\" ` so we get information for the maximum amount of time.\n"
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"id": "7396daa6-6a2d-484d-9dbb-8e3b9438d212"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": " Open High Low Close \\\nDate \n2002-02-13 00:00:00-05:00 1.620129 1.693350 1.603296 1.691667 \n2002-02-14 00:00:00-05:00 1.712707 1.716073 1.670625 1.683250 \n2002-02-15 00:00:00-05:00 1.683251 1.687459 1.658002 1.674835 \n2002-02-19 00:00:00-05:00 1.666417 1.666417 1.578047 1.607504 \n2002-02-20 00:00:00-05:00 1.615920 1.662210 1.603296 1.662210 \n... ... ... ... ... \n2025-01-27 00:00:00-05:00 26.920000 27.680000 26.799999 26.969999 \n2025-01-28 00:00:00-05:00 27.000000 27.590000 26.650000 27.459999 \n2025-01-29 00:00:00-05:00 27.410000 27.740000 27.059999 27.510000 \n2025-01-30 00:00:00-05:00 27.840000 28.230000 27.709999 27.990000 \n2025-01-31 00:00:00-05:00 27.790001 28.180000 26.900000 26.900000 \n\n Volume Dividends Stock Splits \nDate \n2002-02-13 00:00:00-05:00 76216000 0.0 0.0 \n2002-02-14 00:00:00-05:00 11021600 0.0 0.0 \n2002-02-15 00:00:00-05:00 8389600 0.0 0.0 \n2002-02-19 00:00:00-05:00 7410400 0.0 0.0 \n2002-02-20 00:00:00-05:00 6892800 0.0 0.0 \n... ... ... ... \n2025-01-27 00:00:00-05:00 5060300 0.0 0.0 \n2025-01-28 00:00:00-05:00 3169900 0.0 0.0 \n2025-01-29 00:00:00-05:00 3220400 0.0 0.0 \n2025-01-30 00:00:00-05:00 3343900 0.0 0.0 \n2025-01-31 00:00:00-05:00 4472100 0.0 0.0 \n\n[5780 rows x 7 columns]\n"
}
],
"source": "# Extract historical stock data\ngme_data = gme.history(period=\"max\")\nprint(gme_data)"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "**Reset the index** using the `reset_index(inplace=True)` function on the gme_data DataFrame and display the first five rows of the `gme_data` dataframe using the `head` function. Take a screenshot of the results and code from the beginning of Question 3 to the results below.\n"
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"id": "b021442a-e4c8-4e2d-b798-8e8e799a2171"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": " Date Open High Low Close Volume \\\n0 2002-02-13 00:00:00-05:00 1.620129 1.693350 1.603296 1.691667 76216000 \n1 2002-02-14 00:00:00-05:00 1.712707 1.716073 1.670625 1.683250 11021600 \n2 2002-02-15 00:00:00-05:00 1.683251 1.687459 1.658002 1.674835 8389600 \n3 2002-02-19 00:00:00-05:00 1.666417 1.666417 1.578047 1.607504 7410400 \n4 2002-02-20 00:00:00-05:00 1.615920 1.662210 1.603296 1.662210 6892800 \n\n Dividends Stock Splits \n0 0.0 0.0 \n1 0.0 0.0 \n2 0.0 0.0 \n3 0.0 0.0 \n4 0.0 0.0 \n"
}
],
"source": "# Reset index\ngme_data.reset_index(inplace=True)\n\n# Display first five rows of GameStop Stock Data\nprint(gme_data.head())"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "## Question 4: Use Webscraping to Extract GME Revenue Data\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Use the `requests` library to download the webpage https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/stock.html. Save the text of the response as a variable named `html_data_2`.\n"
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"id": "2e745adb-09de-4966-a7d2-d42b947030b6"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": "<html>\n<head>\n\n</head>\n<body>\n<div style=\"margin: 50px auto; width: 50%; border: 1px solid #dfdfdf; padding: 20px 50px 30px 50px; font-family:helvetica;\">\n\n<h1>We do not allow automated access to our servers.</h1>\n<h2><p>Automated access to our data is prohibited by our data provider.</p> \n<p>If you are a user attempting to access the site via a browser, please follow this process to regain access:</p>\n\n<ul>\n <li>Go to <a href='https://whatismyipaddress.com/' target='_blank' rel='noopener noreferrer'>whatismyipaddress</a> and obtain your IPv4 address</li>\n <li>Email us your IPv4 address at <a href=\"/cdn-cgi/l/email-protection\" class=\"__cf_email__\" data-cfemail=\"4821262e270825292b3a273c3a2d262c3b66262d3c\">[email&#160;protected]</a></li>\n <li>We will add you to our whitelist within 24 hours</li>\n</ul>\n</h2>\n</div>\n<script data-cfasync=\"false\">!function(){\"use strict\";function e(e){try{if(\"undefined\"==typeof console)return;\"error\"in console?console.error(e):console.log(e)}catch(e){}}function t(e,t){var r=e.substr(t,2);return parseInt(r,16)}function r(r,n){for(var c=\"\",o=t(r,n),a=n+2;a<r.length;a+=2){var l=t(r,a)^o;c+=String.fromCharCode(l)}try{c=decodeURIComponent(escape(c))}catch(t){e(t)}return function(e){return i.innerHTML='<a href=\"'+e.replace(/\"/g,\"&quot;\")+'\"></a>',i.childNodes[0].getAttribute(\"href\")||\"\"}(c)}function n(t){try{(function(t){for(var n=t.querySelectorAll(\"a\"),o=0;o<n.length;o++)try{var a=n[o],i=a.href.indexOf(c);i>-1&&(a.href=\"mailto:\"+r(a.href,i+c.length))}catch(t){e(t)}})(t),function(t){for(var n=t.querySelectorAll(o),c=0;c<n.length;c++)try{var i=n[c],l=i.parentNode,u=i.getAttribute(a);if(u){var f=r(u,0),d=document.createTextNode(f);l.replaceChild(d,i)}}catch(t){e(t)}}(t),function(t){for(var r=t.querySelectorAll(\"template\"),c=0;c<r.length;c++)try{n(r[c].content)}catch(t){e(t)}}(t)}catch(t){e(t)}}var 
c=\"/cdn-cgi/l/email-protection#\",o=\".__cf_email__\",a=\"data-cfemail\",i=document.createElement(\"div\");n(document),function(){var e=document.currentScript||document.scripts[document.scripts.length-1];e.parentNode.removeChild(e)}()}();</script><script>(function(){function c(){var b=a.contentDocument||a.contentWindow.document;if(b){var d=b.createElement('script');d.innerHTML=\"window.__CF$cv$params={r:'90b748f90ffa5a39',t:'MTczODQ2ODkzOS4wMDAwMDA='};var a=document.createElement('script');a.nonce='';a.src='/cdn-cgi/challenge-platform/scripts/jsd/main.js';document.getElementsByTagName('head')[0].appendChild(a);\";b.getElementsByTagName('head')[0].appendChild(d)}}if(document.body){var a=document.createElement('iframe');a.height=1;a.width=1;a.style.position='absolute';a.style.top=0;a.style.left=0;a.style.border='none';a.style.visibility='hidden';document.body.appendChild(a);if('loading'!==document.readyState)c();else if(window.addEventListener)document.addEventListener('DOMContentLoaded',c);else{var e=document.onreadystatechange||function(){};document.onreadystatechange=function(b){e(b);'loading'!==document.readyState&&(document.onreadystatechange=e,c())}}}})();</script></body>\n</html>\n"
}
],
"source": "# Question 4: Use Webscraping to Extract GameStop Revenue Data\n# Note: this request targets macrotrends rather than the lab-provided stock.html URL,\n# and the server blocks automated access (see the output below)\nurl_gme = \"https://www.macrotrends.net/stocks/charts/GME/gamestop/revenue\"\nhtml_data_2 = requests.get(url_gme).text\nprint(html_data_2)"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Parse the html data using `BeautifulSoup` with a parser such as `html5lib` or `html.parser`.\n"
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"id": "c9e6f0ec-bf7e-481a-ad13-32871f9bc196"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": "[]\n"
}
],
"source": "soup_gme = BeautifulSoup(html_data_2, \"html.parser\")\ntables_gme = soup_gme.find_all(\"table\")\nprint(tables_gme)"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Using `BeautifulSoup` or the `read_html` function extract the table with `GameStop Revenue` and store it into a dataframe named `gme_revenue`. The dataframe should have columns `Date` and `Revenue`. Make sure the comma and dollar sign is removed from the `Revenue` column.\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "> **Note: Use a method similar to what you did in question 2.**\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "<details><summary>Click here if you need help locating the table</summary>\n\n```\n \nBelow is the code to isolate the table, you will now need to loop through the rows and columns like in the previous lab\n \nsoup.find_all(\"tbody\")[1]\n \nIf you want to use the read_html function the table is located at index 1\n\n\n```\n\n</details>\n"
},
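   {
    "cell_type": "markdown",
    "metadata": {},
    "source": "The comma/dollar-sign cleaning the assignment asks for can be sketched on a couple of hypothetical rows (the values below are assumptions, used so the example runs without the live page):\n\n```python\nimport pandas as pd\n\n# Hypothetical rows standing in for the scraped macrotrends table\ngme_revenue = pd.DataFrame({\n    \"Date\": [\"2021-04-30\", \"2021-01-31\"],\n    \"Revenue\": [\"$1,277\", \"$2,122\"],\n})\n\n# Remove the dollar sign and comma, then convert to numbers\ngme_revenue[\"Revenue\"] = gme_revenue[\"Revenue\"].str.replace(\"$\", \"\", regex=False).str.replace(\",\", \"\", regex=False)\ngme_revenue[\"Revenue\"] = pd.to_numeric(gme_revenue[\"Revenue\"])\nprint(gme_revenue)\n```\n\nThe same `str.replace` calls apply unchanged to the real scraped table once it is in a dataframe.\n"
   },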
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"id": "8f859585-849b-48a6-86eb-1d1b62d25b45"
},
"outputs": [],
"source": "import yfinance as yf\n\ngme = yf.Ticker(\"GME\")\ngme_financials = gme.financials # Get financial data including revenue\n\n# Extract only revenue\ngme_revenue = gme_financials.loc[\"Total Revenue\"]\ngme_revenue = gme_revenue.reset_index()\ngme_revenue.columns = [\"Date\", \"Revenue\"]\n\n# Convert date format\ngme_revenue[\"Date\"] = pd.to_datetime(gme_revenue[\"Date\"])\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Display the last five rows of the `gme_revenue` dataframe using the `tail` function. Take a screenshot of the results.\n"
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"id": "3c0b6f50-b6a2-4ff2-837f-73a20f9d520d"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": " Date Revenue\n0 2024-01-31 5272800000.0\n1 2023-01-31 5927200000.0\n2 2022-01-31 6010700000.0\n3 2021-01-31 5089800000.0\n"
}
],
"source": "# Display last five rows\n# tools.display_dataframe_to_user(name=\"GameStop Revenue Data\", dataframe=gme_revenue.tail())\n\n# Display the last 5 rows\nprint(gme_revenue.tail())\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "## Question 5: Plot Tesla Stock Graph\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Use the `make_graph` function to graph the Tesla Stock Data, also provide a title for the graph. Note the graph will only show data upto June 2021.\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "<details><summary>Hint</summary>\n\n```\n\nYou just need to invoke the make_graph function with the required parameter to print the graphs.The structure to call the `make_graph` function is `make_graph(tesla_data, tesla_revenue, 'Tesla')`.\n\n```\n \n</details>\n"
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"id": "7fc1be83-8ca1-4346-bbc9-8d437984ded2"
},
"outputs": [],
"source": "import yfinance as yf\nimport pandas as pd\nimport plotly.graph_objects as go\nfrom plotly.subplots import make_subplots"
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"id": "996d567e-c53a-4525-8b21-ed45105b8a7a"
},
"outputs": [],
"source": "def make_graph(stock_data, revenue_data, stock_name):\n fig = make_subplots(rows=2, cols=1, shared_xaxes=True, subplot_titles=(\"Historical Share Price\", \"Historical Revenue\"), vertical_spacing=0.3)\n\n # Filter stock data up to June 2021\n stock_data_filtered = stock_data[stock_data[\"Date\"] <= \"2021-06-14\"]\n \n # Filter revenue data up to April 2021\n revenue_data_filtered = revenue_data[revenue_data[\"Date\"] <= \"2021-04-30\"]\n \n # Plot stock price\n fig.add_trace(go.Scatter(\n x=stock_data_filtered[\"Date\"],\n y=stock_data_filtered[\"Close\"],\n name=\"Share Price\"\n ), row=1, col=1)\n\n # Plot revenue\n fig.add_trace(go.Scatter(\n x=revenue_data_filtered[\"Date\"],\n y=revenue_data_filtered[\"Revenue\"],\n name=\"Revenue\"\n ), row=2, col=1)\n\n # Set axis labels\n fig.update_xaxes(title_text=\"Date\", row=1, col=1)\n fig.update_xaxes(title_text=\"Date\", row=2, col=1)\n fig.update_yaxes(title_text=\"Price ($US)\", row=1, col=1)\n fig.update_yaxes(title_text=\"Revenue ($US Millions)\", row=2, col=1)\n\n # Set layout properties\n fig.update_layout(\n showlegend=False,\n height=900,\n title=stock_name,\n xaxis_rangeslider_visible=True\n )\n \n fig.show()"
},
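   {
    "cell_type": "markdown",
    "metadata": {},
    "source": "Inside `make_graph`, the June-2021 cutoff relies on pandas comparing a datetime column against a plain date string. A minimal illustration on toy data (the rows below are assumptions, not Tesla prices):\n\n```python\nimport pandas as pd\n\n# Toy data: one row before the cutoff, one after\ndf = pd.DataFrame({\n    \"Date\": pd.to_datetime([\"2021-05-01\", \"2021-07-01\"]),\n    \"Close\": [600.0, 650.0],\n})\n\n# Same filter pattern as make_graph uses\nfiltered = df[df[\"Date\"] <= \"2021-06-14\"]\nprint(len(filtered))\n```\n"
   },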
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"id": "6d5c75d0-59fd-4568-b9d5-a400d32ac9eb"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": " Date Open High Low Close \\\n0 2010-06-29 00:00:00-04:00 1.266667 1.666667 1.169333 1.592667 \n1 2010-06-30 00:00:00-04:00 1.719333 2.028000 1.553333 1.588667 \n2 2010-07-01 00:00:00-04:00 1.666667 1.728000 1.351333 1.464000 \n3 2010-07-02 00:00:00-04:00 1.533333 1.540000 1.247333 1.280000 \n4 2010-07-06 00:00:00-04:00 1.333333 1.333333 1.055333 1.074000 \n\n Volume Dividends Stock Splits \n0 281494500 0.0 0.0 \n1 257806500 0.0 0.0 \n2 123282000 0.0 0.0 \n3 77097000 0.0 0.0 \n4 103003500 0.0 0.0 \n"
}
],
"source": "# Fetch Tesla stock data using Yahoo Finance API\ntesla = yf.Ticker(\"TSLA\")\ntesla_data = tesla.history(period=\"max\")\n\n# Reset index for better visualization\ntesla_data.reset_index(inplace=True)\n\n# Display first five rows\nprint(tesla_data.head())\n# import ace_tools as tools\n# tools.display_dataframe_to_user(name=\"Tesla Stock Data\", dataframe=tesla_data.head())"
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"id": "c2a9dce0-cc81-497e-932f-8d1df60e20ca"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": " Date Revenue\n0 2024-12-31 97690000000.0\n1 2023-12-31 96773000000.0\n2 2022-12-31 81462000000.0\n3 2021-12-31 53823000000.0\n4 2020-12-31 NaN\n"
}
],
"source": "tesla_revenue = tesla.financials.loc[\"Total Revenue\"].reset_index()\ntesla_revenue.columns = [\"Date\", \"Revenue\"]\ntesla_revenue[\"Date\"] = pd.to_datetime(tesla_revenue[\"Date\"])\n\n# Display Tesla revenue data\nprint(tesla_revenue.tail())\n# tools.display_dataframe_to_user(name=\"Tesla Revenue Data\", dataframe=tesla_revenue.tail())\n"
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"id": "5c1af60a-3f1f-4731-a57e-1365d1506591"
},
"outputs": [
{
"data": {