Analyzing Tesla and GameStop stock using yfinance and web scraping
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": "<p style=\"\">\n <a href=\"https://skills.network/?utm_medium=Exinfluencer&amp;utm_source=Exinfluencer&amp;utm_content=000026UJ&amp;utm_term=10006555&amp;utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkPY0220ENSkillsNetwork900-2022-01-01\" target=\"_blank\">\n <img src=\"https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png\" width=\"200\" alt=\"Skills Network Logo\">\n </a>\n</p>\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "<h1>Extracting and Visualizing Stock Data</h1>\n<h2>Description</h2>\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Extracting essential data from a dataset and displaying it is a necessary part of data science; it allows individuals to make informed decisions based on the data. In this assignment, you will extract some stock data and then display this data in a graph.\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "<h2>Table of Contents</h2>\n<div class=\"alert alert-block alert-info\" style=\"\">\n <ul>\n <li>Define a Function that Makes a Graph</li>\n <li>Question 1: Use yfinance to Extract Stock Data</li>\n <li>Question 2: Use Webscraping to Extract Tesla Revenue Data</li>\n <li>Question 3: Use yfinance to Extract Stock Data</li>\n <li>Question 4: Use Webscraping to Extract GME Revenue Data</li>\n <li>Question 5: Plot Tesla Stock Graph</li>\n <li>Question 6: Plot GameStop Stock Graph</li>\n </ul>\n<p>\n Estimated Time Needed: <strong>30 min</strong></p>\n</div>\n\n<hr>\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "***Note***: If you are working locally using Anaconda, please uncomment the following code and execute it.\nUse the versions appropriate for your Python version.\n"
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"id": "1bf71f97-58bd-4a9e-81e7-ce570ad376b3"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": "Collecting yfinance\n Downloading yfinance-0.2.52-py2.py3-none-any.whl.metadata (5.8 kB)\nCollecting pandas>=1.3.0 (from yfinance)\n Downloading pandas-2.2.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (89 kB)\nCollecting numpy>=1.16.5 (from yfinance)\n Downloading numpy-2.2.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (62 kB)\nRequirement already satisfied: requests>=2.31 in /opt/conda/lib/python3.12/site-packages (from yfinance) (2.32.3)\nCollecting multitasking>=0.0.7 (from yfinance)\n Downloading multitasking-0.0.11-py3-none-any.whl.metadata (5.5 kB)\nCollecting lxml>=4.9.1 (from yfinance)\n Downloading lxml-5.3.0-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (3.8 kB)\nRequirement already satisfied: platformdirs>=2.0.0 in /opt/conda/lib/python3.12/site-packages (from yfinance) (4.3.6)\nRequirement already satisfied: pytz>=2022.5 in /opt/conda/lib/python3.12/site-packages (from yfinance) (2024.2)\nRequirement already satisfied: frozendict>=2.3.4 in /opt/conda/lib/python3.12/site-packages (from yfinance) (2.4.6)\nCollecting peewee>=3.16.2 (from yfinance)\n Downloading peewee-3.17.8.tar.gz (948 kB)\n\u001b[2K \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m948.2/948.2 kB\u001b[0m \u001b[31m44.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n Installing build dependencies ... \u001b[?done\n\u001b[?25h Getting requirements to build wheel ... \u001b[?25ldone\n\u001b[?25h Preparing metadata (pyproject.toml) ... 
\u001b[?25ldone\n\u001b[?25hRequirement already satisfied: beautifulsoup4>=4.11.1 in /opt/conda/lib/python3.12/site-packages (from yfinance) (4.12.3)\nCollecting html5lib>=1.1 (from yfinance)\n Downloading html5lib-1.1-py2.py3-none-any.whl.metadata (16 kB)\nRequirement already satisfied: soupsieve>1.2 in /opt/conda/lib/python3.12/site-packages (from beautifulsoup4>=4.11.1->yfinance) (2.5)\nRequirement already satisfied: six>=1.9 in /opt/conda/lib/python3.12/site-packages (from html5lib>=1.1->yfinance) (1.17.0)\nRequirement already satisfied: webencodings in /opt/conda/lib/python3.12/site-packages (from html5lib>=1.1->yfinance) (0.5.1)\nRequirement already satisfied: python-dateutil>=2.8.2 in /opt/conda/lib/python3.12/site-packages (from pandas>=1.3.0->yfinance) (2.9.0.post0)\nCollecting tzdata>=2022.7 (from pandas>=1.3.0->yfinance)\n Downloading tzdata-2025.1-py2.py3-none-any.whl.metadata (1.4 kB)\nRequirement already satisfied: charset_normalizer<4,>=2 in /opt/conda/lib/python3.12/site-packages (from requests>=2.31->yfinance) (3.4.1)\nRequirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.12/site-packages (from requests>=2.31->yfinance) (3.10)\nRequirement already satisfied: urllib3<3,>=1.21.1 in /opt/conda/lib/python3.12/site-packages (from requests>=2.31->yfinance) (2.3.0)\nRequirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.12/site-packages (from requests>=2.31->yfinance) (2024.12.14)\nDownloading yfinance-0.2.52-py2.py3-none-any.whl (108 kB)\nDownloading html5lib-1.1-py2.py3-none-any.whl (112 kB)\nDownloading lxml-5.3.0-cp312-cp312-manylinux_2_28_x86_64.whl (4.9 MB)\n\u001b[2K \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m4.9/4.9 MB\u001b[0m \u001b[31m129.5 MB/s\u001b[0m eta 
\u001b[36m0:00:00\u001b[0m\n\u001b[?25hDownloading multitasking-0.0.11-py3-none-any.whl (8.5 kB)\nDownloading numpy-2.2.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.1 MB)\n\u001b[2K \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m16.1/16.1 MB\u001b[0m \u001b[31m124.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n\u001b[?25hDownloading pandas-2.2.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.7 MB)\n\u001b[2K \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m12.7/12.7 MB\u001b[0m \u001b[31m125.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n\u001b[?25hDownloading tzdata-2025.1-py2.py3-none-any.whl (346 kB)\nBuilding wheels for collected packages: peewee\n Building wheel for peewee (pyproject.toml) ... 
\u001b[?done\n\u001b[?25h Created wheel for peewee: filename=peewee-3.17.8-cp312-cp312-linux_x86_64.whl size=303769 sha256=716adbdf3e79bb0243f78d62c3b41d2f69ae81cf037b9987904c97377bf57755\n Stored in directory: /home/jupyterlab/.cache/pip/wheels/8f/65/34/456800445efeafb05164fe95285c70e81ba1d96bae30f43917\nSuccessfully built peewee\nInstalling collected packages: peewee, multitasking, tzdata, numpy, lxml, html5lib, pandas, yfinance\nSuccessfully installed html5lib-1.1 lxml-5.3.0 multitasking-0.0.11 numpy-2.2.2 pandas-2.2.3 peewee-3.17.8 tzdata-2025.1 yfinance-0.2.52\nCollecting bs4\n Downloading bs4-0.0.2-py2.py3-none-any.whl.metadata (411 bytes)\nRequirement already satisfied: beautifulsoup4 in /opt/conda/lib/python3.12/site-packages (from bs4) (4.12.3)\nRequirement already satisfied: soupsieve>1.2 in /opt/conda/lib/python3.12/site-packages (from beautifulsoup4->bs4) (2.5)\nDownloading bs4-0.0.2-py2.py3-none-any.whl (1.2 kB)\nInstalling collected packages: bs4\nSuccessfully installed bs4-0.0.2\nRequirement already satisfied: nbformat in /opt/conda/lib/python3.12/site-packages (5.10.4)\nRequirement already satisfied: fastjsonschema>=2.15 in /opt/conda/lib/python3.12/site-packages (from nbformat) (2.21.1)\nRequirement already satisfied: jsonschema>=2.6 in /opt/conda/lib/python3.12/site-packages (from nbformat) (4.23.0)\nRequirement already satisfied: jupyter-core!=5.0.*,>=4.12 in /opt/conda/lib/python3.12/site-packages (from nbformat) (5.7.2)\nRequirement already satisfied: traitlets>=5.1 in /opt/conda/lib/python3.12/site-packages (from nbformat) (5.14.3)\nRequirement already satisfied: attrs>=22.2.0 in /opt/conda/lib/python3.12/site-packages (from jsonschema>=2.6->nbformat) (25.1.0)\nRequirement already satisfied: jsonschema-specifications>=2023.03.6 in /opt/conda/lib/python3.12/site-packages (from jsonschema>=2.6->nbformat) (2024.10.1)\nRequirement already satisfied: referencing>=0.28.4 in /opt/conda/lib/python3.12/site-packages (from jsonschema>=2.6->nbformat) 
(0.36.2)\nRequirement already satisfied: rpds-py>=0.7.1 in /opt/conda/lib/python3.12/site-packages (from jsonschema>=2.6->nbformat) (0.22.3)\nRequirement already satisfied: platformdirs>=2.5 in /opt/conda/lib/python3.12/site-packages (from jupyter-core!=5.0.*,>=4.12->nbformat) (4.3.6)\nRequirement already satisfied: typing-extensions>=4.4.0 in /opt/conda/lib/python3.12/site-packages (from referencing>=0.28.4->jsonschema>=2.6->nbformat) (4.12.2)\n"
}
],
"source": "!pip install yfinance\n!pip install bs4\n!pip install nbformat"
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"id": "2814ec9f-8d4a-44a0-b292-bfaa353fd779"
},
"outputs": [],
"source": "import yfinance as yf\nimport pandas as pd\nimport requests\nfrom bs4 import BeautifulSoup\nimport plotly.graph_objects as go\nfrom plotly.subplots import make_subplots"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "In Python, you can ignore warnings using the warnings module. You can use the filterwarnings function to filter or ignore specific warning messages or categories.\n"
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"id": "7c4c608d-8817-41ae-a239-0c8999971e85"
},
"outputs": [],
"source": "import warnings\n# Suppress FutureWarning messages\nwarnings.filterwarnings(\"ignore\", category=FutureWarning)"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "## Define Graphing Function\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "In this section, we define the function `make_graph`. **You don't have to know how the function works, you should only care about the inputs. It takes a dataframe with stock data (dataframe must contain Date and Close columns), a dataframe with revenue data (dataframe must contain Date and Revenue columns), and the name of the stock.**\n"
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"id": "549b0dc1-cdda-44e3-931f-4937749aa6f2"
},
"outputs": [],
"source": "def make_graph(stock_data, revenue_data, stock):\n fig = make_subplots(rows=2, cols=1, shared_xaxes=True, subplot_titles=(\"Historical Share Price\", \"Historical Revenue\"), vertical_spacing = .3)\n stock_data_specific = stock_data[stock_data.Date <= '2021-06-14']\n revenue_data_specific = revenue_data[revenue_data.Date <= '2021-04-30']\n fig.add_trace(go.Scatter(x=pd.to_datetime(stock_data_specific.Date), y=stock_data_specific.Close.astype(\"float\"), name=\"Share Price\"), row=1, col=1)\n fig.add_trace(go.Scatter(x=pd.to_datetime(revenue_data_specific.Date), y=revenue_data_specific.Revenue.astype(\"float\"), name=\"Revenue\"), row=2, col=1)\n fig.update_xaxes(title_text=\"Date\", row=1, col=1)\n fig.update_xaxes(title_text=\"Date\", row=2, col=1)\n fig.update_yaxes(title_text=\"Price ($US)\", row=1, col=1)\n fig.update_yaxes(title_text=\"Revenue ($US Millions)\", row=2, col=1)\n fig.update_layout(showlegend=False,\n height=900,\n title=stock,\n xaxis_rangeslider_visible=True)\n fig.show()"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Use the make_graph function that we\u2019ve already defined. You\u2019ll need to invoke it in questions 5 and 6 to display the graphs and create the dashboard.\n> **Note: You don\u2019t need to redefine the function for plotting graphs anywhere else in this notebook; just use the existing function.**\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "## Question 1: Use yfinance to Extract Stock Data\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Using the `Ticker` function, enter the ticker symbol of the stock we want to extract data for to create a ticker object. The stock is Tesla and its ticker symbol is `TSLA`.\n"
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"id": "a5a83f58-fc26-4d8c-845a-d679a51e84e0"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": "yfinance.Ticker object <TSLA>\n"
}
],
"source": "tesla = yf.Ticker(\"TSLA\")\nprint(tesla)"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Using the ticker object and the function `history` extract stock information and save it in a dataframe named `tesla_data`. Set the `period` parameter to ` \"max\" ` so we get information for the maximum amount of time.\n"
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"id": "3a3fe378-3552-4f89-9cff-a42f3473d9e7"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": " Open High Low Close \\\nDate \n2010-06-29 00:00:00-04:00 1.266667 1.666667 1.169333 1.592667 \n2010-06-30 00:00:00-04:00 1.719333 2.028000 1.553333 1.588667 \n2010-07-01 00:00:00-04:00 1.666667 1.728000 1.351333 1.464000 \n2010-07-02 00:00:00-04:00 1.533333 1.540000 1.247333 1.280000 \n2010-07-06 00:00:00-04:00 1.333333 1.333333 1.055333 1.074000 \n... ... ... ... ... \n2025-01-27 00:00:00-05:00 394.799988 406.690002 389.000000 397.149994 \n2025-01-28 00:00:00-05:00 396.910004 400.589996 386.500000 398.089996 \n2025-01-29 00:00:00-05:00 395.209991 398.589996 384.480011 389.100006 \n2025-01-30 00:00:00-05:00 410.779999 412.500000 384.410004 400.279999 \n2025-01-31 00:00:00-05:00 401.529999 419.989990 401.339996 404.600006 \n\n Volume Dividends Stock Splits \nDate \n2010-06-29 00:00:00-04:00 281494500 0.0 0.0 \n2010-06-30 00:00:00-04:00 257806500 0.0 0.0 \n2010-07-01 00:00:00-04:00 123282000 0.0 0.0 \n2010-07-02 00:00:00-04:00 77097000 0.0 0.0 \n2010-07-06 00:00:00-04:00 103003500 0.0 0.0 \n... ... ... ... \n2025-01-27 00:00:00-05:00 58125500 0.0 0.0 \n2025-01-28 00:00:00-05:00 48910700 0.0 0.0 \n2025-01-29 00:00:00-05:00 68033600 0.0 0.0 \n2025-01-30 00:00:00-05:00 98092900 0.0 0.0 \n2025-01-31 00:00:00-05:00 83283600 0.0 0.0 \n\n[3672 rows x 7 columns]\n"
}
],
"source": "# Extract historical stock data\ntesla_data = tesla.history(period=\"max\")\nprint(tesla_data)"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "**Reset the index** using the `reset_index(inplace=True)` function on the tesla_data DataFrame and display the first five rows of the `tesla_data` dataframe using the `head` function. Take a screenshot of the results and code from the beginning of Question 1 to the results below.\n"
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"id": "597d4a06-3465-4ac1-8da6-bc116eb8fc41"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": " Date Open High Low Close \\\n0 2010-06-29 00:00:00-04:00 1.266667 1.666667 1.169333 1.592667 \n1 2010-06-30 00:00:00-04:00 1.719333 2.028000 1.553333 1.588667 \n2 2010-07-01 00:00:00-04:00 1.666667 1.728000 1.351333 1.464000 \n3 2010-07-02 00:00:00-04:00 1.533333 1.540000 1.247333 1.280000 \n4 2010-07-06 00:00:00-04:00 1.333333 1.333333 1.055333 1.074000 \n\n Volume Dividends Stock Splits \n0 281494500 0.0 0.0 \n1 257806500 0.0 0.0 \n2 123282000 0.0 0.0 \n3 77097000 0.0 0.0 \n4 103003500 0.0 0.0 \n"
}
],
"source": "# Reset index\ntesla_data.reset_index(inplace=True)\n\n# Display first five rows\nprint(tesla_data.head())\n"
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"id": "ea5fdf46-9b93-4b75-ba81-33630684987f"
},
"outputs": [
{
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Date</th>\n <th>Open</th>\n <th>High</th>\n <th>Low</th>\n <th>Close</th>\n <th>Volume</th>\n <th>Dividends</th>\n <th>Stock Splits</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>2010-06-29 00:00:00-04:00</td>\n <td>1.266667</td>\n <td>1.666667</td>\n <td>1.169333</td>\n <td>1.592667</td>\n <td>281494500</td>\n <td>0.0</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2010-06-30 00:00:00-04:00</td>\n <td>1.719333</td>\n <td>2.028000</td>\n <td>1.553333</td>\n <td>1.588667</td>\n <td>257806500</td>\n <td>0.0</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>2010-07-01 00:00:00-04:00</td>\n <td>1.666667</td>\n <td>1.728000</td>\n <td>1.351333</td>\n <td>1.464000</td>\n <td>123282000</td>\n <td>0.0</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>2010-07-02 00:00:00-04:00</td>\n <td>1.533333</td>\n <td>1.540000</td>\n <td>1.247333</td>\n <td>1.280000</td>\n <td>77097000</td>\n <td>0.0</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>4</th>\n <td>2010-07-06 00:00:00-04:00</td>\n <td>1.333333</td>\n <td>1.333333</td>\n <td>1.055333</td>\n <td>1.074000</td>\n <td>103003500</td>\n <td>0.0</td>\n <td>0.0</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " Date Open High Low Close \\\n0 2010-06-29 00:00:00-04:00 1.266667 1.666667 1.169333 1.592667 \n1 2010-06-30 00:00:00-04:00 1.719333 2.028000 1.553333 1.588667 \n2 2010-07-01 00:00:00-04:00 1.666667 1.728000 1.351333 1.464000 \n3 2010-07-02 00:00:00-04:00 1.533333 1.540000 1.247333 1.280000 \n4 2010-07-06 00:00:00-04:00 1.333333 1.333333 1.055333 1.074000 \n\n Volume Dividends Stock Splits \n0 281494500 0.0 0.0 \n1 257806500 0.0 0.0 \n2 123282000 0.0 0.0 \n3 77097000 0.0 0.0 \n4 103003500 0.0 0.0 "
},
"metadata": {},
"output_type": "display_data"
}
],
"source": "from IPython.display import display\ndisplay(tesla_data.head())\n"
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "f0e1c2ec-707a-4f81-a2ea-da39d0dba802"
},
"outputs": [],
"source": ""
},
{
"cell_type": "markdown",
"metadata": {},
"source": "## Question 2: Use Webscraping to Extract Tesla Revenue Data\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Use the `requests` library to download the webpage https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/revenue.htm. Save the text of the response as a variable named `html_data`.\n"
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"id": "ce025a42-bf33-4a4d-9d8d-3a550025a0c6"
},
"outputs": [],
"source": "# Import the required libraries\nimport pandas as pd\nimport requests\nfrom bs4 import BeautifulSoup\n\n# Step 1: Download the webpage using requests\nurl = \"https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/revenue.htm\"\nhtml_data = requests.get(url).text\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Parse the html data using `BeautifulSoup` with a parser such as `html5lib` or `html.parser`.\n"
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"id": "21f804bd-b1c5-4031-b10d-2d0dc3d50f54"
},
"outputs": [],
"source": "# Step 2: Parse the html data using BeautifulSoup\nsoup = BeautifulSoup(html_data, \"html.parser\")"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Using `BeautifulSoup` or the `read_html` function extract the table with `Tesla Revenue` and store it into a dataframe named `tesla_revenue`. The dataframe should have columns `Date` and `Revenue`.\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "<details><summary>Step-by-step instructions</summary>\n\n```\n\nHere are the step-by-step instructions:\n\n1. Create an Empty DataFrame\n2. Find the Relevant Table\n3. Check for the Tesla Quarterly Revenue Table\n4. Iterate Through Rows in the Table Body\n5. Extract Data from Columns\n6. Append Data to the DataFrame\n\n```\n</details>\n"
},
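The solution cell in this notebook extracts the table with `read_html`; the manual loop described in the steps above can also be sketched on its own. A minimal illustration, using a tiny hand-written table in place of the downloaded page (with the real page you would loop over `soup.find_all("tbody")[1]`):

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded revenue page (illustrative values only)
sample_html = """
<table>
  <tbody>
    <tr><td>2021-03-31</td><td>$10,389</td></tr>
    <tr><td>2020-12-31</td><td>$10,744</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")
rows = []
# Iterate through the rows in the table body, extracting both columns
for row in soup.find("tbody").find_all("tr"):
    cols = row.find_all("td")
    rows.append({"Date": cols[0].text.strip(), "Revenue": cols[1].text.strip()})

tesla_revenue = pd.DataFrame(rows)
# Clean the Revenue column the same way the notebook does
tesla_revenue["Revenue"] = tesla_revenue["Revenue"].str.replace(r",|\$", "", regex=True)
print(tesla_revenue)
```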
{
"cell_type": "markdown",
"metadata": {},
"source": "<details><summary>Click here if you need help locating the table</summary>\n\n```\n \nBelow is the code to isolate the table, you will now need to loop through the rows and columns like in the previous lab\n \nsoup.find_all(\"tbody\")[1]\n \nIf you want to use the read_html function the table is located at index 1\n\nWe are focusing on quarterly revenue in the lab.\n```\n\n</details>\n"
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"id": "15263934-8f52-4bb1-8e1a-b81a0e88b623"
},
"outputs": [],
"source": "# Step 3: Extract the Tesla Revenue Table\ntables = soup.find_all(\"table\")\ntesla_revenue = pd.read_html(str(tables[1]))[0] # The required table is at index 1\n\n# Step 4: Rename columns to 'Date' and 'Revenue'\ntesla_revenue.columns = [\"Date\", \"Revenue\"]"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Execute the following line to remove the comma and dollar sign from the `Revenue` column. \n"
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"id": "0eb97880-9d34-4a8c-a220-c034bb9b991a"
},
"outputs": [],
"source": "# tesla_revenue[\"Revenue\"] = tesla_revenue['Revenue'].str.replace(',|\\$',\"\")\n\n# Step 5: Clean the Revenue column\ntesla_revenue[\"Revenue\"] = tesla_revenue[\"Revenue\"].str.replace(\",|\\$\", \"\", regex=True)\n"
},
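The cleaning pattern used above can be checked on its own with made-up sample strings (these values are illustrative, not data from the page):

```python
import pandas as pd

# Hypothetical revenue strings in the raw scraped format
s = pd.Series(["$53,823", "$31,536", ""])
# Strip commas and dollar signs, leaving bare digits (or an empty string)
cleaned = s.str.replace(r",|\$", "", regex=True)
print(cleaned.tolist())  # -> ['53823', '31536', '']
```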
{
"cell_type": "markdown",
"metadata": {},
"source": "Execute the following lines to remove any null values or empty strings from the `Revenue` column.\n"
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"id": "c13e1d8a-4e7d-460f-959f-7a04ba578b0a"
},
"outputs": [],
"source": "# tesla_revenue.dropna(inplace=True)\n\n# tesla_revenue = tesla_revenue[tesla_revenue['Revenue'] != \"\"]\n# Step 6: Remove null or empty strings in the Revenue column\ntesla_revenue.dropna(inplace=True)\ntesla_revenue = tesla_revenue[tesla_revenue[\"Revenue\"] != \"\"]"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Display the last five rows of the `tesla_revenue` dataframe using the `tail` function. Take a screenshot of the results.\n"
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"id": "1e4aa44b-4dc1-4ee0-9a10-eb480fbba6c9"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": " Date Revenue\n48 2010-09-30 31\n49 2010-06-30 28\n50 2010-03-31 21\n52 2009-09-30 46\n53 2009-06-30 27\n"
}
],
"source": "# Display the last 5 rows\nprint(tesla_revenue.tail())\n"
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "4aa68120-a804-4f79-9af8-e38fc47ccfd6"
},
"outputs": [],
"source": ""
},
{
"cell_type": "markdown",
"metadata": {},
"source": "## Question 3: Use yfinance to Extract Stock Data\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Using the `Ticker` function, enter the ticker symbol of the stock we want to extract data for to create a ticker object. The stock is GameStop and its ticker symbol is `GME`.\n"
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"id": "e3effc81-9c5c-4f54-89e1-9f36f721df31"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": "yfinance.Ticker object <GME>\n"
}
],
"source": "# Question 3: Use yfinance to Extract GameStop Stock Data\ngme = yf.Ticker(\"GME\")\nprint(gme)"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Using the ticker object and the function `history` extract stock information and save it in a dataframe named `gme_data`. Set the `period` parameter to ` \"max\" ` so we get information for the maximum amount of time.\n"
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"id": "7396daa6-6a2d-484d-9dbb-8e3b9438d212"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": " Open High Low Close \\\nDate \n2002-02-13 00:00:00-05:00 1.620129 1.693350 1.603296 1.691667 \n2002-02-14 00:00:00-05:00 1.712707 1.716073 1.670625 1.683250 \n2002-02-15 00:00:00-05:00 1.683251 1.687459 1.658002 1.674835 \n2002-02-19 00:00:00-05:00 1.666417 1.666417 1.578047 1.607504 \n2002-02-20 00:00:00-05:00 1.615920 1.662210 1.603296 1.662210 \n... ... ... ... ... \n2025-01-27 00:00:00-05:00 26.920000 27.680000 26.799999 26.969999 \n2025-01-28 00:00:00-05:00 27.000000 27.590000 26.650000 27.459999 \n2025-01-29 00:00:00-05:00 27.410000 27.740000 27.059999 27.510000 \n2025-01-30 00:00:00-05:00 27.840000 28.230000 27.709999 27.990000 \n2025-01-31 00:00:00-05:00 27.790001 28.180000 26.900000 26.900000 \n\n Volume Dividends Stock Splits \nDate \n2002-02-13 00:00:00-05:00 76216000 0.0 0.0 \n2002-02-14 00:00:00-05:00 11021600 0.0 0.0 \n2002-02-15 00:00:00-05:00 8389600 0.0 0.0 \n2002-02-19 00:00:00-05:00 7410400 0.0 0.0 \n2002-02-20 00:00:00-05:00 6892800 0.0 0.0 \n... ... ... ... \n2025-01-27 00:00:00-05:00 5060300 0.0 0.0 \n2025-01-28 00:00:00-05:00 3169900 0.0 0.0 \n2025-01-29 00:00:00-05:00 3220400 0.0 0.0 \n2025-01-30 00:00:00-05:00 3343900 0.0 0.0 \n2025-01-31 00:00:00-05:00 4472100 0.0 0.0 \n\n[5780 rows x 7 columns]\n"
}
],
"source": "# Extract historical stock data\ngme_data = gme.history(period=\"max\")\nprint(gme_data)"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "**Reset the index** using the `reset_index(inplace=True)` function on the gme_data DataFrame and display the first five rows of the `gme_data` dataframe using the `head` function. Take a screenshot of the results and code from the beginning of Question 3 to the results below.\n"
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"id": "b021442a-e4c8-4e2d-b798-8e8e799a2171"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": " Date Open High Low Close Volume \\\n0 2002-02-13 00:00:00-05:00 1.620129 1.693350 1.603296 1.691667 76216000 \n1 2002-02-14 00:00:00-05:00 1.712707 1.716073 1.670625 1.683250 11021600 \n2 2002-02-15 00:00:00-05:00 1.683251 1.687459 1.658002 1.674835 8389600 \n3 2002-02-19 00:00:00-05:00 1.666417 1.666417 1.578047 1.607504 7410400 \n4 2002-02-20 00:00:00-05:00 1.615920 1.662210 1.603296 1.662210 6892800 \n\n Dividends Stock Splits \n0 0.0 0.0 \n1 0.0 0.0 \n2 0.0 0.0 \n3 0.0 0.0 \n4 0.0 0.0 \n"
}
],
"source": "# Reset index\ngme_data.reset_index(inplace=True)\n\n# Display first five rows of GameStop Stock Data\nprint(gme_data.head())"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "## Question 4: Use Webscraping to Extract GME Revenue Data\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Use the `requests` library to download the webpage https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/stock.html. Save the text of the response as a variable named `html_data_2`.\n"
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"id": "2e745adb-09de-4966-a7d2-d42b947030b6"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": "<html>\n<head>\n\n</head>\n<body>\n<div style=\"margin: 50px auto; width: 50%; border: 1px solid #dfdfdf; padding: 20px 50px 30px 50px; font-family:helvetica;\">\n\n<h1>We do not allow automated access to our servers.</h1>\n<h2><p>Automated access to our data is prohibited by our data provider.</p> \n<p>If you are a user attempting to access the site via a browser, please follow this process to regain access:</p>\n\n<ul>\n <li>Go to <a href='https://whatismyipaddress.com/' target='_blank' rel='noopener noreferrer'>whatismyipaddress</a> and obtain your IPv4 address</li>\n <li>Email us your IPv4 address at <a href=\"/cdn-cgi/l/email-protection\" class=\"__cf_email__\" data-cfemail=\"4821262e270825292b3a273c3a2d262c3b66262d3c\">[email&#160;protected]</a></li>\n <li>We will add you to our whitelist within 24 hours</li>\n</ul>\n</h2>\n</div>\n<script data-cfasync=\"false\">!function(){\"use strict\";function e(e){try{if(\"undefined\"==typeof console)return;\"error\"in console?console.error(e):console.log(e)}catch(e){}}function t(e,t){var r=e.substr(t,2);return parseInt(r,16)}function r(r,n){for(var c=\"\",o=t(r,n),a=n+2;a<r.length;a+=2){var l=t(r,a)^o;c+=String.fromCharCode(l)}try{c=decodeURIComponent(escape(c))}catch(t){e(t)}return function(e){return i.innerHTML='<a href=\"'+e.replace(/\"/g,\"&quot;\")+'\"></a>',i.childNodes[0].getAttribute(\"href\")||\"\"}(c)}function n(t){try{(function(t){for(var n=t.querySelectorAll(\"a\"),o=0;o<n.length;o++)try{var a=n[o],i=a.href.indexOf(c);i>-1&&(a.href=\"mailto:\"+r(a.href,i+c.length))}catch(t){e(t)}})(t),function(t){for(var n=t.querySelectorAll(o),c=0;c<n.length;c++)try{var i=n[c],l=i.parentNode,u=i.getAttribute(a);if(u){var f=r(u,0),d=document.createTextNode(f);l.replaceChild(d,i)}}catch(t){e(t)}}(t),function(t){for(var r=t.querySelectorAll(\"template\"),c=0;c<r.length;c++)try{n(r[c].content)}catch(t){e(t)}}(t)}catch(t){e(t)}}var 
c=\"/cdn-cgi/l/email-protection#\",o=\".__cf_email__\",a=\"data-cfemail\",i=document.createElement(\"div\");n(document),function(){var e=document.currentScript||document.scripts[document.scripts.length-1];e.parentNode.removeChild(e)}()}();</script><script>(function(){function c(){var b=a.contentDocument||a.contentWindow.document;if(b){var d=b.createElement('script');d.innerHTML=\"window.__CF$cv$params={r:'90b748f90ffa5a39',t:'MTczODQ2ODkzOS4wMDAwMDA='};var a=document.createElement('script');a.nonce='';a.src='/cdn-cgi/challenge-platform/scripts/jsd/main.js';document.getElementsByTagName('head')[0].appendChild(a);\";b.getElementsByTagName('head')[0].appendChild(d)}}if(document.body){var a=document.createElement('iframe');a.height=1;a.width=1;a.style.position='absolute';a.style.top=0;a.style.left=0;a.style.border='none';a.style.visibility='hidden';document.body.appendChild(a);if('loading'!==document.readyState)c();else if(window.addEventListener)document.addEventListener('DOMContentLoaded',c);else{var e=document.onreadystatechange||function(){};document.onreadystatechange=function(b){e(b);'loading'!==document.readyState&&(document.onreadystatechange=e,c())}}}})();</script></body>\n</html>\n"
}
],
"source": "# Question 4: Use Webscraping to Extract GameStop Revenue Data\n# Note: this request targets macrotrends rather than the lab-provided stock.html URL,\n# and the server blocks automated access (see the output below)\nurl_gme = \"https://www.macrotrends.net/stocks/charts/GME/gamestop/revenue\"\nhtml_data_2 = requests.get(url_gme).text\nprint(html_data_2)"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Parse the html data using `BeautifulSoup` with a parser such as `html5lib` or `html.parser`.\n"
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"id": "c9e6f0ec-bf7e-481a-ad13-32871f9bc196"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": "[]\n"
}
],
"source": "soup_gme = BeautifulSoup(html_data_2, \"html.parser\")\ntables_gme = soup_gme.find_all(\"table\")\nprint(tables_gme)"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Using `BeautifulSoup` or the `read_html` function extract the table with `GameStop Revenue` and store it into a dataframe named `gme_revenue`. The dataframe should have columns `Date` and `Revenue`. Make sure the comma and dollar sign is removed from the `Revenue` column.\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "> **Note: Use a method similar to what you did in question 2.**\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "<details><summary>Click here if you need help locating the table</summary>\n\n```\n \nBelow is the code to isolate the table, you will now need to loop through the rows and columns like in the previous lab\n \nsoup.find_all(\"tbody\")[1]\n \nIf you want to use the read_html function the table is located at index 1\n\n\n```\n\n</details>\n"
},
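   {
    "cell_type": "markdown",
    "metadata": {},
    "source": "The comma/dollar-sign cleaning the assignment asks for can be sketched on a couple of hypothetical rows (the values below are assumptions, used so the example runs without the live page):\n\n```python\nimport pandas as pd\n\n# Hypothetical rows standing in for the scraped macrotrends table\ngme_revenue = pd.DataFrame({\n    \"Date\": [\"2021-04-30\", \"2021-01-31\"],\n    \"Revenue\": [\"$1,277\", \"$2,122\"],\n})\n\n# Remove the dollar sign and comma, then convert to numbers\ngme_revenue[\"Revenue\"] = gme_revenue[\"Revenue\"].str.replace(\"$\", \"\", regex=False).str.replace(\",\", \"\", regex=False)\ngme_revenue[\"Revenue\"] = pd.to_numeric(gme_revenue[\"Revenue\"])\nprint(gme_revenue)\n```\n\nThe same `str.replace` calls apply unchanged to the real scraped table once it is in a dataframe.\n"
   },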
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"id": "8f859585-849b-48a6-86eb-1d1b62d25b45"
},
"outputs": [],
"source": "import yfinance as yf\n\ngme = yf.Ticker(\"GME\")\ngme_financials = gme.financials # Get financial data including revenue\n\n# Extract only revenue\ngme_revenue = gme_financials.loc[\"Total Revenue\"]\ngme_revenue = gme_revenue.reset_index()\ngme_revenue.columns = [\"Date\", \"Revenue\"]\n\n# Convert date format\ngme_revenue[\"Date\"] = pd.to_datetime(gme_revenue[\"Date\"])\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Display the last five rows of the `gme_revenue` dataframe using the `tail` function. Take a screenshot of the results.\n"
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"id": "3c0b6f50-b6a2-4ff2-837f-73a20f9d520d"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": " Date Revenue\n0 2024-01-31 5272800000.0\n1 2023-01-31 5927200000.0\n2 2022-01-31 6010700000.0\n3 2021-01-31 5089800000.0\n"
}
],
"source": "# Display last five rows\n# tools.display_dataframe_to_user(name=\"GameStop Revenue Data\", dataframe=gme_revenue.tail())\n\n# Display the last 5 rows\nprint(gme_revenue.tail())\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "## Question 5: Plot Tesla Stock Graph\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Use the `make_graph` function to graph the Tesla Stock Data, also provide a title for the graph. Note the graph will only show data upto June 2021.\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "<details><summary>Hint</summary>\n\n```\n\nYou just need to invoke the make_graph function with the required parameter to print the graphs.The structure to call the `make_graph` function is `make_graph(tesla_data, tesla_revenue, 'Tesla')`.\n\n```\n \n</details>\n"
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"id": "7fc1be83-8ca1-4346-bbc9-8d437984ded2"
},
"outputs": [],
"source": "import yfinance as yf\nimport pandas as pd\nimport plotly.graph_objects as go\nfrom plotly.subplots import make_subplots"
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"id": "996d567e-c53a-4525-8b21-ed45105b8a7a"
},
"outputs": [],
"source": "def make_graph(stock_data, revenue_data, stock_name):\n fig = make_subplots(rows=2, cols=1, shared_xaxes=True, subplot_titles=(\"Historical Share Price\", \"Historical Revenue\"), vertical_spacing=0.3)\n\n # Filter stock data up to June 2021\n stock_data_filtered = stock_data[stock_data[\"Date\"] <= \"2021-06-14\"]\n \n # Filter revenue data up to April 2021\n revenue_data_filtered = revenue_data[revenue_data[\"Date\"] <= \"2021-04-30\"]\n \n # Plot stock price\n fig.add_trace(go.Scatter(\n x=stock_data_filtered[\"Date\"],\n y=stock_data_filtered[\"Close\"],\n name=\"Share Price\"\n ), row=1, col=1)\n\n # Plot revenue\n fig.add_trace(go.Scatter(\n x=revenue_data_filtered[\"Date\"],\n y=revenue_data_filtered[\"Revenue\"],\n name=\"Revenue\"\n ), row=2, col=1)\n\n # Set axis labels\n fig.update_xaxes(title_text=\"Date\", row=1, col=1)\n fig.update_xaxes(title_text=\"Date\", row=2, col=1)\n fig.update_yaxes(title_text=\"Price ($US)\", row=1, col=1)\n fig.update_yaxes(title_text=\"Revenue ($US Millions)\", row=2, col=1)\n\n # Set layout properties\n fig.update_layout(\n showlegend=False,\n height=900,\n title=stock_name,\n xaxis_rangeslider_visible=True\n )\n \n fig.show()"
},
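   {
    "cell_type": "markdown",
    "metadata": {},
    "source": "Inside `make_graph`, the June-2021 cutoff relies on pandas comparing a datetime column against a plain date string. A minimal illustration on toy data (the rows below are assumptions, not Tesla prices):\n\n```python\nimport pandas as pd\n\n# Toy data: one row before the cutoff, one after\ndf = pd.DataFrame({\n    \"Date\": pd.to_datetime([\"2021-05-01\", \"2021-07-01\"]),\n    \"Close\": [600.0, 650.0],\n})\n\n# Same filter pattern as make_graph uses\nfiltered = df[df[\"Date\"] <= \"2021-06-14\"]\nprint(len(filtered))\n```\n"
   },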
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"id": "6d5c75d0-59fd-4568-b9d5-a400d32ac9eb"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": " Date Open High Low Close \\\n0 2010-06-29 00:00:00-04:00 1.266667 1.666667 1.169333 1.592667 \n1 2010-06-30 00:00:00-04:00 1.719333 2.028000 1.553333 1.588667 \n2 2010-07-01 00:00:00-04:00 1.666667 1.728000 1.351333 1.464000 \n3 2010-07-02 00:00:00-04:00 1.533333 1.540000 1.247333 1.280000 \n4 2010-07-06 00:00:00-04:00 1.333333 1.333333 1.055333 1.074000 \n\n Volume Dividends Stock Splits \n0 281494500 0.0 0.0 \n1 257806500 0.0 0.0 \n2 123282000 0.0 0.0 \n3 77097000 0.0 0.0 \n4 103003500 0.0 0.0 \n"
}
],
"source": "# Fetch Tesla stock data using Yahoo Finance API\ntesla = yf.Ticker(\"TSLA\")\ntesla_data = tesla.history(period=\"max\")\n\n# Reset index for better visualization\ntesla_data.reset_index(inplace=True)\n\n# Display first five rows\nprint(tesla_data.head())\n# import ace_tools as tools\n# tools.display_dataframe_to_user(name=\"Tesla Stock Data\", dataframe=tesla_data.head())"
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"id": "c2a9dce0-cc81-497e-932f-8d1df60e20ca"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": " Date Revenue\n0 2024-12-31 97690000000.0\n1 2023-12-31 96773000000.0\n2 2022-12-31 81462000000.0\n3 2021-12-31 53823000000.0\n4 2020-12-31 NaN\n"
}
],
"source": "tesla_revenue = tesla.financials.loc[\"Total Revenue\"].reset_index()\ntesla_revenue.columns = [\"Date\", \"Revenue\"]\ntesla_revenue[\"Date\"] = pd.to_datetime(tesla_revenue[\"Date\"])\n\n# Display Tesla revenue data\nprint(tesla_revenue.tail())\n# tools.display_dataframe_to_user(name=\"Tesla Revenue Data\", dataframe=tesla_revenue.tail())\n"
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"id": "5c1af60a-3f1f-4731-a57e-1365d1506591"
},
"outputs": [
{
"data": {